-
Thursday, October 4, 2012:
Speaker:
Liu Yang
(CMU).
Topic:
Active Property Testing
One motivation for property testing of boolean functions is the idea that
testing can provide a fast preprocessing step before
learning. However, in most machine learning applications, it is not possible
to request labels for arbitrary examples constructed by an algorithm.
Instead, the dominant query paradigm
in applied machine learning, called active learning, is one where the algorithm
may query for labels, but only on points in a given (polynomial-sized)
unlabeled sample, drawn from some underlying distribution D.
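As a rough interface sketch of the two query models (hypothetical code, not from the paper): a membership-query tester may request the label of any point it constructs, whereas an active tester may only request labels of points from an unlabeled pool drawn from D.

# A minimal sketch contrasting the two query models described above; the domain,
# the oracle, and all parameters are made up for illustration.
import random

def membership_query_tester(label_oracle, budget):
    # free to fabricate query points anywhere in the domain (here: the unit interval)
    queries = [random.random() for _ in range(budget)]
    return [(x, label_oracle(x)) for x in queries]

def active_tester(draw_from_D, label_oracle, pool_size, budget):
    pool = [draw_from_D() for _ in range(pool_size)]   # unlabeled sample from D
    chosen = random.sample(pool, budget)               # labels only for pooled points
    return [(x, label_oracle(x)) for x in chosen]

# Toy usage: D = uniform on [0,1], target is a threshold function (made-up example).
print(active_tester(random.random, lambda x: int(x > 0.5), pool_size=100, budget=5))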
In this work, we bring this well-studied model to the domain of testing.
We develop both general results for this active testing model as
well as efficient testing algorithms for several important properties for learning,
demonstrating that testing can still yield substantial benefits in
this restricted setting. For example, we show that testing unions of
d intervals can be done with O(1) label requests in our setting,
whereas it is known to require Omega(d) labeled examples for
learning (and Omega(sqrt{d}) for passive testing [KR00] where
the algorithm must pay for every example drawn from D). In
fact, our results for testing unions of intervals also yield
improvements on prior work in both the classic query model (where any
point in the domain can be queried) and the passive testing model as
well. For the problem of testing linear separators in R^n over the
Gaussian distribution, we show that both active and passive testing
can be done with O(sqrt{n}) queries, substantially less than the
Omega(n) needed for learning, with near-matching lower bounds. We
also present a general combination result in this model for building
testable properties out of others, which we then use to provide
testers for a number of assumptions used in semi-supervised learning.
In addition to the above results, we also develop a general
notion of the testing dimension of a given property with respect
to a given distribution, which we show characterizes (up to
constant factors) the intrinsic number of label requests needed to
test that property. We develop such notions for both the active
and passive testing models. We then use these dimensions to prove a
number of lower bounds, including for linear separators and the class
of dictator functions.
Joint work with Nina Balcan, Eric Blais and Avrim Blum.
-
Thursday, November 29, 2012:
Speaker:
Dan Feldman
(MIT).
Topic:
Learning patterns in Big data from small data using core-sets
When we need to solve an optimization problem we usually use the best available
algorithm/software or try to improve it. In recent years we have started
exploring a different approach: instead of improving the algorithm, reduce the
input data and run the existing algorithm on the reduced data to obtain the
desired output much faster, on a streaming input, using a manageable amount of
memory, and in parallel (say, using Hadoop, a cloud service, or GPUs).
A core-set for a given problem is a semantic compression of its input, in the
sense that a solution for the problem with the (small) coreset as input yields
a provable approximate solution to the problem with the original (Big) data.
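As a toy illustration of this definition (assumptions: uniform sampling and a single fixed candidate solution; real coresets use importance sampling so that the guarantee holds for all candidate solutions at once), a small weighted sample can stand in for the big data when evaluating a clustering cost:

# A minimal sketch of the coreset idea: replace the big point set by a small weighted
# set on which the same cost function is (approximately) evaluated. Uniform sampling
# with weight n/|sample| gives an unbiased cost estimate for a FIXED set of centers;
# the example data and centers below are made up.
import random

def kmeans_cost(points, centers, weights=None):
    w = weights or [1.0] * len(points)
    return sum(wi * min((px - cx) ** 2 + (py - cy) ** 2 for cx, cy in centers)
               for wi, (px, py) in zip(w, points))

random.seed(0)
big = [(random.gauss(0, 1), random.gauss(5, 1)) for _ in range(100000)]   # "Big" data
centers = [(0.0, 5.0), (3.0, 3.0)]                                        # fixed solution

sample = random.sample(big, 500)                      # small weighted proxy ("coreset")
weights = [len(big) / len(sample)] * len(sample)
print(kmeans_cost(big, centers), kmeans_cost(sample, centers, weights))   # close values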
In this talk I will describe how we applied this magical paradigm to obtain
algorithmic achievements with performance guarantees in iDiary: a system that
combines robotics, sensor networks, computer vision, differential privacy, and
text mining. It turns large signals collected from smartphone or robot
sensors into textual descriptions of the trajectories. The system features a
user interface similar to Google Search that allows users to type text queries
on their activities (e.g., "Where did I buy books?") and receive textual answers.
-
Thursday, December 6, 2012:
Speaker:
Madhav Jha
(Penn State).
Topic:
Testing and Reconstruction of Lipschitz Functions with Applications to
Data Privacy.
A function f : D → R is Lipschitz if d_R(f(x), f(y)) ≤ d_D(x, y) for
all x, y in D, where d_R and d_D denote the distance metrics on the
range and domain of f, respectively. We initiate the study of testing
and local reconstruction of the Lipschitz property of functions. A
property tester has to distinguish functions with the property (in
this case, Lipschitz) from functions that differ from every function
with the property on many values. A local filter reconstructs a
desired property (in this case, Lipschitz) in the following sense:
given an arbitrary function f and a query x, it returns g(x), where
the resulting function g satisfies the property, changing f only when
necessary. If f has the property, g must be equal to f.
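To make the definition concrete, here is a brute-force check of the Lipschitz property on {1, ..., n}^d with ell_1 distance (a minimal sketch with a made-up example; it reads the whole function, which is exactly what the testers and filters below avoid):

# Brute-force check of |f(x) - f(y)| <= ||x - y||_1 over all pairs of grid points.
from itertools import product

def is_lipschitz(f, n, d):
    points = list(product(range(1, n + 1), repeat=d))
    for x in points:
        for y in points:
            if abs(f(x) - f(y)) > sum(abs(a - b) for a, b in zip(x, y)):
                return False
    return True

# Example on {1, 2, 3}^2: g is Lipschitz in ell_1, h is not (it changes by 2 per step).
g = lambda x: x[0] + x[1]
h = lambda x: 2 * x[0]
print(is_lipschitz(g, 3, 2), is_lipschitz(h, 3, 2))   # True False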
We design efficient testers and local reconstructors for functions
over domains of the form {1, ..., n}^d, equipped with ell_1
distance, and give corresponding impossibility results. The algorithms
we design have applications to program analysis and data privacy. The
application to privacy is based on the fact that a function f of
entries in a database of sensitive information can be released with
noise of magnitude proportional to a Lipschitz constant of f, while
preserving the privacy of individuals whose data is stored in the
database (Dwork, McSherry, Nissim and Smith, TCC 2006). We give a
differentially private mechanism, based on local filters, for
releasing a function f when a purported Lipschitz constant of f is
provided by a distrusted client.
Based on a FOCS 2011 paper with S. Raskhodnikova and RANDOM 2012
papers with P. Awasthi, M. Molinaro and S. Raskhodnikova.
-
Wednesday, December 12, 2012:
Speaker:
Grigory Yaroslavtsev
(Penn State).
Topic:
Beating the Direct Sum Theorem in Communication Complexity
A direct sum theorem for two parties and a function f states that the communication cost of solving k copies of f simultaneously with error probability 1/3 is at least k * R_{1/3}(f), where R_{1/3}(f) is the communication required to solve a single copy of f with error probability 1/3. We improve this for a natural family of functions f, showing that the 1-way communication required to solve k copies of f simultaneously with error probability 1/3 is Omega(k * R_{1/k}(f)). Since R_{1/k}(f) may be as large as R_{1/3}(f) * log k, we asymptotically beat the standard direct sum bound for such functions, showing that the trivial upper bound of solving each of the k copies of f with probability 1 - O(1/k) and taking a union bound is optimal.
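For intuition, the trivial upper bound mentioned above comes from standard amplification plus a union bound (background reasoning, not an additional claim of the paper). Writing R_{1/3}(f^k) for the 1-way cost of solving all k copies with overall error 1/3,

    R_{1/3}(f^k)  <=  k * R_{c/k}(f)  <=  O(log k) * k * R_{1/3}(f),

since error c/k per copy makes the union bound over the k copies work, and error c/k is reachable from error 1/3 by O(log k) independent repetitions followed by a majority vote. The result above shows that the first step is essentially tight for the functions considered.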
Our results imply optimal communication/space lower bounds for several sketching problems in a setting when the algorithm should be correct on a sequence of k queries.
-
Thursday, January 3, 2013:
Speaker:
Morteza Zadimoghaddam
(MIT).
Topic:
Learning Disjunctions: Near-Optimal Trade-off Between Mistakes and "I Don't Know"s
We develop polynomial-time online algorithms for learning
disjunctions. In this model, we are given an online adversarial
sequence of inputs for an unknown disjunction function of the form
f(x_1, x_2, ..., x_n) = OR_{i \in S} x_i , and for each such input, we
must guess "true", "false", or "I don't know". On the algorithm side,
we show how to make at most \epsilon n mistakes while answering "I don't
know" at most (1/\epsilon)^{2^{O(1/\epsilon)}} n times. Furthermore,
we show how to make O(n log(log(n)) / log(n)) mistakes while answering "I
don't know" O(n^2 log(log(n))) times. We also show that any algorithm
making o(n/log(n)) mistakes must answer "I don't know" a
superpolynomial number of times.
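For concreteness, here is a naive strategy in this model (a hypothetical baseline, not the paper's algorithm): maintain the set of variables that could still belong to S, answer "false" only when that is certainly correct, and say "I don't know" otherwise, so it makes zero mistakes at the cost of many "I don't know"s.

# A minimal sketch of the online model with a third answer. S is always a subset of
# the candidate set C, so the learner's "false" answers are never wrong.
def make_learner(n):
    C = set(range(n))                      # variables that could still be in S

    def predict(x):                        # x is a 0/1 tuple of length n
        if all(x[i] == 0 for i in C):      # no candidate variable is on
            return "false"                 # then surely OR_{i in S} x_i = 0
        return "I don't know"

    def update(x, label):                  # label is the true value of f(x)
        if label is False:                 # every variable that was on is not in S
            C.difference_update(i for i in range(n) if x[i] == 1)

    return predict, update

# Tiny usage example with a hidden disjunction over x_0 and x_2 (made-up instance).
n, S = 4, {0, 2}
predict, update = make_learner(n)
for x in [(0, 1, 0, 0), (0, 0, 1, 1), (0, 1, 0, 1), (0, 1, 0, 0)]:
    truth = any(x[i] for i in S)
    print(x, "->", predict(x), "(truth:", truth, ")")
    update(x, truth)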
This is joint work with Erik Demaine.
-
Wednesday, February 6, 2013:
Speaker:
Raghu Meka
(IAS).
Topic:
Beating the Union Bound via Geometric Techniques
The union bound is one of the mainstays of the probabilistic
method and analysis of randomized algorithms. However, this simplistic
approach does not give the full picture for many important cases, with
the Lovasz local lemma being a particularly salient example. In this talk I
will discuss two recent results that go beyond the union bound via
geometric techniques:
1) Optimal size, explicit eps-nets for computing Gaussian integrals.
Besides being interesting on its own, our result has applications to
computing the supremum of Gaussian processes and cover times of graphs.
2) A new elementary and constructive proof of Spencer's celebrated six
standard deviations suffice theorem. We introduce a new 'entropic'
rounding technique that can be used in a variety of algorithmic settings and
appears to have much broader potential.
For both problems, our methods critically exploit certain geometric and
symmetry properties of the Gaussian space.
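As background for the second result (standard facts, not claims from the abstract): for a system of n sets over n elements, a uniformly random coloring x in {-1,+1}^n together with a Chernoff bound and a union bound over the n sets gives discrepancy

    max_j | sum_{i in S_j} x_i |  =  O(sqrt(n log n))

with high probability, whereas Spencer's theorem guarantees a coloring with discrepancy at most 6*sqrt(n).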
-
Wednesday, February 20, 2013:
Speaker:
Andrew V. Goldberg
(Microsoft Research -- Silicon Valley).
Topic:
Highway Dimension: From Practice to Theory and Back
Computing driving directions has motivated many shortest path heuristics that answer queries on continental-scale networks, with tens of millions of intersections, in real time. The underlying algorithms are highly practical and intuitive, but until recently there was no theoretical explanation of their practical performance. We review some of these algorithms and give the first theoretical analysis for them on a non-trivial class of networks.
For our analysis, we introduce the notion of highway dimension, which is a strengthening of the doubling dimension. The analysis gives good bounds for networks with low highway dimension. Our analysis explores an unexpected relationship to VC-dimension, which leads to better algorithms.
We also show that the hub labeling algorithm achieves a better theoretical time bound. This motivates a heuristic implementation of this algorithm. Our experiments show that the implementation outperforms the fastest previous codes, validating the theoretical prediction.
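For context, the query side of hub labeling is simple (a minimal sketch of the standard technique with a hand-built toy label set, not the talk's implementation): every vertex stores distances to a small set of hubs, chosen so that every pair of vertices shares a hub on some shortest path between them.

def hub_query(L, s, t):
    # distance query: minimize dist(s, h) + dist(h, t) over hubs h common to both labels
    common = L[s].keys() & L[t].keys()
    if not common:
        return float("inf")
    return min(L[s][h] + L[t][h] for h in common)

# Toy labels for the path graph 0 - 1 - 2 - 3 (hand-built, just to show the query).
L = {
    0: {0: 0, 1: 1},
    1: {1: 0},
    2: {1: 1, 2: 0},
    3: {1: 2, 3: 0},
}
print(hub_query(L, 0, 3))   # 3, the length of the path 0 - 1 - 2 - 3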
Joint work with Ittai Abraham, Daniel Delling, Amos Fiat, and Renato Werneck.
-
Friday, March 8, 2013:
Speaker:
Sidharth (Sid) Jaggi
(The Chinese University of Hong Kong).
Topic:
Robust sparse recovery and applications: order-optimal measurements and complexity
Sparse recovery problems usually concern a setting in which a vector x with ambient dimension n has only k "significant" entries. This vector x is "compressively measured" (with possibly "noisy" measurements) as y, where y has just m ≪ n components, and the goal is to reconstruct (a good approximation to) x.
We consider three sparse recovery problems:
- Compressive Sensing: A linear problem in which y is a (possibly noisy) linear transform of x, a vector in R^n.
- Group Testing: A non-linear problem in which y is a binary vector comprising (possibly noisy) disjunctive measurements of a binary vector x (see the sketch at the end of this abstract).
- Network tomography: A linear problem in which x, a vector in R^n, denotes network parameters (such as edge/node delays) to be estimated via constrained, end-to-end measurements (such as path delays).
In each of these cases we present sparse recovery algorithms that have good reconstruction performance, and have information-theoretically order-optimal decoding complexity and number of measurements (O(k) measurements and decoding complexity for compressive sensing and network tomography, and O(k log(n)) measurements and decoding complexity for group testing.)
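The disjunctive measurement model from the group testing bullet, in a minimal sketch (the pools and the signal are made up; the talk's algorithms recover x from m ≪ n such measurements, possibly with noise):

def group_test(x, pools):
    # each measurement is the OR of the entries of x that fall into that test's pool
    return [int(any(x[i] for i in pool)) for pool in pools]

x = [0, 1, 0, 0, 1, 0, 0, 0]                 # n = 8, k = 2 "defective" items
pools = [[0, 1, 2], [2, 3, 4], [5, 6, 7], [1, 4, 7]]
print(group_test(x, pools))                  # [1, 1, 0, 1]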
-
Thursday, March 21, 2013, at 2pm:
Speaker:
Vijay Vazirani
(Georgia Tech).
Topic:
Non-separable utilities are as easy as separable ones -- if they are Leontief-free
Conventional wisdom has it that computing an equilibrium for a
market under non-separable utilities is difficult -- even if
restricted to the piecewise-linear, concave (PLC) case, for which
irrationality sets in and no fast algorithms are known. In contrast,
recent results have shown that markets under separable PLC utilities
have rational equilibria and admit practical algorithms.
Using the powerful machinery of LCP formulations and complementary
pivot algorithms, we show that a subclass of non-separable PLC, which
we call Leontief-free PLC, is just as easy as separable PLC.
Furthermore, adding even one Leontief-type constraint renders it
difficult.
The same applies to production, where by separable PLC production we
mean using one raw good to produce one finished good in a
piecewise-linear, concave manner.
Based on joint work with Jugal Garg and Ruta Mehta.
-
Wednesday, April 10, 2013:
Speaker:
Ping Li
(Cornell).
Topic:
Efficient Compressed Sensing with L0 Projections
Many applications concern sparse signals, for example, detecting anomalies from the differences between consecutive images taken by surveillance cameras. In general, anomaly events are sparse. This talk focuses on the problem of recovering a K-sparse signal in N dimensions (coordinates). Classical theories in compressed sensing say the required number of measurements is M = O(K log N). In our most recent work on L0 projections, we show that an idealized algorithm needs about M = 5K measurements, regardless of N. In particular, 3 measurements suffice when K = 2 nonzeros. Practically, our method is very fast, accurate, and very robust against measurement noise. Even when there are not enough measurements, the algorithm can still accurately reconstruct a significant portion of the nonzero coordinates, without catastrophic failures (unlike popular methods such as linear programming). This is joint work with Cun-Hui Zhang at Rutgers University.
-
Wednesday, April 17, 2013:
Speaker:
Arnab Bhattacharyya
(DIMACS/Rutgers).
Topic:
Every locally characterized affine-invariant property is testable
Let F = F_p for any fixed prime p >= 2. An affine-invariant property is a property of functions on F^n that is closed under taking affine transformations of the domain. We prove that all affine-invariant properties having local characterizations are testable. In fact, we show a proximity-oblivious test for any such property P, meaning that there is a test that, given an input function f, makes a constant number of queries to f, always accepts if f satisfies P, and rejects with positive probability if the distance between f and P is nonzero. More generally, we show that any affine-invariant property that is closed under taking restrictions to subspaces and has bounded complexity is testable.
We also prove that any property that can be described as the property of decomposing into a known structure of low-degree polynomials is locally characterized and is, hence, testable. For example, whether a function is a product of two degree-d polynomials, whether a function splits into a product of d linear polynomials, and whether a function has low rank are all examples of degree-structural properties and are therefore locally characterized.
Our results depend on a new Gowers inverse theorem by Tao and Ziegler for low characteristic fields that decomposes any polynomial with large Gowers norm into a function of low-degree non-classical polynomials. We establish a new equidistribution result for high rank non-classical polynomials that drives the proofs of both the testability results and the local characterization of degree-structural properties.
Joint work with Eldar Fischer, Hamed Hatami, Pooya Hatami, and Shachar Lovett.
-
Wednesday, April 24, 2013 at 3pm:
Speaker:
Michael Forbes
(MIT).
Topic:
Explicit Noether Normalization for Simultaneous Conjugation via Polynomial Identity Testing
Polynomial identity testing asks whether a given algebraic circuit C computes the identically zero polynomial. Randomized algorithms are known for this problem, via the Schwartz-Zippel lemma, and in recent years there has been much effort to find a deterministic algorithm, even for restricted types of circuits C. Such algorithms would have several algorithmic applications, as well as implications for circuit lower bounds.
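For reference, the randomized test given by the Schwartz-Zippel lemma is simple (a minimal sketch with a made-up example circuit; the talk is about removing this randomness for restricted circuit classes): a nonzero polynomial of degree at most d, evaluated at a point drawn uniformly from a set S, vanishes with probability at most d/|S|.

import random

def schwartz_zippel_test(circuit, num_vars, degree_bound, trials=20):
    S = range(10 * degree_bound)              # |S| >= 10 * degree, so error <= 1/10 per trial
    for _ in range(trials):
        point = [random.choice(S) for _ in range(num_vars)]
        if circuit(point) != 0:
            return "nonzero"                  # any nonzero evaluation certifies nonzeroness
    return "probably identically zero"

# (x+y)^2 - x^2 - 2xy - y^2 is the zero polynomial; (x+y)^2 - x^2 - y^2 is not.
zero = lambda v: (v[0] + v[1]) ** 2 - v[0] ** 2 - 2 * v[0] * v[1] - v[1] ** 2
nonzero = lambda v: (v[0] + v[1]) ** 2 - v[0] ** 2 - v[1] ** 2
print(schwartz_zippel_test(zero, 2, 2), schwartz_zippel_test(nonzero, 2, 2))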
Mulmuley recently observed that the "Noether Normalization Lemma" of commutative algebra/algebraic geometry can be "derandomized" assuming a deterministic algorithm for polynomial identity testing, and argued that this is further evidence that constructing such algorithms is hard. He also conjectured that the specific case of Noether Normalization (for "matrices under simultaneous conjugation") would still be hard. In this work, we derandomize this specific case of Noether Normalization, using a recent polynomial identity testing algorithm of the authors for the class of read-once oblivious algebraic branching programs.
Joint work with Amir Shpilka.
-
Wednesday, May 8, 2013:
Speaker:
Zeyuan Allen Zhu
(MIT).
Topic:
A Simple, Combinatorial and Nearly-Linear-Time Algorithm for Solving Ax=b When A is Symmetric Diagonally Dominant
Fast algorithms for solving linear systems Ax=b when A is symmetric diagonally dominant (SDD) or graph Laplacian have found broad applications across both the theory and practice of computer science, playing a foundational role in scientific computing, machine learning, random processes, image processing, network analysis, and computational biology, among others. More recently, they have emerged as a powerful tool in the design of graph algorithms, enabling researchers to break longstanding algorithmic barriers for a wide range of fundamental graph problems, including finding maximum flows and multicommodity flows, generating random spanning trees, sparsifying graphs, routing in distributed systems, and finding sparse cuts and balanced separators.
Finding fast solvers for these systems has been an active area of research, culminating in algorithms that solve them in nearly-linear time. These solvers are quite complex, and developing them required multiple fundamental innovations in spectral and combinatorial graph theory, graph algorithms, and computational linear algebra, which were used to construct and analyze an intricate recursively preconditioned iterative solver.
In this talk, I will give a very simple combinatorial algorithm that solves SDD systems in nearly-linear time. It uses very little of the machinery that previously appeared to be necessary for such an algorithm. It does not require recursive preconditioning, or even the Chebyshev Method or Conjugate Gradient. After constructing a "nice" spanning tree of a graph associated to the linear system, the entire algorithm consists of the repeated application of a simple (non-recursive) update rule, which it implements using a lightweight data structure. The algorithm is numerically stable and can be implemented without the increased bit-precision required by previous solvers. As such, the algorithm presented has the fastest known running time under the standard unit-cost RAM model.
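To give a feel for what such an update rule can look like, here is a naive sketch in the electrical-flow view (made-up example data, O(n)-time cycle updates instead of the paper's lightweight data structure, and no claim to reproduce the authors' exact algorithm): route the demands on the spanning tree, then repeatedly cancel the net voltage drop around the cycle closed by an off-tree edge, and finally read potentials off the tree.

# Approximately solves Lx = b for the Laplacian of a small made-up graph.
import random

n = 4
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (3, 0, 1.0), (0, 2, 1.0)]  # (u, v, resistance)
b = [1.0, 0.0, 0.0, -1.0]                      # demands; must sum to zero

parent = {0: None, 1: 0, 2: 1, 3: 2}           # spanning tree rooted at 0
tree_edges = {(1, 0), (2, 1), (3, 2)}
is_tree_edge = lambda u, v: (u, v) in tree_edges or (v, u) in tree_edges

resistance, flow = {}, {}
for u, v, r in edges:
    resistance[(u, v)] = resistance[(v, u)] = r
    flow[(u, v)] = 0.0                         # flow[(u, v)] is current flowing u -> v

def add_flow(u, v, amount):
    if (u, v) in flow: flow[(u, v)] += amount
    else: flow[(v, u)] -= amount

def signed_flow(u, v):
    return flow[(u, v)] if (u, v) in flow else -flow[(v, u)]

def tree_path(u):                              # vertices from u up to the root
    path = [u]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path

# 1. Initialize a feasible flow by routing every demand to the root along the tree.
for s in range(n):
    path = tree_path(s)
    for a, c in zip(path, path[1:]):
        add_flow(a, c, b[s])

# 2. Repeatedly cancel the net voltage drop on the cycle closed by an off-tree edge.
off_tree = [(u, v) for u, v, _ in edges if not is_tree_edge(u, v)]

def cycle_of(u, v):                            # off-tree edge u->v, then tree path v->...->u
    pu, pv = tree_path(u), tree_path(v)
    common = next(x for x in pv if x in pu)    # lowest common ancestor (naive search)
    up = pv[:pv.index(common) + 1]
    down = list(reversed(pu[:pu.index(common) + 1]))
    verts = [u] + up + down[1:]
    return list(zip(verts, verts[1:]))

for _ in range(300):
    u, v = random.choice(off_tree)
    cyc = cycle_of(u, v)
    drop = sum(resistance[e] * signed_flow(*e) for e in cyc)
    total_r = sum(resistance[e] for e in cyc)
    for a, c in cyc:
        add_flow(a, c, -drop / total_r)

# 3. Read potentials off the tree; x solves Lx = b up to an additive constant.
x = [0.0] * n
for s in sorted(range(n), key=lambda s: len(tree_path(s))):
    if parent[s] is not None:
        p = parent[s]
        x[s] = x[p] - resistance[(p, s)] * signed_flow(p, s)
print(x)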
This is joint work with Jonathan Kelner, Lorenzo Orecchia and Aaron Sidford.
-
Wednesday, May 15, 2013:
Speaker:
Or Meir
(Stanford).
Topic:
IP = PSPACE using error correcting codes
The IP theorem, which asserts that IP = PSPACE (Lund et al. and
Shamir, in J. ACM 39(4)), is one of the major achievements of
complexity theory. The known proofs of the theorem are based on the
arithmetization technique, which transforms a quantified Boolean
formula into a related polynomial. The intuition that underlies the
use of polynomials is commonly explained by the fact that polynomials
constitute good error correcting codes. However, the known proofs seem
tailored to the use of polynomials, and do not generalize to arbitrary
error correcting codes.
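For background, the standard arithmetization rules map a quantified Boolean formula phi to a polynomial A_phi that agrees with phi on 0/1 inputs (stated here only for concreteness; this is textbook material, not part of the abstract):

    NOT phi        ->  1 - A_phi
    phi AND psi    ->  A_phi * A_psi
    phi OR psi     ->  1 - (1 - A_phi) * (1 - A_psi)
    forall x phi   ->  prod_{x in {0,1}} A_phi
    exists x phi   ->  1 - prod_{x in {0,1}} (1 - A_phi)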
In this work, we show that the IP theorem can be proved by using
general error correcting codes. We believe that this establishes a
rigorous basis for the aforementioned intuition, and sheds further
light on the IP theorem.
-
Wednesday, May 22, 2013:
Speaker:
Vladimir Braverman
(Johns Hopkins University).
Topic:
Approximating Large Frequency Moments with Pick-and-Drop Sampling
Given data stream $D = \{p_1,p_2,...,p_m\}$ of size $m$ of numbers from
$\{1,..., n\}$, the frequency of $i$ is defined as $f_i = |\{j: p_j =
i\}|$. The $k$-th \emph{frequency moment} of $D$ is defined as $F_k =
\sum_{i=1}^n f_i^k$. We consider the problem of approximating frequency
moments in insertion-only streams for $k\ge 3$. For any constant $c$ we
show an $O(n^{1-2/k}\log(n)\log^{(c)}(n))$ upper bound on the space
complexity of the problem. Here $\log^{(c)}(n)$ is the iterated $\log$
function. To simplify the presentation, we make the following assumptions:
$n$ and $m$ are polynomially far; approximation error $\epsilon$ and
parameter $k$ are constants. We observe a natural bijection between
streams and special matrices. Our main technical contribution is a
non-uniform sampling method on matrices. We call our method a
\emph{pick-and-drop sampling}; it samples a heavy element (i.e., element
$i$ with frequency $\Omega(F_k)$) with probability $\Omega(1/n^{1-2/k})$
and gives approximation $\tilde{f_i} \ge (1-\epsilon)f_i$. In addition,
the estimates never exceed the true values, that is, $\tilde{f_j} \le
f_j$ for all $j$. As a result, we reduce the space complexity of finding a
heavy element to $O(n^{1-2/k}\log(n))$ bits. We apply our method of
recursive sketches and resolve the problem with
$O(n^{1-2/k}\log(n)\log^{(c)}(n))$ bits.
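For reference, the quantity being approximated, computed here exactly by brute force (which of course ignores the streaming and space constraints that the talk is about):

from collections import Counter

def frequency_moment(stream, k):
    freq = Counter(stream)                   # f_i = |{j : p_j = i}|
    return sum(f ** k for f in freq.values())

D = [1, 2, 2, 3, 3, 3]                       # made-up stream
print(frequency_moment(D, 3))                # 1^3 + 2^3 + 3^3 = 36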
-
Friday, July 12, 2013:
Speaker:
Atri Rudra
(University at Buffalo).
Topic:
One algorithm to rule them all: One join query at a time
We present a recent algorithm (PODS 2012) that is the
first provably optimal (worst-case) algorithm to compute database joins.
As a special case, we show that this join algorithm implies (i) The
first algorithmic versions of some well-known geometric inequalities due
to Loomis and Whitney (and their generalizations by Bollobas and
Thomason); (ii) Algorithmically list recoverable codes that work with
parameters that no known algorithmic list recovery result works with
(e.g. those based on the Reed-Solomon codes) and an application of this
result in designing sublinear time decodable compressed sensing schemes;
(iii) Worst-case optimal algorithm to list all occurrences of any fixed
hypergraph H in a given large hypergraph G.
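To illustrate the flavor of worst-case optimal joins, here is a generic-join style sketch for the triangle query Q(a,b,c) = R(a,b), S(b,c), T(a,c), with made-up relations (not necessarily the exact PODS 2012 algorithm): intersect attribute by attribute instead of joining pairwise, so the work stays within the O(N^{3/2}) worst-case output bound rather than the Theta(N^2) intermediate results a pairwise plan can produce.

def triangles(R, S, T):
    R_by_a, S_by_b, T_by_a = {}, {}, {}
    for a, b in R: R_by_a.setdefault(a, set()).add(b)
    for b, c in S: S_by_b.setdefault(b, set()).add(c)
    for a, c in T: T_by_a.setdefault(a, set()).add(c)
    out = []
    for a in R_by_a.keys() & T_by_a.keys():        # candidate values for a
        for b in R_by_a[a] & S_by_b.keys():        # b must appear with a in R and in S
            for c in S_by_b[b] & T_by_a[a]:        # c must close the triangle
                out.append((a, b, c))
    return out

R = {(1, 2), (1, 3)}; S = {(2, 4), (3, 4)}; T = {(1, 4)}
print(triangles(R, S, T))    # e.g. [(1, 2, 4), (1, 3, 4)]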
We believe that this algorithm should find many more applications. (If
time permits, I'll also mention some followup work on instance optimal
join algorithms.)
This talk will focus on (i) and (ii) and is based on joint works with
Gilbert, Ngo, Nguyen, Porat, Re and Strauss.