Mihai Pătraşcu
Submitted.
We show that a large fraction of the data-structure lower bounds known
today in fact follow by reduction from the communication complexity of
lopsided (asymmetric) set disjointness! This includes lower bounds for:
- high-dimensional problems, where the goal is to show large space
lower bounds.
- constant-dimensional geometric problems, where the goal is to
bound the query time for space O(n ·polylgn).
- dynamic problems, where we are looking for a trade-off between
query and update time. (In this case, our bounds are slightly weaker
than the originals, losing a lglgn factor.)
Our reductions also imply the following new results:
- an W(lgn / lglgn) bound for 4-dimensional range
reporting, given space O(n·polylgn). This is very timely,
since a recent result [Nekrich, SoCG'07] solved 3D reporting in
near-constant time, raising the prospect that higher dimensions
could also be easy.
- a tight space lower bound for the partial match problem, for
constant query time.
- the first lower bound for reachability oracles.
|
Mihai Pătraşcu
Submitted.
We can represent an array of n values from {0,1,2} using é n log 2 3 ù bits (arithmetic coding), but then we cannot
retrieve a single element efficiently. Instead, we can encode every
block of t elements using étlog 2 3 ù bits, and bound
the retrieval time by t. This gives a linear trade-off between the
redundancy of the representation and the query time.
In fact, this type of linear trade-off is ubiquitous in known succinct
data structures, and in data compression. The folk wisdom is that if
we want to waste one bit per block, the encoding is so constrained
that it cannot help the query in any way. Thus, the only thing a query
can do is to read the entire block and unpack it.
We break this limitation and show how to use recursion to improve
redundancy. It turns out that if a block is encoded with two (!) bits
of redundancy, we can decode a single element, and answer many other
interesting queries, in time logarithmic in the block size.
Our technique allows us to revisit classic problems in succinct data
structures, and give surprising new upper bounds. We also construct a
locally-decodable version of arithmetic coding.
|
Indyk [FOCS'98] has shown that approximate nearest-neighbor search
under the l¥ metric admits the following interesting
trade-off: for any s > 1, one can build a decision tree of size O(n s)
to answer nearest neighbor queries with approximation O(log s logd).
We now show the following communication complexity lower bound: to
find a nearest neighbor with approximation O(log s logd),
either the querier sends W(s logn) bits, or the database
needs to send n W(1) bits. This implies a decision-tree
lower bound matching Indyk's upper bound. In the stronger cell-probe
model, it implies that any data structure with constant query
complexity requires space n W(s). Thus, it seems approximation
Q(loglogd) is inherent for polynomial-space data structures.
Technically, our proof breaks away from the richness technique,
which was used to establish lower bounds for the l1 or l2
metrics. In fact, we show that any lower bound based on product
distribution (like the richness technique) cannot show hardness for
approximation 3 or higher.
|
Dynamic connectivity is an extremely well-studied problem, but so
far the most compelling progress has been confined to the
edge-update model: maintain an understanding of connectivity in an
undirected graph, subject to edge insertions and deletions. In this
paper, we study two equally natural and fundamental problems which are
even more challenging.
Subgraph connectivity asks to maintain an understanding of
connectivity under node updates: updates can turn nodes on and off,
and queries refer to the subgraph induced by on nodes. (For
instance, this is closer to applications in networks of routers,
where node faults may occur.)
We describe a data structure supporting vertex updates in
[O\tilde](m 2/3) amortized time, where m denotes the number of
edges in the graph. This is a major improvement over the previous result
[Chan, STOC'02], which required fast matrix multiplication and had an
update time of O(m 0.94). The new data structure is also
surprisingly simple.
Geometric connectivity asks to maintain a dynamic set of n
geometric objects, and query connectivity in their intersection
graph. (For instance, the intersection graph of balls describes
connectivity in a network of sensors with bounded transmission radius.)
Previously, nontrivial fully dynamic results were known only for
special cases like axis-parallel line segments and rectangles. We not
only provide similarly improved update times, [O\tilde](n 2/3), for
these special cases, but show how to obtain sublinear update bounds
for virtually all families of geometric objects which allow
sublinear-time range queries. In particular, we obtain the
first sublinear update time for arbitrary 2D line segments:
[O\tilde](n 9/10); for d-dimensional simplices:
[O\tilde](n 1-\frac1d(2d+1)); and for d-dimensional balls:
[O\tilde](n 1-\frac1(d+1)(2d+3)).
|
We present the first sublinear-time algorithms for computing order statistics in the Farey sequence
and for the related problem of ranking.
Our algorithms achieve a running times of nearly O(n 2/3), which is
a significant improvement over the previous algorithms taking time O(n).
We also initiate the study of a more general problem: counting
primitive lattice points inside planar shapes. For rational polygons
containing the origin, we obtain a running time proportional to
D 6/7, where D is the diameter of the polygon.
|
|
We show that any algorithm computing the median of a stream presented in
random order, using polylog(n) space, requires an optimal
W(lglgn) passes. This resolves an open question from
the seminal paper on streaming by Munro and Paterson [FOCS 1978].
|
Mihai Pătraşcu
and Mikkel Thorup
48th IEEE Symposium on Foundations of Computer Science ( FOCS 2007).
Understanding how a single edge deletion can affect the connectivity
of a graph amounts to finding the graph bridges. But when faced with
d > 1 deletions, can we establish as easily how the connectivity
changes? When planning for an emergency, we want to understand the
structure of our network ahead of time, and respond swiftly when an
emergency actually happens.
We describe a linear-space representation of graphs which enables us
to determine how a batch of edge updates can impact the graph. Given a
set of d edge updates, in time O(d polylg(n)) we can obtain
the number of connected components, the size of each component, and a fast
oracle for answering connectivity queries in the updated graph. The
initial representation is polynomial-time constructible.
|
It is well known that n integers in the range [1,n c] can be
sorted in O(n) time in the RAM model using radix sorting. More
generally, integers in any range [1,U] can be sorted in
O(n Ö{lglgn}) time. However, these algorithms use
O(n) words of extra memory. Is this necessary?
We present a simple, stable, integer sorting algorithm for words of
size O(lgn), which works in O(n) time and uses only O(1)
words of extra memory on a RAM. This is the integer sorting case
most useful in practice. We extend this result with same bounds to
the case when the keys are read-only, which is of theoretical
interest. Another interesting question is the case of arbitrary U.
Here we present a black-box transformation from any RAM sorting
algorithm to a sorting algorithm which uses only O(1) extra space
and has the same running time. This settles the complexity of
in-place sorting in terms of the complexity of sorting.
|
Mihai Pătraşcu
39th ACM Symposium on Theory of Computing ( STOC 2007).
Proving lower bounds for range queries has been an active topic of
research since the late 70s, but so far nearly all results have been
limited to the (rather restrictive) semigroup model. We consider one
of the most basic range problem, orthogonal range counting in two
dimensions, and show almost optimal bounds in the group model and
the (holy grail) cell-probe model.
Specifically, we show the following bounds, which were known in the
semigroup model, but are major improvements in the more general
models:
- In the group and cell-probe models, a static data structure of
size nlgO(1) n requires W(lgn / lglgn) time per
query. This is an exponential improvement over previous bounds,
and matches known upper bounds.
- In the group model, a dynamic data structure takes time
W( (\fraclgnlglgn)2 ) per operation.
This is close to the O(lg2 n) upper bound, whereas the
previous lower bound was W(lgn).
Proving such (static and dynamic) bounds in the group model has been
regarded as an important challenge at least since [Fredman, JACM
1982] and [Chazelle, FOCS 1986].
|
Timothy Chan
and Mihai Pătraşcu
39th ACM Symposium on Theory of Computing ( STOC 2007).
We reexamine fundamental problems from computational geometry in the
word RAM model, where input coordinates are integers that fit in a
machine word. We develop a new algorithm for offline point location,
a two-dimensional analog of sorting where one needs to order points
with respect to segments. This result implies, for example, that the
Voronoi diagram of n points in the plane can be constructed in
(randomized) time n·2 O(Ö{lglgn}). Similar bounds
hold for numerous other geometric problems, such as
three-dimensional convex hulls, planar Euclidean minimum spanning
trees, line segment intersection, and triangulation of non-simple
polygons.
In FOCS'06, we developed a data structure for online point
location, which implied a bound of O(n \fraclgnlglgn) for
Voronoi diagrams and the other problems. Our current bounds are
dramatically better, and a convincing improvement over the classic
O(nlgn) algorithms. As in the field of integer sorting, the main
challenge is to find ways to manipulate information, while avoiding
the online problem (in that case, predecessor search).
|
Erik Demaine
and Mihai Pătraşcu
23rd ACM Symposium on Computational Geometry ( SoCG 2007).
The dynamic convex hull problem was recently solved in O(lgn) time
per operation, and this result is best possible in models of computation
with bounded branching (e.g., algebraic computation trees).
From a data structures point of view, however, such models are considered
unrealistic because they hide intrinsic notions of information in the input.
In the standard word-RAM and cell-probe models of computation,
we prove that the optimal query time for dynamic convex hulls is, in fact,
Q(\fraclgnlglgn), for polylogarithmic update time
(and word size).
Our lower bound is based on a reduction from the marked-ancestor problem,
and is one of the first data structural lower bounds for a nonorthogonal
geometric problem.
Our upper bounds follow a recent trend of attacking nonorthogonal geometric
problems from an information-theoretic perspective that has proved
central to advanced data structures. Interestingly, our upper bounds are
the first to successfully apply this perspective to dynamic geometric data
structures, and require substantially different ideas from previous work.
|
|
We consider the problem of detecting network faults. Our focus is
on detection schemes that send probes both proactively and
non-adaptively. Such schemes are particularly relevant to all-optical
networks, due to these networks' operational characteristics and
strict performance requirements. This fault diagnosis problem
motivates a new technical framework that we introduce: group testing
with graph-based constraints. Using this framework, we develop several
new probing schemes to detect network faults. The efficiency of our
schemes often depends on the network topology; in many cases we can
show that our schemes are optimal or near-optimal by providing tight
lower bounds.
|
Mihai Pătraşcu
and Mikkel Thorup
18th ACM/SIAM Symposium on Discrete Algorithms ( SODA 2007).
Recently [STOC'06], we presented a new technique for proving
cell-probe lower bounds for static data structures with deterministic
queries. This was the first technique which could prove a bound higher
than communication complexity, and it gave the first separation between
data structures with linear and polynomial space. In this paper, we
extend the technique to randomized query algorithms.
Our main application is the problem of searching predecessors in
a static set of n integers, each contained in a w-bit word. Our
trade-off lower bounds are tight for any combination of parameters.
For small space, i.e. n 1+o(1), proving such lower bounds was
inherently impossible through known techniques. An interesting new
consequence is that for near linear space, the classic van Emde Boas
search time of O(lgw) cannot be improved, even if we allow
randomization. This is a separation from polynomial space, since Beame
and Fich [STOC'02] give a predecessor search time of O(lgw / lglg w) using quadratic space.
With a novel reduction, we show a tight W(lglgn)
lower bound for 2-dimensional range queries, even in rank space where
no superconstant lower bound was known. We also improve the best lower
bound for the approximate nearest neighbor problem, when small space
is available.
|
Timothy Chan
and Mihai Pătraşcu
SIAM Journal on Computing ( SICOMP), submitted. Special issue.
Based on two independent papers by each author, achieving similar results.
Given a planar subdivision whose coordinates are integers bounded by
U £ 2 w, we present a linear-space data structure that can answer
point location queries in O(min{lgn/lglgn, Ö{lgU/lglgU}}) time
on the unit-cost RAM with word size w. This is the
first result to beat the standard Q(lgn) bound for infinite
precision models.
As a consequence, we obtain the first o(nlgn) (randomized) algorithms
for many fundamental problems in computational geometry for
arbitrary integer input on the word RAM, including:
constructing the convex hull of a
three-dimensional point set, computing the Voronoi diagram or the Euclidean
minimum spanning
tree of a planar point set, triangulating a polygon with holes, and
finding intersections among a set of line segments.
Higher-dimensional extensions and applications are also discussed.
Though computational geometry with bounded precision input has been
investigated for a long time, improvements have been limited largely
to problems of an orthogonal flavor. Our results surpass this
long-standing limitation, answering, for example, a question of
Willard (SODA'92).
|
| » |
Mihai Pătraşcu
47th IEEE Symposium on Foundations of Computer Science ( FOCS 2006).
We consider the static planar point location problem in an arbitrary
polygonal subdivision given by n segments. We assume points come
from the [u] 2 grid, and consider algorithms for the RAM with
words of O(lgu) bits. We give the first solution to the problem
which can surpass the traditional query time of O(lg n). Specifically, we can obtain a query time of O( Ö{lgu}).
Though computational geometry on a grid has been investigated for a
long time (including for this problem), it is generally not known
how to make good use of a bounded universe in problems of such
nonorthogonal flavor. Our result shows this limitation can be
surpassed, at least for planar point location.
A result by Timothy Chan, appearing independently in FOCS'06, also
achieves sublogarithmic query times. Combining the two results, we
obtain the following bound. For any S ³ 2, the exists a data
structure using space O(n·S) which supports queries in time:
|
O |
æ è
|
min
| |
ì í
î
|
\fraclgnlglgn, | Ö
|
\fraclgulglgu
|
, \fraclgulgS |
ü ý
þ
|
ö ø
|
|
|
|
Mihai Pătraşcu
and Mikkel Thorup
SIAM Journal on Computing ( SICOMP), submitted. Special issue.
47th IEEE Symposium on Foundations of Computer Science ( FOCS 2006).
We convert cell-probe lower bounds for polynomial space into
stronger lower bound for near-linear space. Our technique applies to
any lower bound proved through the richness method. For example, it
applies to partial match, and to near-neighbor problems, either for
randomized exact search, or for deterministic approximate search
(which are thought to exhibit the curse of dimensionality). These
problems are motivated by search in large data bases, so near-linear
space is the most relevant regime.
Typically, richness has been used to imply W(d / lgn) lower
bounds for polynomial-space data structures, where d is the number
of bits of a query. This is the highest lower bound provable through
the classic reduction to communication complexity. However, for
space n lg O(1) n, we now obtain bounds of W(d / lgd).
This is a significant improvement for natural values of d, such as
lg O(1) n. In the most important case of d = Q(lgn),
we have the first superconstant lower bound. From a
complexity-theoretic perspective, our lower bounds are the highest
known for any static data-structure problem, significantly
improving on previous records.
|
Alex Andoni,
Piotr Indyk,
and Mihai Pătraşcu
47th IEEE Symposium on Foundations of Computer Science ( FOCS 2006).
We investigate the optimality of (1+\eps)-approximation algorithms
obtained via the dimensionality reduction method. We show that:
- Any data structure for the (1+\eps)-approximate
nearest neighbor problem in Hamming space, which uses
constant number of probes to answer each query, must
use nW(1/\eps2) space.
- Any algorithm for the (1+\eps)-approximate closest
substring problem must run in time exponential in 1/\eps2-g
for any g > 0 (unless 3SAT can be solved in sub-exponential time)
Both lower bounds are (essentially) tight.
|
We consider deterministic dictionaries in the parallel disk model,
motivated by applications such as file systems. Our main results show
that if the number of disks is moderately large (at least logarithmic
in the size of the universe from which keys come), performance similar
to the expected performance of randomized dictionaries can be achieved.
Thus, we may avoid randomization by extending parallelism. We give
several algorithms with different performance trade-offs. One of our
main tools is a deterministic load balancing scheme based on expander
graphs, that may be of independent interest.
Our algorithms assume access to certain expander graphs "for free".
While current explicit constructions of expander graphs have
suboptimal parameters, we show how to get near-optimal expanders for
the case where the amount of data is polynomially related to the size
of internal memory.
|
Mihai Pătraşcu
and Mikkel Thorup
38th ACM Symposium on Theory of Computing ( STOC 2006).
Full version (very preliminary): arXiv:0603043.
We develop a new technique for proving cell-probe lower bounds for
static data structures. Previous lower bounds used a reduction to
communication games, which was known not to be tight by counting
arguments. We give the first lower bound for an explicit problem which
breaks this communication complexity barrier. In addition, our bounds
give the first separation between polynomial and near linear
space. Such a separation is inherently impossible by communication
complexity.
Using our lower bound technique and new upper bound constructions, we
obtain tight bounds for searching predecessors among a static set of
integers. Given a set Y of n integers of l bits each, the goal
is to efficiently find predecessor(x) = max { y Î Y | y £ x }. For this purpose, we represent Y on a RAM with word
length b using S ³ n l bits of space. Defining a = lg \fracSn, we show that the optimal search time is, up to constant
factors:
|
|
min
| |
ì ï ï í
ï ï î
|
|
|
|
\fraclg\fraclalg( \fracalgn · lg\fracla ) |
|
|
\fraclg\fraclalg( lg\fracla / lg\fraclgna ) |
|
|
|
|
In external memory (b > l), it follows that the optimal strategy
is to use either standard B-trees, or a RAM algorithm ignoring the
larger block size. In the important case of l = b = glgn,
for g > 1 (i.e. polynomial universes), and near linear
space (such as S = n ·lg O(1) n), the optimal search time is
Q(lg l). Thus, our lower bound implies the surprising
conclusion that van Emde Boas' classic data structure from [FOCS'75]
is optimal.
|
|
We develop dynamic dictionaries on the word RAM that use
asymptotically optimal space, up to constant factors, subject to
insertions and deletions, and subject to supporting perfect-hashing
queries and/or membership queries, each operation in constant time
with high probability. When supporting only membership queries, we
attain the optimal space bound of Q(n lg\fracun) bits,
where n and u are the sizes of the dictionary and the universe,
respectively. Previous dictionaries either did not achieve this
space bound or had time bounds that were only expected and
amortized. When supporting perfect-hashing queries, the optimal
space bound depends on the range {1, 2, ..., n+t} of hashcodes
allowed as output. We prove that the optimal space bound is
Q(n lglg\fracun + n lg\fracnt+1) bits when
supporting only perfect-hashing queries, and it is Q(n lg \fracun + n lg\fracnt+1) bits when also supporting
membership queries. All upper bounds are new, as is the W(n lg\fracnt+1) lower bound.
|
We prove nearly tight lower bounds on the number of rounds of
communication required by efficient protocols over asymmetric channels
between a server (with high sending capacity) and one or more clients
(with low sending capacity). This scenario captures the common
asymmetric communication bandwidth between broadband Internet
providers and home users, as well as sensor networks where sensors
(clients) have limited capacity because of the high power requirements
for long-range transmissions. An efficient protocol in this setting
communicates n bits from each of the k clients to the server,
where the clients' bits are sampled from a joint distribution D that
is known to the server but not the clients, with the clients sending
only O(H(D)+k) bits total, where H(D) is the entropy of distribution D.
In the single-client case, there are efficient protocols using O(1)
rounds in expectation and O(lgn) rounds in the worst case. We
prove that this is essentially best possible: with probability
1/2 O(t lgt), any efficient protocol can be forced to use t rounds.
In the multi-client case, there are efficient protocols using O(lgk)
rounds in expectation. We prove that this is essentially best
possible: with probability W(1), any efficient protocol can be
forced to use W(lgk / lglgk) rounds. Along the way, we
develop new techniques of independent interest for proving lower
bounds in communication complexity.
|
Ilya Baran,
Erik Demaine,
and Mihai Pătraşcu
Algorithmica, vol. 50(4), 2008. Special issue.
9th Workshop on Algorithms and Data Structures ( WADS 2005).
We obtain subquadratic algorithms for 3SUM on integers and rationals
in several models. On a standard word RAM with w-bit words, we
obtain a running time of O(n2 / max{ \fracwlg2 w, \fraclg2 n(lglgn)2 }). In the circuit RAM with one
nonstandard AC0 operation, we obtain O(n2 / \fracw2lg2 w). In external memory, we achieve O(n2 / (MB)), even under the
standard assumption of data indivisibility. Cache-obliviously, we
obtain a running time of O(n2 / \fracMBlg2 M). In all
cases, our speedup is almost quadratic in the "parallelism" the
model can afford, which may be the best possible. Our algorithms
are Las Vegas randomized; time bounds hold in expectation, and in
most cases, with high probability.
|
Mihai Pătraşcu
and Corina Tarniţă (Pătraşcu)
Theoretical Computer Science ( TCS), vol. 380, 2007. Special issue.
32nd International Colloquium on Automata, Languages and Programming ( ICALP 2005).
Best Student Paper Award (Track A).
This paper presents several advances in the understanding of dynamic
data structures in the bit-probe model:
- We improve the lower bound record for dynamic language membership
problems to W( (\fraclgnlglgn)2). Surpassing W(lgn)
was listed as the first open problem in a survey by Miltersen.
- We prove a bound of W(\fraclgnlglglgn) for
maintaining partial sums in \Zn2. Previously, the known
bounds were W(\fraclgnlglgn) and O(lgn).
- We prove a surprising and tight upper bound of
O(\fraclgnlglgn) for predecessor problems. We use this
to obtain the same upper bound for dynamic word and prefix problems
in group-free monoids.
|
We consider the problem of maintaining a dynamic set of integers and
answering queries of the form: report a point (equivalently, all
points) in a given interval. Range searching is a natural and
fundamental variant of integer search, and can be solved using
predecessor search. However, for a RAM with w-bit words, we show
how to perform updates in O(lgw) time and answer queries in
O(lglgw) time. The update time is identical to the van Emde
Boas structure, but the query time is exponentially faster(!) Existing
lower bounds show that achieving our query time for predecessor
search requires doubly-exponentially slower updates. We present some
arguments supporting the conjecture that our solution is optimal.
Our solution is based on a new and interesting recursion idea which
is "more extreme" that the van Emde Boas recursion. Whereas van
Emde Boas uses a simple recursion (repeated halving) on each path in
a trie, we use a nontrivial, van Emde Boas-like recursion on every
such path. Despite this, our algorithm is quite clean when seen from
the right angle. To achieve linear space for our data structure, we
solve a problem which is of independent interest. We develop the
first scheme for dynamic perfect hashing requiring sublinear
space. This gives a dynamic Bloomier filter (an approximate storage
scheme for sparse vectors) which uses low space. We strengthen
previous lower bounds to show that these results are optimal.
|
Erik Demaine,
Dion Harmon,
John Iacono,
and Mihai Pătraşcu
SIAM Journal on Computing ( SICOMP), vol. 37(1), 2007. Special issue.
45th IEEE Symposium on Foundations of Computer Science ( FOCS 2004).
We present an O(lglgn)-competitive online binary search tree,
improving upon the best previous (trivial) competitive ratio of O(lgn).
This is the first major progress on Sleator and Tarjan's dynamic optimality
conjecture [STOC'83] that O(1)-competitive binary search trees exist.
|
We consider two algorithmic problems arising in the lives of Yogi Bear
and Ranger Smith. The first problem is the natural algorithmic version
of a classic mathematical result: any (n+1)-subset of {1, ¼, 2n} contains a pair of divisible numbers. How do we actually find
such a pair? If the subset is given in the form of a bit vector, we
give a RAM algorithm with an optimal running time of O(n / lgn).
If the subset is accessible only through a membership oracle, we show
a lower bound of \frac43 n - O(1) and an almost matching upper
bound of (\frac43 + \frac124) n + O(1) on the
number of queries necessary in the worst case.
The second problem we study is a geometric optimization problem where
the objective amusingly influences the constraints. Suppose you want
to surround n trees at given coordinates by a wooden fence.
However, you have no external wood supply, and must obtain wood by
chopping down some of the trees. The goal is to cut down a minimum
number of trees that can be built into a fence that surrounds the
remaining trees. We obtain efficient polynomial-time algorithms for
this problem.
We also describe an unusual data-structural view of the Nim game,
leading to an intriguing open problem.
|
| » |
ACM SIGSAM Bulletin, vol. 38(3), 2004.
This is superseded by a follow-up paper of Adrian Dumitrescu and Guangwu Xu, which also corrects a technical error in our upper bound.
Sorry, abstract not available.
|
|
We study the problem of computing the k-th term of the Farey
sequence of order n, for given n and k. Several methods for
generating the entire Farey sequence are known. However, these
algorithms require at least quadratic time, since the Farey sequence
has Q(n2) elements. For the problem of finding the k-th
element, we obtain an algorithm that runs in time O(n lgn) and
uses space O(Ön). The same bounds hold for the problem of
determining the rank in the Farey sequence of a given fraction. A more
complicated solution can reduce the space to O(n1/3 (lglg n)2/3), and, for the problem of determining the rank of a
fraction, reduce the time to O(n). We also argue that an algorithm
with running time O(\poly(lgn)) is unlikely to exist, since that
would give a polynomial-time algorithm for integer factorization.
|
Erik Demaine,
Thouis "Ray" Jones,
and Mihai Pătraşcu
15th ACM/SIAM Symposium on Discrete Algorithms ( SODA 2004).
Errata: There is an error in Lemma 2.1, regarding the behavior on the uniform distribution. The behavior as stated is correct, but the justification is not.
We define a deterministic metric of "well-behaved data" that enables
searching along the lines of interpolation search. Specifically, define
D to be the ratio of distances between the farthest and nearest
pair of adjacent elements. We develop a data structure that stores a
dynamic set of n integers subject to insertions, deletions, and
predecessor/successor queries in O(lgD) time per operation.
This result generalizes interpolation search and interpolation search trees
smoothly to nonrandom (in particular, non-independent) input data.
In this sense, we capture the amount of "pseudorandomness"
required for effective interpolation search.
|
Mihai Pătraşcu
and Erik Demaine
SIAM Journal on Computing ( SICOMP), vol. 35(4), 2006. Special issue.
Based on two conference publications:
We develop a new technique for proving cell-probe lower bounds on
dynamic data structures. This technique enables us to prove an
amortized randomized W(lgn) lower bound per operation for
several data structural problems on n elements, including partial
sums, dynamic connectivity among disjoint paths (or a forest or a
graph), and several other dynamic graph problems (by simple
reductions). Such a lower bound breaks a long-standing barrier of
W(lgn / lglgn) for any dynamic language membership
problem. It also establishes the optimality of several existing
data structures, such as Sleator and Tarjan's dynamic trees. We
also prove the first W(logB n) lower bound in the
external-memory model without assumptions on the data structure
(such as the comparison model). Our lower bounds also give a
query-update trade-off curve matched, e.g., by several data
structures for dynamic connectivity in graphs. We also prove
matching upper and lower bounds for partial sums when parameterized
by the word size and the maximum additive change in an update.
|
| » |
36th ACM Symposium on Theory of Computing ( STOC 2004).
We prove an W(lgn) cell-probe lower bound on maintaining
connectivity in dynamic graphs, as well as a more general trade-off
between updates and queries. Our bound holds even if the graph is
formed by disjoint paths, and thus also applies to trees and plane
graphs. The bound is known to be tight for these restricted cases,
proving optimality of these data structures (e.g., Sleator and
Tarjan's dynamic trees). Our trade-off is known to be tight for trees,
and the best two data structures for dynamic connectivity in general
graphs are points on our trade-off curve. In this sense these two data
structures are optimal, and this tightness serves as strong evidence
that our lower bounds are the best possible.
From a more theoretical perspective, our result is the first
logarithmic cell-probe lower bound for any problem in the natural
class of dynamic language membership problems, breaking the long
standing record of W(lgn / lglgn). In this sense, our
result is the first data-structure lower bound that is "truly"
logarithmic, i.e., logarithmic in the problem size counted in bits.
Obtaining such a bound is listed as one of three major challenges for
future research by Miltersen (the other two
challenges remain unsolved).
Our techniques form a general framework for proving cell-probe lower bounds
on dynamic data structures. We show how our framework also applies to the
partial-sums problem to obtain a nearly complete understanding of the problem
in cell-probe and algebraic models, solving several previously posed open problems.
|
| » |
15th ACM/SIAM Symposium on Discrete Algorithms ( SODA 2004).
Invited to special issue of ACM Transactions on Algorithms (declined).
We close the gaps between known lower and upper bounds for the
online partial-sums problem in the RAM and group models of
computation. If elements are chosen from an abstract group, we prove
an W(lgn) lower bound on the number of algebraic operations
that must be performed, matching a well-known upper bound. In the RAM
model with b-bit memory registers, we consider the well-studied case
when the elements of the array can be changed additively by
d-bit integers. We give a RAM algorithm that achieves a running time
of Q(1 + lgn / lg(b / d)) and prove a matching lower bound
in the cell-probe model. Our lower bound is for the amortized complexity,
and makes minimal assumptions about the relations between n, b, and d.
The best previous lower bound was W(lgn / (lglgn + lgb)),
and the best previous upper bound matched only in the special case
b = Q(lgn) and d = O(lglgn).
|
Mihai Pătraşcu
Gazeta Informatica ( GInfo), vol. 13(2), 2003.
See problems 3 and 4 on my problem page.
Sorry, abstract not available.
|