LSH Algorithm and
Implementation (E2LSH)
Locality-Sensitive Hashing (LSH)
is an algorithm for solving the (approximate/exact)
Near Neighbor Search in high dimensional spaces. On this webpage, you
will find pointers to the newest LSH algorithm in Euclidean (l_2) spaces, as
well as
the description of the E2LSH package, an implementation of this
new algorithm for the Euclidean space.
-
LSH algorithm description:
-
Most recent algorithm (2006):
"Near-Optimal
Hashing Algorithms for Near Neighbor Problem in High Dimensions"
(by Alexandr Andoni and Piotr Indyk). In Proceedings of the
Symposium on Foundations of
Computer Science (FOCS'06), 2006.
Slides: Here are some slides
on the LSH algorithm from a talk given by Piotr Indyk.
-
A good description of earlier
algorithms: a book with a good introduction to LSH, and
the description of
affairs as of 2005, is Nearest
Neighbor Methods in Learning and
Vision: Theory and Practice,
by T. Darrell and P. Indyk and G. Shakhnarovich (eds.), MIT Press, 2006.
See book
introduction for a smooth introduction to NN problem and LSH, as well as the following book chapter on LSH.
A paper version of the algorithm in
the book: "Locality-Sensitive
Hashing Scheme Based on p-Stable
Distributions", by M. Datar, P. Indyk, N. Immorlica and V. Mirrokni.
In Proceedings of the Symposium on Computational Geometry, 2004.
-
Older version of LSH (1999):
the best algorithm for the Hamming space remains the one described, e.g, in [GIM'99]
paper.
-
LSH implementation:
Currently, we only have an alpha-version available - the E2LSH
package. The code is based on the algorithm described in the book
chapter from above.
If you are interested in trying the code out, please send an email to
Alex Andoni at:
You can also download the manual for the code to see its
functionality. The code has been developed by Alex Andoni.
This research is supported by NSF CAREER Grant #0133849 "Approximate
Algorithms for High-dimensional Geometric Problems".