Spatial selection of sparse pivots for similarity search in metric spaces
Keywords:Similarity Search, metric spaces, pivot selection, databases, searching, indexing
Similarity search is a fundamental operation for applications that deal with unstructured data sources. In this paper we propose a new pivot-based method for similarity search, called Sparse Spatial Selection (SSS). The main characteristic of this method is that it guarantees a good pivot selection more efficiently than other methods previously proposed. In addition, SSS adapts itself to the dimensionality of the metric space we are working with, without being necessary to specify in advance the number of pivots to use. Furthermore, SSS is dynamic, that is, it is capable to support object insertions in the database efficiently, it can work with both continuous and discrete distance functions, and it is suitable for secondary memory storage. In this work we provide experimental results that confirm the advantages of the method with several vector and metric spaces. We also show that the efficiency of our proposal is similar to that of other existing ones over vector spaces, although it is better over general metric spaces.
 Ricardo Baeza-Yates, Walter Cunto, Udi Manber, and Sun Wu. Proximity matching using fixed-queries trees. In Proceedings of the 5th Annual Symposium on Combinatorial Pattern Matching, pages 198-212, Springer-Verlag, 1994.
 Tolga Bozkaya and Meral Ozsoyoglu. Distance-based indexing for high-dimensional metric spaces. In Proceedings of the ACM International Conference onManagement of Data (SIGMOD 1997), pages 357-368, May 1997.
 Sergey Brin. Near neighbor search in large metric spaces. In 21st conference on Very Large Databases (VLDB), 1995.
 Walter A. Burkhard and Robert M. Keller. Some approaches to best-match file searching. Communications of the ACM, 16(4):230-236, April 1973.
 Benjamín Bustos, Gonzalo Navarro, and Edgar Chávez. Pivot selection techniques for proximity search in metric spaces. In SCCC 2001, Proceedings of the XXI Conference ofthe Chilean Computer Science Society, pages 33-40. IEEE Computer Science Press, 2001.
 Edgar Chávez, José Luis Marroquín, and Gonzalo Navarro. Overcoming the curse of dimensionality. In European Workshop on Content-based Multimedia Indexing (CBMI'99), pages 57-64, 1999.
 Edgar Chávez, Gonzalo Navarro, Ricardo Baeza-Yates, and José Luis Marroquín. Searching in metric spaces. ACM Computing Surveys, 33(3):273-321, September 2001.
 Iraj Kalantari and Gerard McDonald. A data structure and an algorithm for the nearest point problem. IEEE Transactions on Software Engineering, 9:631-634, 1983.
 Luisa Micó, José Oncina, and R. Enrique Vidal. A new version of the nearest-neighbor approximating and eliminating search (AESA) with linear pre-processing time and memory requirements. Pattern Recognition Letters, 15:9-17, 1994.
 Gonzalo Navarro. Searching in metric spaces by spatial approximation. In Proceedings of String Processing and Information Retrieval (SPIRE'99), pages 141-148. IEEE ComputerScience Press, 1999.
 Jeffrey K. Uhlmann. Satisfying general proximity/similarity queries with metric trees. Information Processing Letters, 40:175-179, 1991.
 Enrique Vidal. An algorithm for finding nearest neighbors in (aproximately) constant average time. Pattern Recognition Letters, 4:145-157, 1986.
 Peter Yianilos. Data structures and algorithms for nearest-neighbor search in general metric spaces. In Proceedings of the fourth annual ACM-SIAM Symposium on Discrete Algorithms, pages 311-321. ACM Press, 1993.
 Peter Yianilos. Excluded middle vantage point forests for nearest neighbor search. In Proceedings of the 6th DIMACS Implementation Challenge: Near neighbour searches (ALENEX 1999), January 1999.
 Pavel Zezula, Giuseppe Amato, Vlatislav Dohnal, and Michal Batko. Similarity search. The metric space approach, volume 32 of Advances in Database Systems. Springer, 2006.