Optimizing the spatial approximation tree from the root
Keywords:similarity search, metric spaces, databases
Many computational applications need to look for information in a database. Nowadays, the predominance of nonconventional databases makes the similarity search (i.e., searching elements of the database that are "similar" to a given query) becomes a preponderant concept. The Spatial Approximation Tree has been shown that it compares favorably against alternative data structures for similarity searching in metric spaces of medium to high dimensionality ("difficult" spaces) or queries with low selectivity. However, for the construction process the tree root has been randomly selected and the tree ,in its shape and performance, is completely determined by this selection. Therefore, we are interested in improve mainly the searches in this data structure trying to select the tree root so to reflect some of the own characteristics of the metric space to be indexed. We regard that selecting the root in this way it allows a better adaption of the data structure to the intrinsic dimensionality of the metric space considered, so also it achieves more efficient similarity searches.
 R. Baeza-Yates and B. Ribeiro-Neto. Modern Information Retrieval. Addison-Wesley, 1999.
 J. Bentley. Multidimensional binary search trees used for associative searching. Comm. of the ACM, 18(9):509–517, 1975.
 J. Bentley. Multidimensional binary search trees in database applications. IEEE Trans. on Software Engineering, 5(4):333–340, 1979.
 C. Böhm, S. Berchtold, and D. Keim. Searching in high-dimensional spaces: Index structures for improving the performance of multimedia databases. ACM Computing Surveys , 33(3):322–373, Sept. 2001.
 E. Chávez and G. Navarro. Measuring the dimensionality of general metric spaces. Tech. Report TR/DCC-00-1, DCC, University of Chile, 2000.
 E. Chávez and G .Navarro. Towards measuring the searching complexity of metric spaces. In Proc. Mexican Computing Meeting, II, 969–978, México, 2001.
 E. Chávez, G. Navarro, R. Baeza-Yates, and J. Marroquín. Searching in metric spaces. ACM Computing Surveys, 33(3):273–321, Sept. 2001.
 P. Ciaccia, M. Patella, and P. Zezula. M-tree: an efficient access method for similarity search in metric spaces. In Proc.of the 23rd Conference on Very Large Databases, 426–435, 1997.
 R. Duda and P. Hart. Pattern Classification and Scene Analysis. Wiley, 1973.
 R.F . Santos Filho, A. Traina, C. Traina Jr., and C. Faloutsos. Similarity search without tears: The omni family of all-purpose access methods. In Proc. of the 17th International Conference on Data Engineering, 623–630, 2001.
 V. Gaede and O. Günther. Multidimensional access methods. ACM Computing Surveys, 30(2):170–231, 1998.
 A. Guttman. R-trees: a dynamic index structure for spatial searching. In Proc. ACM SIGMOD International Conference on Management of Data, 47–57, 1984.
 G. Hjaltason and H. Samet. Improved search heuristics for the sa-tree. Pattern Recognition Letters , 24(15):2785–2795, 2003.
 K. Kukich. Techniques for automatically correcting words in text. ACM Computing Surveys, 24(4):377–439, 1992.
 G. Navarro. Searching in metric spaces by spatial approximation. The Very Large Databases Journal, 11(1):28–46, 2002.
 J. Peñarrieta, P. Morriberón, and E. Cuadros-Vargas. Distributed spatial approximation tree - sat *. In M. Marin and G. Acuña, editors, Actas de la XXXII Conferencia Latinoamericana de Informatica (CD ROM), August 2006. 956-303-028-1.
 G. Salton and M. McGill. Introduction to Modern Information Retrieval. McGraw-Hill, 1983.
 Hanan Samet. Foundations of Multidimensional and Metric Data Structures (The Morgan Kaufmann Series in Computer Graphics and Geometric Modeling). Morgan Kaufmann Publishers Inc.,USA, 2005.
 D. Sankoff and J. Kruskal, editors. Time Warps, String Edits, and Macromolecules: the Theory and Practice of Sequence Comparison. Addison-Wesley, 1983.
 M. Waterman. Introduction to Computational Biology. Chapman and Hall, 1995.
 A. Yoshitaka and T. Ichikawa. A survey on content-based retrieval for multimedia databases. IEEE Trans. on Knowledge and Data Engineering, 11(1):81–93, 1999.
 P. Zezula, G. Amato, V. Dohnal, and M.Batko. Similarity Search:The Metric Space Approach, volume 32 of Advances in Database Systems. Springer, 2006