List of clustered permutations in secondary memory for proximity searching
Keywords:metric spaces, permutation-based algorithm, list of clusters, secondary memory
Similarity search is a difficult problem and various indexing schemas have been defined to process similarity queries efficiently in many applications, including multimedia databases and other repositories handling complex objects. Metric indices support efficient similarity searches, but most of them are designed for main memory. Thus, they can handle only small datasets, suffering serious performance degradations when the objects reside on disk. Most reallife database applications require indices able to work on secondary memory. Among a plethora of indices, the List of Clustered Permutations (LCP) has shown to be competitive in main memory.We introduce a secondary-memory variant of the LCP, which maintains the low number of distance evaluations when comparing the permutations themselves, and also needs a low number of I/O operations at construction and searching.
 E. Chávez and G. Navarro. A compact space decomposition for effective metric indexing. Pattern Recogn. Letters, 26(9):1363–1376, 2005.
 E. Chávez, G. Navarro, R. Baeza-Yates, and J. Marroquı́n. Searching in metric spaces. ACM Comp. Surveys, 33(3):273–321, Sept. 2001.
 E. Chávez, K. Figueroa, and G. Navarro. Proximity searching in high dimensional spaces with a proximity preserving order. In Proc. of MICAI 2005, Mexico, 405–414, 2005.
 E. Chavez, K. Figueroa, and G. Navarro. Effective proximity retrieval by ordering permutations. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30:1647–1658, 2008.
 E. Chávez and G. Navarro. Probabilistic proximity search: fighting the curse of dimensionality in metric spaces. Inf. Process. Lett., 85(1):39–46, Jan. 2003.
 A. Esuli. Mipai: Using the pp-index to build an efficient and scalable similarity search system. In Proc. of SISAP 2009, 29-30 Aug. 2009, Czech Republic, 146–148, 2009.
 R. Fagin, R. Kumar, and D. Sivakumar. Comparing top k lists. In Proc. of the 14th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’03, 28–36, USA, 2003.
 K. Figueroa and R. Paredes. List of clustered permutations for proximity searching. In Proc. of SISAP 2013, LNCS 8199, 50–58. Springer, 2013.
 K. Figueroa, R. Paredes, and R. Rangel. Efficient group of permutants for proximity searching. In Proc. of MCPR 2011, LNCS 6718, 42–49. Springer, 2011.
 K. Figueroa, G. Navarro, and E. Chávez. Metric spaces library, 2007. Available at http://www.sisap.org/Metric Space Library.html .
 G. Navarro and N. Reyes. Dynamic list of clusters in secondary memory. In Proc. of SISAP 2014, LNCS 8821, 94–105. Springer Int. Publishing, 2014.
 H. Samet. Foundations of Multidimensional and Metric Data Structures. Morgan Kaufmann Publishers Inc., USA, 2005.
 P. Zezula, G. Amato, V. Dohnal, and M. Batko. Similarity Search: The Metric Space Approach,
Vol. 32 of Advances in Database Systems. Springer, 2006.