The successful application of natural language processing for information retrieval

Authors

  • Antonio Ferrández, Dept. Languages and Information Systems, University of Alicante, Alicante, Spain
  • Yenory Rojas, Dept. Languages and Information Systems, University of Alicante, Alicante, Spain
  • Jesús Peral, Dept. Languages and Information Systems, University of Alicante, Alicante, Spain

Keywords:

Information Retrieval, Natural Language Processing, Entity, CLEF, Anaphora Resolution

Abstract

This paper proposes a novel model for monolingual Information Retrieval in English and Spanish. The model uses Natural Language Processing techniques (a POS tagger, a partial parser, and an anaphora resolver) to improve the precision of traditional IR systems by indexing the "entities" in the documents and the "relations" between these entities. The model is evaluated on both the Spanish and English CLEF corpora. For the English queries, the maximum increase in average precision is 35.11%. For the Spanish queries, the maximum increase is 37.18%.
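The abstract reports its results as increases in average precision. As a minimal sketch of how such a figure could be computed, the Python below implements the standard non-interpolated average precision for a single query and a percentage change over a baseline. The abstract does not say whether its percentages are relative or absolute, so the relative reading used here is an assumption. The function names, document identifiers, and rankings are invented for illustration and are not taken from the paper.

```python
from typing import Sequence, Set


def average_precision(ranked_ids: Sequence[str], relevant_ids: Set[str]) -> float:
    """Non-interpolated average precision for one query."""
    if not relevant_ids:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            # Precision at the rank of each relevant document retrieved.
            precision_sum += hits / rank
    # Relevant documents that are never retrieved contribute zero.
    return precision_sum / len(relevant_ids)


def relative_increase(baseline: float, improved: float) -> float:
    """Percentage change of `improved` over `baseline` (assumed relative)."""
    return (improved - baseline) / baseline * 100.0


# Toy data, hypothetical and for illustration only (not from the paper).
relevant = {"d1", "d3"}
baseline_ranking = ["d2", "d1", "d4", "d3"]  # e.g. a term-based IR system
entity_ranking = ["d1", "d2", "d3", "d4"]    # e.g. an entity-indexed system

ap_baseline = average_precision(baseline_ranking, relevant)
ap_entity = average_precision(entity_ranking, relevant)
gain = relative_increase(ap_baseline, ap_entity)
print(f"baseline AP={ap_baseline:.4f}, entity AP={ap_entity:.4f}, gain={gain:.2f}%")
```

Averaging `average_precision` over all queries in a test collection gives the mean average precision usually reported in CLEF-style evaluations.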

Published

2007-03-01

How to Cite

Ferrández, A., Rojas, Y., & Peral, J. (2007). The successful application of natural language processing for information retrieval. Journal of Computer Science and Technology, 7(01), pp. 79–85. Retrieved from https://journal.info.unlp.edu.ar/JCST/article/view/807

Section

Original Articles
