E-Mail Processing with Fuzzy SOMs and Association Rules
Keywords: Information Retrieval, Data Mining, Text Mining, E-mail Analysis, FSOM, Association Rules
E-mail texts are hard to process because they are short. This article proposes the use of a fuzzy neural network capable of identifying the most relevant terms in a set of e-mails. The associations between these terms are then measured through association rules built from the terms the network identifies, and the support, confidence and interest metrics of those rules are used to qualify the corresponding terms. The proposed method has been applied to e-mails from the PACENI Project (Support Project for Improving First-Year Teaching in Courses of Studies in Exact and Natural Sciences, Economic Science and Computer Science). This analysis identified the most common topics of student questions. Although this new information can have various applications, all of them involve, first of all, an improvement in student service.
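As an illustration of the three rule metrics named above, the following minimal sketch computes support, confidence and interest for single-term rules A -> B over a small set of e-mails. It rests on several assumptions that are not taken from the article: each e-mail is already reduced to a set of relevant terms (the step the fuzzy network would perform), "interest" is taken to be the common lift measure, support(A, B) / (support(A) * support(B)), and the function name `rule_metrics` and the sample terms are hypothetical.

```python
from itertools import permutations


def rule_metrics(documents, min_support=0.0):
    """Compute metrics for every rule A -> B between two single terms.

    documents: list of sets, each holding the relevant terms of one e-mail.
    Returns a dict mapping (A, B) to its support, confidence and interest.
    Assumption: "interest" is computed as lift.
    """
    n = len(documents)
    if n == 0:
        return {}

    def support(items):
        # Fraction of e-mails that contain every term in `items`.
        return sum(1 for doc in documents if items <= doc) / n

    terms = sorted(set().union(*documents))
    rules = {}
    for a, b in permutations(terms, 2):
        s_ab = support({a, b})
        if s_ab == 0 or s_ab < min_support:
            continue  # the two terms never co-occur often enough
        s_a, s_b = support({a}), support({b})
        rules[(a, b)] = {
            "support": s_ab,
            "confidence": s_ab / s_a,        # estimate of P(B | A)
            "interest": s_ab / (s_a * s_b),  # lift; above 1 means positive association
        }
    return rules


# Hypothetical term sets, one per e-mail.
emails = [
    {"exam", "date", "enrollment"},
    {"exam", "date"},
    {"enrollment", "deadline"},
    {"exam", "grade"},
]
rules = rule_metrics(emails)
print(rules[("date", "exam")])
```

In this toy data, every e-mail that mentions "date" also mentions "exam", so the rule date -> exam has confidence 1, while its interest above 1 indicates the two terms co-occur more often than independence would predict.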