Platform for Collection from Heterogeneous Web Sources and its Application to a Semantic Repository Organization at SeDiCI: Preliminaries

Authors

  • Marisa Raquel De Giusti Comisión de Investigaciones Científicas (CIC) de la Provincia de Buenos Aires and Proyecto de Enlace de Bibliotecas UNLP
  • Ariel Sobrado Proyecto de Enlace de Bibliotecas UNLP
  • Agustin Vosou Proyecto de Enlace de Bibliotecas UNLP
  • Gonzalo Luján Villarreal Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET) and Proyecto de Enlace de Bibliotecas UNLP

Keywords:

SeDiCI, Semantic repository, ontology, thesaurus

Abstract

This paper presents a web collection platform designed to relate and unify information available on different standard web sources, with a view to creating a user-browsable thematic repository. The platform will be used at the Intellectual Creation Diffusion Service (SeDiCI) [1], combined with ontologies and thesauri, to provide improved data organization. Data is currently spread across web resources, and traditional search engines return ranked lists with no semantic relation among documents. Users have to spend a great deal of time relating documents and trying to work out which ones fully address the domain of interest. Only after locating similarities and differences can they apply the information fragments to their own work, enabling knowledge creation.
The proposed platform separates its functional modules from the thematic domain so that it can be used in various knowledge areas. Development includes two agents that process URLs stored in a database. One identifies marked-up pages, interprets their tags, and provides rules for extracting information and storing it in an RDF data file; the other retrieves URLs related to the given one. After this stage, the extracted information is homogenized and the transformed information is classified according to domain ontologies. The platform allows for more efficient automatic extraction and information search among heterogeneous sources that represent the same concepts using different standards.
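
The two agents described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes pages carry Dublin Core metadata in HTML meta elements (the convention described in [16]), it works on an HTML string rather than fetching pages, and it emits N-Triples lines as a simple stand-in for the RDF data file. The names PageScanner, extract_triples, related_urls and to_ntriples are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Namespace of the Dublin Core element set.
DC_NS = "http://purl.org/dc/elements/1.1/"


class PageScanner(HTMLParser):
    """Collects Dublin Core <meta> tags and <a href> links from one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.metadata = []  # (dc_property, value) pairs
        self.links = []     # absolute URLs found in the page

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            name = (attrs.get("name") or "").lower()
            content = attrs.get("content")
            # Assumed convention: <meta name="DC.title" content="...">
            if name.startswith("dc.") and content:
                self.metadata.append((name[3:], content))
        elif tag == "a" and attrs.get("href"):
            # Resolve relative links against the page URL.
            self.links.append(urljoin(self.base_url, attrs["href"]))


def extract_triples(url, html):
    """Extraction agent (sketch): page metadata as (subject, predicate, object)."""
    scanner = PageScanner(url)
    scanner.feed(html)
    return [(url, DC_NS + prop, value) for prop, value in scanner.metadata]


def related_urls(url, html):
    """Link agent (sketch): URLs reachable from the given page."""
    scanner = PageScanner(url)
    scanner.feed(html)
    return scanner.links


def to_ntriples(triples):
    """Serialize triples with literal objects as N-Triples lines."""
    lines = []
    for subject, predicate, obj in triples:
        escaped = obj.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{subject}> <{predicate}> "{escaped}" .')
    return "\n".join(lines)
```

In a fuller pipeline, the URLs returned by related_urls would be queued for the extraction agent, and the resulting triples would then be homogenized and classified against the domain ontology; both of those steps are outside this sketch.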

References

[1] SeDiCI, Servicio de Difusión de la Creación Intelectual (SeDiCI) http://www.sedici.unlp.edu.ar/
[2] Abian, M. A., El futuro de la web: XML, RDF/RDFS, ontologías y la web semántica, 2003 http://www.javahispano.org/contenidos/es/el_futuro_de_la_web/
[3] W3C Semantic Web Activity [Online], 2009 http://www.w3.org/2001/sw/grddl-wg/td/grddl-tests#spaces-in-rel/
[4] Wikipedia, Web Semántica [Online], 2009 http://es.wikipedia.org/wiki/Web_semántica
[5] Berners-Lee, T. and Fischetti, M., Weaving the Web: The Original Design and Ultimate Destiny of the World Wide Web by its Inventor. San Francisco: Harper, 1999
[6] Gruber, T. R., What is an Ontology? [Online], 1992 http://www-ksl.stanford.edu/kst/what-is-an-ontology.html
[7] Wikipedia, Tesauro [Online], 2009 http://es.wikipedia.org/wiki/Tesauro
[8] Wikipedia, Agent [Online], 2009 http://es.wikipedia.org/wiki/Agente_inteligente_(Inteligencia_Artificial)
[9] Wikipedia, Web crawler [Online], 2009, July http://en.wikipedia.org/wiki/Web_crawler
[10] Sun, Writing a Web Crawler in the Java Programming Language [Online], http://java.sun.com/developer/technicalArticles/ThirdParty/WebCrawler/
[11] W3C, Gleaning Resource Descriptions from Dialects of Languages (GRDDL) [Online], 2007 http://www.w3.org/TR/grddl/
[12] W3C, XSL Transformations (XSLT) Version 1.0 [Online], 1999 http://www.w3.org/TR/xslt
[13] Wikipedia, Microformatos [Online], 2009, July http://es.wikipedia.org/wiki/Microformato
[14] Microformats, Microformats [Online], 2005 http://microformats.org/
[15] Wikipedia, Resource Description Framework [Online], 2009 http://es.wikipedia.org/wiki/Resource_Description_Framework
[16] Dublin Core Metadata Initiative, Expressing Dublin Core in HTML/XHTML meta and link elements [Online], 2003 http://dublincore.org/documents/dcq-html/
[17] Dublin Core Metadata Initiative (DCMI), Dublin Core Metadata Initiative, 2009 http://dublincore.org
[18] Mendez, E., DCMF: DC and microformats, a good marriage. International Conference on Dublin Core and Metadata Applications [Online], 2008 http://dc2008.de/wp-content/uploads/2008/09/dc2008_mendezetal.pdf
[19] Stanford Center for Biomedical Informatics Research, Welcome to protégé [Online], 2009, July http://protege.stanford.edu/
[20] Dublin Core Ontology, http://protege.stanford.edu/plugins/owl/dc/protege-dc.owl
[21] W3C, Web Ontology Language [Online], 2004 http://www.w3.org/2004/OWL/

Published

2009-10-01

How to Cite

De Giusti, M. R., Sobrado, A., Vosou, A., & Villarreal, G. L. (2009). Platform for Collection from Heterogeneous Web Sources and its Application to a Semantic Repository Organization at SeDiCI: Preliminaries. Journal of Computer Science and Technology, 9(02), 89–92. Retrieved from https://journal.info.unlp.edu.ar/JCST/article/view/770

Section

Original Articles