
Can Data Mining Techniques Ease The Semantic Tagging Burden?

F. Forno, L. Farinetti, S. Mehan

VLDB2003: First International Workshop on Semantic Web and Databases, Berlin, Germany

ABSTRACT

The effective implementation of the Semantic Web vision depends heavily on the widespread availability of large collections of trustworthy, meaningful, semantically rich resources. Since semantic classification relies on complex ontologies, a recognised difficulty is the steep learning curve that human classifiers face when attempting to use such ontologies. One important way to increase the number of Web-accessible, semantically tagged resources is to provide tools that allow users to explore and understand the relevant ontologies and that present relevant categories with which to tag new data. In this paper we investigate how an important and powerful data mining technique, Latent Semantic Indexing (LSI), might help in the design and implementation of tools that guide users in semantic tagging tasks. We applied LSI to a large portion of the Open Directory Project (ODP) catalogue, one of the largest repositories of semantically tagged resources available today. We computed statistical information on category relationships in the ODP data set, and we incorporated structural information by modifying the construction process of the LSI space. On this basis, we conducted a comparative experiment in which a machine-generated classification of new documents was evaluated against a classification created by a group of human users. The paper also presents an evaluation and discussion of the experimental results.
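The abstract does not specify the authors' term weighting or how they modified the LSI construction to include ODP structure, so the following is only a minimal sketch of plain LSI used to suggest categories for a new document. It assumes (hypothetically) that each category is represented by the pooled tokens of its resources, that terms are weighted by log term frequency times IDF, and that candidate categories are ranked by cosine similarity in a rank-k space obtained by truncated SVD. The function names `build_lsi`, `fold_in` and `rank_categories`, and the toy categories, are illustrative inventions, not part of the paper.

```python
import numpy as np


def build_lsi(categories, k):
    """Build a rank-k LSI space from a {label: token list} mapping.

    Assumption for this sketch: each category is one "document" made of
    the pooled tokens of the resources filed under it.
    """
    labels = list(categories)
    vocab = sorted({t for toks in categories.values() for t in toks})
    index = {t: i for i, t in enumerate(vocab)}

    # Term-by-document matrix of raw counts (rows: terms, columns: categories).
    counts = np.zeros((len(vocab), len(labels)))
    for j, label in enumerate(labels):
        for t in categories[label]:
            counts[index[t], j] += 1.0

    # Assumed weighting: log(1 + tf) times IDF = log(N / df).
    df = np.count_nonzero(counts, axis=1)
    idf = np.log(len(labels) / df)
    weights = np.log1p(counts) * idf[:, None]

    # Truncated SVD: keep only the k largest singular triplets.
    U, s, Vt = np.linalg.svd(weights, full_matrices=False)
    Uk = U[:, :k]
    # Category coordinates in the reduced space: V_k S_k (= weights^T U_k).
    doc_vecs = Vt[:k, :].T * s[:k]
    return {"labels": labels, "index": index, "idf": idf, "Uk": Uk, "docs": doc_vecs}


def fold_in(tokens, space):
    """Project an unseen document onto U_k; terms outside the vocabulary are ignored."""
    q = np.zeros(len(space["index"]))
    for t in tokens:
        if t in space["index"]:
            q[space["index"][t]] += 1.0
    q = np.log1p(q) * space["idf"]
    return q @ space["Uk"]


def rank_categories(tokens, space):
    """Return (label, cosine similarity) pairs, best match first."""
    qk = fold_in(tokens, space)
    docs = space["docs"]
    norms = np.linalg.norm(docs, axis=1) * np.linalg.norm(qk)
    # Guard against zero vectors (e.g. a document with no known terms).
    sims = (docs @ qk) / np.where(norms == 0, 1.0, norms)
    order = np.argsort(-sims)
    return [(space["labels"][i], float(sims[i])) for i in order]


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the ODP.
    training = {
        "Computers/Programming": ["python", "code", "compiler", "code"],
        "Computers/Software": ["code", "software", "python", "install"],
        "Arts/Music": ["guitar", "piano", "melody", "concert"],
    }
    space = build_lsi(training, k=2)
    for label, sim in rank_categories(["software"], space):
        print(f"{label}: {sim:.3f}")
```

With k=2 on this toy data, the two Computers categories share a single latent dimension, so a new document containing only "software" scores about equally against both (even though Computers/Programming never contains that word), while Arts/Music scores near zero. This conflation of related vocabulary is the property that makes LSI attractive for suggesting candidate categories; choosing k trades that generalisation against the ability to tell closely related categories apart.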


Related files:
vldb03.pdf (Adobe Acrobat portable document)


[FFMe03] F. Forno, L. Farinetti, S. Mehan, "Can Data Mining Techniques Ease The Semantic Tagging Burden?," VLDB2003: First International Workshop on Semantic Web and Databases, Berlin, Germany