The exponential growth of information available on the World Wide Web, and retrievable by search engines, has created the need for efficient and effective methods for organizing relevant content. Document clustering plays an important role in this setting and remains an interesting and challenging problem in web computing. In this paper we present a document clustering method that takes into account both the content information and the hyperlink structure of a web page collection, where a document is viewed as a set of semantic units. We exploit this representation to determine the strength of the relation between two linked pages and to define a relational clustering algorithm based on a probabilistic graph representation. The experimental results show that the proposed approach, called RED-clustering, outperforms two of the best-known clustering algorithms, k-Means and Expectation Maximization.
Fersini, E., Messina, V., & Archetti, F.A. (2010). A probabilistic relational approach for web document clustering. INFORMATION PROCESSING & MANAGEMENT, 46(2), 117-130.
Type: | Journal article - Scientific article |
Nature of the publication: | Scientific |
Title: | A probabilistic relational approach for web document clustering |
Authors: | Fersini, E; Messina, V; Archetti, FA |
Publication date: | Mar-2010 |
Language: | English |
Journal: | INFORMATION PROCESSING & MANAGEMENT |
Digital Object Identifier (DOI): | http://dx.doi.org/10.1016/j.ipm.2009.08.003 |
Appears in types: | 01 - Journal article |