Bicocca Open Archive

Fersini, E., Messina, V., Archetti, F. (2010). A probabilistic relational approach for web document clustering. INFORMATION PROCESSING & MANAGEMENT, 46(2), 117-130 [10.1016/j.ipm.2009.08.003].

A probabilistic relational approach for web document clustering

Fersini, Elisabetta; Messina, Vincenzina; Archetti, Francesco Antonio

2010

Abstract

The exponential growth of information available on the World Wide Web, and retrievable by search engines, has created a need for efficient and effective methods for organizing relevant content. Document clustering plays an important role here and remains an interesting and challenging problem in web computing. In this paper we present a document clustering method that takes into account both the content and the hyperlink structure of a collection of web pages, where a document is viewed as a set of semantic units. We exploit this representation to determine the strength of the relation between two linked pages and to define a relational clustering algorithm based on a probabilistic graph representation. The experimental results show that the proposed approach, called RED-clustering, outperforms two of the best-known clustering algorithms, k-Means and Expectation Maximization.

Subtype: Journal article - Scientific article
Keywords: Relational document clustering, Relational web structure estimation
Content language: English
Publication date: 2010
Journal: INFORMATION PROCESSING & MANAGEMENT
Volume: 46
Issue: 2
First page: 117
Last page: 130
Article DOI: https://dx.doi.org/10.1016/j.ipm.2009.08.003
Full text: none
Appears in categories: 01 - Journal article

Files in this item: there are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/9497
