Bicocca Open Archive

One of the most important tasks in Information Retrieval (IR) is web page information extraction and processing. A common approach is to treat a web page as an atomic unit and to model its textual content as a "bag-of-words". However, this kind of representation does not reflect how people perceive a web page. A granular document representation, in terms of semantic objects, can help identify the semantic areas of a web page and use them for different IR goals. In this paper we use a granular representation to define a new metric for evaluating semantic object importance and to enhance the performance of IR systems. In particular, we show that this new metric can be used not only for classification, in which instances are assumed to be independent and identically distributed, but also to gauge the strength of the relationships between hypertextual documents and to exploit this information to improve page ranking performance.

Fersini, E., Messina, V., Archetti, F. (2008). Granular modeling of web documents: Impact on information retrieval systems. In Proceeding of the 10th ACM workshop on Web information and data management - WIDM 2008 (pp. 111-124). Napa Valley, California, USA: ACM [10.1145/1458502.1458520].

Granular modeling of web documents: Impact on information retrieval systems

Fersini, Elisabetta (first author); Messina, Vincenzina (second author); Archetti, Francesco Antonio (last author)

2008



Contribution type: slide + paper
Keywords: Document classification; Relational granular document modeling; Visual layout analysis; Web page ranking
Keywords: web document modelling, classification, ranking, Information Search and Retrieval
Content language: English
Conference name: 10th ACM Workshop on Web Information and Data Management, WIDM '08, Co-located with the ACM 17th Conference on Information and Knowledge Management, CIKM '08
Conference year: 2008
Proceedings title: Proceeding of the 10th ACM workshop on Web information and data management - WIDM 2008
Proceedings volume ISBN: 9781605582603
Publication date: 2008
First page: 111
Last page: 124
DOI: https://dx.doi.org/10.1145/1458502.1458520
Alternative URL: http://delivery.acm.org/10.1145/1460000/1458520/p111-fersini.pdf?key1=1458520&key2=3871979721&coll=GUIDE&dl=GUIDE&CFID=97874199&CFTOKEN=34886139
Full text: none
Appears in types: 02 - Conference contribution

Files in this item:

There are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/13871
