One of the most important tasks in Information Retrieval (IR) is the extraction and processing of web page information. A common approach treats a web page as an atomic unit and models its textual content as a "bag-of-words". However, this representation does not reflect how people perceive a web page. A granular document representation, in terms of semantic objects, can help identify the semantic areas of a web page and use them for different IR goals. In this paper we use a granular representation to define a new metric for evaluating semantic object importance and to enhance the performance of IR systems. In particular, we show that this new metric can be used not only for classification, in which instances are assumed to be independent and identically distributed, but also to gauge the strength of the relationship between hypertextual documents and to exploit this information to improve page ranking performance.

Fersini, E., Messina, V., Archetti, F. (2008). Granular modeling of web document: impact on information retrieval systems. In Proceeding of the 10th ACM workshop on Web information and data management – WIDM 2008 (pp. 111-124). Napa Valley, California, USA: ACM [10.1145/1458502.1458520].

Granular modeling of web document: impact on information retrieval systems

FERSINI, ELISABETTA (first author); MESSINA, VINCENZINA (second author); ARCHETTI, FRANCESCO ANTONIO (last author)
2008

Type: slide + paper
Keywords: web document modelling, classification, ranking, Information Search and Retrieval
Language: English
Conference: 10th ACM Workshop on Web Information and Data Management, WIDM '08, Co-located with the ACM 17th Conference on Information and Knowledge Management, CIKM '08
Year: 2008
Published in: Proceeding of the 10th ACM workshop on Web information and data management – WIDM 2008
ISBN: 9781605582603
Pages: 111-124
URL: http://delivery.acm.org/10.1145/1460000/1458520/p111-fersini.pdf?key1=1458520&key2=3871979721&coll=GUIDE&dl=GUIDE&CFID=97874199&CFTOKEN=34886139

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/13871