Archetti, F., Fersini, E., Giordani, I., Messina, V. (2007). Web Page Classification using Semantic Image–Blocks.

Web Page Classification using Semantic Image–Blocks

Archetti, Francesco Antonio; Fersini, Elisabetta; Giordani, Ilaria; Messina, Vincenzina
2007

Abstract

We present a web document classification system based on the assumption that the images of a web page are the elements that mainly attract the user's attention. This assumption implies that the text contained in the visual block in which an image is located, called a semantic image-block, should carry relevant information about the page contents. In this paper we propose a new metric, called the Inverse Term Relevance Metric, which assigns higher weights to relevant terms contained in relevant image-blocks; these blocks are identified by performing a visual layout analysis of the page. The traditional TF×IDF model is modified accordingly and used in the classification task. The effectiveness of the new metric has been validated using different classification algorithms, both supervised and unsupervised.
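The abstract does not state the formula of the Inverse Term Relevance Metric. Purely as a rough illustration of the general idea, the Python sketch below computes plain TF×IDF weights and multiplies the weight of any term that also occurs in a page's image-block text by a boost factor. The input format, the function name `tfidf_with_block_boost`, and the multiplicative boost `alpha` are hypothetical assumptions for this sketch, not the authors' metric.

```python
import math
from collections import Counter


def tfidf_with_block_boost(docs, alpha=2.0):
    """Hypothetical sketch: TF x IDF with a boost for image-block terms.

    docs:  list of (page_terms, image_block_terms) pairs, where page_terms is
           the list of tokens in the page and image_block_terms is the set of
           tokens found in its image-blocks (assumed input format).
    alpha: assumed multiplicative boost (> 1) for terms seen in an image-block;
           this is NOT the Inverse Term Relevance Metric from the paper.
    Returns one {term: weight} dict per page.
    """
    n_docs = len(docs)
    # Document frequency: number of pages in which each term occurs.
    df = Counter()
    for terms, _ in docs:
        df.update(set(terms))

    vectors = []
    for terms, block_terms in docs:
        tf = Counter(terms)  # raw term frequency within this page
        vec = {}
        for term, count in tf.items():
            idf = math.log(n_docs / df[term])  # standard inverse document frequency
            boost = alpha if term in block_terms else 1.0
            vec[term] = count * idf * boost
        vectors.append(vec)
    return vectors


# Usage with two toy pages (illustrative data, not from the paper).
docs = [
    (["camera", "lens", "price", "shop"], {"camera", "lens"}),
    (["football", "match", "score", "shop"], {"football"}),
]
vecs = tfidf_with_block_boost(docs, alpha=2.0)
```

In this toy run, "camera" gets twice the weight of "price" on the first page because it appears in an image-block, while "shop" gets weight 0 because it occurs on every page. The resulting vectors could then be fed to any supervised or unsupervised classifier.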
Keywords: web, page, classification, semantic, image-blocks
Language: English
Date: September 2007
http://www.disco.unimib.it/upload/quaderni.2007-02.pdf
ISBN 978-88-548-1603-9 ISSN 1828-3357
Files in this item:
No files are associated with this item.

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/13528