Bicocca Open Archive

Web Page Classification using Semantic Image–Blocks

Archetti, Francesco Antonio; Fersini, Elisabetta; Giordani, Ilaria; Messina, Vincenzina

2007

Abstract

We present a web document classification system based on the assumption that the images of a web page are the elements that mainly attract the user's attention. This assumption implies that the text contained in the visual block in which an image is located, called a semantic image-block, should contain relevant information about the page contents. In this paper we propose a new metric, called the Inverse Term Relevance Metric, aimed at assigning higher weights to relevant terms contained in relevant image-blocks, which are identified by performing a visual layout analysis. The traditional TF×IDF model is modified accordingly and used in the classification task. The effectiveness of this new metric has been validated using different classification algorithms, both supervised and unsupervised.
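The abstract does not state the formula of the Inverse Term Relevance Metric, so the following is only a rough illustration of the general idea of reweighting TF×IDF by image-block membership. It computes classic TF×IDF weights (raw term frequency times log(N / df)) and multiplies the weight of any term that also occurs in a page's semantic image-blocks by a constant factor. The function name `tfidf_with_block_boost`, the `boost` parameter and the input format are hypothetical; the sketch assumes the image-block terms have already been extracted by a visual layout analysis, and it is not the authors' metric.

```python
import math
from collections import Counter


def tfidf_with_block_boost(docs, image_block_terms, boost=2.0):
    """Illustrative TF x IDF weighting with a constant boost for image-block terms.

    HYPOTHETICAL sketch, not the Inverse Term Relevance Metric from the paper.
    docs: one token list per web page.
    image_block_terms: one set per page holding the terms found in that page's
        semantic image-blocks (assumed to come from a prior layout analysis).
    boost: assumed multiplier applied to terms that occur in an image-block.
    Returns one {term: weight} dict per page.
    """
    n_docs = len(docs)

    # Document frequency: number of pages in which each term appears.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    weighted = []
    for tokens, block_terms in zip(docs, image_block_terms):
        tf = Counter(tokens)  # raw term frequency within this page
        weights = {}
        for term, count in tf.items():
            idf = math.log(n_docs / df[term])  # classic log(N / df) form
            weight = count * idf
            if term in block_terms:
                weight *= boost  # favour terms located in image-blocks
            weights[term] = weight
        weighted.append(weights)
    return weighted


if __name__ == "__main__":
    pages = [
        ["camera", "lens", "price", "shop"],
        ["camera", "news", "sport"],
        ["football", "sport", "score"],
    ]
    blocks = [{"camera", "lens"}, {"news"}, {"football"}]
    for i, w in enumerate(tfidf_with_block_boost(pages, blocks)):
        print(i, {t: round(v, 3) for t, v in sorted(w.items())})
```

With `boost=1.0` the function reduces to plain TF×IDF. Note that a term appearing in every page gets an IDF of zero, so no boost can raise its weight; a real metric would need to decide how to handle such terms.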

Keywords: web, page, classification, semantic, imageblocks
Content language: English
Publication date: Sep-2007
Alternative URL: http://www.disco.unimib.it/upload/quaderni.2007-02.pdf
Other significant information: ISBN 978-88-548-1603-9; ISSN 1828-3357
Citation: Archetti, F., Fersini, E., Giordani, I., Messina, V. (2007). Web Page Classification using Semantic Image–Blocks.
Fulltext: none
Appears in item types: 99 - Other

Files in this item:

No files are associated with this item.

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/13528