Bicocca Open Archive

Text classification using a few labeled examples

Colace, F; De Santo, M; Greco, L; Napoletano, Paolo

2014

Abstract

Supervised text classifiers need many labeled examples to achieve high accuracy. In real settings, however, enough labeled examples are not always available, because human labeling is enormously time-consuming. For this reason, there has been recent interest in methods that can achieve high accuracy when the training set is small. In this paper we introduce a new single-label text classification method that performs better than baseline methods when the number of labeled examples is small. Unlike most existing methods, which usually use a feature vector composed of weighted words, the proposed approach uses a structured feature vector composed of weighted pairs of words. The proposed feature vector is learned automatically from a set of documents, using a global term-extraction method based on Latent Dirichlet Allocation, implemented as a probabilistic topic model. Experiments performed using a small percentage of the original training set (about 1%) confirmed our hypotheses. © 2013 Elsevier Ltd. All rights reserved.
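
The contrast the abstract draws between a flat vector of weighted words and a structured vector of weighted word pairs can be illustrated with a minimal sketch. This is not the authors' method: the function names are hypothetical, and the pair weights here are plain document co-occurrence frequencies, used as a simple stand-in for the weights that the paper derives from a probabilistic topic model.

```python
from collections import Counter
from itertools import combinations


def word_features(text):
    """Flat feature vector: each word weighted by its relative frequency."""
    words = text.lower().split()
    counts = Counter(words)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def pair_features(docs):
    """Structured feature vector: each unordered word pair weighted by the
    fraction of documents in which both words occur.

    Assumption: co-occurrence frequency replaces the topic-model-based
    weighting described in the abstract.
    """
    pair_counts = Counter()
    for text in docs:
        vocab = sorted(set(text.lower().split()))
        pair_counts.update(combinations(vocab, 2))
    n = len(docs)
    return {pair: c / n for pair, c in pair_counts.items()}


docs = ["topic model text", "text classification model", "topic extraction"]
flat = word_features(docs[0])   # weights for single words in one document
pairs = pair_features(docs)     # weights for word pairs across the collection
```

In this toy collection the pair ("model", "text") receives a higher weight than ("model", "topic") because the first pair co-occurs in two documents and the second in only one. A pair-based representation of this kind keeps some relational information that a flat word vector discards.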

Subtype: Journal article - Scientific article
Keywords: Data mining; Model; Probabilistic topic; Term extraction; Text classification; Text mining; Human-Computer Interaction; Psychology (all); Arts and Humanities (miscellaneous)
Content language: English
Publication date: 2014
Journal: COMPUTERS IN HUMAN BEHAVIOR
Volume: 30
First page: 689
Last page: 697
Article DOI: https://dx.doi.org/10.1016/j.chb.2013.07.043
Full text: reserved
Citation: Colace, F., De Santo, M., Greco, L., Napoletano, P. (2014). Text classification using a few labeled examples. COMPUTERS IN HUMAN BEHAVIOR, 30, 689-697 [10.1016/j.chb.2013.07.043].
Appears in types: 01 - Journal article

Files in this item:

File	Size	Format
10.pdf (main article; access restricted to archive managers)	889.75 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/56730
