Learning to Classify Text Using a Few Labeled Examples

NAPOLETANO, PAOLO
2013

Abstract

It is well known that supervised text classification methods need many labeled examples to achieve high accuracy. In real settings, however, sufficient labeled examples are not always available. In this paper we demonstrate that one way to obtain high accuracy when labeled examples are scarce is to use structured features, rather than a list of weighted words, as the observed features. The proposed feature vector is based on a hierarchical structure, named a mixed Graph of Terms, composed of a directed and an undirected sub-graph of words, which can be constructed automatically from a set of documents through a probabilistic topic model. © Springer-Verlag Berlin Heidelberg 2013.
Book chapter or essay
Probabilistic topic model; Term extraction; Text classification; Computer Science (all)
English
Communications in Computer and Information Science
9783642371851
Colace, F., De Santo, M., Greco, L., Napoletano, P. (2013). Learning to Classify Text Using a Few Labeled Examples. In Communications in Computer and Information Science (pp. 200-214). Springer Verlag [10.1007/978-3-642-37186-8_13].
Colace, F; De Santo, M; Greco, L; Napoletano, P

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/56745