Much of the valuable information for supporting decision-making processes originates in text-based documents. Although these documents can be effectively searched and ranked by modern search engines, actionable knowledge needs to be extracted and transformed into a structured form before being used in a decision process. In this paper we describe how the discovery of semantic information embedded in natural language documents can be viewed as an optimization problem aimed at assigning a sequence of labels (hidden states) to a set of interdependent variables (textual tokens). Dependencies among variables are efficiently modeled through Conditional Random Fields, an undirected graphical model able to represent the distribution of labels given a set of observations. The Markov property of these models prevents them from taking into account long-range dependencies among variables, which are indeed relevant in Natural Language Processing. To overcome this limitation we propose an inference method based on an Integer Programming formulation of the problem, where long-distance dependencies are included through non-deterministic soft constraints. © 2014 Elsevier Ltd. All rights reserved.
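The idea of labeling as optimization with soft long-range constraints can be illustrated with a toy sketch. It is not the authors' formulation: the scores, the two-label set, the penalty weight, and the function names are all hypothetical, and an exhaustive search stands in for an Integer Programming solver. The only constraint modeled is the assumption that identical tokens tend to share a label, which a first-order Markov score cannot express on its own.

```python
from itertools import product

LABELS = ["O", "PER"]  # hypothetical label set, for illustration only

def sequence_score(labels, emit, trans):
    """First-order (Markov) score: per-token scores plus adjacent-label scores."""
    score = sum(emit[i][y] for i, y in enumerate(labels))
    score += sum(trans[(a, b)] for a, b in zip(labels, labels[1:]))
    return score

def violations(tokens, labels):
    """Count pairs of identical tokens that received different labels."""
    n = len(tokens)
    return sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if tokens[i] == tokens[j] and labels[i] != labels[j]
    )

def decode(tokens, emit, trans, penalty=0.0):
    """Exhaustive argmax over label sequences (stand-in for an IP solver).

    Each violated soft constraint subtracts `penalty` from the objective
    instead of making the labeling infeasible.
    """
    return max(
        product(LABELS, repeat=len(tokens)),
        key=lambda ys: sequence_score(ys, emit, trans)
        - penalty * violations(tokens, ys),
    )

# Made-up scores: the second "Jordan" looks locally like a non-entity.
tokens = ["Jordan", "spoke", "Jordan"]
emit = [
    {"O": 0.2, "PER": 1.0},
    {"O": 1.0, "PER": 0.0},
    {"O": 0.6, "PER": 0.4},
]
trans = {(a, b): 0.0 for a in LABELS for b in LABELS}

unconstrained = decode(tokens, emit, trans)
weak = decode(tokens, emit, trans, penalty=0.1)
soft = decode(tokens, emit, trans, penalty=0.5)
print(unconstrained, weak, soft)
```

With a penalty of 0.5 the two occurrences of the repeated token end up with the same label, while a penalty of 0.1 is too weak to override the local evidence. That is what makes the constraint soft rather than hard in this sketch.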

Fersini, E., Messina, V., Felici, G., Roth, D. (2014). Soft-constrained inference for Named Entity Recognition. INFORMATION PROCESSING & MANAGEMENT, 50(5), 807-819 [10.1016/j.ipm.2014.04.005].

Soft-constrained inference for Named Entity Recognition

FERSINI, ELISABETTA; MESSINA, VINCENZINA (second author)
2014

Journal article - Scientific article
Conditional Random Fields; Integer linear programming; Named Entity Recognition; Rule extraction; Media Technology; Information Systems; Computer Science Applications; Computer Vision and Pattern Recognition; Library and Information Sciences; Management Science and Operations Research
Language: English
Year: 2014
Volume: 50
Issue: 5
Pages: 807-819
none

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/59555
Citations
  • Scopus 21
  • Web of Science (ISI) 18