Named entity extraction is a crucial task to support the population of Knowledge Bases (KBs) from documents written in natural language. However, in many application domains, these documents must be collected and processed incrementally to update the KB as more data are ingested. In some cases, quality concerns may even require human validation mechanisms along the process. While very recent work in the NLP community has discussed the importance of evaluating and benchmarking continuous entity extraction, it has proposed methods and datasets that avoid Named Entity Linking (NEL) as a component of the extraction process. In this paper, we advocate for batch-based incremental entity extraction methods that can exploit NEL with a background KB, detect mentions of entities that are not present in the KB yet (NIL mentions), and update the KB with the novel entities. Based on this assumption, we present a methodology to evaluate NEL-based incremental entity extraction, which can be applied to turn a “static” dataset for evaluating NEL into a dataset for evaluating incremental entity extraction. We apply this methodology to an existing benchmark for evaluating NEL algorithms, and evaluate an incremental extraction pipeline that orchestrates different strong state-of-the-art and baseline algorithms for the tasks involved in the extraction process, namely, NEL, NIL prediction, and NIL clustering. In presenting our experiments, we demonstrate the increased difficulty of the information extraction task in incremental settings and discuss the strengths of the available solutions as well as open challenges.
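
The batch-based loop described in the abstract (link mentions against a background KB, flag NIL mentions, cluster them, and add the resulting new entities to the KB before the next batch) can be illustrated with a toy sketch. This is not the authors' pipeline: the `KnowledgeBase` class, the `process_batch` function, the exact-match linker, the `nil_threshold` parameter, the surface-form clustering, and the entity identifiers are all hypothetical stand-ins chosen only to make the control flow concrete.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class KnowledgeBase:
    """Hypothetical toy KB: entity id -> set of lower-cased surface forms."""
    entities: dict[str, set[str]] = field(default_factory=dict)
    next_new_id: int = 0

    def link(self, mention: str) -> tuple[str | None, float]:
        """Stand-in for NEL: exact surface-form match scores 1.0, else 0.0."""
        key = mention.lower()
        for entity_id, names in self.entities.items():
            if key in names:
                return entity_id, 1.0
        return None, 0.0

    def add(self, names: set[str]) -> str:
        """KB update: register a novel entity and return its new id."""
        entity_id = f"NIL_{self.next_new_id}"
        self.next_new_id += 1
        self.entities[entity_id] = set(names)
        return entity_id


def process_batch(kb: KnowledgeBase, mentions: list[str],
                  nil_threshold: float = 0.5) -> dict[str, str]:
    """Process one batch: NEL, NIL prediction, NIL clustering, KB update."""
    links: dict[str, str] = {}
    nil_clusters: dict[str, list[str]] = {}
    for mention in mentions:
        entity_id, score = kb.link(mention)
        # Stand-in for NIL prediction: a low linking score means "not in KB".
        if entity_id is not None and score >= nil_threshold:
            links[mention] = entity_id
        else:
            # Stand-in for NIL clustering: group by normalized surface form.
            nil_clusters.setdefault(mention.lower(), []).append(mention)
    # Each NIL cluster becomes one new KB entity, visible to later batches.
    for key, members in nil_clusters.items():
        new_id = kb.add({key})
        for member in members:
            links[member] = new_id
    return links


# Two batches over a made-up KB with a single entity "E1".
kb = KnowledgeBase({"E1": {"milan", "milano"}})
first = process_batch(kb, ["Milan", "Acme Labs", "acme labs"])
second = process_batch(kb, ["Acme Labs", "Milano"])
```

In this sketch, "Acme Labs" is a NIL mention in the first batch and a linkable entity in the second, because the KB was updated in between. That dependence of later batches on earlier decisions is what distinguishes the incremental setting from a static one.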

Pozzi, R., Moiraghi Motta, F., Lodi, F., Palmonari, M. (2022). Evaluation of Incremental Entity Extraction with Background Knowledge and Entity Linking. In IJCKG '22: Proceedings of the 11th International Joint Conference on Knowledge Graphs (pp.30-38). New York, NY : Association for Computing Machinery [10.1145/3579051.3579063].

Evaluation of Incremental Entity Extraction with Background Knowledge and Entity Linking

Pozzi, R; Moiraghi Motta, F; Lodi, F; Palmonari, M
2022

Type: paper
Keywords: Incremental Entity Extraction, Entity Extraction, Knowledge Base Population, Named Entity Linking
Language: English
Conference: The 11th International Joint Conference on Knowledge Graphs (IJCKG’22)
Year: 2022
Editors: Artale, A; Calvanese, D; Wang, H; Zhang, X
Proceedings: IJCKG '22: Proceedings of the 11th International Joint Conference on Knowledge Graphs
ISBN: 978-1-4503-9987-6
Pages: 30-38
URL: http://ijckg.org/2022/papers/IJCKG_2022_paper_3501.pdf
Access: open
Files in this item:
Pozzi-2022-IJCKG2022-VoR.pdf

open access

Description: Conference contribution
Attachment type: Publisher’s Version (Version of Record, VoR)
License: Other
Size: 743.52 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/423118