The extraction of named entities from court judgments is useful in several downstream applications, such as document anonymization and semantic search engines. In this paper, we discuss the application of named entity recognition and linking (NEEL) to extract entities from Italian civil court judgments. To develop and evaluate our work, we use a corpus of 146 manually annotated court judgments. We use a pipeline that combines a transformer-based Named Entity Recognition (NER) component, a transformer-based Named Entity Linking (NEL) component, and a NIL prediction component. While the NEL and NIL prediction components are not fine-tuned on domain-specific data, the NER component is fine-tuned on the annotated corpus. In addition, we compare different masked language modeling (MLM) adaptation strategies to optimize the results and investigate their impact. Results obtained on a 30-document test set show satisfactory performance, especially on the NER task, and highlight the challenges that remain in improving NEEL on similar documents. Our code is available on GitHub (https://github.com/rpo19/pozzi_aixia_2023). We are not allowed to publish the sensitive data or the NER models trained on it.
Pozzi, R., Rubini, R., Bernasconi, C., Palmonari, M. (2023). Named Entity Recognition and Linking for Entity Extraction from Italian Civil Judgements. In AIxIA 2023 – Advances in Artificial Intelligence: XXIInd International Conference of the Italian Association for Artificial Intelligence, AIxIA 2023, Rome, Italy, November 6–9, 2023, Proceedings (pp. 187–201). Springer Science and Business Media Deutschland GmbH [10.1007/978-3-031-47546-7_13].
Named Entity Recognition and Linking for Entity Extraction from Italian Civil Judgements
Pozzi R.; Palmonari M.
2023
File | Description | Attachment type | Access | License | Size | Format |
---|---|---|---|---|---|---|
Pozzi-2023-AIxIA-AAM.pdf | AAM of the paper | Author's Accepted Manuscript, AAM (Post-print) | Open access | Other | 308.16 kB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.