Buscaldi, D., Dessì, D., Osborne, F., Piras, D., Recupero, D. (2025). Evaluating LLMs for Named Entity Recognition in Scientific Domain with Fine-Tuning and Few-Shot Learning. In Third International Workshop on Semantic Technologies and Deep Learning Models for Scientific, Technical and Legal Data (SemTech4STLD 2025), co-located with the Extended Semantic Web Conference 2025 (ESWC 2025). CEUR-WS.
Evaluating LLMs for Named Entity Recognition in Scientific Domain with Fine-Tuning and Few-Shot Learning
Osborne, F.
2025
Abstract
Entity extraction is a crucial step in constructing Knowledge Graphs (KGs) from natural language text. In the scientific domain, Named Entity Recognition (NER) is widely used to analyze research papers and facilitate the generation of KGs that capture research concepts. Given the vast scale of contemporary research output, this task necessitates automated pipelines to maintain efficiency while ensuring the quality of the extracted knowledge. Large Language Models (LLMs) present a promising solution to this challenge. As such, this paper explores the effectiveness of LLMs for NER in scientific texts, using the SciERC dataset as a benchmark. Specifically, it evaluates different LLM architectures, including encoder-only, decoder-only, and encoder-decoder models, to identify the most effective approach for NER in the computer science domain. By examining the strengths and limitations of each model type, this study aims to provide deeper insights into the applicability of LLMs for entity extraction, ultimately improving the construction of domain-specific KGs.
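To make the two regimes named in the title concrete, the sketches below illustrate (i) fine-tuning an encoder-only model for token classification on SciERC and (ii) assembling a few-shot prompt for a decoder-only model. Both are minimal sketches under stated assumptions: the paper's actual models, prompt templates, and hyperparameters are not reproduced here. In particular, `allenai/scibert_scivocab_uncased`, the BIO conversion, and every hyperparameter value below are illustrative choices, not the study's configuration.

```python
# Minimal fine-tuning sketch (encoder-only). Assumptions: SciERC spans have
# been converted to token-level BIO tags, and SciBERT stands in for whatever
# encoder-only model the study actually used; hyperparameters are illustrative.
from transformers import (
    AutoModelForTokenClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# SciERC annotates six entity types; BIO tagging gives 2 * 6 + 1 = 13 labels.
ENTITY_TYPES = ["Task", "Method", "Metric", "Material", "OtherScientificTerm", "Generic"]
LABELS = ["O"] + [f"{prefix}-{etype}" for etype in ENTITY_TYPES for prefix in ("B", "I")]
id2label = dict(enumerate(LABELS))
label2id = {label: i for i, label in id2label.items()}

MODEL_NAME = "allenai/scibert_scivocab_uncased"  # hypothetical model choice
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForTokenClassification.from_pretrained(
    MODEL_NAME, num_labels=len(LABELS), id2label=id2label, label2id=label2id
)

training_args = TrainingArguments(
    output_dir="scierc-ner",
    learning_rate=2e-5,          # illustrative values, not the paper's
    num_train_epochs=3,
    per_device_train_batch_size=16,
)
# train_ds / eval_ds: SciERC converted to BIO-tagged, tokenizer-aligned examples.
# trainer = Trainer(model=model, args=training_args,
#                   train_dataset=train_ds, eval_dataset=eval_ds)
# trainer.train()
```

For the few-shot setting, a decoder-only LLM is typically given an instruction plus a handful of labeled demonstrations and asked to tag a new sentence. The instruction wording, demonstration format, and number of shots below are assumptions, not the paper's actual prompt template.

```python
# Minimal few-shot prompting sketch (decoder-only). The examples and template
# are hypothetical; the paper's real prompt design may differ substantially.
FEW_SHOT_EXAMPLES = [
    ("We train a neural parser and evaluate it on the Penn Treebank.",
     '{"Method": ["neural parser"], "Material": ["Penn Treebank"]}'),
    ("BLEU is used to measure translation quality.",
     '{"Metric": ["BLEU"], "Task": ["translation"]}'),
]

def build_prompt(sentence: str, examples=FEW_SHOT_EXAMPLES) -> str:
    """Assemble an instruction plus labeled demonstrations for one sentence."""
    header = (
        "Extract entities of types Task, Method, Metric, Material, "
        "OtherScientificTerm, and Generic from the sentence. Answer as JSON "
        "mapping each type to a list of text spans.\n\n"
    )
    demos = "".join(f"Sentence: {s}\nEntities: {e}\n\n" for s, e in examples)
    return header + demos + f"Sentence: {sentence}\nEntities:"

# The resulting string is sent to a decoder-only LLM, whose JSON output is
# then parsed back into entity spans for evaluation.
```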
| File | Description | Attachment type | License | Size | Format |
|---|---|---|---|---|---|
| Buscaldi et al-2025-SemTech4STLD-CEUR-VoR.pdf (open access) | Evaluating LLMs for Named Entity Recognition in Scientific Domain with Fine-Tuning and Few-Shot Learning | Publisher's Version (Version of Record, VoR) | Creative Commons | 293.48 kB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


