
Malandri, L., Mercorio, F., Serino, A. (2025). SkiLLMo: Normalized ESCO Skill Extraction through Transformer Models. In SAC '25: Proceedings of the 40th ACM/SIGAPP Symposium on Applied Computing (pp.1969-1976). Association for Computing Machinery [10.1145/3672608.3707960].

SkiLLMo: Normalized ESCO Skill Extraction through Transformer Models

Malandri, Lorenzo; Mercorio, Fabio; Serino, Antonio
2025

Abstract

In recent years, natural language processing (NLP) technologies have made significant contributions to addressing labour market tasks. One of the most interesting challenges is the automatic extraction of competences from unstructured text. This paper presents a pipeline for efficiently extracting and standardizing skills from job advertisements using NLP techniques. The proposed methodology leverages open-source Transformer and Large Language Models (LLMs) to extract skills and map them to ESCO, the European labour market taxonomy. To address the computational challenges of processing lengthy job advertisements, a BERT model was fine-tuned to identify text segments likely to contain skills. This filtering step reduces noise and ensures that only relevant content is processed further. The filtered text is then passed to an LLM, which extracts implicit and explicit hard and soft skills through prompt engineering. The extracted skills are subsequently matched against entries in a vector store containing the ESCO taxonomy to achieve standardization. Evaluation by domain experts shows that the pipeline achieves a precision of 91% for skill extraction, 80% for skill standardization, and a combined overall precision of 79%. These results demonstrate the effectiveness of the proposed approach in facilitating structured, standardized skill extraction from job postings.
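The three-stage pipeline described above (BERT-based segment filtering, LLM skill extraction, vector-store matching to ESCO) can be sketched as follows. This is a minimal, self-contained illustration, not the authors' implementation: the keyword filter, regex-based extractor, and bag-of-words cosine similarity are toy stand-ins for the fine-tuned BERT classifier, the prompted LLM, and the embedding-based ESCO vector store, and all names and labels are hypothetical.

```python
# Toy sketch of the SkiLLMo-style three-stage pipeline.
# Stand-ins: keyword cues for the fine-tuned BERT filter, a regex for the
# prompted LLM extractor, bag-of-words cosine for the ESCO vector store.
import math
import re
from collections import Counter

# Hypothetical slice of the ESCO taxonomy (real ESCO has thousands of labels).
ESCO_LABELS = [
    "manage a team",
    "python (computer programming)",
    "communicate with customers",
]

def filter_segments(segments):
    # Stage 1 stand-in: keep segments likely to mention skills.
    # (The paper fine-tunes BERT for this binary relevance decision.)
    cues = ("skill", "experience", "ability", "knowledge", "proficien")
    return [s for s in segments if any(c in s.lower() for c in cues)]

def extract_skills(segment):
    # Stage 2 stand-in: the paper prompts an LLM; this crude pattern
    # grabs phrases after "in"/"with" purely for illustration.
    return [m.strip() for m in
            re.findall(r"(?:in|with)\s+([A-Za-z ]+?)(?:,|\.|$)", segment)]

def _vec(text):
    return Counter(text.lower().split())

def _cosine(a, b):
    num = sum(a[t] * b[t] for t in set(a) & set(b))
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def standardize(skill, threshold=0.3):
    # Stage 3 stand-in: nearest ESCO label by similarity; the paper
    # instead queries a vector store of ESCO embeddings.
    best = max(ESCO_LABELS, key=lambda lbl: _cosine(_vec(skill), _vec(lbl)))
    return best if _cosine(_vec(skill), _vec(best)) >= threshold else None

ad = [
    "We are a fast-growing company.",
    "Required skills: experience with python programming.",
    "Free coffee in the office.",
]
relevant = filter_segments(ad)                         # noise filtered out
skills = [s for seg in relevant for s in extract_skills(seg)]
normalized = [standardize(s) for s in skills]          # mapped to ESCO labels
```

The design mirrors the abstract's rationale: filtering first keeps the expensive extraction step from seeing irrelevant text, and standardization is a nearest-neighbour lookup over the taxonomy rather than free-form generation.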
paper
information extraction; labor market; large language models; skill extraction; transformer models
English
SAC '25: Proceedings of the 40th ACM/SIGAPP Symposium on Applied Computing - 31 March 2025 - 4 April 2025
2025
SAC '25: Proceedings of the 40th ACM/SIGAPP Symposium on Applied Computing
9798400706295
2025
1969
1976
none
Files for this item:
No files are associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/555064
Citations
  • Scopus 0
  • Web of Science (ISI) 0