Bicocca Open Archive

JobSet: Synthetic Job Advertisements Dataset for Labour Market Intelligence

Colombo, Samuele;D'Amico, Simone;Malandri, Lorenzo;Mercorio, Fabio;Seveso, Andrea

2025

Abstract

The use of online services for advertising job positions has grown in the last decade, thanks to the ability of Online Job Advertisements (OJAs) to observe the labour market in near real-time, predict new occupation trends, identify relevant skills, and support policy and decision-making activities. Unsurprisingly, 2023 was declared the Year of Skills by the EU, as skill mismatch is a key challenge for European economies. In such a scenario, machine learning-based approaches have played a key role in classifying job ads and extracting skills according to well-established taxonomies. However, the effectiveness of ML depends on access to annotated job advertisement datasets, which are often limited and require time-consuming manual annotation. The lack of OJA annotated benchmarks representative of the real online OJA and skills distributions is currently limiting advances in skill intelligence. To deal with this, we propose JobGen, which leverages Large Language Models (LLMs) to generate synthetic OJAs. We use real OJAs collected from an EU project and the ESCO taxonomy to represent job market distributions accurately. JobGen enhances data diversity and semantic alignment, addressing common issues in synthetic data generation. The resulting dataset, JobSet, provides a valuable resource for tasks like skill extraction and job matching and is openly available to the community.

Contribution type: paper
Keywords: benchmark; labour market; large language models
Content language: English
Conference name: SAC '25: Proceedings of the 40th ACM/SIGAPP Symposium on Applied Computing, 31 March 2025 to 4 April 2025
Conference year: 2025
Proceedings title: SAC '25: Proceedings of the 40th ACM/SIGAPP Symposium on Applied Computing
Proceedings volume ISBN: 9798400706295
Publication date: 2025
First page: 928
Last page: 935
Contribution DOI: https://dx.doi.org/10.1145/3672608.3707718
Full text: none
Citation: Colombo, S., D'Amico, S., Malandri, L., Mercorio, F., Seveso, A. (2025). JobSet: Synthetic Job Advertisements Dataset for Labour Market Intelligence. In SAC '25: Proceedings of the 40th ACM/SIGAPP Symposium on Applied Computing (pp. 928-935). Association for Computing Machinery [10.1145/3672608.3707718].
Appears in types: 02 - Conference contribution

Files in this item:

There are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/555062
