Bicocca Open Archive

Analyzing Non-Random Selectivity in Online Job Advertisements Using Eurostat Benchmark Data and Generalized Sample Selection Models: An Application to EU Regional Labor Markets

Lovaglio, Pietro Giorgio; Mezzanzanica, Mario

2026

Abstract

This paper provides a general framework for addressing the non-representativeness and non-random selectivity of online job ads data, using generalized sample selection models and Eurostat benchmark data. We jointly model the outcome intensity (the number of online job ads in observed profiles, whose levels are defined by auxiliary variables) and the probability of endogenous selection (the likelihood that online job ads are not missing in a given profile). This allows us to model the missing-data mechanism without an a priori justification of missingness at random, which is generally assumed by multilevel regression and post-stratification, a popular benchmark technique in this field. Moreover, we offer new post-stratification strategies to calibrate the unconditional predictions on benchmark/reference samples. We use data from Cedefop's Skills-OVATE platform, which collects online job advertisements for all EU regions in 2022, and from an Italian web platform covering 2013Q2-2018Q2. As reference samples, we use aggregated Labor Force Survey (LFS) recent job starters and LFS new hires from microdata, which represent reasonable lower bounds for job advertisements. Online job ads are strongly overrepresented relative to the benchmark data (+40% with respect to LFS recent job starters and +400% over new hires from LFS microdata). Generalized sample selection models reduced this bias by half, unlike multilevel regression and post-stratification and other univariate approaches, which, moreover, produced biased estimates.

Subtype: Journal article - Scientific article
Keywords: Eurostat benchmark data; Labor Force Survey; missingness mechanism; multilevel modeling; online job advertisements; post-stratification; representativeness; sample selection models
Content language: English
Ahead-of-print date or date of first online publication: 24 Jan 2026
Publication date: 2026
Journal: LABOUR
Volume number: 40
Issue: 2 (June 2026)
First page: 131
Last page: 161
Article DOI: https://dx.doi.org/10.1111/labr.70008
Full text: open
Citation: Lovaglio, P., Mezzanzanica, M. (2026). Analyzing Non-Random Selectivity in Online Job Advertisements Using Eurostat Benchmark Data and Generalized Sample Selection Models: An Application to EU Regional Labor Markets. LABOUR, 40(2 (June 2026)), 131-161 [10.1111/labr.70008].
Appears in types: 01 - Journal article

Files in this record:

File	Size	Format
Lovaglio et al-2026-Labour-VoR.pdf (open access; attachment type: Publisher’s Version (Version of Record, VoR); license: Creative Commons)	747.77 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/589417
