
Campagner, A. (2026). Missing but Not Missed: On Learnability Under Imputation. In Machine Learning and Knowledge Discovery in Databases. Research Track European Conference, ECML PKDD 2025, Porto, Portugal, September 15–19, 2025, Proceedings, Part IV (pp. 344–361). Springer Science and Business Media Deutschland GmbH [10.1007/978-3-032-06078-5_20].

Missing but Not Missed: On Learnability Under Imputation

Campagner A.
First author
2026

Abstract

Missing data is one of the most ubiquitous data quality issues, and also one of the most impactful on machine learning (ML) pipelines. Indeed, not only can most commonly applied ML methods not directly handle incomplete data, but the techniques used to manage this issue can also affect the performance and evaluation of ML models. Among these techniques, imputation, that is, filling in the missing values using information from the observed data, remains among the most popular and effective in practice. Yet, from a theoretical point of view, it is still not clear under which conditions it is possible to learn effectively after imputation. In this article we address this gap by studying learnability under imputation in the framework of statistical learning theory. After giving a general definition of learnability under imputation, we present three main contributions: 1) we introduce a novel stability condition, called noise risk stability, which we prove to be both sufficient and, under weak assumptions, necessary for learnability under imputation; 2) we show that a large class of ML models (including linear and kernel methods) satisfies noise risk stability; 3) we characterize the learning-theoretic properties of two common imputation methods (constant and regression imputation). Our results set the stage for a rigorous study of imputation and missing data management in the framework of statistical learning theory, and also describe relevant open questions.
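The abstract names the two imputation methods the paper analyzes, constant and regression imputation. As a purely illustrative sketch (this is not the paper's formal construction, and the function names and NumPy-based setup here are assumptions for illustration), the two schemes can be written as follows: constant imputation replaces every missing entry with a fixed value, while regression imputation fits a least-squares model on the fully observed columns to predict the missing ones.

```python
import numpy as np

def constant_impute(X, value=0.0):
    """Replace every missing entry (NaN) with a fixed constant."""
    Xc = X.copy()
    Xc[np.isnan(Xc)] = value
    return Xc

def regression_impute(X, col):
    """Impute missing entries of column `col` via least-squares
    regression on the remaining columns (illustrative sketch;
    assumes the other columns are fully observed)."""
    Xc = X.copy()
    miss = np.isnan(Xc[:, col])
    others = [j for j in range(Xc.shape[1]) if j != col]
    # fit a linear model with intercept on the complete rows
    A = np.column_stack([Xc[~miss][:, others], np.ones((~miss).sum())])
    b = Xc[~miss, col]
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    # predict the missing entries
    Ap = np.column_stack([Xc[miss][:, others], np.ones(miss.sum())])
    Xc[miss, col] = Ap @ w
    return Xc
```

For example, on a matrix whose second column is exactly twice the first, regression imputation recovers the missing entry from that linear relation, whereas constant imputation simply inserts the chosen default.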
paper
Imputation; Learnability; Missing Data; Statistical Learning Theory;
English
European Conference, ECML PKDD 2025 - September 15–19, 2025
2025
Ribeiro, RP; Pfahringer, B; Japkowicz, N; Larrañaga, P; Jorge, AM; Soares, C; Abreu, PH; Gama, J
Machine Learning and Knowledge Discovery in Databases. Research Track European Conference, ECML PKDD 2025, Porto, Portugal, September 15–19, 2025, Proceedings, Part IV
9783032060778
30-Sep-2025
2026
16016 LNCS
344
361
reserved
Files in this product:
File: Campagner-2026-ECML PKDD 2025-VoR.pdf
Attachment type: Publisher's Version (Version of Record, VoR)
License: All rights reserved
Size: 688.27 kB
Format: Adobe PDF
Access: restricted (archive managers only)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/574841
Citations
  • Scopus: 0