Cappozzo, A., Duponchel, L., Greselin, F., Murphy, T. (2021). Robust variable selection in the framework of classification with label noise and outliers: Applications to spectroscopic data in agri-food. ANALYTICA CHIMICA ACTA, 1153(8 April 2021) [10.1016/j.aca.2021.338245].

Robust variable selection in the framework of classification with label noise and outliers: Applications to spectroscopic data in agri-food

Cappozzo A. (first author); Greselin F.
2021

Abstract

Classification of high-dimensional spectroscopic data is a common task in analytical chemistry. Well-established procedures such as support vector machines (SVMs) and partial least squares discriminant analysis (PLS-DA) are the most common methods for tackling this supervised learning problem. Nonetheless, these models can be difficult to interpret, and solutions based on feature selection are often adopted because they automatically identify the most informative wavelengths. Unfortunately, in some delicate applications such as food authenticity, mislabeled and adulterated spectra may be present in the calibration and/or validation sets, with dramatic effects on model development, prediction accuracy and robustness. Motivated by these issues, the present paper proposes a robust model-based method that simultaneously performs variable selection, outlier detection and label noise detection. We demonstrate the effectiveness of our proposal on three agri-food spectroscopic studies in which several forms of perturbation are considered. Our approach succeeds in reducing problem complexity, identifying anomalous spectra and attaining competitive predictive accuracy while using a very small number of selected wavelengths.
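The abstract does not spell out the estimation procedure. As a generic illustration of how a model-based classifier can flag both outliers and mislabeled spectra, the following Python sketch fits one Gaussian per class and repeatedly excludes (trims) the fraction `alpha` of training samples that are least plausible under their own label, an idea known as impartial trimming. This is a sketch under stated assumptions, not the authors' method: the helper names `fit_trimmed_gaussian_classifier` and `predict`, the trimming level `alpha` and the ridge term `reg` are hypothetical, and the wavelength selection step is omitted entirely.

```python
import numpy as np


def log_gauss(X, mean, cov):
    """Log-density of each row of X under a multivariate normal N(mean, cov)."""
    p = X.shape[1]
    diff = X - mean
    _, logdet = np.linalg.slogdet(cov)
    # Squared Mahalanobis distance, computed without an explicit inverse
    maha = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=1)
    return -0.5 * (p * np.log(2.0 * np.pi) + logdet + maha)


def fit_trimmed_gaussian_classifier(X, y, alpha=0.05, n_iter=20, reg=1e-6):
    """Hypothetical helper, NOT the authors' implementation.

    Fits one Gaussian per class while trimming the fraction `alpha` of
    training samples with the lowest prior-weighted log-density under
    their *own* label. Returns (means, covs, priors, flagged), where
    `flagged` marks the samples trimmed in the final step.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, p = X.shape
    classes = np.unique(y)
    n_trim = int(np.floor(alpha * n))
    keep = np.ones(n, dtype=bool)  # start from the full training set
    for _ in range(n_iter):
        means, covs, priors = {}, {}, {}
        for c in classes:
            Xc = X[keep & (y == c)]
            means[c] = Xc.mean(axis=0)
            # Small ridge term (assumed) keeps each covariance invertible
            covs[c] = np.atleast_2d(np.cov(Xc, rowvar=False)) + reg * np.eye(p)
            priors[c] = len(Xc) / keep.sum()
        # Contribution of each sample to the likelihood, under its given label
        contrib = np.empty(n)
        for c in classes:
            idx = y == c
            contrib[idx] = log_gauss(X[idx], means[c], covs[c]) + np.log(priors[c])
        # Trim the n_trim least plausible samples (outliers or label noise)
        new_keep = np.ones(n, dtype=bool)
        new_keep[np.argsort(contrib)[:n_trim]] = False
        if np.array_equal(new_keep, keep):
            break  # trimmed set is stable, parameters match it
        keep = new_keep
    return means, covs, priors, ~keep


def predict(X_new, means, covs, priors):
    """Assign each row of X_new to the class with the largest posterior score."""
    X_new = np.asarray(X_new, dtype=float)
    classes = list(means)
    scores = np.column_stack(
        [log_gauss(X_new, means[c], covs[c]) + np.log(priors[c]) for c in classes]
    )
    return np.asarray(classes)[np.argmax(scores, axis=1)]
```

For example, with two well-separated classes, a few training samples whose labels were flipped, and a couple of gross outliers, setting `alpha` to the expected contamination fraction should place those samples among the trimmed ones. On raw spectra with more wavelengths than calibration samples, the full class covariance matrices would be singular (the `reg` ridge only papers over this), which is one reason why reducing the number of wavelengths matters for model-based classifiers.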
Journal article - Scientific article
Keywords: Agri-food; Label noise; Mid infrared spectroscopy; Near infrared spectroscopy; Outlier detection; Robust classification; Variable selection
Language: English
Date: 1 Feb 2021
Year: 2021
Volume: 1153
Issue date: 8 April 2021
Article number: 338245
Access: reserved
Files in this record:
1-s2.0-S0003267021000714-main-2.pdf (Publisher's Version (Version of Record, VoR); Adobe PDF; 5.11 MB; access restricted to archive managers)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/306620
Citations
  • Scopus 8
  • Web of Science (ISI) 8