Effect of data preprocessing and machine learning hyperparameters on mass spectrometry imaging models

Ballabio, D;
2023

Abstract

The self-organizing map (SOM) is a nonlinear machine learning algorithm that is particularly well suited for visualizing and analyzing high-dimensional, hyperspectral time-of-flight secondary ion mass spectrometry (ToF-SIMS) imaging data. Previously, we compared the capabilities of the SOM with more traditional linear techniques using ToF-SIMS imaging data. Although SOMs perform well with minimal data preprocessing and negligible hyperparameter optimization, it is important to understand how different data preprocessing methods and hyperparameter settings influence their performance. While such investigations have been reported outside of the ToF-SIMS field, no such study has been reported for hyperspectral mass spectrometry imaging (MSI) data. To address this, we used two labeled ToF-SIMS imaging datasets: a polymer microarray dataset and a semisynthetic hyperspectral dataset. The latter was generated using a novel algorithm that we describe here. A grid search was used to evaluate which data preprocessing methods and SOM hyperparameters had the largest impact on the performance of the SOM. This was assessed using multiple linear regression, whereby performance metrics were regressed onto each variable defining the preprocessing-hyperparameter space. We found that preprocessing was generally more important than hyperparameter selection. We also found statistically significant interactions between several of the parameters studied, suggesting a complex interplay between preprocessing and hyperparameter selection. Finally, we identified trends, both dataset specific and dataset agnostic, which we describe and discuss in detail.
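The workflow summarized in the abstract (grid search over preprocessing and SOM hyperparameters, followed by regression of a performance metric onto the grid variables) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy Poisson data, the three preprocessing options, the SOM sizes and learning rates, the label-purity metric, and all function names. The regression includes main effects only and no significance testing, whereas the abstract also reports interactions.

```python
import itertools
import numpy as np

def make_data(rng, n_per_class=60, n_channels=20):
    """Toy labeled 'spectra': Poisson counts around three class means (assumed data)."""
    means = rng.uniform(1.0, 20.0, size=(3, n_channels))
    X = np.vstack([rng.poisson(m, size=(n_per_class, n_channels)) for m in means])
    y = np.repeat(np.arange(3), n_per_class)
    return X.astype(float), y

def preprocess(X, method):
    """Illustrative preprocessing options (hypothetical choices, not the paper's list)."""
    if method == "none":
        return X
    if method == "sqrt":          # square-root transform of the counts
        return np.sqrt(X)
    if method == "total_norm":    # scale each spectrum by its total intensity
        return X / np.maximum(X.sum(axis=1, keepdims=True), 1e-12)
    raise ValueError(f"unknown method: {method}")

def train_som(X, size, lr0, n_iter, rng):
    """Online SOM on a size x size grid with a shrinking Gaussian neighborhood."""
    W = X[rng.integers(0, len(X), size * size)].copy()
    grid = np.array([(r, c) for r in range(size) for c in range(size)], dtype=float)
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                      # linearly decaying learning rate
        sigma = (size / 2.0) * (1.0 - frac) + 0.5    # linearly decaying radius
        x = X[rng.integers(0, len(X))]
        bmu = np.argmin(((W - x) ** 2).sum(axis=1))  # best-matching unit
        h = np.exp(-((grid - grid[bmu]) ** 2).sum(axis=1) / (2.0 * sigma ** 2))
        W += lr * h[:, None] * (x - W)
    return W

def bmu_purity(X, y, W):
    """Fraction of samples whose label matches the majority label of their unit."""
    bmus = ((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
    correct = sum(np.bincount(y[bmus == u]).max() for u in np.unique(bmus))
    return correct / len(y)

def run_grid(seed=0):
    """Evaluate every combination of the assumed preprocessing/hyperparameter grid."""
    rng = np.random.default_rng(seed)
    X, y = make_data(rng)
    rows = []
    for m, s, lr in itertools.product(["none", "sqrt", "total_norm"], [2, 4], [0.1, 0.5]):
        Xp = preprocess(X, m)
        W = train_som(Xp, s, lr, n_iter=500, rng=rng)
        rows.append((m, s, lr, bmu_purity(Xp, y, W)))
    return rows

def regress(rows):
    """Least-squares fit of the metric on dummy-coded preprocessing plus hyperparameters."""
    A = np.array([[1.0, m == "sqrt", m == "total_norm", s, lr] for m, s, lr, _ in rows],
                 dtype=float)
    b = np.array([r[3] for r in rows])
    coef, *_ = np.linalg.lstsq(A, b, rcond=None)
    names = ["intercept", "sqrt", "total_norm", "som_size", "learning_rate"]
    return dict(zip(names, coef))

if __name__ == "__main__":
    for name, value in regress(run_grid()).items():
        print(f"{name:>14s}: {value:+.4f}")
```

In this sketch the coefficient attached to each variable indicates how strongly the metric shifts as that variable changes across the grid, with "none" as the preprocessing baseline. A label-based metric is used because it can be compared across preprocessing options that rescale the data.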
Type: Journal article - Scientific article
Keywords: chemometrics; machine learning; mass spectrometry
Language: English
Date: 20 Sep 2023
Year: 2023
Volume: 41
Issue: 6 (December 2023)
First page: 063204-1
Last page: 063204-12
Article number: 063204
Access: open
Gardner, W., Winkler, D., Alexander, D., Ballabio, D., Muir, B., Pigram, P. (2023). Effect of data preprocessing and machine learning hyperparameters on mass spectrometry imaging models. JOURNAL OF VACUUM SCIENCE & TECHNOLOGY. A. VACUUM, SURFACES, AND FILMS, 41(6 (December 2023)), 063204-1-063204-12 [10.1116/6.0002788].
Files in this item:
File: Gardner-2023-J Vac Sci Technol A-VoR .pdf
Access: open access
Description: Research Article
Attachment type: Publisher’s Version (Version of Record, VoR)
License: Creative Commons
Size: 3.02 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/439758