Random sampling of the feasible region defined by knowledge-based and data-driven constraints is increasingly employed for the analysis of metabolic networks. The aim is to identify a set of reactions that are used to a significantly different extent between two conditions of biological interest, such as physiological and pathological conditions. A reference constraint-based model incorporating knowledge-based constraints on reaction stoichiometry and a reasonable mass balance constraint is thus differentially constrained for the two conditions according to different types of -omics data, such as transcriptomics and/or proteomics. The hypothesis that two samples randomly obtained from the two models come from the same distribution is then rejected or not rejected according to standard statistical tests. However, the impact of under-sampling on false discoveries has not been investigated so far. To this end, we evaluated the presence of false discoveries by comparing samples obtained from the very same feasible region, for which the null hypothesis holds by construction, so that any rejection is a false discovery. We compared different sampling algorithms and sampling parameters. Our results indicate that established sampling convergence tests are not sufficient to prevent high false discovery rates. We propose some best practices to reduce the false discovery rate: we advocate the use of the CHRR algorithm, a large value of the thinning parameter, and a threshold on the fold-change between the averages of the sampled flux values.
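
The null comparison described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' pipeline: the feasible region is a toy elongated box instead of a genome-scale model, the sampler is plain hit-and-run (not CHRR), the two-sample Kolmogorov-Smirnov test stands in for the "standard statistical tests", and the names `hit_and_run`, `ks_statistic` and `count_false_discoveries` are hypothetical.

```python
import math
import random


def hit_and_run(lower, upper, n_samples, thinning, rng):
    """Toy hit-and-run sampler on an axis-aligned box (stand-in for a flux polytope)."""
    dim = len(lower)
    x = [(lo + hi) / 2.0 for lo, hi in zip(lower, upper)]  # start at the centre
    samples = []
    for step in range(n_samples * thinning):
        # Draw a random direction on the unit sphere.
        d = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(v * v for v in d))
        d = [v / norm for v in d]
        # Find the interval [t_min, t_max] for which x + t*d stays inside the box.
        t_min, t_max = -math.inf, math.inf
        for xi, di, lo, hi in zip(x, d, lower, upper):
            if di > 0:
                t_min, t_max = max(t_min, (lo - xi) / di), min(t_max, (hi - xi) / di)
            elif di < 0:
                t_min, t_max = max(t_min, (hi - xi) / di), min(t_max, (lo - xi) / di)
        t = rng.uniform(t_min, t_max)
        # Move along the chord; clamp to absorb floating-point rounding.
        x = [min(max(xi + t * di, lo), hi) for xi, di, lo, hi in zip(x, d, lower, upper)]
        if (step + 1) % thinning == 0:  # keep only every `thinning`-th point
            samples.append(list(x))
    return samples


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d_max = 0.0
    while i < len(a) and j < len(b):
        v = min(a[i], b[j])
        while i < len(a) and a[i] <= v:
            i += 1
        while j < len(b) and b[j] <= v:
            j += 1
        d_max = max(d_max, abs(i / len(a) - j / len(b)))
    return d_max


def count_false_discoveries(thinning, n_samples=200, alpha=0.05, fc_threshold=None, seed=0):
    """Sample the SAME region twice and count dimensions flagged as different."""
    lower = [0.0] * 5
    upper = [1.0, 1.0, 1.0, 1.0, 1000.0]  # elongated box: slow mixing along the last axis
    run_a = hit_and_run(lower, upper, n_samples, thinning, random.Random(seed))
    run_b = hit_and_run(lower, upper, n_samples, thinning, random.Random(seed + 1))
    # Asymptotic KS critical value for two samples of equal size n_samples.
    d_crit = math.sqrt(-math.log(alpha / 2.0) / 2.0) * math.sqrt(2.0 / n_samples)
    flagged = 0
    for k in range(len(lower)):
        a = [s[k] for s in run_a]
        b = [s[k] for s in run_b]
        significant = ks_statistic(a, b) > d_crit
        if significant and fc_threshold is not None:
            # Additionally require a minimum fold-change between the sample means.
            mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
            fold = max(mean_a, mean_b) / max(min(mean_a, mean_b), 1e-12)
            significant = fold > fc_threshold
        flagged += int(significant)
    return flagged


if __name__ == "__main__":
    for thinning in (1, 100):
        print("thinning", thinning,
              "flagged:", count_false_discoveries(thinning),
              "flagged with fold-change filter:",
              count_false_discoveries(thinning, fc_threshold=1.2))
```

Because both runs sample the same region, every flagged dimension is a false discovery. Comparing the counts for a small and a large `thinning`, with and without the fold-change filter, gives a feel for the effect; the exact numbers depend on the seeds and on this toy geometry.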

Galuzzi, B., Milazzo, L., Damiani, C. (2023). Best Practices in Flux Sampling of Constrained-Based Models. In Machine Learning, Optimization, and Data Science 8th International Conference, LOD 2022, Certosa di Pontignano, Italy, September 18–22, 2022, Revised Selected Papers, Part II (pp. 234-248). Springer Science and Business Media Deutschland GmbH [10.1007/978-3-031-25891-6_18].

Best Practices in Flux Sampling of Constrained-Based Models

Galuzzi B. G.; Milazzo L.; Damiani C.
2023

Type: slide + paper
Keywords: Constrained-based modelling; Flux sampling; Metabolic network
Language: English
Conference: 8th International Conference on Machine Learning, Optimization, and Data Science, LOD 2022, held in conjunction with the 2nd Advanced Course and Symposium on Artificial Intelligence and Neuroscience, ACAIN 2022 - 18 September 2022 through 22 September 2022
Conference year: 2022
Editors: Nicosia, G; Ojha, V; La Malfa, E; La Malfa, G; Pardalos, P; Di Fatta, G; Giuffrida, G; Umeton, R
Book title: Machine Learning, Optimization, and Data Science 8th International Conference, LOD 2022, Certosa di Pontignano, Italy, September 18–22, 2022, Revised Selected Papers, Part II
ISBN: 978-3-031-25890-9
Publication date: 10 March 2023
Year: 2023
Volume: 13811 LNCS
Pages: 234-248
Access: reserved
Files in this item:
File Size Format
Galluzzi-2023-Lect Notes Comput Sci-VoR.pdf

Archive managers only

Description: Conference Paper
Attachment type: Publisher’s Version (Version of Record, VoR)
License: All rights reserved
Size: 4.54 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/411724
Citations
  • Scopus 1