Galuzzi, B., Milazzo, L., Damiani, C. (2023). Best Practices in Flux Sampling of Constrained-Based Models. In Machine Learning, Optimization, and Data Science: 8th International Conference, LOD 2022, Certosa di Pontignano, Italy, September 18–22, 2022, Revised Selected Papers, Part II (pp. 234-248). Springer Science and Business Media Deutschland GmbH. doi:10.1007/978-3-031-25891-6_18.
Best Practices in Flux Sampling of Constrained-Based Models
Galuzzi B. G.; Milazzo L.; Damiani C.
2023
Abstract
Random sampling of the feasible region defined by knowledge-based and data-driven constraints is increasingly employed for the analysis of metabolic networks. The aim is to identify a set of reactions that are used to a significantly different extent in two conditions of biological interest, such as physiological and pathological conditions. A reference constraint-based model, incorporating knowledge-based constraints on reaction stoichiometry and a reasonable mass balance constraint, is therefore differentially constrained for the two conditions according to different types of -omics data, such as transcriptomics and/or proteomics. Standard statistical tests are then used to decide whether to reject the null hypothesis that two samples randomly drawn from the two models come from the same distribution. However, the impact of under-sampling on false discoveries has not been investigated so far. To this end, we evaluated the presence of false discoveries by comparing samples obtained from the very same feasible region. In this setting, the null hypothesis is true by construction and should never be rejected. We compared different sampling algorithms and sampling parameters. Our results indicate that established sampling convergence tests are not sufficient to prevent high false discovery rates. We propose some best practices to reduce the false discovery rate. Specifically, we advocate using the CHRR (Coordinate Hit-and-Run with Rounding) algorithm, a large value of the thinning parameter, and a threshold on the fold change between the averages of the sampled flux values.
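The control experiment described above draws two samples from the very same feasible region, so every rejected null hypothesis is a false discovery. It can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline:

- A toy simplex stands in for a metabolic model whose steady-state equalities have already been eliminated.
- The sampler is plain coordinate hit-and-run, without the rounding step that CHRR adds.
- A two-sample Kolmogorov-Smirnov test with an asymptotic p-value serves as one example of a standard test.
- The helper names (`coordinate_hit_and_run`, `ks_two_sample`, `fold_change`, `toy_simplex`, `count_false_discoveries`) and the fold-change threshold of 1.2 are hypothetical.

```python
import math
import random
from bisect import bisect_right


def coordinate_hit_and_run(A, b, x0, n_samples, thinning, rng):
    """Coordinate hit-and-run over the bounded polytope {x : A x <= b}.

    Only every `thinning`-th point of the chain is kept. There is no rounding
    step, so this is NOT full CHRR. x0 must be strictly feasible.
    """
    x = list(x0)
    samples = []
    for step in range(1, n_samples * thinning + 1):
        i = rng.randrange(len(x))  # random coordinate direction e_i
        lo, hi = -math.inf, math.inf
        for row, b_j in zip(A, b):
            slack = b_j - sum(a * x_k for a, x_k in zip(row, x))
            if row[i] > 0:    # this constraint limits moving up along e_i
                hi = min(hi, slack / row[i])
            elif row[i] < 0:  # this constraint limits moving down along e_i
                lo = max(lo, slack / row[i])
        x[i] += rng.uniform(lo, hi)  # uniform point on the feasible chord
        if step % thinning == 0:
            samples.append(list(x))
    return samples


def ks_two_sample(x, y):
    """Two-sample Kolmogorov-Smirnov statistic D and asymptotic p-value."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    # The largest gap between the two empirical CDFs occurs at a data point.
    d = max(abs(bisect_right(xs, v) / n - bisect_right(ys, v) / m) for v in xs + ys)
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    if lam < 0.27:  # the Kolmogorov tail probability is ~1 here
        return d, 1.0
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                  for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)


def fold_change(mean_1, mean_2, eps=1e-9):
    """Symmetric fold change (>= 1) between the absolute mean fluxes."""
    a, b = abs(mean_1) + eps, abs(mean_2) + eps
    return max(a, b) / min(a, b)


def toy_simplex(n):
    """Hypothetical toy region: x_i >= 0 for every i and sum(x) <= 1."""
    A = [[-1.0 if j == i else 0.0 for j in range(n)] for i in range(n)]
    A.append([1.0] * n)
    b = [0.0] * n + [1.0]
    return A, b, [0.5 / n] * n  # strictly feasible starting point


def count_false_discoveries(A, b, x0, n_samples, thinning,
                            alpha=0.05, fc_min=1.0, seed=0):
    """Sample the SAME region twice; every significant reaction is a false discovery."""
    rng = random.Random(seed)
    s1 = coordinate_hit_and_run(A, b, x0, n_samples, thinning, rng)
    s2 = coordinate_hit_and_run(A, b, x0, n_samples, thinning, rng)
    hits = 0
    for r in range(len(x0)):
        f1 = [v[r] for v in s1]
        f2 = [v[r] for v in s2]
        _, p = ks_two_sample(f1, f2)
        fc = fold_change(sum(f1) / len(f1), sum(f2) / len(f2))
        # fc_min=1.0 disables the fold-change filter, since fc >= 1 always.
        if p < alpha and fc >= fc_min:
            hits += 1
    return hits


if __name__ == "__main__":
    A, b, x0 = toy_simplex(4)
    for thinning in (1, 10, 100):
        raw = count_false_discoveries(A, b, x0, 200, thinning)
        filtered = count_false_discoveries(A, b, x0, 200, thinning, fc_min=1.2)
        print(f"thinning={thinning}: {raw} false discoveries, "
              f"{filtered} after a fold-change >= 1.2 filter (out of 4 reactions)")
```

Both chains explore the same region, so any nonzero count is a false discovery. Comparing small and large thinning values, with and without the fold-change filter, mirrors the kind of experiment the abstract describes. No multiple-testing correction is applied, for brevity. In a real genome-scale model, the sampler would be a library implementation of CHRR, not this toy.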
| File | Size | Format |
|---|---|---|
| Galluzzi-2023-Lect Notes Comput Sci-VoR.pdf (Conference Paper; Publisher's Version, Version of Record (VoR); license: all rights reserved; access: archive managers only) | 4.54 MB | Adobe PDF |
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.