Ascari, R., Migliorati, S., Ongaro, A. (2025). A New Dirichlet-Multinomial Mixture Regression Model for the Analysis of Microbiome Data. Statistics in Medicine, 44(18-19), August 2025 [10.1002/sim.70220].
A New Dirichlet-Multinomial Mixture Regression Model for the Analysis of Microbiome Data
Ascari, Roberto; Migliorati, Sonia; Ongaro, Andrea
2025
Abstract
Motivated by the challenges in analyzing gut microbiome and metagenomic data, this paper introduces a novel mixture distribution for multivariate counts and a regression model built upon it. The flexibility and interpretability of the proposed distribution accommodate both negative and positive dependence among taxa and are accompanied by numerous theoretical properties, including explicit expressions for inter- and intraclass correlations, thereby providing a powerful tool for understanding complex microbiome interactions. Furthermore, the regression model based on this distribution facilitates the clear identification and interpretation of relationships between taxa and covariates by modeling the marginal mean of the multivariate response (i.e., taxa counts). Inference is performed using a tailored Hamiltonian Monte Carlo estimation method combined with a spike-and-slab variable selection procedure. Extensive simulation studies and an application to a human gut microbiome dataset highlight the proposed model's substantial improvements over competing models in terms of fit, interpretability, and predictive performance.

| File | Access | Attachment type | License | Size | Format |
|---|---|---|---|---|---|
| Ascari et al-2025-Statistics in Medicine-VoR.pdf | Open access | Publisher’s Version (Version of Record, VoR) | Creative Commons | 552.46 kB | Adobe PDF |

Description: This is an open access article under the terms of the Creative Commons Attribution License
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


