We propose a Hidden Markov (HM) model for continuous longitudinal data with missing responses and dropout, extending the finite mixture model of multivariate Gaussian distributions. HM models assume the existence of an unobservable process that follows a Markov chain with a finite number of hidden states and affects the distribution of the observed outcomes. We consider multivariate continuous responses that, at the same time occasion, are assumed to be correlated according to a specific variance-covariance matrix, even conditionally on the latent states. Missing observations are a relevant problem in the analysis of such data, since both dropout and non-monotone missing data patterns can occur. We propose an approach to inference with missing data that carries out the steps of the Expectation-Maximization (EM) algorithm by means of suitable recursions. The resulting EM algorithm provides exact maximum likelihood estimates of the model parameters under the missing-at-random (MAR) assumption, under which the missing data pattern is independent of the missing responses given all the observed data. The resulting HM model accounts for different types of missing patterns: (i) partially missing outcomes at a given time occasion; (ii) completely missing outcomes at a given time occasion (intermittent pattern); (iii) dropout before the end of the period of observation (monotone pattern). The estimation algorithm can also be used when covariates are available that are assumed to affect the distribution of the latent process and, in particular, the initial and transition probabilities of the Markov chain. In this way, it is possible to identify latent clusters of units with homogeneous behavior and to understand how the covariates influence the allocation of individuals to states over time. The approach is illustrated by a Monte Carlo simulation study involving different scenarios.
We also report an application based on the well-known primary biliary cholangitis dataset. These data are very sparse because of missing visits, and several dropouts occurred because of death. Continuous and binary covariates related to the patients are also available, allowing us to investigate how individual characteristics are associated with dropout risk. The application is particularly challenging and confirms the capability of the proposed method to deal with different types of missingness and to identify model-based risk groups of patients, which can be useful for making clinical decisions about therapy.
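The three missing patterns listed in the abstract can be illustrated with a minimal sketch of a forward recursion for a Gaussian HM model. This is not the authors' implementation. It assumes a covariance matrix shared across states, no covariates, and missing outcomes coded as NaN, and the function names `emission_density` and `log_likelihood` are hypothetical. Under MAR, a partially missing occasion contributes the marginal Gaussian density of its observed components, and a completely missing occasion contributes a factor of 1.

```python
import numpy as np
from scipy.stats import multivariate_normal


def emission_density(y, means, cov):
    """Density of the observed part of y under each latent state.

    y     : (r,) response vector, np.nan marks a missing outcome
    means : (k, r) state-specific mean vectors
    cov   : (r, r) covariance matrix, shared across states (an assumption)
    """
    obs = ~np.isnan(y)
    k = means.shape[0]
    if not obs.any():
        # Completely missing occasion: no information, so the factor is 1.
        return np.ones(k)
    # Partially missing occasion: marginal Gaussian of the observed components.
    sub_cov = cov[np.ix_(obs, obs)]
    return np.array([
        multivariate_normal.pdf(y[obs], mean=means[u, obs], cov=sub_cov)
        for u in range(k)
    ])


def log_likelihood(Y, init, trans, means, cov):
    """Observed-data log-likelihood of one unit (scaled forward recursion).

    Y     : (T, r) responses; occasions after dropout are all-NaN rows
    init  : (k,) initial probabilities of the latent Markov chain
    trans : (k, k) transition matrix, rows summing to 1
    """
    alpha = init * emission_density(Y[0], means, cov)
    loglik = 0.0
    for t in range(1, Y.shape[0]):
        scale = alpha.sum()
        loglik += np.log(scale)  # accumulate the log of each scaling factor
        alpha = (alpha / scale) @ trans * emission_density(Y[t], means, cov)
    return loglik + np.log(alpha.sum())
```

Because all-NaN rows contribute a factor of 1 and each row of the transition matrix sums to 1, padding a unit that dropped out with trailing all-NaN rows gives the same log-likelihood as truncating its series at the last observed occasion. An EM algorithm would pair this forward pass with a backward recursion to obtain the posterior state probabilities; that step is omitted here.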
Pennoni, F., Bartolucci, F., Pandolfi, S. (2022). Maximum likelihood estimation of Hidden Markov models for continuous longitudinal data with missing responses and dropout. In Data Science Everywhere: Innovations in Statistical Computing (pp. 22-23).
Maximum likelihood estimation of Hidden Markov models for continuous longitudinal data with missing responses and dropout
Pennoni, F;
2022