We propose a Hidden Markov (HM) model for continuous longitudinal data with missing responses and dropout, thus extending the finite mixture model of multivariate Gaussian distributions. As is well known, HM models assume the existence of an unobservable process, which follows a Markov chain with a finite number of hidden states and affects the distribution of the observed outcomes. We consider multivariate continuous responses that, at the same time occasion, are assumed to be correlated according to a specific variance-covariance matrix, even conditionally on the latent states. In the analysis of such data, missing observations are a relevant problem, since dropout or non-monotone missing data patterns may occur. We propose an approach to inference with missing data that adapts the steps of the Expectation-Maximization (EM) algorithm by means of suitable recursions. The resulting EM algorithm provides exact maximum likelihood estimates of the model parameters under the missing-at-random (MAR) assumption, under which the missingness patterns are independent of the missing responses given all the observed data. The resulting HM model accounts for different types of missing pattern: (i) partially missing outcomes at a given time occasion; (ii) completely missing outcomes at a given time occasion (intermittent pattern); (iii) dropout before the end of the period of observation (monotone pattern). The estimation algorithm can also be employed when covariates are available that are assumed to affect the distribution of the latent process and, in particular, the initial and transition probabilities of the Markov chain. In this way, it is possible to identify latent, or unobserved, clusters of units with homogeneous behavior and to understand the influence of the covariates on the dynamic allocation of individuals between states over time. The approach is illustrated by a Monte Carlo simulation study involving different scenarios.
We also report an application based on the well-known primary biliary cholangitis dataset. These data are very sparse because of missing visits, and several dropouts occurred because of death. Continuous and binary covariates on the patients are also available, allowing us to investigate how individual characteristics are associated with dropout risk. The application is particularly challenging and confirms the ability of the proposed method to deal with different types of missingness and to identify risk groups of patients, which can be useful in making clinical decisions about therapy.
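
As a rough illustration of how the three missing patterns can enter a likelihood recursion under MAR, the following is a minimal sketch and not the authors' implementation. It assumes a covariance matrix shared by all latent states, no covariates, and missing values encoded as `None`; the function names (`cholesky`, `log_density_observed`, `log_likelihood`) and all numerical values are hypothetical. The idea is that a partially observed response vector contributes the marginal Gaussian density of its observed components, while a completely missing vector (intermittent or after dropout) contributes a factor of one.

```python
import math

def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = math.sqrt(s) if i == j else s / l[j][j]
    return l

def log_density_observed(y, mean, cov):
    """Log Gaussian density of the observed (non-None) components of y.

    Integrating out the missing components of a Gaussian amounts to dropping
    the matching entries of mean and rows/columns of cov.
    A completely missing vector contributes log(1) = 0.
    """
    obs = [j for j, v in enumerate(y) if v is not None]
    if not obs:
        return 0.0
    d = [y[j] - mean[j] for j in obs]
    l = cholesky([[cov[i][j] for j in obs] for i in obs])
    # Forward substitution: solve l z = d, so that z'z is the Mahalanobis term.
    z = []
    for i in range(len(obs)):
        z.append((d[i] - sum(l[i][k] * z[k] for k in range(i))) / l[i][i])
    log_det = 2.0 * sum(math.log(l[i][i]) for i in range(len(obs)))
    quad = sum(v * v for v in z)
    return -0.5 * (len(obs) * math.log(2.0 * math.pi) + log_det + quad)

def log_likelihood(ys, init, trans, means, cov):
    """Scaled forward recursion for one unit.

    ys    : list of response vectors over time (None marks a missing value)
    init  : initial state probabilities
    trans : transition matrix, trans[u][v] = P(state v at t | state u at t-1)
    means : one mean vector per latent state
    cov   : covariance matrix, assumed common to all states in this sketch
    Returns the log-likelihood and the state probabilities at the last occasion.
    """
    k = len(init)
    loglik, alpha = 0.0, None
    for t, y in enumerate(ys):
        if t == 0:
            prior = list(init)
        else:
            prior = [sum(alpha[u] * trans[u][v] for u in range(k)) for v in range(k)]
        joint = [prior[v] * math.exp(log_density_observed(y, means[v], cov))
                 for v in range(k)]
        norm = sum(joint)
        loglik += math.log(norm)
        alpha = [j / norm for j in joint]
    return loglik, alpha

# Hypothetical two-state example with a partially missing occasion,
# an intermittent missing occasion, and dropout after the third occasion.
init = [0.6, 0.4]
trans = [[0.9, 0.1], [0.2, 0.8]]
means = [[0.0, 0.0], [2.0, 2.0]]
cov = [[1.0, 0.3], [0.3, 1.0]]
ys = [[0.1, None], [None, None], [1.5, 2.0], [None, None], [None, None]]
loglik, last_probs = log_likelihood(ys, init, trans, means, cov)
print(loglik, last_probs)
```

In this sketch the occasions after dropout leave the log-likelihood unchanged, so truncating the sequence at the last observed occasion gives the same value; the state probabilities returned for those occasions are then pure predictions from the transition matrix.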

Pennoni, F., Bartolucci, F., Pandolfi, S. (2022). Maximum likelihood estimation of Hidden Markov models for continuous longitudinal data with missing responses and dropout. In Data Science Everywhere: Innovations in Statistical Computing (pp.22-23).

Maximum likelihood estimation of Hidden Markov models for continuous longitudinal data with missing responses and dropout

Pennoni, F.; Bartolucci, F.; Pandolfi, S.
2022

abstract + slide
Expectation-Maximization algorithm, Forward-backward recursion, Latent Markov model, Missing values, Prediction
English
11th Conference of the Asian Regional Section of the International Association for Statistical Computing (IASC-ARS)
2022
Yuichi Mori
Hiroshi Yadohisa; Tomokazu Fujino; Hidetoshi Murakami; Wataru Sakamoto; Fumitake Sakaori; Hirohito Sakurai; Yoshikazu Terada; Makoto Tomita; Kensuke Okada; Kosuke Okusa; Koji Yamamoto; Michio Yamamoto; Yoshiro Yamamoto; Yoshitomo Akimoto
Data Science Everywhere: Innovations in Statistical Computing
2022
22
23
https://iasc-ars2022.org/
open
Files in this item:
Abs_IASC_Kyoto_2022.pdf: open access; Publisher's Version (Version of Record, VoR); 2.55 MB; Adobe PDF
IASC_2022_Pennoni1.pdf: slides of the presentation; open access; Publisher's Version (Version of Record, VoR); 719.28 kB; Adobe PDF


Use this identifier to cite or link to this document: https://hdl.handle.net/10281/355347