We propose a novel variable and model selection method for the analysis of multiple time series and panel data, based on a hidden Markov (HM) model for multivariate continuous responses. Inference accounts for missing data under the missing-at-random assumption and relies on maximum likelihood estimation of the model parameters through a modified Expectation-Maximization (EM) algorithm. We develop a greedy forward-backward algorithm based on the Bayesian Information Criterion (BIC), seen as an approximation of the Bayes factor. The algorithm reduces the complete set of response variables to a smaller subset, thereby selecting the variables most useful for clustering. The BIC is also used to choose the number of latent states at each step of the greedy search. Applying the selection method requires estimating multivariate linear regression models. When the independent variables contain missing values, we adopt a form of multiple imputation based on the posterior expected values obtained at convergence of the EM algorithm for the estimated HM model. To illustrate the proposal, we use a collection of macroeconomic indicators provided by the World Bank for 217 countries followed over a long period of time. The selected HM model allows us to dynamically characterize countries’ transitions between hidden states representing different levels of development.
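The greedy forward-backward search described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy scoring rule, and the indicator names are hypothetical, and the `score` callable merely stands in for the BIC-based comparison that, in the actual method, would involve fitting HM models (and regression models for the excluded variables). The sketch only shows the alternation of inclusion and exclusion steps under a "lower is better" criterion.

```python
import math
from typing import Callable, FrozenSet, Sequence, Tuple


def bic(loglik: float, n_params: int, n_obs: int) -> float:
    """Standard BIC: -2 * log-likelihood + (number of parameters) * log(sample size)."""
    return -2.0 * loglik + n_params * math.log(n_obs)


def greedy_forward_backward(
    candidates: Sequence[str],
    score: Callable[[FrozenSet[str]], float],
) -> Tuple[FrozenSet[str], float]:
    """Alternate inclusion and exclusion steps until no move lowers the score.

    `score` is an assumed user-supplied function returning a BIC-type value
    (lower is better) for a subset of variables.
    """
    selected: FrozenSet[str] = frozenset()
    best = score(selected)
    improved = True
    while improved:
        improved = False
        # Inclusion step: try adding each variable not yet selected.
        remaining = [v for v in candidates if v not in selected]
        if remaining:
            trials = [(selected | {v}, score(selected | {v})) for v in remaining]
            trial, value = min(trials, key=lambda t: t[1])
            if value < best:
                selected, best, improved = trial, value, True
        # Exclusion step: try dropping each selected variable (keep at least one).
        if len(selected) > 1:
            trials = [(selected - {v}, score(selected - {v})) for v in selected]
            trial, value = min(trials, key=lambda t: t[1])
            if value < best:
                selected, best, improved = trial, value, True
    return selected, best


# Toy, hypothetical scoring rule: two "informative" indicators lower the
# score, every other indicator raises it. A real score would come from
# fitted models, e.g. via bic(loglik, n_params, n_obs).
INFORMATIVE = frozenset({"gdp", "life_exp"})


def toy_score(subset: FrozenSet[str]) -> float:
    return 100 - 10 * len(subset & INFORMATIVE) + 3 * len(subset - INFORMATIVE)


selected, best = greedy_forward_backward(
    ["gdp", "inflation", "life_exp", "trade"], toy_score
)
print(sorted(selected), best)
```

Because every accepted move strictly lowers the score and the number of subsets is finite, the loop terminates. In practice the expensive part is the score itself, since each evaluation would require fitting a model, possibly for several numbers of latent states.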

Pennoni, F., Bartolucci, F., Pandolfi, S. (2021). Variable selection in hidden Markov models with missing data. In Book of Abstracts. CFE-CMStatistics 2021. 15th International Conference on Computational and Financial Econometrics. 14th International Conference of the ERCIM Working Group on Computational and Methodological Statistics.

Variable selection in hidden Markov models with missing data

Pennoni, F.
2021

abstract
Expectation-Maximization algorithm; Countries development process; Greedy forward-backward procedure; Model-based clustering; Missing at random assumption
English
14th International Conference of the ERCIM WG on Computational and Methodological Statistics and 15th International Conference on Computational and Financial Econometrics
2021
Book of Abstracts. CFE-CMStatistics 2021. 15th International Conference on Computational and Financial Econometrics. 14th International Conference of the ERCIM Working Group on Computational and Methodological Statistics
978-9925-7812-5-6
2021
http://www.cmstatistics.org/CMStatistics2021/docs/BoACFECMStatistics2021.pdf?20211206015943
none


Use this identifier to cite or link to this document: https://hdl.handle.net/10281/346371