Issues on the Estimation of Latent Variable and Latent Class Models with Social Science Applications

Pennoni, F

This Ph.D. work consists of different research problems which have in common the presence of latent variables. Chapters 1 and 2 provide an accessible primer on the models developed in the subsequent chapters. Chapters 3 and 4 are written in the form of articles. A list of references is provided at the end of each chapter, and a general bibliography is reported as the last part of the work. The first chapter introduces models of dependence and association and their interpretation using graphical models, which have proved useful for displaying in graphical form the essential relationships between variables. The structure of the graph yields direct information about various aspects of the statistical analysis. We first provide the necessary notation and background on graph theory. We then describe the Markov properties that associate a set of conditional independence assumptions with an undirected or a directed graph. Such definitions do not depend on any particular distributional form and hence can be applied to models with both discrete and continuous random variables. In particular, we consider models for Gaussian continuous variables, where the structure is assumed to be adequately described by a vector of means and a covariance matrix. The concentration and covariance graph models are illustrated. The specification of a complex multivariate distribution through the univariate regressions induced by a Directed Acyclic Graph (DAG) can be regarded as a simplification, as the single regression models typically involve considerably fewer variables than the whole multivariate vector. In the present work it is shown that such models are a subclass of the linear Structural Equation Models (SEM). The chapter concludes with some bibliographical notes. Chapter 2 considers the latent class model for measuring one or more latent categorical variables by means of a set of observed categorical variables.
After some notes on model identifiability and estimation, we consider the extension of the model to the study of latent changes over time when longitudinal data are available. The hidden Markov model is presented, consisting of hidden state variables and observed variables, both varying over time. In Chapter 3 we consider in detail the DAG Gaussian models in which one of the variables is not observed. Once the condition for global identification has been satisfied, we show how the incomplete log-likelihood of the observed data can be maximized using the EM algorithm. As the EM algorithm does not provide the matrix of second derivatives, we propose a method for obtaining an explicit formula for the observed information matrix using the missing information principle. We illustrate the models with two examples on real data concerning educational attainment and criminological research. The first appendix of the chapter reports details on the calculation of the quantities necessary for the E-step of the EM algorithm. The second appendix reports the R code for obtaining the estimated standard errors, which may be implemented in the R package ggm. Chapter 4 starts from the practical problem of classifying criminal activity. The latent class cluster model is extended by proposing a latent class model that also incorporates the longitudinal structure of the data, using a method similar to a local likelihood approach. The data set is taken from the Home Office Offenders Index of England and Wales. It contains the complete criminal histories of a sample of those born in 1953 and followed for forty years.

Pennoni, F (2004). Issues on the Estimation of Latent Variable and Latent Class Models with Social Science Applications. (Doctoral thesis, Università degli Studi di Milano-Bicocca, 2004).