Missing data imputation in multivariate t distribution with unknown degrees of freedom using expectation maximization algorithm and its stochastic variants

Paul Kimani Kinyanjui; Cox Lwaka Tamba; Luke Akong’o Orawo; Justin Obwoge Okenye

doi:10.3233/mas-200493

Model Assisted Statistics and Applications ◽

10.3233/mas-200493 ◽

2020 ◽

Vol 15 (3) ◽

pp. 263-272

Author(s):

Paul Kimani Kinyanjui ◽

Cox Lwaka Tamba ◽

Luke Akong’o Orawo ◽

Justin Obwoge Okenye

Keyword(s):

Parameter Estimation ◽

Missing Data ◽

Expectation Maximization ◽

Degrees Of Freedom ◽

Data Imputation ◽

Missing Data Imputation ◽

Multivariate T Distribution ◽

Monte Carlo Em ◽

T Distribution ◽

Multivariate T

Many researchers encounter the missing data problem. Missing values may arise from data omission, non-response, death of respondents or recording errors, among other causes. It is therefore important to find an appropriate imputation technique to fill in the missing positions. In this study, the Expectation Maximization (EM) algorithm and two of its stochastic variants, stochastic EM (SEM) and Monte Carlo EM (MCEM), are used for missing data imputation and parameter estimation in the multivariate t distribution with unknown degrees of freedom. The imputation efficiencies of the three methods are compared using the mean square error (MSE) criterion. SEM yields the lowest MSE, making it the most efficient of the three methods for imputation when the data follow a multivariate t distribution. Its stochastic nature helps the algorithm escape saddle points and local maxima and move toward the global maximum, which increases its efficiency. EM and MCEM yield nearly identical results: with large Monte Carlo samples in its E-step, MCEM gives essentially the same results as deterministic EM. In parameter estimation, the EM and MCEM estimates are relatively close to the maximum likelihood (ML) estimates for the simulated data. This is not the case for SEM, owing to the random nature of the algorithm.
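
The abstract names the algorithms but not their updates. The sketch below is a minimal EM for a bivariate t distribution under several assumptions:

- x is always observed and only y can be missing.
- The degrees of freedom ν are held fixed. The study also estimates ν; this sketch does not.

The EM E-step uses the standard t weights w = (ν + p)/(ν + δ) together with conditional means. Here δ is the Mahalanobis distance of the observed part and p is its dimension.

Passing `stochastic=True` swaps in an SEM-style E-step. Instead of taking expectations, it draws τ and the missing y from their conditional distributions. In that mode the returned imputations are the last draws, and the parameters come from the last iteration rather than an average over the chain.

MCEM, which averages many such draws within each E-step, is not shown. The names `t_em_bivariate` and `mse` are hypothetical and are not the authors' code.

```python
import math
import random


def t_em_bivariate(rows, nu=4.0, iters=200, stochastic=False, seed=0):
    """EM (or SEM when stochastic=True) for a bivariate t with some y missing.

    rows: list of (x, y) pairs; x is always observed, y is None when missing.
    nu:   degrees of freedom, held FIXED here (an assumption of this sketch).
    Returns ((mu_x, mu_y), (s_xx, s_xy, s_yy), {row index: imputed y}).
    """
    rng = random.Random(seed)
    n = len(rows)
    cc = [(x, y) for x, y in rows if y is not None]  # complete cases
    # Start from complete-case moments.
    mx = sum(x for x, _ in cc) / len(cc)
    my = sum(y for _, y in cc) / len(cc)
    sxx = sum((x - mx) ** 2 for x, _ in cc) / len(cc)
    syy = sum((y - my) ** 2 for _, y in cc) / len(cc)
    sxy = sum((x - mx) * (y - my) for x, y in cc) / len(cc)
    imputed = {}
    for _ in range(iters):
        det = sxx * syy - sxy ** 2
        stats = []  # (weight, x, completed y, conditional variance term)
        for i, (x, y) in enumerate(rows):
            dx = x - mx
            if y is None:
                p, delta = 1, dx * dx / sxx              # Mahalanobis distance of x alone
                y_hat = my + sxy / sxx * dx              # E[y | x]
                c_yy = max(syy - sxy ** 2 / sxx, 0.0)    # Var[y | x, tau] = c_yy / tau
            else:
                dy = y - my
                p = 2
                delta = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
                y_hat, c_yy = y, 0.0
            if stochastic:
                # SEM E-step: draw tau | x_obs ~ Gamma((nu+p)/2, rate=(nu+delta)/2),
                # then the missing y | x, tau ~ N(y_hat, c_yy / tau).
                w = rng.gammavariate((nu + p) / 2, 2 / (nu + delta))
                if y is None:
                    y_hat = rng.gauss(y_hat, math.sqrt(c_yy / w))
                    imputed[i] = y_hat
                c_yy = 0.0  # the draw replaces the expectation, so no variance term
            else:
                w = (nu + p) / (nu + delta)              # E[tau | x_obs]
            stats.append((w, x, y_hat, c_yy))
        # M-step: weighted means; scatter matrix divided by n.
        sw = sum(w for w, _, _, _ in stats)
        mx = sum(w * x for w, x, _, _ in stats) / sw
        my = sum(w * yh for w, _, yh, _ in stats) / sw
        sxx = sum(w * (x - mx) ** 2 for w, x, _, _ in stats) / n
        sxy = sum(w * (x - mx) * (yh - my) for w, x, yh, _ in stats) / n
        syy = sum(w * (yh - my) ** 2 + c for w, _, yh, c in stats) / n
    if not stochastic:
        # EM imputation: conditional mean of y under the final estimates.
        imputed = {i: my + sxy / sxx * (x - mx)
                   for i, (x, y) in enumerate(rows) if y is None}
    return (mx, my), (sxx, sxy, syy), imputed


def mse(truth, imputed):
    """Mean squared error over the imputed positions only."""
    return sum((v - truth[i]) ** 2 for i, v in imputed.items()) / len(imputed)
```

To mirror the MSE comparison in the abstract on simulated data:

1. Hide some known y values.
2. Run both modes.
3. Compare `mse(truth, imputed)` across the two modes.

This illustrates the procedure only; it does not reproduce the study's results.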

Missing Data Imputation Using the Multivariate t Distribution

Journal of Multivariate Analysis ◽

10.1006/jmva.1995.1029 ◽

1995 ◽

Vol 53 (1) ◽

pp. 139-158 ◽

Cited By ~ 31

Author(s):

C. Liu

Keyword(s):

Missing Data ◽

Data Imputation ◽

Missing Data Imputation ◽

Multivariate T Distribution ◽

T Distribution ◽

Multivariate T

Salvaging Data Records with Missing Data: Data Imputation using the Multivariate t Distribution

2021 IEEE Aerospace Conference (50100) ◽

10.1109/aero50100.2021.9438137 ◽

2021 ◽

Author(s):

Melissa Hooke ◽

Joseph Mrozinski ◽

Michael DiNicola

Keyword(s):

Missing Data ◽

Data Imputation ◽

Multivariate T Distribution ◽

T Distribution ◽

Multivariate T

Calibration of Spatiotemporal Missing Data Imputation Algorithm in Distributed Space-Time Expectation-Maximization with Application in Recovering of Air Pollution Missing Data in Multi-Site Monitoring Network

Environmental Epidemiology ◽

10.1097/01.ee9.0000605696.03825.4b ◽

2019 ◽

Vol 3 ◽

pp. 9-10

Author(s):

Amini H ◽

Taghavi-shahri S ◽

Fassò A ◽

Mahaki B

Keyword(s):

Air Pollution ◽

Missing Data ◽

Expectation Maximization ◽

Monitoring Network ◽

Space Time ◽

Data Imputation ◽

Missing Data Imputation ◽

Site Monitoring

Concurrent spatiotemporal daily land use regression modeling and missing data imputation of fine particulate matter using distributed space-time expectation maximization

Atmospheric Environment ◽

10.1016/j.atmosenv.2019.117202 ◽

2020 ◽

Vol 224 ◽

pp. 117202 ◽

Cited By ~ 4

Author(s):

Seyed Mahmood Taghavi-Shahri ◽

Alessandro Fassò ◽

Behzad Mahaki ◽

Heresh Amini

Keyword(s):

Land Use ◽

Particulate Matter ◽

Missing Data ◽

Expectation Maximization ◽

Fine Particulate Matter ◽

Regression Modeling ◽

Data Imputation ◽

Land Use Regression ◽

Missing Data Imputation ◽

Fine Particulate

Concurrent Spatiotemporal Daily Land Use Regression Modeling and Missing Data Imputation of Fine Particulate Matter Using Distributed Space Time Expectation Maximization

10.1101/354852 ◽

2018 ◽

Cited By ~ 1

Author(s):

Seyed Mahmood Taghavi-Shahri ◽

Alessandro Fassò ◽

Behzad Mahaki ◽

Heresh Amini

Keyword(s):

Land Use ◽

Particulate Matter ◽

Missing Data ◽

Exposure Assessment ◽

Expectation Maximization ◽

Space Time ◽

Data Imputation ◽

Land Use Regression ◽

Missing Data Imputation

Land use regression (LUR) has been widely applied in epidemiologic research for exposure assessment. In this study, for the first time, we aimed to develop a spatiotemporal LUR model using Distributed Space Time Expectation Maximization (D-STEM). The spatiotemporal LUR model was developed for daily particulate matter ≤ 2.5 μm (PM2.5) within the megacity of Tehran, the capital of Iran. Moreover, D-STEM missing data imputation was compared with mean substitution at each monitoring station. Mean substitution is equivalent to ignoring the missing data, which is common in LUR studies that use data from regulatory monitoring stations. In Tehran in 2015, missing data made up 28% of the total number of observations. The annual mean PM2.5 concentration was 33 μg/m³. The spatiotemporal R-squared of the final D-STEM daily LUR model was 78%, and the leave-one-out cross-validation (LOOCV) R-squared was 66%. The spatial R-squared and LOOCV R-squared were 89% and 72%, respectively. The temporal R-squared and LOOCV R-squared were 99.5% and 99.3%, respectively. Imputing missing data with the final D-STEM LUR model instead of mean substitution decreased the mean absolute error by 26%. This study demonstrates the capability of the D-STEM software for spatiotemporal missing data imputation, estimation of temporal trends, and small-scale (20 × 20 m) mapping of within-city spatial variation in the LUR context. The estimated PM2.5 concentration maps could be used in future studies of short- and/or long-term health effects. Overall, we suggest using D-STEM in the growing number of LUR studies that rely on data from regulatory monitoring stations.

Highlights:
- First land use regression using D-STEM, a recently introduced statistical software
- Assess D-STEM in spatiotemporal modeling, mapping, and missing data imputation
- Estimate high-resolution (20 × 20 m) daily maps for exposure assessment in a megacity
- Provide both short- and long-term exposure assessment for epidemiological studies
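
The baseline in this abstract (mean substitution at each monitoring station) and the MAE-based comparison can be sketched in a few lines. This is a toy illustration with hypothetical function names. It does not implement D-STEM and does not reproduce the reported figures.

```python
def mean_substitute(series):
    """Fill each missing value (None) in one station's series with that station's observed mean."""
    observed = [v for v in series if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in series]


def mae(truth, filled, missing_idx):
    """Mean absolute error, evaluated only at the positions that were missing."""
    return sum(abs(filled[i] - truth[i]) for i in missing_idx) / len(missing_idx)


def percent_reduction(baseline_err, model_err):
    """Relative decrease (in percent) of a model's error versus the baseline's error."""
    return 100.0 * (baseline_err - model_err) / baseline_err
```

The reported 26% decrease corresponds to `percent_reduction(mae_mean_substitution, mae_dstem)` returning 26. Both errors are computed at the same held-out positions.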

Some Extensions of the Multivariate t-Distribution and the Multivariate Generalization of the Distribution of the Regression Coefficient

Mathematical Proceedings of the Cambridge Philosophical Society ◽

10.1017/s0305004100034885 ◽

1961 ◽

Vol 57 (1) ◽

pp. 80-85 ◽

Cited By ~ 31

Author(s):

A. M. Kshirsagar

Keyword(s):

Normal Distribution ◽

Degrees Of Freedom ◽

Regression Coefficient ◽

Bivariate Normal Distribution ◽

Practical Applications ◽

Multivariate T Distribution ◽

The Matrix ◽

Pre Treatment ◽

T Distribution ◽

Multivariate T

Suppose the components x1, x2, …, xk of a vector X have a non-singular multivariate normal distribution with a null vector of means and variance-covariance matrix Σ = σ²R. The matrix R = [ρij] (with ρii = 1) is known in certain cases, but σ² is unknown. If s² is an estimate of σ² based on f degrees of freedom and is distributed independently of X, then the distribution of the vector t = X/s is known as the multivariate t-distribution. This distribution was first obtained by Dunnett and Sobel (6) and independently by Cornish (3). Dunnett, Sobel and Bechhofer (2) have discussed some practical applications of this distribution. Cornish (3) obtained it while considering the pre-treatment to be given to certain types of replicated experiments. The distribution possesses some useful properties that make it suitable as a basis for exact tests of significance in various problems. Dunnett and Sobel (6), by providing tables of the probability integral, took the first step towards its use in practice. Cornish, in a later paper (4), considered the sampling distribution of statistics derived from the multivariate t-distribution. Using this, he obtained the well-known ((7), (8)) distribution of the sample regression coefficient of one variate with respect to another when both have a bivariate normal distribution.
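
The construction in this abstract can be simulated directly. Below is a minimal sketch for the bivariate case, assuming k = 2 and R = [[1, ρ], [ρ, 1]]. It works in three steps:

1. Draw X from N(0, σ²R) using the Cholesky factor of R.
2. Draw s² independently as σ²χ²_f/f.
3. Return t = X/s.

The helper name `multivariate_t_sample` is hypothetical.

```python
import math
import random


def multivariate_t_sample(rho, f, sigma=1.0, rng=None):
    """Draw one bivariate t vector t = X / s (hypothetical helper).

    X ~ N(0, sigma^2 R) with R = [[1, rho], [rho, 1]]; s^2 = sigma^2 * chi2_f / f
    is drawn independently of X, i.e. an estimate of sigma^2 on f degrees of freedom.
    """
    if rng is None:
        rng = random.Random()
    z1 = rng.gauss(0.0, 1.0)
    z2 = rng.gauss(0.0, 1.0)
    # The Cholesky factor of R turns independent N(0, 1) draws into correlated ones.
    x1 = sigma * z1
    x2 = sigma * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
    chi2 = rng.gammavariate(f / 2.0, 2.0)  # chi-square with f d.f. = Gamma(f/2, scale 2)
    s = sigma * math.sqrt(chi2 / f)
    return x1 / s, x2 / s
```

Because the same σ scales both X and s, it cancels in t = X/s; this is why the distribution does not depend on the unknown σ². Each component is Student's t with f degrees of freedom, with variance f/(f − 2) for f > 2. The correlation between components equals ρ.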

Spike classification with multivariate t-distribution mixture model via improved Expectation-Maximization algorithm

2010 Sixth International Conference on Natural Computation ◽

10.1109/icnc.2010.5582856 ◽

2010 ◽

Author(s):

Haibing Yin ◽

Yadong Liu ◽

Dewen Hu

Keyword(s):

Mixture Model ◽

Expectation Maximization ◽

Expectation Maximization Algorithm ◽

Multivariate T Distribution ◽

Distribution Mixture ◽

T Distribution ◽

Multivariate T

Objective priors for the number of degrees of freedom of a multivariate t distribution and the t-copula

Computational Statistics & Data Analysis ◽

10.1016/j.csda.2018.03.010 ◽

2018 ◽

Vol 124 ◽

pp. 197-219 ◽

Cited By ~ 5

Author(s):

Cristiano Villa ◽

Francisco J. Rubio

Keyword(s):

Degrees Of Freedom ◽

Multivariate T Distribution ◽

T Copula ◽

T Distribution ◽

Multivariate T

A Novel Parameter Estimation Algorithm for the Multivariate t-Distribution and Its Application to Computer Vision

Computer Vision – ECCV 2010 - Lecture Notes in Computer Science ◽

10.1007/978-3-642-15552-9_43 ◽

2010 ◽

pp. 594-607 ◽

Cited By ~ 6

Author(s):

Chad Aeschliman ◽

Johnny Park ◽

Avinash C. Kak

Keyword(s):

Computer Vision ◽

Parameter Estimation ◽

Estimation Algorithm ◽

Multivariate T Distribution ◽

Parameter Estimation Algorithm ◽

T Distribution ◽

Multivariate T

Missing data imputation via the expectation-maximization algorithm can improve principal component analysis aimed at deriving biomarker profiles and dietary patterns

Nutrition Research ◽

10.1016/j.nutres.2020.01.001 ◽

2020 ◽

Vol 75 ◽

pp. 67-76 ◽

Cited By ~ 1

Author(s):

Linda Malan ◽

Cornelius M. Smuts ◽

Jeannine Baumgartner ◽

Cristian Ricci

Keyword(s):

Principal Component Analysis ◽

Missing Data ◽

Dietary Patterns ◽

Expectation Maximization ◽

Expectation Maximization Algorithm ◽

Principal Component ◽

Component Analysis ◽

Data Imputation ◽

Missing Data Imputation
