Missing Data Techniques
Recently Published Documents


TOTAL DOCUMENTS: 53 (FIVE YEARS: 3)

H-INDEX: 14 (FIVE YEARS: 0)

2021 ◽  
Author(s):  
Xijuan Zhang

Missing data are common in psychological and educational research. With the improvement in computing technology in recent decades, more researchers have begun developing missing data techniques. In their research, they often conduct Monte Carlo simulation studies to compare the performance of different missing data techniques. During such simulation studies, researchers must generate missing data in the simulated dataset by deciding which data values to delete. However, the current literature offers few guidelines on how to generate missing data for simulation studies. Our paper is one of the first to examine ways of generating missing data for simulation studies. We emphasize the importance of specifying missing data rules, which are statistical models for generating missing data. We begin the paper by reviewing the types of missing data mechanisms and missing data patterns. We then explain how to specify missing data rules to generate missing data with different mechanisms and patterns. We end the paper by presenting recommendations for generating missing data for simulation studies.
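The abstract describes missing data rules only in prose. As a minimal illustrative sketch (not the authors' actual procedure), a missing data rule for a MAR mechanism can be written as a logistic model in which the probability of deleting y depends only on the fully observed x; the `intercept` and `slope` parameters here are hypothetical knobs controlling the overall missing rate and the strength of the mechanism:

```python
import math
import random

random.seed(42)

# Simulated complete dataset: x is fully observed, y will be made missing.
n = 1000
data = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n)]

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def impose_mar(rows, intercept=-1.0, slope=1.5):
    """Missing data rule for a MAR mechanism: the probability that y is
    deleted depends only on the observed x. With slope=0 this reduces to
    MCAR (the same constant missing probability for every case)."""
    out = []
    for x, y in rows:
        p_missing = logistic(intercept + slope * x)
        out.append((x, None if random.random() < p_missing else y))
    return out

mar_data = impose_mar(data)
missing_rate = sum(1 for _, y in mar_data if y is None) / n
```

A MNAR rule would instead let `p_missing` depend on the to-be-deleted y itself; the same logistic form works, which is why specifying the rule explicitly matters when reporting a simulation design.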


2021 ◽  
Vol 1 (1) ◽  
Author(s):  
Danielle M. Rodgers ◽  
Ross Jacobucci ◽  
Kevin J. Grimm

Decision trees (DTs) are a machine learning technique that searches the predictor space for the variable and observed value that lead to the best prediction when the data are split into two nodes based on that variable and splitting value. The algorithm repeats its search within each partition of the data until a stopping rule ends the search. Missing data can be problematic in DTs because an observation with a missing value on the chosen splitting variable cannot be placed into a node. Moreover, missing data can alter the variable selection process for the same reason. Simple missing data approaches (e.g., listwise deletion, majority rule, and surrogate splits) have been implemented in DT algorithms; however, more sophisticated missing data techniques have not been thoroughly examined. We propose a modified multiple imputation approach to handling missing data in DTs and compare this approach with the simple approaches, as well as with single imputation and multiple imputation with prediction averaging, via Monte Carlo simulation. The study evaluated the performance of each missing data approach when data were missing at random (MAR) or missing completely at random (MCAR). The proposed multiple imputation approach and surrogate splits had superior performance, with the proposed multiple imputation approach performing best in the more severe missing data conditions. We conclude with recommendations for handling missing data in DTs.
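One of the comparison conditions named above, multiple imputation with prediction averaging, can be sketched in a few lines. This is a hedged toy illustration, not the authors' implementation: the "tree" is a one-split regression stump and the imputation is a crude hot-deck draw from the observed values, both chosen purely for brevity. The idea it demonstrates is real, though: fit one tree per imputed dataset, then average their predictions.

```python
import random
from statistics import mean

random.seed(0)

# Toy data: y jumps at x = 0; about 30% of x values are missing (MCAR here).
n = 200
rows = []
for _ in range(n):
    x = random.gauss(0, 1)
    y = 2.0 * (x > 0) + random.gauss(0, 0.3)
    rows.append((None if random.random() < 0.3 else x, y))

observed_x = [x for x, _ in rows if x is not None]

def fit_stump(data):
    """One-split regression stump: pick the threshold that minimizes SSE."""
    best = None
    for t in sorted({x for x, _ in data}):
        left = [y for x, y in data if x <= t]
        right = [y for x, y in data if x > t]
        if not left or not right:
            continue
        ml, mr = mean(left), mean(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    _, t, ml, mr = best
    return lambda x: ml if x <= t else mr

def impute_once(data):
    """Crude stochastic imputation: draw a missing x from the observed values."""
    return [(random.choice(observed_x) if x is None else x, y) for x, y in data]

# Multiple imputation with prediction averaging: one stump per imputed
# dataset, predictions averaged across the M fitted stumps.
M = 5
stumps = [fit_stump(impute_once(rows)) for _ in range(M)]

def predict(x):
    return mean(s(x) for s in stumps)
```

The modified approach the paper proposes differs from this averaging scheme, but the sketch shows why imputation interacts with tree fitting: each imputed dataset can select a different split, and averaging smooths over that split variability.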


2020 ◽  
Vol 58 (11) ◽  
pp. 2863-2878
Author(s):  
Ali Idri ◽  
Ilham Kadi ◽  
Ibtissam Abnane ◽  
José Luis Fernandez-Aleman

Author(s):  
Hettie A. Richardson ◽  
Marcia J. Simmering

Nonresponse and the missing data that it produces are ubiquitous in survey research, but they are also present in archival and other forms of research. Nonresponse and missing data can be especially problematic in organizational contexts, where the risks of providing personal or organizational data might be perceived as (or actually be) greater than in public opinion contexts. Moreover, nonresponse and missing data are presenting new challenges with the advent of online and mobile survey technology. When observational units (e.g., individuals, teams, organizations) do not provide some or all of the information sought by a researcher and the reasons for nonresponse are systematically related to the survey topic, nonresponse bias can result and the research community may draw faulty conclusions. Due to concerns about nonresponse bias, scholars have spent several decades seeking to understand why participants choose not to respond to certain items and entire surveys, and how best to avoid nonresponse through actions such as improved study design, the use of incentives, and follow-up initiatives. At the same time, researchers recognize that it is virtually impossible to avoid nonresponse and missing data altogether, and as such, in any given study there will likely be a need to diagnose patterns of missingness and their potential for bias. There will likewise be a need to deal with missing data statistically by employing post hoc mechanisms that maximize the sample available for hypothesis testing and minimize the extent to which missing data obscure the underlying true characteristics of the dataset. In this connection, a large body of programmatic research supports maximum likelihood (ML) and multiple imputation (MI) as useful data replacement procedures, although in some situations it might be reasonable to use simpler procedures instead. Despite strong support for these statistical techniques, organizational scholars have yet to embrace them. Instead, they tend to rely on approaches such as listwise deletion that do not preserve underlying data characteristics, reduce the sample available for statistical analysis, and in some cases actually exacerbate the potential problems associated with missing data. Although there are certainly remaining questions that can be addressed about missing data techniques, these techniques are also well understood and validated. There remains, however, a strong need for exploration into the nature, causes, and extent of nonresponse in various organizational contexts, such as when using online and mobile surveys. Such research could play a useful role in helping researchers avoid nonresponse in organizational settings, as well as extend insight into how and when best to apply validated missing data techniques.
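The contrast this abstract draws between listwise deletion and model-based replacement can be made concrete with a small simulation. This is an illustrative sketch under assumed parameters, not anything from the article itself: y is more likely to be missing when x is large (a MAR pattern), so listwise deletion under-represents high-y cases and biases the mean, while even a simple single regression imputation using the fully observed x largely removes that bias (ML and MI, which the abstract recommends, are more principled versions of the same idea).

```python
import random
from statistics import mean

random.seed(1)

# MAR scenario: y tracks x, and y is far more likely to be missing
# when x > 0, so complete cases over-represent low-x (and low-y) units.
n = 5000
pairs = []
for _ in range(n):
    x = random.gauss(0, 1)
    y = x + random.gauss(0, 0.5)          # true mean of y is 0
    missing = random.random() < (0.8 if x > 0 else 0.1)
    pairs.append((x, None if missing else y))

complete = [(x, y) for x, y in pairs if y is not None]

# Listwise deletion: estimate the mean of y from complete cases only.
listwise_mean = mean(y for _, y in complete)

# Single regression imputation: fit y ~ a + b*x on complete cases
# (valid here because missingness depends only on the observed x),
# then fill in predictions for the missing y values.
mx = mean(x for x, _ in complete)
my = mean(y for _, y in complete)
b = (sum((x - mx) * (y - my) for x, y in complete)
     / sum((x - mx) ** 2 for x, _ in complete))
a = my - b * mx
imputed_mean = mean(y if y is not None else a + b * x for x, y in pairs)
```

Listwise deletion lands well below the true mean of 0, while the imputation-based estimate sits close to it; MI would additionally propagate the imputation uncertainty into standard errors, which this single-imputation sketch does not.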


2019 ◽  
Vol 55 (1) ◽  
pp. 87-101 ◽  
Author(s):  
Po-Yi Chen ◽  
Wei Wu ◽  
Mauricio Garnier-Villarreal ◽  
Benjamin Arthur Kite ◽  
Fan Jia
