Airport Pavement Missing Data Management and Imputation with Stochastic Multiple Imputation Model

Author(s):  
J. Farhan ◽  
T. F. Fwa
2020 ◽  
Author(s):  
Anna-Carolina Haensch ◽  
Bernd Weiß

Many phenomena in the social or the medical sciences can be described as events, meaning that a qualitative change occurs at some particular point in time. Typical research questions focus on whether, when, and under which circumstances events occur. In the social sciences, discrete-time-to-event models are popular (Discrete-Time Survival Analysis Model, DTSAM). Data analyzed through DTSAMs are in the so-called person-period format. The model is a logistic regression model with the event indicator as the dependent variable. However, like many other statistical applications, the practical analysis of discrete-time survival data is challenged by missing data in one or more covariates. Negative consequences of such missing data range from efficiency losses to bias. A popular approach to circumvent these unwanted effects of missing data is multiple imputation (MI). With multiple imputation, it is crucial to include outcome information in the model for imputing partially observed covariates. Unfortunately, this is not straightforward in the case of DTSAMs, since we (a) usually have a partly observed (left- or right-censored) outcome, (b) do not have only one outcome variable, but two: the event indicator and the time-to-event, and (c) have to decide whether to impute while the data set is still in person format or after transformation into person-period format, especially if we look at time-invariant information. Since there is little guidance on how to incorporate the observed outcome information in the imputation model of missing covariates in discrete-time survival analysis, we explore different approaches using fully conditional specification (FCS) (van Buuren 2006) and the newer substantive model compatible (SMC-) FCS MI (Bartlett et al., 2014). These approaches vary in the complexity with which we incorporate the outcome into the imputation model, the FCS algorithm used, and the data format used during the imputation. We compare the methods using Monte Carlo simulations and provide a practical example using data from the German Family Panel pairfam. We confirm the results of White and Royston (2009) and Beesley et al. (2016) that imputing conditional on the (partly imputed) uncensored time-to-event yields high bias. A compatible imputation model for SMC-FCS MI with data in person-period format proves to be the key to imputations with good performance results under different simulation conditions.
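
The person versus person-period distinction in point (c) can be made concrete with a small sketch. It is illustrative only and not taken from the paper: the field names `id`, `time`, `event`, and `x` are hypothetical, and `x` stands for a time-invariant covariate. It shows that a single missing covariate value in person format becomes several missing cells once the data are expanded.

```python
from typing import Dict, List


def to_person_period(persons: List[Dict]) -> List[Dict]:
    """Expand one row per person into one row per person and period at risk."""
    rows = []
    for p in persons:
        for t in range(1, p["time"] + 1):
            rows.append({
                "id": p["id"],
                "period": t,
                # The event indicator is 1 only in the last observed period
                # of a person whose event was observed (not censored).
                "event": int(p["event"] == 1 and t == p["time"]),
                # A time-invariant covariate is copied to every period row.
                "x": p["x"],
            })
    return rows


# Hypothetical toy data: person 2 is right-censored and has a missing covariate.
persons = [
    {"id": 1, "time": 3, "event": 1, "x": 0.5},
    {"id": 2, "time": 2, "event": 0, "x": None},
]
person_period = to_person_period(persons)
```

In this toy example the one missing value for person 2 appears in two person-period rows, which is why imputing after the transformation needs care to keep a time-invariant covariate constant within a person.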


2021 ◽  
Vol 19 (1) ◽  
Author(s):  
Albee Ling ◽  
Maria Montez-Rath ◽  
Maya Mathur ◽  
Kris Kapphahn ◽  
Manisha Desai

Propensity score matching (PSM) has been widely used to mitigate confounding in observational studies, although complications arise when the covariates used to estimate the PS are only partially observed. Multiple imputation (MI) is a potential solution for handling missing covariates in the estimation of the PS. However, it is not clear how to best apply MI strategies in the context of PSM. We conducted a simulation study to compare the performances of popular non-MI missing data methods and various MI-based strategies under different missing data mechanisms. We found that commonly applied missing data methods resulted in biased and inefficient estimates, and we observed large variation in performance across MI-based strategies. Based on our findings, we recommend 1) estimating the PS after applying MI to impute missing confounders; 2) conducting PSM within each imputed dataset followed by averaging the treatment effects to arrive at one summarized finding; 3) a bootstrapped-based variance to account for uncertainty of PS estimation, matching, and imputation; and 4) inclusion of key auxiliary variables in the imputation model.
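
Recommendation 2 (estimate the treatment effect within each imputed dataset, then average) can be sketched as follows. The averaging step matches the abstract. The variance shown is the conventional Rubin's rules total variance, included for orientation only; it is not the bootstrap-based variance the authors recommend. All numbers are hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool per-imputation estimates with Rubin's rules.

    estimates: treatment effect estimated in each imputed dataset
    variances: squared standard error of each of those estimates
    """
    m = len(estimates)
    pooled = mean(estimates)        # averaged treatment effect
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical effects and variances from m = 3 imputed, matched datasets.
estimates = [1.0, 2.0, 3.0]
variances = [0.5, 0.5, 0.5]
pooled, total_var = pool_rubin(estimates, variances)
```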


2021 ◽  
Vol 2021 ◽  
pp. 1-14
Author(s):  
Sara Javadi ◽  
Abbas Bahrampour ◽  
Mohammad Mehdi Saber ◽  
Behshid Garrusi ◽  
Mohammad Reza Baneshi

Multiple imputation by chained equations (MICE) is the most common method for imputing missing data. In the MICE algorithm, imputation can be performed using a variety of parametric and nonparametric methods. The default setting in the implementation of MICE is for imputation models to include variables as linear terms only with no interactions, but omission of interaction terms may lead to biased results. It is investigated, using simulated and real datasets, whether recursive partitioning creates appropriate variability between imputations and unbiased parameter estimates with appropriate confidence intervals. We compared four multiple imputation (MI) methods on a real and a simulated dataset. MI methods included using predictive mean matching with an interaction term in the imputation model in MICE (MICE-interaction), classification and regression tree (CART) for specifying the imputation model in MICE (MICE-CART), the implementation of random forest (RF) in MICE (MICE-RF), and MICE-Stratified method. We first selected secondary data and devised an experimental design that consisted of 40 scenarios (2 × 5 × 4), which differed by the rate of simulated missing data (10%, 20%, 30%, 40%, and 50%), the missing mechanism (MAR and MCAR), and imputation method (MICE-Interaction, MICE-CART, MICE-RF, and MICE-Stratified). First, we randomly drew 700 observations with replacement 300 times, and then the missing data were created. The evaluation was based on raw bias (RB) as well as five other measurements that were averaged over the repetitions. Next, in a simulation study, we generated data 1000 times with a sample size of 700. Then, we created missing data for each dataset once. For all scenarios, the same criteria were used as for real data to evaluate the performance of methods in the simulation study. 
It is concluded that, when there is an interaction effect between a dummy and a continuous predictor, substantial gains are possible by using recursive partitioning for imputation compared to parametric methods, and also, the MICE-Interaction method is always more efficient and convenient to preserve interaction effects than the other methods.


2021 ◽  
Author(s):  
Melissa Middleton ◽  
Cattram Nguyen ◽  
Margarita Moreno-Betancur ◽  
John B Carlin ◽  
Katherine J Lee

Abstract Background In case-cohort studies a random subcohort is selected from the inception cohort and acts as the sample of controls for several outcome investigations. Analysis is conducted using only the cases and the subcohort, with inverse probability weighting (IPW) used to account for the unequal sampling probabilities resulting from the study design. Like all epidemiological studies, case-cohort studies are susceptible to missing data. Multiple imputation (MI) has become increasingly popular for addressing missing data in epidemiological studies. It is currently unclear how best to incorporate the weights from a case-cohort analysis in MI procedures used to address missing covariate data. Method A simulation study was conducted with missingness in two covariates, motivated by a case study within the Barwon Infant Study. MI methods considered were: using the outcome, a proxy for weights in the simple case-cohort design considered, as a predictor in the imputation model, with and without exposure and covariate interactions; imputing separately within each weight category; and using a weighted imputation model. These methods were compared to a complete case analysis (CCA) within the context of a standard IPW analysis model estimating either the risk or odds ratio. The strength of associations, missing data mechanism, proportion of observations with incomplete covariate data, and subcohort selection probability varied across the simulation scenarios. Methods were also applied to the case study. Results There was similar performance in terms of relative bias and precision with all MI methods across the scenarios considered, with expected improvements compared with the CCA. Slight underestimation of the standard error was seen throughout but the nominal level of coverage (95%) was generally achieved. All MI methods showed a similar increase in precision as the subcohort selection probability increased, irrespective of the scenario. A similar pattern of results was seen in the case study. Conclusions How weights were incorporated into the imputation model had minimal effect on the performance of MI; this may be due to case-cohort studies only having two weight categories. In this context, inclusion of the outcome in the imputation model was sufficient to account for the unequal sampling probabilities in the analysis model.
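
The remark about two weight categories can be illustrated with a short sketch. It assumes one common weighting scheme for a simple case-cohort design, in which every case is included with weight 1 and non-case subcohort members are weighted by the inverse of the subcohort selection probability. The abstract does not spell out the weights, so the function and its parameter names are hypothetical.

```python
def case_cohort_weight(is_case: bool, in_subcohort: bool, p_subcohort: float) -> float:
    """Inverse probability weight under an assumed simple case-cohort design."""
    if is_case:
        # All cases are analysed, so their sampling probability is 1.
        return 1.0
    if in_subcohort:
        # Non-cases enter the analysis only through the random subcohort.
        return 1.0 / p_subcohort
    # Non-cases outside the subcohort are not analysed.
    return 0.0


# Hypothetical subcohort selection probability of 25%.
w_case = case_cohort_weight(True, False, 0.25)
w_control = case_cohort_weight(False, True, 0.25)
w_excluded = case_cohort_weight(False, False, 0.25)
```

Under this assumption the weight among analysed participants is determined entirely by case status, which is consistent with the abstract's description of the outcome as a proxy for the weights.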


Rheumatology ◽  
2021 ◽  
Vol 60 (Supplement_1) ◽  
Author(s):  
Alice Gottlieb ◽  
Frank Behrens ◽  
Peter Nash ◽  
Joseph F Merola ◽  
Pascale Pellet ◽  
...  

Abstract Background/Aims  Psoriatic arthritis (PsA) is a heterogeneous disease comprising musculoskeletal and dermatological manifestations, especially plaque psoriasis. Secukinumab, an interleukin-17A inhibitor, provided significantly greater PASI75/100 responses in two head-to-head trials versus etanercept, a tumour necrosis factor inhibitor (TNFi), or ustekinumab, in patients with moderate-to-severe plaque psoriasis. The EXCEED study (NCT02745080) investigated whether secukinumab was superior to adalimumab, another TNFi, as monotherapy in biologic-naive active PsA patients with active plaque psoriasis (defined as having ≥1 psoriatic plaque of ≥2 cm diameter, nail changes consistent with psoriasis or documented history of plaque psoriasis). Here we report the pre-specified skin outcomes from the EXCEED study in the subset of patients with ≥3% body surface area (BSA) affected with psoriasis at baseline. Methods  In this head-to-head, Phase 3b, randomised, double-blind, active-controlled, multicentre, parallel-group trial, patients were randomised to receive subcutaneous secukinumab 300 mg at baseline and Weeks 1-4, followed by dosing every 4 weeks until Week 48, or subcutaneous adalimumab 40 mg at baseline followed by the same dosing every 2 weeks until Week 50. The primary endpoint was superiority of secukinumab versus adalimumab on ACR20 response at Week 52. Pre-specified outcomes included the proportion of patients achieving a combined ACR50 and PASI100 response, PASI100 response, and absolute PASI score ≤3. Missing data were handled using multiple imputation. Results  Overall, 853 patients were randomised to receive secukinumab (n = 426) or adalimumab (n = 427). At baseline, 215 and 202 patients had at least 3% BSA affected with psoriasis in the secukinumab and adalimumab groups, respectively. 
At Week 52, more patients achieved simultaneous improvement in ACR50 and PASI100 response with secukinumab versus adalimumab (30.7% versus 19.2%, respectively; P = 0.0087). Greater efficacy was demonstrated for secukinumab versus adalimumab for PASI100 responses and for the proportion of patients achieving absolute PASI score ≤3 (Table 1). Conclusion  In this pre-specified analysis, secukinumab provided higher responses compared with adalimumab in achievement of combined improvement in joint and skin disease (combined ACR50 and PASI100 response) and in skin-specific endpoints (PASI100 and absolute PASI score ≤3) at Week 52.

P189 Table 1: Skin-specific outcomes at Week 52

Endpoints, % response         SEC 300 mg (N = 215)   ADA 40 mg (N = 202)   P value (unadjusted)
PASI100                       46                     30                    0.0007
Combined ACR50 and PASI100    31                     19                    0.0087
Absolute PASI score ≤3        79                     65                    0.0015

P value vs ADA; unadjusted P values are presented. Multiple imputation was used for handling missing data. ADA, adalimumab; ACR, American College of Rheumatology; N, number of patients in the psoriasis subset; PASI, Psoriasis Area and Severity Index; SEC, secukinumab.

Disclosure  A. Gottlieb: Grants/research support; A.G. has received research support, consultation fees or speaker honoraria from Pfizer, AbbVie, BMS, Lilly, MSD, Novartis, Roche, Sanofi, Sandoz, Nordic, Celltrion and UCB. F. Behrens: Consultancies; F.B. is a consultant for Pfizer, AbbVie, Sanofi, Lilly, Novartis, Genzyme, Boehringer Ingelheim, Janssen, MSD, Celgene, Roche and Chugai. Grants/research support; F.B. has received grant/research support from Pfizer, Janssen, Chugai, Celgene, Lilly and Roche. P. Nash: Consultancies; P.N. is a consultant for AbbVie, Bristol Myers Squibb, Celgene, Eli Lilly, Gilead, Janssen, MSD, Novartis, Pfizer Inc., Roche, Sanofi and UCB. Member of speakers’ bureau; for AbbVie, Bristol Myers Squibb, Celgene, Eli Lilly, Gilead, Janssen, MSD, Novartis, Pfizer Inc., Roche, Sanofi and UCB. Grants/research support; P.N. has received research support from AbbVie, Bristol Myers Squibb, Celgene, Eli Lilly and Company, Gilead, Janssen, MSD, Novartis, Pfizer Inc, Roche, Sanofi and UCB. J. Merola: Consultancies; J.F.M. is a consultant for Merck, AbbVie, Dermavant, Eli Lilly, Novartis, Janssen, UCB Pharma, Celgene, Sanofi, Regeneron, Arena, Sun Pharma, Biogen, Pfizer, EMD Serono, Avotres and LEO Pharma. P. Pellet: Corporate appointments; P.P. is an employee of Novartis. Shareholder/stock ownership; P.P. is a shareholder of Novartis. L. Pricop: Corporate appointments; L.P. is an employee of Novartis. Shareholder/stock ownership; L.P. is a shareholder of Novartis. I. McInnes: Consultancies; I.M. is a consultant for AbbVie, Bristol Myers Squibb, Celgene, Eli Lilly and Company, Gilead, Janssen, Novartis, Pfizer and UCB. Grants/research support; I.M. has received grant/research support from Bristol Myers Squibb, Celgene, Eli Lilly and Company, Janssen and UCB.


2021 ◽  
Vol 50 (Supplement_1) ◽  
Author(s):  
Jiaxin Zhang ◽  
S. Ghazaleh Dashti ◽  
John B. Carlin ◽  
Katherine J. Lee ◽  
Margarita Moreno-Betancur

Abstract Background Outcome regression remains widely applied for estimating causal effects in observational studies, in which causal inference is conceptualised as emulating a randomized controlled trial (RCT). Multiple imputation (MI) is a commonly used method for handling missing data, but while in RCTs it has been shown that MI should be conducted by treatment group to reduce bias, whether imputation should be conducted by exposure group in observational studies has not been studied. Methods We conducted a simulation study to evaluate the performance of seven methods for handling missing data: Complete-case analysis (CCA), MI of main effect, MI with interactions (between exposure and: outcome, a strong confounder, outcome and a strong confounder, all incomplete), and MI conducted by exposure group. We simulated data based on an example from the Victorian Adolescent Health Cohort Study. Three exposure prevalences and seven outcome generation models were considered, the latter ranging from no interaction to strong-positive or negative exposure-confounder interaction. Various missingness scenarios were examined: with incomplete outcome only or also incomplete confounders, and three levels of complexity regarding the missingness mechanism. Results For all scenarios, MI by exposure led to the least bias, followed by MI approaches that included exposure-confounder interactions. Conclusions If MI is adopted in outcome regression, we recommend conducting MI by exposure group and, when not feasible, including exposure-confounder interactions in the imputation model. Key messages Similar to RCTs, MI should be conducted by exposure group when estimating average causal effects using outcome regression in observational studies.
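
The idea of conducting MI by exposure group can be sketched as below. This is a deliberately simplified stand-in, not the imputation model used in the study: missing values are filled by random draws from observed values in the same exposure group (a basic hot-deck), whereas a proper MI procedure would also propagate uncertainty about the imputation model. The names `exposed` and `y` and the toy data are hypothetical.

```python
import random


def impute_by_group(rows, group_key, target_key, m=5, seed=0):
    """Create m completed datasets, imputing separately within each group."""
    rng = random.Random(seed)
    # Collect observed donor values separately for each exposure group.
    donors = {}
    for r in rows:
        if r[target_key] is not None:
            donors.setdefault(r[group_key], []).append(r[target_key])
    completed = []
    for _ in range(m):
        dataset = []
        for r in rows:
            value = r[target_key]
            if value is None:
                # Draw only from donors that share the row's exposure group.
                value = rng.choice(donors[r[group_key]])
            dataset.append({**r, target_key: value})
        completed.append(dataset)
    return completed


# Hypothetical toy data with one missing outcome per exposure group.
rows = [
    {"exposed": 1, "y": 10.0}, {"exposed": 1, "y": 12.0}, {"exposed": 1, "y": None},
    {"exposed": 0, "y": 3.0}, {"exposed": 0, "y": 4.0}, {"exposed": 0, "y": None},
]
completed = impute_by_group(rows, "exposed", "y", m=5, seed=0)
```

Because donors never cross groups, any exposure-specific relationship in the observed data is preserved in the imputed values, which is the intuition behind imputing by exposure group.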


Author(s):  
Tra My Pham ◽  
Irene Petersen ◽  
James Carpenter ◽  
Tim Morris

ABSTRACT Background Ethnicity is an important factor to be considered in health research because of its association with inequality in disease prevalence and the utilisation of healthcare. Ethnicity recording has been incorporated in primary care electronic health records, and hence is available in large UK primary care databases such as The Health Improvement Network (THIN). However, since primary care data are routinely collected for clinical purposes, a large amount of data that are relevant for research including ethnicity is often missing. A popular approach for missing data is multiple imputation (MI). However, the conventional MI method assuming data are missing at random does not give plausible estimates of the ethnicity distribution in THIN compared to the general UK population. This might be due to the fact that ethnicity data in primary care are likely to be missing not at random. Objectives I propose a new MI method, termed ‘weighted multiple imputation’, to deal with data that are missing not at random in categorical variables. Methods Weighted MI combines MI and probability weights which are calculated using external data sources. Census summary statistics for ethnicity can be used to form weights in weighted MI such that the correct marginal ethnic breakdown is recovered in THIN. I conducted a simulation study to examine weighted MI when ethnicity data are missing not at random. In this simulation study which resembled a THIN dataset, ethnicity was an independent variable in a survival model alongside other covariates. Weighted MI was compared to the conventional MI and other traditional missing data methods including complete case analysis and single imputation. Results While a small bias was still present in ethnicity coefficient estimates under weighted MI, it was less severe compared to MI assuming missing at random. Complete case analysis and single imputation were inadequate to handle data that are missing not at random in ethnicity. Conclusions Although not a total cure, weighted MI represents a pragmatic approach that has potential applications not only in ethnicity but also in other incomplete categorical health indicators in electronic health records.
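
The marginal constraint described above (the completed data should reproduce the census breakdown) implies a simple piece of arithmetic, sketched here. This is not the author's weighting procedure, which the abstract does not detail. It only computes, under the assumption that census shares apply to the whole dataset, the category shares that the imputed values would need to have. Category labels and counts are hypothetical.

```python
def required_imputed_shares(observed_counts, n_missing, census_shares):
    """Shares the imputed values must have so the completed data match the census.

    For category k: share_k = (N * census_k - observed_k) / n_missing,
    where N is the total number of records (observed plus missing).
    A negative result means the observed count already exceeds the census
    target, so the margin cannot be matched by imputation alone.
    """
    n_total = sum(observed_counts.values()) + n_missing
    return {
        k: (n_total * census_shares[k] - observed_counts[k]) / n_missing
        for k in observed_counts
    }


# Hypothetical example: 80 recorded values, 20 missing, census split 70/30.
shares = required_imputed_shares({"A": 60, "B": 20}, 20, {"A": 0.7, "B": 0.3})
```

In this toy case the observed data are 75% category A, but half of the missing values would need to fall in category B for the completed data to match the census, which illustrates how the external margin pulls imputations away from what a missing-at-random model would produce.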

