Clustering-Based Multiple Imputation via Gray Relational Analysis for Missing Data and Its Application to Aerospace Field

2013 ◽  
Vol 2013 ◽  
pp. 1-10 ◽  
Author(s):  
Jing Tian ◽  
Bing Yu ◽  
Dan Yu ◽  
Shilong Ma

Many scientific studies and industrial applications suffer from missing data. Inappropriate techniques for treating missing values compromise data quality, which in turn harms knowledge discovery. In this paper, we propose a missing data completion method named CBGMI. First, setting aside the entries with missing values, it partitions the nonmissing data instances into several clusters. Then, for each incomplete instance, it uses the entropy of the proximal category, identified with a similarity metric based on gray relational analysis, to complete the missing values. Experiments on UCI datasets and aerospace datasets demonstrate that our algorithm outperforms other approaches in terms of validity.
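The gray relational similarity step can be illustrated with a short sketch. This is not CBGMI itself. As a simplifying assumption, the sketch fills each missing entry with the centroid value of the most similar cluster rather than applying the entropy-based step described above. It also assumes that the cluster centroids were computed beforehand from the complete instances (for example, by k-means) and that features are normalized to [0, 1]. The helper names `gray_relational_grades` and `impute_by_nearest_cluster`, and the conventional distinguishing coefficient `rho = 0.5`, are illustrative choices.

```python
import numpy as np


def gray_relational_grades(reference, candidates, rho=0.5):
    """Classic gray relational grade of each candidate row against a reference vector.

    reference: shape (k,); candidates: shape (m, k); both assumed normalized to [0, 1].
    rho: distinguishing coefficient in (0, 1]; 0.5 is the conventional default.
    """
    delta = np.abs(candidates - reference)  # (m, k) absolute differences
    d_min, d_max = delta.min(), delta.max()  # global extremes over all candidates
    if d_max == 0:
        return np.ones(len(candidates))  # every candidate matches the reference exactly
    coeff = (d_min + rho * d_max) / (delta + rho * d_max)  # gray relational coefficients
    return coeff.mean(axis=1)  # grade = mean coefficient per candidate


def impute_by_nearest_cluster(x, centroids, rho=0.5):
    """Hypothetical helper: fill NaNs in x from the centroid with the highest grade."""
    x = np.asarray(x, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    observed = ~np.isnan(x)
    if not observed.any():
        raise ValueError("instance has no observed features to compare")
    # Compare only on the features that are observed in x.
    grades = gray_relational_grades(x[observed], centroids[:, observed], rho)
    best = int(np.argmax(grades))
    filled = x.copy()
    filled[~observed] = centroids[best, ~observed]
    return filled, best


centroids = np.array([[0.1, 0.2, 0.1],
                      [0.8, 0.9, 0.7]])
filled, cluster = impute_by_nearest_cluster([0.75, np.nan, 0.8], centroids)
print(cluster, filled)  # cluster 1 is selected, so the missing entry becomes 0.9
```

Only the observed features enter the comparison. Instances with more missing values are therefore matched on a shorter sequence.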

2011 ◽  
Vol 26 (S2) ◽  
pp. 572-572
Author(s):  
N. Resseguier ◽  
H. Verdoux ◽  
F. Clavel-Chapelon ◽  
X. Paoletti

Introduction: The CES-D scale is commonly used to assess depressive symptoms (DS) in large population-based studies. Missing values in items of the scale may create biases.

Objectives: To explore reasons for not completing items of the CES-D scale, and to perform a sensitivity analysis of the prevalence of DS to assess the impact of different missing data hypotheses.

Methods: In 2005, 71,412 women included in the French E3N cohort returned a questionnaire containing the CES-D scale; 45% had at least one missing value in the scale. An interview study was carried out on a random sample of 204 participants to examine the different hypotheses for the missing value mechanism. The prevalence of DS was estimated using different methods for handling missing values: complete case analysis, single imputation, and multiple imputation under MAR (missing at random) and MNAR (missing not at random) assumptions.

Results: The interviews showed that participants were not embarrassed to answer questions about DS. Potential reasons for nonresponse were identified. Both the MAR and MNAR hypotheses remained plausible and were explored. Among complete responders, the prevalence of DS was 26.1%. After multiple imputation under the MAR assumption, it was 28.6%, 29.8% and 31.7% among women with up to 4, up to 10 and up to 20 missing values, respectively. The estimates remained robust across the various MNAR scenarios applied in the sensitivity analysis.

Conclusions: The CES-D scale can easily be used to assess DS in large cohorts. Multiple imputation under the MAR assumption allows missing values to be handled reliably.
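The abstract does not say how the per-imputation prevalence estimates were combined. The standard approach for multiple imputation is Rubin's rules, sketched below as an assumption. Each completed dataset yields an estimate and its sampling variance. The pooled estimate is the mean of these estimates. The total variance adds the average within-imputation variance to the between-imputation variance, inflated by (1 + 1/m). The function name `pool_rubin` and the prevalence values in the usage example are hypothetical, not E3N results.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m completed-data estimates with Rubin's rules.

    estimates: point estimate from each imputed dataset (e.g. a prevalence).
    variances: squared standard error of each estimate.
    Returns (pooled estimate, total variance, standard error).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations with one variance each")
    q_bar = mean(estimates)        # pooled point estimate
    w = mean(variances)            # average within-imputation variance
    b = variance(estimates)        # between-imputation variance (denominator m - 1)
    t = w + (1 + 1 / m) * b        # total variance
    return q_bar, t, math.sqrt(t)


# Hypothetical prevalences of DS from m = 5 imputed datasets of n = 1000 women each.
n = 1000
prevalences = [0.284, 0.289, 0.286, 0.291, 0.285]
within = [p * (1 - p) / n for p in prevalences]  # binomial variance of each proportion
est, total_var, se = pool_rubin(prevalences, within)
print(f"pooled prevalence {est:.3f}, approx. 95% CI +/- {1.96 * se:.3f}")
```

The 1.96 multiplier is a normal approximation. A fuller analysis would use Rubin's degrees of freedom for the reference t distribution.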


METRON ◽  
2021 ◽  
Author(s):  
Paolo Mariani ◽  
Andrea Marletta

Abstract
Social media has become a widespread part of people’s everyday lives and is used to communicate and to generate content. Among the several ways to react to social media content, “Likes” are critical: they convey preferences, which drive existing markets or allow new ones to be created. Nevertheless, these appreciation indicators have some complex features, for example the interpretation of the absence of “Likes”. In this case, the lack of approval may be considered a specific behaviour. The present study aimed to determine whether the absence of Likes may indicate a specific behaviour, by contextualizing the treatment of missing data in real cases. We provided a practical strategy for extracting more knowledge from social media data, whose synthesis raises several measurement problems. We proposed an approach based on disambiguating missing data into two modalities: “Dislike” and “Nothing”. Finally, a data pre-processing technique was suggested to increase the signal of social media data.


2011 ◽  
Vol 328-330 ◽  
pp. 2400-2404
Author(s):  
Zi Qi Ju

Preventing runway incursions requires a systematic prevention approach. Based on the definition of runway incursions and the classification of the relevant criteria, this paper analyzes the runway incursion system and puts forward closed-loop management ideas for preventing runway incursions. It then uses gray relational analysis to identify the main contradictions in preventing them. Using runway incursion data from the U.S.A. as an example, a gray relational analysis of different severities and different factors shows that the key factors leading to class A/B and class C/D runway incursions are Vehicle/Pedestrian Deviations and Pilot Deviations, respectively. The paper also proposes integrated measures for preventing runway incursions.


Rheumatology ◽  
2021 ◽  
Vol 60 (Supplement_1) ◽  
Author(s):  
Alice Gottlieb ◽  
Frank Behrens ◽  
Peter Nash ◽  
Joseph F Merola ◽  
Pascale Pellet ◽  
...  

Abstract
Background/Aims: Psoriatic arthritis (PsA) is a heterogeneous disease comprising musculoskeletal and dermatological manifestations, especially plaque psoriasis. In two head-to-head trials in patients with moderate-to-severe plaque psoriasis, secukinumab, an interleukin-17A inhibitor, provided significantly greater PASI75/100 responses than etanercept, a tumour necrosis factor inhibitor (TNFi), or ustekinumab. The EXCEED study (NCT02745080) investigated whether secukinumab was superior to adalimumab, another TNFi, as monotherapy in biologic-naive patients with active PsA and active plaque psoriasis. Active plaque psoriasis was defined as having ≥1 psoriatic plaque of ≥2 cm diameter, nail changes consistent with psoriasis, or a documented history of plaque psoriasis. Here we report the pre-specified skin outcomes from the EXCEED study in the subset of patients with ≥3% body surface area (BSA) affected by psoriasis at baseline.
Methods: In this head-to-head, Phase 3b, randomised, double-blind, active-controlled, multicentre, parallel-group trial, patients were randomised to one of two arms. The first arm received subcutaneous secukinumab 300 mg at baseline and Weeks 1-4, followed by dosing every 4 weeks until Week 48. The second arm received subcutaneous adalimumab 40 mg at baseline, followed by 40 mg every 2 weeks until Week 50. The primary endpoint was superiority of secukinumab versus adalimumab on ACR20 response at Week 52. Pre-specified outcomes included the proportion of patients achieving a combined ACR50 and PASI100 response, PASI100 response, and absolute PASI score ≤3. Missing data were handled using multiple imputation.
Results: Overall, 853 patients were randomised to receive secukinumab (n = 426) or adalimumab (n = 427). At baseline, 215 patients in the secukinumab group and 202 in the adalimumab group had at least 3% BSA affected by psoriasis.
At Week 52, more patients achieved simultaneous improvement in ACR50 and PASI100 response with secukinumab than with adalimumab (30.7% versus 19.2%, respectively; P = 0.0087). Secukinumab also demonstrated greater efficacy than adalimumab for PASI100 response and for the proportion of patients achieving an absolute PASI score ≤3 (Table 1).
Conclusion: In this pre-specified analysis, secukinumab provided higher responses than adalimumab at Week 52 on two measures. The first was combined improvement in joint and skin disease (combined ACR50 and PASI100 response). The second was skin-specific endpoints (PASI100 and absolute PASI score ≤3).

P189 Table 1: Skin-specific outcomes at Week 52

Endpoint, % response          SEC 300 mg (N = 215)   ADA 40 mg (N = 202)   P value (unadjusted)
PASI100                       46                     30                    0.0007
Combined ACR50 and PASI100    31                     19                    0.0087
Absolute PASI score ≤3        79                     65                    0.0015

P value vs ADA; unadjusted P values are presented. Multiple imputation was used for handling missing data. ADA, adalimumab; ACR, American College of Rheumatology; N, number of patients in the psoriasis subset; PASI, Psoriasis Area and Severity Index; SEC, secukinumab.

Disclosure:
A. Gottlieb: Grants/research support; A.G. has received research support, consultation fees or speaker honoraria from Pfizer, AbbVie, BMS, Lilly, MSD, Novartis, Roche, Sanofi, Sandoz, Nordic, Celltrion and UCB.
F. Behrens: Consultancies; F.B. is a consultant for Pfizer, AbbVie, Sanofi, Lilly, Novartis, Genzyme, Boehringer Ingelheim, Janssen, MSD, Celgene, Roche and Chugai. Grants/research support; F.B. has received grant/research support from Pfizer, Janssen, Chugai, Celgene, Lilly and Roche.
P. Nash: Consultancies; P.N. is a consultant for AbbVie, Bristol Myers Squibb, Celgene, Eli Lilly, Gilead, Janssen, MSD, Novartis, Pfizer Inc., Roche, Sanofi and UCB. Member of speakers’ bureau; P.N. is a member of the speakers’ bureau for AbbVie, Bristol Myers Squibb, Celgene, Eli Lilly, Gilead, Janssen, MSD, Novartis, Pfizer Inc., Roche, Sanofi and UCB. Grants/research support; P.N. has received research support from AbbVie, Bristol Myers Squibb, Celgene, Eli Lilly and Company, Gilead, Janssen, MSD, Novartis, Pfizer Inc., Roche, Sanofi and UCB.
J. Merola: Consultancies; J.F.M. is a consultant for Merck, AbbVie, Dermavant, Eli Lilly, Novartis, Janssen, UCB Pharma, Celgene, Sanofi, Regeneron, Arena, Sun Pharma, Biogen, Pfizer, EMD Serono, Avotres and LEO Pharma.
P. Pellet: Corporate appointments; P.P. is an employee of Novartis. Shareholder/stock ownership; P.P. is a shareholder of Novartis.
L. Pricop: Corporate appointments; L.P. is an employee of Novartis. Shareholder/stock ownership; L.P. is a shareholder of Novartis.
I. McInnes: Consultancies; I.M. is a consultant for AbbVie, Bristol Myers Squibb, Celgene, Eli Lilly and Company, Gilead, Janssen, Novartis, Pfizer and UCB. Grants/research support; I.M. has received grant/research support from Bristol Myers Squibb, Celgene, Eli Lilly and Company, Janssen and UCB.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Nishith Kumar ◽  
Md. Aminul Hoque ◽  
Masahiro Sugimoto

Abstract
Mass spectrometry is a modern and sophisticated high-throughput analytical technique that enables large-scale metabolomic analyses. It yields a high-dimensional, large-scale matrix (samples × metabolites) of quantified data. This matrix often contains missing cells, as well as outliers that arise for several reasons, including technical and biological sources. Although several missing data imputation techniques are described in the literature, all conventional existing techniques address only missing values and do not handle outliers. As a result, outliers in the dataset decrease the accuracy of the imputation. We developed a new missing data imputation technique, based on a kernel weight function, that resolves the problems of both missing values and outliers. We evaluated the proposed method against other conventional and recently developed imputation techniques. The evaluation used both artificially generated data and experimentally measured data, in the absence and presence of different rates of outliers. Results on both artificial data and real metabolomics data indicate that the proposed kernel weight-based imputation technique outperforms the existing alternatives. For user convenience, an R package implementing the proposed technique was developed; it is available at https://github.com/NishithPaul/tWLSA.
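The general idea of kernel weighting can be illustrated with a deliberately simple sketch. It is not the tWLSA algorithm, which is available in the R package linked above. As an assumption, each missing value is filled with a weighted mean of the observed values of the same metabolite. A Gaussian kernel centred on the median, with a bandwidth derived from the median absolute deviation (MAD), gives outlying values weights close to zero. The function name `kernel_weighted_impute` and the `bandwidth_scale` parameter are hypothetical.

```python
import numpy as np


def kernel_weighted_impute(X, bandwidth_scale=1.0):
    """Fill NaNs column by column with an outlier-resistant, kernel-weighted mean.

    X: (samples, metabolites) array with NaN marking missing cells.
    """
    X = np.asarray(X, dtype=float).copy()
    for j in range(X.shape[1]):
        missing = np.isnan(X[:, j])
        if not missing.any() or missing.all():
            continue  # nothing to fill, or no observed values to learn from
        vals = X[~missing, j]
        med = np.median(vals)
        mad = 1.4826 * np.median(np.abs(vals - med))  # robust scale (matches SD under normality)
        h = bandwidth_scale * mad if mad > 0 else 1.0  # arbitrary fallback when MAD is zero
        weights = np.exp(-0.5 * ((vals - med) / h) ** 2)  # Gaussian kernel; outliers get ~0 weight
        X[missing, j] = np.sum(weights * vals) / np.sum(weights)
    return X


data = np.array([[1.0, 3.0],
                 [1.1, 3.2],
                 [0.9, np.nan],
                 [1.0, 2.8],
                 [50.0, 3.0],   # outlier in the first metabolite
                 [np.nan, 3.1]])
print(kernel_weighted_impute(data))
```

For the first metabolite, the plain column mean would be 10.8 because of the outlier. The kernel-weighted estimate stays at about 1.0, which is the effect kernel weighting is meant to achieve.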


2013 ◽  
Vol 401-403 ◽  
pp. 1766-1771 ◽  
Author(s):  
Lan Kou ◽  
Si Rui Chen ◽  
Rui Wang

Multipath Transmission Control Protocol (MPTCP), a transport layer protocol proposed by an IETF working group in 2009, provides end-to-end multipath communication. It can also improve the utilization of network resources and the reliability of network transmission. This study focuses on two questions: how to select multiple paths so as to improve overall end-to-end throughput, and how to avoid the throughput decline caused by performance differences between paths. We propose a path selection strategy based on improved gray relational analysis, which takes the optimal values of the paths' QoS parameters as the reference sequence. Each path's improved grey relational degree (IGRD) is computed against this reference sequence. Paths with better performance and smaller performance differences are then selected for transmission.
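The reference-sequence idea can be sketched with classic gray relational analysis. The abstract does not specify what makes the IGRD "improved", so this is not the authors' strategy. The sketch makes three assumptions:
- Each candidate path is described by three QoS metrics: bandwidth (larger is better), delay and loss rate (both smaller is better).
- Each metric is min-max normalized so that 1.0 marks the best observed value.
- The reference sequence is therefore a vector of ones.

The function `select_paths`, the metric set and the numbers are hypothetical.

```python
import numpy as np


def select_paths(qos, benefit, k, rho=0.5):
    """Rank paths by classic gray relational grade against the ideal QoS sequence.

    qos: (n_paths, n_metrics) raw QoS values.
    benefit: per metric, True if larger is better, False if smaller is better.
    Returns (indices of the k best paths, grade of every path).
    """
    qos = np.asarray(qos, dtype=float)
    benefit = np.asarray(benefit, dtype=bool)
    lo, hi = qos.min(axis=0), qos.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid dividing by zero for constant metrics
    norm = np.where(benefit, (qos - lo) / span, (hi - qos) / span)  # 1.0 = best observed value
    norm = np.where(hi > lo, norm, 1.0)  # a metric equal on all paths counts as ideal
    delta = np.abs(norm - 1.0)  # distance to the reference sequence (all ones)
    d_min, d_max = delta.min(), delta.max()
    if d_max == 0:
        grades = np.ones(len(qos))  # all paths are identical
    else:
        grades = ((d_min + rho * d_max) / (delta + rho * d_max)).mean(axis=1)
    order = np.argsort(-grades, kind="stable")  # highest grade first; ties keep input order
    return order[:k], grades


# Hypothetical paths: [bandwidth (Mbit/s), delay (ms), loss rate (%)]
paths = [[10.0, 20.0, 0.1],
         [50.0, 30.0, 0.5],
         [40.0, 25.0, 0.2]]
chosen, grades = select_paths(paths, benefit=[True, False, False], k=2)
print(chosen, grades)  # paths 0 and 2 are chosen; grades are 7/9, 5/9 and 11/18
```

With equal metric weights, the low-bandwidth path 0 ranks first because it is best on two of the three metrics. Per-metric weights in the final mean would be a natural extension if throughput matters more.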


Author(s):  
Caio Ribeiro ◽  
Alex A. Freitas

Abstract
Longitudinal datasets from human ageing studies usually have a high volume of missing data, and one way to handle missing values is to replace them with estimates. However, there are many methods for estimating missing values, and no single method is best for all datasets. In this article, we propose a data-driven missing value imputation approach that performs a feature-wise selection of the best imputation method. It uses known information in the dataset to rank the five selected methods by their estimation error rates. We evaluated the proposed approach in two sets of experiments. In a classifier-independent scenario, we compared the applicability and error rates of each imputation method. In a classifier-dependent scenario, we compared the predictive accuracy of Random Forest classifiers built from datasets prepared with each imputation method, and with a baseline of no imputation (letting the classification algorithm handle the missing values internally). Based on the results of both sets of experiments, we concluded that the proposed approach generally produced more accurate estimates of missing data and better-performing classifiers on longitudinal datasets of human ageing. We also observed that imputation methods devised specifically for longitudinal data produced very accurate estimates. This reinforces the idea that using the temporal information intrinsic to longitudinal data is a worthwhile endeavour for machine learning applications, and that this can be achieved through the proposed data-driven approach.
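The feature-wise selection step can be sketched as follows. The abstract does not name the five candidate methods or the error measure. This sketch therefore assumes two simple stand-ins (column mean and column median) and uses mean absolute error. For each feature, a fraction of the known values is hidden and each candidate fills the hidden cells. The candidate with the lowest error is then used for that feature's truly missing values. The names `CANDIDATES`, `select_and_impute` and `mask_frac` are hypothetical.

```python
import numpy as np


def mean_fill(col):
    """Replace NaNs in a 1-D array with the mean of its observed values."""
    out = col.copy()
    out[np.isnan(out)] = np.nanmean(col)
    return out


def median_fill(col):
    """Replace NaNs in a 1-D array with the median of its observed values."""
    out = col.copy()
    out[np.isnan(out)] = np.nanmedian(col)
    return out


CANDIDATES = {"mean": mean_fill, "median": median_fill}  # stand-ins for the paper's methods


def select_and_impute(X, mask_frac=0.2, seed=0):
    """Pick the lowest-error candidate per feature on hidden known values, then impute."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    out = X.copy()
    chosen = {}
    for j in range(X.shape[1]):
        col = X[:, j]
        known = np.flatnonzero(~np.isnan(col))
        if len(known) == len(col) or len(known) < 2:
            continue  # nothing missing, or too few known values to evaluate candidates
        n_hide = min(max(1, int(mask_frac * len(known))), len(known) - 1)
        hidden = rng.choice(known, size=n_hide, replace=False)
        trial = col.copy()
        trial[hidden] = np.nan  # pretend these known values are missing
        errors = {name: np.mean(np.abs(fill(trial)[hidden] - col[hidden]))
                  for name, fill in CANDIDATES.items()}
        best = min(errors, key=errors.get)  # ties go to the first candidate listed
        chosen[j] = best
        out[:, j] = CANDIDATES[best](col)
    return out, chosen


X = np.array([[1.0, 5.0],
              [2.0, np.nan],
              [np.nan, 7.0],
              [4.0, 8.0],
              [5.0, 9.0]])
imputed, methods = select_and_impute(X)
print(methods)
```

In practice, the hiding step would be repeated several times, or cross-validated, so that the ranking does not hinge on a single random draw. The single draw here keeps the sketch short.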

