Clustering-Based Multiple Imputation via Gray Relational Analysis for Missing Data and Its Application to Aerospace Field

The Scientific World JOURNAL ◽

10.1155/2013/720392 ◽

2013 ◽

Vol 2013 ◽

pp. 1-10 ◽

Cited By ~ 5

Author(s):

Jing Tian ◽

Bing Yu ◽

Dan Yu ◽

Shilong Ma

Keyword(s):

Missing Data ◽

Data Quality ◽

Knowledge Discovery ◽

Multiple Imputation ◽

Industrial Applications ◽

Gray Relational Analysis ◽

Missing Value ◽

Similarity Metric ◽

Relational Analysis ◽

Data Completion

Many scientific studies and industrial applications suffer from missing data. Some inappropriate techniques for treating missing values compromise data quality, which in turn harms knowledge discovery. In this paper, we propose a missing data completion method named CBGMI. First, it separates the nonmissing data instances into several clusters by excluding the missing-valued entries. Then, it utilizes the entropy of the proximal category for each incomplete instance in terms of the similarity metric based on gray relational analysis. Experiments on UCI datasets and aerospace datasets demonstrate the superiority of our algorithm over other approaches in terms of validity.
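The abstract above does not spell out the similarity metric, so the following is only a minimal sketch of the standard gray relational grade, not the CBGMI algorithm itself (it does no clustering and uses no entropy). It assumes attributes are already scaled to a common range, that missing values are encoded as NaN, and that the incomplete instance has at least one observed attribute; the function name and the distinguishing coefficient rho = 0.5 are illustrative choices.

```python
import numpy as np

def gray_relational_grades(reference, candidates, rho=0.5):
    """Gray relational grade of each candidate row against the reference row.

    NaN entries in `reference` mark missing attributes; they are skipped, so
    similarity is measured on the observed attributes only (an assumption of
    this sketch).
    """
    reference = np.asarray(reference, dtype=float)
    candidates = np.asarray(candidates, dtype=float)
    observed = ~np.isnan(reference)

    # Absolute differences on the observed attributes.
    delta = np.abs(candidates[:, observed] - reference[observed])
    d_min, d_max = delta.min(), delta.max()
    if d_max == 0.0:
        # Every candidate matches the reference exactly.
        return np.ones(len(candidates))

    # Gray relational coefficient per attribute, then the grade as their mean.
    coeff = (d_min + rho * d_max) / (delta + rho * d_max)
    return coeff.mean(axis=1)

# Hypothetical data: one incomplete instance and two complete instances.
incomplete = [0.2, float("nan"), 0.8]
complete = [[0.2, 0.5, 0.8], [0.9, 0.1, 0.1]]
grades = gray_relational_grades(incomplete, complete)
nearest = int(np.argmax(grades))  # index of the most similar complete instance
```

A higher grade means the complete instance is closer to the incomplete one on the attributes that are actually observed, which is the kind of ranking a similarity-based imputation step would rely on.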

Using the CES-D scale in a large cohort study and dealing with missing data: Application to the French E3N cohort

European Psychiatry ◽

10.1016/s0924-9338(11)72279-9 ◽

2011 ◽

Vol 26 (S2) ◽

pp. 572-572

Author(s):

N. Resseguier ◽

H. Verdoux ◽

F. Clavel-Chapelon ◽

X. Paoletti

Keyword(s):

Sensitivity Analysis ◽

Missing Data ◽

Multiple Imputation ◽

Missing Values ◽

Large Population ◽

Missing At Random ◽

Population Based ◽

Missing Value ◽

Perform Sensitivity Analysis ◽

The Impact

Introduction: The CES-D scale is commonly used to assess depressive symptoms (DS) in large population-based studies. Missing values in items of the scale may create biases. Objectives: To explore reasons for not completing items of the CES-D scale and to perform a sensitivity analysis of the prevalence of DS to assess the impact of different missing data hypotheses. Methods: 71412 women included in the French E3N cohort returned in 2005 a questionnaire containing the CES-D scale. 45% presented at least one missing value in the scale. An interview study was carried out on a random sample of 204 participants to examine the different hypotheses for the missing value mechanism. The prevalence of DS was estimated according to different methods for handling missing values: complete case analysis, single imputation, and multiple imputation under MAR (missing at random) and MNAR (missing not at random) assumptions. Results: The interviews showed that participants were not embarrassed to fill in questions about DS. Potential reasons for nonresponse were identified. MAR and MNAR hypotheses remained plausible and were explored. Among complete responders, the prevalence of DS was 26.1%. After multiple imputation under the MAR assumption, it was 28.6%, 29.8% and 31.7% among women presenting up to 4, up to 10 and up to 20 missing values, respectively. The estimates were robust after applying various scenarios of MNAR data for the sensitivity analysis. Conclusions: The CES-D scale can easily be used to assess DS in large cohorts. Multiple imputation under the MAR assumption allows missing values to be handled reliably.
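The abstract does not say how estimates from the imputed datasets were combined. A common way to do this in multiple imputation is Rubin's rules, sketched below as an assumption rather than as the study's procedure. The function name is hypothetical and the numbers in the usage lines are made up for illustration; they are not E3N results.

```python
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Pool per-imputation estimates with Rubin's rules.

    estimates: point estimate from each imputed dataset (at least two)
    variances: squared standard error of each of those estimates
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # sample variance across imputations
    total = within + (1 + 1 / m) * between
    return pooled, total

# Made-up prevalence estimates from three imputed datasets.
pooled, total = pool_rubin([0.28, 0.29, 0.30], [0.0001, 0.0001, 0.0001])
standard_error = total ** 0.5
```

The between-imputation term is what carries the extra uncertainty due to the missing values, which is why the pooled variance is larger than the average of the per-dataset variances.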

The Comparison of Elementary Teachers' Longitudinal Advice Network Missing Data Analysis: Based on Multiple Imputation when Missing Not At Random (MNAR)

Korean Society for Educational Evaluation ◽

10.31158/jeev.2019.32.4.671 ◽

2019 ◽

Vol 32 (4) ◽

pp. 671-703

Author(s):

Chong Min Kim

Keyword(s):

Data Analysis ◽

Missing Data ◽

Multiple Imputation ◽

Missing Data Analysis

Missing value or behaviour: how to increase the signal of social media data

METRON ◽

10.1007/s40300-021-00216-7 ◽

2021 ◽

Author(s):

Paolo Mariani ◽

Andrea Marletta

Keyword(s):

Social Media ◽

Missing Data ◽

Everyday Life ◽

Processing Technique ◽

Missing Value ◽

Social Media Data ◽

Practical Strategy ◽

Specific Behaviour ◽

Complex Features ◽

Media Data

Social media has become a widespread element of people’s everyday life, used to communicate and to generate content. Among the several ways to express a reaction to social media content, “Likes” are critical. Indeed, they convey preferences, which drive existing markets or allow the creation of new ones. Nevertheless, appreciation indicators have some complex features, for example the interpretation of the absence of “Likes”. In this case, the lack of approval may be considered a specific behaviour. The present study aimed to define whether the absence of Likes may indicate the presence of a specific behaviour through the contextualization of the treatment of missing data applied to real cases. We provided a practical strategy for extracting more knowledge from social media data, whose synthesis raises several measurement problems. We proposed an approach based on the disambiguation of missing data into two modalities: “Dislike” and “Nothing”. Finally, a data pre-processing technique was suggested to increase the signal of social media data.

Reason Analysis of Runway Incursions Based on Grey Theory

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.328-330.2400 ◽

2011 ◽

Vol 328-330 ◽

pp. 2400-2404

Author(s):

Zi Qi Ju

Keyword(s):

Gray Relational Analysis ◽

Grey Theory ◽

Prevention Measures ◽

Key Factors ◽

Class Ab ◽

Relational Analysis ◽

Definition Of ◽

Grey Relational ◽

Runway Incursions ◽

Runway Incursion

Preventing runway incursions requires a systematic approach. Based on the definition of runway incursions and the relevant classification criteria, this paper analyzes the runway incursion system, puts forward closed-loop management ideas for preventing runway incursions, and identifies the main contradictions in prevention using gray relational analysis. Taking U.S. runway incursion data as an example, a grey relational analysis of different severities and different factors shows that the key factors leading to class AB and class CD runway incursions are Vehicle/Pedestrian Deviations and Pilot Deviations, respectively. The paper also proposes integrated prevention measures for runway incursions.

P189 Comparison of secukinumab versus adalimumab efficacy on skin outcomes in psoriatic arthritis: 52-week results from the EXCEED study

Rheumatology ◽

10.1093/rheumatology/keab247.184 ◽

2021 ◽

Vol 60 (Supplement_1) ◽

Author(s):

Alice Gottlieb ◽

Frank Behrens ◽

Peter Nash ◽

Joseph F Merola ◽

Pascale Pellet ◽

...

Keyword(s):

Psoriatic Arthritis ◽

Missing Data ◽

Multiple Imputation ◽

Plaque Psoriasis ◽

Stock Ownership ◽

Research Support ◽

Acr20 Response ◽

Number Of Patients ◽

Pasi Score ◽

To Receive

Background/Aims: Psoriatic arthritis (PsA) is a heterogeneous disease comprising musculoskeletal and dermatological manifestations, especially plaque psoriasis. Secukinumab, an interleukin-17A inhibitor, provided significantly greater PASI75/100 responses in two head-to-head trials versus etanercept, a tumour necrosis factor inhibitor (TNFi), or ustekinumab in patients with moderate-to-severe plaque psoriasis. The EXCEED study (NCT02745080) investigated whether secukinumab was superior to adalimumab, another TNFi, as monotherapy in biologic-naive active PsA patients with active plaque psoriasis (defined as having ≥1 psoriatic plaque of ≥2 cm diameter, nail changes consistent with psoriasis or documented history of plaque psoriasis). Here we report the pre-specified skin outcomes from the EXCEED study in the subset of patients with ≥3% body surface area (BSA) affected with psoriasis at baseline. Methods: In this head-to-head, Phase 3b, randomised, double-blind, active-controlled, multicentre, parallel-group trial, patients were randomised to receive subcutaneous secukinumab 300 mg at baseline and Weeks 1-4, followed by dosing every 4 weeks until Week 48, or subcutaneous adalimumab 40 mg at baseline followed by the same dosing every 2 weeks until Week 50. The primary endpoint was superiority of secukinumab versus adalimumab on ACR20 response at Week 52. Pre-specified outcomes included the proportion of patients achieving a combined ACR50 and PASI100 response, PASI100 response, and absolute PASI score ≤3. Missing data were handled using multiple imputation. Results: Overall, 853 patients were randomised to receive secukinumab (n = 426) or adalimumab (n = 427). At baseline, 215 and 202 patients had at least 3% BSA affected with psoriasis in the secukinumab and adalimumab groups, respectively. At Week 52, more patients achieved simultaneous improvement in ACR50 and PASI100 response with secukinumab versus adalimumab (30.7% versus 19.2%, respectively; P = 0.0087).
Greater efficacy was demonstrated for secukinumab versus adalimumab for PASI100 responses and for the proportion of patients achieving absolute PASI score ≤3 (Table 1). Conclusion: In this pre-specified analysis, secukinumab provided higher responses compared with adalimumab in achievement of combined improvement in joint and skin disease (combined ACR50 and PASI100 response) and in skin-specific endpoints (PASI100 and absolute PASI score ≤3) at Week 52.

P189 Table 1: Skin-specific outcomes at Week 52

Endpoints, % response        SEC 300 mg (N = 215)   ADA 40 mg (N = 202)   P value (unadjusted)
PASI100                      46                     30                    0.0007
Combined ACR50 and PASI100   31                     19                    0.0087
Absolute PASI score ≤3       79                     65                    0.0015

P value vs ADA; unadjusted P values are presented. Multiple imputation was used for handling missing data. ADA, adalimumab; ACR, American College of Rheumatology; N, number of patients in the psoriasis subset; PASI, Psoriasis Area and Severity Index; SEC, secukinumab.

Disclosure: A. Gottlieb: Grants/research support; A.G. has received research support, consultation fees or speaker honoraria from Pfizer, AbbVie, BMS, Lilly, MSD, Novartis, Roche, Sanofi, Sandoz, Nordic, Celltrion and UCB. F. Behrens: Consultancies; F.B. is a consultant for Pfizer, AbbVie, Sanofi, Lilly, Novartis, Genzyme, Boehringer Ingelheim, Janssen, MSD, Celgene, Roche and Chugai. Grants/research support; F.B. has received grant/research support from Pfizer, Janssen, Chugai, Celgene, Lilly and Roche. P. Nash: Consultancies; P.N. is a consultant for AbbVie, Bristol Myers Squibb, Celgene, Eli Lilly, Gilead, Janssen, MSD, Novartis, Pfizer Inc., Roche, Sanofi and UCB. Member of speakers’ bureau; for AbbVie, Bristol Myers Squibb, Celgene, Eli Lilly, Gilead, Janssen, MSD, Novartis, Pfizer Inc., Roche, Sanofi and UCB. Grants/research support; P.N. has received research support from AbbVie, Bristol Myers Squibb, Celgene, Eli Lilly and Company, Gilead, Janssen, MSD, Novartis, Pfizer Inc, Roche, Sanofi and UCB. J. Merola: Consultancies; J.F.M. is a consultant for Merck, AbbVie, Dermavant, Eli Lilly, Novartis, Janssen, UCB Pharma, Celgene, Sanofi, Regeneron, Arena, Sun Pharma, Biogen, Pfizer, EMD Serono, Avotres and LEO Pharma. P. Pellet: Corporate appointments; P.P. is an employee of Novartis. Shareholder/stock ownership; P.P. is a shareholder of Novartis. L. Pricop: Corporate appointments; L.P. is an employee of Novartis. Shareholder/stock ownership; L.P. is a shareholder of Novartis. I. McInnes: Consultancies; I.M. is a consultant for AbbVie, Bristol Myers Squibb, Celgene, Eli Lilly and Company, Gilead, Janssen, Novartis, Pfizer and UCB. Grants/research support; I.M. has received grant/research support from Bristol Myers Squibb, Celgene, Eli Lilly and Company, Janssen and UCB.

Kernel weighted least square approach for imputing missing values of metabolomics data

Scientific Reports ◽

10.1038/s41598-021-90654-0 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Nishith Kumar ◽

Md. Aminul Hoque ◽

Masahiro Sugimoto

Keyword(s):

Missing Data ◽

Large Scale ◽

Missing Values ◽

Kernel Weight ◽

Least Square ◽

Data Matrix ◽

Data Imputation ◽

Metabolomics Data ◽

Missing Value ◽

Missing Data Imputation

Mass spectrometry is a modern and sophisticated high-throughput analytical technique that enables large-scale metabolomic analyses. It yields a high-dimensional, large-scale matrix (samples × metabolites) of quantified data that often contains missing cells as well as outliers, which originate for several reasons, including technical and biological sources. Although several missing data imputation techniques are described in the literature, the conventional techniques address only the missing value problem and do not relieve the problem of outliers. Outliers in the dataset therefore decrease the accuracy of the imputation. We developed a new missing data imputation technique, based on a kernel weight function, that resolves the problems of both missing values and outliers. We evaluated the performance of the proposed method and of other conventional and recently developed imputation techniques using both artificially generated data and experimentally measured data, in both the absence and presence of different rates of outliers. Performance on both artificial data and real metabolomics data indicates the superiority of the proposed kernel weight-based missing data imputation technique over the existing alternatives. For user convenience, an R package of the proposed kernel weight-based missing value imputation technique was developed, which is available at https://github.com/NishithPaul/tWLSA.

A MPTCP Path Selection Strategy Based on Improved Grey Relational Analysis

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.401-403.1766 ◽

2013 ◽

Vol 401-403 ◽

pp. 1766-1771 ◽

Cited By ~ 2

Author(s):

Lan Kou ◽

Si Rui Chen ◽

Rui Wang

Keyword(s):

Transmission Control Protocol ◽

Path Selection ◽

Transport Layer ◽

Reference Sequence ◽

Gray Relational Analysis ◽

Selection Strategy ◽

Relational Analysis ◽

Multiple Paths ◽

End To End ◽

Grey Relational

Multipath Transmission Control Protocol (MPTCP), a transport layer protocol proposed by the IETF working group in 2009, provides end-to-end multipath communication. It can also improve the utilization of network resources and the reliability of network transmission. This study focuses on how to select multiple paths so as to improve overall end-to-end throughput, and on how to avoid the throughput decline caused by performance differences between paths. We propose a path selection strategy based on improved gray relational analysis, taking the optimal values of the QoS parameters of the selected paths as the reference sequence. According to the improved grey relational degree (IGRD) of each path relative to the reference sequence, we select for transmission the paths with better performance and smaller differences.

Improving Power Grid Monitoring Data Quality: An Efficient Machine Learning Framework for Missing Data Prediction

2015 IEEE 17th International Conference on High Performance Computing and Communications, 2015 IEEE 7th International Symposium on Cyberspace Safety and Security, and 2015 IEEE 12th International Conference on Embedded Software and Systems ◽

10.1109/hpcc-css-icess.2015.16 ◽

2015 ◽

Cited By ~ 10

Author(s):

Weiwei Shi ◽

Yongxin Zhu ◽

Jinkui Zhang ◽

Xiang Tao ◽

Gehao Sheng ◽

...

Keyword(s):

Machine Learning ◽

Missing Data ◽

Data Quality ◽

Power Grid ◽

Monitoring Data ◽

Learning Framework ◽

Data Prediction ◽

Grid Monitoring ◽

Efficient Machine ◽

Missing Data Prediction

Gray relational analysis on airline employees' pay satisfaction and violations

2013 International Conference on Management Science and Engineering 20th Annual Conference Proceedings ◽

10.1109/icmse.2013.6586296 ◽

2013 ◽

Cited By ~ 1

Author(s):

Chen Xiao-qian ◽

Li Hui-tao

Keyword(s):

Gray Relational Analysis ◽

Pay Satisfaction ◽

Relational Analysis

A data-driven missing value imputation approach for longitudinal datasets

Artificial Intelligence Review ◽

10.1007/s10462-021-09963-5 ◽

2021 ◽

Author(s):

Caio Ribeiro ◽

Alex A. Freitas

Keyword(s):

Missing Data ◽

Longitudinal Data ◽

Missing Values ◽

Error Rates ◽

Imputation Method ◽

Data Driven ◽

Missing Value ◽

Missing Value Imputation ◽

Human Ageing ◽

Imputation Approach

Longitudinal datasets of human ageing studies usually have a high volume of missing data, and one way to handle missing values in a dataset is to replace them with estimations. However, there are many methods to estimate missing values, and no single method is the best for all datasets. In this article, we propose a data-driven missing value imputation approach that performs a feature-wise selection of the best imputation method, using known information in the dataset to rank the five methods we selected, based on their estimation error rates. We evaluated the proposed approach in two sets of experiments: a classifier-independent scenario, where we compared the applicabilities and error rates of each imputation method; and a classifier-dependent scenario, where we compared the predictive accuracy of Random Forest classifiers generated with datasets prepared using each imputation method and a baseline approach of doing no imputation (letting the classification algorithm handle the missing values internally). Based on our results from both sets of experiments, we concluded that the proposed data-driven missing value imputation approach generally resulted in models with more accurate estimations for missing data and better performing classifiers, in longitudinal datasets of human ageing. We also observed that imputation methods devised specifically for longitudinal data had very accurate estimations. This reinforces the idea that using the temporal information intrinsic to longitudinal data is a worthwhile endeavour for machine learning applications, and that can be achieved through the proposed data-driven approach.
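The feature-wise selection idea can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the five methods, so mean and median imputation stand in as hypothetical candidates, and the hold-out scheme (hiding every fifth known value of a feature and scoring each candidate by mean absolute error) is an assumption made for the example. Each feature needs at least two known values.

```python
import numpy as np

# Hypothetical stand-ins for the candidate imputation methods.
CANDIDATES = {
    "mean": lambda known: float(np.mean(known)),
    "median": lambda known: float(np.median(known)),
}

def pick_method_per_feature(X, step=5):
    """For each column, return the candidate with the lowest hold-out error.

    Known values are temporarily hidden (every `step`-th known entry), each
    candidate estimates them from the remaining known values, and the
    candidate with the smallest mean absolute error wins.
    """
    X = np.asarray(X, dtype=float)
    choice = {}
    for j in range(X.shape[1]):
        known_idx = np.flatnonzero(~np.isnan(X[:, j]))
        hidden = known_idx[::step]               # entries treated as missing
        kept = np.setdiff1d(known_idx, hidden)   # entries used for estimation
        errors = {
            name: float(np.mean(np.abs(X[hidden, j] - estimate(X[kept, j]))))
            for name, estimate in CANDIDATES.items()
        }
        choice[j] = min(errors, key=errors.get)
    return choice

# Made-up data: column 0 has an outlier and a missing value, column 1 does not.
nan = float("nan")
X = [[0, 4], [0, 0], [0, 0], [0, 0], [0, 0], [0, 4],
     [0, 0], [0, 10], [0, 10], [90, 10], [nan, 4]]
best = pick_method_per_feature(X)
```

Because the ranking is computed separately for each feature, different columns of the same dataset can end up with different imputation methods, which is the point of the data-driven selection described above.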
