A Comparative Study on TIBA Imputation Methods in FCMdd-Based Linear Clustering with Relational Data

2011 ◽  
Vol 2011 ◽  
pp. 1-10 ◽  
Author(s):  
Takeshi Yamamoto ◽  
Katsuhiro Honda ◽  
Akira Notsu ◽  
Hidetomo Ichihashi

Relational fuzzy clustering has been developed for extracting intrinsic cluster structures from relational data and was extended to a linear fuzzy clustering model based on the Fuzzy c-Medoids (FCMdd) concept, in which a Fuzzy c-Means (FCM)-like iterative algorithm is performed by defining linear cluster prototypes using two representative medoids for each linear prototype. In this paper, the FCMdd-type linear clustering model is further modified to handle incomplete data including missing values, and the applicability of several imputation methods is compared. Numerical experiments demonstrate that some pre-imputation strategies contribute to properly selecting the representative medoids of each cluster.
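The pre-imputation idea can be sketched minimally: fill the missing dissimilarities first, then run the FCMdd medoid-selection step on the completed matrix. The mean-fill strategy and the toy matrix below are illustrative assumptions, not the paper's exact scheme:

```python
import numpy as np

def mean_impute(D):
    """Fill missing dissimilarities (NaN) with the mean of the observed
    off-diagonal entries. One conceivable pre-imputation strategy."""
    D = D.astype(float).copy()
    mask = np.isnan(D)
    off_diag = ~np.eye(len(D), dtype=bool)
    D[mask] = np.nanmean(D[off_diag])
    np.fill_diagonal(D, 0.0)
    return D

def select_medoid(D, members):
    """FCMdd-style step: the medoid of a (crisp) cluster is the member
    object minimizing the summed dissimilarity to the other members."""
    sub = D[np.ix_(members, members)]
    return members[int(np.argmin(sub.sum(axis=1)))]

# Toy relational (dissimilarity) matrix with two missing entries.
D = np.array([[0.0,    1.0, 2.0, np.nan],
              [1.0,    0.0, 1.0, 5.0],
              [2.0,    1.0, 0.0, 4.0],
              [np.nan, 5.0, 4.0, 0.0]])
Dc = mean_impute(D)
m = select_medoid(Dc, [0, 1, 2, 3])  # medoid chosen on the completed matrix
```

With the missing entries filled in first, the medoid selection is well defined over all objects, which is the point the pre-imputation strategies address.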

Author(s):  
Katsuhiro Honda ◽  
Yoshihito Nakamura ◽  
Hidetomo Ichihashi

This paper proposes the simultaneous application of homogeneity analysis and fuzzy clustering to incomplete data. Taking into account the similarity between the loss function of homogeneity analysis and the least squares criterion of principal component analysis, we define a new objective function in a formulation similar to linear fuzzy clustering with missing values. Numerical experiments demonstrate the feasibility of the proposed method.


Author(s):  
Takeshi Yamamoto ◽  
Katsuhiro Honda ◽  
Akira Notsu ◽  
Hidetomo Ichihashi

Relational data is common in many real-world applications. Linear fuzzy clustering models have been extended to handle relational data based on the Fuzzy c-Medoids (FCMdd) framework. In this paper, with the goal of handling non-Euclidean data, the β-spread transformation of relational data matrices used in Non-Euclidean-type Relational Fuzzy (NERF) c-Means is applied before FCMdd-type linear cluster extraction. The β-spread transformation modifies the data elements so that the clustering criterion, the distances between objects and linear prototypes, avoids negative values. In numerical experiments, typical features of the proposed approach are demonstrated not only on artificially generated data but also in a document classification task with a document-keyword co-occurrence relation.
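The β-spread transformation itself is simple: it adds β to every off-diagonal entry of the dissimilarity matrix while leaving the zero diagonal intact, as in NERF c-Means. A minimal sketch (the β value here is arbitrary; NERF c-Means derives the smallest sufficient β from the data during iteration):

```python
import numpy as np

def beta_spread(D, beta):
    """NERF-style beta-spread: add beta to every off-diagonal entry of
    the relational (dissimilarity) matrix; the diagonal stays zero."""
    n = len(D)
    return D + beta * (np.ones((n, n)) - np.eye(n))

D = np.array([[0.0, 1.0],
              [1.0, 0.0]])
Db = beta_spread(D, 0.5)  # off-diagonal entries shift from 1.0 to 1.5
```

Shifting all off-diagonal dissimilarities upward by a sufficiently large β is what prevents the negative prototype-to-object distances that non-Euclidean relational matrices can otherwise produce.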


Author(s):  
Yuchi Kanzawa

In this paper, an entropy-regularized fuzzy clustering approach for non-Euclidean relational data and indefinite kernel data, which has not previously been discussed, is developed. This matters because relational data and kernel data are not always Euclidean and positive semi-definite, respectively. It is shown theoretically that the entropy-regularized approach can be applied to both non-Euclidean relational data and indefinite kernel data without using a β-spread transformation, and that two other options make the clustering results crisp for both data types. These results contrast with those of the standard approach. Numerical experiments verify the theoretical results and compare the clustering accuracy of three entropy-regularized approaches for non-Euclidean relational data and three for indefinite kernel data.
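The core of an entropy-regularized approach is its membership update, in which memberships are proportional to exp(−d/λ) rather than the inverse-distance form of standard fuzzy c-means. A generic sketch of that update, not Kanzawa's full relational or kernel algorithm:

```python
import numpy as np

def entropy_memberships(d, lam):
    """Entropy-regularized membership update: u_ik proportional to
    exp(-d_ik / lam), row-normalized so each object's memberships sum
    to 1. Small lam approaches crisp assignment; large lam flattens
    memberships toward uniform."""
    w = np.exp(-np.asarray(d, float) / lam)
    return w / w.sum(axis=1, keepdims=True)

# Rows: objects; columns: squared distances to two cluster prototypes.
d = np.array([[0.1, 2.0],
              [2.0, 0.1]])
U = entropy_memberships(d, 0.5)  # each object favors its nearer cluster
```

Because the update never divides by a distance, it remains well defined even when a non-Euclidean matrix yields awkward distance values, which is one intuition behind skipping the β-spread step.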


2021 ◽  
Vol 25 (4) ◽  
pp. 825-846
Author(s):  
Ahmad Jaffar Khan ◽  
Basit Raza ◽  
Ahmad Raza Shahid ◽  
Yogan Jaya Kumar ◽  
Muhammad Faheem ◽  
...  

Almost all real-world datasets contain missing values. Classification of data with missing values can adversely affect classifier performance if not handled correctly. A common approach to classification with incomplete data is imputation, which transforms incomplete data with missing values into complete data. Single imputation methods are mostly less accurate than multiple imputation methods, which are often computationally much more expensive. This study proposes an imputed feature selected bagging (IFBag) method, which uses multiple imputation, feature selection, and a bagging ensemble learning approach to construct a number of base classifiers that classify new incomplete instances without any need for imputation in the testing phase. In bagging, the data is resampled multiple times with replacement, which introduces diversity into the data and thus yields more accurate classifiers. The experimental results show the proposed IFBag method is considerably faster and achieves 97.26% accuracy for classification with incomplete data compared to commonly used methods.
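The IFBag pipeline can be sketched schematically: build several stochastically imputed copies of the training data, bootstrap-resample each, fit one base classifier per bag, and combine predictions by majority vote, with incomplete test instances classified on their observed features only. Every concrete choice below (noisy mean imputation standing in for proper multiple imputation, a nearest-centroid base learner, and no feature-selection step) is an illustrative assumption, not the paper's method:

```python
import numpy as np

rng = np.random.default_rng(0)

def impute_once(X, rng):
    """One stochastic mean imputation: fill NaNs with the column mean plus
    Gaussian noise, so repeated calls yield distinct completed datasets."""
    X = X.copy()
    for j in range(X.shape[1]):
        col = X[:, j]
        miss = np.isnan(col)
        col[miss] = np.nanmean(col) + rng.normal(0, np.nanstd(col), miss.sum())
    return X

def fit_centroids(X, y):
    """Base learner: per-class centroids (deliberately simple)."""
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def predict_incomplete(centroids, x):
    """Classify an incomplete instance using only its observed features,
    so no imputation is needed at test time."""
    obs = ~np.isnan(x)
    return min(centroids, key=lambda c: np.sum((centroids[c][obs] - x[obs]) ** 2))

def ifbag_predict(X, y, x_new, n_bags=5):
    votes = []
    for _ in range(n_bags):
        Xi = impute_once(X, rng)
        idx = rng.integers(0, len(Xi), len(Xi))  # bootstrap: with replacement
        votes.append(predict_incomplete(fit_centroids(Xi[idx], y[idx]), x_new))
    return max(set(votes), key=votes.count)      # majority vote

X = np.array([[0.0, 0.1], [0.1, np.nan], [5.0, 5.1], [np.nan, 5.0]])
y = np.array([0, 0, 1, 1])
pred = ifbag_predict(X, y, np.array([np.nan, 5.2]))  # incomplete test instance
```

The point of the construction is the last line: the test instance still contains a missing value, yet each base classifier can score it on the observed dimensions alone.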


2020 ◽  
Vol 2020 ◽  
pp. 1-11
Author(s):  
Kamran Mehrabani-Zeinabad ◽  
Marziyeh Doostfatemeh ◽  
Seyyed Mohammad Taghi Ayatollahi

Missing data is one of the most important causes of reduced classification accuracy. Many real datasets suffer from missing values, especially in the medical sciences. Imputation is a common way to deal with incomplete datasets. Various imputation methods can be applied, and the choice of the best method depends on dataset conditions such as sample size, missing percentage, and missing mechanism. Therefore, the better solution is to classify incomplete datasets without imputation and without any loss of information. The structure of the Bayesian Additive Regression Trees (BART) model is improved with the Missingness Incorporated in Attributes (MIA) approach to address its inefficiency in handling missingness. The implementation of MIA within BART is named BART.m. As the abilities of BART.m in classifying incomplete datasets had not been investigated, this simulation-based study aimed to provide such a resource. The results indicate that BART.m can be used even for datasets with 90% missing values and, more importantly, that it diagnoses irrelevant variables and removes them on its own. BART.m outperforms common models for classification with incomplete data in terms of accuracy and computational time. Based on these properties, BART.m is a high-accuracy model for classifying incomplete datasets that avoids extra assumptions and preprocessing steps.
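The MIA idea can be approximated at the data level: duplicate each feature that has missing entries so that, at any tree split, missing cases can be routed to either branch. A simplified illustration of the concept, not the BART.m machinery itself:

```python
import numpy as np

def mia_augment(X):
    """Missingness Incorporated in Attributes, approximated at the data
    level: each feature with missing entries is duplicated, once with NaN
    mapped below all observed values (missing goes left at any split) and
    once above (missing goes right), so a standard tree learner can choose
    the better routing. An illustrative sketch only."""
    cols = []
    for j in range(X.shape[1]):
        col = X[:, j].astype(float)
        if np.isnan(col).any():
            lo, hi = np.nanmin(col), np.nanmax(col)
            cols.append(np.where(np.isnan(col), lo - 1.0, col))  # missing -> left
            cols.append(np.where(np.isnan(col), hi + 1.0, col))  # missing -> right
        else:
            cols.append(col)
    return np.column_stack(cols)

X = np.array([[1.0,    7.0],
              [np.nan, 8.0],
              [3.0,    9.0]])
Xa = mia_augment(X)  # feature 0 becomes two columns; feature 1 stays one
```

This is why MIA-style models need no imputation step: missingness becomes part of the split search itself rather than something to be filled in beforehand.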


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Liang Jin ◽  
Yingtao Bi ◽  
Chenqi Hu ◽  
Jun Qu ◽  
Shichen Shen ◽  
...  

The presence of missing values (MVs) in label-free quantitative proteomics greatly reduces the completeness of data. Imputation has been widely utilized to handle MVs, and selection of the proper method is critical for the accuracy and reliability of imputation. Here we present a comparative study that evaluates the performance of seven popular imputation methods on a large-scale benchmark dataset and an immune cell dataset. Simulated MVs were incorporated into the complete part of each dataset with different combinations of MV rates and missing-not-at-random (MNAR) rates. Normalized root mean square error (NRMSE) was applied to evaluate the accuracy of protein abundances and intergroup protein ratios after imputation. Detection of true positives (TPs) and the false altered-protein discovery rate (FADR) between groups were also compared using the benchmark dataset. Furthermore, the accuracy of handling real MVs was assessed by comparing enriched pathways and signature genes of cell activation after imputing the immune cell dataset. We observed that the accuracy of imputation is primarily affected by the MNAR rate rather than the MV rate, and that downstream analysis can be largely impacted by the selection of imputation methods. A random forest-based imputation method consistently outperformed other popular methods by achieving the lowest NRMSE, a high number of TPs with an average FADR < 5%, and the best detection of relevant pathways and signature genes, highlighting it as the most suitable method for label-free proteomics.
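The NRMSE evaluation metric can be sketched directly; note that normalization conventions vary across papers, and the standard-deviation normalization below is one common choice, assumed here for illustration:

```python
import numpy as np

def nrmse(true, imputed):
    """Normalized root mean square error between true and imputed values,
    normalized by the standard deviation of the true values (one common
    convention among several). Lower is better; 0 means perfect recovery."""
    true, imputed = np.asarray(true, float), np.asarray(imputed, float)
    return float(np.sqrt(np.mean((true - imputed) ** 2)) / np.std(true))

truth = np.array([1.0, 2.0, 3.0, 4.0])
val = nrmse(truth, truth)  # perfect imputation scores 0
```

In a benchmark like the one described, the `true` values are the entries that were masked out to simulate MVs, and `imputed` is what each method fills back in.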


2017 ◽  
Vol 10 (04) ◽  
pp. 773-779
Author(s):  
V.B. Kamble ◽  
S.N. Deshmukh

The presence of missing values in a dataset makes data analysis difficult in data mining tasks. In this work, a student dataset containing marks in four different subjects at an engineering college is used. Mean, mode, and median imputation are applied to deal with the challenges of incomplete data. Mean Squared Error (MSE) and Root Mean Squared Error (RMSE) are computed on the dataset for the proposed method and for the mean, mode, and median imputation methods, and classification accuracy is also measured for the proposed method combined with each imputation technique. Experimental observation shows that MSE and RMSE gradually decrease as the database size increases under the proposed method, whereas they gradually increase with database size under the simple imputation techniques. Accuracy also increases as the database size grows.
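The three simple imputation strategies and an RMSE comparison can be sketched on a toy marks column (the data below is invented for illustration):

```python
import numpy as np

def impute(col, strategy):
    """Fill NaNs in a 1-D array of marks with the column mean, median,
    or mode (most frequent observed value)."""
    col = col.astype(float).copy()
    obs = col[~np.isnan(col)]
    if strategy == "mean":
        fill = obs.mean()
    elif strategy == "median":
        fill = np.median(obs)
    else:  # mode
        vals, counts = np.unique(obs, return_counts=True)
        fill = vals[np.argmax(counts)]
    col[np.isnan(col)] = fill
    return col

def rmse(true, est):
    return float(np.sqrt(np.mean((np.asarray(true) - np.asarray(est)) ** 2)))

marks = np.array([60.0, 70.0, 70.0, np.nan, 90.0])  # one missing mark
truth = np.array([60.0, 70.0, 70.0, 80.0, 90.0])    # ground truth for scoring
errors = {s: rmse(truth, impute(marks, s)) for s in ("mean", "median", "mode")}
```

Comparing the entries of `errors` against a ground-truth column is the same style of MSE/RMSE evaluation the abstract describes, just at toy scale.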


Author(s):  
Katsuhiro Honda ◽  
Takeshi Yamamoto ◽  
Akira Notsu ◽  
Hidetomo Ichihashi

Visualization is a fundamental approach for revealing intrinsic structures in multidimensional observations. This paper considers visualization of non-Euclidean relational data by extracting local linear substructures. In order to extract robust linear clusters, an FCMdd-based linear fuzzy clustering model is applied in conjunction with a robust measure from alternative c-means. Non-Euclidean data matrices are handled with the β-spread transformation in a manner similar to that of NERF c-Means. In several experiments, robust feature maps derived by the robust clustering model are compared with feature maps given by the conventional clustering model and Multi-Dimensional Scaling (MDS).


2021 ◽  
Author(s):  
Heru Nugroho ◽  
Nugraha Priya Utama ◽  
Kridanto Surendro

Missing data is one of the factors that often causes incomplete data in research. Data normalization and missing value handling are considered major problems in the data pre-processing stage, while classification algorithms are adopted to handle numerical features. Furthermore, when the observed data contains outliers, the estimated missing values are sometimes unreliable, or even differ greatly from the true values. This study proposes combining normalization and outlier removal before imputing missing values with several methods: mean, random value, regression, multiple imputation, KNN, and C3-FA. Experimental results on the sonar dataset show the effect of normalization and outlier removal on these imputation methods. With the proposed C3-FA method, this produced accuracy, F1-score, precision, and recall values of 0.906, 0.906, 0.908, and 0.906, respectively. Based on the KNN classifier evaluation, these values outperformed the other five methods. Meanwhile, the RMSE, Dks, and r values obtained by combining normalization and outlier removal with the C3-FA method were 0.02, 0.04, and 0.935, respectively. This shows that the proposed method is able to reproduce the real values of the data (prediction accuracy) and maintain the distribution of the values (distribution accuracy).
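The preprocessing order described, normalize, then remove outliers, then impute, can be sketched with min-max normalization, IQR-based outlier removal, and a simple mean imputer standing in for the paper's methods (all three concrete choices are assumptions for illustration):

```python
import numpy as np

def minmax_normalize(x):
    """Min-max normalization to [0, 1] over the observed (non-NaN) entries."""
    lo, hi = np.nanmin(x), np.nanmax(x)
    return (x - lo) / (hi - lo)

def drop_outliers(x, k=1.5):
    """Mark IQR outliers as NaN so the imputer re-estimates them
    (one plausible reading of 'outlier removal before imputation')."""
    q1, q3 = np.nanpercentile(x, [25, 75])
    iqr = q3 - q1
    x = x.copy()
    x[(x < q1 - k * iqr) | (x > q3 + k * iqr)] = np.nan
    return x

def mean_impute(x):
    x = x.copy()
    x[np.isnan(x)] = np.nanmean(x)
    return x

raw = np.array([0.2, 0.3, np.nan, 0.25, 9.0])  # 9.0 is an outlier
clean = mean_impute(drop_outliers(minmax_normalize(raw)))
```

Turning outliers into missing values and letting the imputer re-estimate them keeps the fill-in step from being dragged toward extreme observations, which is the motivation the abstract gives for the combined preprocessing.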

