REPMAC: A New Hybrid Approach to Highly Imbalanced Classification Problems

Author(s):  
Hernán Ahumada ◽  
Guillermo L. Grinblat ◽  
Lucas C. Uzal ◽  
Pablo M. Granitto ◽  
Alejandro Ceccatto
2020 ◽  
Vol 203 ◽  
pp. 106116
Author(s):  
Jianan Wei ◽  
Haisong Huang ◽  
Liguo Yao ◽  
Yao Hu ◽  
Qingsong Fan ◽  
...  

2011 ◽  
Vol 8 (4) ◽  
pp. 199-211
Author(s):  
Hernán Ahumada ◽  
Guillermo L. Grinblat ◽  
Lucas C. Uzal ◽  
Alejandro Ceccatto ◽  
Pablo M. Granitto

2018 ◽  
Vol 7 (2.14) ◽  
pp. 478 ◽  
Author(s):  
Hartono . ◽  
Opim Salim Sitompul ◽  
Erna Budhiarti Nababan ◽  
Tulus . ◽  
Dahlan Abdullah ◽  
...  

Data mining and machine learning techniques for classification generally assume a balanced class distribution. In practice, however, a dataset often contains one class represented by a large number of instances alongside classes with far fewer instances. This is known as the class imbalance problem. Classifier ensembles are often used to overcome class imbalance, and data diversity is one of the cornerstones of ensembles. An ideal ensemble should consist of accurate individual classifiers whose errors, when they occur, fall on different objects or instances. This research presents an overview and an experimental study of the Hybrid Approach Redefinition (HAR) method for handling class imbalance while also aiming for better data diversity. The study uses 6 datasets with different imbalance ratios and compares HAR with SMOTEBoost, a re-weighting method often used to handle class imbalance. The results show that data diversity is related to performance in imbalanced-learning ensembles and that the proposed method can obtain better data diversity.
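The two quantities this abstract relies on, imbalance ratio and ensemble diversity, can be illustrated with a minimal sketch. It is not the HAR method or the authors' diversity measure: the function names `imbalance_ratio` and `mean_pairwise_disagreement` are hypothetical, the toy labels and predictions are made up, and pairwise disagreement is used here only as one common way to quantify whether ensemble members err on different instances.

```python
from collections import Counter
from itertools import combinations


def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def mean_pairwise_disagreement(predictions):
    """Average fraction of instances on which two ensemble members disagree.

    `predictions` holds one prediction list per classifier, all of equal length.
    """
    pairs = list(combinations(predictions, 2))
    total = 0.0
    for a, b in pairs:
        total += sum(x != y for x, y in zip(a, b)) / len(a)
    return total / len(pairs)


# Toy data (illustrative only): nine majority instances, one minority instance.
labels = [0] * 9 + [1]

# Predictions of three hypothetical ensemble members on the same ten instances.
preds = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 1, 1],
]

ratio = imbalance_ratio(labels)
diversity = mean_pairwise_disagreement(preds)
```

A higher disagreement value means the members make different predictions more often, which is the property the abstract calls data diversity. It says nothing by itself about accuracy, so it would normally be read alongside a performance measure.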


2016 ◽  
Vol 6 (3) ◽  
pp. 173-188 ◽  
Author(s):  
Vladimir Stanovov ◽  
Eugene Semenkin ◽  
Olga Semenkina

Abstract: A novel approach for instance selection in classification problems is presented. This adaptive instance selection is designed to simultaneously decrease the amount of computational resources required and increase the classification quality achieved. The approach generates new training samples during the evolutionary process and changes the training set for the algorithm. The instance selection is guided by means of changing probabilities, so that the algorithm concentrates on problematic examples which are difficult to classify. The hybrid fuzzy classification algorithm with a self-configuration procedure is used as a problem solver. The classification quality is tested on 9 problem data sets from the KEEL repository. A special balancing strategy is used in the instance selection approach to improve the classification quality on imbalanced datasets. The results prove the usefulness of the proposed approach as compared with other classification methods.
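The idea of guiding instance selection by changing probabilities can be sketched roughly as follows. This is an assumption-laden illustration, not the authors' algorithm: the abstract does not give the update rule, so the multiplicative `boost`, the function names `update_weights` and `sample_training_subset`, and sampling with replacement are all hypothetical choices made for the example.

```python
import random


def update_weights(weights, misclassified, boost=2.0):
    """Multiply the weight of each misclassified instance by `boost`, then renormalise.

    The multiplicative rule is an assumption; the source does not specify one.
    """
    new = [w * boost if i in misclassified else w for i, w in enumerate(weights)]
    total = sum(new)
    return [w / total for w in new]


def sample_training_subset(weights, size, rng):
    """Draw `size` instance indices (with replacement) in proportion to their weights."""
    return rng.choices(range(len(weights)), weights=weights, k=size)


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Four instances start with equal selection probability.
weights = [0.25, 0.25, 0.25, 0.25]

# Suppose the current classifier misclassified instance 3.
weights = update_weights(weights, misclassified={3})

# The next training subset is drawn from the updated probabilities.
subset = sample_training_subset(weights, size=3, rng=rng)
```

After the update, instance 3 is twice as likely to be selected as any other instance, so later training subsets concentrate on the example that was hard to classify.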


2018 ◽  
Author(s):  
Sebastian Bittrich ◽  
Marika Kaden ◽  
Christoph Leberecht ◽  
Florian Kaiser ◽  
Thomas Villmann ◽  
...  

Abstract. Background: Machine learning strategies are prominent tools for data analysis. Especially in life sciences, they have become increasingly important to handle the growing datasets collected by the scientific community. Meanwhile, algorithms improve in performance, but also gain complexity, and tend to neglect interpretability and comprehensiveness of the resulting models. Results: Generalized Matrix Learning Vector Quantization (GMLVQ) is a supervised, prototype-based machine learning method and provides comprehensive visualization capabilities not present in other classifiers which allow for a fine-grained interpretation of the data. In contrast to commonly used machine learning strategies, GMLVQ is well-suited for imbalanced classification problems which are frequent in life sciences. We present a Weka plug-in implementing GMLVQ. The feasibility of GMLVQ is demonstrated on a dataset of Early Folding Residues (EFR) that have been shown to initiate and guide the protein folding process. Using 27 features, an area under the receiver operating characteristic of 76.6% was achieved which is comparable to other state-of-the-art classifiers. Conclusions: The application on EFR prediction demonstrates how an easy interpretation of classification models can promote the comprehension of biological mechanisms. The results shed light on the special features of EFR which were reported as most influential for the classification: EFR are embedded in ordered secondary structure elements and they participate in networks of hydrophobic residues. Visualization capabilities of GMLVQ are presented as we demonstrate how to interpret the results.

