Improved PSO_AdaBoost Ensemble Algorithm for Imbalanced Data

Sensors ◽  
2019 ◽  
Vol 19 (6) ◽  
pp. 1476 ◽  
Author(s):  
Kewen Li ◽  
Guangyue Zhou ◽  
Jiannan Zhai ◽  
Fulai Li ◽  
Mingwen Shao

The Adaptive Boosting (AdaBoost) algorithm is a widely used ensemble learning framework that achieves good classification results on general datasets. However, it is challenging to apply AdaBoost directly to imbalanced data, since it is designed mainly to handle misclassified samples rather than minority-class samples. To better process imbalanced data, this paper introduces the Area Under Curve (AUC) indicator, which reflects the overall performance of the model, and proposes an improved AdaBoost algorithm based on AUC (AdaBoost-A), which improves the error calculation of AdaBoost by jointly considering the misclassification probability and AUC. To prevent the redundant or useless weak classifiers generated by the traditional AdaBoost algorithm from consuming excessive system resources, this paper also proposes an ensemble algorithm, PSOPD-AdaBoost-A, which can re-initialize parameters to avoid falling into a local optimum and optimizes the coefficients of the AdaBoost weak classifiers. Experimental results show that the proposed algorithm is effective for processing imbalanced data, especially data with a relatively high degree of imbalance.
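The idea of folding AUC into the AdaBoost error term can be illustrated with a minimal sketch. The abstract does not give the AdaBoost-A formula, so the blend used here (a convex combination of the weighted misclassification error and 1 - AUC, controlled by a hypothetical parameter `lam`) is an assumption for illustration only, as are the function names `adaboost_auc`, `best_stump`, and `predict`. The PSO-based coefficient optimization is not shown.

```python
import math

def auc(scores, labels):
    """Pairwise AUC: share of (positive, negative) pairs ranked correctly.
    Ties count as 0.5. Assumes both classes (+1 and -1) are present."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == -1]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def best_stump(xs, ys, w):
    """Return (error, threshold, polarity, predictions) for the 1-D
    threshold stump with the lowest weighted misclassification error."""
    best = None
    for t in sorted(set(xs)):
        for pol in (1, -1):
            preds = [pol if x >= t else -pol for x in xs]
            err = sum(wi for wi, p, y in zip(w, preds, ys) if p != y)
            if best is None or err < best[0]:
                best = (err, t, pol, preds)
    return best

def adaboost_auc(xs, ys, rounds=5, lam=0.5):
    """AdaBoost loop whose error term blends misclassification and AUC.
    The blend below is an ASSUMED form, not the paper's formula."""
    n = len(xs)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, t, pol, preds = best_stump(xs, ys, w)
        blended = lam * err + (1 - lam) * (1 - auc(preds, ys))
        blended = min(max(blended, 1e-10), 1 - 1e-10)  # keep the log finite
        if blended >= 0.5:  # no better than chance: stop adding learners
            break
        alpha = 0.5 * math.log((1 - blended) / blended)
        ensemble.append((alpha, t, pol))
        # Standard AdaBoost re-weighting: misclassified samples gain weight.
        w = [wi * math.exp(-alpha * y * p) for wi, y, p in zip(w, ys, preds)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * (pol if x >= t else -pol) for a, t, pol in ensemble)
    return 1 if score >= 0 else -1

# Toy imbalanced data: 8 majority (-1) samples, 2 minority (+1) samples.
xs = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
ys = [-1] * 8 + [1] * 2
model = adaboost_auc(xs, ys)
```

With `lam=1.0` the loop reduces to ordinary AdaBoost on the weighted error; lowering `lam` makes a weak classifier that ranks minority samples poorly receive a smaller coefficient.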

2016 ◽  
Vol 26 (03) ◽  
pp. 1750007 ◽  
Author(s):  
S. Dinakaran ◽  
P. Ranjit Jeba Thangaiah

This article introduces a novel ensemble method named eAdaBoost (Effective Adaptive Boosting), a meta classifier developed by enhancing the existing AdaBoost algorithm to reduce time complexity and achieve the best classification accuracy. eAdaBoost reduces the error rate compared with existing methods and achieves the best accuracy by reweighting each feature for further processing. An extensive experimental evaluation of the proposed method is reported on datasets from the UCI machine learning repository. Classifier accuracy and statistical tests are compared against various boosting algorithms. The proposed eAdaBoost has also been implemented with different decision-tree-based classifiers such as C4.5, Decision Stump, NB Tree and Random Forest. The algorithm has been run on various datasets with different weight thresholds, and its performance is analyzed. For a few datasets, the proposed method produces better results with Random Forest and NB Tree as base classifiers than with Decision Stump and C4.5. eAdaBoost gives better classification and prediction accuracy, and its execution time is lower than that of the other classifiers.


2021 ◽  
Author(s):  
Tetiana Biloborodova ◽  
Inna Skarga-Bandurova ◽  
Mark Koverha ◽  
Illia Skarha-Bandurov ◽  
Yelyzaveta Yevsieieva

Medical image classification and diagnosis based on machine learning have made significant progress and gradually penetrated the healthcare industry. However, characteristics of medical data, such as relatively small datasets for rare diseases or imbalanced class distributions for rare conditions, significantly restrain their adoption and reuse. Imbalanced datasets make it difficult to learn accurate predictive models. This paper follows the FAIR paradigm and proposes a technique for aligning the class distribution, which improves image classification performance on imbalanced data and supports data reuse. Experiments on the acne disease dataset show that the proposed framework outperforms the baselines and achieves up to a 5% improvement in image classification.
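The abstract does not say how the class distribution is aligned. As a rough illustration of the general idea only, the sketch below uses random oversampling, a common baseline that duplicates minority-class samples until every class matches the largest one. The function name `oversample_to_balance` and the toy labels are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter, defaultdict

def oversample_to_balance(samples, labels, seed=0):
    """Duplicate randomly chosen minority-class samples until every class
    has as many samples as the largest class (random oversampling).
    This is a generic baseline, NOT the technique proposed in the paper."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)

    target = max(len(items) for items in by_class.values())
    out_samples, out_labels = [], []
    for label, items in by_class.items():
        # Draw with replacement only as many extras as the class is missing.
        extras = [rng.choice(items) for _ in range(target - len(items))]
        for sample in items + extras:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels

# Toy example: 8 images of a common condition, 2 of a rare one.
samples = ["img%d" % i for i in range(10)]
labels = ["common"] * 8 + ["rare"] * 2
new_samples, new_labels = oversample_to_balance(samples, labels)
counts = Counter(new_labels)
```

For image data, the duplicated samples would usually be augmented (flipped, cropped, recolored) rather than copied verbatim, since exact copies encourage overfitting to the few minority examples.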


2017 ◽  
Author(s):  
Arne Ehlers

This dissertation addresses the problem of visual object detection based on machine-learned classifiers. A distributed machine learning framework is developed to learn detectors for several object classes, creating cascaded ensemble classifiers with the Adaptive Boosting algorithm. Methods are proposed that enhance several components of an object detection framework. First, the thesis deals with augmenting the training data in order to improve the performance of object detectors learned from sparse training sets. Second, feature mining strategies are introduced to create feature sets that are customized to the object class to be detected. Furthermore, a novel class of fractal features is proposed that can represent a wide variety of shapes. Third, a method is introduced that models and combines internal confidences and uncertainties of the cascaded detector using Dempster’s theory of evidence in order to increase the quality of the post-processing. ...
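For readers unfamiliar with Dempster's theory of evidence, the sketch below shows Dempster's rule of combination for two mass functions over a two-element frame. How the dissertation derives masses from the cascaded detector is not stated in the abstract, so the frame (`object` vs. `background`), the example masses, and the names `dempster_combine`, `stage_a`, and `stage_b` are all hypothetical.

```python
def dempster_combine(m1, m2):
    """Dempster's rule of combination.
    m1, m2: dicts mapping frozenset hypotheses to masses that sum to 1.
    Mass on pairs with an empty intersection is treated as conflict and
    the remaining mass is renormalized by (1 - conflict)."""
    combined = {}
    conflict = 0.0
    for a, mass_a in m1.items():
        for b, mass_b in m2.items():
            intersection = a & b
            if intersection:
                combined[intersection] = (
                    combined.get(intersection, 0.0) + mass_a * mass_b
                )
            else:
                conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; rule is undefined")
    return {h: mass / (1.0 - conflict) for h, mass in combined.items()}

# Hypothetical frame of discernment for a single detection window.
OBJ = frozenset({"object"})
BG = frozenset({"background"})
EITHER = OBJ | BG  # mass placed here expresses uncertainty

# Hypothetical evidence from two sources (for example, two cascade stages).
stage_a = {OBJ: 0.6, EITHER: 0.4}
stage_b = {OBJ: 0.5, BG: 0.2, EITHER: 0.3}
fused = dempster_combine(stage_a, stage_b)
```

Unlike a plain probability, mass assigned to `EITHER` lets a source state that it is uncertain without committing to either hypothesis, which is what makes the rule suitable for combining confidences and uncertainties.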


2020 ◽  
Vol 34 (04) ◽  
pp. 6438-6445
Author(s):  
Yuan Wu ◽  
Yuhong Guo

With the advent of deep learning, the performance of text classification models has improved significantly. Nevertheless, training a good classification model requires a sufficient amount of labeled data, and annotating data is expensive and time-consuming. With the rapid growth of digital data, similar classification tasks often occur in multiple domains, while the availability of labeled data can vary widely across domains. Some domains may have abundant labeled data, while others may have only a limited amount, or none at all. Meanwhile, text classification tasks are highly domain-dependent: a text classifier trained in one domain may not perform well in another. To address these issues, this paper proposes a novel dual adversarial co-learning approach for multi-domain text classification (MDTC). The approach learns shared-private networks for feature extraction and deploys dual adversarial regularizations to align features across different domains and between labeled and unlabeled data simultaneously under a discrepancy-based co-learning framework, aiming to improve the classifiers' generalization capacity with the learned features. Experiments are conducted on multi-domain sentiment classification datasets. The results show that the proposed approach achieves state-of-the-art MDTC performance.

