KGEARSRG: Kernel Graph Embedding on Attributed Relational SIFT-Based Regions Graph

Mario Manzo

doi:10.3390/make1030055

Machine Learning and Knowledge Extraction

10.3390/make1030055

2019

Vol 1 (3)

pp. 962-973

Cited By ~ 1

Author(s):

Mario Manzo

Keyword(s):

Binary Classification

Class Imbalance

Graph Embedding

Support Vector

Class Imbalance Problem

Scale Invariant

Imbalance Problem

Series Of Experiments

Imbalanced Classes

Scale Invariant Feature

In real-world applications, binary classification is often affected by imbalanced classes. In this paper, a new methodology is proposed to solve the class imbalance problem that occurs in image classification. A digital image is described through a novel vector-based representation called Kernel Graph Embedding on Attributed Relational Scale-Invariant Feature Transform-based Regions Graph (KGEARSRG). Classification is then performed by a procedure based on support vector machines (SVMs). The methodology is evaluated through a series of experiments on images from an art painting dataset with varying imbalance percentages. Experimental results show that the proposed approach consistently outperforms the competing methods.

Tackling Class Imbalance Problem in Binary Classification using Augmented Neighborhood Cleaning Algorithm

Lecture Notes in Electrical Engineering - Information Science and Applications

10.1007/978-3-662-46578-3_98

2015

pp. 827-834

Cited By ~ 1

Author(s):

Nadyah Obaid Al Abdouli

Zeyar Aung

Wei Lee Woon

Davor Svetinovic

Keyword(s):

Binary Classification

Class Imbalance

Class Imbalance Problem

Imbalance Problem

Biased support vector machine and weighted-smote in handling class imbalance problem

International Journal of Advances in Intelligent Informatics

10.26555/ijain.v4i1.146

2018

Vol 4 (1)

pp. 21

Cited By ~ 21

Author(s):

Hartono Hartono

Opim Salim Sitompul

Tulus Tulus

Erna Budhiarti Nababan

Keyword(s):

Machine Learning

Support Vector Machine

Data Distribution

Class Imbalance

Support Vector

Class Imbalance Problem

Precise Method

Imbalance Problem

Class imbalance occurs when the number of instances in one class is much higher than in the other classes. This major machine learning problem can degrade prediction accuracy. The Support Vector Machine (SVM) is a robust and precise method for handling the class imbalance problem but is weak under biased data distributions; the Biased Support Vector Machine (BSVM) has become a popular choice to solve this problem. BSVM provides better control of sensitivity but lower accuracy than the standard SVM. This study proposes the integration of BSVM and SMOTEBoost to handle the class imbalance problem. Non-Support Vector (NSV) sets from negative samples and Support Vector (SV) sets from positive samples undergo a Weighted-SMOTE process. The results indicate that combining the Biased Support Vector Machine with Weighted-SMOTE achieves better accuracy and sensitivity.
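
The interpolation step that SMOTE-based methods build on can be sketched as follows. This is a minimal, hypothetical sketch of standard SMOTE-style oversampling in pure Python (the function name `smote` and its parameters are assumptions); it does not implement the Weighted-SMOTE variant or its integration with BSVM described above. Each synthetic point lies on the segment between a minority sample and one of its k nearest minority neighbours.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by SMOTE-style interpolation.

    minority: list of feature vectors (lists of floats), at least two samples.
    Each synthetic point lies on the segment joining a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)

    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: sq_dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
print(len(smote(minority, n_new=6, k=2)))  # 6
```

Because every new point is a convex combination of two real minority samples, oversampling stays inside the region already occupied by the minority class; weighted variants change which samples are chosen as `base`, not the interpolation itself.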

Pulsar candidate classification using generative adversary networks

Monthly Notices of the Royal Astronomical Society

10.1093/mnras/stz2975

2019

Vol 490 (4)

pp. 5424-5439

Cited By ~ 3

Author(s):

Ping Guo

Fuqing Duan

Pei Wang

Yao Yao

Qian Yin

...

Keyword(s):

Class Imbalance

Feature Learning

Computer Experiments

Support Vector

Data Sets

Class Imbalance Problem

Generative Adversarial Network

Feature Representations

Adversarial Network

Imbalance Problem

Discovering pulsars is a significant and meaningful research topic in radio astronomy. With the advent of new astronomical instruments, the volume and rate of data acquisition have grown exponentially. This development calls for artificial intelligence (AI) technologies that can mine large astronomical data sets. Automatic pulsar candidate identification (APCI) can be considered the task of selecting potential candidates for further investigation while eliminating radio-frequency interference and other non-pulsar signals. As reported in the existing literature, AI techniques, especially convolutional neural network (CNN)-based techniques, have been adopted for APCI. However, it is challenging to improve the performance of CNN-based pulsar identification because only an extremely limited number of real pulsar samples exist, which results in a severe class imbalance problem. To address these problems, we propose a framework that combines a deep convolutional generative adversarial network (DCGAN) with a support vector machine (SVM). The DCGAN is used as a sample generation and feature learning model, and the SVM is adopted as the classifier that predicts the label of a candidate at the inference stage. The proposed framework not only addresses the class imbalance problem but also learns discriminative feature representations of pulsar candidates instead of relying on hand-crafted features computed during pre-processing. The proposed method improves the accuracy of APCI, and experiments on two pulsar data sets verify its effectiveness and efficiency.

HCBST: An Efficient Hybrid Sampling Technique for Class Imbalance Problems

ACM Transactions on Knowledge Discovery from Data

10.1145/3488280

2022

Vol 16 (3)

pp. 1-37

Author(s):

Robert A. Sowah

Bernard Kuditchar

Godfrey A. Mills

Amevi Acakpovi

Raphael A. Twum

...

Keyword(s):

Geometric Mean

Class Imbalance

Sampling Technique

Data Repository

Support Vector

Classification Algorithms

Class Imbalance Problem

Imbalance Problem

High Degree

Hybrid Sampling

The class imbalance problem is prevalent in many real-world domains and has become an active area of research. In binary classification, imbalance learning refers to learning from a dataset that is highly skewed toward the negative class. This skew causes classification algorithms to perform poorly when predicting the positive class for new examples. Data resampling, which manipulates the training data before standard classification techniques are applied, is among the most commonly used techniques for dealing with the class imbalance problem. This article presents a new hybrid sampling technique that significantly improves the overall performance of classification algorithms on the class imbalance problem. The proposed method, called the Hybrid Cluster-Based Undersampling Technique (HCBST), combines a cluster undersampling technique that undersamples the majority instances with an oversampling technique, derived from Sigma Nearest Oversampling based on Convex Combination, that oversamples the minority instances, solving the class imbalance problem with a high degree of accuracy and reliability. The performance of the proposed algorithm was tested on 11 datasets with varying degrees of imbalance from the National Aeronautics and Space Administration (NASA) Metric Data Program data repository and the University of California Irvine (UCI) Machine Learning data repository. Results were compared using classification algorithms such as K-nearest neighbours, support vector machines, decision tree, random forest, neural network, AdaBoost, naïve Bayes, and quadratic discriminant analysis. Test results revealed that, on the same datasets, HCBST performed better, with average scores of 0.73, 0.67, and 0.35 for area under the curve, geometric mean, and Matthews correlation coefficient, respectively, across all the classifiers used in this study. HCBST has the potential to improve performance on class imbalance problems and, by extension, the various applications that rely on such classification.
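
The geometric mean and Matthews correlation coefficient reported above are both computed from the binary confusion matrix. The sketch below is a minimal illustration, assuming the minority class is treated as positive and that a zero denominator maps to 0 (a common convention, not necessarily the one used in the study); `imbalance_metrics` is a hypothetical helper name.

```python
import math


def imbalance_metrics(tp, fn, fp, tn):
    """Return (G-mean, MCC) from binary confusion-matrix counts.

    The minority class is treated as positive; a zero denominator yields 0.0.
    """
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # recall on positives
    specificity = tn / (tn + fp) if tn + fp else 0.0  # recall on negatives
    g_mean = math.sqrt(sensitivity * specificity)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return g_mean, mcc


# 95 negatives, 5 positives, every sample predicted negative: 95% accuracy,
# yet both imbalance-aware scores are 0.
print(imbalance_metrics(tp=0, fn=5, fp=0, tn=95))  # (0.0, 0.0)
```

The example shows why such measures are preferred over plain accuracy on skewed data: a classifier that ignores the positive class entirely still reaches 95% accuracy.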

Oversample Based Large Scale Support Vector Machine for Online Class Imbalance Problem

Big Data Analytics - Lecture Notes in Computer Science

10.1007/978-3-030-04780-1_24

2018

pp. 348-362

Author(s):

D. Himaja

T. Maruthi Padmaja

P. Radha Krishna

Keyword(s):

Support Vector Machine

Large Scale

Class Imbalance

Support Vector

Class Imbalance Problem

Online Class

Imbalance Problem

Exploiting Correlation Subspace to Predict Heterogeneous Cross-Project Defects

International Journal of Software Engineering and Knowledge Engineering

10.1142/s0218194016710017

2016

Vol 26 (09n10)

pp. 1571-1580

Cited By ~ 6

Author(s):

Ming Cheng

Guoqing Wu

Hongyan Wan

Guoan You

Mengting Yuan

...

Keyword(s):

Class Imbalance

Imbalanced Data

Feature Space

Support Vector

Class Imbalance Problem

Classifier Design

Imbalance Problem

Project Data

The Impact

Cross Project

Cross-project defect prediction trains a prediction model using historical data from source projects and applies the model to target projects. Most previous efforts assumed that cross-project data share the same metric set, meaning that both the metrics used and the number of metrics are the same. However, this assumption may not hold in practice. In addition, software defect datasets suffer from class imbalance, which makes it harder for the learner to predict defects. In this paper, we extend canonical correlation analysis by deriving a joint feature space that associates cross-project data. We also propose a novel support vector machine algorithm that incorporates the correlation transfer information into classifier design for cross-project prediction. Moreover, we take different misclassification costs into consideration so that the classifier is inclined to classify a module as defective, alleviating the impact of imbalanced data. The experimental results show that our method is more effective than state-of-the-art methods.
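
The idea of assigning different misclassification costs can be illustrated with a class-weighted hinge loss. The sketch below is hypothetical: it assumes a plain linear SVM trained by full-batch subgradient descent and omits the canonical correlation feature space and correlation transfer term of the paper; the names `train_cost_sensitive_svm`, `predict`, `cost_pos`, and `cost_neg` are illustrative. Raising `cost_pos` makes margin violations on defective (+1) modules more expensive, so the optimiser trades more errors on clean modules for fewer missed defects.

```python
def train_cost_sensitive_svm(X, y, cost_pos=1.0, cost_neg=1.0,
                             lam=0.01, lr=0.1, epochs=200):
    """Linear SVM fitted by full-batch subgradient descent on
    (lam / 2) * ||w||^2 + (1 / n) * sum_i c(y_i) * max(0, 1 - y_i * (w . x_i + b)),
    where c(+1) = cost_pos (defective) and c(-1) = cost_neg (clean).
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # hinge term is active for this sample
                c = cost_pos if yi > 0 else cost_neg
                for j in range(d):
                    grad_w[j] -= c * yi * xi[j] / n
                grad_b -= c * yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return +1 (defective) or -1 (clean) from the sign of the decision function."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


X = [[2.0], [2.5], [-2.0], [-2.5], [-3.0], [-1.5]]
y = [1, 1, -1, -1, -1, -1]
w, b = train_cost_sensitive_svm(X, y, cost_pos=2.0)
print([predict(w, b, x) for x in X])  # [1, 1, -1, -1, -1, -1]
```

A per-class cost is the simplest form of this idea; the same weighting can also be applied per sample.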

Impact of class-imbalance on multi-class high-dimensional class prediction

Advances in Methodology and Statistics

10.51936/grxm1445

2012

Vol 9 (1)

Author(s):

Rok Blagus

Lara Lusa

Keyword(s):

Sample Size

Binary Classification

Class Imbalance

Imbalanced Data

Real Data

High Dimensional

Class Imbalance Problem

Class Prediction

Linear Discriminant

Imbalance Problem

The goal of multi-class supervised classification is to develop a rule that accurately predicts the class membership of new samples when the number of classes is larger than two. In this paper we consider high-dimensional class-imbalanced data: the number of variables greatly exceeds the number of samples, and the number of samples in each class is not equal. We focus on Friedman's one-versus-one approach for three-class problems and show how its class probabilities depend on the class probabilities from the binary classification sub-problems. We further explore its performance using diagonal linear discriminant analysis (DLDA) as a base classifier and compare it with multi-class DLDA, using simulated and real data. Our results show that class imbalance has a significant effect on the classification results: as in two-class problems, the classification is biased towards the majority class, and the problem is magnified when the number of variables is large. The amount of bias also depends jointly on the magnitude of the differences between the classes and on the sample size: the bias diminishes when the difference between the classes is larger or the sample size is increased. Variable selection also plays an important role in the class-imbalance problem, and the most effective strategy depends on the type of differences that exist between classes. DLDA seems to be among the classifiers least sensitive to class imbalance, and its use is also recommended for multi-class problems. Whenever possible, experiments should be planned using balanced data in order to avoid the class-imbalance problem.
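
How class priors pull a discriminant classifier toward the majority class can be seen in a small DLDA example. This is a hypothetical two-class illustration in pure Python (the helper names `dlda_fit` and `dlda_predict` are assumptions); it does not reproduce the one-versus-one analysis or the experiments of the paper. DLDA assumes a covariance matrix shared by all classes and restricted to its diagonal, so each class is scored by a variance-scaled squared distance to its mean plus a log-prior term.

```python
import math


def dlda_fit(X, y):
    """Fit DLDA: class means, pooled per-feature variances, class priors.

    Assumes every feature has non-zero pooled variance.
    """
    classes = sorted(set(y))
    n, d = len(X), len(X[0])
    means, priors = {}, {}
    for k in classes:
        rows = [x for x, label in zip(X, y) if label == k]
        means[k] = [sum(col) / len(rows) for col in zip(*rows)]
        priors[k] = len(rows) / n
    # Pooled within-class variance of each feature (diagonal covariance).
    var = [0.0] * d
    for x, label in zip(X, y):
        for j in range(d):
            var[j] += (x[j] - means[label][j]) ** 2
    var = [v / (n - len(classes)) for v in var]
    return means, var, priors


def dlda_predict(model, x, use_priors=True):
    """Pick the class k maximising
    -sum_j (x_j - mean_kj)^2 / var_j + 2 * log(prior_k)."""
    means, var, priors = model

    def score(k):
        s = -sum((xj - mj) ** 2 / vj for xj, mj, vj in zip(x, means[k], var))
        return s + (2 * math.log(priors[k]) if use_priors else 0.0)

    return max(means, key=score)


# 9 samples of class A (mean 0) versus 3 of class B (mean 4), one feature.
X = [[-1.0], [0.0], [1.0]] * 3 + [[3.0], [4.0], [5.0]]
y = ["A"] * 9 + ["B"] * 3
model = dlda_fit(X, y)
print(dlda_predict(model, [2.1]))                    # A (prior favours the majority)
print(dlda_predict(model, [2.1], use_priors=False))  # B (closer to the B mean)
```

The point at 2.1 is closer to the minority mean, yet the larger prior of class A flips the decision, which is the majority-class bias the abstract describes in its simplest form.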

A Novel Hybrid Sampling Algorithm for Solving Class Imbalance Problem in Big Data

Advances in Data Science and Adaptive Analysis

10.1142/s2424922x21500054

2021

pp. 2150005

Author(s):

Khyati Ahlawat

Anuradha Chug

Amit Prakash Singh

Keyword(s):

Big Data

Class Imbalance

Support Vector

Efficiency Gain

Learning Approaches

K Nearest Neighbor

Class Imbalance Problem

Sampling Algorithm

Imbalance Problem

Hybrid Sampling

The uneven distribution of classes in a dataset biases any standard classifier toward the majority class. Instances of the class of interest, being few in number, are generally ignored, and their correct classification, which is of paramount interest, is often overlooked when overall accuracy is calculated. Conventional machine learning approaches have therefore been refined to address this class imbalance problem. The challenge of imbalanced classes is more prevalent in big data scenarios because of their high volume. This study proposes a sampling solution based on cluster computing for handling class imbalance problems in big data. The newly proposed hybrid sampling algorithm (HSA) is assessed using three popular classification algorithms, namely support vector machine, decision tree, and k-nearest neighbor, based on balanced accuracy and elapsed time. The results are promising, with an efficiency gain of 42% compared with the traditional sampling solution, the synthetic minority oversampling technique (SMOTE). This work demonstrates the effectiveness of the distribution and clustering principle in imbalanced big data scenarios.
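
Balanced accuracy, one of the two criteria used to assess HSA, is the mean of the per-class recalls, so unlike overall accuracy it cannot be inflated by predicting only the majority class. A minimal sketch, with a hypothetical helper name:

```python
def accuracy_and_balanced_accuracy(y_true, y_pred):
    """Return (overall accuracy, balanced accuracy = mean per-class recall)."""
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return accuracy, sum(recalls) / len(recalls)


# 95 majority and 5 minority samples; the classifier always predicts "maj".
y_true = ["maj"] * 95 + ["min"] * 5
y_pred = ["maj"] * 100
print(accuracy_and_balanced_accuracy(y_true, y_pred))  # (0.95, 0.5)
```

A balanced accuracy of 0.5 for two classes is what a trivial majority-class predictor scores, which exposes the problem that the 0.95 overall accuracy hides.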

Unbalanced sentiment classification: an assessment of ANN in the context of sampling the majority class

10.7287/peerj.preprints.26618

2018

Author(s):

Rodrigo Moraes

João Francisco Valiati

Wilson Pires Gavião Neto

Keyword(s):

Sampling Method

Class Imbalance

Sentiment Classification

Support Vector

Class Imbalance Problem

Imbalance Problem

Learning Techniques

Vector Machines

Weighting Methods

Document Level

Many people make their opinions available on the Internet nowadays, and researchers have been proposing methods to automate the task of classifying textual reviews as positive or negative. Standard supervised learning techniques have been adopted for this task. In practice, positive reviews are far more abundant than negative ones. This imbalance poses challenges to learning-based methods, and data undersampling/oversampling are popular preprocessing techniques for overcoming the problem. A combination of sampling techniques and learning methods, such as Artificial Neural Networks (ANN) or Support Vector Machines (SVM), has been successfully adopted as a classification approach in many areas, yet the sentiment classification literature has not explored ANN in studies that involve sampling methods to balance data. Even the performance of SVM, which is widely used as a sentiment learner, has rarely been addressed in the context of a preceding sampling method. This paper addresses document-level sentiment analysis with unbalanced data and focuses on empirically assessing the performance of ANN when the (majority) set of positive reviews is undersampled. We adopted SVM as a baseline, since some studies have indicated that SVM is less subject to the class imbalance problem. Results are produced with a traditional bag-of-words model using popular feature selection and weighting methods. Our experiments indicated that SVM is more stable than ANN in highly unbalanced (80%) data scenarios. However, when random undersampling discards information, ANN outperforms SVM or produces comparable results.
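
Random undersampling of the majority set, the preprocessing step whose effect on ANN and SVM the study assesses, can be sketched as below. This is a hypothetical helper (the name `random_undersample` and the `ratio` parameter are assumptions); the study's exact sampling protocol and bag-of-words pipeline are not reproduced.

```python
import random


def random_undersample(X, y, majority_label, ratio=1.0, seed=0):
    """Randomly drop majority-class samples so that about
    ratio * (number of non-majority samples) majority samples remain.
    All non-majority samples are kept; original order is preserved."""
    rng = random.Random(seed)
    majority = [i for i, label in enumerate(y) if label == majority_label]
    others = [i for i, label in enumerate(y) if label != majority_label]
    keep_n = min(len(majority), round(ratio * len(others)))
    kept = sorted(rng.sample(majority, keep_n) + others)
    return [X[i] for i in kept], [y[i] for i in kept]


docs = [f"review{i}" for i in range(100)]
labels = ["pos"] * 80 + ["neg"] * 20  # positive reviews form the majority
X_bal, y_bal = random_undersample(docs, labels, majority_label="pos")
print(y_bal.count("pos"), y_bal.count("neg"))  # 20 20
```

In this example 60 of the 80 positive reviews are discarded, which is the information loss the abstract refers to when comparing ANN and SVM.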

A New Method of Support Vector Machine for Class Imbalance Problem

2009 International Joint Conference on Computational Sciences and Optimization

10.1109/cso.2009.169

2009

Cited By ~ 1

Author(s):

Li Yan

Danrui Xei

Zhe Du

Keyword(s):

Support Vector Machine

Class Imbalance

New Method

Support Vector

Class Imbalance Problem

Imbalance Problem
