Classification of Unbalanced Data Based on RSM and Binomial Distribution

Fuzzy Systems and Data Mining VI - Frontiers in Artificial Intelligence and Applications ◽

10.3233/faia200684 ◽

2020 ◽

Author(s):

Rong Li ◽

Wei-Bai Zhou

Keyword(s):

Dimension Reduction ◽

Binomial Distribution ◽

Classification Accuracy ◽

Classification Algorithm ◽

Unbalanced Data ◽

Minority Class ◽

Model Interpretation ◽

High Classification Accuracy ◽

Random Part

In the case of extremely unbalanced data, the results of traditional classification algorithms are heavily skewed: most samples are assigned to the majority class, so the accuracy on the minority classes is reduced. In this paper, we propose a classification algorithm for unbalanced data based on RSM and binomial undersampling. We train each classifier on a random part of the features, as in RSM, rather than on all of them, which reduces the dimension for each classifier, and this dimension reduction indirectly raises the relative number of minority class samples. Using this dimension-reducing property of RSM addresses the problem of having too few minority class samples in unbalanced data classification, and it also identifies the important attributes, which makes the model interpretable. Experiments show that our algorithm achieves high classification accuracy and model interpretation ability when classifying unbalanced data.
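The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea, under two assumptions that are not confirmed by the source: that RSM denotes the random subspace method (each classifier sees a random subset of feature columns), and that "binomial undersampling" means keeping each majority sample independently with a fixed probability, so the number kept follows a binomial distribution. All function names and the choice of keep probability are hypothetical.

```python
import random


def binomial_undersample(X, y, minority_label, rng):
    """Keep every minority sample; keep each majority sample with probability p.

    Assumption: p = (#minority / #majority), so the expected number of
    majority samples kept equals the number of minority samples.
    """
    n_min = sum(1 for label in y if label == minority_label)
    n_maj = len(y) - n_min
    p = min(1.0, n_min / n_maj) if n_maj else 1.0
    keep = [i for i, label in enumerate(y)
            if label == minority_label or rng.random() < p]
    return [X[i] for i in keep], [y[i] for i in keep]


def random_subspace(n_features, k, rng):
    """Draw k distinct feature indices (the 'random part' of the features)."""
    return sorted(rng.sample(range(n_features), k))


def project(X, feature_idx):
    """Restrict every row to the selected feature columns."""
    return [[row[j] for j in feature_idx] for row in X]


def make_training_sets(X, y, minority_label, n_classifiers, k, seed=0):
    """Build one (features, X_subset, y_subset) triple per base classifier."""
    rng = random.Random(seed)
    training_sets = []
    for _ in range(n_classifiers):
        feats = random_subspace(len(X[0]), k, rng)
        X_bal, y_bal = binomial_undersample(X, y, minority_label, rng)
        training_sets.append((feats, project(X_bal, feats), y_bal))
    return training_sets
```

Each triple could then be passed to any base learner, with predictions combined by voting. Counting how often a feature index appears in well-performing subspaces is one possible route to the attribute importance the abstract mentions, although the paper's own procedure may differ.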

Classification of SSVEP-based BCIs using Genetic Algorithm

Journal Of Big Data ◽

10.1186/s40537-021-00478-y ◽

2021 ◽

Vol 8 (1) ◽

Author(s):

Hamideh Soltani ◽

Zahra Einalou ◽

Mehrdad Dadgostar ◽

Keivan Maghooli

Keyword(s):

Genetic Algorithm ◽

Dimension Reduction ◽

Classification Accuracy ◽

Bayesian Method ◽

Computer Interface ◽

Support Vector ◽

Svm Classifier ◽

Effective Dimension ◽

Effective Dimension Reduction

Brain computer interface (BCI) systems have been regarded as a new way of communication for humans. In this research, common methods such as wavelet transform are applied in order to extract features. However, genetic algorithm (GA), as an evolutionary method, is used to select features. Finally, classification was done using the two approaches support vector machine (SVM) and Bayesian method. Five features were selected and the accuracy of Bayesian classification was measured to be 80% with dimension reduction. Ultimately, the classification accuracy reached 90.4% using SVM classifier. The results of the study indicate a better feature selection and the effective dimension reduction of these features, as well as a higher percentage of classification accuracy in comparison with other studies.

Enhanced Backpropagation Approach for Identifying Genetic Disease

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.622.75 ◽

2014 ◽

Vol 622 ◽

pp. 75-80

Author(s):

Baskar Nisha ◽

B. Madasamy ◽

J.Jebamalar Tamilselvi

Keyword(s):

Machine Learning ◽

Genetic Disease ◽

Classification Accuracy ◽

Gene Selection ◽

New Classification ◽

Backpropagation Algorithm ◽

High Classification Accuracy ◽

Disease Associations ◽

Disease Analysis

Classification of data on genetic disease is a useful application in microarray analysis. Genetic disease data analysis has the potential for discovering the diseased genes which may be the signature of certain diseases. Machine learning methodologies and data mining techniques are used to predict genetic disease associations from bioinformatics data. Among the numerous existing methods for gene selection, the Backpropagation algorithm has become one of the leading methods, but it gives lower classification accuracy. This work aims to develop a new classification algorithm (Enhanced Backpropagation Algorithm) for genetic disease analysis. Knowledge derived by the Enhanced Backpropagation Algorithm has high classification accuracy with the ability to identify the most significant genes.

ROI-based features for classification of skin diseases using a multi-layer neural network

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v23.i1.pp216-228 ◽

2021 ◽

Vol 23 (1) ◽

pp. 216

Author(s):

Thanh-Hai Nguyen ◽

Ba-Viet Ngo

Keyword(s):

Skin Disease ◽

Classification Accuracy ◽

Skin Diseases ◽

Human Life ◽

Disease Diagnosis ◽

Classification Performance ◽

High Classification Accuracy ◽

Separate Region ◽

Occurrence Matrix

Skin diseases have a serious impact on human life and health. This article aims to present the classification accuracy of skin diseases in order to support physicians in making correct decisions on patients for early treatment. In particular, 100 images for each of five types of skin disease from the ISIC database are used as balanced datasets for evaluating the classification accuracy. In addition, this paper focuses on processing images to extract the six optimal types out of eleven features of skin disease images, which gives higher classification performance and also takes less time for training. Therefore, skin disease images are filtered and segmented to separate regions of interest (ROIs) before extracting the optimal features. First, the skin disease images are processed by normalizing sizes, removing noise, and segmenting to separate the regions of interest (ROIs) showing skin disease signs. Next, a gray-level co-occurrence matrix (GLCM) method is applied for texture analysis to extract eleven features. With the optimal six features chosen, the classification accuracy of skin diseases is about 92%, evaluated using a confusion matrix. The result illustrates the effectiveness of the proposed method. Furthermore, this method can be developed for other medical datasets to support disease diagnosis.
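To make the GLCM step concrete, here is a minimal sketch of how a gray-level co-occurrence matrix and two common texture features can be computed. It is an illustration only: the abstract does not list the eleven features used, so contrast and energy are shown as assumed examples, and the function names, the horizontal pixel offset, and the toy image are hypothetical.

```python
def glcm(image, levels, d_row=0, d_col=1):
    """Normalized co-occurrence matrix for one pixel offset.

    image is a 2D list of integer gray levels in range(levels).
    Entry [i][j] is the fraction of pixel pairs where a pixel of
    level i has a neighbor of level j at offset (d_row, d_col).
    Assumes the image is large enough to contain at least one such pair.
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + d_row, c + d_col
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(sum(row) for row in counts)
    return [[v / total for v in row] for row in counts]


def contrast(p):
    """Weights each co-occurrence by the squared gray-level difference."""
    n = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))


def energy(p):
    """Sum of squared entries; higher for more uniform textures."""
    return sum(v * v for row in p for v in row)


# Toy 2x3 image with two gray levels (hypothetical data).
toy = [[0, 0, 1],
       [1, 1, 0]]
p = glcm(toy, levels=2)
features = [contrast(p), energy(p)]
```

A feature vector built this way from each segmented ROI could be fed to a classifier such as the multi-layer neural network named in the title.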

Deep Learning-Based Image Classification through a Multimode Fiber in the Presence of Wavelength Drift

Applied Sciences ◽

10.3390/app10113816 ◽

2020 ◽

Vol 10 (11) ◽

pp. 3816

Author(s):

Eirini Kakkava ◽

Navid Borhani ◽

Babak Rahmani ◽

Uğur Teğin ◽

Christophe Moser ◽

...

Keyword(s):

Classification Accuracy ◽

Deep Neural Networks ◽

Speckle Pattern ◽

Tunable Laser ◽

Multimode Fiber ◽

High Classification Accuracy ◽

Highly Sensitive ◽

Speckle Patterns ◽

Large Wavelength

Deep neural networks (DNNs) are employed to recover information after its propagation through a multimode fiber (MMF) in the presence of wavelength drift. The intensity distribution of the speckle patterns generated at the output of an MMF when an input wavefront propagates along its length is highly sensitive to wavelength changes. We use a tunable laser to implement a wavelength drift with a controlled bandwidth, aiming to estimate the DNN’s performance in different cases and identify the limitations. We find that when the DNNs are trained with a dataset which includes the noise induced by wavelength changes, successful classification of a speckle pattern can be performed even for a large wavelength bandwidth drift. A single training step is found to be sufficient for high classification accuracy, removing the need for time-consuming recalibration at each wavelength.

Multi-Label Classification with PSO based Synthetic Minority Over-Sampling Technique (Psosmote) for Imbalanced Samples

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.d8437.118419 ◽

2019 ◽

Vol 8 (4) ◽

pp. 4039-4042

Keyword(s):

Data Mining ◽

Sampling Rate ◽

Sampling Technique ◽

Unbalanced Data ◽

Optimal Sampling ◽

Minority Class ◽

Swarm Optimization ◽

F Measure ◽

Predictive Clustering Trees

Recently, learning from unbalanced data has emerged as a predominant problem in several applications, and because multi-label classification is an evolving data mining task, learning from unbalanced multi-label data is being examined. However, the available SMOTE-based algorithms use the same sampling rate for every instance of the minority class, which leads to sub-optimal performance. To deal with this problem, a new Particle Swarm Optimization based SMOTE (PSOSMOTE) algorithm is proposed. The PSOSMOTE algorithm employs diverse sampling rates for different minority class instances and finds a combination of optimal sampling rates to deal with the classification of unbalanced datasets. Then, a Bayesian technique is combined with random forest for multi-label classification (BARF-MLC) to address the inherent label dependencies among samples; classifiers such as ML-FOREST, Predictive Clustering Trees (PCT), and Hierarchy of Multi Label Classifier (HOMER) are considered, using metrics including precision, recall, F-measure, accuracy, and error rate.
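The difference between a single sampling rate and per-instance sampling rates can be sketched as follows. This is a simplified SMOTE-style interpolation, not the authors' PSOSMOTE: the particle swarm search that would choose the rates is omitted, and the `rates` argument stands in for whatever that search would return. The function names and the toy data are hypothetical.

```python
import random


def nearest_neighbors(points, i, k):
    """Indices of the k points closest to points[i] (excluding i itself)."""
    def sq_dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: sq_dist(points[i], points[j]))
    return others[:k]


def smote_with_rates(minority, rates, k=3, seed=0):
    """Create rates[i] synthetic samples around minority[i].

    Each synthetic sample lies on the segment between minority[i]
    and one of its k nearest minority neighbors.
    Assumes at least two minority points are given.
    """
    rng = random.Random(seed)
    synthetic = []
    for i, n_new in enumerate(rates):
        neighbors = nearest_neighbors(minority, i, k)
        for _ in range(n_new):
            j = rng.choice(neighbors)
            gap = rng.random()  # position along the segment, in [0, 1)
            synthetic.append([a + gap * (b - a)
                              for a, b in zip(minority[i], minority[j])])
    return synthetic


# Toy minority class; plain SMOTE would use the same rate for every point,
# e.g. [2, 2, 2, 2], while per-instance rates can differ.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_with_rates(minority, rates=[2, 0, 1, 3])
```

In an optimization-based variant, a search procedure would score candidate rate vectors by the downstream classifier's performance and keep the best one.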

Research on Intelligent Recognition and Classification Algorithm of Music Emotion in Complex System of Music Performance

Complexity ◽

10.1155/2021/4251827 ◽

2021 ◽

Vol 2021 ◽

pp. 1-10

Author(s):

Daliang Wang ◽

Xiaowen Guo

Keyword(s):

Complex System ◽

Classification Accuracy ◽

Music Performance ◽

Classification Algorithm ◽

Audio Signals ◽

Fusion Algorithm ◽

Before And After ◽

Intelligent Recognition ◽

Better Than

In the complex system of music performance, there are differences in the expression of music emotions by listeners, so it is of great significance to study the classification of different emotions under different audio signals. In this paper, the research of human emotional intelligence recognition and classification algorithm in the complex system of music performance is proposed. Through the recognition of SVM, KNN, ANN, and ID3 classifiers, the accuracy of a single classifier is compared, and then the four classifiers are combined to compare the classification accuracy of audio signals before and after preprocessing. The results show that the accuracy of SVM and ANN fusion is the highest. Finally, recall and F1 are comprehensively compared in the fusion algorithm, and the fusion classification effect of SVM and ANN is better than that of the algorithm model.

Classification of SSVEP-based BCIs using Genetic Algorithm

10.21203/rs.3.rs-119561/v1 ◽

2020 ◽

Author(s):

Hamideh Soltani ◽

Zahra Einalou ◽

Keivan Maghooli

Keyword(s):

Genetic Algorithm ◽

Dimension Reduction ◽

Classification Accuracy ◽

Communication Systems ◽

Support Vector ◽

Svm Classifier ◽

Effective Dimension ◽

Computer Communication ◽

Effective Dimension Reduction

In recent years, brain-computer communication systems have been regarded as a new way of communication for humans. One of the applications of brain-computer communication is the development of systems which facilitate communication. To this end, it is necessary to extract the visually evoked signals from the EEG signal and classify them. In this research, common methods such as wavelet transform are applied in order to extract features. However, genetic algorithm, as an evolutionary method, is used to select features. Finally, after selecting features, the classification was done using the two approaches support vector machine and Bayesian method. Five features were selected and the accuracy of Bayesian classification was measured to be 80% with dimension reduction, and 78% without dimension reduction. Ultimately, the classification accuracy reached 90.4% using SVM classifier. The results of the study indicate a better feature selection and the effective dimension reduction of these features, as well as a higher percentage of classification accuracy in comparison with other studies.

Fusion of hyperspectral and lidar data based on dimension reduction and maximum likelihood

ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences ◽

10.5194/isprsarchives-xl-7-w3-569-2015 ◽

2015 ◽

Vol XL-7/W3 ◽

pp. 569-573 ◽

Cited By ~ 4

Author(s):

B. Abbasi ◽

H. Arefi ◽

B. Bigdeli ◽

M. Motagh ◽

S. Roessner

Keyword(s):

Maximum Likelihood ◽

Dimension Reduction ◽

Soil Water ◽

Classification Accuracy ◽

Urban Areas ◽

Principal Component ◽

Hyperspectral Data ◽

High Spectral Resolution ◽

Lidar Data

Limitations and deficiencies of individual remote sensing sensors in extracting different objects have made the fusion of data from different sensors more widespread as a way to improve classification results. Using a variety of data provided by different sensors increases the spatial and spectral accuracy. Lidar (Light Detection and Ranging) data fused with hyperspectral images (HSI) provide rich data for classification of surface objects. Lidar data, representing high-quality geometric information, play a key role in the segmentation and classification of elevated features such as buildings and trees. On the other hand, hyperspectral data, with their high spectral resolution, support strong discrimination between objects that have different spectral information, such as soil, water, and grass. This paper presents a fusion methodology for Lidar and hyperspectral data for improving classification accuracy in urban areas. In the first step, we applied feature extraction strategies to each data set separately. In this step, texture features based on GLCM (Grey Level Co-occurrence Matrix) are generated from the Lidar data, and PCA (Principal Component Analysis) and MNF (Minimum Noise Fraction) based dimension reduction methods are applied to the HSI. In the second step, a Maximum Likelihood (ML) based classification method is applied to each feature space. Finally, a fusion method is applied to fuse the classification results. A co-registered hyperspectral and Lidar data set from the University of Houston was used to examine the result of the proposed method. This data set contains nine classes: Building, Tree, Grass, Soil, Water, Road, Parking, Tennis Court and Running Track. Experimental investigation shows an improvement of classification accuracy to 88%.

Study on Method for External Characteristics Extraction and Classification of Tomatoes Used for Grading Robot

Key Engineering Materials ◽

10.4028/www.scientific.net/kem.522.643 ◽

2012 ◽

Vol 522 ◽

pp. 643-648

Author(s):

Chang Yong Li ◽

Qi Xin Cao

Keyword(s):

Classification Accuracy ◽

Quality Characteristic ◽

Shape Information ◽

High Classification Accuracy ◽

Important Quality ◽

Histogram Feature ◽

External Characteristics ◽

Real Time Application ◽

Normal Angle

Color and shape features are very important quality characteristics for the classification of fruits. The dominant grading color histogram feature and the radius normal angle histogram feature are presented in this paper. They represent the color and shape information of fruits well, respectively, and are not sensitive to changes of scale, translation and rotation. Experimental results showed that both histogram features can effectively distinguish between fruits of different grades and have high classification accuracy. They are suitable for real-time application.

ImbTree: Minority Class Sensitive Weighted Decision Tree for Classification of Unbalanced Data

International Journal of Intelligent Systems and Applications in Engineering ◽

10.18201/ijisae.2021473633 ◽

2021 ◽

Vol 9 (4) ◽

pp. 152-158

Author(s):

Pratikkumar A. Barot ◽

Harikrishna B. Jethva

Keyword(s):

Decision Tree ◽

Unbalanced Data ◽

Minority Class
