Classification of Unbalanced Data Based on RSM and Binomial Distribution

Author(s):  
Rong Li ◽  
Wei-Bai Zhou

In the case of extremely unbalanced data, traditional classification algorithms produce highly skewed results: most samples are assigned to the majority class, which lowers the accuracy on the minority classes. In this paper, we propose a classification algorithm for unbalanced data based on the random subspace method (RSM) and binomial undersampling. RSM trains each classifier on a random subset of the features rather than on all of them, which reduces the dimensionality each classifier sees; this dimension reduction indirectly raises the relative weight of the minority-class samples. Exploiting this property of RSM addresses the problem that minority-class samples are too few in unbalanced classification, and it also identifies the important variables, which makes the model interpretable. Experiments show that our algorithm achieves high classification accuracy and good interpretability when classifying unbalanced data.
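
The abstract gives no implementation details, so the following Python sketch only illustrates the general idea under stated assumptions: each base learner sees a random subset of features (the random subspace step), and each majority-class sample is kept with probability equal to the minority/majority ratio, so the number of retained majority samples follows a binomial distribution. The nearest-centroid base learner, the function names, and the parameter values (`n_learners`, `n_sub`) are hypothetical choices for illustration, not the authors' method.

```python
import random
from collections import Counter


def nearest_centroid_fit(X, y):
    """Fit a deliberately simple base learner: one mean vector per class."""
    sums, counts = {}, Counter(y)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
    return {c: [s / counts[c] for s in acc] for c, acc in sums.items()}


def nearest_centroid_predict(centroids, row):
    """Return the class whose centroid is closest in squared Euclidean distance."""
    return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])))


def fit_rsm_ensemble(X, y, minority, n_learners=15, n_sub=2, seed=0):
    """Hypothetical RSM ensemble with binomial undersampling of the majority class."""
    rng = random.Random(seed)
    n_features = len(X[0])
    min_idx = [i for i, label in enumerate(y) if label == minority]
    maj_idx = [i for i, label in enumerate(y) if label != minority]
    keep_p = len(min_idx) / len(maj_idx)  # expected kept majority count == minority count
    ensemble = []
    for _ in range(n_learners):
        feats = rng.sample(range(n_features), n_sub)  # random feature subspace
        # Each majority sample survives an independent Bernoulli(keep_p) trial, so the
        # number kept is Binomial(len(maj_idx), keep_p); keep at least one as a guard.
        kept = [i for i in maj_idx if rng.random() < keep_p] or [rng.choice(maj_idx)]
        idx = min_idx + kept
        Xs = [[X[i][j] for j in feats] for i in idx]
        ys = [y[i] for i in idx]
        ensemble.append((feats, nearest_centroid_fit(Xs, ys)))
    return ensemble


def predict_rsm(ensemble, row):
    """Majority vote over base learners, each looking only at its own subspace."""
    votes = Counter(nearest_centroid_predict(m, [row[j] for j in feats]) for feats, m in ensemble)
    return votes.most_common(1)[0][0]


# Toy data (invented for illustration): 95 majority vs 5 minority samples, 4 features.
rng = random.Random(42)
X = [[rng.gauss(0, 0.5) for _ in range(4)] for _ in range(95)]
X += [[rng.gauss(3, 0.5) for _ in range(4)] for _ in range(5)]
y = [0] * 95 + [1] * 5
ens = fit_rsm_ensemble(X, y, minority=1)
print(predict_rsm(ens, [3, 3, 3, 3]), predict_rsm(ens, [0, 0, 0, 0]))  # 1 0
```

Because every learner is trained on a roughly balanced sample, the minority class is not drowned out in the vote; counting how often each feature appears in accurate learners would be one possible (assumed) route to the variable-importance scores the abstract mentions.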

2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Hamideh Soltani ◽  
Zahra Einalou ◽  
Mehrdad Dadgostar ◽  
Keivan Maghooli

Brain-computer interface (BCI) systems have been regarded as a new way of communication for humans. In this research, common methods such as the wavelet transform are applied to extract features, and a genetic algorithm (GA), an evolutionary method, is then used to select features. Finally, classification was performed with two approaches: a support vector machine (SVM) and a Bayesian method. Five features were selected, and the accuracy of the Bayesian classifier was measured at 80% with dimension reduction. The SVM classifier ultimately reached a classification accuracy of 90.4%. The results indicate better feature selection, effective dimension reduction of these features, and higher classification accuracy than in other studies.
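
A genetic algorithm for feature selection can be sketched as a search over bit masks, where bit j = 1 means feature j is kept. The abstract does not specify the encoding, operators, or fitness, so this Python sketch assumes binary tournament selection, one-point crossover, bit-flip mutation, and elitism. In practice the fitness would be something like the cross-validated accuracy of the SVM or Bayesian classifier on the selected features; here it is replaced by hypothetical per-feature relevance scores minus a size penalty so that the example stays self-contained.

```python
import random


def ga_select(n_features, fitness, pop_size=30, generations=60, elite=2, seed=0):
    """Hypothetical GA over feature bit masks; returns the fittest mask found."""
    rng = random.Random(seed)
    mutation_rate = 1.0 / n_features  # on average one bit flip per child
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]

    def tournament(scored):
        # Binary tournament: pick two individuals, keep the fitter one.
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] >= b[0] else b[1]

    for _ in range(generations):
        scored = sorted(((fitness(m), m) for m in pop), key=lambda t: t[0], reverse=True)
        nxt = [m[:] for _, m in scored[:elite]]  # elitism: best masks survive unchanged
        while len(nxt) < pop_size:
            p1, p2 = tournament(scored), tournament(scored)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - bit if rng.random() < mutation_rate else bit for bit in child]
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)


# Hypothetical stand-in for classifier accuracy: relevance of kept features minus a size penalty.
relevance = [0.9, 0.1, 0.8, 0.05, 0.7, 0.02]


def toy_fitness(mask, penalty=0.3):
    return sum(r for r, bit in zip(relevance, mask) if bit) - penalty * sum(mask)


best = ga_select(len(relevance), toy_fitness)
print(best)
```

The size penalty plays the role of the dimension-reduction pressure: features whose contribution does not outweigh the penalty tend to be dropped.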


2014 ◽  
Vol 622 ◽  
pp. 75-80
Author(s):  
Baskar Nisha ◽  
B. Madasamy ◽  
J. Jebamalar Tamilselvi

Classification of genetic disease data is a useful application of microarray analysis. Genetic disease data analysis has the potential to discover diseased genes that may be signatures of certain diseases. Machine learning methodologies and data mining techniques are used to predict genetic disease associations from bioinformatics data. Among the numerous existing methods for gene selection, the backpropagation algorithm has become one of the leading methods, but it gives lower classification accuracy. This paper develops a new classification algorithm, the Enhanced Backpropagation Algorithm, for genetic disease analysis. The Enhanced Backpropagation Algorithm achieves high classification accuracy and can identify the most significant genes.
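
The abstract does not describe what the enhancement consists of, so the sketch below shows only the plain backpropagation baseline: a one-hidden-layer network with sigmoid units, trained by per-sample gradient descent on the cross-entropy loss. The network size, learning rate, and the two-gene toy data are hypothetical; real microarray data would have thousands of gene features.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_mlp(X, y, n_hidden=3, lr=0.5, epochs=5000, seed=0):
    """Plain backpropagation for a 1-hidden-layer sigmoid network (binary output)."""
    rng = random.Random(seed)
    n_in = len(X[0])
    W1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    W2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    b2 = 0.0

    def forward(x):
        h = [sigmoid(sum(w * xi for w, xi in zip(W1[k], x)) + b1[k]) for k in range(n_hidden)]
        return h, sigmoid(sum(w * hk for w, hk in zip(W2, h)) + b2)

    for _ in range(epochs):
        for x, t in zip(X, y):
            h, out = forward(x)
            delta_out = out - t  # dLoss/dz at the output for sigmoid + cross-entropy
            # Backpropagate through the (not yet updated) output weights.
            delta_h = [delta_out * W2[k] * h[k] * (1 - h[k]) for k in range(n_hidden)]
            for k in range(n_hidden):
                W2[k] -= lr * delta_out * h[k]
                b1[k] -= lr * delta_h[k]
                for j in range(n_in):
                    W1[k][j] -= lr * delta_h[k] * x[j]
            b2 -= lr * delta_out
    return lambda x: forward(x)[1]


# Toy data: low/high expression of two hypothetical genes; "disease" only if both are high.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 0, 0, 1]
predict = train_mlp(X, y)
print([round(predict(x)) for x in X])  # expected: [0, 0, 0, 1]
```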


Author(s):  
Thanh-Hai Nguyen ◽  
Ba-Viet Ngo

Skin diseases have a serious impact on human life and health. This article reports the classification accuracy achieved for skin diseases, with the aim of supporting physicians' correct decisions on early treatment. In particular, 100 images of each of five skin diseases from the ISIC database are used to form balanced datasets. The paper focuses on image processing to extract the six optimal features out of eleven, which yields higher classification performance and shorter training time. To this end, the images are first preprocessed by normalizing sizes, removing noise, and segmenting them to separate the regions of interest (ROIs) that show signs of skin disease, before the optimal features are extracted. Next, a gray-level co-occurrence matrix (GLCM) method is applied for texture analysis to extract eleven features. With the six optimal features chosen, the classification accuracy for skin diseases reaches about 92%, evaluated using a confusion matrix. The results illustrate the effectiveness of the proposed method, which could also be extended to other medical datasets to support disease diagnosis.
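
A gray-level co-occurrence matrix counts how often a pixel with gray level i has a neighbor with gray level j at a fixed offset; texture features are then weighted sums over the normalized matrix. The abstract does not list which eleven features were extracted or which six were kept, so this Python sketch computes three common Haralick-style features (contrast, angular second moment, and homogeneity) for one offset. It assumes the image is already quantized to integers in `range(levels)`; the function names are hypothetical.

```python
from collections import Counter


def glcm(image, dx=1, dy=0, levels=8, symmetric=True):
    """Normalized co-occurrence matrix P[i][j] for pixel pairs at offset (dy, dx)."""
    counts = Counter()
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                a, b = image[r][c], image[r2][c2]
                counts[(a, b)] += 1
                if symmetric:  # also count the reverse pair (b, a)
                    counts[(b, a)] += 1
    total = sum(counts.values())
    return [[counts[(i, j)] / total for j in range(levels)] for i in range(levels)]


def glcm_features(P):
    """Three common texture features computed from a normalized GLCM."""
    n = len(P)
    cells = [(i, j, P[i][j]) for i in range(n) for j in range(n)]
    return {
        "contrast": sum(p * (i - j) ** 2 for i, j, p in cells),
        "asm": sum(p * p for _, _, p in cells),  # angular second moment
        "homogeneity": sum(p / (1 + (i - j) ** 2) for i, j, p in cells),
    }


# A small 4-level image (illustrative only), horizontal neighbor offset.
img = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 2, 2, 2], [2, 2, 3, 3]]
print(glcm_features(glcm(img, levels=4)))
```

Other offsets (for example `dx=0, dy=1` for vertical neighbors) give additional features; averaging over several directions is a common way to reduce orientation sensitivity.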


2020 ◽  
Vol 10 (11) ◽  
pp. 3816
Author(s):  
Eirini Kakkava ◽  
Navid Borhani ◽  
Babak Rahmani ◽  
Uğur Teğin ◽  
Christophe Moser ◽  
...  

Deep neural networks (DNNs) are employed to recover information after its propagation through a multimode fiber (MMF) in the presence of wavelength drift. The intensity distribution of the speckle patterns generated at the output of an MMF when an input wavefront propagates along its length is highly sensitive to wavelength changes. We use a tunable laser to implement a wavelength drift with a controlled bandwidth, aiming to estimate the DNN’s performance in different cases and identify the limitations. We find that when the DNNs are trained with a dataset which includes the noise induced by wavelength changes, successful classification of a speckle pattern can be performed even for a large wavelength bandwidth drift. A single training step is found to be sufficient for high classification accuracy, removing the need for time-consuming recalibration at each wavelength.


2019 ◽  
Vol 8 (4) ◽  
pp. 4039-4042

Learning from unbalanced data has recently emerged as a predominant problem in several applications, and since multilabel classification is an evolving data mining task, learning from unbalanced multilabel data is being examined. However, existing SMOTE-based algorithms use the same sampling rate for every instance of the minority class, which leads to suboptimal performance. To deal with this problem, a new Particle Swarm Optimization based SMOTE (PSOSMOTE) algorithm is proposed. PSOSMOTE employs different sampling rates for different minority-class instances and fuses the optimal sampling rates to handle the classification of unbalanced datasets. A Bayesian technique is then combined with random forests for multilabel classification (BARF-MLC) to address the inherent label dependencies among samples, and it is evaluated against classifiers such as ML-FOREST, Predictive Clustering Trees (PCT), and Hierarchy of Multilabel Classifiers (HOMER) using precision, recall, F-measure, accuracy, and error rate.
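
SMOTE creates a synthetic minority sample by interpolating between a minority instance and one of its k nearest minority neighbors. The distinguishing feature of PSOSMOTE is that each minority instance gets its own sampling rate; the Python sketch below shows only that per-instance oversampling step, with the rates passed in by hand where PSOSMOTE would obtain them from particle swarm optimization. It treats a single-label feature matrix for simplicity, and the names and the value of `k` are hypothetical.

```python
import random


def smote_per_instance(minority, rates, k=2, seed=0):
    """SMOTE-style oversampling where rates[i] synthetic points are made from minority[i].

    In PSOSMOTE the rates would come from particle swarm optimization; here they are
    supplied directly (an assumption for illustration).
    """
    rng = random.Random(seed)
    synthetic = []
    for i, x in enumerate(minority):
        # k nearest minority neighbors of x, excluding x itself.
        others = [m for j, m in enumerate(minority) if j != i]
        others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(x, m)))
        neighbors = others[:k]
        for _ in range(rates[i]):
            nb = rng.choice(neighbors)
            gap = rng.random()  # position along the segment from x to nb
            synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic


minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
synthetic = smote_per_instance(minority, rates=[0, 1, 2, 3])
print(len(synthetic))  # 6
```

Setting every entry of `rates` to the same value recovers the uniform-rate behavior that the abstract identifies as the weakness of standard SMOTE.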


Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Daliang Wang ◽  
Xiaowen Guo

In the complex system of music performance, listeners differ in how they perceive the emotions expressed by music, so studying how to classify different emotions from different audio signals is of great significance. This paper studies algorithms for the intelligent recognition and classification of human emotions in the complex system of music performance. The accuracy of four individual classifiers (SVM, KNN, ANN, and ID3) is compared, and the four classifiers are then combined to compare the classification accuracy on audio signals before and after preprocessing. The results show that the fusion of SVM and ANN achieves the highest accuracy. Finally, recall and F1 are compared across the fusion algorithms, and the SVM and ANN fusion again outperforms the other models.
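
The abstract does not state the fusion rule, so this Python sketch assumes soft voting: the per-class probabilities from two classifiers (standing in for SVM and ANN) are averaged and the highest-scoring emotion is chosen. It also shows how per-class precision, recall, and F1 are computed. The emotion labels and probability values are invented for illustration.

```python
def soft_vote(*prob_lists):
    """Average per-class probabilities across classifiers, then take the argmax."""
    fused = []
    for probs in zip(*prob_lists):
        avg = {c: sum(p[c] for p in probs) / len(probs) for c in probs[0]}
        fused.append(max(avg, key=avg.get))
    return fused


def precision_recall_f1(y_true, y_pred, positive):
    """Per-class precision, recall, and F1 (returns 0.0 where undefined)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


y_true = ["calm", "sad", "happy", "calm"]
svm = [{"calm": 0.6, "sad": 0.3, "happy": 0.1}, {"calm": 0.4, "sad": 0.5, "happy": 0.1},
       {"calm": 0.3, "sad": 0.2, "happy": 0.5}, {"calm": 0.3, "sad": 0.4, "happy": 0.3}]
ann = [{"calm": 0.5, "sad": 0.2, "happy": 0.3}, {"calm": 0.2, "sad": 0.7, "happy": 0.1},
       {"calm": 0.2, "sad": 0.2, "happy": 0.6}, {"calm": 0.6, "sad": 0.2, "happy": 0.2}]
fused = soft_vote(svm, ann)
print(fused)  # ['calm', 'sad', 'happy', 'calm']
print(precision_recall_f1(y_true, fused, "calm"))  # (1.0, 1.0, 1.0)
```

In this toy case the hypothetical SVM alone would label the fourth clip "sad", while the averaged scores recover "calm", which is the kind of complementary behavior that makes fusing two classifiers worthwhile.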


2020 ◽  
Author(s):  
Hamideh Soltani ◽  
Zahra Einalou ◽  
Keivan Maghooli

In recent years, brain-computer communication systems have been regarded as a new way of communication for humans. One application of brain-computer communication is the development of systems that facilitate communication. To this end, visually evoked signals must be extracted from the EEG signal and classified. In this research, common methods such as the wavelet transform are applied to extract features, and a genetic algorithm, an evolutionary method, is then used to select features. Finally, the selected features were classified with two approaches: a support vector machine and a Bayesian method. Five features were selected, and the accuracy of Bayesian classification was 80% with dimension reduction and 78% without it. The SVM classifier ultimately reached a classification accuracy of 90.4%. The results indicate better feature selection, effective dimension reduction of these features, and higher classification accuracy than in other studies.
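
The abstract mentions wavelet-based feature extraction without naming the wavelet or the features, so the following Python sketch assumes the simplest case: a multi-level orthonormal Haar transform of one EEG segment, with the energy of each detail sub-band plus the final approximation used as features. The function names and the number of levels are hypothetical; the segment length must be divisible by 2**levels.

```python
import math


def haar_dwt(signal):
    """One level of the orthonormal Haar DWT; len(signal) must be even."""
    s = 1 / math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail


def wavelet_energy_features(signal, levels=3):
    """Energy of the detail coefficients at each level, then of the final approximation."""
    features = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        features.append(sum(d * d for d in detail))
    features.append(sum(a * a for a in approx))
    return features


segment = [1, 2, 3, 4, 5, 6, 7, 8]  # stand-in for an EEG segment
print(wavelet_energy_features(segment))
```

Because the orthonormal Haar transform preserves energy, the features sum to the energy of the original segment, so they describe how that energy is distributed across frequency bands; a GA could then choose a subset of such features.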


Author(s):  
B. Abbasi ◽  
H. Arefi ◽  
B. Bigdeli ◽  
M. Motagh ◽  
S. Roessner

Because individual remote sensing sensors have limitations and deficiencies in extracting different objects, fusing data from multiple sensors to improve classification results has become increasingly widespread. Using a variety of data from different sensors increases spatial and spectral accuracy. Lidar (Light Detection and Ranging) data fused with hyperspectral images (HSI) provide rich data for classifying surface objects. Lidar data, which carry high-quality geometric information, play a key role in segmenting and classifying elevated features such as buildings and trees. Hyperspectral data, on the other hand, have high spectral resolution and support clear discrimination between objects with different spectral signatures, such as soil, water, and grass. This paper presents a methodology for fusing Lidar and hyperspectral data to improve classification accuracy in urban areas. In the first step, feature extraction strategies are applied to each dataset separately: texture features based on the GLCM (Grey Level Co-occurrence Matrix) are generated from the Lidar data, and PCA (Principal Component Analysis) and MNF (Minimum Noise Fraction) dimension reduction methods are applied to the HSI. In the second step, a Maximum Likelihood (ML) classifier is applied to each feature space. Finally, a fusion method combines the classification results. Co-registered hyperspectral and Lidar data from the University of Houston were used to evaluate the proposed method. The data contain nine classes: Building, Tree, Grass, Soil, Water, Road, Parking, Tennis Court, and Running Track. Experiments show that the method improves classification accuracy to 88%.
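
Maximum Likelihood classification assigns each pixel to the class whose fitted probability density gives it the highest likelihood. The abstract does not say which density model or fusion rule is used, so this Python sketch assumes per-class Gaussians with diagonal covariance (a simplification of the usual full-covariance ML classifier) and fuses the Lidar and HSI feature spaces at decision level by summing their log-likelihoods, i.e. the product rule under an independence assumption. The feature values and function names are hypothetical.

```python
import math


def fit_gaussian_ml(X, y):
    """Per-class mean and variance of each feature (diagonal-covariance simplification)."""
    model = {}
    for c in set(y):
        rows = [x for x, label in zip(X, y) if label == c]
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        variances = [max(sum((v - m) ** 2 for v in col) / n, 1e-6)  # floor avoids zero variance
                     for col, m in zip(zip(*rows), means)]
        model[c] = (means, variances)
    return model


def log_likelihoods(model, x):
    """Gaussian log-likelihood of feature vector x under each class."""
    return {c: sum(-0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
                   for xi, m, v in zip(x, means, variances))
            for c, (means, variances) in model.items()}


def fused_predict(models, feature_vectors):
    """Decision-level fusion: add log-likelihoods from each feature space, pick the max."""
    total = {}
    for model, x in zip(models, feature_vectors):
        for c, ll in log_likelihoods(model, x).items():
            total[c] = total.get(c, 0.0) + ll
    return max(total, key=total.get)


labels = ["Building"] * 3 + ["Grass"] * 3
lidar = [[10.0], [11.0], [12.0], [0.5], [0.3], [0.4]]   # e.g. height above ground (m)
hsi = [[0.20, 0.10], [0.25, 0.12], [0.22, 0.09],
       [0.10, 0.60], [0.12, 0.55], [0.11, 0.65]]        # e.g. two PCA components
models = [fit_gaussian_ml(lidar, labels), fit_gaussian_ml(hsi, labels)]
print(fused_predict(models, [[11.5], [0.21, 0.10]]))  # Building
```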


2012 ◽  
Vol 522 ◽  
pp. 643-648
Author(s):  
Chang Yong Li ◽  
Qi Xin Cao

Color and shape features are very important quality characteristics for fruit classification. This paper presents a dominant grading color histogram feature and a radius normal angle histogram feature. They represent the color and shape information of fruits, respectively, and are insensitive to changes in scale, translation, and rotation. Experimental results show that both histogram features effectively distinguish fruits of different grades with high classification accuracy and are suitable for real-time applications.
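
The abstract does not define the dominant grading color histogram, so the Python sketch below uses a plain normalized hue histogram as a stand-in to show why such features tolerate scale, translation, and rotation: normalizing by the pixel count removes dependence on fruit size, and a histogram ignores pixel positions entirely. Histogram intersection is assumed as the similarity measure; the bin count, pixel values, and function names are hypothetical.

```python
import colorsys


def hue_histogram(pixels, bins=12):
    """Normalized hue histogram of (r, g, b) pixels with channel values in 0..255."""
    hist = [0] * bins
    for r, g, b in pixels:
        h, _, _ = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)  # h in [0, 1)
        hist[min(int(h * bins), bins - 1)] += 1
    n = len(pixels)
    return [count / n for count in hist]  # dividing by n removes the dependence on size


def histogram_intersection(h1, h2):
    """Similarity in [0, 1] for normalized histograms (1 means identical)."""
    return sum(min(a, b) for a, b in zip(h1, h2))


ripe = [(220, 30, 30)] * 90 + [(40, 200, 40)] * 10    # mostly red pixels
unripe = [(40, 200, 40)] * 80 + [(220, 30, 30)] * 20  # mostly green pixels
print(histogram_intersection(hue_histogram(ripe), hue_histogram(unripe)))
```

Doubling the number of pixels (a larger fruit) or reordering them (a moved or rotated fruit) leaves the histogram unchanged, which is the invariance property the abstract attributes to its histogram features.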

