Multiclass Boosting with Adaptive Group-Based kNN and Its Application in Text Categorization

2012 ◽  
Vol 2012 ◽  
pp. 1-24 ◽  
Author(s):  
Lei La ◽  
Qiao Guo ◽  
Dequan Yang ◽  
Qimin Cao

AdaBoost is an excellent committee-based tool for classification. However, its effectiveness and efficiency in multiclass categorization face challenges from methods based on the support vector machine (SVM), neural networks (NN), naïve Bayes, and k-nearest neighbor (kNN). This paper uses a novel multiclass AdaBoost algorithm that avoids reducing the multiclass classification problem to multiple two-class classification problems, which makes it more efficient while keeping the accuracy advantage of existing AdaBoost. An adaptive group-based kNN method is proposed in this paper to build more accurate weak classifiers and thereby keep the number of basis classifiers within an acceptable range. To further enhance performance, the weak classifiers are combined into a strong classifier through a doubly iterative weighting scheme, yielding the adaptive group-based kNN boosting algorithm (AGkNN-AdaBoost). We implement AGkNN-AdaBoost in a Chinese text categorization system. Experimental results show that the proposed classification algorithm achieves better precision and recall than many other text categorization methods, including traditional AdaBoost. In addition, its processing speed is significantly higher than that of the original AdaBoost and many other classic categorization algorithms.
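The paper's AGkNN-AdaBoost implementation is not public; as a rough stand-in, the sketch below uses scikit-learn's direct multiclass AdaBoost (the SAMME-style scheme) with shallow decision trees as weak learners. Trees replace the paper's adaptive group-based kNN learners because scikit-learn's AdaBoost requires weak learners that accept per-sample weights, which plain kNN estimators do not.

```python
# Hedged sketch: direct multiclass AdaBoost, which avoids decomposing
# a K-class problem into many two-class problems. Depth-2 decision
# trees stand in for the paper's adaptive group-based kNN weak learners.
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# First positional argument is the weak learner; boosting is applied
# directly to the 3-class problem, with no one-vs-rest reduction.
clf = AdaBoostClassifier(DecisionTreeClassifier(max_depth=2),
                         n_estimators=50, random_state=0)
clf.fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)
print(round(acc, 2))
```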

Mathematics ◽  
2021 ◽  
Vol 9 (18) ◽  
pp. 2215
Author(s):  
Jung-Kai Tsai ◽  
Chih-Hsing Hung

Because COVID-19 emerged in 2019, human behavior has changed, and this will influence the business models of enterprises. An enterprise cannot predict its development from past knowledge and experience, so it needs a new machine learning framework to predict enterprise performance. The goal of this research is to modify AdaBoost so that it reasonably predicts enterprise performance. To justify the usefulness of the proposed model, enterprise data are collected and the proposed model is used to predict enterprise performance after COVID-19. The test-data correct rate of the proposed model is compared with that of several traditional machine learning models. Compared with traditional AdaBoost, the back-propagation neural network (BPNN), a regression classifier, the support vector machine (SVM) and support vector regression (SVR), the proposed method possesses better classification ability (its average correct rate is 88.04%) in handling two-class classification problems. Compared with traditional AdaBoost, one-against-all SVM, one-against-one SVM, one-against-all SVR and one-against-one SVR, the classification ability of the proposed method is also relatively better for the multi-class classification problem. Finally, conclusions and future research are discussed at the end.
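As an illustration only (not the authors' enterprise data or their modified AdaBoost), this kind of baseline comparison between AdaBoost, an SVM, and a BPNN-style MLP can be run on a public two-class dataset:

```python
# Illustrative two-class comparison on a public dataset; the models
# below are stock scikit-learn baselines, not the paper's models.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "MLP (BPNN-style)": make_pipeline(StandardScaler(),
                                      MLPClassifier(max_iter=1000,
                                                    random_state=0)),
}
results = {name: m.fit(X_tr, y_tr).score(X_te, y_te)
           for name, m in models.items()}
for name, acc in results.items():
    print(f"{name}: {acc:.3f}")  # test-set correct rate
```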


Author(s):  
Kaiyuan Wu ◽  
Zhiming Zheng ◽  
Shaoting Tang

In this paper, we propose a powerful weak learner (Vector Decision Tree (VDT)) and a new Boosted Vector Decision Tree (BVDT) algorithm framework for the task of multi-class classification. Unlike traditional scalar-valued boosting algorithms, the BVDT algorithm directly maps the feature space to the decision space in the multi-class setting, which facilitates convenient implementation of multi-class classification algorithms using diverse loss functions. By viewing the explicit hard threshold on the leaf node value applied in LogitBoost as a constrained optimization problem, we further develop two new variants of the BVDT algorithm: the [Formula: see text]-BVDT and the [Formula: see text]-BVDT. The performance of the proposed algorithm is evaluated on different datasets and compared with three state-of-the-art boosting algorithms, k-Nearest Neighbor (KNN) and the Support Vector Machine (SVM). The results show that the proposed algorithm ranks first on all but one dataset and reduces the test error rate by 4% to 58% with respect to the state-of-the-art boosting algorithms based on scalar-valued weak learners. Furthermore, we present a case study on the Abalone dataset by designing a new loss function that combines the negative log-likelihood loss function of the classification problem and the square loss function of the regression problem.


Author(s):  
ZHI-XIA YANG

In this paper, we propose two Laplacian nonparallel hyperplane proximal classifiers (LapNPPCs) for the semi-supervised and fully supervised classification problems, respectively, by adding manifold regularization terms. Owing to the manifold regularization terms, our LapNPPCs are able to exploit the intrinsic structure of the patterns in the training set. Furthermore, our classifiers only need to solve two systems of linear equations, rather than the two quadratic programming (QP) problems needed in the Laplacian twin support vector machine (LapTSVM) (Z. Qi, Y. Tian and Y. Shi, Neural Netw. 35 (2012) 46–53). Numerical experiments on toy and UCI benchmark datasets show that the accuracy of our LapNPPCs is comparable with that of other classifiers, such as the standard SVM, TWSVM and LapTSVM. Moreover, based on our LapNPPCs, other TWSVM-type classifiers with manifold regularization can be constructed by choosing different norms and loss functions to deal with semi-supervised binary and multi-class classification problems.
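The computational point above (linear systems instead of QPs) can be illustrated with a minimal least-squares nonparallel proximal classifier. This is an assumption-laden toy, not the authors' method: it omits the manifold (Laplacian) regularization term and adds a small ridge term for numerical stability, but each class hyperplane is still obtained from a single linear solve.

```python
# Toy nonparallel proximal classifier: each hyperplane stays close to
# its own class and (roughly) unit-distant from the other class, and is
# found by solving one linear system (no QP solver needed).
import numpy as np

def fit_hyperplane(A, B, c=1.0, ridge=1e-6):
    """Minimize (1/2)||[A e] z||^2 + (c/2)||[B e] z + e||^2 over z=[w;b]."""
    E = np.hstack([A, np.ones((len(A), 1))])   # augmented own-class matrix
    F = np.hstack([B, np.ones((len(B), 1))])   # augmented other-class matrix
    H = E.T @ E + c * F.T @ F + ridge * np.eye(E.shape[1])
    z = np.linalg.solve(H, -c * F.T @ np.ones(len(B)))
    return z[:-1], z[-1]                       # w, b

rng = np.random.default_rng(0)
A = rng.normal([0, 0], 0.3, size=(50, 2))      # class +1
B = rng.normal([2, 2], 0.3, size=(50, 2))      # class -1

w1, b1 = fit_hyperplane(A, B)
w2, b2 = fit_hyperplane(B, A)

def predict(x):
    # assign to the class whose hyperplane is nearer
    d1 = abs(x @ w1 + b1) / np.linalg.norm(w1)
    d2 = abs(x @ w2 + b2) / np.linalg.norm(w2)
    return 1 if d1 < d2 else -1

print(predict(np.array([0.1, -0.1])), predict(np.array([2.1, 1.9])))
```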


Author(s):  
Kanae Takahashi ◽  
Kouji Yamamoto ◽  
Aya Kuchiba ◽  
Tatsuki Koyama

A binary classification problem is common in the medical field, and we often use sensitivity, specificity, accuracy, and negative and positive predictive values as measures of the performance of a binary predictor. In computer science, a classifier is usually evaluated with precision (positive predictive value) and recall (sensitivity). As a single summary measure of a classifier’s performance, the F1 score, defined as the harmonic mean of precision and recall, is widely used in information retrieval and information extraction evaluation because it possesses favorable characteristics, especially when the prevalence is low. Some statistical methods for inference have been developed for the F1 score in binary classification problems; however, they have not been extended to multi-class classification. There are three types of multi-class F1 scores, and the statistical properties of these F1 scores have hardly ever been discussed. We propose methods based on the large-sample multivariate central limit theorem for estimating F1 scores with confidence intervals.
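For concreteness, two commonly used multi-class F1 variants (macro- and micro-averaged) can be computed directly from a confusion matrix; the paper's confidence-interval estimators are not reproduced here.

```python
# Macro F1 averages the per-class F1 scores; micro F1 pools the
# true-positive / false-positive / false-negative counts first.
import numpy as np

def f1_scores(y_true, y_pred, n_classes):
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1                       # rows: truth, cols: prediction
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp
    fn = cm.sum(axis=1) - tp
    per_class = 2 * tp / (2 * tp + fp + fn)   # harmonic mean of P and R
    macro = per_class.mean()
    micro = 2 * tp.sum() / (2 * tp + fp + fn).sum()
    return macro, micro

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
macro, micro = f1_scores(y_true, y_pred, 3)
print(round(macro, 3), round(micro, 3))  # -> 0.656 0.667
```

Note that micro F1 equals accuracy in the single-label multi-class setting, which is one reason the distinction between the variants matters.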


2012 ◽  
Vol 532-533 ◽  
pp. 1191-1195 ◽  
Author(s):  
Zhen Yan Liu ◽  
Wei Ping Wang ◽  
Yong Wang

This paper introduces the design of a text categorization system based on the Support Vector Machine (SVM). It analyzes the high-dimensional characteristics of text data and the reasons why SVM is suitable for text categorization. The system is constructed according to its data flow and consists of three subsystems: text representation, classifier training and text classification. The core of the system is classifier training, but text representation directly influences the accuracy of the classifier and the performance of the system. A text feature vector space can be built by different kinds of feature selection and feature extraction methods. No research indicates which method is best, so many feature selection and feature extraction methods are implemented in this system. For a specific classification task, every feature selection method and every feature extraction method is tested, and then a set of the best methods is adopted.
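Such a pipeline can be sketched as TF-IDF text representation, one example feature selection method (chi-squared), and a linear SVM classifier; the tiny English corpus below is an invented stand-in for a real collection, not the paper's data.

```python
# Minimal text categorization pipeline: TF-IDF vectors -> chi-squared
# feature selection -> linear SVM.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

docs = [
    "stocks rallied as markets opened higher",
    "the central bank raised interest rates",
    "the striker scored a late winning goal",
    "the team won the championship match",
]
labels = ["finance", "finance", "sports", "sports"]

clf = make_pipeline(
    TfidfVectorizer(),          # text -> sparse TF-IDF vectors
    SelectKBest(chi2, k=10),    # keep the 10 most class-associated terms
    LinearSVC(),                # linear SVM classifier
)
clf.fit(docs, labels)
pred = clf.predict(["bank rates and stocks"])[0]
print(pred)
```

Swapping `SelectKBest(chi2, …)` for another selector (mutual information, document frequency, etc.) is how the "test every method" strategy described above would be realized in practice.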


2014 ◽  
Vol 519-520 ◽  
pp. 644-650
Author(s):  
Mian Shui Yu ◽  
Yu Xie ◽  
Xiao Meng Xie

Age classification based on facial images is attracting wide attention for its broad applications in human-computer interaction (HCI). Since human senescence is a tremendously complex process, age classification is still a highly challenging issue. In our study, the Local Directional Pattern (LDP) and the Gabor wavelet transform were used to extract global and local facial features, respectively, which were fused based on information fusion theory. The Principal Component Analysis (PCA) method was used for dimensionality reduction of the fused features to obtain a lower-dimensional age characteristic vector. A Support Vector Machine (SVM) multi-class classifier with Error Correcting Output Codes (ECOC) was proposed in the paper, aimed at multi-class classification problems such as age classification. Experiments on the public FG-NET age database proved the efficiency of our method.
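The ECOC-wrapped SVM can be sketched with scikit-learn's `OutputCodeClassifier`; the LDP/Gabor/PCA feature pipeline is omitted here, and raw iris measurements stand in for the fused age features.

```python
# ECOC multi-class classification: each class gets a binary codeword,
# one SVM is trained per code bit, and a sample is assigned to the
# class whose codeword is nearest to the predicted bit vector.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OutputCodeClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# code_size=4 -> codewords of length 4 * n_classes = 12 bits
ecoc = OutputCodeClassifier(SVC(), code_size=4, random_state=0)
ecoc.fit(X_tr, y_tr)
acc = ecoc.score(X_te, y_te)
print(round(acc, 2))
```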


2020 ◽  
Vol 10 (12) ◽  
pp. 4176 ◽  
Author(s):  
Loris Nanni ◽  
Andrea Rigo ◽  
Alessandra Lumini ◽  
Sheryl Brahnam

In this work, we combine a Siamese neural network and different clustering techniques to generate a dissimilarity space that is then used to train an SVM for automated animal audio classification. The animal audio datasets used are (i) birds and (ii) cat sounds, which are freely available. We exploit different clustering methods to reduce the spectrograms in the dataset to a number of centroids that are used to generate the dissimilarity space through the Siamese network. Once computed, we use the dissimilarity space to generate a vector space representation of each pattern, which is then fed into a support vector machine (SVM) to classify a spectrogram by its dissimilarity vector. Our study shows that the proposed approach based on the dissimilarity space performs well on both classification problems without ad hoc optimization of the clustering methods. Moreover, the results show that fusion with CNN-based approaches on the animal audio classification problem works better than the stand-alone CNNs.
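A rough sketch of the dissimilarity-space idea, with plain Euclidean distance standing in for the learned Siamese similarity and digit vectors standing in for spectrograms:

```python
# Dissimilarity-space classification: (1) cluster the training set to a
# few centroids, (2) represent each pattern by its distances to those
# centroids, (3) train an SVM on the distance vectors.
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.metrics import pairwise_distances
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# 1) reduce the training set to 30 centroids
km = KMeans(n_clusters=30, n_init=10, random_state=0).fit(X_tr)

# 2) dissimilarity vectors: distances from each pattern to the centroids
D_tr = pairwise_distances(X_tr, km.cluster_centers_)
D_te = pairwise_distances(X_te, km.cluster_centers_)

# 3) classify in the dissimilarity space
acc = SVC().fit(D_tr, y_tr).score(D_te, y_te)
print(round(acc, 2))
```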


2013 ◽  
Vol 2013 ◽  
pp. 1-12
Author(s):  
Lev V. Utkin ◽  
Yulia A. Zhuk

A method is proposed for solving a classification problem when only partial information about some features is available. This partial information comprises the mean values of the features for every class and the bounds of the features. In order to maximally exploit the available information, a set of probability distributions is constructed, and two distributions that define the minimax and minimin strategies are selected from this set. Random values of the features are generated in accordance with the selected distributions by using the Monte Carlo technique. As a result, the classification problem is reduced to the standard model, which is solved by means of the support vector machine. Numerical examples illustrate the proposed method.
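A toy illustration of the overall pipeline, with one arbitrary distribution choice (a triangular distribution whose mean is matched, approximately, to the known class mean within the bounds) standing in for the paper's minimax/minimin construction:

```python
# Given only per-class feature means and global feature bounds,
# generate Monte Carlo training samples and fit a standard SVM.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
lo, hi = 0.0, 10.0                       # known feature bounds
class_means = {0: np.array([3.0, 3.0]),  # known per-class feature means
               1: np.array([7.0, 7.0])}

X_parts, y = [], []
for label, mean in class_means.items():
    # triangular mean is (lo + mode + hi) / 3; choose mode to match the
    # class mean, clipped to stay inside the bounds (so the match is
    # only approximate near the edges)
    mode = np.clip(3 * mean - lo - hi, lo, hi)
    X_parts.append(rng.triangular(lo, mode, hi, size=(200, 2)))
    y += [label] * 200
X = np.vstack(X_parts)

svm = SVC().fit(X, y)                    # standard SVM on generated data
preds = svm.predict([[2.5, 2.5], [7.5, 7.5]])
print(preds)
```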


Author(s):  
Nguyen The Cuong

In binary classification problems, the two classes normally have different tendencies. Moreover, the clusters within each class also tend to differ. Traditional algorithms such as the Support Vector Machine (SVM) or the Twin Support Vector Machine (TWSVM) do not sufficiently exploit the structural information of the data at cluster granularity, which restricts their capability to model data trends. The structural twin support vector machine (S-TWSVM) exploits the cluster-granularity structural information of one class to learn a representative hyperplane for that class, which makes its data modeling capability better than that of TWSVM. However, for data in which each class consists of clusters with different trends, the modeling capability of S-TWSVM is still restricted. In this paper, we propose a new Hierarchical Multi Twin Support Vector Machine (HM-TWSVM) for the classification problem, using a cluster-vs-class strategy. HM-TWSVM overcomes the limitations of S-TWSVM. Experimental results show that HM-TWSVM better describes the tendency of each cluster.
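The cluster-vs-class strategy can be sketched with ordinary linear SVMs standing in for the twin-SVM subproblems; the number of clusters per class is assumed known here, and the synthetic data is invented for illustration.

```python
# Cluster-vs-class sketch: cluster each class, train one classifier per
# cluster against the whole opposite class, and label a point by the
# most confident cluster classifier.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
# class +1 has two clusters with different locations; class -1 has one
A = np.vstack([rng.normal([0, 0], 0.3, (40, 2)),
               rng.normal([4, 0], 0.3, (40, 2))])
B = rng.normal([2, 3], 0.3, (60, 2))

models = []
for cls, own, other, k in [(+1, A, B, 2), (-1, B, A, 1)]:
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(own)
    for c in range(k):
        part = own[labels == c]
        X = np.vstack([part, other])
        y = [1] * len(part) + [0] * len(other)   # cluster vs opposite class
        models.append((cls, LinearSVC().fit(X, y)))

def predict(x):
    # the cluster classifier with the highest decision value wins
    scores = [(m.decision_function([x])[0], cls) for cls, m in models]
    return max(scores)[1]

print(predict([0, 0]), predict([2, 3]))
```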

