Multiclass Boosting with Adaptive Group-Based kNN and Its Application in Text Categorization

2012 ◽  
Vol 2012 ◽  
pp. 1-24 ◽  
Author(s):  
Lei La ◽  
Qiao Guo ◽  
Dequan Yang ◽  
Qimin Cao

AdaBoost is an excellent committee-based tool for classification. However, its effectiveness and efficiency in multiclass categorization face challenges from methods based on the support vector machine (SVM), neural networks (NN), naïve Bayes, and k-nearest neighbor (kNN). This paper uses a novel multiclass AdaBoost algorithm that avoids reducing the multiclass classification problem to multiple two-class classification problems, which makes it more efficient while keeping the accuracy advantage of existing AdaBoost. An adaptive group-based kNN method is proposed in this paper to build more accurate weak classifiers and thereby keep the number of basis classifiers within an acceptable range. To further enhance performance, the weak classifiers are combined into a strong classifier through a doubly iterative weighting scheme, yielding the adaptive group-based kNN boosting algorithm (AGkNN-AdaBoost). We implement AGkNN-AdaBoost in a Chinese text categorization system. Experimental results show that the proposed classification algorithm achieves better precision and recall than many other text categorization methods, including traditional AdaBoost. In addition, its processing speed is significantly higher than that of the original AdaBoost and many other classic categorization algorithms.
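The paper's AGkNN-AdaBoost implementation is not public; as a rough stand-in, the sketch below uses scikit-learn's direct multiclass AdaBoost (the SAMME-style scheme) with shallow decision trees as weak learners. Trees replace the paper's adaptive group-based kNN learners because scikit-learn's AdaBoost requires weak learners that accept per-sample weights, which plain kNN estimators do not.

```python
# Hedged sketch: direct multiclass AdaBoost, which avoids decomposing
# a K-class problem into many two-class problems. Depth-2 decision
# trees stand in for the paper's adaptive group-based kNN weak learners.
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# First positional argument is the weak learner; boosting is applied
# directly to the 3-class problem, with no one-vs-rest reduction.
clf = AdaBoostClassifier(DecisionTreeClassifier(max_depth=2),
                         n_estimators=50, random_state=0)
clf.fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)
print(round(acc, 2))
```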

Mathematics ◽  
2021 ◽  
Vol 9 (18) ◽  
pp. 2215
Author(s):  
Jung-Kai Tsai ◽  
Chih-Hsing Hung

Because COVID-19 emerged in 2019, human behavior has changed, and this will influence the business models of enterprises. An enterprise cannot predict its development from past knowledge and experience, so it needs a new machine learning framework to predict enterprise performance. The goal of this research is to modify AdaBoost so that it reasonably predicts enterprise performance. To justify the usefulness of the proposed model, enterprise data are collected and the proposed model is used to predict enterprise performance after COVID-19. The test-data correct rate of the proposed model is compared with that of several traditional machine learning models. Compared with traditional AdaBoost, the back-propagation neural network (BPNN), a regression classifier, the support vector machine (SVM) and support vector regression (SVR), the proposed method possesses better classification ability (its average correct rate is 88.04%) in handling two-class classification problems. Compared with traditional AdaBoost, one-against-all SVM, one-against-one SVM, one-against-all SVR and one-against-one SVR, the classification ability of the proposed method is also relatively better for the multi-class classification problem. Finally, conclusions and future research are discussed at the end.
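As an illustration only (not the authors' enterprise data or their modified AdaBoost), this kind of baseline comparison between AdaBoost, an SVM, and a BPNN-style MLP can be run on a public two-class dataset:

```python
# Illustrative two-class comparison on a public dataset; the models
# below are stock scikit-learn baselines, not the paper's models.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "MLP (BPNN-style)": make_pipeline(StandardScaler(),
                                      MLPClassifier(max_iter=1000,
                                                    random_state=0)),
}
results = {name: m.fit(X_tr, y_tr).score(X_te, y_te)
           for name, m in models.items()}
for name, acc in results.items():
    print(f"{name}: {acc:.3f}")  # test-set correct rate
```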


Author(s):  
Kaiyuan Wu ◽  
Zhiming Zheng ◽  
Shaoting Tang

In this paper, we propose a powerful weak learner (Vector Decision Tree (VDT)) and a new Boosted Vector Decision Tree (BVDT) algorithm framework for the task of multi-class classification. Unlike traditional scalar-valued boosting algorithms, the BVDT algorithm directly maps the feature space to the decision space in the multi-class setting, which facilitates convenient implementation of multi-class classification algorithms using diverse loss functions. By viewing the explicit hard threshold on the leaf node value applied in LogitBoost as a constrained optimization problem, we further develop two new variants of the BVDT algorithm: the [Formula: see text]-BVDT and the [Formula: see text]-BVDT. The performance of the proposed algorithm is evaluated on different datasets and compared with three state-of-the-art boosting algorithms, k-Nearest Neighbor (KNN) and the Support Vector Machine (SVM). The results show that the proposed algorithm ranks first on all but one dataset and reduces the test error rate by 4% to 58% with respect to the state-of-the-art boosting algorithms based on scalar-valued weak learners. Furthermore, we present a case study on the Abalone dataset by designing a new loss function that combines the negative log-likelihood loss function of the classification problem and the square loss function of the regression problem.


Author(s):  
ZHI-XIA YANG

In this paper, we propose two Laplacian nonparallel hyperplane proximal classifiers (LapNPPCs) for the semi-supervised and fully supervised classification problems, respectively, by adding manifold regularization terms. Owing to the manifold regularization terms, our LapNPPCs are able to exploit the intrinsic structure of the patterns in the training set. Furthermore, our classifiers only need to solve two systems of linear equations, rather than the two quadratic programming (QP) problems needed in the Laplacian twin support vector machine (LapTSVM) (Z. Qi, Y. Tian and Y. Shi, Neural Netw. 35 (2012) 46–53). Numerical experiments on toy and UCI benchmark datasets show that the accuracy of our LapNPPCs is comparable with that of other classifiers, such as the standard SVM, TWSVM and LapTSVM. Moreover, based on our LapNPPCs, other TWSVM-type classifiers with manifold regularization can be constructed by choosing different norms and loss functions to deal with semi-supervised binary and multi-class classification problems.
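The computational point above (linear systems instead of QPs) can be illustrated with a minimal least-squares nonparallel proximal classifier. This is an assumption-laden toy, not the authors' method: it omits the manifold (Laplacian) regularization term and adds a small ridge term for numerical stability, but each class hyperplane is still obtained from a single linear solve.

```python
# Toy nonparallel proximal classifier: each hyperplane stays close to
# its own class and (roughly) unit-distant from the other class, and is
# found by solving one linear system (no QP solver needed).
import numpy as np

def fit_hyperplane(A, B, c=1.0, ridge=1e-6):
    """Minimize (1/2)||[A e] z||^2 + (c/2)||[B e] z + e||^2 over z=[w;b]."""
    E = np.hstack([A, np.ones((len(A), 1))])   # augmented own-class matrix
    F = np.hstack([B, np.ones((len(B), 1))])   # augmented other-class matrix
    H = E.T @ E + c * F.T @ F + ridge * np.eye(E.shape[1])
    z = np.linalg.solve(H, -c * F.T @ np.ones(len(B)))
    return z[:-1], z[-1]                       # w, b

rng = np.random.default_rng(0)
A = rng.normal([0, 0], 0.3, size=(50, 2))      # class +1
B = rng.normal([2, 2], 0.3, size=(50, 2))      # class -1

w1, b1 = fit_hyperplane(A, B)
w2, b2 = fit_hyperplane(B, A)

def predict(x):
    # assign to the class whose hyperplane is nearer
    d1 = abs(x @ w1 + b1) / np.linalg.norm(w1)
    d2 = abs(x @ w2 + b2) / np.linalg.norm(w2)
    return 1 if d1 < d2 else -1

print(predict(np.array([0.1, -0.1])), predict(np.array([2.1, 1.9])))
```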


Author(s):  
Kanae Takahashi ◽  
Kouji Yamamoto ◽  
Aya Kuchiba ◽  
Tatsuki Koyama

A binary classification problem is common in the medical field, and we often use sensitivity, specificity, accuracy, and negative and positive predictive values as measures of the performance of a binary predictor. In computer science, a classifier is usually evaluated with precision (positive predictive value) and recall (sensitivity). As a single summary measure of a classifier’s performance, the F1 score, defined as the harmonic mean of precision and recall, is widely used in information retrieval and information extraction evaluation because it possesses favorable characteristics, especially when the prevalence is low. Some statistical methods for inference have been developed for the F1 score in binary classification problems; however, they have not been extended to multi-class classification. There are three types of multi-class F1 scores, and the statistical properties of these F1 scores have hardly ever been discussed. We propose methods based on the large-sample multivariate central limit theorem for estimating F1 scores with confidence intervals.
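For concreteness, two commonly used multi-class F1 variants (macro- and micro-averaged) can be computed directly from a confusion matrix; the paper's confidence-interval estimators are not reproduced here.

```python
# Macro F1 averages the per-class F1 scores; micro F1 pools the
# true-positive / false-positive / false-negative counts first.
import numpy as np

def f1_scores(y_true, y_pred, n_classes):
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1                       # rows: truth, cols: prediction
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp
    fn = cm.sum(axis=1) - tp
    per_class = 2 * tp / (2 * tp + fp + fn)   # harmonic mean of P and R
    macro = per_class.mean()
    micro = 2 * tp.sum() / (2 * tp + fp + fn).sum()
    return macro, micro

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
macro, micro = f1_scores(y_true, y_pred, 3)
print(round(macro, 3), round(micro, 3))  # -> 0.656 0.667
```

Note that micro F1 equals accuracy in the single-label multi-class setting, which is one reason the distinction between the variants matters.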


2012 ◽  
Vol 532-533 ◽  
pp. 1191-1195 ◽  
Author(s):  
Zhen Yan Liu ◽  
Wei Ping Wang ◽  
Yong Wang

This paper introduces the design of a text categorization system based on the Support Vector Machine (SVM). It analyzes the high-dimensional characteristics of text data and the reasons why SVM is suitable for text categorization. The system is constructed according to its data flow and consists of three subsystems: text representation, classifier training and text classification. The core of the system is classifier training, but text representation directly influences the accuracy of the classifier and the performance of the system. A text feature vector space can be built by different kinds of feature selection and feature extraction methods. No research indicates which method is best, so many feature selection and feature extraction methods are implemented in this system. For a specific classification task, every feature selection method and every feature extraction method is tested, and then a set of the best methods is adopted.
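Such a pipeline can be sketched as TF-IDF text representation, one example feature selection method (chi-squared), and a linear SVM classifier; the tiny English corpus below is an invented stand-in for a real collection, not the paper's data.

```python
# Minimal text categorization pipeline: TF-IDF vectors -> chi-squared
# feature selection -> linear SVM.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

docs = [
    "stocks rallied as markets opened higher",
    "the central bank raised interest rates",
    "the striker scored a late winning goal",
    "the team won the championship match",
]
labels = ["finance", "finance", "sports", "sports"]

clf = make_pipeline(
    TfidfVectorizer(),          # text -> sparse TF-IDF vectors
    SelectKBest(chi2, k=10),    # keep the 10 most class-associated terms
    LinearSVC(),                # linear SVM classifier
)
clf.fit(docs, labels)
pred = clf.predict(["bank rates and stocks"])[0]
print(pred)
```

Swapping `SelectKBest(chi2, …)` for another selector (mutual information, document frequency, etc.) is how the "test every method" strategy described above would be realized in practice.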


2014 ◽  
Vol 519-520 ◽  
pp. 644-650
Author(s):  
Mian Shui Yu ◽  
Yu Xie ◽  
Xiao Meng Xie

Age classification based on facial images is attracting wide attention for its broad applications in human-computer interaction (HCI). Since human senescence is a tremendously complex process, age classification is still a highly challenging issue. In our study, the Local Directional Pattern (LDP) and the Gabor wavelet transform were used to extract global and local facial features, respectively, which were fused based on information fusion theory. The Principal Component Analysis (PCA) method was used for dimensionality reduction of the fused features to obtain a lower-dimensional age characteristic vector. A Support Vector Machine (SVM) multi-class classifier with Error Correcting Output Codes (ECOC) was proposed in the paper, aimed at multi-class classification problems such as age classification. Experiments on the public FG-NET age database proved the efficiency of our method.
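The ECOC-wrapped SVM can be sketched with scikit-learn's `OutputCodeClassifier`; the LDP/Gabor/PCA feature pipeline is omitted here, and raw iris measurements stand in for the fused age features.

```python
# ECOC multi-class classification: each class gets a binary codeword,
# one SVM is trained per code bit, and a sample is assigned to the
# class whose codeword is nearest to the predicted bit vector.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OutputCodeClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# code_size=4 -> codewords of length 4 * n_classes = 12 bits
ecoc = OutputCodeClassifier(SVC(), code_size=4, random_state=0)
ecoc.fit(X_tr, y_tr)
acc = ecoc.score(X_te, y_te)
print(round(acc, 2))
```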


2020 ◽  
Vol 10 (12) ◽  
pp. 4176 ◽  
Author(s):  
Loris Nanni ◽  
Andrea Rigo ◽  
Alessandra Lumini ◽  
Sheryl Brahnam

In this work, we combine a Siamese neural network and different clustering techniques to generate a dissimilarity space that is then used to train an SVM for automated animal audio classification. The animal audio datasets used are (i) birds and (ii) cat sounds, which are freely available. We exploit different clustering methods to reduce the spectrograms in the dataset to a number of centroids that are used to generate the dissimilarity space through the Siamese network. Once computed, we use the dissimilarity space to generate a vector space representation of each pattern, which is then fed into a support vector machine (SVM) to classify a spectrogram by its dissimilarity vector. Our study shows that the proposed approach based on the dissimilarity space performs well on both classification problems without ad hoc optimization of the clustering methods. Moreover, the results show that fusion with CNN-based approaches on the animal audio classification problem works better than the stand-alone CNNs.
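A rough sketch of the dissimilarity-space idea, with plain Euclidean distance standing in for the learned Siamese similarity and digit vectors standing in for spectrograms:

```python
# Dissimilarity-space classification: (1) cluster the training set to a
# few centroids, (2) represent each pattern by its distances to those
# centroids, (3) train an SVM on the distance vectors.
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.metrics import pairwise_distances
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# 1) reduce the training set to 30 centroids
km = KMeans(n_clusters=30, n_init=10, random_state=0).fit(X_tr)

# 2) dissimilarity vectors: distances from each pattern to the centroids
D_tr = pairwise_distances(X_tr, km.cluster_centers_)
D_te = pairwise_distances(X_te, km.cluster_centers_)

# 3) classify in the dissimilarity space
acc = SVC().fit(D_tr, y_tr).score(D_te, y_te)
print(round(acc, 2))
```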


2013 ◽  
Vol 2013 ◽  
pp. 1-12
Author(s):  
Lev V. Utkin ◽  
Yulia A. Zhuk

A method is proposed for solving a classification problem when only partial information about some features is available. This partial information comprises the mean values of the features for every class and the bounds of the features. In order to maximally exploit the available information, a set of probability distributions is constructed, and two distributions that define the minimax and minimin strategies are selected from this set. Random values of the features are generated in accordance with the selected distributions by using the Monte Carlo technique. As a result, the classification problem is reduced to the standard model, which is solved by means of the support vector machine. Numerical examples illustrate the proposed method.
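A toy illustration of the overall pipeline, with one arbitrary distribution choice (a triangular distribution whose mean is matched, approximately, to the known class mean within the bounds) standing in for the paper's minimax/minimin construction:

```python
# Given only per-class feature means and global feature bounds,
# generate Monte Carlo training samples and fit a standard SVM.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
lo, hi = 0.0, 10.0                       # known feature bounds
class_means = {0: np.array([3.0, 3.0]),  # known per-class feature means
               1: np.array([7.0, 7.0])}

X_parts, y = [], []
for label, mean in class_means.items():
    # triangular mean is (lo + mode + hi) / 3; choose mode to match the
    # class mean, clipped to stay inside the bounds (so the match is
    # only approximate near the edges)
    mode = np.clip(3 * mean - lo - hi, lo, hi)
    X_parts.append(rng.triangular(lo, mode, hi, size=(200, 2)))
    y += [label] * 200
X = np.vstack(X_parts)

svm = SVC().fit(X, y)                    # standard SVM on generated data
preds = svm.predict([[2.5, 2.5], [7.5, 7.5]])
print(preds)
```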


Author(s):  
Nguyen The Cuong

In binary classification problems, the two classes normally have different tendencies. Moreover, the clusters within each class also tend to differ. Traditional algorithms such as the Support Vector Machine (SVM) or the Twin Support Vector Machine (TWSVM) do not sufficiently exploit the structural information of the data at cluster granularity, which restricts their capability to model data trends. The structural twin support vector machine (S-TWSVM) exploits the cluster-granularity structural information of one class to learn a representative hyperplane for that class, which makes its data modeling capability better than that of TWSVM. However, for data in which each class consists of clusters with different trends, the modeling capability of S-TWSVM is still restricted. In this paper, we propose a new Hierarchical Multi Twin Support Vector Machine (HM-TWSVM) for the classification problem, using a cluster-vs-class strategy. HM-TWSVM overcomes the limitations of S-TWSVM. Experimental results show that HM-TWSVM better describes the tendency of each cluster.
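The cluster-vs-class strategy can be sketched with ordinary linear SVMs standing in for the twin-SVM subproblems; the number of clusters per class is assumed known here, and the synthetic data is invented for illustration.

```python
# Cluster-vs-class sketch: cluster each class, train one classifier per
# cluster against the whole opposite class, and label a point by the
# most confident cluster classifier.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
# class +1 has two clusters with different locations; class -1 has one
A = np.vstack([rng.normal([0, 0], 0.3, (40, 2)),
               rng.normal([4, 0], 0.3, (40, 2))])
B = rng.normal([2, 3], 0.3, (60, 2))

models = []
for cls, own, other, k in [(+1, A, B, 2), (-1, B, A, 1)]:
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(own)
    for c in range(k):
        part = own[labels == c]
        X = np.vstack([part, other])
        y = [1] * len(part) + [0] * len(other)   # cluster vs opposite class
        models.append((cls, LinearSVC().fit(X, y)))

def predict(x):
    # the cluster classifier with the highest decision value wins
    scores = [(m.decision_function([x])[0], cls) for cls, m in models]
    return max(scores)[1]

print(predict([0, 0]), predict([2, 3]))
```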

