Many Are Better Than One: Improving Probabilistic Estimates from Decision Trees

Author(s):  
Nitesh V. Chawla
1993 ◽  
Vol 02 (02) ◽  
pp. 219-234 ◽  
Author(s):  
ROBERT G. REYNOLDS ◽  
JONATHAN I. MALETIC

The Version Space Controlled Genetic Algorithms (VGA) system uses the structure of the version space to cache generalizations about the performance history of chromosomes in the genetic algorithm. This cached experience is used to constrain the generation of new members of the genetic algorithm's population. The VGA is shown to be a specific instantiation of a more general framework, Autonomous Learning Elements (ALE). The capabilities of the VGA system are demonstrated on the Boole problem suggested by Wilson [Wilson 1987], and its performance is compared with that of decision trees and genetic algorithms. The results suggest that the VGA can exploit certain symbiotic relationships between its components, so that the combined system performs better than either component alone.


2020 ◽  
Vol 1 (1) ◽  
pp. 1-14
Author(s):  
Yousef Elgimati

The main focus of this paper is the use of resampling techniques to construct predictive models from data, with the goal of identifying the model that produces the best predictions. Bagging (bootstrap aggregating) is a general method for improving the performance of a given learning algorithm: the same algorithm is trained on multiple bootstrap resamples of the training set, and the outputs of the resulting classifiers are combined by majority vote. A bootstrap sample is generated by sampling at random, with replacement, from the original training set. Inspired by bagging, this study presents an improved method based on a distance function in decision trees, called modified bagging (or weighted bagging). The experimental results show that modified bagging is superior to the usual majority vote, and these results are confirmed on both real data and artificial data sets with random noise. The modified bagged classifier performs significantly better than usual bagging at various tree levels for all sample sizes. An interesting observation is that weighted bagging performs somewhat better than usual bagging with stumps.
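The standard bagging procedure described above (bootstrap resamples, one base classifier per resample, majority vote) can be sketched in a few lines of Python. This is a minimal illustration of ordinary, unweighted bagging only, not the paper's distance-weighted modified bagging, whose details are not given here. The base learner is a hand-written decision stump, and the names `fit_stump`, `bagging`, and `n_estimators` are hypothetical.

```python
import random
from collections import Counter

def fit_stump(X, y):
    """Fit a one-level decision tree (stump) by exhaustive search over
    feature/threshold pairs, minimising training misclassifications."""
    overall = Counter(y).most_common(1)[0][0]
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[j] <= t]
            right = [lab for row, lab in zip(X, y) if row[j] > t]
            # Each side predicts its majority label (overall majority if empty).
            l_lab = Counter(left).most_common(1)[0][0] if left else overall
            r_lab = Counter(right).most_common(1)[0][0] if right else overall
            errors = sum(l != l_lab for l in left) + sum(r != r_lab for r in right)
            if best is None or errors < best[0]:
                best = (errors, j, t, l_lab, r_lab)
    _, j, t, l_lab, r_lab = best
    return lambda row: l_lab if row[j] <= t else r_lab

def bagging(X, y, n_estimators=25, seed=0):
    """Ordinary (unweighted) bagging of decision stumps; hypothetical helper,
    not the modified bagging proposed in the paper."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        # Bootstrap resample: draw n row indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))

    def predict(row):
        # Combine the ensemble by plain majority vote.
        return Counter(m(row) for m in models).most_common(1)[0][0]

    return predict

X = [[0], [1], [2], [3], [4], [5]]
y = [0, 0, 0, 1, 1, 1]
clf = bagging(X, y)
print(clf([0]), clf([5]))
```

A weighted variant would replace the equal votes in `predict` with per-model weights; how the paper derives those weights from its distance function is not reproduced here.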


2019 ◽  
Vol 5 (1) ◽  
pp. 1
Author(s):  
Saha Dauji

Single angle struts are used as compression members in many structures, including roof trusses and transmission towers. Exact analysis and design of such members is challenging because of uncertainties such as end fixity and the eccentricity of the applied loads. The design standards provide guidelines that have been found to be inaccurate, erring on the conservative side. Artificial Neural Networks (ANN) trained on experimental data have been reported in the literature to perform better than the design standards. However, practical implementation of ANN is problematic because both the trained network and the know-how to apply it must be accessible to practitioners. With another data-driven tool, Decision Trees (DT), practical application is easier because the method generates decision rules that designers can readily understand and implement. Hence, in this paper, DT was explored for evaluating the capacity of eccentrically loaded single angle struts; it was found to be robust, with accuracy comparable to ANN and better than the design code (AISC). This has great potential for straightforward implementation by practicing engineers through logic-based decision rules, which are easy to program on a computer. For this application, using dimensionless ratios rather than the original variables as inputs to the DT was found to yield better results.


2010 ◽  
Vol 26-28 ◽  
pp. 776-779
Author(s):  
Wei She ◽  
Hong Li ◽  
Guo Qing Yu ◽  
Rui Deng

How to construct an “appropriate” split hyper-plane at test nodes is the key to building decision trees. Unlike a univariate decision tree, a multivariate (oblique) decision tree can find a hyper-plane that is not orthogonal to the feature axes. In this paper, we re-explain the process of building test nodes in geometric terms and, based on this, propose a two-stage method for learning the hyper-plane. The resulting tree (TSDT) retains the interpretability of univariate decision trees while, like multivariate decision trees, being able to find oblique hyper-planes. Experiments on combination methods show that the TSDT-based combination algorithm is considerably more accurate than other tree-based combination methods.
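The geometric distinction between univariate and oblique test nodes can be made concrete with a small sketch. It only contrasts the two kinds of test; it does not implement the paper's two-stage TSDT learning procedure, and the function names and example weights are hypothetical.

```python
from typing import Sequence

def univariate_test(x: Sequence[float], feature: int, threshold: float) -> bool:
    """Axis-parallel test: the split hyper-plane x[feature] = threshold
    is orthogonal to that feature's axis."""
    return x[feature] <= threshold

def oblique_test(x: Sequence[float], w: Sequence[float], b: float) -> bool:
    """Multivariate (oblique) test: the split hyper-plane w . x + b = 0
    may lie at any angle to the feature axes."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b <= 0

point = (2.0, 3.0)
print(univariate_test(point, feature=0, threshold=2.5))  # 2.0 <= 2.5, so True
print(oblique_test(point, w=(1.0, 1.0), b=-4.0))         # 2 + 3 - 4 = 1 > 0, so False
# With w a unit vector along one axis, the oblique test reduces to a univariate one.
print(oblique_test(point, w=(1.0, 0.0), b=-2.5))         # same as the first test: True
```

The last line shows why oblique trees generalise univariate ones: an axis-parallel split is the special case in which the weight vector has a single non-zero component.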


Risks ◽  
2021 ◽  
Vol 9 (2) ◽  
pp. 42 ◽  
Author(s):  
Mohamed Hanafy ◽  
Ruixing Ming

The growing number and severity of auto insurance claims creates a need for new methods to handle these claims efficiently. Machine learning (ML) is one approach to this problem. As car insurers aim to improve their customer service, they have started applying ML to better interpret and understand their data, gaining efficiency and, through a better understanding of customers' needs, improving customer service. This study considers how automotive insurance providers incorporate machine learning into their business and explores how ML models can be applied to insurance big data. We use various ML methods, such as logistic regression, XGBoost, random forest (RF), decision trees, naïve Bayes, and K-NN, to predict claim occurrence, and we evaluate and compare these models' performance. The results show that RF outperforms the other methods, with accuracy, kappa, and AUC values of 0.8677, 0.7117, and 0.840, respectively.
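Two of the reported metrics, accuracy and Cohen's kappa, can be computed directly from true and predicted labels. The sketch below is a minimal illustration; the toy labels are invented and are not data from the study.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def cohen_kappa(y_true, y_pred):
    """Cohen's kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed
    agreement (accuracy) and p_e is the agreement expected by chance from
    the marginal label frequencies. Undefined when p_e == 1."""
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Invented toy labels: 1 = claim occurred, 0 = no claim.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
print(accuracy(y_true, y_pred))               # 0.75
print(round(cohen_kappa(y_true, y_pred), 3))  # 0.467
```

Kappa is lower than accuracy here because, with 5 of 8 labels being 0, a classifier guessing from the label frequencies alone would already agree with the truth about 53% of the time.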


Entropy ◽  
2021 ◽  
Vol 23 (12) ◽  
pp. 1641
Author(s):  
Mohammad Azad ◽  
Igor Chikalov ◽  
Shahid Hussain ◽  
Mikhail Moshkov ◽  
Beata Zielosko

Conventional decision trees use queries, each of which is based on a single attribute. In this study, we also examine decision trees that can additionally use queries based on hypotheses. This kind of query is similar to the equivalence queries considered in exact learning. Earlier, we designed dynamic programming algorithms for computing the minimum depth and the minimum number of internal nodes of decision trees with hypotheses. The modifications of these algorithms considered in the present paper allow us to build decision trees with hypotheses that are optimal with respect to depth or to the number of internal nodes. We compare the length and coverage of decision rules extracted from optimal decision trees with hypotheses with those of rules extracted from optimal conventional decision trees, in order to determine which are preferable as a tool for representing information. To this end, we conduct computer experiments on various decision tables from the UCI Machine Learning Repository. In addition, we consider decision tables for randomly generated Boolean functions. The results show that, in many cases, the decision rules derived from decision trees with hypotheses are better than those extracted from conventional decision trees.
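The two rule-quality measures compared above, length and coverage, can be illustrated on a small decision table. This is a sketch under stated assumptions: a rule is taken to be a conjunction of attribute-equals-value conditions with a decision, its length is the number of conditions, and its coverage is assumed to be the number of rows that satisfy all conditions and carry the rule's decision. The table and rule are hypothetical.

```python
from typing import Dict, List, Tuple

Row = Tuple[Tuple[int, ...], str]   # (attribute values, decision)
Rule = Tuple[Dict[int, int], str]   # ({attribute index: value}, decision)

def rule_length(rule: Rule) -> int:
    """Length = number of conditions on the left-hand side."""
    conditions, _ = rule
    return len(conditions)

def rule_coverage(rule: Rule, table: List[Row]) -> int:
    """Coverage (as assumed here) = number of rows that satisfy every
    condition of the rule and carry the rule's decision."""
    conditions, decision = rule
    return sum(
        1
        for values, d in table
        if d == decision and all(values[a] == v for a, v in conditions.items())
    )

# Hypothetical decision table with three binary attributes x0, x1, x2.
table: List[Row] = [
    ((0, 0, 1), "a"),
    ((0, 1, 1), "a"),
    ((1, 0, 0), "b"),
    ((1, 1, 0), "b"),
    ((1, 1, 1), "a"),
]
rule: Rule = ({0: 1, 2: 0}, "b")  # if x0 = 1 and x2 = 0 then b
print(rule_length(rule), rule_coverage(rule, table))  # 2 2
```

Shorter rules with higher coverage are generally easier to read and more general, which is why the two measures are used together when judging rules as a representation of information.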


2016 ◽  
Vol 1 (1) ◽  
pp. 30-35 ◽  
Author(s):  
Sahil Sharma ◽  
Vinod Sharma

Classification is an important supervised learning technique used in many applications. One important factor affecting a classifier's performance is the size of the dataset on which it is trained. In this manuscript, the authors analyze five classification techniques (decision trees, KNN, SVM, linear discriminant, and an ensemble method) in terms of AUC and predictive accuracy when trained on small datasets of different dimensionalities. The study used a dataset with 24 features and 400 instances (samples). The results show that, in general, the ensemble method (boosted trees) performed better than the others, although its performance degraded slightly as dimensionality was reduced.
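AUC, one of the two comparison criteria, can be computed from a classifier's scores using its pairwise interpretation: the probability that a randomly chosen positive is scored higher than a randomly chosen negative, with ties counted as one half (the Mann-Whitney formulation). The function name and toy scores below are hypothetical.

```python
def auc(y_true, scores):
    """Area under the ROC curve as the fraction of (positive, negative)
    pairs in which the positive receives the higher score; ties count 1/2.
    O(P * N), which is adequate for small datasets."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.7, 0.6, 0.4, 0.3, 0.1]
print(auc(y_true, scores))  # 8 of 9 pairs ranked correctly, about 0.889
```

Unlike accuracy, this measure does not depend on a decision threshold, which makes it a useful complement when comparing classifiers on small datasets.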


2014 ◽  
Vol 644-650 ◽  
pp. 2551-2555
Author(s):  
Rong Xiang Li ◽  
Zeng Lei Zhang ◽  
Yun Liu ◽  
Shan Chao Tu

The basic principles of the ID3 decision-tree algorithm for data mining are presented, and its main deficiencies are analysed. An improved algorithm based on ID3 is then proposed. Taking engine fault diagnosis as an example, the traditional ID3 algorithm and the improved algorithm are each applied to diagnose engine faults, and decision trees are constructed with both. Experimental results show that the improved algorithm is more accurate than traditional ID3 and is better suited to equipment fault diagnosis.
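For reference, the attribute-selection step of traditional ID3 chooses, at each node, the attribute with the highest information gain. The sketch below shows only that standard criterion; it does not reproduce the paper's improved algorithm. The symptom attributes and fault labels are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy of the labels minus the weighted entropy of the subsets
    obtained by splitting on the categorical attribute `attr`."""
    n = len(labels)
    groups = {}
    for row, lab in zip(rows, labels):
        groups.setdefault(row[attr], []).append(lab)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def best_attribute(rows, labels, attrs):
    """ID3 splits on the attribute with maximal information gain."""
    return max(attrs, key=lambda a: information_gain(rows, labels, a))

# Hypothetical engine symptoms and diagnoses.
rows = [
    {"noise": "high", "temp": "normal"},
    {"noise": "high", "temp": "high"},
    {"noise": "low", "temp": "high"},
    {"noise": "low", "temp": "normal"},
]
labels = ["fault", "fault", "ok", "ok"]
print(best_attribute(rows, labels, ["noise", "temp"]))  # noise
```

Here splitting on `noise` separates the classes perfectly (gain of 1 bit), whereas `temp` leaves each subset evenly mixed (gain of 0). A commonly cited weakness of this criterion is its bias towards attributes with many distinct values.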


1972 ◽  
Vol 1 ◽  
pp. 27-38
Author(s):  
J. Hers

In South Africa the modern outlook towards time may be said to have started in 1948. Both major observatories, the Royal Observatory in Cape Town and the Union Observatory (now known as the Republic Observatory) in Johannesburg, had of course been involved in the astronomical determination of time almost from their inception, and the Johannesburg observatory has been responsible for the official time of South Africa since 1908. However, the pendulum clocks then in use could not be relied on to provide an accuracy better than about 1/10 second, which was of the same order as that of the astronomical observations. It is doubtful whether much use was made even of this limited accuracy outside the two observatories, and although there may occasionally have been a demand for more accurate time, it was certainly not voiced.

