Forest Pruning Based on Branch Importance

Computational Intelligence and Neuroscience ◽

10.1155/2017/3162571 ◽

2017 ◽

Vol 2017 ◽

pp. 1-11 ◽

Cited By ~ 2

Author(s):

Xiangkui Jiang ◽

Chang-an Wu ◽

Huaping Guo

Keyword(s):

Decision Tree ◽

Decision Trees ◽

Selection Algorithm ◽

Ensemble Size ◽

Ensemble Pruning ◽

Generalization Ability ◽

Ensemble Selection ◽

Novel Strategy

A forest is an ensemble with decision trees as members. This paper proposes a novel strategy to pruning forest to enhance ensemble generalization ability and reduce ensemble size. Unlike conventional ensemble pruning approaches, the proposed method tries to evaluate the importance of branches of trees with respect to the whole ensemble using a novel proposed metric called importance gain. The importance of a branch is designed by considering ensemble accuracy and the diversity of ensemble members, and thus the metric reasonably evaluates how much improvement of the ensemble accuracy can be achieved when a branch is pruned. Our experiments show that the proposed method can significantly reduce ensemble size and improve ensemble accuracy, no matter whether ensembles are constructed by a certain algorithm such as bagging or obtained by an ensemble selection algorithm, no matter whether each decision tree is pruned or unpruned.

Download Full-text

REDUCING DECISION TREE ENSEMBLE SIZE USING PARALLEL DECISION DAGS

International Journal of Artificial Intelligence Tools ◽

10.1142/s0218213009000305 ◽

2009 ◽

Vol 18 (04) ◽

pp. 613-620 ◽

Cited By ~ 3

Author(s):

ADAM H. PETERSON ◽

TONY R. MARTINEZ

Keyword(s):

Decision Tree ◽

Decision Trees ◽

Data Structures ◽

High Probability ◽

Learning Model ◽

Ensemble Size ◽

New Model ◽

Ensemble Approach ◽

Redundant Data ◽

New Learning

This research presents a new learning model, the Parallel Decision DAG (PDDAG), and shows how to use it to represent an ensemble of decision trees while using significantly less storage. Ensembles such as Bagging and Boosting have a high probability of encoding redundant data structures, and PDDAGs provide a way to remove this redundancy in decision tree based ensembles. When trained by encoding an ensemble, the new model behaves similar to the original ensemble, and can be made to perform identically to it. The reduced storage requirements allow an ensemble approach to be used in cases where storage requirements would normally be exceeded, and the smaller model can potentially execute faster by reducing redundant computation.

Download Full-text

Conceptual Approach to Predict Loan Defaults Using Decision Trees

Advances in Business Information Systems and Analytics - Sentiment Analysis and Knowledge Discovery in Contemporary Business ◽

10.4018/978-1-5225-4999-4.ch009 ◽

2019 ◽

pp. 148-161 ◽

Cited By ~ 1

Author(s):

Syed Muzamil Basha ◽

Dharmendra Singh Rajput ◽

N. Ch. S. N. Iyengar

Keyword(s):

Decision Tree ◽

Decision Trees ◽

Prediction Algorithm ◽

Classification Error ◽

Selection Algorithm ◽

Decision Tree Classifier ◽

Time Data ◽

Conceptual Approach ◽

Tree Classifier ◽

Loan Defaults

In this chapter, the authors show how to build a decision tree from given real-time data. They interpret the output of decision tree by learning decision tree classifier using really recursive greedy algorithm. Feature selection is made based on classification error using the algorithm called feature split selection algorithm (FSSA), with all different possible stopping conditions for splitting. The authors perform prediction with decision trees using decision tree prediction algorithm (DTPA), followed by multiclass predictions and their probabilities. Finally, they perform splitting procedure on real continuous value input using threshold split selection algorithm (TSSA).

Download Full-text

Automated Development of Clinical Strategies Using Multistage Decision Analysis

Methods of Information in Medicine ◽

10.1055/s-0038-1635469 ◽

1986 ◽

Vol 25 (04) ◽

pp. 207-214 ◽

Cited By ~ 3

Author(s):

P. Glasziou

Keyword(s):

Decision Tree ◽

Decision Analysis ◽

Decision Trees ◽

Optimal Strategy ◽

Cholestatic Jaundice ◽

Clinical Strategies ◽

Simultaneous Study

SummaryThe development of investigative strategies by decision analysis has been achieved by explicitly drawing the decision tree, either by hand or on computer. This paper discusses the feasibility of automatically generating and analysing decision trees from a description of the investigations and the treatment problem. The investigation of cholestatic jaundice is used to illustrate the technique.Methods to decrease the number of calculations required are presented. It is shown that this method makes practical the simultaneous study of at least half a dozen investigations. However, some new problems arise due to the possible complexity of the resulting optimal strategy. If protocol errors and delays due to testing are considered, simpler strategies become desirable. Generation and assessment of these simpler strategies are discussed with examples.

Download Full-text

Performance Improvement of Decision Tree: A Robust Classifier Using Tabu Search Algorithm

Applied Sciences ◽

10.3390/app11156728 ◽

2021 ◽

Vol 11 (15) ◽

pp. 6728

Author(s):

Muhammad Asfand Hafeez ◽

Muhammad Rashid ◽

Hassan Tariq ◽

Zain Ul Abideen ◽

Saud S. Alotaibi ◽

...

Keyword(s):

Machine Learning ◽

Tabu Search ◽

Decision Tree ◽

Decision Trees ◽

Search Algorithm ◽

Learning Algorithms ◽

Performance Comparison ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Tabu Search Algorithm

Classification and regression are the major applications of machine learning algorithms which are widely used to solve problems in numerous domains of engineering and computer science. Different classifiers based on the optimization of the decision tree have been proposed, however, it is still evolving over time. This paper presents a novel and robust classifier based on a decision tree and tabu search algorithms, respectively. In the aim of improving performance, our proposed algorithm constructs multiple decision trees while employing a tabu search algorithm to consistently monitor the leaf and decision nodes in the corresponding decision trees. Additionally, the used tabu search algorithm is responsible to balance the entropy of the corresponding decision trees. For training the model, we used the clinical data of COVID-19 patients to predict whether a patient is suffering. The experimental results were obtained using our proposed classifier based on the built-in sci-kit learn library in Python. The extensive analysis for the performance comparison was presented using Big O and statistical analysis for conventional supervised machine learning algorithms. Moreover, the performance comparison to optimized state-of-the-art classifiers is also presented. The achieved accuracy of 98%, the required execution time of 55.6 ms and the area under receiver operating characteristic (AUROC) for proposed method of 0.95 reveals that the proposed classifier algorithm is convenient for large datasets.

Download Full-text

A Practical Tutorial for Decision Tree Induction

ACM Computing Surveys ◽

10.1145/3429739 ◽

2021 ◽

Vol 54 (1) ◽

pp. 1-38

Author(s):

Víctor Adrián Sosa Hernández ◽

Raúl Monroy ◽

Miguel Angel Medina-Pérez ◽

Octavio Loyola-González ◽

Francisco Herrera

Keyword(s):

Decision Tree ◽

Decision Trees ◽

Machine Learning Techniques ◽

Evaluation Measures ◽

Decision Tree Induction ◽

Learning Techniques ◽

Tree Models ◽

Evaluation Measure ◽

Main Components ◽

Support Decision Making

Experts from different domains have resorted to machine learning techniques to produce explainable models that support decision-making. Among existing techniques, decision trees have been useful in many application domains for classification. Decision trees can make decisions in a language that is closer to that of the experts. Many researchers have attempted to create better decision tree models by improving the components of the induction algorithm. One of the main components that have been studied and improved is the evaluation measure for candidate splits. In this article, we introduce a tutorial that explains decision tree induction. Then, we present an experimental framework to assess the performance of 21 evaluation measures that produce different C4.5 variants considering 110 databases, two performance measures, and 10× 10-fold cross-validation. Furthermore, we compare and rank the evaluation measures by using a Bayesian statistical analysis. From our experimental results, we present the first two performance rankings in the literature of C4.5 variants. Moreover, we organize the evaluation measures into two groups according to their performance. Finally, we introduce meta-models that automatically determine the group of evaluation measures to produce a C4.5 variant for a new database and some further opportunities for decision tree models.

Download Full-text

Analysis of the generalization ability of a full decision tree

Computational Mathematics and Mathematical Physics ◽

10.1134/s0965542514060074 ◽

2014 ◽

Vol 54 (6) ◽

pp. 1046-1059

Author(s):

I. E. Genrikhov

Keyword(s):

Decision Tree ◽

Generalization Ability ◽

Full Decision Tree

Download Full-text

Evolutionary Algorithm for Improving Decision Tree with Global Discretization in Manufacturing

Sensors ◽

10.3390/s21082849 ◽

2021 ◽

Vol 21 (8) ◽

pp. 2849

Author(s):

Sungbum Jun

Keyword(s):

Decision Tree ◽

Evolutionary Algorithm ◽

Decision Trees ◽

Manufacturing Systems ◽

Ensemble Methods ◽

Machine Learning Techniques ◽

Learning Techniques ◽

Industrial Internet ◽

Tree Models ◽

Real World Datasets

Due to the recent advance in the industrial Internet of Things (IoT) in manufacturing, the vast amount of data from sensors has triggered the need for leveraging such big data for fault detection. In particular, interpretable machine learning techniques, such as tree-based algorithms, have drawn attention to the need to implement reliable manufacturing systems, and identify the root causes of faults. However, despite the high interpretability of decision trees, tree-based models make a trade-off between accuracy and interpretability. In order to improve the tree’s performance while maintaining its interpretability, an evolutionary algorithm for discretization of multiple attributes, called Decision tree Improved by Multiple sPLits with Evolutionary algorithm for Discretization (DIMPLED), is proposed. The experimental results with two real-world datasets from sensors showed that the decision tree improved by DIMPLED outperformed the performances of single-decision-tree models (C4.5 and CART) that are widely used in practice, and it proved competitive compared to the ensemble methods, which have multiple decision trees. Even though the ensemble methods could produce slightly better performances, the proposed DIMPLED has a more interpretable structure, while maintaining an appropriate performance level.

Download Full-text

Induction of fuzzy decision trees and its refinement using gradient projected-neuro-fuzzy decision tree

International Journal of Advanced Intelligence Paradigms ◽

10.1504/ijaip.2014.066983 ◽

2014 ◽

Vol 6 (4) ◽

pp. 346 ◽

Cited By ~ 6

Author(s):

Swathi Jamjala Narayanan ◽

Rajen B. Bhatt ◽

Ilango Paramasivam ◽

M. Khalid ◽

B.K. Tripathy

Keyword(s):

Decision Tree ◽

Decision Trees ◽

Fuzzy Decision ◽

Fuzzy Decision Tree ◽

Neuro Fuzzy ◽

Fuzzy Decision Trees

Download Full-text

Optimal Decision Trees on Simplicial Complexes

The Electronic Journal of Combinatorics ◽

10.37236/1900 ◽

2005 ◽

Vol 12 (1) ◽

Cited By ~ 1

Author(s):

Jakob Jonsson

Keyword(s):

Decision Tree ◽

Decision Trees ◽

Simplicial Complex ◽

Elementary Theory ◽

Simplicial Complexes ◽

Optimal Decision ◽

Property A ◽

Recursive Definition ◽

Topological Combinatorics ◽

Definition Of

We consider topological aspects of decision trees on simplicial complexes, concentrating on how to use decision trees as a tool in topological combinatorics. By Robin Forman's discrete Morse theory, the number of evasive faces of a given dimension $i$ with respect to a decision tree on a simplicial complex is greater than or equal to the $i$th reduced Betti number (over any field) of the complex. Under certain favorable circumstances, a simplicial complex admits an "optimal" decision tree such that equality holds for each $i$; we may hence read off the homology directly from the tree. We provide a recursive definition of the class of semi-nonevasive simplicial complexes with this property. A certain generalization turns out to yield the class of semi-collapsible simplicial complexes that admit an optimal discrete Morse function in the analogous sense. In addition, we develop some elementary theory about semi-nonevasive and semi-collapsible complexes. Finally, we provide explicit optimal decision trees for several well-known simplicial complexes.

Download Full-text

Analisis Rekam Medis Pasien Diabetes Mellitus Melalui Implementasi Teknik Data Mining di RSUP Dr. Sardjito Yogyakarta

Jurnal Kesehatan Vokasional ◽

10.22146/jkesvo.30331 ◽

2018 ◽

Vol 2 (2) ◽

pp. 167

Author(s):

Marko Ferdian Salim ◽

Sugeng Sugeng

Keyword(s):

Diabetes Mellitus ◽

Data Mining ◽

Decision Tree ◽

Decision Trees ◽

Cross Sectional ◽

Diabetes Melitus

Latar Belakang: Diabetes mellitus adalah penyakit kronis yang mempengaruhi beban ekonomi dan sosial secara luas. Data pasien dicatat melalui sistem rekam medis pasien yang tersimpan dalam database sistem informasi rumah sakit, data yang tercatat belum dianalisis secara efektif untuk menghasilkan informasi yang berharga. Teknik data mining bisa digunakan untuk menghasilkan informasi yang berharga tersebut.Tujuan: Mengidentifikasi karakteristik pasien Diabetes mellitus, kecenderungan dan tipe Diabetes melitus melalui penerapan teknik data mining di RSUP Dr. Sardjito Yogyakarta.Metode: Penelitian ini merupakan penelitian deskriptif observasional dengan rancangan cross sectional. Teknik pengumpulan data dilakukan secara retrospektif melalui observasi dan studi dokumentasi rekam medis elektronik di RSUP Dr. Sardjito Yogyakarta. Data yang terkumpul kemudian dilakukan analisis dengan menggunakan aplikasi Weka.Hasil: Pasien Diabetes mellitus di RSUP Dr. Sardjito tahun 2011-2016 berjumlah 1.554 orang dengan tren yang cenderung menurun. Pasien paling banyak berusia 56 - 63 tahun (27,86%). Kejadian Diabetes mellitus didominasi oleh Diabetes mellitus tipe 2 dengan komplikasi tertinggi adalah hipertensi, nefropati, dan neuropati. Dengan menggunakan teknik data mining dengan algoritma decision tree J48 (akurasi 88.42%) untuk analisis rekam medis pasien telah menghasilkan beberapa rule.Kesimpulan: Teknik klasifikasi data mining (akurasi 88.42%) dan decision trees telah berhasil mengidentifikasi karakteristik pasien dan menemukan beberapa rules yang dapat digunakan pihak rumah sakit dalam pengambilan keputusan mengenai penyakit Diabetes mellitus.

Download Full-text