Forest Pruning Based on Branch Importance

2017 ◽  
Vol 2017 ◽  
pp. 1-11 ◽  
Author(s):  
Xiangkui Jiang ◽  
Chang-an Wu ◽  
Huaping Guo

A forest is an ensemble with decision trees as members. This paper proposes a novel strategy for pruning a forest to enhance ensemble generalization ability and reduce ensemble size. Unlike conventional ensemble pruning approaches, the proposed method evaluates the importance of individual tree branches with respect to the whole ensemble, using a new metric called importance gain. The importance of a branch is defined by considering both ensemble accuracy and the diversity of ensemble members, so the metric estimates how much the ensemble accuracy improves when a branch is pruned. Our experiments show that the proposed method can significantly reduce ensemble size and improve ensemble accuracy, whether the ensemble is constructed by an algorithm such as bagging or obtained by an ensemble selection algorithm, and whether each decision tree is pruned or unpruned.
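The idea of scoring a branch against the whole ensemble can be illustrated with a minimal Python sketch. It is not the paper's importance gain metric, which also accounts for diversity; as a simplifying assumption it measures only the change in majority-vote accuracy on a validation set when one tree is replaced by a pruned copy. The tree encoding and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical encoding: a tree is either a leaf label (str) or a tuple
# (feature_index, threshold, left_subtree, right_subtree).

def predict(tree, x):
    """Walk one tree from the root to a leaf and return the leaf label."""
    while isinstance(tree, tuple):
        feature, threshold, left, right = tree
        tree = left if x[feature] <= threshold else right
    return tree

def ensemble_accuracy(forest, X, y):
    """Fraction of samples that the majority vote of the forest gets right."""
    correct = 0
    for x, label in zip(X, y):
        votes = Counter(predict(tree, x) for tree in forest)
        if votes.most_common(1)[0][0] == label:
            correct += 1
    return correct / len(y)

def accuracy_gain_if_pruned(forest, tree_idx, pruned_tree, X, y):
    """Accuracy-only stand-in for a branch score (assumption, not the paper's
    metric): ensemble accuracy after swapping in the pruned tree, minus before."""
    before = ensemble_accuracy(forest, X, y)
    candidate = forest[:tree_idx] + [pruned_tree] + forest[tree_idx + 1:]
    return ensemble_accuracy(candidate, X, y) - before
```

A positive value suggests that collapsing the branch helps the ensemble on the validation data; a greedy pruner could repeatedly apply the candidate with the largest value.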

2009 ◽  
Vol 18 (04) ◽  
pp. 613-620 ◽  
Author(s):  
ADAM H. PETERSON ◽  
TONY R. MARTINEZ

This research presents a new learning model, the Parallel Decision DAG (PDDAG), and shows how to use it to represent an ensemble of decision trees while using significantly less storage. Ensembles such as Bagging and Boosting have a high probability of encoding redundant data structures, and PDDAGs provide a way to remove this redundancy in decision-tree-based ensembles. When trained by encoding an ensemble, the new model behaves similarly to the original ensemble and can be made to perform identically to it. The reduced storage requirements allow an ensemble approach to be used in cases where storage limits would otherwise be exceeded, and the smaller model can potentially execute faster by reducing redundant computation.


Author(s):  
Syed Muzamil Basha ◽  
Dharmendra Singh Rajput ◽  
N. Ch. S. N. Iyengar

In this chapter, the authors show how to build a decision tree from given real-time data. They interpret the output of a decision tree by learning a decision tree classifier with a recursive greedy algorithm. Features are selected by classification error using the feature split selection algorithm (FSSA), under all the possible stopping conditions for splitting. The authors make predictions with decision trees using the decision tree prediction algorithm (DTPA), followed by multiclass predictions and their probabilities. Finally, they apply the splitting procedure to real-valued continuous inputs using the threshold split selection algorithm (TSSA).
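Choosing a threshold on a continuous feature by classification error can be sketched as follows. The abstract does not give the details of FSSA or TSSA, so this is only a generic illustration under stated assumptions: candidate thresholds are the midpoints between consecutive distinct values, each side of the split predicts its majority class, and the function names are hypothetical.

```python
from collections import Counter

def node_errors(labels):
    """Mistakes made when a node predicts its majority class."""
    if not labels:
        return 0
    return len(labels) - Counter(labels).most_common(1)[0][1]

def best_threshold(values, labels):
    """Return (threshold, error_rate) for the split `value <= threshold` with
    the lowest classification error, or (None, baseline_error) if no candidate
    split beats predicting the majority class without splitting."""
    n = len(labels)
    best_t, best_err = None, node_errors(list(labels)) / n
    distinct = sorted(set(values))
    # Assumption: candidates are midpoints between consecutive distinct values.
    for low, high in zip(distinct, distinct[1:]):
        t = (low + high) / 2
        left = [lab for v, lab in zip(values, labels) if v <= t]
        right = [lab for v, lab in zip(values, labels) if v > t]
        err = (node_errors(left) + node_errors(right)) / n
        if err < best_err:
            best_t, best_err = t, err
    return best_t, best_err
```

A recursive greedy learner would call such a routine for every feature at a node, split on the best result, and recurse until a stopping condition is met, for example when the best threshold is `None`.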


1986 ◽  
Vol 25 (04) ◽  
pp. 207-214 ◽  
Author(s):  
P. Glasziou

Summary: The development of investigative strategies by decision analysis has been achieved by explicitly drawing the decision tree, either by hand or on computer. This paper discusses the feasibility of automatically generating and analysing decision trees from a description of the investigations and the treatment problem. The investigation of cholestatic jaundice is used to illustrate the technique. Methods to decrease the number of calculations required are presented. It is shown that this method makes practical the simultaneous study of at least half a dozen investigations. However, some new problems arise due to the possible complexity of the resulting optimal strategy. If protocol errors and delays due to testing are considered, simpler strategies become desirable. Generation and assessment of these simpler strategies are discussed with examples.


2021 ◽  
Vol 11 (15) ◽  
pp. 6728
Author(s):  
Muhammad Asfand Hafeez ◽  
Muhammad Rashid ◽  
Hassan Tariq ◽  
Zain Ul Abideen ◽  
Saud S. Alotaibi ◽  
...  

Classification and regression are the major applications of machine learning algorithms, which are widely used to solve problems in numerous domains of engineering and computer science. Different classifiers based on the optimization of the decision tree have been proposed; however, the field is still evolving. This paper presents a novel and robust classifier based on decision tree and tabu search algorithms. To improve performance, our proposed algorithm constructs multiple decision trees while employing a tabu search algorithm to consistently monitor the leaf and decision nodes in the corresponding decision trees. Additionally, the tabu search algorithm is responsible for balancing the entropy of the corresponding decision trees. To train the model, we used the clinical data of COVID-19 patients to predict whether a patient is suffering from the disease. The experimental results were obtained using our proposed classifier, built on the scikit-learn library in Python. An extensive performance comparison with conventional supervised machine learning algorithms is presented using Big O and statistical analysis. Moreover, a performance comparison with optimized state-of-the-art classifiers is also presented. The achieved accuracy of 98%, the required execution time of 55.6 ms, and the area under the receiver operating characteristic curve (AUROC) of 0.95 for the proposed method reveal that the proposed classifier is well suited to large datasets.


2021 ◽  
Vol 54 (1) ◽  
pp. 1-38
Author(s):  
Víctor Adrián Sosa Hernández ◽  
Raúl Monroy ◽  
Miguel Angel Medina-Pérez ◽  
Octavio Loyola-González ◽  
Francisco Herrera

Experts from different domains have resorted to machine learning techniques to produce explainable models that support decision-making. Among existing techniques, decision trees have been useful in many application domains for classification. Decision trees can make decisions in a language that is closer to that of the experts. Many researchers have attempted to create better decision tree models by improving the components of the induction algorithm. One of the main components that have been studied and improved is the evaluation measure for candidate splits. In this article, we introduce a tutorial that explains decision tree induction. Then, we present an experimental framework to assess the performance of 21 evaluation measures that produce different C4.5 variants considering 110 databases, two performance measures, and 10× 10-fold cross-validation. Furthermore, we compare and rank the evaluation measures by using a Bayesian statistical analysis. From our experimental results, we present the first two performance rankings in the literature of C4.5 variants. Moreover, we organize the evaluation measures into two groups according to their performance. Finally, we introduce meta-models that automatically determine the group of evaluation measures to produce a C4.5 variant for a new database and some further opportunities for decision tree models.
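An evaluation measure for candidate splits scores how well a split separates the classes. The sketch below implements three widely known measures in Python: information gain, the gain ratio used by C4.5, and Gini-based impurity reduction. It does not reproduce the 21 measures assessed in the article, and the function names are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(parent, children, impurity=entropy):
    """Impurity of the parent minus the size-weighted impurity of the children."""
    n = len(parent)
    weighted = sum(len(ch) / n * impurity(ch) for ch in children if ch)
    return impurity(parent) - weighted

def gain_ratio(parent, children):
    """Information gain normalized by the split information of the partition."""
    n = len(parent)
    split_info = -sum(
        (len(ch) / n) * math.log2(len(ch) / n) for ch in children if ch
    )
    if split_info == 0:
        return 0.0  # all samples fell into one child, so the split is useless
    return information_gain(parent, children) / split_info
```

Passing `impurity=gini` to `information_gain` yields the Gini reduction, which shows how swapping the measure changes the induction algorithm without touching the rest of the tree learner.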


Sensors ◽  
2021 ◽  
Vol 21 (8) ◽  
pp. 2849
Author(s):  
Sungbum Jun

Due to recent advances in the industrial Internet of Things (IoT) in manufacturing, the vast amount of data from sensors has triggered the need to leverage such big data for fault detection. In particular, interpretable machine learning techniques, such as tree-based algorithms, have drawn attention because of the need to implement reliable manufacturing systems and identify the root causes of faults. However, despite the high interpretability of decision trees, tree-based models make a trade-off between accuracy and interpretability. In order to improve the tree’s performance while maintaining its interpretability, an evolutionary algorithm for discretization of multiple attributes, called Decision tree Improved by Multiple sPLits with Evolutionary algorithm for Discretization (DIMPLED), is proposed. The experimental results with two real-world datasets from sensors showed that the decision tree improved by DIMPLED outperformed the single-decision-tree models (C4.5 and CART) that are widely used in practice, and it proved competitive with ensemble methods, which use multiple decision trees. Even though the ensemble methods could produce slightly better performance, the proposed DIMPLED has a more interpretable structure while maintaining an appropriate performance level.


2014 ◽  
Vol 6 (4) ◽  
pp. 346 ◽  
Author(s):  
Swathi Jamjala Narayanan ◽  
Rajen B. Bhatt ◽  
Ilango Paramasivam ◽  
M. Khalid ◽  
B.K. Tripathy

10.37236/1900 ◽  
2005 ◽  
Vol 12 (1) ◽  
Author(s):  
Jakob Jonsson

We consider topological aspects of decision trees on simplicial complexes, concentrating on how to use decision trees as a tool in topological combinatorics. By Robin Forman's discrete Morse theory, the number of evasive faces of a given dimension $i$ with respect to a decision tree on a simplicial complex is greater than or equal to the $i$th reduced Betti number (over any field) of the complex. Under certain favorable circumstances, a simplicial complex admits an "optimal" decision tree such that equality holds for each $i$; we may hence read off the homology directly from the tree. We provide a recursive definition of the class of semi-nonevasive simplicial complexes with this property. A certain generalization turns out to yield the class of semi-collapsible simplicial complexes that admit an optimal discrete Morse function in the analogous sense. In addition, we develop some elementary theory about semi-nonevasive and semi-collapsible complexes. Finally, we provide explicit optimal decision trees for several well-known simplicial complexes.


2018 ◽  
Vol 2 (2) ◽  
pp. 167
Author(s):  
Marko Ferdian Salim ◽  
Sugeng Sugeng

Background: Diabetes mellitus is a chronic disease that imposes a broad economic and social burden. Patient data are recorded through the patient medical record system and stored in the hospital information system database, but the recorded data have not been analyzed effectively to produce valuable information. Data mining techniques can be used to produce such information. Objective: To identify the characteristics of diabetes mellitus patients, and the trends and types of diabetes mellitus, by applying data mining techniques at RSUP Dr. Sardjito Yogyakarta. Methods: This was a descriptive observational study with a cross-sectional design. Data were collected retrospectively through observation and a documentation study of electronic medical records at RSUP Dr. Sardjito Yogyakarta. The collected data were then analyzed using the Weka application. Results: There were 1,554 diabetes mellitus patients at RSUP Dr. Sardjito in 2011-2016, with a downward trend. Most patients were aged 56-63 years (27.86%). Cases were dominated by type 2 diabetes mellitus, and the most common complications were hypertension, nephropathy, and neuropathy. Applying data mining with the J48 decision tree algorithm (accuracy 88.42%) to patient medical records produced several rules. Conclusion: Data mining classification techniques (accuracy 88.42%) and decision trees successfully identified patient characteristics and found several rules that the hospital can use in decision-making regarding diabetes mellitus.

