A Novel Multiway Splits Decision Tree for Multiple Types of Data

Mathematical Problems in Engineering ◽

10.1155/2020/7870534 ◽

2020 ◽

Vol 2020 ◽

pp. 1-12

Author(s):

Zhenyu Liu ◽

Tao Wen ◽

Wei Sun ◽

Qilong Zhang

Keyword(s):

Decision Tree ◽

Decision Trees ◽

Feature Space ◽

Feature Weighting ◽

Boundary Structure ◽

Linear Combinations ◽

Mixed Features ◽

Axis Parallel ◽

Classical Decision ◽

Generalization Accuracy

Classical decision trees such as C4.5 and CART partition the feature space using axis-parallel splits. Oblique decision trees use oblique splits based on linear combinations of features to potentially simplify the boundary structure. Although oblique decision trees have higher generalization accuracy, most oblique split methods are not directly applicable to categorical data and are computationally expensive. In this paper, we propose a multiway splits decision tree (MSDT) algorithm, which adopts feature weighting and clustering. This method can combine multiple numerical features, multiple categorical features, or multiple mixed features. Experimental results show that MSDT has excellent performance for multiple types of data.
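
The difference between the two kinds of split mentioned in this abstract can be illustrated with a minimal sketch. The function names and the sample values are hypothetical and chosen only for illustration; this is not the MSDT algorithm, which additionally relies on feature weighting and clustering.

```python
from typing import Sequence


def axis_parallel_split(x: Sequence[float], feature: int, threshold: float) -> bool:
    """Send the sample to the left child when a single feature is below the threshold."""
    return x[feature] <= threshold


def oblique_split(x: Sequence[float], weights: Sequence[float], threshold: float) -> bool:
    """Send the sample to the left child when a linear combination of features is below the threshold."""
    return sum(w * v for w, v in zip(weights, x)) <= threshold


sample = [2.0, 3.0]
# Axis-parallel test: looks at feature 0 only (2.0 <= 2.5).
print(axis_parallel_split(sample, feature=0, threshold=2.5))  # True
# Oblique test: looks at x0 - x1 (2.0 - 3.0 = -1.0 <= 0.0).
print(oblique_split(sample, weights=[1.0, -1.0], threshold=0.0))  # True
```

An axis-parallel split is the special case of an oblique split in which exactly one weight is non-zero.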

Global Induction of Decision Trees

Encyclopedia of Data Warehousing and Mining, Second Edition ◽

10.4018/978-1-60566-010-3.ch145 ◽

2011 ◽

pp. 937-942

Author(s):

Marek Kretowski ◽

Marek Grzes

Keyword(s):

Decision Tree ◽

Decision Trees ◽

Decision Rules ◽

Knowledge Discovery In Databases ◽

Binary Outcomes ◽

Multivariate Tests ◽

Axis Parallel ◽

Hyper Plane ◽

Single Attribute ◽

Multivariate Systems

Decision trees are, besides decision rules, one of the most popular forms of knowledge representation in the Knowledge Discovery in Databases process (Fayyad, Piatetsky-Shapiro, Smyth & Uthurusamy, 1996), and implementations of the classical decision tree induction algorithms are included in the majority of data mining systems. A hierarchical structure of a tree-based classifier, where appropriate tests from consecutive nodes are subsequently applied, closely resembles a human way of decision making. This makes decision trees natural and easy to understand even for an inexperienced analyst. The popularity of the decision tree approach can also be explained by its ease of application, fast classification and, perhaps most importantly, its effectiveness. Two main types of decision trees can be distinguished by the type of tests in non-terminal nodes: univariate and multivariate decision trees. In the first group, a single attribute is used in each test. For a continuous-valued feature, an inequality test with binary outcomes is usually applied, and for a nominal attribute, mutually exclusive groups of attribute values are associated with outcomes. As a good representative of univariate inducers, the well-known C4.5 system developed by Quinlan (1993) should be mentioned. In univariate trees a split is equivalent to partitioning the feature space with an axis-parallel hyper-plane. If decision boundaries of a particular dataset are not axis-parallel, using such tests may lead to an overcomplicated classifier. This situation is known as the “staircase effect”. The problem can be mitigated by applying more sophisticated multivariate tests, where more than one feature can be taken into account. The most common form of such tests is an oblique split, which is based on a linear combination of features (hyper-plane).
The decision tree which applies only oblique tests is often called oblique or linear, whereas heterogeneous trees with univariate, linear and other multivariate (e.g., instance-based) tests can be called mixed decision trees (Llora & Wilson, 2004). It should be emphasized that the computational complexity of multivariate induction is generally significantly higher than that of univariate induction. CART (Breiman, Friedman, Olshen & Stone, 1984) and OC1 (Murthy, Kasif & Salzberg, 1994) are well-known examples of multivariate systems.
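
The "staircase effect" described above can be reproduced on a toy dataset. The sketch below is an illustration under stated assumptions: the grid data, the Gini criterion, and the helper names `best_axis_split` and `count_leaves` are hypothetical choices, not taken from the cited systems. It grows a greedy univariate tree on points whose class boundary is diagonal and counts the leaves, while a single oblique test separates the same data.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a collection of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_axis_split(points, labels):
    """Exhaustively search the (feature, threshold) pair with the lowest weighted Gini."""
    best = None
    n = len(points)
    for f in range(len(points[0])):
        values = sorted({p[f] for p in points})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint between consecutive distinct values
            left = [y for p, y in zip(points, labels) if p[f] <= t]
            right = [y for p, y in zip(points, labels) if p[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def count_leaves(points, labels):
    """Grow a univariate tree until every leaf is pure and return the number of leaves."""
    if len(set(labels)) <= 1:
        return 1
    split = best_axis_split(points, labels)
    if split is None:  # no threshold exists (all points identical)
        return 1
    _, f, t = split
    left = [(p, y) for p, y in zip(points, labels) if p[f] <= t]
    right = [(p, y) for p, y in zip(points, labels) if p[f] > t]
    return count_leaves(*zip(*left)) + count_leaves(*zip(*right))


# Toy data: a 6 x 6 grid whose class boundary is the diagonal x0 = x1.
points = [(float(i), float(j)) for i in range(6) for j in range(6)]
labels = [1 if p[0] > p[1] else 0 for p in points]

print("leaves needed by the axis-parallel tree:", count_leaves(points, labels))
# One oblique test, x0 - x1 > 0.5, already separates the two classes.
print("oblique test is exact:", all((p[0] - p[1] > 0.5) == bool(y) for p, y in zip(points, labels)))
```

The univariate tree has to approximate the diagonal with a sequence of horizontal and vertical cuts, which is why it needs more than the two leaves of the oblique tree.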

Weighted Oblique Decision Trees

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33015621 ◽

2019 ◽

Vol 33 ◽

pp. 5621-5627

Author(s):

Bin-Bin Yang ◽

Song-Qing Shen ◽

Wei Gao

Keyword(s):

Decision Tree ◽

Objective Function ◽

Decision Trees ◽

Information Entropy ◽

Heuristic Algorithms ◽

Continuous Optimization ◽

Tree Structure ◽

The Past ◽

Axis Parallel ◽

Random Initialization

Decision trees have attracted much attention during the past decades. Previous decision trees include axis-parallel and oblique decision trees; both of them try to find the best splits via exhaustive search or heuristic algorithms in each iteration. Oblique decision trees generally simplify tree structure and achieve better performance, but are always accompanied by higher computational cost, as well as by initialization with the best axis-parallel splits. This work presents the Weighted Oblique Decision Tree (WODT) based on continuous optimization with random initialization. We consider different weights of each instance for child nodes at all internal nodes, and then obtain a split by optimizing the continuous and differentiable objective function of weighted information entropy. Extensive experiments show the effectiveness of the proposed algorithm.
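
The abstract does not spell out the objective, so the following is only one plausible reading of a "weighted information entropy" objective, written as a sketch. It assumes that each instance is softly assigned to the two children with a sigmoid weight, and that the objective is the sum of the children's entropies scaled by their total weight. The function names and the tiny dataset are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def weighted_entropy_objective(theta, bias, X, y, num_classes):
    """Soft-split objective: sum over both children of (child weight) * (child entropy).

    Assumption: instance i contributes weight sigmoid(theta . x_i + bias) to one
    child and the remaining weight to the other child.
    """
    first = [0.0] * num_classes
    second = [0.0] * num_classes
    for x, label in zip(X, y):
        w = sigmoid(sum(t * v for t, v in zip(theta, x)) + bias)
        first[label] += w
        second[label] += 1.0 - w

    def scaled_entropy(mass):
        total = sum(mass)
        if total == 0.0:
            return 0.0
        # total * entropy = -sum_k mass_k * log(mass_k / total)
        return -sum(m * math.log(m / total) for m in mass if m > 0.0)

    return scaled_entropy(first) + scaled_entropy(second)


X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]
separating = weighted_entropy_objective([10.0], -15.0, X, y, num_classes=2)
uninformative = weighted_entropy_objective([0.0], 0.0, X, y, num_classes=2)
print(separating < uninformative)  # True
```

Because the objective is a smooth function of `theta` and `bias`, it can be minimized with gradient-based methods from a random starting point, which matches the continuous-optimization framing in the abstract.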

A System for Induction of Oblique Decision Trees

Journal of Artificial Intelligence Research ◽

10.1613/jair.63 ◽

1994 ◽

Vol 2 ◽

pp. 1-32 ◽

Cited By ~ 432

Author(s):

S. K. Murthy ◽

S. Kasif ◽

S. Salzberg

Keyword(s):

Decision Tree ◽

Decision Trees ◽

Empirical Studies ◽

Hill Climbing ◽

Artificial Data ◽

Axis Parallel ◽

Tree Methods ◽

Decision Tree Methods ◽

Numeric Attributes ◽

New System

This article describes a new system for induction of oblique decision trees. This system, OC1, combines deterministic hill-climbing with two forms of randomization to find a good oblique split (in the form of a hyperplane) at each node of a decision tree. Oblique decision tree methods are tuned especially for domains in which the attributes are numeric, although they can be adapted to symbolic or mixed symbolic/numeric attributes. We present extensive empirical studies, using both real and artificial data, that analyze OC1's ability to construct oblique trees that are smaller and more accurate than their axis-parallel counterparts. We also examine the benefits of randomization for the construction of oblique decision trees.
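
The general idea of searching for a hyperplane by hill-climbing with randomization can be sketched as follows. This is a simplified stand-in and not OC1's actual procedure: it perturbs one randomly chosen coefficient with Gaussian noise and uses random restarts, whereas the abstract describes deterministic hill-climbing combined with two forms of randomization. The Gini criterion, the function names and all parameter values are assumptions made for illustration.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a collection of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_impurity(coeffs, points, labels):
    """Weighted Gini of the split a_1*x_1 + ... + a_d*x_d + a_(d+1) <= 0."""
    left, right = [], []
    for p, y in zip(points, labels):
        # zip stops after the d feature coefficients; the last entry is the bias.
        value = sum(a * v for a, v in zip(coeffs, p)) + coeffs[-1]
        (left if value <= 0 else right).append(y)
    n = len(labels)
    return (len(left) * gini(left) + len(right) * gini(right)) / n


def hill_climb_split(points, labels, restarts=5, steps=200, seed=0):
    """Random-restart hill climbing over hyperplane coefficients."""
    rng = random.Random(seed)
    dim = len(points[0])
    best_coeffs, best_score = None, float("inf")
    for _ in range(restarts):
        coeffs = [rng.uniform(-1.0, 1.0) for _ in range(dim + 1)]
        score = split_impurity(coeffs, points, labels)
        for _ in range(steps):
            candidate = list(coeffs)
            candidate[rng.randrange(dim + 1)] += rng.gauss(0.0, 0.5)
            cand_score = split_impurity(candidate, points, labels)
            if cand_score <= score:  # accept improvements and sideways moves
                coeffs, score = candidate, cand_score
        if score < best_score:
            best_coeffs, best_score = coeffs, score
    return best_coeffs, best_score


points = [(float(i), float(j)) for i in range(6) for j in range(6)]
labels = [1 if p[0] > p[1] else 0 for p in points]
coeffs, score = hill_climb_split(points, labels)
print("hyperplane coefficients:", coeffs)
print("weighted Gini of the split:", score)
```

Restarting from several random hyperplanes is one simple way to reduce the risk of ending in a poor local minimum of the impurity.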

Automated Development of Clinical Strategies Using Multistage Decision Analysis

Methods of Information in Medicine ◽

10.1055/s-0038-1635469 ◽

1986 ◽

Vol 25 (04) ◽

pp. 207-214 ◽

Cited By ~ 3

Author(s):

P. Glasziou

Keyword(s):

Decision Tree ◽

Decision Analysis ◽

Decision Trees ◽

Optimal Strategy ◽

Cholestatic Jaundice ◽

Clinical Strategies ◽

Simultaneous Study

The development of investigative strategies by decision analysis has been achieved by explicitly drawing the decision tree, either by hand or on computer. This paper discusses the feasibility of automatically generating and analysing decision trees from a description of the investigations and the treatment problem. The investigation of cholestatic jaundice is used to illustrate the technique. Methods to decrease the number of calculations required are presented. It is shown that this method makes practical the simultaneous study of at least half a dozen investigations. However, some new problems arise due to the possible complexity of the resulting optimal strategy. If protocol errors and delays due to testing are considered, simpler strategies become desirable. Generation and assessment of these simpler strategies are discussed with examples.
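
Analysing a decision tree in decision analysis usually means "rolling back" the tree: chance nodes take the probability-weighted average of their branches and decision nodes take the best available option. The sketch below shows that standard calculation only; the dictionary layout, the option names and all probabilities and utilities are hypothetical and are not taken from the paper.

```python
def rollback(node):
    """Return the expected utility of a decision-analysis tree node."""
    kind = node["kind"]
    if kind == "outcome":
        return node["utility"]
    if kind == "chance":
        # Probability-weighted average over the branches.
        return sum(p * rollback(child) for p, child in node["branches"])
    if kind == "decision":
        # The decision maker picks the option with the highest expected utility.
        return max(rollback(child) for _, child in node["options"])
    raise ValueError(f"unknown node kind: {kind!r}")


def outcome(utility):
    return {"kind": "outcome", "utility": utility}


# Hypothetical two-option problem with made-up numbers.
tree = {
    "kind": "decision",
    "options": [
        ("treat now", {"kind": "chance", "branches": [(0.3, outcome(0.9)), (0.7, outcome(0.8))]}),
        ("wait", {"kind": "chance", "branches": [(0.3, outcome(0.5)), (0.7, outcome(1.0))]}),
    ],
}

print(round(rollback(tree), 2))  # 0.85
```

Here "treat now" has expected utility 0.83 and "wait" has 0.85, so rolling back selects "wait".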

A branch & bound algorithm to determine optimal bivariate splits for oblique decision tree induction

Applied Intelligence ◽

10.1007/s10489-021-02281-x ◽

2021 ◽

Author(s):

Ferdinand Bollwein ◽

Stephan Westphal

Keyword(s):

Decision Tree ◽

Feature Space ◽

Classification Problems ◽

Decision Tree Induction ◽

Single Attribute ◽

Global Optimal ◽

The Individual ◽

Tree Building ◽

Very High ◽

Multiclass Classification Problems

Univariate decision tree induction methods for multiclass classification problems such as CART, C4.5 and ID3 continue to be very popular in the context of machine learning due to their major benefit of being easy to interpret. However, as these trees only consider a single attribute per node, they often get quite large which lowers their explanatory value. Oblique decision tree building algorithms, which divide the feature space by multidimensional hyperplanes, often produce much smaller trees but the individual splits are hard to interpret. Moreover, the effort of finding optimal oblique splits is very high such that heuristics have to be applied to determine local optimal solutions. In this work, we introduce an effective branch and bound procedure to determine global optimal bivariate oblique splits for concave impurity measures. Decision trees based on these bivariate oblique splits remain fairly interpretable due to the restriction to two attributes per split. The resulting trees are significantly smaller and more accurate than their univariate counterparts due to their ability of adapting better to the underlying data and capturing interactions of attribute pairs. Moreover, our evaluation shows that our algorithm even outperforms algorithms based on heuristically obtained multivariate oblique splits despite the fact that we are focusing on two attributes only.

Robot Perceptual Classification Method Based on Mixed Features of Decision Tree and Random Forest

2021 IEEE 2nd International Conference on Big Data, Artificial Intelligence and Internet of Things Engineering (ICBAIE) ◽

10.1109/icbaie52039.2021.9389973 ◽

2021 ◽

Author(s):

Yifan Song ◽

Jiankai Zuo ◽

Jiehong Wu ◽

Zeyuan Liu ◽

Ziheng Li

Keyword(s):

Random Forest ◽

Decision Tree ◽

Classification Method ◽

Perceptual Classification ◽

Mixed Features

Performance Improvement of Decision Tree: A Robust Classifier Using Tabu Search Algorithm

Applied Sciences ◽

10.3390/app11156728 ◽

2021 ◽

Vol 11 (15) ◽

pp. 6728

Author(s):

Muhammad Asfand Hafeez ◽

Muhammad Rashid ◽

Hassan Tariq ◽

Zain Ul Abideen ◽

Saud S. Alotaibi ◽

...

Keyword(s):

Machine Learning ◽

Tabu Search ◽

Decision Tree ◽

Decision Trees ◽

Search Algorithm ◽

Learning Algorithms ◽

Performance Comparison ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Tabu Search Algorithm

Classification and regression are the major applications of machine learning algorithms, which are widely used to solve problems in numerous domains of engineering and computer science. Different classifiers based on the optimization of the decision tree have been proposed; however, this area is still evolving. This paper presents a novel and robust classifier based on a decision tree and a tabu search algorithm. With the aim of improving performance, our proposed algorithm constructs multiple decision trees while employing a tabu search algorithm to consistently monitor the leaf and decision nodes in the corresponding decision trees. Additionally, the tabu search algorithm is responsible for balancing the entropy of the corresponding decision trees. For training the model, we used the clinical data of COVID-19 patients to predict whether a patient is suffering. The experimental results were obtained using our proposed classifier based on the built-in scikit-learn library in Python. An extensive performance comparison is presented using Big O and statistical analysis for conventional supervised machine learning algorithms. Moreover, a performance comparison with optimized state-of-the-art classifiers is also presented. The achieved accuracy of 98%, the required execution time of 55.6 ms and the area under the receiver operating characteristic (AUROC) of 0.95 for the proposed method indicate that the proposed classifier is suitable for large datasets.

A Practical Tutorial for Decision Tree Induction

ACM Computing Surveys ◽

10.1145/3429739 ◽

2021 ◽

Vol 54 (1) ◽

pp. 1-38

Author(s):

Víctor Adrián Sosa Hernández ◽

Raúl Monroy ◽

Miguel Angel Medina-Pérez ◽

Octavio Loyola-González ◽

Francisco Herrera

Keyword(s):

Decision Tree ◽

Decision Trees ◽

Machine Learning Techniques ◽

Evaluation Measures ◽

Decision Tree Induction ◽

Learning Techniques ◽

Tree Models ◽

Evaluation Measure ◽

Main Components ◽

Support Decision Making

Experts from different domains have resorted to machine learning techniques to produce explainable models that support decision-making. Among existing techniques, decision trees have been useful in many application domains for classification. Decision trees can make decisions in a language that is closer to that of the experts. Many researchers have attempted to create better decision tree models by improving the components of the induction algorithm. One of the main components that have been studied and improved is the evaluation measure for candidate splits. In this article, we introduce a tutorial that explains decision tree induction. Then, we present an experimental framework to assess the performance of 21 evaluation measures that produce different C4.5 variants considering 110 databases, two performance measures, and 10× 10-fold cross-validation. Furthermore, we compare and rank the evaluation measures by using a Bayesian statistical analysis. From our experimental results, we present the first two performance rankings in the literature of C4.5 variants. Moreover, we organize the evaluation measures into two groups according to their performance. Finally, we introduce meta-models that automatically determine the group of evaluation measures to produce a C4.5 variant for a new database and some further opportunities for decision tree models.
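
An evaluation measure scores each candidate split so that the induction algorithm can pick the best one. As a sketch, the code below implements two classic measures, information gain and the gain ratio used by C4.5; it does not reproduce the 21 measures compared in the article, and the function names and toy labels are chosen only for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a collection of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    return entropy(parent) - sum(len(ch) / n * entropy(ch) for ch in children)


def gain_ratio(parent, children):
    """Information gain divided by the split information (entropy of the partition sizes)."""
    n = len(parent)
    split_info = -sum(len(ch) / n * math.log2(len(ch) / n) for ch in children if ch)
    if split_info == 0:
        return 0.0
    return information_gain(parent, children) / split_info


parent = [1, 1, 0, 0]
two_way = [[1, 1], [0, 0]]
four_way = [[1], [1], [0], [0]]

print(information_gain(parent, two_way), gain_ratio(parent, two_way))    # 1.0 1.0
print(information_gain(parent, four_way), gain_ratio(parent, four_way))  # 1.0 0.5
```

Both candidate splits are pure and therefore have the same information gain, but the gain ratio penalizes the four-way split for fragmenting the data into more branches.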

Evolutionary Algorithm for Improving Decision Tree with Global Discretization in Manufacturing

Sensors ◽

10.3390/s21082849 ◽

2021 ◽

Vol 21 (8) ◽

pp. 2849

Author(s):

Sungbum Jun

Keyword(s):

Decision Tree ◽

Evolutionary Algorithm ◽

Decision Trees ◽

Manufacturing Systems ◽

Ensemble Methods ◽

Machine Learning Techniques ◽

Learning Techniques ◽

Industrial Internet ◽

Tree Models ◽

Real World Datasets

Due to recent advances in the industrial Internet of Things (IoT) in manufacturing, the vast amount of data from sensors has triggered the need for leveraging such big data for fault detection. In particular, interpretable machine learning techniques, such as tree-based algorithms, have drawn attention to the need to implement reliable manufacturing systems, and identify the root causes of faults. However, despite the high interpretability of decision trees, tree-based models make a trade-off between accuracy and interpretability. In order to improve the tree’s performance while maintaining its interpretability, an evolutionary algorithm for discretization of multiple attributes, called Decision tree Improved by Multiple sPLits with Evolutionary algorithm for Discretization (DIMPLED), is proposed. The experimental results with two real-world datasets from sensors showed that the decision tree improved by DIMPLED outperformed the single-decision-tree models (C4.5 and CART) that are widely used in practice, and it proved competitive compared to the ensemble methods, which have multiple decision trees. Even though the ensemble methods could produce slightly better performances, the proposed DIMPLED has a more interpretable structure, while maintaining an appropriate performance level.

Induction of fuzzy decision trees and its refinement using gradient projected-neuro-fuzzy decision tree

International Journal of Advanced Intelligence Paradigms ◽

10.1504/ijaip.2014.066983 ◽

2014 ◽

Vol 6 (4) ◽

pp. 346 ◽

Cited By ~ 6

Author(s):

Swathi Jamjala Narayanan ◽

Rajen B. Bhatt ◽

Ilango Paramasivam ◽

M. Khalid ◽

B.K. Tripathy

Keyword(s):

Decision Tree ◽

Decision Trees ◽

Fuzzy Decision ◽

Fuzzy Decision Tree ◽

Neuro Fuzzy ◽

Fuzzy Decision Trees
