scholarly journals A Novel Multiway Splits Decision Tree for Multiple Types of Data

2020 ◽  
Vol 2020 ◽  
pp. 1-12
Author(s):  
Zhenyu Liu ◽  
Tao Wen ◽  
Wei Sun ◽  
Qilong Zhang

Classical decision trees such as C4.5 and CART partition the feature space using axis-parallel splits. Oblique decision trees use the oblique splits based on linear combinations of features to potentially simplify the boundary structure. Although oblique decision trees have higher generalization accuracy, most oblique split methods are not directly conducive to the categorical data and are computationally expensive. In this paper, we propose a multiway splits decision tree (MSDT) algorithm, which adopts feature weighting and clustering. This method can combine multiple numerical features, multiple categorical features, or multiple mixed features. Experimental results show that MSDT has excellent performance for multiple types of data.

Author(s):  
Marek Kretowski ◽  
Marek Grzes

Decision trees are, besides decision rules, one of the most popular forms of knowledge representation in Knowledge Discovery in Databases process (Fayyad, Piatetsky-Shapiro, Smyth & Uthurusamy, 1996) and implementations of the classical decision tree induction algorithms are included in the majority of data mining systems. A hierarchical structure of a tree-based classifier, where appropriate tests from consecutive nodes are subsequently applied, closely resembles a human way of decision making. This makes decision trees natural and easy to understand even for an inexperienced analyst. The popularity of the decision tree approach can also be explained by their ease of application, fast classification and what may be the most important, their effectiveness. Two main types of decision trees can be distinguished by the type of tests in non-terminal nodes: univariate and multivariate decision trees. In the first group, a single attribute is used in each test. For a continuousvalued feature usually an inequality test with binary outcomes is applied and for a nominal attribute mutually exclusive groups of attribute values are associated with outcomes. As a good representative of univariate inducers, the well-known C4.5 system developed by Quinlan (1993) should be mentioned. In univariate trees a split is equivalent to partitioning the feature space with an axis-parallel hyper-plane. If decision boundaries of a particular dataset are not axis-parallel, using such tests may lead to an overcomplicated classifier. This situation is known as the “staircase effect”. The problem can be mitigated by applying more sophisticated multivariate tests, where more than one feature can be taken into account. The most common form of such tests is an oblique split, which is based on a linear combination of features (hyper-plane). The decision tree which applies only oblique tests is often called oblique or linear, whereas heterogeneous trees with univariate, linear and other multivariate (e.g., instance-based) tests can be called mixed decision trees (Llora & Wilson, 2004). It should be emphasized that computational complexity of the multivariate induction is generally significantly higher than the univariate induction. CART (Breiman, Friedman, Olshen & Stone, 1984) and OC1 (Murthy, Kasif & Salzberg, 1994) are well known examples of multivariate systems.


Author(s):  
Bin-Bin Yang ◽  
Song-Qing Shen ◽  
Wei Gao

Decision trees have attracted much attention during the past decades. Previous decision trees include axis-parallel and oblique decision trees; both of them try to find the best splits via exhaustive search or heuristic algorithms in each iteration. Oblique decision trees generally simplify tree structure and take better performance, but are always accompanied with higher computation, as well as the initialization with the best axis-parallel splits. This work presents the Weighted Oblique Decision Tree (WODT) based on continuous optimization with random initialization. We consider different weights of each instance for child nodes at all internal nodes, and then obtain a split by optimizing the continuous and differentiable objective function of weighted information entropy. Extensive experiments show the effectiveness of the proposed algorithm.


1994 ◽  
Vol 2 ◽  
pp. 1-32 ◽  
Author(s):  
S. K. Murthy ◽  
S. Kasif ◽  
S. Salzberg

This article describes a new system for induction ofoblique decision trees. This system, OC1, combines deterministic hill-climbing with two forms of randomization to find a goodoblique split (in the form of a hyperplane) at each node of a decisiontree. Oblique decision tree methods are tuned especially for domains in which the attributes are numeric, although they can be adapted to symbolic or mixed symbolic/numeric attributes. We presentextensive empirical studies, using both real and artificial data, thatanalyze OC1's ability to construct oblique trees that are smaller and more accurate than their axis-parallel counterparts. We also examinethe benefits of randomization for the construction of oblique decisiontrees.


1986 ◽  
Vol 25 (04) ◽  
pp. 207-214 ◽  
Author(s):  
P. Glasziou

SummaryThe development of investigative strategies by decision analysis has been achieved by explicitly drawing the decision tree, either by hand or on computer. This paper discusses the feasibility of automatically generating and analysing decision trees from a description of the investigations and the treatment problem. The investigation of cholestatic jaundice is used to illustrate the technique.Methods to decrease the number of calculations required are presented. It is shown that this method makes practical the simultaneous study of at least half a dozen investigations. However, some new problems arise due to the possible complexity of the resulting optimal strategy. If protocol errors and delays due to testing are considered, simpler strategies become desirable. Generation and assessment of these simpler strategies are discussed with examples.


Author(s):  
Ferdinand Bollwein ◽  
Stephan Westphal

AbstractUnivariate decision tree induction methods for multiclass classification problems such as CART, C4.5 and ID3 continue to be very popular in the context of machine learning due to their major benefit of being easy to interpret. However, as these trees only consider a single attribute per node, they often get quite large which lowers their explanatory value. Oblique decision tree building algorithms, which divide the feature space by multidimensional hyperplanes, often produce much smaller trees but the individual splits are hard to interpret. Moreover, the effort of finding optimal oblique splits is very high such that heuristics have to be applied to determine local optimal solutions. In this work, we introduce an effective branch and bound procedure to determine global optimal bivariate oblique splits for concave impurity measures. Decision trees based on these bivariate oblique splits remain fairly interpretable due to the restriction to two attributes per split. The resulting trees are significantly smaller and more accurate than their univariate counterparts due to their ability of adapting better to the underlying data and capturing interactions of attribute pairs. Moreover, our evaluation shows that our algorithm even outperforms algorithms based on heuristically obtained multivariate oblique splits despite the fact that we are focusing on two attributes only.


2021 ◽  
Vol 11 (15) ◽  
pp. 6728
Author(s):  
Muhammad Asfand Hafeez ◽  
Muhammad Rashid ◽  
Hassan Tariq ◽  
Zain Ul Abideen ◽  
Saud S. Alotaibi ◽  
...  

Classification and regression are the major applications of machine learning algorithms which are widely used to solve problems in numerous domains of engineering and computer science. Different classifiers based on the optimization of the decision tree have been proposed, however, it is still evolving over time. This paper presents a novel and robust classifier based on a decision tree and tabu search algorithms, respectively. In the aim of improving performance, our proposed algorithm constructs multiple decision trees while employing a tabu search algorithm to consistently monitor the leaf and decision nodes in the corresponding decision trees. Additionally, the used tabu search algorithm is responsible to balance the entropy of the corresponding decision trees. For training the model, we used the clinical data of COVID-19 patients to predict whether a patient is suffering. The experimental results were obtained using our proposed classifier based on the built-in sci-kit learn library in Python. The extensive analysis for the performance comparison was presented using Big O and statistical analysis for conventional supervised machine learning algorithms. Moreover, the performance comparison to optimized state-of-the-art classifiers is also presented. The achieved accuracy of 98%, the required execution time of 55.6 ms and the area under receiver operating characteristic (AUROC) for proposed method of 0.95 reveals that the proposed classifier algorithm is convenient for large datasets.


2021 ◽  
Vol 54 (1) ◽  
pp. 1-38
Author(s):  
Víctor Adrián Sosa Hernández ◽  
Raúl Monroy ◽  
Miguel Angel Medina-Pérez ◽  
Octavio Loyola-González ◽  
Francisco Herrera

Experts from different domains have resorted to machine learning techniques to produce explainable models that support decision-making. Among existing techniques, decision trees have been useful in many application domains for classification. Decision trees can make decisions in a language that is closer to that of the experts. Many researchers have attempted to create better decision tree models by improving the components of the induction algorithm. One of the main components that have been studied and improved is the evaluation measure for candidate splits. In this article, we introduce a tutorial that explains decision tree induction. Then, we present an experimental framework to assess the performance of 21 evaluation measures that produce different C4.5 variants considering 110 databases, two performance measures, and 10× 10-fold cross-validation. Furthermore, we compare and rank the evaluation measures by using a Bayesian statistical analysis. From our experimental results, we present the first two performance rankings in the literature of C4.5 variants. Moreover, we organize the evaluation measures into two groups according to their performance. Finally, we introduce meta-models that automatically determine the group of evaluation measures to produce a C4.5 variant for a new database and some further opportunities for decision tree models.


Sensors ◽  
2021 ◽  
Vol 21 (8) ◽  
pp. 2849
Author(s):  
Sungbum Jun

Due to the recent advance in the industrial Internet of Things (IoT) in manufacturing, the vast amount of data from sensors has triggered the need for leveraging such big data for fault detection. In particular, interpretable machine learning techniques, such as tree-based algorithms, have drawn attention to the need to implement reliable manufacturing systems, and identify the root causes of faults. However, despite the high interpretability of decision trees, tree-based models make a trade-off between accuracy and interpretability. In order to improve the tree’s performance while maintaining its interpretability, an evolutionary algorithm for discretization of multiple attributes, called Decision tree Improved by Multiple sPLits with Evolutionary algorithm for Discretization (DIMPLED), is proposed. The experimental results with two real-world datasets from sensors showed that the decision tree improved by DIMPLED outperformed the performances of single-decision-tree models (C4.5 and CART) that are widely used in practice, and it proved competitive compared to the ensemble methods, which have multiple decision trees. Even though the ensemble methods could produce slightly better performances, the proposed DIMPLED has a more interpretable structure, while maintaining an appropriate performance level.


2014 ◽  
Vol 6 (4) ◽  
pp. 346 ◽  
Author(s):  
Swathi Jamjala Narayanan ◽  
Rajen B. Bhatt ◽  
Ilango Paramasivam ◽  
M. Khalid ◽  
B.K. Tripathy

Sign in / Sign up

Export Citation Format

Share Document