Validation of machine learning techniques: decision trees and finite training set

Geoffrey A. W. West

doi:10.1117/1.482630

Machine Learning Techniques Applied to Profile Mobile Banking Users in India

International Journal of Information Systems in the Service Sector ◽

10.4018/jisss.2013010105 ◽

2013 ◽

Vol 5 (1) ◽

pp. 82-92 ◽

Cited By ~ 8

Author(s):

M. Carr ◽

V. Ravi ◽

G. Sridharan Reddy ◽

D. Veranna

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Decision Tree ◽

Decision Trees ◽

Multilayer Perceptron ◽

Machine Learning Techniques ◽

Mobile Banking ◽

Classification Rules ◽

Learning Techniques ◽

Potential Customers

This paper profiles mobile banking users using machine learning techniques, viz. Decision Tree, Logistic Regression, Multilayer Perceptron, and SVM, to test a research model with fourteen independent variables and one dependent variable (adoption). A survey was conducted and the results were analysed using these techniques. Using Decision Trees, the profile of the mobile banking adopter was identified. A comparison of the machine learning techniques found that Decision Trees outperformed Logistic Regression, Multilayer Perceptron, and SVM. Of all the techniques, Decision Tree is recommended for profiling studies because, apart from yielding highly accurate results, it also yields ‘if–then’ classification rules. The classification rules provided here can be used to target potential customers and encourage them to adopt mobile banking by offering appropriate incentives.
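As an illustration of how a decision tree can yield an 'if–then' classification rule, the following is a minimal sketch of a one-level tree (a decision stump) chosen by Gini impurity. The feature names, the toy survey rows, and the helper functions are all hypothetical; they are not the paper's fourteen variables, data, or rules.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels, feature_names):
    """Return (feature, threshold, weighted Gini) of the best single split."""
    best = None
    n = len(rows)
    for j, name in enumerate(feature_names):
        values = sorted({row[j] for row in rows})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for row, y in zip(rows, labels) if row[j] <= t]
            right = [y for row, y in zip(rows, labels) if row[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (name, t, score)
    return best


def majority(labels):
    """Most frequent label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def stump_rule(rows, labels, feature_names):
    """Express the best single split as a readable if-then rule."""
    name, t, _ = best_split(rows, labels, feature_names)
    j = feature_names.index(name)
    left = [y for row, y in zip(rows, labels) if row[j] <= t]
    right = [y for row, y in zip(rows, labels) if row[j] > t]
    return f"IF {name} <= {t:g} THEN {majority(left)} ELSE {majority(right)}"


# Hypothetical toy survey: (age, perceived usefulness on a 1-5 scale).
FEATURES = ["age", "perceived_usefulness"]
ROWS = [(25, 5), (31, 4), (45, 2), (52, 1), (38, 4), (60, 2)]
LABELS = ["adopter", "adopter", "non-adopter",
          "non-adopter", "adopter", "non-adopter"]

rule = stump_rule(ROWS, LABELS, FEATURES)
print(rule)
```

A full decision tree would apply the same split search recursively to each branch, producing nested rules of the same form.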

A Hybrid Vision-Map Method for Urban Road Detection

Journal of Advanced Transportation ◽

10.1155/2017/7090549 ◽

2017 ◽

Vol 2017 ◽

pp. 1-21 ◽

Cited By ~ 6

Author(s):

Carlos Fernández ◽

David Fernández-Llorca ◽

Miguel A. Sotelo

Keyword(s):

Machine Learning ◽

Urban Environments ◽

Machine Learning Techniques ◽

Learning Approaches ◽

Classification Problems ◽

Road Detection ◽

Training Set ◽

Digital Maps ◽

The Road ◽

Learning Techniques

A hybrid vision-map system is presented to solve the road detection problem in urban scenarios. The standardized use of machine learning techniques in classification problems has been merged with digital navigation map information to increase system robustness. The objective of this paper is to create a new environment perception method to detect the road in urban environments, fusing stereo vision with digital maps by detecting road appearance and road limits such as lane markings or curbs. Deep learning approaches make the system hard-coupled to the training set. Even though our approach is based on machine learning techniques, the features are calculated from different sources (GPS, map, curbs, etc.), making our system less dependent on the training set.

Development and Optimization of VGF-GaAs Crystal Growth Process Using Data Mining and Machine Learning Techniques

Crystals ◽

10.3390/cryst11101218 ◽

2021 ◽

Vol 11 (10) ◽

pp. 1218

Author(s):

Natasha Dropka ◽

Klaus Böttcher ◽

Martin Holena

Keyword(s):

Machine Learning ◽

Data Mining ◽

Crystal Growth ◽

Decision Trees ◽

Growth Process ◽

Training Data ◽

Machine Learning Techniques ◽

Interface Position ◽

Crystal Growth Process ◽

Learning Techniques

The aim of this study was to assess the ability of various data mining and supervised machine learning techniques (correlation analysis, k-means clustering, principal component analysis, and decision trees for regression and classification) to derive, optimize and understand the factors influencing VGF-GaAs growth. Training data were generated by Computational Fluid Dynamics (CFD) simulations and consisted of 130 datasets with 6 inputs (growth rate and power of 5 heaters) and 5 outputs (interface position and deflection, and temperatures at various positions in GaAs). Data mining results confirmed a good dispersion of the training data without the feasibility of a dimensionality reduction. Data clustering was observed in relation to the position of the crystallization front relative to the side heaters. Based on the statistical performance criteria and training results, decision trees identified the most decisive inputs and their ranges for a favorable interface shape and for keeping the GaAs temperature beyond the limits for heavy arsenic evaporation. Decision trees are a recommendable machine learning technique, with short training times and acceptable predictive accuracy from a small volume of CFD training data. They are capable of providing guidelines for understanding the crystal growth process, which is a prerequisite for the growth of low-cost, high-quality bulk crystals.

Capacity Control in Indoor Spaces Using Machine Learning Techniques Together with BLE Technology

Journal of Sensor and Actuator Networks ◽

10.3390/jsan10020035 ◽

2021 ◽

Vol 10 (2) ◽

pp. 35

Author(s):

M. Encarnación Beato Gutiérrez ◽

Montserrat Mateos Sánchez ◽

Roberto Berjón Gallinas ◽

Ana M. Fermoso García

Keyword(s):

Machine Learning ◽

Decision Trees ◽

Prediction Models ◽

Research Work ◽

Machine Learning Techniques ◽

Capacity Control ◽

The People ◽

Learning Techniques ◽

Enclosed Space ◽

Indoor Spaces

Capacity control in indoor spaces is currently critical because of the pandemic. In this work, we propose a new solution using machine learning techniques with BLE technology. This study presents a real experiment in a university environment, in which we study three prediction models built with machine learning techniques: logistic regression, decision trees and artificial neural networks. The study concludes that machine learning techniques, in particular decision trees, together with BLE technology, provide a solution to the problem. The prediction model obtained is capable of detecting when the COVID capacity of an enclosed space is exceeded. In addition, it ensures that no false negatives are produced, i.e., all the people inside the laboratory will be correctly counted.

Machine learning for transient recognition in difference imaging with minimum sampling effort

Monthly Notices of the Royal Astronomical Society ◽

10.1093/mnras/staa3096 ◽

2020 ◽

Vol 499 (4) ◽

pp. 6009-6017

Author(s):

Y-L Mong ◽

K Ackley ◽

D K Galloway ◽

T Killestein ◽

J Lyman ◽

...

Keyword(s):

Machine Learning ◽

Feature Representation ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Sampling Effort ◽

Training Set ◽

The Real ◽

Learning Techniques ◽

The Difference ◽

Difference Imaging

The amount of observational data produced by time-domain astronomy is exponentially increasing. Human inspection alone is not an effective way to identify genuine transients from the data. An automatic real-bogus classifier is needed and machine learning techniques are commonly used to achieve this goal. Building a training set with a sufficiently large number of verified transients is challenging, due to the requirement of human verification. We present an approach for creating a training set by using all detections in the science images to be the sample of real detections and all detections in the difference images, which are generated by the process of difference imaging to detect transients, to be the samples of bogus detections. This strategy effectively minimizes the labour involved in the data labelling for supervised machine learning methods. We demonstrate the utility of the training set by using it to train several classifiers utilizing as the feature representation the normalized pixel values in 21 × 21 pixel stamps centred at the detection position, observed with the Gravitational-wave Optical Transient Observer (GOTO) prototype. The real-bogus classifier trained with this strategy can provide up to 95 per cent prediction accuracy on the real detections at a false alarm rate of 1 per cent.
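The labelling strategy and feature representation described in this abstract can be sketched roughly as follows. This is an illustrative sketch only: the abstract does not say how pixel values are normalized or how stamps near the image edge are handled, so the min-max scaling and zero padding here are assumptions, and the function names and toy image are hypothetical.

```python
def cut_stamp(image, row, col, size=21):
    """Cut a size x size stamp centred on (row, col).

    Pixels that fall outside the image are filled with 0.0 (an assumption).
    """
    half = size // 2
    height, width = len(image), len(image[0])
    stamp = []
    for r in range(row - half, row + half + 1):
        line = []
        for c in range(col - half, col + half + 1):
            inside = 0 <= r < height and 0 <= c < width
            line.append(float(image[r][c]) if inside else 0.0)
        stamp.append(line)
    return stamp


def normalize(stamp):
    """Flatten a stamp row by row and min-max scale it to [0, 1] (assumed scheme)."""
    flat = [value for line in stamp for value in line]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:
        return [0.0] * len(flat)
    return [(value - lo) / span for value in flat]


def build_training_set(science_stamps, difference_stamps):
    """Label science-image detections as real (1), difference-image ones as bogus (0)."""
    features = [normalize(s) for s in science_stamps]
    features += [normalize(s) for s in difference_stamps]
    labels = [1] * len(science_stamps) + [0] * len(difference_stamps)
    return features, labels


# Hypothetical 40 x 40 toy image standing in for both image types.
image = [[(r * 31 + c * 17) % 100 for c in range(40)] for r in range(40)]
science = [cut_stamp(image, 20, 20)]
difference = [cut_stamp(image, 2, 37)]  # near the edge, so padding is exercised

X, y = build_training_set(science, difference)
print(len(X), len(X[0]), y)
```

Each feature vector has 21 × 21 = 441 entries and could be passed to any supervised classifier; no human labelling step appears anywhere in the pipeline, which is the point of the strategy.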

Using machine learning techniques and different color spaces for the classification of Cape gooseberry (Physalis peruviana L.) fruits according to ripeness level

10.7287/peerj.preprints.26691 ◽

2019 ◽

Author(s):

Wilson Castro ◽

Jimy Oblitas ◽

Miguel De-la-Torre ◽

Carlos Cotrina ◽

Karen Bazán ◽

...

Keyword(s):

Machine Learning ◽

Decision Trees ◽

Color Space ◽

Machine Learning Techniques ◽

Support Vector ◽

Color Spaces ◽

Learning Techniques ◽

Different Color ◽

Cape Gooseberry

The classification of fresh fruits according to their ripeness is typically a subjective and tedious task; consequently, there is growing interest in the use of non-contact techniques such as those based on computer vision and machine learning. In this paper, we propose the use of non-intrusive techniques for the classification of Cape gooseberry fruits. The proposal is based on the use of machine learning techniques combined with different color spaces. Given the success of techniques such as artificial neural networks, support vector machines, decision trees, and K-nearest neighbors in addressing classification problems, we decided to use these approaches in this research work. A sample of 926 Cape gooseberry fruits was obtained, and fruits were classified manually according to their level of ripeness into seven different classes. Images of each fruit were acquired in the RGB format through a system developed for this purpose. These images were preprocessed, filtered and segmented until the fruits were identified. For each piece of fruit, the median color parameter values in the RGB space were obtained, and these results were subsequently transformed into the HSV and L*a*b* color spaces. The values of each piece of fruit in the three color spaces and their corresponding degrees of ripeness were arranged for use in the creation, testing, and comparison of the developed classification models. The classification of gooseberry fruits by ripening level was found to be sensitive to both the color space used and the classification technique, e.g., the models based on decision trees are the most accurate, and the models based on the L*a*b* color space obtain the best mean accuracy. However, the model that best classifies the Cape gooseberry fruits based on ripeness level is that resulting from the combination of the SVM technique and the RGB color space.
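The colour feature step described above (median RGB per fruit, then conversion to another colour space) can be sketched with the Python standard library. The pixel values and function names below are hypothetical, and only the RGB to HSV conversion is shown; the L*a*b* conversion depends on a reference white point and an RGB working space that the abstract does not specify, so it is left out.

```python
import colorsys
from statistics import median


def median_rgb(pixels):
    """Channel-wise median of (R, G, B) pixels given as 0-255 integers."""
    reds, greens, blues = zip(*pixels)
    return median(reds), median(greens), median(blues)


def rgb_to_hsv_features(rgb):
    """Convert a 0-255 RGB triple to HSV, each component in [0, 1]."""
    r, g, b = (channel / 255.0 for channel in rgb)
    return colorsys.rgb_to_hsv(r, g, b)


# Hypothetical segmented pixels from one orange-coloured fruit.
fruit_pixels = [
    (255, 128, 0),
    (250, 130, 10),
    (255, 120, 0),
    (240, 128, 5),
    (255, 140, 0),
]

rgb = median_rgb(fruit_pixels)
hsv = rgb_to_hsv_features(rgb)
print(rgb, hsv)
```

The resulting RGB and HSV triples, paired with the manually assigned ripeness class, would form one row of the table used to train and compare classifiers.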

Decision Trees and Random Forests: Machine Learning Techniques to Classify Rare Events

European Policy Analysis ◽

10.18278/epa.2.1.7 ◽

2016 ◽

Vol 2 (1) ◽

Cited By ~ 3

Author(s):

Simon Hegelich

Keyword(s):

Machine Learning ◽

Decision Trees ◽

Random Forests ◽

Rare Events ◽

Machine Learning Techniques ◽

Learning Techniques

Prediction of Mean Wave Overtopping Discharge Using Gradient Boosting Decision Trees

Water ◽

10.3390/w12061703 ◽

2020 ◽

Vol 12 (6) ◽

pp. 1703 ◽

Cited By ~ 3

Author(s):

Joost P. den Bieman ◽

Josefine M. Wilms ◽

Henk F. P. van den Boogaard ◽

Marcel R. A. van Gent

Keyword(s):

Machine Learning ◽

Decision Trees ◽

Numerical Models ◽

Input Parameter ◽

Design Criterion ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Wave Overtopping ◽

Learning Techniques ◽

Machine Learning Model

Wave overtopping is an important design criterion for coastal structures such as dikes, breakwaters and promenades. Hence, the prediction of the expected wave overtopping discharge is an important research topic. Existing prediction tools consist of empirical overtopping formulae, machine learning techniques like neural networks, and numerical models. In this paper, an innovative machine learning method—gradient boosting decision trees—is applied to the prediction of mean wave overtopping discharges. This new machine learning model is trained using the CLASH wave overtopping database. Optimizations to its performance are realized by using feature engineering and hyperparameter tuning. The model is shown to outperform an existing neural network model by reducing the error on the prediction of the CLASH database by a factor of 2.8. The model predictions follow physically realistic trends for variations of important features, and behave regularly in regions of the input parameter space with little or no data coverage.
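For readers unfamiliar with gradient boosting decision trees, the core idea can be sketched in a few lines: each new tree is fitted to the residuals of the current ensemble, and its shrunken prediction is added to the running total. The sketch below assumes squared-error loss, a single input feature, and one-split trees (stumps); the toy data and all names are hypothetical and have no connection to the CLASH database or to the authors' model, features, or hyperparameters.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression stump on a single feature by least squares."""
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left)
        sse += sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    _, t, left_mean, right_mean = best
    return t, left_mean, right_mean


def predict_stump(stump, x):
    """Return the leaf mean on the side of the threshold where x falls."""
    t, left_mean, right_mean = stump
    return left_mean if x <= t else right_mean


def fit_gbdt(xs, ys, n_stages=50, learning_rate=0.1):
    """Boost stumps on squared-error residuals, starting from the mean of ys."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_stages):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x)
                 for p, x in zip(preds, xs)]
    return base, stumps, learning_rate


def predict_gbdt(model, x):
    """Sum the base value and the shrunken predictions of all stumps."""
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


def mse(ys, preds):
    """Mean squared error between targets and predictions."""
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


# Hypothetical one-dimensional toy data: y = x squared.
xs = [i / 2 for i in range(1, 13)]
ys = [x * x for x in xs]

model = fit_gbdt(xs, ys)
baseline_mse = mse(ys, [sum(ys) / len(ys)] * len(ys))
boosted_mse = mse(ys, [predict_gbdt(model, x) for x in xs])
print(baseline_mse, boosted_mse)
```

The learning rate and number of stages used here are arbitrary; in practice they are among the hyperparameters that tuning would adjust.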

Using machine learning techniques and different color spaces for the classification of Cape gooseberry (Physalis peruviana L.) fruits according to ripeness level

10.7287/peerj.preprints.26691v2 ◽

2019 ◽

Cited By ~ 1

Author(s):

Wilson Castro ◽

Jimy Oblitas ◽

Miguel De-la-Torre ◽

Carlos Cotrina ◽

Karen Bazán ◽

...

Keyword(s):

Machine Learning ◽

Decision Trees ◽

Color Space ◽

Machine Learning Techniques ◽

Support Vector ◽

Color Spaces ◽

Learning Techniques ◽

Different Color ◽

Cape Gooseberry

The classification of fresh fruits according to their ripeness is typically a subjective and tedious task; consequently, there is growing interest in the use of non-contact techniques such as those based on computer vision and machine learning. In this paper, we propose the use of non-intrusive techniques for the classification of Cape gooseberry fruits. The proposal is based on the use of machine learning techniques combined with different color spaces. Given the success of techniques such as artificial neural networks, support vector machines, decision trees, and K-nearest neighbors in addressing classification problems, we decided to use these approaches in this research work. A sample of 926 Cape gooseberry fruits was obtained, and fruits were classified manually according to their level of ripeness into seven different classes. Images of each fruit were acquired in the RGB format through a system developed for this purpose. These images were preprocessed, filtered and segmented until the fruits were identified. For each piece of fruit, the median color parameter values in the RGB space were obtained, and these results were subsequently transformed into the HSV and L*a*b* color spaces. The values of each piece of fruit in the three color spaces and their corresponding degrees of ripeness were arranged for use in the creation, testing, and comparison of the developed classification models. The classification of gooseberry fruits by ripening level was found to be sensitive to both the color space used and the classification technique, e.g., the models based on decision trees are the most accurate, and the models based on the L*a*b* color space obtain the best mean accuracy. However, the model that best classifies the Cape gooseberry fruits based on ripeness level is that resulting from the combination of the SVM technique and the RGB color space.

BUILDING CONSISTENCIES FOR PARTIALLY DEFINED CONSTRAINTS WITH DECISION TREES AND NEURAL NETWORKS

International Journal of Artificial Intelligence Tools ◽

10.1142/s0218213007003503 ◽

2007 ◽

Vol 16 (04) ◽

pp. 683-706 ◽

Cited By ~ 2

Author(s):

ARNAUD LALLOUET ◽

ANDREI LEGTCHENKO

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Artificial Neural Networks ◽

Decision Trees ◽

Optimization Problems ◽

Machine Learning Techniques ◽

Incomplete Knowledge ◽

Learning Techniques ◽

Variable Domains ◽

Constraint Learning

Partially Defined Constraints can be used to model the incomplete knowledge of a concept or a relation. Instead of only computing with the known part of the constraint, we propose to complete its definition by using Machine Learning techniques. Since constraints are actively used during solving for pruning domains, building a classifier for instances is not enough: we need a solver able to reduce variable domains. Our technique is composed of two steps: first we learn a classifier for each constraint projection, and then we transform the classifiers into a propagator. The first contribution is a generic meta-technique for classifier improvement showing performance comparable to boosting. The second lies in the ability to use the learned concept in constraint-based decision or optimization problems. We present results using Decision Trees and Artificial Neural Networks for constraint learning and propagation. This opens a new way of integrating Machine Learning in Decision Support Systems.
