Applied Identification of Industry Data Science Using an Advanced Multi-Componential Discretization Model

You-Shyang Chen; Arun Kumar Sangaiah; Su-Fen Chen; Hsiu-Chen Huang

doi:10.3390/sym12101620

Symmetry ◽

10.3390/sym12101620 ◽

2020 ◽

Vol 12 (10) ◽

pp. 1620

Author(s):

You-Shyang Chen ◽

Arun Kumar Sangaiah ◽

Su-Fen Chen ◽

Hsiu-Chen Huang

Keyword(s):

Feature Selection ◽

Decision Tree ◽

Cross Validation ◽

Time Lag ◽

Industry Data ◽

Decision Tree Learning ◽

Managerial Implication ◽

Split Method ◽

Model C ◽

Data Discretization

Large-scale applied data are collected from heterogeneous scientific and industry databases so that they can be used in complex application environments such as finance. This presents both opportunities and challenges for data researchers, and finding an intelligent hybrid model that solves stock-market application problems is an important issue for financial analysts. In practice, classification applications that predict earnings per share (EPS) from the financial ratios in an industry database work with data of this kind and have particularly high application value. This study proposes several advanced multicomponential discretization models, named Models A–E, each of which produces a positive/negative diagnosis based on the latest financial statements from six different industries. The models vary in their components so that performance can be compared across data preprocessing, data discretization, feature selection, two data split methods, machine learning, rule-based decision tree knowledge, time-lag effects, different numbers of experimental runs, and two different class types. The experimental dataset had 24 condition features and one decision feature, EPS, which was used to classify the data into two and three classes for comparison. Empirically, three main determinants were identified: total asset growth rate, operating income per share, and times interest earned. The core components were data discretization and feature selection, with which some well-known classifiers achieved significantly better accuracy.
The overall results demonstrated the following key points: (1) the highest accuracy, 92.46%, occurred in Model C using decision tree learning with the percentage-split method for two classes in one run; (2) the highest mean accuracy, 91.44%, occurred in Models D and E using naïve Bayes learning with the cross-validation and percentage-split methods for each class over 10 runs; (3) the highest average mean accuracy, 87.53%, occurred in Models D and E with the cross-validation method for each class; and (4) the highest accuracy, 92.46%, occurred in Model C using decision tree learning (C4.5) with the percentage-split method and no time lag for each class. The study presents its contribution as managerial implications and technical direction for practical finance, a field in which multicomponential discretization models have seen limited use and have rarely been applied to industry data because of various restrictions.
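The abstract names data discretization as a core component but does not say which discretization method the models use. As a rough illustration only, the sketch below shows equal-width binning, one common choice, applied to a single financial ratio, together with a hypothetical two-class EPS labelling. The function names, the zero threshold for the positive/negative split, and the sample numbers are all assumptions and are not taken from the study.

```python
from typing import List


def equal_width_bins(values: List[float], n_bins: int) -> List[int]:
    """Map each value to a bin index in [0, n_bins - 1] using equal-width intervals."""
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so everything falls into the first bin.
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would compute to index n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def eps_to_two_classes(eps: List[float]) -> List[str]:
    """Hypothetical two-class labelling: EPS above zero is 'positive' (assumed threshold)."""
    return ["positive" if e > 0 else "negative" for e in eps]


# Made-up illustrative numbers, not data from the study.
total_asset_growth = [-5.0, 0.0, 2.5, 7.5, 15.0]
eps = [-1.2, 0.3, 0.8, -0.1, 2.4]

bins = equal_width_bins(total_asset_growth, n_bins=4)
labels = eps_to_two_classes(eps)
print(bins)
print(labels)
```

A discretized feature like `bins` can then be passed to a rule-based learner such as a decision tree, which is the kind of pipeline the abstract describes in general terms.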

Predictor Selection for Bacterial Vaginosis Diagnosis Using Decision Tree and Relief Algorithms

Applied Sciences ◽

10.3390/app10093291 ◽

2020 ◽

Vol 10 (9) ◽

pp. 3291

Author(s):

Jesús F. Pérez-Gómez ◽

Juana Canul-Reich ◽

José Hernández-Torruco ◽

Betania Hernández-Ocaña

Keyword(s):

Feature Selection ◽

Decision Tree ◽

Bacterial Vaginosis ◽

Cross Validation ◽

Performance Comparison ◽

Support Vector ◽

Ongoing Research ◽

Selection For ◽

Comparison Of The Results ◽

Fold Cross Validation

Requiring only a few relevant characteristics from patients when diagnosing bacterial vaginosis is highly useful for physicians because it reduces the time needed to collect the data. This would result in a dataset in which patients can be diagnosed more accurately using only a subset of informative or relevant features rather than the entire feature set. As such, this is a feature selection (FS) problem. In this work, decision tree and Relief algorithms were used as feature selectors. Experiments were conducted on a real bacterial vaginosis dataset with 396 instances and 252 features/attributes. The dataset was obtained from universities located in Baltimore and Atlanta. The FS algorithms produced feature rankings, from which the top fifteen features formed a new dataset that was used as input to both support vector machine (SVM) and logistic regression (LR) classifiers. For performance evaluation, averages over 30 runs of 10-fold cross-validation were reported, with balanced accuracy, sensitivity, and specificity as performance measures. The results obtained with the full feature set were compared against those obtained with the top fifteen features. The top-ranked attributes were similar to those reported in the literature. This study is part of ongoing research investigating a range of feature selection and classification methods.
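The three performance measures named in the abstract follow standard definitions: sensitivity is TP / (TP + FN), specificity is TN / (TN + FP), and balanced accuracy is their mean. The sketch below computes them from true and predicted labels. The function name `binary_metrics` and the sample labels are illustrative assumptions, not material from the paper.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> Dict[str, float]:
    """Compute sensitivity, specificity, and balanced accuracy for a binary task."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1  # positive case correctly detected
            else:
                fn += 1  # positive case missed
        else:
            if pred == positive:
                fp += 1  # negative case wrongly flagged
            else:
                tn += 1  # negative case correctly rejected
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


# Made-up labels for illustration: 4 positive and 6 negative cases.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Balanced accuracy is often preferred over plain accuracy when the classes are unequal in size, because each class contributes equally to the score.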

Handling over-fitting in test cost-sensitive decision tree learning by feature selection, smoothing and pruning

Journal of Systems and Software ◽

10.1016/j.jss.2010.01.002 ◽

2010 ◽

Vol 83 (7) ◽

pp. 1137-1147 ◽

Cited By ~ 18

Author(s):

Tao Wang ◽

Zhenxing Qin ◽

Zhi Jin ◽

Shichao Zhang

Keyword(s):

Feature Selection ◽

Decision Tree ◽

Decision Tree Learning ◽

Test Cost ◽

Fitting In

Modified Mutual Information-based Feature Selection for Intrusion Detection Systems in Decision Tree Learning

Journal of Computers ◽

10.4304/jcp.9.7.1542-1546 ◽

2014 ◽

Vol 9 (7) ◽

Cited By ~ 2

Author(s):

Jingping Song ◽

Zhiliang Zhu ◽

Peter Scully ◽

Chris Price

Keyword(s):

Feature Selection ◽

Mutual Information ◽

Intrusion Detection ◽

Decision Tree ◽

Intrusion Detection Systems ◽

Decision Tree Learning ◽

Detection Systems ◽

Selection For

PREDICTION OF CILIWUNG RIVER WATER QUALITY USING THE DECISION TREE ALGORITHM

Jurnal Air Indonesia ◽

10.29122/jai.v12i2.4364 ◽

2021 ◽

Vol 12 (2) ◽

Author(s):

Mohammad Haekal ◽

Henki Bayu Seta ◽

Mayanda Mega Santoni

Keyword(s):

Data Mining ◽

Decision Tree ◽

Cross Validation ◽

Online Monitoring ◽

Training Set ◽

Microsoft Excel ◽

Test Set

To predict the water quality of the Ciliwung River, data obtained through online monitoring were processed using a data mining method. In this method, the monitoring data were first compiled into a Microsoft Excel table and then processed into a decision tree using the decision tree algorithm in the WEKA application. The decision tree method was chosen because it is simpler, easy to understand, and has a very high level of accuracy. A total of 5,476 Ciliwung River water quality monitoring records were processed. In the decision tree classification, 1,059 of these records (19.3242%) indicated that the river was not polluted, and 4,417 records (80.6758%) indicated that it was polluted. The monitoring data were then evaluated using four test options: Use Training Set, Supplied Test Set, Cross-Validation with 10 folds, and Percentage Split at 66%. All four test options showed very high accuracy, above 99%. From these results, it can be predicted that the Ciliwung River is indicated as a polluted river with reference to Government Regulation of the Republic of Indonesia Number 82 of 2001, and that using the WEKA application with the decision tree algorithm to process monitoring data on three parameters (pH, DO, and nitrate) is highly accurate and appropriate. Keywords: river water quality, data mining, decision tree algorithm, WEKA application.
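Two of the evaluation options mentioned in this abstract, a 66% percentage split and 10-fold cross-validation, differ only in how record indices are divided into training and test sets. The sketch below generates both kinds of split for 5,476 records using the Python standard library. It is a generic illustration under assumed shuffling and rounding choices and is not a reproduction of WEKA's internal behaviour; the function names and seed are hypothetical.

```python
import random
from typing import List, Tuple


def percentage_split(n: int, train_fraction: float = 0.66, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle indices 0..n-1 and cut once into a training part and a test part."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    cut = round(n * train_fraction)
    return indices[:cut], indices[cut:]


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return k (train, test) pairs in which every index is tested exactly once."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


n_records = 5476  # number of monitoring records reported in the abstract
train_idx, test_idx = percentage_split(n_records)
cv_splits = k_fold_indices(n_records, k=10)
print(len(train_idx), len(test_idx))
print(sorted({len(test) for _, test in cv_splits}))
```

A single percentage split gives one accuracy estimate that depends on the particular cut, whereas cross-validation averages over k held-out folds, which is why studies often report both.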

DIFFERENTIAL EVOLUTION IN THE DECISION TREE LEARNING ALGORITHM

Siberian Journal of Science and Technology ◽

10.31772/2587-6066-2019-20-3-312-319 ◽

2019 ◽

Vol 20 (3) ◽

pp. 312-319

Author(s):

S. A. Mitrofanov ◽

E. S. Semenkin

Keyword(s):

Decision Tree ◽

Differential Evolution ◽

Learning Algorithm ◽

Decision Tree Learning

Applying design patterns to decision tree learning system

ACM SIGSOFT Software Engineering Notes ◽

10.1145/291252.288279 ◽

1998 ◽

Vol 23 (6) ◽

pp. 111-120 ◽

Cited By ~ 1

Author(s):

Gou Masuda ◽

Norihiro Sakamoto ◽

Kazuo Ushijima

Keyword(s):

Decision Tree ◽

Design Patterns ◽

Learning System ◽

Decision Tree Learning

Diabetes disease prediction using decision tree for feature selection

Journal of Physics Conference Series ◽

10.1088/1742-6596/1964/6/062116 ◽

2021 ◽

Vol 1964 (6) ◽

pp. 062116

Author(s):

Jayakumar Sadhasivam ◽

V Muthukumaran ◽

J Thimmia Raja ◽

Rose Bindu Joseph ◽

Meram Munirathanam ◽

...

Keyword(s):

Feature Selection ◽

Decision Tree ◽

Disease Prediction

Recognition of Explosive Precursors Using Nanowire Sensor Array and Decision Tree Learning

IEEE Sensors Journal ◽

10.1109/jsen.2011.2182042 ◽

2012 ◽

Vol 12 (7) ◽

pp. 2384-2391 ◽

Cited By ~ 7

Author(s):

Junghwan Cho ◽

Xiaopeng Li ◽

Zhiyong Gu ◽

Pradeep U. Kurup

Keyword(s):

Decision Tree ◽

Sensor Array ◽

Decision Tree Learning ◽

Nanowire Sensor

Towards an Approach for Fuel Poverty Detection from Gas Smart Meter Data using Decision Tree Learning

Proceedings of the 2020 3rd International Conference on Information Management and Management Science ◽

10.1145/3416028.3416034 ◽

2020 ◽

Author(s):

William Hurst ◽

Casimiro A. Curbelo Montanez ◽

Nathan Shone

Keyword(s):

Decision Tree ◽

Smart Meter ◽

Decision Tree Learning

Process Mining Approach Based on Partial Structures of Event Logs and Decision Tree Learning

2016 5th IIAI International Congress on Advanced Applied Informatics (IIAI-AAI) ◽

10.1109/iiai-aai.2016.174 ◽

2016 ◽

Cited By ~ 1

Author(s):

Hiroki Horita ◽

Hideaki Hirayama ◽

Takeo Hayase ◽

Yasuyuki Tahara ◽

Akihiko Ohsuga

Keyword(s):

Decision Tree ◽

Process Mining ◽

Decision Tree Learning ◽

Event Logs ◽

Partial Structures
