Dealing with Missing Values in a Probabilistic Decision Tree during Classification

A Contemporary Machine Learning Method for Accurate Prediction of Cervical Cancer

SHS Web of Conferences ◽

10.1051/shsconf/202110204004 ◽

2021 ◽

Vol 102 ◽

pp. 04004

Author(s):

Jesse Jeremiah Tanimu ◽

Mohamed Hamada ◽

Mohammed Hassan ◽

Saratu Yusuf Ilu

Keyword(s):

Machine Learning ◽

Cervical Cancer ◽

Feature Selection ◽

Decision Tree ◽

Sensitivity And Specificity ◽

Missing Values ◽

New Technologies ◽

Machine Learning Techniques ◽

Screening Tests ◽

Tree Classifier

With the advent of new technologies in the medical field, large amounts of cancer data have been collected and are readily accessible to the medical research community. Over the years, researchers have employed advanced data mining and machine learning techniques to build models that analyze these datasets and extract patterns and hidden knowledge. The mined information can support decision making in diagnostic processes. Besides predicting the future outcomes of certain diseases effectively, these techniques can discover patterns and relationships in complex datasets. In this research, a model has been developed that predicts the outcome of a patient's cervical cancer test from risk patterns in individual medical records and preliminary screening tests. This work presents a decision tree (DT) classification algorithm and shows the advantage of feature selection in cervical cancer prediction: recursive feature elimination is used for dimensionality reduction to improve the accuracy, sensitivity, and specificity of the model. The dataset employed suffers from missing values and is highly imbalanced, so a combination of under- and oversampling techniques called SMOTETomek was applied. A comparative analysis of the proposed model shows the effect of feature selection and class-imbalance handling on the classifier's accuracy, sensitivity, and specificity. The DT with the selected features and SMOTETomek achieves better results, with an accuracy of 98%, sensitivity of 100%, and specificity of 97%. The decision tree classifier is shown to perform very well on the classification task when the features are reduced and the class imbalance is addressed.
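The abstract reports accuracy, sensitivity, and specificity separately because accuracy alone can be misleading on an imbalanced dataset. A minimal sketch of how the three figures follow from a confusion matrix is given here; the function name `classification_metrics` and the toy labels are hypothetical and are not taken from the paper.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, sensitivity, and specificity for binary labels."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1  # positive case correctly flagged
        elif truth == positive:
            fn += 1  # positive case missed
        elif pred == positive:
            fp += 1  # negative case wrongly flagged
        else:
            tn += 1  # negative case correctly cleared
    total = tp + tn + fp + fn
    nan = float("nan")
    return {
        "accuracy": (tp + tn) / total if total else nan,
        # Sensitivity (recall): share of true positives that were detected.
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
        # Specificity: share of true negatives that were correctly cleared.
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
    }


# Toy, made-up labels: 2 positives among 10 cases (imbalanced).
toy_truth = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
toy_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
print(classification_metrics(toy_truth, toy_pred))
```

In this toy case both positives are found, so sensitivity is perfect, while the single false alarm lowers specificity and accuracy.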


The Complexity of a Probabilistic Approach to Deal with Missing Values in a Decision Tree

2006 Eighth International Symposium on Symbolic and Numeric Algorithms for Scientific Computing ◽

10.1109/synasc.2006.70 ◽

2006 ◽

Cited By ~ 2

Author(s):

Lamis Hawarah ◽

Ana Simonet ◽

Michel Simonet

Keyword(s):

Decision Tree ◽

Missing Values ◽

Probabilistic Approach


Dealing with Missing Values in a Probabilistic Decision Tree during Classification

Studies in Computational Intelligence - Mining Complex Data ◽

10.1007/978-3-540-88067-7_4 ◽

2009 ◽

pp. 55-74 ◽

Cited By ~ 2

Author(s):

Lamis Hawarah ◽

Ana Simonet ◽

Michel Simonet

Keyword(s):

Decision Tree ◽

Missing Values


Comparison of robustness against missing values of alternative decision tree and multiple logistic regression for predicting clinical data in primary breast cancer

2013 35th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) ◽

10.1109/embc.2013.6610185 ◽

2013 ◽

Cited By ~ 3

Author(s):

Masahiro Sugimoto ◽

Masahiro Takada ◽

Masakazu Toi

Keyword(s):

Breast Cancer ◽

Logistic Regression ◽

Decision Tree ◽

Clinical Data ◽

Primary Breast Cancer ◽

Missing Values ◽

Multiple Logistic Regression ◽

Alternative Decision


FINANCIAL FORECASTING USING DECISION TREE (REPTree & C4.5) AND NEURAL NETWORKS (K*) FOR HANDLING THE MISSING VALUES

ICTACT Journal on Soft Computing ◽

10.21917/ijsc.2017.0204 ◽

2017 ◽

Vol 7 (3) ◽

pp. 1473-1477

Author(s):

J Jayanthi ◽

Gurpreet Kaur ◽

K Suresh Joseph ◽

...

Keyword(s):

Neural Networks ◽

Decision Tree ◽

Missing Values ◽

Financial Forecasting


Development of Web Tools to Predict Axillary lymph Node Metastasis and Pathological Response to Neoadjuvant Chemotherapy in Breast Cancer Patients

The International Journal of Biological Markers ◽

10.5301/jbm.5000103 ◽

2014 ◽

Vol 29 (4) ◽

pp. 372-379 ◽

Cited By ~ 2

Author(s):

Masahiro Sugimoto ◽

Masahiro Takada ◽

Masakazu Toi

Keyword(s):

Breast Cancer ◽

Lymph Node ◽

Neoadjuvant Chemotherapy ◽

Decision Tree ◽

Cancer Patients ◽

Axillary Lymph Node ◽

Missing Values ◽

Axillary Lymph ◽

Breast Cancer Patients ◽

Response To Neoadjuvant Chemotherapy

Nomograms are a standard computational tool for predicting the likelihood of an outcome from multiple available patient features. We have developed a more powerful data mining methodology to predict axillary lymph node (AxLN) metastasis and response to neoadjuvant chemotherapy (NAC) in primary breast cancer patients, and we developed websites that make these tools available. The tools calculate the probability of AxLN metastasis (AxLN model) and of pathological complete response to NAC (NAC model). As the calculation algorithm, we employed a decision tree-based prediction model known as the alternative decision tree (ADTree), which is an analogous development of if-then type decision trees. An ensemble technique was used to combine multiple ADTree predictions, resulting in higher generalization ability and robustness against missing values. The AxLN model was developed with training datasets (n=148) and test datasets (n=143), and validated using an independent cohort (n=174), yielding an area under the receiver operating characteristic curve (AUC) of 0.768. The NAC model was developed and validated with n=150 and n=173 datasets from a randomized controlled trial, yielding an AUC of 0.787. The AxLN and NAC models require users to input up to 17 and 16 variables, respectively. These include pathological features, such as human epidermal growth factor receptor 2 (HER2) status, and imaging findings. Each input variable has an "unknown" option to facilitate prediction for cases with missing values. The websites facilitate the use of these tools and serve as a database for accumulating new datasets.
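The abstract does not say how the tools treat an "unknown" input internally. The sketch below shows one common way an ADTree-style model can tolerate missing values, under two stated assumptions: a decision node whose feature is unknown is skipped, so its subtree contributes nothing to the score, and the averaged ensemble score is mapped to a probability with a logistic function. The tree structure, the feature names `x1` and `x2`, and all numeric values are hypothetical toy data, not the published models.

```python
import math

UNKNOWN = None  # marker for an input the user left as "unknown"


def adtree_score(node, sample):
    """Sum the prediction values on every path the sample can follow.

    `node` is a prediction node: {"value": float, "splitters": [...]}.
    Each splitter tests one feature against a threshold and leads to a
    "yes" or "no" prediction node.
    """
    score = node["value"]
    for splitter in node.get("splitters", []):
        x = sample.get(splitter["feature"], UNKNOWN)
        if x is UNKNOWN:
            # Assumption: an unanswerable test is skipped entirely.
            continue
        child = splitter["yes"] if x < splitter["threshold"] else splitter["no"]
        score += adtree_score(child, sample)
    return score


def ensemble_probability(trees, sample):
    """Average the tree scores, then squash them to (0, 1).

    The factor 2 follows the usual boosting link p = 1 / (1 + exp(-2F));
    treating the score this way is an assumption of this sketch.
    """
    mean_score = sum(adtree_score(t, sample) for t in trees) / len(trees)
    return 1.0 / (1.0 + math.exp(-2.0 * mean_score))


# Two hypothetical toy trees.
tree_a = {"value": 0.2, "splitters": [{
    "feature": "x1", "threshold": 5.0,
    "yes": {"value": -0.5},
    "no": {"value": 0.7, "splitters": [{
        "feature": "x2", "threshold": 1.0,
        "yes": {"value": -0.3}, "no": {"value": 0.4}}]}}]}
tree_b = {"value": -0.1, "splitters": [{
    "feature": "x2", "threshold": 1.0,
    "yes": {"value": -0.6}, "no": {"value": 0.6}}]}

complete = {"x1": 6.0, "x2": 2.0}
partial = {"x1": 6.0, "x2": UNKNOWN}  # x2 left as "unknown"
print(ensemble_probability([tree_a, tree_b], complete))
print(ensemble_probability([tree_a, tree_b], partial))
```

With `x2` unknown, both trees still return a score from the nodes they can evaluate, so a prediction is produced instead of an error.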
