Breast Tumor Classification Using an Ensemble Machine Learning Method

Adel S. Assiri; Saima Nazir; Sergio A. Velastin

doi:10.3390/jimaging6060039

Journal of Imaging ◽

10.3390/jimaging6060039 ◽

2020 ◽

Vol 6 (6) ◽

pp. 39 ◽

Cited By ~ 1

Author(s):

Adel S. Assiri ◽

Saima Nazir ◽

Sergio A. Velastin

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

State Of The Art ◽

Majority Voting ◽

Ensemble Classification ◽

Stochastic Gradient Descent ◽

Support Vector ◽

Breast Cancer Dataset ◽

Machine Learning Classification ◽

Voting Mechanism

Breast cancer is the most common cause of death for women worldwide. Thus, the ability of artificial intelligence systems to detect possible breast cancer is very important. In this paper, an ensemble classification mechanism is proposed based on a majority voting mechanism. First, the performance of different state-of-the-art machine learning classification algorithms was evaluated on the Wisconsin Breast Cancer Dataset (WBCD). The three best classifiers were then selected based on their F3 score, which is used to emphasize the importance of false negatives (recall) in breast cancer classification. These three classifiers (simple logistic regression learning, support vector machine learning with stochastic gradient descent optimization, and a multilayer perceptron network) are then used for ensemble classification with a voting mechanism. We also evaluated the performance of hard and soft voting mechanisms. For hard voting, a majority-based voting mechanism was used; for soft voting, we used voting methods based on the average, product, maximum, and minimum of probabilities. The hard voting (majority-based voting) mechanism shows better performance, with 99.42%, than the state-of-the-art algorithm for WBCD.
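
The F3 selection criterion and the voting rules named in this abstract can be sketched in a few lines of plain Python. This is only an illustration of the standard definitions, not the authors' code: the function names (`fbeta`, `hard_vote`, `soft_vote`) and the example probabilities are assumptions introduced here.

```python
from collections import Counter
from math import prod


def fbeta(tp: int, fp: int, fn: int, beta: float = 3.0) -> float:
    """F-beta score from confusion-matrix counts; beta > 1 weights recall more."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def hard_vote(labels: list) -> str:
    """Majority vote over the class labels predicted by each classifier."""
    return Counter(labels).most_common(1)[0][0]


def soft_vote(probas: list, rule: str = "average") -> str:
    """Combine per-classifier class probabilities with the named rule."""
    combine = {
        "average": lambda ps: sum(ps) / len(ps),
        "product": prod,
        "maximum": max,
        "minimum": min,
    }[rule]
    scores = {c: combine([p[c] for p in probas]) for c in probas[0]}
    return max(scores, key=scores.get)


# Hypothetical outputs of three classifiers for one sample.
example = [
    {"benign": 0.60, "malignant": 0.40},
    {"benign": 0.55, "malignant": 0.45},
    {"benign": 0.10, "malignant": 0.90},
]
labels = [max(p, key=p.get) for p in example]
print(hard_vote(labels), soft_vote(example, "average"))
```

With beta = 3, a false negative costs nine times as much as a false positive in the denominator, which is why the score suits a setting where missed malignancies matter most. The example also shows that hard and soft voting can disagree on the same sample: two mildly confident classifiers outvote one very confident classifier under hard voting but not under the soft rules.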

Classifications of Breast Cancer Diagnosis using Machine Learning

International Journal of Computers ◽

10.46300/9108.2020.14.13 ◽

2020 ◽

Vol 14 ◽

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Random Forest ◽

Breast Cancer Diagnosis ◽

Performance Comparison ◽

Support Vector ◽

Breast Cancer Dataset ◽

K Nearest Neighbors ◽

Cancer Dataset ◽

Machine Learning Classification

Breast cancer (BC) is among the most common and leading causes of death in women throughout the world. Classification and data analysis tools are now widely used in the medical field for diagnosis, prognosis, and decision making, helping to lower the risk of people dying or suffering from disease. Advanced machine learning methods offer hope for patients because they help doctors detect potentially fatal diseases such as breast cancer early and provide accurate outcomes. However, the results depend heavily on the feature selection and classification techniques used to build a strong machine learning model. In this paper, a performance comparison is conducted using four classifiers, Multilayer Perceptron (MLP), Support Vector Machine (SVM), K-Nearest Neighbors (KNN), and Random Forest, on the Wisconsin Breast Cancer dataset to identify the most effective predictors. The main goal is to apply the best machine learning classification methods to predict breast cancer as benign or malignant, evaluated in terms of accuracy, F-measure, precision, and recall. Experimental results show that Random Forest achieves the highest accuracy, 99.26%, on this dataset and feature set, while SVM and KNN reach 97.78% and 97.04% respectively. MLP has the lowest accuracy, 94.07%. All experiments were conducted using RStudio as the data mining platform.

Breast Cancer Prediction Using Machine Learning

International Journal of Scientific Research in Computer Science Engineering and Information Technology ◽

10.32628/cseit206457 ◽

2020 ◽

pp. 278-284

Author(s):

Gaurav Singh

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Nearest Neighbor ◽

Machine Learning Algorithms ◽

Support Vector ◽

Breast Cancer Dataset ◽

K Nearest Neighbor ◽

Cancer Dataset ◽

Implementation Phase ◽

Machine Learning Classification

Breast cancer is a prevalent cause of death, and it is the only type of cancer that is widespread among women worldwide. The prime objective of this paper is to create a model for predicting breast cancer using various machine learning classification algorithms: k-Nearest Neighbor (kNN), Support Vector Machine (SVM), Logistic Regression (LR), and Gaussian Naive Bayes (NB). It further assesses and compares the performance of these classifiers in terms of accuracy, precision, recall, F1-score, and Jaccard index. The breast cancer dataset is publicly available from the UCI Machine Learning Repository. In the implementation phase, the dataset is partitioned into 80% for training and 20% for testing, and the machine learning algorithms are then applied. k-Nearest Neighbors achieved significant performance with respect to all parameters.
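
As a rough illustration of the evaluation described here, the sketch below shows an 80/20 hold-out split and the five reported measures computed from binary predictions. The helper names (`train_test_split`, `binary_metrics`), the fixed seed, and the toy labels are assumptions made for this example; the paper's own implementation is not shown in the abstract.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and hold out the last test_fraction of them."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, F1 and Jaccard index for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn

    def ratio(num, den):
        # Report 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
        "jaccard": ratio(tp, tp + fp + fn),
    }


# Toy labels (1 = malignant, 0 = benign), purely illustrative.
truth = [1, 1, 1, 0, 0, 0, 1, 0]
preds = [1, 1, 0, 0, 0, 1, 1, 0]
print(binary_metrics(truth, preds))
```

The Jaccard index uses the same counts as F1 but counts the true positives only once, so it is never larger than F1 for the same predictions.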

A Novel Stacked Ensemble for Hate Speech Recognition

Applied Sciences ◽

10.3390/app112411684 ◽

2021 ◽

Vol 11 (24) ◽

pp. 11684

Author(s):

Mona Khalifa A. Aljero ◽

Nazife Dimililer

Keyword(s):

Machine Learning ◽

Hate Speech ◽

State Of The Art ◽

Majority Voting ◽

Support Vector ◽

Timely Manner ◽

Machine Learning Classifiers ◽

Significant Challenge ◽

Final Output ◽

Harmful Content

Detecting harmful content or hate speech on social media is a significant challenge due to the high throughput and large volume of content production on these platforms. Identifying hate speech in a timely manner is crucial in preventing its dissemination. We propose a novel stacked ensemble approach for detecting hate speech in English tweets. The proposed architecture employs an ensemble of three classifiers, namely support vector machine (SVM), logistic regression (LR), and XGBoost classifier (XGB), trained using word2vec and universal encoding features. The meta classifier, LR, combines the outputs of the three base classifiers and the features employed by the base classifiers to produce the final output. It is shown that the proposed architecture improves the performance of the widely used single classifiers as well as the standard stacking and classifier ensemble using majority voting. We also present results on the use of various combinations of machine learning classifiers as base classifiers. The experimental results from the proposed architecture indicated an improvement in the performance on all four datasets compared with the standard stacking, base classifiers, and majority voting. Furthermore, on three of these datasets, the proposed architecture outperformed all state-of-the-art systems.
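
The difference between standard stacking and the architecture described above lies in what the meta classifier sees: only the base classifiers' outputs, or those outputs together with the original features. A minimal sketch of that input construction follows. The function names, the example scores, and the weights are hypothetical; a real meta classifier would learn its weights from training data.

```python
from math import exp


def meta_input(base_scores, features, include_features=True):
    """Build the meta classifier's input vector.

    include_features=False corresponds to standard stacking (base outputs only);
    include_features=True appends the features used by the base classifiers.
    """
    return list(base_scores) + (list(features) if include_features else [])


def logistic_predict(weights, bias, x):
    """Probability from a logistic-regression meta classifier with given weights."""
    z = bias + sum(w * v for w, v in zip(weights, x))
    return 1.0 / (1.0 + exp(-z))


# Hypothetical scores from SVM, LR and XGB for one tweet, plus two feature values.
scores = [0.9, 0.2, 0.7]
features = [1.0, 0.0]
x = meta_input(scores, features)
print(x, logistic_predict([1.0, 1.0, 1.0, 0.5, 0.5], -1.5, x))
```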

Sentiment classification for employees reviews using regression vector-stochastic gradient descent classifier (RV-SGDC)

PeerJ Computer Science ◽

10.7717/peerj-cs.712 ◽

2021 ◽

Vol 7 ◽

pp. e712

Author(s):

Babacar Gaye ◽

Dezheng Zhang ◽

Aziguli Wulamu

Keyword(s):

Machine Learning ◽

Gradient Descent ◽

Stochastic Gradient ◽

Sentiment Classification ◽

Majority Voting ◽

Stochastic Gradient Descent ◽

Support Vector ◽

Hybrid Architecture ◽

Accuracy Score ◽

Learning Models

The satisfaction of employees is very important for any organization to make sufficient progress in production and to achieve its goals. Organizations try to keep their employees satisfied by shaping their policies according to employees’ demands, which helps to create a good environment for the collective. For this reason, it is beneficial for organizations to conduct and analyze staff satisfaction surveys, allowing them to gauge the levels of satisfaction among employees. Sentiment analysis is an approach that can assist in this regard, as it categorizes the sentiments of reviews as positive or negative. In this study, we perform experiments for the world’s big six companies and classify their employees’ reviews based on their sentiments. For this, we propose an approach using lexicon-based and machine learning-based techniques. First, we extracted the sentiments of employees from text reviews and labeled the dataset as positive or negative using TextBlob. Then we proposed a hybrid/voting model named Regression Vector-Stochastic Gradient Descent Classifier (RV-SGDC) for sentiment classification. RV-SGDC is a combination of logistic regression, support vector machines, and stochastic gradient descent, combined under a majority voting criterion. We also compared RV-SGDC against other machine learning models. Further, three feature extraction techniques, term frequency-inverse document frequency (TF-IDF), bag of words, and global vectors, are used to train the learning models. We evaluated the performance of all models in terms of accuracy, precision, recall, and F1 score. The results revealed that RV-SGDC outperforms the other models, with a 0.97 accuracy score using TF-IDF features, due to its hybrid architecture.
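
TF-IDF, the feature extraction technique with which the best result is reported, can be sketched as follows. This uses the plain textbook weighting (term frequency times log of inverse document frequency) with whitespace tokenization; library implementations usually add smoothing and normalization, so their numbers will differ. The function name and the sample reviews are assumptions for illustration.

```python
from collections import Counter
from math import log


def tf_idf(docs):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append(
            {t: (c / len(tokens)) * log(n_docs / df[t]) for t, c in counts.items()}
        )
    return vectors


# Hypothetical two-review corpus.
reviews = ["good job", "good pay"]
for vec in tf_idf(reviews):
    print(vec)
```

A term that appears in every document, such as "good" here, receives weight zero, which is the property that lets TF-IDF down-weight words that do not distinguish one review from another.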

Diagnosis of breast cancer using machine learning algorithms based on features selected by Genetic Algorithm: Assessed on five datasets

Journal of University of Shanghai for Science and Technology ◽

10.51201/jusst/21/11963 ◽

2021 ◽

Vol 23 (11) ◽

pp. 749-758

Author(s):

Saranya N ◽

Kavi Priya S

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Genetic Algorithm ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Cancer Prognosis ◽

Support Vector ◽

Breast Cancer Dataset ◽

Human Beings ◽

Original Dataset

Breast cancer is one of the chronic diseases affecting human beings throughout the world. Early detection of this disease is the most promising way to improve patients’ chances of survival. The strategy employed in this paper is to select the best features from various breast cancer datasets using a genetic algorithm and then apply a machine learning algorithm to predict the outcomes. Two machine learning algorithms, Support Vector Machines and Decision Tree, are used along with the Genetic Algorithm. The proposed approach is evaluated on five datasets: Wisconsin Breast Cancer-Diagnosis Dataset, Wisconsin Breast Cancer-Original Dataset, Wisconsin Breast Cancer-Prognosis Dataset, ISPY1 Clinical trial Dataset, and Breast Cancer Dataset. The results show that SVM-GA achieves a higher accuracy (98.16%) than DT-GA (97.44%).
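
Genetic-algorithm feature selection of the kind described here searches over boolean masks that mark which features to keep. The sketch below is a generic version with truncation selection, one-point crossover, bit-flip mutation, and elitism; the abstract does not state which operators or parameters the authors used, so all of these are assumptions. In practice the `fitness` callable would train an SVM or decision tree on the masked features and return its validation accuracy; the toy fitness here only counts agreement with a made-up target mask.

```python
import random


def select_features(n_features, fitness, generations=30, pop_size=20,
                    mutation_rate=0.05, seed=0):
    """Evolve boolean feature masks and return the fittest one found."""
    rng = random.Random(seed)
    population = [[rng.random() < 0.5 for _ in range(n_features)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]        # truncation selection
        children = [ranked[0]]                   # elitism: keep the best mask as is
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)   # one-point crossover
            child = a[:cut] + b[cut:]
            # Flip each bit independently with probability mutation_rate.
            child = [bit != (rng.random() < mutation_rate) for bit in child]
            children.append(child)
        population = children
    return max(population, key=fitness)


# Toy stand-in for classifier accuracy: agreement with a hypothetical ideal mask.
target = [True, False] * 5


def toy_fitness(mask):
    return sum(m == t for m, t in zip(mask, target))


best = select_features(len(target), toy_fitness)
print(best, toy_fitness(best))
```

Because the best mask of each generation is carried over unchanged, the best fitness in the population can never decrease from one generation to the next.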

Prediction of Breast Cancer Using Machine Learning

Recent Advances in Computer Science and Communications ◽

10.2174/2213275912666190617160834 ◽

2020 ◽

Vol 13 (5) ◽

pp. 901-908

Author(s):

Somil Jain ◽

Puneet Kumar

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Support Vector Machine ◽

Random Forest ◽

Prediction Accuracy ◽

Naive Bayes ◽

Support Vector ◽

Classification Algorithms ◽

Breast Cancer Dataset

Background: Breast cancer is one of the diseases that cause a number of deaths every year across the globe. Early detection and diagnosis of this type of disease is a challenging task that is needed to reduce the number of deaths. Nowadays, various machine learning and data mining techniques are used for medical diagnosis; they have proven their mettle in predicting chronic diseases like cancer, which can save the lives of patients suffering from such diseases. The major concern of this study is to find the prediction accuracy of classification algorithms such as Support Vector Machine, J48, Naïve Bayes and Random Forest and to suggest the best algorithm. Objective: The objective of this study is to assess the prediction accuracy of the classification algorithms in terms of efficiency and effectiveness. Methods: This paper provides a detailed analysis of classification algorithms such as Support Vector Machine, J48, Naïve Bayes and Random Forest in terms of their prediction accuracy by applying the 10-fold cross validation technique to the Wisconsin Diagnostic Breast Cancer dataset using the WEKA open source tool. Results: The results of this study show that Support Vector Machine achieved the highest prediction accuracy of 97.89% with a low error rate of 0.14%. Conclusion: This paper provides a clear view of the performance of the classification algorithms in terms of their predictive ability, which offers a helping hand to medical practitioners in diagnosing chronic diseases like breast cancer effectively.

Data Mining and Principal Component Analysis on Coimbra Breast Cancer Dataset

Proceedings of Intelligent Computing and Technologies Conference ◽

10.21467/proceedings.115.5 ◽

2021 ◽

Author(s):

Anupam Sen

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Principal Component Analysis ◽

Principal Component ◽

Component Analysis ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Breast Cancer Dataset ◽

Analysis Tool ◽

Machine Learning Classification

Machine Learning (ML) techniques play an important role in the medical field. Early diagnosis is required to improve the treatment of carcinoma. In this analysis, the Breast Cancer Coimbra dataset (BCCD), with ten predictors, is analyzed to classify carcinoma. Feature selection methods and machine learning algorithms are applied to the dataset, which comes from the UCI repository. The WEKA (“Waikato Environment for Knowledge Analysis”) tool is used for the machine learning techniques, and Principal Component Analysis (PCA) is used for feature extraction. Different machine learning classification algorithms are applied through WEKA, such as Glmnet, Gbm, ada Boosting, Adabag Boosting, C50, Cforest, DcSVM, fnn, Ksvm, and Node Harvest, and are compared on accuracy as well as on values such as the Kappa statistic, Mean Absolute Error (MAE), and Root Mean Square Error (RMSE). The 10-fold cross validation method is used for training, testing and validation purposes.
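
The three comparison values named above have simple standard definitions, sketched here in plain Python. The function names and the toy inputs are assumptions for illustration, and tools such as WEKA may compute MAE and RMSE over predicted class probabilities rather than the raw values used here.

```python
from collections import Counter
from math import sqrt


def cohen_kappa(y_true, y_pred):
    """Agreement between two labelings, corrected for chance agreement."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1:
        return float("nan")  # undefined when chance agreement is already total
    return (observed - expected) / (1 - expected)


def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_square_error(y_true, y_pred):
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Toy labels, purely illustrative.
truth = [1, 1, 0, 0]
preds = [1, 0, 0, 0]
print(cohen_kappa(truth, preds),
      mean_absolute_error(truth, preds),
      root_mean_square_error(truth, preds))
```

Kappa is useful alongside accuracy because it discounts the agreement a classifier would reach by chance given the class frequencies, which matters on imbalanced data.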

Research of Machine Learning algorithms using K-fold cross validation

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.f1043.0886s19 ◽

2019 ◽

Vol 8 (6S) ◽

pp. 215-218

Keyword(s):

Machine Learning ◽

Cross Validation ◽

Research Area ◽

Machine Learning Algorithms ◽

Support Vector ◽

Breast Cancer Dataset ◽

Cancer Dataset ◽

Validation Data ◽

Machine Learning Classification ◽

Fold Cross Validation

In machine learning, classification is one of the most important research areas. Classification allocates a given input to a known category. In this paper, different machine learning algorithms, namely Logistic Regression (LR), Decision Tree (DT), Support Vector Machine (SVM), and K-Nearest Neighbors (KNN), were implemented on the UCI breast cancer dataset with preprocessing. The models were trained and tested with k-fold cross validation. The accuracy and run time of each classifier were measured, with all classifiers implemented in Python.
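
A minimal sketch of k-fold cross validation with accuracy and run-time measurement is given below. It assumes contiguous, unshuffled folds and a caller-supplied `fit_predict` callable; both the callable's signature and the helper names are hypothetical, and real data would normally be shuffled or stratified before splitting.

```python
import time


def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def cross_validate(fit_predict, X, y, k=5):
    """Return (accuracy over all folds, elapsed seconds).

    fit_predict(train_X, train_y, test_X) must return predicted labels
    for test_X; it stands in for any classifier.
    """
    started = time.perf_counter()
    correct = 0
    for train, test in k_fold_indices(len(X), k):
        preds = fit_predict([X[i] for i in train],
                            [y[i] for i in train],
                            [X[i] for i in test])
        correct += sum(p == y[i] for p, i in zip(preds, test))
    return correct / len(X), time.perf_counter() - started


# Trivial stand-in classifier: the single feature already equals the label.
X = [0, 1, 0, 1, 1, 0, 1, 0, 0, 1]
y = list(X)
accuracy, seconds = cross_validate(lambda tr_X, tr_y, te_X: te_X, X, y, k=3)
print(accuracy, seconds)
```

Each sample is used for testing exactly once, so the accuracy is computed over the whole dataset while every prediction comes from a model that did not see that sample during training.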

Ensemble-Based Classification Using Neural Networks and Machine Learning Models for Windows PE Malware Detection

Electronics ◽

10.3390/electronics10040485 ◽

2021 ◽

Vol 10 (4) ◽

pp. 485 ◽

Cited By ~ 1

Author(s):

Robertas Damaševičius ◽

Algimantas Venčkauskas ◽

Jevgenijus Toldinas ◽

Šarūnas Grigaliūnas

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Information Systems ◽

Malware Detection ◽

Ensemble Classification ◽

Stochastic Gradient Descent ◽

Support Vector ◽

Neural Net ◽

Linear Discriminant ◽

Stage Classification

The security of information is among the greatest challenges facing organizations and institutions. Cybercrime has risen in frequency and magnitude in recent years, with new ways to steal, change and destroy information or disable information systems appearing every day. Among the types of penetration into the information systems where confidential information is processed is malware. An attacker injects malware into a computer system, after which he has full or partial access to critical information in the information system. This paper proposes an ensemble classification-based methodology for malware detection. The first-stage classification is performed by a stacked ensemble of dense (fully connected) and convolutional neural networks (CNN), while the final stage classification is performed by a meta-learner. For a meta-learner, we explore and compare 14 classifiers. For a baseline comparison, 13 machine learning methods are used: K-Nearest Neighbors, Linear Support Vector Machine (SVM), Radial basis function (RBF) SVM, Random Forest, AdaBoost, Decision Tree, ExtraTrees, Linear Discriminant Analysis, Logistic, Neural Net, Passive Classifier, Ridge Classifier and Stochastic Gradient Descent classifier. We present the results of experiments performed on the Classification of Malware with PE headers (ClaMP) dataset. The best performance is achieved by an ensemble of five dense and CNN neural networks, and the ExtraTrees classifier as a meta-learner.

THE EFFECT OF DATASETS ON BREAST CANCER DETECTION MODELS

FUDMA Journal of Sciences ◽

10.33003/fjs-2020-0404-487 ◽

2021 ◽

Vol 4 (4) ◽

pp. 309-315

Author(s):

Kumawuese Jennifer Kurugh ◽

Muhammad Aminu Ahmad ◽

Awwal Ahmad Babajo

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Learning Algorithms ◽

Cancer Classification ◽

Machine Learning Algorithms ◽

Support Vector ◽

Breast Cancer Dataset ◽

Machine Learning Classifiers ◽

Breast Cancer Classification ◽

Learning Classifiers

Datasets are a major requirement in the development of breast cancer classification/detection models using machine learning algorithms. These models can provide an effective, accurate and less expensive diagnosis method and reduce loss of life. However, using the same machine learning algorithms on different datasets yields different results. This research developed several machine learning models for breast cancer classification/detection using Random Forest, Support Vector Machine, K-Nearest Neighbors, Gaussian Naïve Bayes, Perceptron and Logistic Regression. Three widely used test data sets were employed: Wisconsin Breast Cancer (WBC) Original, Wisconsin Diagnostic Breast Cancer (WDBC) and Wisconsin Prognostic Breast Cancer (WPBC). The results show that datasets affect the performance of machine learning classifiers. Also, the machine learning classifiers perform differently on a given breast cancer dataset.
