Comparative Study on Heart Disease Prediction Using Feature Selection Techniques on Classification Algorithms

2021 ◽  
Vol 2021 ◽  
pp. 1-17
Author(s):  
Kaushalya Dissanayake ◽  
Md Gapar Md Johar

Heart disease is recognized as one of the leading causes of death worldwide. Biomedical instruments and various hospital systems hold massive quantities of clinical data, so understanding the data related to heart disease is very important for improving prediction accuracy. This article presents an experimental evaluation of the performance of models built with classification algorithms on relevant features selected by various feature selection approaches. For the exploratory analysis, ten feature selection techniques (ANOVA, Chi-square, mutual information, ReliefF, forward feature selection, backward feature selection, exhaustive feature selection, recursive feature elimination, Lasso regression, and Ridge regression) and six classification approaches (decision tree, random forest, support vector machine, K-nearest neighbor, logistic regression, and Gaussian naive Bayes) were applied to the Cleveland heart disease dataset. The feature subset selected by the backward feature selection technique achieved the highest classification accuracy of 88.52%, with a precision of 91.30%, sensitivity of 80.76%, and f-measure of 85.71%, using the decision tree classifier.
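The four metrics reported above all derive from a binary confusion matrix. The following is a minimal sketch of those relationships. The counts used (21 true positives, 2 false positives, 5 false negatives, 33 true negatives) are not given in the abstract; they are an assumed 61-sample test split, chosen only because they reproduce the reported percentages up to rounding.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, sensitivity (recall), and F-measure."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)        # share of predicted positives that are correct
    sensitivity = tp / (tp + fn)      # share of actual positives that are found
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "f_measure": f_measure,
    }


# Hypothetical counts (an assumption, not taken from the paper).
metrics = classification_metrics(tp=21, fp=2, fn=5, tn=33)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

Note that precision and sensitivity can diverge noticeably, as they do in the reported results, which is why the f-measure (their harmonic mean) is reported alongside accuracy.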

2018 ◽  
Vol 7 (3.6) ◽  
pp. 154
Author(s):  
S K. Sajan ◽  
M Germanus Alex

Breast cancer is a major threat that humans face irrespective of geographical limits. Awareness of breast cancer has increased during the last decade, and many preventive measures have been put into practice to detect breast cancer before symptoms are felt. Mammography is a screening methodology currently in practice. In this paper, the mammogram image is analyzed using an automated system designed to classify the mammogram image as normal or malignant. The process involves image enhancement and image segmentation at the preprocessing level. A histogram equalization technique is used to transform low-contrast regions of the mammogram into regions with higher contrast, and the Fuzzy C Means (FCM) algorithm is used to segment the mammogram image into regions suitable for further analysis. After enhancement and segmentation at the preprocessing level, classification is done using three classification algorithms: a decision tree classifier, a Neural Network classifier, and a Support Vector Machine (SVM). The performance of the classification algorithms is evaluated on speed, flexibility, robustness, scalability, interpretability, and time complexity, as well as on accuracy, sensitivity, and specificity. The results obtained in classification are compared with other classification algorithms. The neural network classifier is found to produce better results than the other classifiers. The average diagnostic accuracy of the Neural Network classifier is around 91%. The decision tree approach is also found to be more flexible and easier to use than the other approaches.
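The contrast-enhancement step can be illustrated with a minimal sketch of standard global histogram equalization. The abstract does not say which variant the authors used, so this is an assumption, and the function name `equalize_histogram` and the tiny 2x2 image are hypothetical. The image is represented as a list of rows of integer intensities.

```python
def equalize_histogram(image, levels=256):
    """Globally histogram-equalize a grayscale image (list of rows of ints)."""
    pixels = [p for row in image for p in row]
    total = len(pixels)

    # Count how often each intensity level occurs.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    # Cumulative distribution: number of pixels at or below each level.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)

    cdf_min = min(c for c in cdf if c > 0)
    if cdf_min == total:
        # Constant image: there is no contrast to stretch.
        return [list(row) for row in image]

    # Lookup table that spreads the occupied range over 0..levels-1.
    lut = [
        max(0, round((c - cdf_min) * (levels - 1) / (total - cdf_min)))
        for c in cdf
    ]
    return [[lut[p] for p in row] for row in image]


# Hypothetical low-contrast patch: intensities crowded into 100..103.
low_contrast = [[100, 101], [102, 103]]
equalized = equalize_histogram(low_contrast)
print(equalized)
```

After equalization the four nearly identical intensities are spread across the full 0 to 255 range, which is the effect the preprocessing stage relies on before segmentation.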


2020 ◽  
Vol 10 (22) ◽  
pp. 8137
Author(s):  
Sushruta Mishra ◽  
Pradeep Kumar Mallick ◽  
Hrudaya Kumar Tripathy ◽  
Akash Kumar Bhoi ◽  
Alfonso González-Briones

There is a consistent rise in chronic diseases worldwide. These diseases decrease immunity and the quality of daily life, and their treatment is a challenging task for medical professionals. Dimensionality reduction techniques make it possible to handle big data samples, providing decision support in relation to chronic diseases. These datasets contain a series of symptoms that are used in disease prediction. Redundant and irrelevant symptoms in the datasets should be identified and removed using feature selection techniques to improve classification accuracy. Therefore, the main contribution of this paper is a comparative analysis of the impact of wrapper and filter selection methods on classification performance. The filter methods considered are the Correlation Feature Selection (CFS) method, the Information Gain (IG) method, and the Chi-Square (CS) method. The wrapper methods considered are the Best First Search (BFS) method, the Linear Forward Selection (LFS) method, and the Greedy Step Wise Search (GSS) method. A Decision Tree algorithm, implemented through the WEKA tool, was used as the classifier for this analysis. An attribute significance analysis was performed on the diabetes, breast cancer, and heart disease datasets used in the study. The CFS method outperformed the other filter methods in accuracy rate and execution time. The accuracy rate using the CFS method on the datasets for heart disease, diabetes, and breast cancer was 93.8%, 89.5%, and 96.8% respectively, with latency delays of 1.08 s, 1.02 s, and 1.01 s for the respective datasets. Among wrapper methods, BFS performed impressively in comparison to the other methods. Maximum accuracies of 94.7%, 95.8%, and 96.8% were achieved on the datasets for heart disease, diabetes, and breast cancer respectively, with latency delays of 1.42 s, 1.44 s, and 132 s for the respective datasets. On the basis of these results, a new hybrid Attribute Evaluator method is proposed which integrates enhanced K-Means clustering with the CFS filter method and the BFS wrapper method. The hybrid method was evaluated with an improved decision tree classifier that combines clustering with classification. It was validated on 14 different chronic disease datasets and its performance was recorded. Consistently strong classification performance was observed. The mean values for accuracy, specificity, sensitivity, and f-score were 96.7%, 96.5%, 95.6%, and 96.2% respectively.
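To make the idea of a filter method concrete, the sketch below scores categorical features by information gain, one of the three filter criteria named above. It is a plain-Python illustration under stated assumptions, not the WEKA implementation used in the study; the function names and the four-sample toy dataset are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy obtained by splitting on a categorical feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the labels within each feature value.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Hypothetical toy data: one feature tracks the label, the other does not.
labels = [1, 1, 0, 0]
features = {
    "informative": ["a", "a", "b", "b"],
    "noise": ["a", "b", "a", "b"],
}
ranking = sorted(
    features, key=lambda name: information_gain(features[name], labels), reverse=True
)
print(ranking)
```

A filter method of this kind ranks features independently of any classifier, which is why filter methods tend to run faster than wrapper methods that retrain a model for each candidate subset.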


In this era of rapid technological growth, people suffer from various diseases. Heart disease is one of the most common, affecting the population irrespective of age. Although technology is flourishing, the prediction and identification of heart disease remain challenging. Because patient symptom data are scarce, predicting heart disease is a difficult task. With this in view, we use the Heart Disease Prediction dataset extracted from the UCI Machine Learning Repository to analyze and compare various parameters of classification algorithms. The parameter analysis of the classification algorithms for the heart disease classes is done in five steps. First, the dataset is analyzed using the correlation matrix, feature importance analysis, the target distribution of the dataset, and disease probability based on the density distribution of age and sex. Second, the dataset is fitted to a K-Nearest Neighbor classifier to analyze performance for various numbers of neighbors with and without PCA. Third, the dataset is fitted to a Support Vector classifier to analyze performance for various kernels with and without PCA. Fourth, the dataset is fitted to a Decision Tree classifier to analyze performance for various numbers of features with and without PCA. Fifth, the dataset is fitted to a Random Forest classifier to analyze performance for various numbers of estimators with and without PCA. The implementation is done in Python on the Spyder platform with Anaconda Navigator. Experimental results show that, for the KNN classifier, 12 neighbours is found to be effective, with a score of 0.52 before applying PCA and 0.53 after applying PCA. For the Support Vector classifier, the rbf kernel is found to be effective, with a score of 0.519 both with and without PCA. For the Decision Tree classifier, the score is 0.47 for 7 features before applying PCA and 0.49 for 4 features after applying PCA. For the Random Forest classifier, the score is 0.53 for 500 estimators before applying PCA and 0.52 for 500 estimators after applying PCA.
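The second step, sweeping the number of neighbours for a K-Nearest Neighbor classifier, can be sketched as follows. This is a minimal plain-Python illustration, not the authors' implementation: the helper names, the two-cluster toy data, and the candidate values of k are all hypothetical, and the PCA step is omitted.

```python
from collections import Counter
from math import dist


def knn_predict(train_X, train_y, query, k):
    """Predict the majority label among the k training points nearest to query."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def knn_score(train_X, train_y, test_X, test_y, k):
    """Fraction of test points classified correctly with k neighbours."""
    correct = sum(
        knn_predict(train_X, train_y, x, k) == y for x, y in zip(test_X, test_y)
    )
    return correct / len(test_y)


# Hypothetical two-cluster toy data (not the UCI heart disease dataset).
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (3.0, 3.0), (3.2, 2.9), (2.8, 3.1)]
train_y = [0, 0, 0, 1, 1, 1]
test_X = [(1.1, 1.0), (3.1, 3.0)]
test_y = [0, 1]

# Sweep odd values of k so that a two-class vote cannot tie.
scores = {k: knn_score(train_X, train_y, test_X, test_y, k) for k in (1, 3, 5)}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

On real data the score typically varies with k, and the same sweep would be repeated on the PCA-transformed features to compare the two settings.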


2019 ◽  
Vol 2019 ◽  
pp. 1-14
Author(s):  
Wei Li ◽  
Kun Yu ◽  
Chaolu Feng ◽  
Dazhe Zhao

Background and Objective. Breast cancer is a major cause of mortality among women if not treated in its early stages. Recognizing molecular markers directly from DCE-MRI to distinguish the four molecular subtypes without invasive biopsy helps guide treatment plans for breast cancer, providing a fast route to early treatment decisions and the best opportunity for patients. Methods. This study presents an approach to recognizing molecular subtypes from breast cancer image phenotypes using radiomics. An improved region growth algorithm with a dynamic threshold and no user interaction is proposed for cancer lesion segmentation, which gives the precise border of the lesion rather than an area that includes background. The lesions are extracted automatically based on radiologists' annotations, which guarantees that the lesion is segmented correctly. Various features are extracted from the lesion data, including texture, morphology, dynamic kinetics, and statistics features, on a large patient cohort, and are used to validate the relationship between image phenotypes and the molecular subtypes. A new algorithm of multimodel-based recursive feature elimination is applied to the radiomics data generated by the feature extraction process. This method obtains a feature subset with stable performance across different classification models, and the gradient boosting decision tree model achieves the best results in both classification performance and imbalance performance on molecular subtypes. Result. In the experimental results, the multimodel-based recursive feature elimination algorithm finds 69 optimal features out of 143 original features, and the gradient boosting decision tree classifier obtains good performance with accuracy 0.87, precision 0.88, recall 0.87, and F1-score 0.87. The dataset of 637 patients in this paper has a serious imbalance problem across molecular subtypes, and the robust features generated by the multimodel-based recursive feature elimination algorithm allow the gradient boosting decision tree classifier to perform well. The recognition precision for the four molecular subtypes of luminal A, luminal B, HER-2, and basal-like is 0.91, 0.89, 0.83, and 0.87, respectively. Conclusions. The improved lesion segmentation method gives a more precise lesion edge, which not only saves time by extracting the lesion region of interest automatically without threshold setting for each case, but also prevents segmentation errors from manual work and bias between different radiologists. The feature selection algorithm of multimodel-based recursive feature elimination is able to find robust and optimal features that distinguish the four molecular subtypes from image phenotypes. The gradient boosting decision tree classifier plays a greater role in recognition than the other models used in this paper.


2020 ◽  
Vol 2 (1) ◽  
pp. 62
Author(s):  
Luis F. Villamil-Cubillos ◽  
Jersson X. Leon-Medina ◽  
Maribel Anaya ◽  
Diego A. Tibaduiza

An electronic tongue is a device composed of a sensor array that takes advantage of the cross-sensitivity property of several sensors to perform classification and quantification in liquid substances. In practice, electronic tongues generate a large amount of information that needs to be correctly analyzed to define which interactions and features are more relevant for distinguishing one substance from another. This work focuses on implementing and validating feature selection methodologies in the liquid classification process of a multifrequency large amplitude pulse voltammetric (MLAPV) electronic tongue. A multi-layer perceptron neural network (MLP NN) and a support vector machine (SVM) were used as supervised machine learning classifiers. Different feature selection techniques were used, such as Variance filter, ANOVA F-value, Recursive Feature Elimination, and model-based selection. Both 5-fold cross validation and GridSearchCV were used to evaluate the performance of the feature selection methodology by testing various configurations and determining the best one. The methodology was validated on an imbalanced MLAPV electronic tongue dataset of 13 different liquid substances, reaching a classification accuracy of 93.85%.
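As an illustration of one of the filter criteria named above, the sketch below computes the one-way ANOVA F-value of a single numeric feature with respect to class labels. It is a plain-Python sketch under stated assumptions rather than the authors' pipeline; the function name and the six-sample toy data are hypothetical.

```python
def anova_f_value(values, labels):
    """One-way ANOVA F statistic of a numeric feature grouped by class label."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)

    n_total = len(values)
    n_groups = len(groups)
    grand_mean = sum(values) / n_total
    means = {label: sum(g) / len(g) for label, g in groups.items()}

    # Variation of the class means around the grand mean.
    ss_between = sum(
        len(g) * (means[label] - grand_mean) ** 2 for label, g in groups.items()
    )
    # Variation of samples around their own class mean.
    ss_within = sum(
        (v - means[label]) ** 2 for label, g in groups.items() for v in g
    )
    # Assumes ss_within > 0; a feature that is constant within every class
    # would need special handling.
    return (ss_between / (n_groups - 1)) / (ss_within / (n_total - n_groups))


# Hypothetical toy data: two classes, three samples each.
labels = [0, 0, 0, 1, 1, 1]
separating = [1, 2, 3, 7, 8, 9]   # class means differ strongly
overlapping = [1, 8, 3, 7, 2, 9]  # class means are close, spread is large

f_separating = anova_f_value(separating, labels)
f_overlapping = anova_f_value(overlapping, labels)
print(f_separating, f_overlapping)
```

A larger F-value indicates that the feature separates the classes better relative to its within-class spread, so features can be ranked by this score and the top ones kept.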


This work derives methodologies to detect heart problems at an early stage and to inform patients so that they can improve their health. To address this problem, we use Machine Learning techniques to predict the incidence at an early stage. We use certain parameters such as age, sex, height, weight, case history, smoking, and alcohol consumption, and tests such as pressure, cholesterol, diabetes, ECG, and ECHO, for prediction. Many machine learning algorithms can be used to solve this problem, including K-Nearest Neighbour, Support Vector classifier, decision tree classifier, logistic regression, and Random Forest classifier. Using these parameters and algorithms, we predict whether or not the patient has heart disease and recommend how the patient can improve his/her health.


In this era of the Internet, the issue of information security is at its peak. One of the main threats in the cyber world is the phishing attack, an email or website fraud method that targets a genuine webpage or email and compromises it without the consent of the end user. Various techniques help to classify whether a website or an email is legitimate or fake. The major contributors to the detection of phishing fraud include classification algorithms, feature selection techniques or dataset preparation methods, and feature extraction, which plays an important role in detecting as well as preventing these attacks. This survey paper studies the effect of all these contributors and the approaches applied in recent papers. The classification algorithms implemented include Decision Tree, Random Forest, Support Vector Machines, Logistic Regression, Lazy K Star, Naive Bayes, and J48.


2019 ◽  
Vol 11 (10-SPECIAL ISSUE) ◽  
pp. 1232-1237
Author(s):  
B. Bavani ◽  
S. Nirmala Sugirtha Rajini ◽  
M.S. Josephine ◽  
V. Prasannakumari
