Machine Learning for the Preliminary Diagnosis of Dementia

Scientific Programming ◽

10.1155/2020/5629090 ◽

2020 ◽

Vol 2020 ◽

pp. 1-10 ◽

Cited By ~ 1

Author(s):

Fubao Zhu ◽

Xiaonan Li ◽

Haipeng Tang ◽

Zhuo He ◽

Chaoyang Zhang ◽

...

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Information Gain ◽

Diagnostic Model ◽

Preliminary Diagnosis ◽

Early Stages ◽

Item Questionnaire ◽

Diagnostic Models ◽

Diagnosis Of Dementia ◽

Bayes Algorithm

Objective. Reliable diagnosis remains challenging in the early stages of dementia. We aimed to develop and validate a new machine learning method to support the preliminary diagnosis of normal cognition, mild cognitive impairment (MCI), very mild dementia (VMD), and dementia using an informant-based questionnaire. Methods. We enrolled 5,272 individuals who filled out a 37-item questionnaire. To select the most important features, we tested three feature selection techniques. The top features were then combined with six classification algorithms to develop the diagnostic models. Results. Information Gain was the most effective of the three feature selection methods. Among the six classification models, the Naive Bayes algorithm performed best (accuracy = 0.81, precision = 0.82, recall = 0.81, and F-measure = 0.81). Conclusion. The diagnostic model proposed in this paper gives clinicians a tool for diagnosing the early stages of dementia.
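
As a rough illustration of the kind of classifier this abstract names, the following is a minimal sketch of a categorical Naive Bayes model with Laplace smoothing. It is not the authors' implementation: the function names, the smoothing constant `alpha`, the three yes/no items, and the toy answers are all assumptions made for the example, and the real study used 37 questionnaire items and four diagnostic classes.

```python
import math
from collections import Counter

def train_naive_bayes(rows, labels, alpha=1.0):
    """Fit a categorical Naive Bayes model with Laplace smoothing (alpha)."""
    class_counts = Counter(labels)
    n_features = len(rows[0])
    # value_counts[c][j][v] = number of class-c rows whose feature j equals v
    value_counts = {c: [Counter() for _ in range(n_features)] for c in class_counts}
    # Distinct values seen per feature, needed for the smoothing denominator
    feature_values = [set() for _ in range(n_features)]
    for row, c in zip(rows, labels):
        for j, v in enumerate(row):
            value_counts[c][j][v] += 1
            feature_values[j].add(v)
    return class_counts, value_counts, feature_values, alpha

def predict(model, row):
    """Return the class with the highest log posterior for one answer vector."""
    class_counts, value_counts, feature_values, alpha = model
    total = sum(class_counts.values())
    best_class, best_score = None, -math.inf
    for c, n_c in class_counts.items():
        score = math.log(n_c / total)  # log prior
        for j, v in enumerate(row):
            num = value_counts[c][j][v] + alpha
            den = n_c + alpha * len(feature_values[j])
            score += math.log(num / den)  # smoothed log likelihood
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Invented toy data: three yes/no questionnaire items (1 = problem reported)
rows = [
    (0, 0, 0), (0, 0, 0), (0, 0, 1),   # normal
    (1, 0, 0), (1, 0, 1), (1, 0, 0),   # MCI
    (1, 1, 1), (1, 1, 0), (1, 1, 1),   # dementia
]
labels = ["normal"] * 3 + ["MCI"] * 3 + ["dementia"] * 3

model = train_naive_bayes(rows, labels)
pred_normal = predict(model, (0, 0, 0))
pred_mci = predict(model, (1, 0, 0))
pred_dementia = predict(model, (1, 1, 1))
```

Working in log space avoids numerical underflow when many item likelihoods are multiplied, which matters more with 37 items than with the three used here.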

Early Prediction of Malignant Mesothelioma: An Approach towards Non-invasive Method

Current Bioinformatics ◽

10.2174/1574893616666210616121023 ◽

2021 ◽

Vol 16 ◽

Author(s):

Shakir Shabbir ◽

M. Shahzad Asif ◽

Talha Mahboob Alam ◽

Zeeshan Ramzan

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Malignant Mesothelioma ◽

Information Gain ◽

Imbalanced Data ◽

Machine Learning Algorithms ◽

Diagnostic Methods ◽

Machine Learning Techniques ◽

Diagnostic Model ◽

Diagnostic Measures

Background: Malignant Mesothelioma (MM) is a rare but aggressive tumor that arises in the lungs. Costly imaging and laboratory resources, i.e., X-ray imaging, magnetic resonance imaging (MRI), positron emission tomography (PET) scans, biopsies, and blood tests, are commonly used to diagnose MM. These diagnostic measures are expensive and unavailable in remote areas, and some of them, including biopsy and cytology of pleural fluid, are also very painful for the patient. Objective: In this study, we propose a diagnostic model for early identification of MM via machine learning techniques. We explored the health records of 324 Turkish patients who showed symptoms related to MM. The patient data included socio-economic, geographical, and clinical features. Methods: Different feature selection methods were employed to select significant features. To overcome the data imbalance problem, various data-level resampling techniques were applied. The gradient boosted decision tree (GBDT) method was used to develop the diagnostic model, and its performance was compared with that of traditional machine learning algorithms. Results and Conclusion: Our model outperformed the other models on both balanced and imbalanced data. Undersampling techniques outperformed the imbalanced data without resampling in terms of accuracy and receiver operating characteristic (ROC) value, and oversampling techniques in turn outperformed both undersampling and the imbalanced data on accuracy and ROC. All classifiers employed in this study achieved good results with the OneR, information gain, and Relief-F feature selection methods, whereas the results with the other two methods (gain ratio and correlation) were less promising. Finally, the combination of Synthetic Minority Oversampling Technique (SMOTE) and OneR with GBDT gave the most favorable results based on accuracy, F-measure, and ROC. The diagnostic model has also been deployed to assist doctors, patients, medical practitioners, and other healthcare professionals in the early diagnosis and better treatment of MM.
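
To make the oversampling step more concrete, here is a minimal sketch of the core idea behind SMOTE: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its nearest minority neighbors. This is a simplified illustration, not the pipeline used in the paper. The function name `smote`, the parameters `k` and `seed`, and the toy points are assumptions, and the feature selection (OneR) and GBDT stages are not shown.

```python
import random

def smote(minority, n_synthetic, k=3, seed=0):
    """Create n_synthetic points by interpolating between minority-class neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # All other minority points, sorted by squared Euclidean distance to base
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        # Choose one of the k nearest neighbors at random
        neighbor = rng.choice(others[:k])
        # Pick a random point on the segment between base and neighbor
        gap = rng.random()
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbor)))
    return synthetic

# Invented toy minority class with two numeric features
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote(minority, n_synthetic=6)
```

Because every synthetic point is a convex combination of two real minority samples, it stays inside the region the minority class already occupies, which is what distinguishes this approach from simply duplicating rows.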

Customer Churn Prediction in Telecom Sector with Machine Learning and Information Gain Filter Feature Selection Algorithms

10.1109/icdabi53623.2021.9655792 ◽

2021 ◽

Author(s):

Yakub K. Saheed ◽

Moshood A. Hambali

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Information Gain ◽

Churn Prediction ◽

Customer Churn ◽

Customer Churn Prediction ◽

Telecom Sector ◽

Selection Algorithms

To use or not to use: Feature selection for sentiment analysis of highly imbalanced data

Natural Language Engineering ◽

10.1017/s1351324917000298 ◽

2017 ◽

Vol 24 (1) ◽

pp. 3-37 ◽

Cited By ~ 5

Author(s):

SANDRA KÜBLER ◽

CAN LIU ◽

ZEESHAN ALI SAYYED

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Sentiment Analysis ◽

Information Gain ◽

Binary Classification ◽

Small Subset ◽

Large Set ◽

Learning Approaches ◽

Selection Methods ◽

Data Set

We investigate feature selection methods for machine learning approaches in sentiment analysis. More specifically, we use data from the cooking platform Epicurious and attempt to predict ratings for recipes based on user reviews. In machine learning approaches to such tasks, it is common to use word or part-of-speech n-grams. This results in a large set of features, out of which only a small subset may be good indicators for the sentiment. One of the questions we investigate concerns the extension of feature selection methods from a binary classification setting to a multi-class problem. We show that an inherently multi-class approach, multi-class information gain, outperforms ensembles of binary methods. We also investigate how to mitigate the effects of extreme skewing in our data set by making our features more robust and by using review and recipe sampling. We show that over-sampling is the best method for boosting performance on the minority classes, but it also results in a severe drop in overall accuracy of at least 6 percentage points.
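
The multi-class information gain mentioned above can be sketched with the standard definition, IG(Y; X) = H(Y) - sum over v of P(X = v) H(Y | X = v), computed over all rating classes at once instead of one binary problem per class. The code below is an illustration under assumptions, not the authors' code: the n-gram names, the three rating classes, and the presence vectors are invented toy data.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature, labels):
    """IG(Y; X) = H(Y) - sum_v P(X=v) * H(Y | X=v), over all classes at once."""
    n = len(labels)
    groups = {}
    for v, y in zip(feature, labels):
        groups.setdefault(v, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

# Invented toy data: six reviews, three rating classes
labels = ["low", "low", "mid", "mid", "high", "high"]
# 1 = the n-gram occurs in the review, 0 = it does not
features = {
    "delicious": [0, 0, 0, 0, 1, 1],
    "too salty": [1, 0, 0, 0, 0, 0],
    "the":       [1, 1, 1, 1, 1, 1],
}

ig = {name: information_gain(values, labels) for name, values in features.items()}
# Rank n-grams from most to least informative about the rating
ranked = sorted(ig, key=ig.get, reverse=True)
```

In this toy data, an n-gram that occurs in every review ("the") has zero gain, while one that isolates a single class ("delicious") ranks first; a selector would keep only the top-ranked subset.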

Information gain and divergence-based feature selection for machine learning-based text categorization

Information Processing & Management ◽

10.1016/j.ipm.2004.08.006 ◽

2006 ◽

Vol 42 (1) ◽

pp. 155-165 ◽

Cited By ~ 167

Author(s):

Changki Lee ◽

Gary Geunbae Lee

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Text Categorization ◽

Information Gain ◽

Selection For

Chinese Sentiment Classifier Machine Learning Based on Optimized Information Gain Feature Selection

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.988.511 ◽

2014 ◽

Vol 988 ◽

pp. 511-516 ◽

Cited By ~ 3

Author(s):

Jin Tao Shi ◽

Hui Liang Liu ◽

Yuan Xu ◽

Jun Feng Yan ◽

Jian Feng Xu

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Word Frequency ◽

Chinese Text ◽

Information Gain ◽

Classification Performance ◽

Selection Methods ◽

Text Feature ◽

Important Solution ◽

Feature Word

Machine learning is an important approach in research on Chinese text sentiment categorization, and text feature selection is critical to classification performance. Classical feature selection methods work well across the global set of categories but miss many representative feature words of each individual category. This paper presents an improved information gain method that integrates word frequency and the degree of feature word sentiment into the traditional information gain method. Experiments show that a classifier improved by this method achieves better classification performance.

Opinion Mining on Culinary Food Customer Satisfaction Using Naïve Bayes Based-on Hybrid Feature Selection

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v15.i1.pp468-475 ◽

2019 ◽

Vol 15 (1) ◽

pp. 468-475 ◽

Cited By ~ 3

Author(s):

Oman Somantri ◽

Dyah Apriliani

Keyword(s):

Feature Selection ◽

Opinion Mining ◽

Naive Bayes ◽

Information Gain ◽

Naïve Bayes ◽

Classification Model ◽

Consumer Ratings ◽

Bayes Algorithm ◽

Restaurant Owners

Assessing consumer sentiment about culinary food from social media yields useful information for anyone who wants it, especially migrants and tourists. On the other hand, this information is also very valuable to food stall and restaurant owners for improving food quality. To obtain this information, a sentiment analysis classification model using the naïve Bayes (NB) algorithm was applied. The problem is that the classification accuracy for consumer ratings of culinary food is still not optimal because the value weights in the data preprocessing step are not optimal. This paper proposes a hybrid feature selection model that addresses the suboptimal selection of feature attributes by combining information gain (IG) with a genetic algorithm (GA). In the experiments, the proposed model achieved the best accuracy among the compared algorithms, 93%.

An Empirical Evaluation of Feature Selection Methods

Improving Knowledge Discovery through the Integration of Data Mining Techniques - Advances in Data Mining and Database Management ◽

10.4018/978-1-4666-8513-0.ch012 ◽

2015 ◽

pp. 233-258 ◽

Cited By ~ 1

Author(s):

Mohsin Iqbal ◽

Saif Ur Rehman ◽

Saira Gillani ◽

Sohail Asghar

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Classification Accuracy ◽

Information Gain ◽

Learning Algorithm ◽

Empirical Evaluation ◽

Machine Learning Algorithms ◽

Selection Methods ◽

The One ◽

Processing And Storage

The key objective of this chapter is to study classification accuracy when feature selection is combined with machine learning algorithms. Feature selection reduces the dimensionality of the data and improves the accuracy of the learning algorithm. We test how integrated feature selection affects the accuracy of three classifiers by applying several feature selection methods. Among the filter methods, Information Gain (IG), Gain Ratio (GR), and Relief-F, and among the wrapper methods, Bagging and Naive Bayes (NB), gave the classifiers the largest above-average increases in classification accuracy while reducing the number of unnecessary attributes. These conclusions can advise machine learning users on which classifier and feature selection methods to use to optimize classification accuracy. This can be especially important in risk-sensitive applications of machine learning, where one of the aims is to reduce the costs of collecting, processing, and storing unnecessary data.

Android Adware Detection Using Machine Learning

International Journal of Cyber Research and Education ◽

10.4018/ijcre.2021070101 ◽

2021 ◽

Vol 3 (2) ◽

pp. 1-19

Author(s):

Sikha Bagui ◽

Daniel Benson

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Network Traffic ◽

Information Gain ◽

Malware Detection ◽

Classification Rate ◽

The Individual ◽

Evasive Malware

Adware, an advertising-supported software, becomes a type of malware when it automatically delivers unwanted advertisements to an infected device, steals user information, and opens other vulnerabilities that allow other malware and adware to be installed. With the rise of more complex and evasive malware, specifically adware, better methods of detecting adware are required. Though a lot of work has been done on malware detection in general, very little focus has been put on the adware family. The novelty of this paper lies in analyzing the individual adware families, which no work has done to date. In this paper, using the CICAndMal2017 dataset, feature selection is performed using information gain, and classification is performed using machine learning. The best attributes for classification of each of the individual adware families using network traffic samples are presented. The results present an average classification rate that is an improvement over previous works for classification of individual adware families.

Methodology for synthesis of neural network diagnostic models of complex technical objects

Automation on Transport ◽

10.20295/2412-9186-2020-6-4-466-483 ◽

2020 ◽

Vol 6 (4) ◽

pp. 466-483

Author(s):

V. V. Grachev ◽

A. V. Grishchenko ◽

V. A. Kruchek ◽

F. Yu. Bazilevsky ◽

...

Keyword(s):

Neural Network ◽

Machine Learning ◽

Test Sample ◽

Learning Problems ◽

Current Status ◽

Design Stage ◽

Diagnostic Tools ◽

Diagnostic Model ◽

Technical Diagnostics ◽

Diagnostic Models

Despite the vast experience of using neural networks to solve various machine learning problems, the numerous attempts to use them in technical diagnostics have, with rare exceptions, not yet led to complete solutions. The reason is the specific nature of technical diagnostics, which distinguishes such tasks from traditional machine learning problems. Having analyzed these specific features, the authors propose an approach to diagnosing complex technical objects that is intended for built-in diagnostics systems and is based on neural network reference diagnostic models of functionally isolated nodes and assemblies. The article describes the methodology for synthesizing such models, training them on data obtained by monitoring the object under test with built-in diagnostic tools, determining the permissible response errors, and adapting them to the current status of the object. The fuzzification of the diagnostic model results using the test sample, proposed in the article, makes it possible to standardize the approach to diagnosing complex technical objects designed for various purposes. The use of D. Trigg’s tracking control signal, proposed by the authors to monitor regression residuals during learning, increases the training quality and generalization ability of the models. The value of this signal, determined by running the model on a test sample, is an additional informative diagnostic parameter that increases the accuracy of classifying the status of the object under test. Applied at the design stage of a complex technical object, the proposed methodology makes it possible to optimize the array of monitored parameters and to increase the efficiency of the diagnostic information recorded by the built-in diagnostic and monitoring tools.
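
For readers unfamiliar with Trigg's tracking signal, the sketch below uses its standard textbook form: the exponentially smoothed residual divided by the exponentially smoothed absolute residual. The abstract does not say how the authors configure it, so the smoothing constant `alpha`, the function name, and the two toy residual series are assumptions made for illustration.

```python
def trigg_tracking_signal(residuals, alpha=0.2):
    """Return Trigg's tracking signal after each residual.

    The signal is the exponentially smoothed error divided by the
    exponentially smoothed absolute error, so it always lies in [-1, 1].
    """
    smoothed_error = 0.0
    smoothed_abs_error = 0.0
    signals = []
    for e in residuals:
        smoothed_error = alpha * e + (1 - alpha) * smoothed_error
        smoothed_abs_error = alpha * abs(e) + (1 - alpha) * smoothed_abs_error
        # Guard against division by zero while every residual so far is exactly 0
        if smoothed_abs_error:
            signals.append(smoothed_error / smoothed_abs_error)
        else:
            signals.append(0.0)
    return signals

# Invented toy residuals: a model that always under-predicts (systematic bias)
biased = trigg_tracking_signal([0.5, 0.4, 0.6, 0.5, 0.7])
# Invented toy residuals: errors that alternate in sign (no systematic bias)
unbiased = trigg_tracking_signal([1.0, -1.0] * 10)
```

A signal that stays near +1 or -1 indicates residuals that keep the same sign, that is, systematic bias, while a signal near 0 indicates errors that cancel out. This is why the value can serve as a compact indicator of how well a reference model tracks the monitored object.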

A Diagnostic Model for Cardiovascular Diseases Using Feature Selection and Data Clustering

Journal of Computational and Theoretical Nanoscience ◽

10.1166/jctn.2020.8952 ◽

2020 ◽

Vol 17 (6) ◽

pp. 2859-2865

Author(s):

Shima Farahbakhsh

Keyword(s):

Data Mining ◽

Feature Selection ◽

Cardiovascular Diseases ◽

Data Clustering ◽

Information Gain ◽

Pearson Correlation ◽

Large Body ◽

Diagnostic Model ◽

Target Feature ◽

Data Mining Techniques

Cardiovascular diseases are among the most common diseases, and the number of people with them is increasing. If the necessary treatment is not provided at the right time, the patient may die. Therefore, accurate diagnosis of cardiac problems during the first examination, along with suitable treatment, can decrease mortality due to cardiovascular diseases. To this end, data mining techniques can be used. Data mining extracts the necessary data from a large body of information. This data is then used for classification and prediction through clustering, classification, and/or identification of hidden patterns. Many studies so far have focused on using data mining techniques to diagnose cardiovascular diseases. The present study aims to provide a diagnostic model for cardiovascular diseases using an approach based on feature selection and data clustering as pre-processing steps. The proposed model involves 4 main phases: (1) Pre-processing the data to eliminate null and outlier values from the data sets; (2) Choosing effective features with three methods (the Pearson correlation coefficient, the Information Gain algorithm, and principal component analysis), which remove features that have no particular relationship with the target feature and behave independently of it; at the end of this phase, 5 of the 13 initial features are removed; (3) Using the KMeans algorithm for data clustering and developing pre-processes before creating the final cluster and developing a model for predicting the type of cardiovascular disease. The results obtained from the proposed solution show that, among the 4 algorithms used (ID3, Naïve Bayes, SVM, and IBK), IBK was the most accurate, with an accuracy of 0.97.
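
The correlation-based part of the feature selection phase can be sketched as follows: compute the Pearson correlation coefficient between each feature and the target, rank features by its absolute value, and drop the weakest ones. This is a minimal sketch under assumptions, not the study's procedure. The helper names, the column names, the toy values, and the choice to drop a fixed number of features are all hypothetical, and the Information Gain and principal component steps are not shown.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear information
    return cov / math.sqrt(var_x * var_y)

def drop_weak_features(columns, target, n_drop):
    """Rank features by |r| with the target and drop the n_drop weakest."""
    ranked = sorted(
        columns,
        key=lambda name: abs(pearson(columns[name], target)),
        reverse=True,
    )
    return ranked[: len(ranked) - n_drop]

# Invented toy data: four candidate features for four patients
columns = {
    "age":         [40, 50, 60, 70],
    "cholesterol": [200, 210, 250, 260],
    "noise":       [1, 3, 1, 3],
    "constant":    [5, 5, 5, 5],
}
target = [0, 0, 1, 1]  # 1 = disease present

kept = drop_weak_features(columns, target, n_drop=2)
```

Pearson correlation only captures linear association with the target, which is one reason a study might combine it with other criteria such as information gain.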
