Machine Learning for the Preliminary Diagnosis of Dementia

2020 ◽  
Vol 2020 ◽  
pp. 1-10 ◽  
Author(s):  
Fubao Zhu ◽  
Xiaonan Li ◽  
Haipeng Tang ◽  
Zhuo He ◽  
Chaoyang Zhang ◽  
...  

Objective. Reliable diagnosis remains challenging in the early stages of dementia. We aimed to develop and validate a new machine learning method to support the preliminary diagnosis of normal cognition, mild cognitive impairment (MCI), very mild dementia (VMD), and dementia using an informant-based questionnaire. Methods. We enrolled 5,272 individuals who filled out a 37-item questionnaire. To select the most important features, three feature selection techniques were tested. The top features were then combined with six classification algorithms to develop the diagnostic models. Results. Information Gain was the most effective of the three feature selection methods. The Naive Bayes algorithm performed best among the six classification models (accuracy = 0.81, precision = 0.82, recall = 0.81, and F-measure = 0.81). Conclusion. The diagnostic model proposed in this paper provides a powerful tool for clinicians to diagnose the early stages of dementia.
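As a rough illustration of the Information Gain ranking mentioned in this abstract, the sketch below scores discrete features by how much they reduce the entropy of the class label. The toy questionnaire items, labels, and function names are hypothetical and are not taken from the paper; the authors' actual tooling is not stated in the abstract.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the expected entropy after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(group) / total * entropy(group) for group in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data: two yes/no questionnaire items and a diagnosis label.
labels = ["normal", "normal", "dementia", "dementia"]
item_a = ["no", "no", "yes", "yes"]  # separates the two classes perfectly
item_b = ["no", "yes", "no", "yes"]  # carries no information about the class

# Rank the items from most to least informative.
ranking = sorted(
    {"item_a": item_a, "item_b": item_b}.items(),
    key=lambda pair: information_gain(pair[1], labels),
    reverse=True,
)
```

In a pipeline like the one described, the top-ranked items would then be passed to a classifier such as Naive Bayes.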

2021 ◽  
Vol 16 ◽  
Author(s):  
Shakir Shabbir ◽  
M. Shahzad Asif ◽  
Talha Mahboob Alam ◽  
Zeeshan Ramzan

Background: Malignant Mesothelioma (MM) is a rare but aggressive tumor that arises in the lungs. Costly imaging and laboratory resources, i.e., X-ray imaging, magnetic resonance imaging (MRI), positron emission tomography (PET) scans, biopsies, and blood tests, are commonly used for the diagnosis of MM. Besides being expensive and unavailable in remote areas, some of these diagnostic methods, including biopsy and cytology of pleural fluid, are also very painful for the patient. Objective: In this study, we proposed a diagnostic model for early identification of MM via machine learning techniques. We explored the health records of 324 Turkish patients who showed symptoms related to MM. The patient data included socio-economic, geographical, and clinical features. Methods: Different feature selection methods were employed to select significant features. To overcome the data imbalance problem, various data-level resampling techniques were used. The gradient boosted decision tree (GBDT) method was used to develop the diagnostic model, and its performance was compared with traditional machine learning algorithms. Results and Conclusion: Our model outperformed other models on both balanced and imbalanced data. Undersampling techniques outperformed imbalanced data without resampling in terms of accuracy and receiver operating characteristic (ROC) value, and oversampling techniques in turn outperformed both undersampling and imbalanced data on accuracy and ROC. All classifiers employed in this study achieved good results with the feature selection methods OneR, information gain, and Relief-F, but the results of the other two methods (gain ratio and correlation) were not entirely promising. Finally, the combination of Synthetic Minority Oversampling Technique (SMOTE) and OneR with GBDT gave the most favorable results based on accuracy, F-measure, and ROC. The diagnosis model has also been deployed to assist doctors, patients, medical practitioners, and other healthcare professionals in the early diagnosis and better treatment of MM.
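The core idea of SMOTE named in this abstract can be sketched as follows: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its nearest minority neighbours. This is a minimal sketch under stated assumptions, not the authors' implementation; the function name, the value of `k`, and the toy points are hypothetical.

```python
import random


def smote(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points by interpolating between minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # All other minority points, sorted by squared Euclidean distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


# Hypothetical minority-class samples with two numeric features each.
minority = [(1.0, 1.0), (2.0, 1.5), (1.5, 2.0)]
new_points = smote(minority, n_new=5)
```

The synthetic points would be added to the training set only, so that evaluation data remains untouched by resampling.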


2017 ◽  
Vol 24 (1) ◽  
pp. 3-37 ◽  
Author(s):  
SANDRA KÜBLER ◽  
CAN LIU ◽  
ZEESHAN ALI SAYYED

Abstract. We investigate feature selection methods for machine learning approaches in sentiment analysis. More specifically, we use data from the cooking platform Epicurious and attempt to predict ratings for recipes based on user reviews. In machine learning approaches to such tasks, it is common to use word or part-of-speech n-grams. This results in a large set of features, of which only a small subset may be good indicators of the sentiment. One of the questions we investigate concerns the extension of feature selection methods from a binary classification setting to a multi-class problem. We show that an inherently multi-class approach, multi-class information gain, outperforms ensembles of binary methods. We also investigate how to mitigate the effects of extreme skewing in our data set by making our features more robust and by using review and recipe sampling. We show that over-sampling is the best method for boosting performance on the minority classes, but it also results in a severe drop in overall accuracy of at least 6 percentage points.
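The over-sampling mentioned in this abstract can be illustrated with plain random over-sampling, in which examples from smaller classes are duplicated until every class matches the largest one. The abstract does not specify the exact sampling scheme, so the function below and its toy review data are assumptions for illustration only.

```python
import random
from collections import Counter, defaultdict


def oversample(examples, labels, seed=0):
    """Duplicate randomly chosen examples of smaller classes until all classes are equal in size."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for example, label in zip(examples, labels):
        by_class[label].append(example)
    target = max(len(items) for items in by_class.values())
    out_examples, out_labels = [], []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for example in items + extra:
            out_examples.append(example)
            out_labels.append(label)
    return out_examples, out_labels


# Hypothetical skewed data: four 4-star reviews and a single 1-star review.
reviews = ["r1", "r2", "r3", "r4", "r5"]
ratings = [4, 4, 4, 4, 1]
balanced_reviews, balanced_ratings = oversample(reviews, ratings)
```

Because the minority examples are repeated rather than new, a classifier trained this way sees the rare classes more often, which is consistent with the trade-off the abstract reports between minority-class performance and overall accuracy.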


2014 ◽  
Vol 988 ◽  
pp. 511-516 ◽  
Author(s):  
Jin Tao Shi ◽  
Hui Liang Liu ◽  
Yuan Xu ◽  
Jun Feng Yan ◽  
Jian Feng Xu

Machine learning is an important solution in research on Chinese text sentiment categorization, and text feature selection is critical to classification performance. However, classical feature selection methods work well across the global set of categories but miss many representative feature words of each category. This paper presents an improved information gain method that integrates word frequency and the degree of feature word sentiment into traditional information gain methods. Experiments show that a classifier improved by this method achieves better classification performance.


Author(s):  
Oman Somantri ◽  
Dyah Apriliani

Assessing consumer sentiment about culinary food from social media provides useful information for anyone who wants it, especially migrants and tourists; it is also valuable to food stall and restaurant owners as input for improving food quality. To obtain this information, a sentiment analysis classification model using the naïve Bayes (NB) algorithm was applied. The problem is that the classification accuracy for consumer ratings of culinary food is still not optimal, because the value weights in the data preprocessing step are not optimal. This paper proposes a hybrid feature selection model that addresses the suboptimal selection of feature attributes by combining information gain (IG) and a genetic algorithm (GA). Experiments comparing the approach with other algorithms showed that it produced the best accuracy, 93%.


Author(s):  
Mohsin Iqbal ◽  
Saif Ur Rehman ◽  
Saira Gillani ◽  
Sohail Asghar

The key objective of the chapter is to study classification accuracy when feature selection is combined with machine learning algorithms. Feature selection reduces the dimensionality of the data and improves the accuracy of the learning algorithm. We test how integrated feature selection affects the accuracy of three classifiers by applying several feature selection methods. Among filter methods, Information Gain (IG), Gain Ratio (GR), and Relief-F, and among wrapper methods, Bagging and Naive Bayes (NB), gave the classifiers the largest increase in average classification accuracy while reducing the number of unnecessary attributes. The conclusions can advise machine learning users on which classifier and feature selection methods to use to optimize classification accuracy. This can be important especially in risk-sensitive applications of machine learning, where one of the aims is to reduce the costs of collecting, processing, and storing unnecessary data.
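For reference, the two entropy-based filter criteria named in this abstract are commonly defined as below. The abstract does not give the chapter's own formulation, so the notation here (class variable $Y$, feature $X$ with values $v$) is an assumption.

```latex
\begin{align}
H(Y) &= -\sum_{c} P(Y = c)\,\log_2 P(Y = c) \\
IG(Y; X) &= H(Y) - \sum_{v} P(X = v)\, H(Y \mid X = v) \\
GR(Y; X) &= \frac{IG(Y; X)}{H(X)}, \qquad
H(X) = -\sum_{v} P(X = v)\,\log_2 P(X = v)
\end{align}
```

Gain Ratio divides Information Gain by the entropy of the feature itself, which penalizes features that split the data into many small groups.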


2021 ◽  
Vol 3 (2) ◽  
pp. 1-19
Author(s):  
Sikha Bagui ◽  
Daniel Benson

Adware, an advertising-supported software, becomes a type of malware when it automatically delivers unwanted advertisements to an infected device, steals user information, and opens other vulnerabilities that allow other malware and adware to be installed. With the rise of more complex and evasive malware, specifically adware, better methods of detecting adware are required. Though a lot of work has been done on malware detection in general, very little focus has been put on the adware family. The novelty of this paper lies in analyzing the individual adware families, on which no work has been done to date. In this paper, using the CICAndMal2017 dataset, feature selection is performed using information gain, and classification is performed using machine learning. The best attributes for classification of each of the individual adware families using network traffic samples are presented. The results present an average classification rate that is an improvement over previous works for classification of individual adware families.


2020 ◽  
Vol 6 (4) ◽  
pp. 466-483
Author(s):  
V. V. Grachev ◽  
A. V. Grishchenko ◽  
V. A. Kruchek ◽  
F. Yu. Bazilevsky ◽  
...  

Despite vast experience in using neural networks to solve various machine learning problems, the numerous attempts to use them in technical diagnostics have, with rare exceptions, not yet led to complete solutions. The reason is the specific nature of technical diagnostics, which distinguishes such tasks from traditional machine learning problems. Having analyzed these specific features, the authors propose an approach to diagnosing complex technical objects that is intended for built-in diagnostic systems and is based on neural network reference diagnostic models of functionally isolated nodes and assemblies. The article describes the methodology for synthesizing such models, training them on data obtained by monitoring the object under test with built-in diagnostic tools, determining the permissible response errors, and adapting the models to the current status of the object. The fuzzification of the diagnostic model results using the test sample, as proposed in the article, makes it possible to standardize the approach to diagnosing complex technical objects designed for various purposes. The authors' proposed use of D. Trigg's tracking control signal to monitor regression residuals during learning increases the training quality and generalization ability of the models. The value of this signal, determined by running the model on a test sample, is an additional informative diagnostic parameter that increases the accuracy of classifying the status of the object under test. Applied at the design stage of a complex technical object, the proposed methodology allows optimizing the array of monitored parameters and multiplying the efficiency of the diagnostic information recorded by the built-in diagnostic and monitoring tools.
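Trigg's tracking signal is usually computed as the ratio of the exponentially smoothed error to the exponentially smoothed absolute error, so that values near +1 or -1 indicate persistently biased residuals. The sketch below follows that common definition; the smoothing constant, the zero initialization, and the example residuals are assumptions, and the abstract does not state how the authors configure or apply the signal.

```python
def trigg_tracking_signal(residuals, alpha=0.2):
    """Return Trigg's tracking signal after each residual in the sequence."""
    smoothed_error = 0.0
    smoothed_abs_error = 0.0
    signals = []
    for e in residuals:
        # Exponential smoothing of the signed error and of its absolute value.
        smoothed_error = alpha * e + (1 - alpha) * smoothed_error
        smoothed_abs_error = alpha * abs(e) + (1 - alpha) * smoothed_abs_error
        if smoothed_abs_error > 0:
            signals.append(smoothed_error / smoothed_abs_error)
        else:
            signals.append(0.0)  # no error observed yet, so no evidence of bias
    return signals


# Hypothetical residual sequences: one consistently biased, one alternating in sign.
biased = trigg_tracking_signal([1.0, 1.0, 1.0, 1.0])
alternating = trigg_tracking_signal([1.0, -1.0, 1.0, -1.0])
```

With these inputs the biased sequence keeps the signal at its upper limit, while the alternating sequence pulls it back toward zero after the first step.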


2020 ◽  
Vol 17 (6) ◽  
pp. 2859-2865
Author(s):  
Shima Farahbakhsh

Cardiovascular diseases are among the most common diseases, and the number of people affected is increasing. If the necessary treatment is not provided at the right time, the patient may die. Accurate diagnosis of cardiac problems at the first examination, along with suitable treatment, can therefore decrease mortality due to cardiovascular diseases. Data mining techniques can be used to this end. Data mining extracts the necessary data from a large body of information; this data is then used for classification and prediction through clustering, classification, and/or identification of hidden patterns. Many studies so far have focused on using data mining techniques to diagnose cardiovascular diseases. The present study aims to provide a diagnostic model for cardiovascular diseases using an approach based on feature selection and data clustering as pre-processing steps. The proposed model involves 4 main phases: (1) Pre-processing the data to eliminate null and outlier values from the data sets; (2) Choosing effective features using three methods, the Pearson correlation coefficient, the Information Gain algorithm, and principal component analysis, which try to remove features whose behavior is independent of the target feature; at the end of this phase, 5 of the 13 initial features are removed. (3) Using the KMeans algorithm for data clustering and developing pre-processes before creating the final cluster and developing a model for predicting the type of cardiovascular disease. The results obtained from the proposed solution show that, among the 4 algorithms used (ID3, Naïve Bayes, SVM, and IBK), IBK was the most accurate, with an accuracy of 0.97.
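The Pearson-correlation part of the feature selection phase can be sketched as a simple filter that keeps only features whose absolute correlation with the target reaches a threshold. The threshold value, feature names, and toy data below are hypothetical; the abstract does not report the cut-off the author used.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # a constant column has no linear relationship with anything
    return cov / (spread_x * spread_y)


def select_features(columns, target, threshold=0.3):
    """Keep the names of features whose |correlation| with the target meets the threshold."""
    return [name for name, values in columns.items()
            if abs(pearson(values, target)) >= threshold]


# Hypothetical toy data: 'age' tracks the target, 'noise' does not.
target = [0, 0, 1, 1]
columns = {"age": [40, 45, 60, 65], "noise": [1, 2, 2, 1]}
kept = select_features(columns, target)
```

A filter like this only captures linear relationships, which may be one reason the study pairs it with Information Gain and principal component analysis.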

