Identifying Optimal Wavelengths as Disease Signatures Using Hyperspectral Sensor and Machine Learning

Xing Wei; Marcela A. Johnson; David B. Langston; Hillary L. Mehl; Song Li

doi:10.3390/rs13142833

Remote Sensing ◽

2021 ◽

Vol 13 (14) ◽

pp. 2833

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Optical Sensors ◽

Minimum Distance ◽

Classification Performance ◽

Stem Rot ◽

Recursive Feature Elimination ◽

Agricultural Crop ◽

Selection Methods ◽

20 nm

Hyperspectral sensors combined with machine learning are increasingly utilized in agricultural crop systems for diverse applications, including plant disease detection. This study was designed to identify the most important wavelengths to discriminate between healthy and diseased peanut (Arachis hypogaea L.) plants infected with Athelia rolfsii, the causal agent of peanut stem rot, using in-situ spectroscopy and machine learning. In greenhouse experiments, daily measurements were conducted to inspect disease symptoms visually and to collect spectral reflectance of peanut leaves on lateral stems of plants mock-inoculated and inoculated with A. rolfsii. Spectrum files were categorized into five classes based on foliar wilting symptoms. Five feature selection methods were compared to select the top 10 ranked wavelengths with and without a custom minimum distance of 20 nm. Recursive feature elimination methods outperformed the chi-square and SelectFromModel methods. Adding the minimum distance of 20 nm into the top selected wavelengths improved classification performance. Wavelengths of 501–505, 690–694, 763 and 884 nm were repeatedly selected by two or more feature selection methods. These selected wavelengths can be applied in designing optical sensors for automated stem rot detection in peanut fields. The machine-learning-based methodology can be adapted to identify spectral signatures of disease in other plant-pathogen systems.
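
The minimum-distance constraint can be sketched as a greedy pass over a best-first wavelength ranking. This is a minimal sketch under assumptions the abstract does not state: the ranking may come from any of the feature selection methods, the spacing test is inclusive (bands at least 20 nm apart are accepted), and the function name `select_spaced_wavelengths` and the example ranking are hypothetical, not the authors' code or results.

```python
def select_spaced_wavelengths(ranked, k=10, min_distance=20):
    """Greedily keep up to k wavelengths (nm) from a best-first ranking,
    skipping any candidate closer than min_distance nm to one already kept."""
    selected = []
    for wl in ranked:
        # Accept the candidate only if it is at least min_distance from every kept band.
        if all(abs(wl - s) >= min_distance for s in selected):
            selected.append(wl)
            if len(selected) == k:
                break
    return selected


# Hypothetical best-first ranking (made-up values, not the study's results).
ranking = [700, 705, 550, 760, 540, 900]
print(select_spaced_wavelengths(ranking, k=4))  # [700, 550, 760, 900]
```

Skipped neighbours such as 705 nm are replaced by the next-ranked band that lies farther away, so the constraint trades a little individual rank for broader spectral coverage.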

Chinese Sentiment Classifier Machine Learning Based on Optimized Information Gain Feature Selection

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.988.511 ◽

2014 ◽

Vol 988 ◽

pp. 511-516 ◽

Cited By ~ 3

Author(s):

Jin Tao Shi ◽

Hui Liang Liu ◽

Yuan Xu ◽

Jun Feng Yan ◽

Jian Feng Xu

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Word Frequency ◽

Chinese Text ◽

Information Gain ◽

Classification Performance ◽

Selection Methods ◽

Text Feature ◽

Important Solution ◽

Feature Word

Machine learning is an important approach in research on Chinese text sentiment classification, and text feature selection is critical to classification performance. However, classical feature selection methods work well on the category set as a whole but miss many representative feature words of individual categories. This paper presents an improved information gain method that integrates word frequency and the degree of feature-word sentiment into the traditional information gain method. Experiments show that a classifier using this method achieves better classification performance.
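
For reference, the classical information gain that the paper builds on scores a term t as IG(t) = H(C) - [P(t) H(C|t) + P(not t) H(C|not t)], where H is class entropy. The sketch below assumes documents are already word-segmented into sets of tokens and uses made-up English tokens; it does not implement the authors' word-frequency and sentiment-degree weighting, whose exact form the abstract does not give.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels; 0 for an empty list."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(docs, labels, term):
    """Classical IG of a term: H(C) minus the expected class entropy after
    splitting the documents on presence/absence of the term."""
    with_t = [y for d, y in zip(docs, labels) if term in d]
    without_t = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = (len(with_t) / n) * entropy(with_t) + (len(without_t) / n) * entropy(without_t)
    return entropy(labels) - conditional


# Toy corpus (hypothetical): each document is a set of segmented words.
docs = [{"good", "love"}, {"good"}, {"bad"}, {"bad", "disappointed"}]
labels = ["pos", "pos", "neg", "neg"]
print(information_gain(docs, labels, "good"))  # 1.0
```

A term that splits the classes perfectly ("good") reaches the maximum of H(C) = 1 bit here, while a term confined to one document of one class ("love") scores lower; an improved IG would reweight these scores with extra per-term information.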

Stability Evaluation of Brain Changes in Parkinson's Disease Based on Machine Learning

Frontiers in Computational Neuroscience ◽

10.3389/fncom.2021.735991 ◽

2021 ◽

Vol 15 ◽

Author(s):

Chenggang Song ◽

Weidong Zhao ◽

Hong Jiang ◽

Xiaoju Liu ◽

Yumei Duan ◽

...

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Brain Regions ◽

Classification Performance ◽

Group Level ◽

Selection Methods ◽

Individual Level ◽

Brain Changes ◽

The Brain ◽

Potential Biomarkers

Structural MRI (sMRI) has been widely used to examine the cerebral changes that occur in Parkinson's disease (PD). However, previous studies have focused on brain changes at the group level rather than at the individual level. Additionally, previous studies have been inconsistent regarding the changes they identified, so it is difficult to determine which brain regions are true biomarkers of PD. To overcome these two issues, we employed four feature selection methods [ReliefF, graph theory, recursive feature elimination (RFE), and stability selection] to obtain a minimal set of relevant, nonredundant features from gray matter (GM) and white matter (WM). A support vector machine (SVM) was then used to learn decision models from the selected features. Using machine learning, this study not only extended group-level statistical analysis, which identifies group differences, to the individual level by distinguishing patients with PD from healthy controls (HCs), but also identified the most informative brain regions through feature selection. Furthermore, we conducted horizontal and vertical analyses to investigate the stability of the identified brain regions. On the one hand, we compared the brain changes found by the different feature selection methods and considered the regions they found in common as potential biomarkers of PD. On the other hand, we compared these brain changes with previous findings from conventional statistical analysis to evaluate their stability. Our experiments demonstrated that the proposed machine learning techniques achieve satisfactory and robust classification performance. The highest classification performance was 92.24% (specificity), 92.42% (sensitivity), 89.58% (accuracy), and 89.77% (AUC) for GM, and 71.93% (specificity), 74.87% (sensitivity), 71.18% (accuracy), and 71.82% (AUC) for WM. Moreover, most brain regions identified by machine learning were consistent with previous findings, which indicates that these regions are related to the pathological brain changes characteristic of PD and can be regarded as potential biomarkers of PD. We also found abnormalities in the dorsolateral superior frontal gyrus (SFGdor) and the lingual gyrus (LING), which have been confirmed in other studies of PD. This further demonstrates that machine learning models can benefit clinicians as a decision support system for diagnosing PD.
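
The "horizontal" comparison, which keeps the regions that several feature selection methods select in common, can be sketched as a vote count. Assumptions: each method's output is a list of region labels, a region becomes a consensus candidate when at least `min_methods` methods choose it, and the function name and region labels below are placeholders, not the study's code or findings.

```python
from collections import Counter


def consensus_features(selections, min_methods=2):
    """Return features chosen by at least `min_methods` methods, most-agreed first
    (ties broken alphabetically). `selections` maps method name -> chosen features."""
    # set() so a method that lists a feature twice still casts only one vote.
    votes = Counter(f for chosen in selections.values() for f in set(chosen))
    kept = [f for f, n in votes.items() if n >= min_methods]
    return sorted(kept, key=lambda f: (-votes[f], f))


# Hypothetical outputs of three selection methods.
selections = {
    "ReliefF": ["region_A", "region_B", "region_C"],
    "RFE": ["region_B", "region_C", "region_D"],
    "stability selection": ["region_C", "region_E"],
}
print(consensus_features(selections))  # ['region_C', 'region_B']
```

Raising `min_methods` to the number of methods yields the strict intersection; lowering it trades stability for recall.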

Comparing Methods of Feature Extraction of Brain Activities for Octave Illusion Classification Using Machine Learning

Sensors ◽

10.3390/s21196407 ◽

2021 ◽

Vol 21 (19) ◽

pp. 6407

Author(s):

Nina Pilyugina ◽

Akihiko Tsukahara ◽

Keita Tanaka

Keyword(s):

Machine Learning ◽

Feature Extraction ◽

Feature Selection ◽

Principal Component ◽

Machine Learning Algorithms ◽

Recursive Feature Elimination ◽

Support Vector ◽

Selection Methods ◽

Automatic Feature Extraction ◽

Octave Illusion

The aim of this study was to find an efficient method to determine features that characterize octave illusion data. Specifically, this study compared the efficiency of several automatic feature selection methods for automatic feature extraction from auditory steady-state response (ASSR) data in brain activity, in order to distinguish auditory octave illusion and nonillusion groups by differences in ASSR amplitudes using machine learning. We compared univariate selection, recursive feature elimination, principal component analysis, and feature importance, testing the results of each feature selection method with several machine learning algorithms: linear regression, random forest, and support vector machine (SVM). Univariate selection with the SVM as the classifier showed the highest accuracy, 75%, compared with 66.6% without feature selection. These results will be used in future work to explain the mechanism behind the octave illusion phenomenon and to create an algorithm for automatic octave illusion classification.
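
Univariate selection scores each feature on its own and keeps the top k. A minimal dependency-free sketch, assuming the one-way ANOVA F statistic as the score (the statistic scikit-learn's `f_classif` computes; the abstract does not say which score the authors used) and a made-up feature matrix standing in for ASSR amplitudes. The function names are hypothetical.

```python
from statistics import mean


def f_score(values, labels):
    """One-way ANOVA F statistic of a single feature across class labels.
    Assumes at least two classes and more samples than classes."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    grand = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        # No within-class spread: perfectly separating if the class means differ.
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_k_best(X, labels, k):
    """Score each column of X (a list of sample rows) independently and return
    the indices of the k highest-scoring columns, best first."""
    n_features = len(X[0])
    scores = [f_score([row[j] for row in X], labels) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:k]


# Made-up amplitudes: 4 subjects x 3 features; label 1 = illusion group (hypothetical).
X = [[1.0, 5.0, 0.3], [1.2, 4.0, 0.1], [3.0, 5.1, 0.2], [3.1, 4.2, 0.4]]
labels = [0, 0, 1, 1]
print(select_k_best(X, labels, 2))  # [0, 2]
```

The selected columns would then be passed to the classifier (an SVM in the study); the selection itself should be fitted inside each cross-validation fold to avoid optimistic accuracy estimates.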

Sentiment Analysis of Movie Reviews: A Study of Machine Learning Algorithms with Various Feature Selection Methods

International Journal of Computer Sciences and Engineering ◽

10.26438/ijcse/v5i9.113121 ◽

2017 ◽

Vol 5 (9) ◽

Cited By ~ 1

Author(s):

Rajwinder Kaur

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Sentiment Analysis ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Selection Methods

Use of Machine Learning to Investigate the Quantitative Checklist for Autism in Toddlers (Q-CHAT) towards Early Autism Screening

Diagnostics ◽

10.3390/diagnostics11030574 ◽

2021 ◽

Vol 11 (3) ◽

pp. 574

Author(s):

Gennaro Tartarisco ◽

Giovanni Cicceri ◽

Davide Di Pietro ◽

Elisa Leonardi ◽

Stefania Aiello ◽

...

Keyword(s):

Machine Learning ◽

High Performance ◽

Behavioral Science ◽

Autistic Traits ◽

Classification Performance ◽

Recursive Feature Elimination ◽

Diagnostic Tools ◽

Support Vector ◽

K Nearest Neighbors ◽

Autism Screening

In the past two decades, several screening instruments were developed to detect toddlers who may be autistic in both clinical and unselected samples. Among others, the Quantitative CHecklist for Autism in Toddlers (Q-CHAT) is a quantitative and normally distributed measure of autistic traits that demonstrates good psychometric properties in different settings and cultures. Recently, machine learning (ML) has been applied to behavioral science to improve the classification performance of autism screening and diagnostic tools, but mainly in children, adolescents, and adults. In this study, we used ML to investigate the accuracy and reliability of the Q-CHAT in discriminating young autistic children from those without autism. Five different ML algorithms (random forest (RF), naïve Bayes (NB), support vector machine (SVM), logistic regression (LR), and K-nearest neighbors (KNN)) were applied to investigate the complete set of Q-CHAT items. Our results showed that ML achieved an overall accuracy of 90%, and the SVM was the most effective, being able to classify autism with 95% accuracy. Furthermore, using the SVM–recursive feature elimination (RFE) approach, we selected a subset of 14 items ensuring 91% accuracy, while 83% accuracy was obtained from the 3 best-discriminating items common to our subset and the previously reported Q-CHAT-10. This evidence confirms the high performance and cross-cultural validity of the Q-CHAT, and supports the application of ML to create shorter and faster versions of the instrument, maintaining high classification accuracy, to be used as a quick, easy, and high-performance tool in primary-care settings.
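
SVM-RFE repeatedly fits a linear model, drops the feature with the smallest absolute weight, and refits until the requested number of features remains. The sketch below follows that loop but, to stay dependency-free, swaps the linear SVM for a ridge-regularised least-squares linear classifier on +1/-1 labels; with scikit-learn one would instead pass a linear-kernel `SVC` to `sklearn.feature_selection.RFE`. It assumes features on comparable scales (weight magnitudes are not scale-invariant), and the function names and toy data are hypothetical, not the Q-CHAT items.

```python
def fit_linear(X, y, ridge=1e-3):
    """Least-squares linear classifier on +1/-1 labels with an intercept, solved from
    ridge-regularised normal equations by Gaussian elimination with partial pivoting.
    Returns the feature weights (intercept dropped). Stand-in for a linear SVM."""
    rows = [[1.0] + [float(v) for v in x] for x in X]
    p = len(rows[0])
    # Normal equations (R^T R + ridge*I) w = R^T y; the intercept is not penalised.
    A = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j and i > 0 else 0.0)
          for j in range(p)] for i in range(p)]
    b = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda i: abs(A[i][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for i in range(col + 1, p):
            factor = A[i][col] / A[col][col]
            for j in range(col, p):
                A[i][j] -= factor * A[col][j]
            b[i] -= factor * b[col]
    w = [0.0] * p
    for i in range(p - 1, -1, -1):  # back substitution
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, p))) / A[i][i]
    return w[1:]


def rfe(X, y, n_keep):
    """Recursive feature elimination: refit, drop the feature with the smallest
    |weight|, repeat until n_keep features remain. Returns original column indices."""
    remaining = list(range(len(X[0])))
    while len(remaining) > n_keep:
        w = fit_linear([[row[j] for j in remaining] for row in X], y)
        weakest = min(range(len(remaining)), key=lambda i: abs(w[i]))
        remaining.pop(weakest)
    return remaining


# Toy data (hypothetical): labels equal feature 0 minus feature 1; feature 2 is noise.
X = [[2, 1, 1], [2, 1, -1], [0, -1, 1], [0, -1, -1],
     [0, 1, 1], [0, 1, -1], [-2, -1, 1], [-2, -1, -1]]
y = [1, 1, 1, 1, -1, -1, -1, -1]
print(rfe(X, y, 2))  # [0, 1]
```

On this toy data, feature 1 is uncorrelated with the labels on its own (a univariate filter would score it zero), yet RFE keeps it because it cancels the nuisance component of feature 0 in the joint model. That joint view is what distinguishes RFE from univariate selection.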

Radiogenomic modeling predicts survival-associated prognostic groups in glioblastoma

Neuro-Oncology Advances ◽

10.1093/noajnl/vdab004 ◽

2021 ◽

Vol 3 (1) ◽

Author(s):

Nicholas Nuechterlein ◽

Beibin Li ◽

Abdullah Feroze ◽

Eric C Holland ◽

Linda Shapiro ◽

...

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Molecular Subtypes ◽

Feature Selection Method ◽

Area Under The Curve ◽

Selection Method ◽

Recursive Feature Elimination ◽

Signal Abnormality ◽

MRI Features ◽

MRI Scans

Background: Combined whole-exome sequencing (WES) and somatic copy number alteration (SCNA) information can separate isocitrate dehydrogenase (IDH)1/2-wildtype glioblastoma into two prognostic molecular subtypes, which cannot be distinguished by epigenetic or clinical features. The potential for radiographic features to discriminate between these molecular subtypes has yet to be established.

Methods: Radiologic features (n = 35,340) were extracted from 46 multisequence, pre-operative magnetic resonance imaging (MRI) scans of IDH1/2-wildtype glioblastoma patients from The Cancer Imaging Archive (TCIA), all of whom have corresponding WES/SCNA data. We developed a novel feature selection method that leverages the structure of extracted MRI features to mitigate the dimensionality challenge posed by the disparity between the large number of features and the limited number of patients in our cohort. Six traditional machine learning classifiers were trained to distinguish molecular subtypes using our feature selection method, which was compared to least absolute shrinkage and selection operator (LASSO) feature selection, recursive feature elimination, and variance thresholding.

Results: We were able to classify glioblastomas into two prognostic subgroups with a cross-validated area under the curve score of 0.80 (±0.03) using ridge logistic regression on the 15-dimensional principal component analysis (PCA) embedding of the features selected by our novel feature selection method. An interrogation of the selected features suggested that features describing contours in the T2 signal abnormality region on the T2-weighted fluid-attenuated inversion recovery (FLAIR) MRI sequence may best distinguish these two groups from one another.

Conclusions: We successfully trained a machine learning model that allows for relevant targeted feature extraction from standard MRI to accurately predict molecularly defined, risk-stratifying IDH1/2-wildtype glioblastoma patient groups.
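
The area under the ROC curve reported above equals the probability that a randomly chosen positive case receives a higher classifier score than a randomly chosen negative case, with ties counted as one half (the Mann-Whitney formulation). A minimal sketch of that definition; the function name and scores are hypothetical, and the quadratic pairwise loop is only meant for small cohorts like this one.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random positive outranks a random negative
    (ties count one half). Labels are 1 (positive) or 0 (negative)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical scores: one positive outranks both negatives, the other outranks none.
print(roc_auc([0.9, 0.3, 0.6, 0.4], [1, 1, 0, 0]))  # 0.5
```

Read this way, a cross-validated AUC of 0.80 means that in roughly four of five positive/negative pairs the model ranks the two patients in the correct order.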

Source allocation of per- and polyfluoroalkyl substances (PFAS) with supervised machine learning: Classification performance and the role of feature selection in an expanded dataset

Chemosphere ◽

10.1016/j.chemosphere.2021.130124 ◽

2021 ◽

Vol 275 ◽

pp. 130124

Author(s):

Tohren C.G. Kibbey ◽

Rafal Jabrzemski ◽

Denis M. O’Carroll

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Classification Performance ◽

Supervised Machine Learning ◽

Machine Learning Classification ◽

Polyfluoroalkyl Substances ◽

Source Allocation

Comparative study on total nitrogen prediction in wastewater treatment plant and effect of various feature selection methods on machine learning algorithms performance

Journal of Water Process Engineering ◽

10.1016/j.jwpe.2021.102033 ◽

2021 ◽

Vol 41 ◽

pp. 102033

Author(s):

Faramarz Bagherzadeh ◽

Mohamad-Javad Mehrani ◽

Milad Basirifard ◽

Javad Roostaei

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Wastewater Treatment ◽

Comparative Study ◽

Total Nitrogen ◽

Wastewater Treatment Plant ◽

Learning Algorithms ◽

Treatment Plant ◽

Machine Learning Algorithms ◽

Selection Methods

Machine learning integrated ensemble of feature selection methods followed by survival analysis for predicting breast cancer subtype specific miRNA biomarkers

Computers in Biology and Medicine ◽

10.1016/j.compbiomed.2021.104244 ◽

2021 ◽

Vol 131 ◽

pp. 104244

Author(s):

Jnanendra Prasad Sarkar ◽

Indrajit Saha ◽

Anasua Sarkar ◽

Ujjwal Maulik

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Feature Selection ◽

Survival Analysis ◽

Breast Cancer Subtype ◽

Selection Methods ◽

Cancer Subtype

Incorporate Syntactic Information for Short Text Classification

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.268-270.697 ◽

2011 ◽

Vol 268-270 ◽

pp. 697-700

Author(s):

Rui Xue Duan ◽

Xiao Jie Wang ◽

Wen Feng Li

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Learning Environment ◽

Text Classification ◽

The Internet ◽

Selection Methods ◽

Text Documents ◽

Short Text ◽

Syntactic Information ◽

Dependency Relations

As the volume of online short text documents on the Internet grows tremendously, organizing short texts effectively has become increasingly urgent. However, traditional feature selection methods are not well suited to short text. In this paper, we propose a method that incorporates syntactic information into short text classification by emphasizing features that have more dependency relations with other words. The SVM classifier and the Weka machine learning environment are used in our experiments. The results show that by incorporating syntactic information into short texts, we obtain more powerful features than with traditional feature selection methods such as document frequency (DF) and chi-square (CHI). The precision of short text classification improved from 86.2% to 90.8%.
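
The two baselines named above can be sketched directly. DF counts the documents containing a term; CHI scores a term t against a class c from a 2x2 contingency table, chi2 = N(AD - CB)^2 / ((A+C)(B+D)(A+B)(C+D)), where A and B count documents inside and outside c that contain t, and C and D count those that do not. This is a minimal sketch of the standard formulas with hypothetical function names and a toy corpus; it does not implement the paper's dependency-relation weighting.

```python
def document_frequency(docs, term):
    """DF: number of documents (sets of tokens) that contain the term."""
    return sum(1 for d in docs if term in d)


def chi_square(docs, labels, term, cls):
    """CHI statistic of a term for one class from the 2x2 term/class contingency table."""
    a = sum(1 for d, y in zip(docs, labels) if term in d and y == cls)
    b = sum(1 for d, y in zip(docs, labels) if term in d and y != cls)
    c = sum(1 for d, y in zip(docs, labels) if term not in d and y == cls)
    d_ = sum(1 for d, y in zip(docs, labels) if term not in d and y != cls)
    n = a + b + c + d_
    denom = (a + c) * (b + d_) * (a + b) * (c + d_)
    # A zero margin (term in every or no document, or a single class) gives no evidence.
    return 0.0 if denom == 0 else n * (a * d_ - c * b) ** 2 / denom


# Toy short texts (hypothetical).
docs = [{"goal", "match"}, {"goal", "team"}, {"stock", "market"}, {"market", "price"}]
labels = ["sports", "sports", "finance", "finance"]
print(chi_square(docs, labels, "goal", "sports"))  # 4.0
```

Because short texts contain few terms each, most terms have very small DF and sparse contingency tables, which is one reason these counts can be unreliable for short documents.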
