Identifying Optimal Wavelengths as Disease Signatures Using Hyperspectral Sensor and Machine Learning

2021 ◽  
Vol 13 (14) ◽  
pp. 2833
Author(s):  
Xing Wei ◽  
Marcela A. Johnson ◽  
David B. Langston ◽  
Hillary L. Mehl ◽  
Song Li

Hyperspectral sensors combined with machine learning are increasingly utilized in agricultural crop systems for diverse applications, including plant disease detection. This study was designed to identify the most important wavelengths to discriminate between healthy and diseased peanut (Arachis hypogaea L.) plants infected with Athelia rolfsii, the causal agent of peanut stem rot, using in-situ spectroscopy and machine learning. In greenhouse experiments, daily measurements were conducted to inspect disease symptoms visually and to collect spectral reflectance of peanut leaves on lateral stems of plants mock-inoculated and inoculated with A. rolfsii. Spectrum files were categorized into five classes based on foliar wilting symptoms. Five feature selection methods were compared to select the top 10 ranked wavelengths with and without a custom minimum distance of 20 nm. Recursive feature elimination methods outperformed the chi-square and SelectFromModel methods. Adding the minimum distance of 20 nm into the top selected wavelengths improved classification performance. Wavelengths of 501–505, 690–694, 763 and 884 nm were repeatedly selected by two or more feature selection methods. These selected wavelengths can be applied in designing optical sensors for automated stem rot detection in peanut fields. The machine-learning-based methodology can be adapted to identify spectral signatures of disease in other plant-pathogen systems.
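The minimum-distance constraint mentioned in this abstract can be illustrated with a short sketch. It assumes a greedy rule (take wavelengths in descending score order and skip any that lie closer than the minimum distance to one already kept); the abstract does not say how the authors applied the constraint. The function name, wavelengths and scores below are made-up illustrations, not data from the study.

```python
def select_with_min_distance(wavelengths, scores, k=10, min_distance=20.0):
    """Greedily keep the k best-scoring wavelengths that are at least
    min_distance nm away from every wavelength already kept."""
    # Rank candidates from highest to lowest importance score.
    ranked = sorted(zip(wavelengths, scores), key=lambda p: p[1], reverse=True)
    chosen = []
    for wavelength, _score in ranked:
        if all(abs(wavelength - kept) >= min_distance for kept in chosen):
            chosen.append(wavelength)
        if len(chosen) == k:
            break
    return chosen


# Hypothetical importance scores for seven candidate wavelengths (nm).
toy_wavelengths = [500, 505, 510, 690, 694, 763, 884]
toy_scores = [0.90, 0.95, 0.80, 0.70, 0.75, 0.60, 0.50]
print(select_with_min_distance(toy_wavelengths, toy_scores, k=4))
```

Without the spacing rule, the top four in this toy ranking would all fall within 500–510 nm and 694 nm. With it, the selection spreads across the spectrum, which is one plausible reason such a constraint could help classification.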

2014 ◽  
Vol 988 ◽  
pp. 511-516 ◽  
Author(s):  
Jin Tao Shi ◽  
Hui Liang Liu ◽  
Yuan Xu ◽  
Jun Feng Yan ◽  
Jian Feng Xu

Machine learning is an important approach in research on Chinese text sentiment categorization, and text feature selection is critical to classification performance. Classical feature selection methods work well across the global set of categories, but they miss many representative feature words of each category. This paper presents an improved information gain method that integrates word frequency and the degree of feature-word sentiment into the traditional information gain method. Experiments show that a classifier improved by this method achieves better classification performance.
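For orientation, the traditional information gain baseline that this abstract builds on can be sketched as follows. The sketch covers only the classical presence/absence form; the word-frequency and sentiment-degree weighting proposed in the paper is not described in the abstract and is not reproduced here. The function names and toy documents are illustrative assumptions.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(docs, labels, term):
    """Classical information gain of a term: H(C) - H(C | term present/absent).
    Each document is a set of words."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = (len(with_term) / n) * entropy(with_term) \
        + (len(without_term) / n) * entropy(without_term)
    return entropy(labels) - conditional


# Toy corpus: "good" separates the classes perfectly, "film" not at all.
toy_docs = [{"good", "film"}, {"good", "plot"}, {"bad", "film"}, {"bad", "plot"}]
toy_labels = ["pos", "pos", "neg", "neg"]
for word in ("good", "film"):
    print(word, information_gain(toy_docs, toy_labels, word))
```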


2021 ◽  
Vol 15 ◽  
Author(s):  
Chenggang Song ◽  
Weidong Zhao ◽  
Hong Jiang ◽  
Xiaoju Liu ◽  
Yumei Duan ◽  
...  

Structural MRI (sMRI) has been widely used to examine the cerebral changes that occur in Parkinson's disease (PD). However, previous studies have examined brain changes at the group level rather than at the individual level. Additionally, previous studies have been inconsistent regarding the changes they identified, so it is difficult to identify which brain regions are the true biomarkers of PD. To overcome these two issues, we employed four different feature selection methods [ReliefF, graph-theory, recursive feature elimination (RFE), and stability selection] to obtain a minimal set of relevant and nonredundant features from gray matter (GM) and white matter (WM). Then, a support vector machine (SVM) was utilized to learn decision models from the selected features. Using machine learning, this study not only extended group-level statistical analysis of group differences to individual-level prediction, distinguishing patients with PD from healthy controls (HCs), but also identified the most informative brain regions through feature selection. Furthermore, we conducted horizontal and vertical analyses to investigate the stability of the identified brain regions. On the one hand, we compared the brain changes found by the different feature selection methods and considered the brain regions found in common by these methods as potential biomarkers related to PD. On the other hand, we compared these brain changes with previous findings reported by conventional statistical analysis to evaluate their stability. Our experiments have demonstrated that the proposed machine learning techniques achieve satisfactory and robust classification performance. The highest classification performance was 92.24% (specificity), 92.42% (sensitivity), 89.58% (accuracy), and 89.77% (AUC) for GM and 71.93% (specificity), 74.87% (sensitivity), 71.18% (accuracy), and 71.82% (AUC) for WM.
Moreover, most brain regions identified by machine learning were consistent with previous findings, which means that these brain regions are related to the pathological brain changes characteristic of PD and can be regarded as potential biomarkers of PD. Besides, we also found the brain abnormality of superior frontal gyrus (dorsolateral, SFGdor) and lingual gyrus (LING), which have been confirmed in other studies of PD. This further demonstrates that machine learning models are beneficial for clinicians as a decision support system in diagnosing PD.
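The specificity, sensitivity and accuracy figures quoted in this abstract are standard confusion-matrix ratios. The sketch below shows how they are computed for a binary patient/control task; the function name and the toy labels are illustrative assumptions and do not come from the study. AUC is omitted because it needs continuous decision scores, not hard labels.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Sensitivity, specificity and accuracy from true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    def ratio(num, den):
        # Undefined when a class is absent from y_true.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # patients correctly flagged
        "specificity": ratio(tn, tn + fp),  # controls correctly cleared
        "accuracy": ratio(tp + tn, len(pairs)),
    }


# Toy labels: 1 = patient, 0 = healthy control (hypothetical).
toy_true = [1, 1, 1, 1, 0, 0, 0, 0]
toy_pred = [1, 1, 1, 0, 0, 0, 1, 1]
print(binary_metrics(toy_true, toy_pred))
```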


Sensors ◽  
2021 ◽  
Vol 21 (19) ◽  
pp. 6407
Author(s):  
Nina Pilyugina ◽  
Akihiko Tsukahara ◽  
Keita Tanaka

The aim of this study was to find an efficient method to determine features that characterize octave illusion data. Specifically, this study compared the efficiency of several automatic feature selection methods for automatic feature extraction of the auditory steady-state responses (ASSR) data in brain activities to distinguish auditory octave illusion and nonillusion groups by the difference in ASSR amplitudes using machine learning. We compared univariate selection, recursive feature elimination, principal component analysis, and feature importance by testing the results of the feature selection methods with several machine learning algorithms: linear regression, random forest, and support vector machine. The univariate selection with the SVM as the classification method showed the highest accuracy, 75%, compared to 66.6% without feature selection. The results obtained will be used in future work on explaining the mechanism behind the octave illusion phenomenon and creating an algorithm for automatic octave illusion classification.
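Univariate selection scores each feature on its own and keeps the top-ranked ones. The sketch below illustrates the idea with an absolute Welch t-statistic between two groups; the abstract does not state which univariate statistic the authors used, so this choice, the function names and the toy data are all assumptions.

```python
import statistics


def welch_t(a, b):
    """Absolute Welch t-statistic between two samples (each of size >= 2)."""
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    return abs(statistics.mean(a) - statistics.mean(b)) / se if se > 0 else 0.0


def select_k_best(X, y, k):
    """Return the indices of the k features that best separate label 1 from label 0."""
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        group1 = [row[j] for row, label in zip(X, y) if label == 1]
        group0 = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(welch_t(group1, group0))
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


# Toy data: feature 0 differs strongly between groups, feature 1 barely at all.
toy_X = [
    [1.0, 5.0, 0.2], [1.2, 5.1, 0.9], [0.9, 4.9, 0.4],  # label 1
    [3.0, 5.0, 0.7], [3.1, 5.2, 0.3], [2.9, 4.8, 0.8],  # label 0
]
toy_y = [1, 1, 1, 0, 0, 0]
print(select_k_best(toy_X, toy_y, k=2))
```

The selected column indices would then be used to subset the data before training a classifier such as an SVM.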


Diagnostics ◽  
2021 ◽  
Vol 11 (3) ◽  
pp. 574
Author(s):  
Gennaro Tartarisco ◽  
Giovanni Cicceri ◽  
Davide Di Pietro ◽  
Elisa Leonardi ◽  
Stefania Aiello ◽  
...  

In the past two decades, several screening instruments were developed to detect toddlers who may be autistic both in clinical and unselected samples. Among others, the Quantitative CHecklist for Autism in Toddlers (Q-CHAT) is a quantitative and normally distributed measure of autistic traits that demonstrates good psychometric properties in different settings and cultures. Recently, machine learning (ML) has been applied to behavioral science to improve the classification performance of autism screening and diagnostic tools, but mainly in children, adolescents, and adults. In this study, we used ML to investigate the accuracy and reliability of the Q-CHAT in discriminating young autistic children from those without. Five different ML algorithms (random forest (RF), naïve Bayes (NB), support vector machine (SVM), logistic regression (LR), and K-nearest neighbors (KNN)) were applied to investigate the complete set of Q-CHAT items. Our results showed that ML achieved an overall accuracy of 90%, and the SVM was the most effective, being able to classify autism with 95% accuracy. Furthermore, using the SVM–recursive feature elimination (RFE) approach, we selected a subset of 14 items ensuring 91% accuracy, while 83% accuracy was obtained from the 3 best discriminating items in common to ours and the previously reported Q-CHAT-10. This evidence confirms the high performance and cross-cultural validity of the Q-CHAT, and supports the application of ML to create shorter and faster versions of the instrument, maintaining high classification accuracy, to be used as a quick, easy, and high-performance tool in primary-care settings.


2021 ◽  
Vol 3 (1) ◽  
Author(s):  
Nicholas Nuechterlein ◽  
Beibin Li ◽  
Abdullah Feroze ◽  
Eric C Holland ◽  
Linda Shapiro ◽  
...  

Background: Combined whole-exome sequencing (WES) and somatic copy number alteration (SCNA) information can separate isocitrate dehydrogenase (IDH)1/2-wildtype glioblastoma into two prognostic molecular subtypes, which cannot be distinguished by epigenetic or clinical features. The potential for radiographic features to discriminate between these molecular subtypes has yet to be established. Methods: Radiologic features (n = 35 340) were extracted from 46 multisequence, pre-operative magnetic resonance imaging (MRI) scans of IDH1/2-wildtype glioblastoma patients from The Cancer Imaging Archive (TCIA), all of whom have corresponding WES/SCNA data. We developed a novel feature selection method that leverages the structure of extracted MRI features to mitigate the dimensionality challenge posed by the disparity between a large number of features and the limited patients in our cohort. Six traditional machine learning classifiers were trained to distinguish molecular subtypes using our feature selection method, which was compared to least absolute shrinkage and selection operator (LASSO) feature selection, recursive feature elimination, and variance thresholding. Results: We were able to classify glioblastomas into two prognostic subgroups with a cross-validated area under the curve score of 0.80 (±0.03) using ridge logistic regression on the 15-dimensional principal component analysis (PCA) embedding of the features selected by our novel feature selection method. An interrogation of the selected features suggested that features describing contours in the T2 signal abnormality region on the T2-weighted fluid-attenuated inversion recovery (FLAIR) MRI sequence may best distinguish these two groups from one another. Conclusions: We successfully trained a machine learning model that allows for relevant targeted feature extraction from standard MRI to accurately predict molecularly-defined risk-stratifying IDH1/2-wildtype glioblastoma patient groups.


2011 ◽  
Vol 268-270 ◽  
pp. 697-700
Author(s):  
Rui Xue Duan ◽  
Xiao Jie Wang ◽  
Wen Feng Li

As the volume of online short text documents grows tremendously on the Internet, organizing short texts well has become much more urgent. However, traditional feature selection methods are not suitable for short text. In this paper, we propose a method that incorporates syntactic information for short text. It emphasizes features that have more dependency relations with other words. The SVM classifier and the Weka machine learning environment are used in our experiments. The experimental results show that by incorporating syntactic information in short text, we can get more powerful features than with traditional feature selection methods such as DF and CHI. The precision of short text classification improved from 86.2% to 90.8%.
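The two baselines named in this abstract, document frequency (DF) and the chi-square statistic (CHI), can be sketched as below using the usual 2x2 term/category contingency table. The syntactic, dependency-based weighting proposed in the paper is not reproduced. The function names and toy documents are illustrative assumptions.

```python
def document_frequency(docs, term):
    """DF: number of documents (sets of words) that contain the term."""
    return sum(1 for d in docs if term in d)


def chi_square(docs, labels, term, category):
    """CHI score of a term for one category from the 2x2 contingency table."""
    a = sum(1 for d, y in zip(docs, labels) if term in d and y == category)
    b = sum(1 for d, y in zip(docs, labels) if term in d and y != category)
    c = sum(1 for d, y in zip(docs, labels) if term not in d and y == category)
    d_ = sum(1 for d, y in zip(docs, labels) if term not in d and y != category)
    n = a + b + c + d_
    denom = (a + c) * (b + d_) * (a + b) * (c + d_)
    # A zero denominator means the term or the category never varies.
    return n * (a * d_ - c * b) ** 2 / denom if denom else 0.0


# Toy short texts (hypothetical).
toy_docs = [{"cheap", "phone"}, {"cheap", "laptop"}, {"great", "phone"}, {"great", "film"}]
toy_labels = ["ad", "ad", "review", "review"]
print(document_frequency(toy_docs, "phone"))
print(chi_square(toy_docs, toy_labels, "cheap", "ad"))
print(chi_square(toy_docs, toy_labels, "phone", "ad"))
```

In this toy corpus DF ranks "phone" as highly as "cheap", while CHI separates them, since "cheap" is tied to one category and "phone" is spread evenly.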

