Identification of Amaranthus Species Using Visible-Near-Infrared (Vis-NIR) Spectroscopy and Machine Learning Methods

Soo-In Sohn; Young-Ju Oh; Subramani Pandian; Yong-Ho Lee; John-Lewis Zinia Zaukuu; Hyeon-Jung Kang; Tae-Hun Ryu; Woo-Suk Cho; Youn-Sung Cho; Eun-Kyoung Shin

doi:10.3390/rs13204149

Remote Sensing ◽

10.3390/rs13204149 ◽

2021 ◽

Vol 13 (20) ◽

pp. 4149

Author(s):

Soo-In Sohn ◽

Young-Ju Oh ◽

Subramani Pandian ◽

Yong-Ho Lee ◽

John-Lewis Zinia Zaukuu ◽

...

Keyword(s):

Machine Learning ◽

Classification Accuracy ◽

Near Infrared ◽

Cross Validation ◽

Nir Spectroscopy ◽

Support Vector ◽

Adaxial Side ◽

Learning Methods ◽

Machine Learning Methods ◽

Geographical Regions

The feasibility of rapid and non-destructive classification of six different Amaranthus species was investigated using visible-near-infrared (Vis-NIR) spectra coupled with chemometric approaches. The focus of this research was to use a handheld spectrometer in the field to classify six Amaranthus sp. in different geographical regions of South Korea. Spectra were obtained from the adaxial side of the leaves at 1.5 nm intervals in the Vis-NIR spectral range between 400 and 1075 nm. The obtained spectra were assessed with four different preprocessing methods to identify the one giving the highest classification accuracy. Preprocessed spectra of six Amaranthus sp. were used as input for the machine learning-based chemometric analysis. All the classification results were validated using cross-validation to produce robust estimates of classification accuracies. The different combinations of preprocessing and modeling had classification accuracies between 71% and 99.7% after cross-validation. The combination of Savitzky-Golay preprocessing and a support vector machine showed the maximum mean classification accuracy of 99.7% for the discrimination of Amaranthus sp. Considering the high number of spectra involved in this study, the growth stage of the plants, the varying measurement locations, and the scanning position of leaves on the plant are all important. We conclude that Vis-NIR spectroscopy, in combination with appropriate preprocessing and machine learning methods, may be used in the field to effectively classify Amaranthus sp. for the management of the weedy species and/or for monitoring their food applications.
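As a rough illustration of the Savitzky-Golay preprocessing step named in this abstract, the sketch below smooths a spectrum with the standard 5-point quadratic filter. The window length, the polynomial order, the function name `savgol_smooth_5`, and the sample values are assumptions made for illustration; the abstract does not state the settings the authors used.

```python
# Minimal sketch: 5-point quadratic Savitzky-Golay smoothing in pure Python.
# Window length 5 and polynomial order 2 are assumed here, not taken from the paper.

# Standard convolution weights for a 5-point quadratic (or cubic) smoothing filter.
SG_COEFFS = (-3 / 35, 12 / 35, 17 / 35, 12 / 35, -3 / 35)


def savgol_smooth_5(spectrum):
    """Return a smoothed copy; the two points at each edge are left unchanged."""
    values = list(spectrum)
    smoothed = values[:]
    for i in range(2, len(values) - 2):
        window = values[i - 2 : i + 3]
        smoothed[i] = sum(c * v for c, v in zip(SG_COEFFS, window))
    return smoothed


if __name__ == "__main__":
    # Made-up reflectance values, for demonstration only.
    noisy = [0.10, 0.12, 0.11, 0.15, 0.14, 0.18, 0.17, 0.21]
    print(savgol_smooth_5(noisy))
```

Because the filter fits a low-order polynomial inside each window, it leaves smooth trends intact while damping point-to-point noise, which is one reason it is a common choice before classification.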

Predictions of chalcospinels with composition ABCX4 (X – S or Se)

Perspektivnye Materialy ◽

10.30791/1028-978x-2020-7-5-18 ◽

2020 ◽

pp. 5-18

Author(s):

N. N. Kiselyova ◽

V. A. Dudarev ◽

V. V. Ryazanov ◽

O. V. Sen’ko ◽

...

Keyword(s):

Machine Learning ◽

Crystal Lattice ◽

Prediction Accuracy ◽

Cross Validation ◽

Chemical Elements ◽

Optical Memory ◽

Support Vector ◽

Learning Methods ◽

Linear Discriminant ◽

Machine Learning Methods

New chalcospinels of the most common compositions were predicted: $A^{I}B^{III}C^{IV}X_4$ (X = S or Se) and $A^{II}B^{III}C^{III}S_4$ (A, B, and C are various chemical elements). They are promising candidates in the search for new materials for magneto-optical memory elements, sensors, and anodes in sodium-ion batteries. The values of the crystal lattice parameter a were estimated. Only the property values of the chemical elements were used for prediction. The calculations were carried out with machine learning programs that are part of the information-analytical system developed by the authors. Chalcospinels not yet obtained were predicted using various ensembles of algorithms: binary decision trees, the linear machine, the search for logical regularities of classes, the support vector machine, the Fisher linear discriminant, k-nearest neighbors, and the training of a multilayer perceptron and a neural network. The lattice parameter of the new chalcospinels was estimated using an extensive family of regression methods from the scikit-learn package for Python, together with multilevel machine learning methods proposed by the authors. According to cross-validation, the prediction accuracy for new chalcospinels is not lower than 80%, and the accuracy of the predicted lattice parameter, measured as the mean absolute error in leave-one-out cross-validation, is ±0.1 Å. The effectiveness of multilevel machine learning methods for predicting the physical properties of substances was shown.
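The leave-one-out mean absolute error mentioned in this abstract can be sketched as follows. This is only an illustration: a one-variable least-squares line stands in for the authors' regression methods, and the helper names `fit_line` and `loo_mae` and the numbers are hypothetical.

```python
# Minimal sketch: leave-one-out cross-validation scored by mean absolute error.
# A simple straight-line fit is a stand-in model, not the method used in the paper.


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def loo_mae(xs, ys):
    """Hold out each sample once, refit on the rest, and average the absolute errors."""
    errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1 :]
        train_y = ys[:i] + ys[i + 1 :]
        slope, intercept = fit_line(train_x, train_y)
        errors.append(abs(ys[i] - (slope * xs[i] + intercept)))
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Synthetic descriptor/target pairs, for demonstration only.
    descriptor = [1.0, 2.0, 3.0, 4.0, 5.0]
    target = [10.1, 10.3, 10.4, 10.7, 10.8]
    print(loo_mae(descriptor, target))
```

Every sample is predicted by a model that never saw it, so the resulting error is an estimate of accuracy on compounds outside the training set.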

Application of Near-Infrared Hyperspectral Imaging with Machine Learning Methods to Identify Geographical Origins of Dry Narrow-Leaved Oleaster (Elaeagnus angustifolia) Fruits

Foods ◽

10.3390/foods8120620 ◽

2019 ◽

Vol 8 (12) ◽

pp. 620 ◽

Cited By ~ 3

Author(s):

Pan Gao ◽

Wei Xu ◽

Tianying Yan ◽

Chu Zhang ◽

Xin Lv ◽

...

Keyword(s):

Machine Learning ◽

Hyperspectral Imaging ◽

Near Infrared ◽

Support Vector ◽

Elaeagnus Angustifolia ◽

Learning Methods ◽

Machine Learning Methods ◽

Discriminant Models ◽

Second Derivative Spectra ◽

Origin Identification

Narrow-leaved oleaster (Elaeagnus angustifolia) fruit is a natural product used as food and traditional medicine. Narrow-leaved oleaster fruits from different geographical origins vary in chemical and physical properties and differ in their nutritional and commercial values. In this study, near-infrared hyperspectral imaging covering the spectral range of 874–1734 nm was used to identify the geographical origins of dry narrow-leaved oleaster fruits with machine learning methods. Average spectra of each single narrow-leaved oleaster fruit were extracted. Second derivative spectra were used to identify effective wavelengths. Partial least squares discriminant analysis (PLS-DA) and support vector machine (SVM) were used to build discriminant models for geographical origin identification using full spectra and effective wavelengths. In addition, deep convolutional neural network (CNN) models were built using full spectra and effective wavelengths. Good classification performance was obtained by these three models using full spectra and effective wavelengths, with classification accuracy of the calibration, validation, and prediction sets all over 90%. Models using effective wavelengths obtained results close to those of models using full spectra. The performances of the PLS-DA, SVM, and CNN models were close. The overall results illustrated that near-infrared hyperspectral imaging coupled with machine learning could be used to trace geographical origins of dry narrow-leaved oleaster fruits.

Fusion of Multispectral Aerial Imagery and Vegetation Indices for Machine Learning-Based Ground Classification

Remote Sensing ◽

10.3390/rs13081411 ◽

2021 ◽

Vol 13 (8) ◽

pp. 1411

Author(s):

Yanchao Zhang ◽

Wen Yang ◽

Ying Sun ◽

Christine Chang ◽

Jiya Yu ◽

...

Keyword(s):

Machine Learning ◽

Classification Accuracy ◽

Vegetation Indices ◽

Machine Learning Algorithms ◽

Support Vector ◽

Multispectral Images ◽

Learning Methods ◽

Machine Learning Methods ◽

Spectral Bands ◽

Vegetation Indexes

Unmanned Aerial Vehicles (UAVs) are emerging and promising platforms for carrying different types of cameras for remote sensing. The application of multispectral vegetation indices for ground cover classification has been widely adopted and has proved its reliability. However, the fusion of spectral bands and vegetation indices for machine learning-based land surface investigation has hardly been studied. In this paper, we studied the fusion of spectral band information from UAV multispectral images and derived vegetation indices for almond plantation classification using several machine learning methods. We acquired multispectral images over an almond plantation using a UAV. First, a multispectral orthoimage was generated from the acquired multispectral images using SfM (Structure from Motion) photogrammetry methods. Eleven types of vegetation indexes were proposed based on the multispectral orthoimage. Then, 593 data points that contained multispectral bands and vegetation indexes were randomly collected and prepared for this study. After comparing six machine learning algorithms (Support Vector Machine, K-Nearest Neighbor, Linear Discriminant Analysis, Decision Tree, Random Forest, and Gradient Boosting), we selected three (SVM, KNN, and LDA) to study the fusion of multispectral band information and derived vegetation indexes for classification. As the number of vegetation indexes increased, the classification accuracy of all three selected machine learning methods gradually increased, then dropped. Our results revealed that: (1) spectral information from multispectral images can be used for machine learning-based ground classification, and among all methods, SVM had the best performance; (2) combining multispectral bands and vegetation indexes can improve the classification accuracy compared with using only spectral bands for all three selected methods; (3) among all VIs, NDEGE, NDVIG, and NDVGE had consistent performance in improving classification accuracies, while others may reduce the accuracy. Machine learning methods (SVM, KNN, and LDA) can be used for classifying almond plantations using multispectral orthoimages, and fusion of multispectral bands with vegetation indexes can improve machine learning-based classification accuracy if the vegetation indexes are properly selected.
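The idea of fusing raw bands with derived indices can be sketched as building one feature vector per data point. The example below uses the generic normalized-difference form (a - b) / (a + b), of which NDVI with near-infrared and red bands is the familiar case. The band names, reflectance values, and helper names are hypothetical, and the definitions of the paper's own indices (NDEGE, NDVIG, NDVGE) are not given in the abstract, so they are not reproduced here.

```python
# Minimal sketch: append normalized-difference indices to raw band values.
# Band names and values are illustrative assumptions, not data from the paper.


def normalized_difference(band_a, band_b):
    """Generic normalized-difference index; returns 0.0 when both bands are zero."""
    total = band_a + band_b
    if total == 0:
        return 0.0
    return (band_a - band_b) / total


def fuse_features(bands, index_pairs):
    """Concatenate band reflectances (sorted by name) with the requested indices."""
    features = [bands[name] for name in sorted(bands)]
    features += [normalized_difference(bands[a], bands[b]) for a, b in index_pairs]
    return features


if __name__ == "__main__":
    pixel = {"green": 0.08, "nir": 0.50, "red": 0.10, "red_edge": 0.30}
    # ("nir", "red") yields the classic NDVI; the second pair is another example.
    print(fuse_features(pixel, [("nir", "red"), ("nir", "red_edge")]))
```

The fused vector can then be passed to any classifier. The abstract's observation that accuracy rises and then drops as indices are added suggests selecting index pairs with care instead of appending all of them.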

Comprehensive assessment of machine learning-based methods for predicting antimicrobial peptides

Briefings in Bioinformatics ◽

10.1093/bib/bbab083 ◽

2021 ◽

Author(s):

Jing Xu ◽

Fuyi Li ◽

André Leier ◽

Dongxu Xiang ◽

Hsin-Hui Shen ◽

...

Keyword(s):

Machine Learning ◽

Antimicrobial Peptides ◽

Computational Methods ◽

Cross Validation ◽

Predictive Performance ◽

Support Vector ◽

Data Sets ◽

Learning Methods ◽

Data Set ◽

Machine Learning Methods

Antimicrobial peptides (AMPs) are a unique and diverse group of molecules that play a crucial role in a myriad of biological processes and cellular functions. AMP-related studies have become increasingly popular in recent years due to antimicrobial resistance, which is becoming an emerging global concern. Systematic experimental identification of AMPs faces many difficulties due to the limitations of current methods. Given its significance, more than 30 computational methods have been developed for accurate prediction of AMPs. These approaches show high diversity in their data set size, data quality, core algorithms, feature extraction, feature selection techniques and evaluation strategies. Here, we provide a comprehensive survey on a variety of current approaches for AMP identification and point at the differences between these methods. In addition, we evaluate the predictive performance of the surveyed tools based on an independent test data set containing 1536 AMPs and 1536 non-AMPs. Furthermore, we construct six validation data sets based on six different common AMP databases and compare different computational methods based on these data sets. The results indicate that amPEPpy achieves the best predictive performance and outperforms the other compared methods. As the predictive performances are affected by the different data sets used by different methods, we additionally perform the 5-fold cross-validation test to benchmark different traditional machine learning methods on the same data set. These cross-validation results indicate that random forest, support vector machine and eXtreme Gradient Boosting achieve comparatively better performances than other machine learning methods and are often the algorithms of choice of multiple AMP prediction tools.
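A 5-fold cross-validation split of the kind used for this benchmark can be sketched with the standard library alone. The function name `k_fold_indices`, the seed, and the sample count are illustrative assumptions; the abstract does not describe how the authors partitioned their data.

```python
# Minimal sketch: generate train/test index lists for k-fold cross-validation.
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists; every sample is in exactly one test fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Striding by k gives folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


if __name__ == "__main__":
    for fold_number, (train, test) in enumerate(k_fold_indices(12, k=5), start=1):
        print(fold_number, sorted(test))
```

Benchmarking every method on the same folds, as the abstract describes, removes the differences in data sets as a source of variation between reported scores.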

Identifying Cancer Targets Based on Machine Learning Methods via Chou’s 5-steps Rule and General Pseudo Components

Current Topics in Medicinal Chemistry ◽

10.2174/1568026619666191016155543 ◽

2019 ◽

Vol 19 (25) ◽

pp. 2301-2317 ◽

Cited By ~ 2

Author(s):

Ruirui Liang ◽

Jiayang Xie ◽

Chi Zhang ◽

Mengying Zhang ◽

Hai Huang ◽

...

Keyword(s):

Machine Learning ◽

Growth Rate ◽

Big Data ◽

Human Genome Project ◽

Genome Project ◽

Support Vector ◽

Successful Implementation ◽

Learning Methods ◽

Machine Learning Methods ◽

Vector Machines

In recent years, the successful implementation of the Human Genome Project has made people realize that genetic, environmental and lifestyle factors should be combined to study cancer, owing to the complexity and various forms of the disease. The increasing availability and growth rate of ‘big data’ derived from various omics open a new window for the study and therapy of cancer. In this paper, we introduce the application of machine learning methods in handling cancer big data, including the use of artificial neural networks, support vector machines, ensemble learning and naïve Bayes classifiers.

Integration of transcriptomic data identifies key hallmark genes in hypertrophic cardiomyopathy

BMC Cardiovascular Disorders ◽

10.1186/s12872-021-02147-7 ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Jing Xu ◽

Xiangdong Liu ◽

Qiming Dai

Keyword(s):

Machine Learning ◽

Hypertrophic Cardiomyopathy ◽

Heart Diseases ◽

Expression Patterns ◽

Support Vector ◽

Rna Seq ◽

Ppi Network ◽

Learning Methods ◽

Transcriptomic Data ◽

Machine Learning Methods

Background: Hypertrophic cardiomyopathy (HCM) represents one of the most common inherited heart diseases. To identify key molecules involved in the development of HCM, gene expression patterns of the heart tissue samples in HCM patients from multiple microarray and RNA-seq platforms were investigated. Methods: The significant genes were obtained through the intersection of two gene sets, corresponding to the identified differentially expressed genes (DEGs) within the microarray data and within the RNA-seq data. Those genes were further ranked using the minimum-Redundancy Maximum-Relevance feature selection algorithm. Moreover, the genes were assessed by three different machine learning methods for classification, including support vector machines, random forest and k-Nearest Neighbor. Results: Outstanding results were achieved by taking exclusively the top eight genes of the ranking into consideration. Since the eight genes were identified as candidate HCM hallmark genes, the interactions between them and known HCM disease genes were explored through the protein–protein interaction (PPI) network. Most candidate HCM hallmark genes were found to have direct or indirect interactions with known HCM disease genes in the PPI network, particularly the hub genes JAK2 and GADD45A. Conclusions: This study highlights the value of transcriptomic data integration, in combination with machine learning methods, in providing insight into the key hallmark genes in the genetic etiology of HCM.

Machine Learning Methods Applied to the Prediction of Pseudo-nitzschia spp. Blooms in the Galician Rias Baixas (NW Spain)

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi10040199 ◽

2021 ◽

Vol 10 (4) ◽

pp. 199

Author(s):

Francisco M. Bellas Aláez ◽

Jesus M. Torres Palenzuela ◽

Evangelos Spyrakos ◽

Luis González Vilas

Keyword(s):

Machine Learning ◽

Performance Metrics ◽

Prediction Models ◽

Support Vector ◽

False Alarms ◽

Learning Approaches ◽

Learning Methods ◽

Machine Learning Methods ◽

Rías Baixas ◽

New Algorithms

This work presents new prediction models based on recent developments in machine learning methods, such as Random Forest (RF) and AdaBoost, and compares them with more classical approaches, i.e., support vector machines (SVMs) and neural networks (NNs). The models predict Pseudo-nitzschia spp. blooms in the Galician Rias Baixas. This work builds on a previous study by the authors (doi.org/10.1016/j.pocean.2014.03.003) but uses an extended database (from 2002 to 2012) and new algorithms. Our results show that RF and AdaBoost provide better prediction results compared to SVMs and NNs, as they show improved performance metrics and a better balance between sensitivity and specificity. Classical machine learning approaches show higher sensitivities, but at a cost of lower specificity and higher percentages of false alarms (lower precision). These results seem to indicate a greater adaptation of new algorithms (RF and AdaBoost) to unbalanced datasets. Our models could be operationally implemented to establish a short-term prediction system.
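The trade-off described in this abstract, between sensitivity on one side and specificity and false alarms on the other, can be made concrete from confusion-matrix counts. In the sketch below, the false-alarm share is taken as 1 minus precision, following the abstract's pairing of false alarms with lower precision. The function name `bloom_metrics` and the counts are hypothetical.

```python
# Minimal sketch: sensitivity, specificity, precision and false-alarm share
# from confusion-matrix counts. The counts used below are invented for illustration.


def _ratio(numerator, denominator):
    """Safe division that returns 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def bloom_metrics(tp, fp, tn, fn):
    """Return the metrics discussed in the abstract as a dictionary."""
    precision = _ratio(tp, tp + fp)
    return {
        "sensitivity": _ratio(tp, tp + fn),   # share of real blooms detected
        "specificity": _ratio(tn, tn + fp),   # share of non-bloom cases cleared
        "precision": precision,               # share of alarms that were real blooms
        "false_alarm_share": 1.0 - precision if (tp + fp) else 0.0,
    }


if __name__ == "__main__":
    # An unbalanced case: few blooms, many non-bloom samples.
    print(bloom_metrics(tp=8, fp=12, tn=80, fn=2))
```

With unbalanced data such as this, a model can reach high sensitivity while most of its alarms are still false, which is why the abstract reports the balance between the metrics and not any single one.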

Predicting plaque vulnerability change using intravascular ultrasound + optical coherence tomography image-based fluid–structure interaction models and machine learning methods with patient follow-up data: a feasibility study

BioMedical Engineering OnLine ◽

10.1186/s12938-021-00868-6 ◽

2021 ◽

Vol 20 (1) ◽

Author(s):

Xiaoya Guo ◽

Akiko Maehara ◽

Mitsuaki Matsumura ◽

Liang Wang ◽

Jie Zheng ◽

...

Keyword(s):

Machine Learning ◽

Coronary Plaque ◽

Support Vector ◽

Single Factor ◽

Plaque Vulnerability ◽

Learning Methods ◽

Machine Learning Methods ◽

Vulnerability Prediction ◽

Biomechanical Factors

Background: Coronary plaque vulnerability prediction is difficult because plaque vulnerability is non-trivial to quantify, clinically available medical imaging modalities are not sufficient to quantify thin cap thickness, prediction methods with high accuracy still need to be developed, and gold-standard data to validate vulnerability prediction are often not available. Patient follow-up intravascular ultrasound (IVUS), optical coherence tomography (OCT) and angiography data were acquired to construct 3D fluid–structure interaction (FSI) coronary models, and four machine learning methods were compared to identify the optimal method to predict future plaque vulnerability. Methods: Baseline and 10-month follow-up in vivo IVUS and OCT coronary plaque data were acquired from two arteries of one patient using IRB-approved protocols with informed consent obtained. IVUS and OCT-based FSI models were constructed to obtain plaque wall stress/strain and wall shear stress. Forty-five slices were selected as the machine learning sample database for the vulnerability prediction study. Thirteen key morphological factors from IVUS and OCT images and biomechanical factors from the FSI model were extracted from the 45 slices at baseline for analysis. Lipid percentage index (LPI), cap thickness index (CTI) and morphological plaque vulnerability index (MPVI) were quantified to measure plaque vulnerability. Four machine learning methods (least square support vector machine, discriminant analysis, random forest and ensemble learning) were employed to predict the changes of the three indices using all combinations of the 13 factors. A standard fivefold cross-validation procedure was used to evaluate prediction results. Results: For LPI change prediction using support vector machine, wall thickness was the optimal single-factor predictor with an area under the curve (AUC) of 0.883, and the optimal combinational-factor predictor achieved an AUC of 0.963. For CTI change prediction using discriminant analysis, minimum cap thickness was the optimal single-factor predictor with an AUC of 0.818, while the optimal combinational-factor predictor achieved an AUC of 0.836. Using random forest for predicting MPVI change, minimum cap thickness was the optimal single-factor predictor with an AUC of 0.785, and the optimal combinational-factor predictor achieved an AUC of 0.847. Conclusion: This feasibility study demonstrated that machine learning methods could be used to accurately predict plaque vulnerability change based on morphological and biomechanical factors from multi-modality image-based FSI models. Large-scale studies are needed to verify our findings.
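The AUC values reported here summarize how well a predictor ranks positive cases above negative ones. A minimal sketch of that calculation follows, using the pairwise (Mann-Whitney) formulation in which ties count as half. The function name `roc_auc` and the labels and scores are illustrative assumptions, not data from the study.

```python
# Minimal sketch: area under the ROC curve via pairwise comparison of scores.
# AUC equals the probability that a random positive scores higher than a random negative.


def roc_auc(labels, scores):
    """labels are 0 or 1; scores are any real-valued predictor outputs."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # prints 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so the single-factor and combinational-factor values in the abstract can be read on that scale.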

Early warning of citric acid overdose and timely adjustment of regional citrate anticoagulation based on machine learning methods

BMC Medical Informatics and Decision Making ◽

10.1186/s12911-021-01489-8 ◽

2021 ◽

Vol 21 (S2) ◽

Author(s):

Huan Chen ◽

Yingying Ma ◽

Na Hong ◽

Hao Wang ◽

Longxiang Su ◽

...

Keyword(s):

Machine Learning ◽

Citric Acid ◽

Early Warning ◽

Regional Citrate Anticoagulation ◽

Support Vector ◽

Learning Methods ◽

Citrate Anticoagulation ◽

Replacement Fluid ◽

Machine Learning Methods ◽

Neural Networks

Background: Regional citrate anticoagulation (RCA) is an important local anticoagulation method during bedside continuous renal replacement therapy. To improve patient safety and achieve computer-assisted dose monitoring and control, we enrolled intensive care unit patients in a cohort and aimed to develop a data-driven machine learning model that gives early warning of citric acid overdose and suggests adjustments to the citrate pumping rate and the 10% calcium gluconate input rate for RCA treatment. Methods: Using patient age, gender, pumped citric acid dose value, 5% NaHCO3 solvent, replacement fluid solvent, body temperature value, and replacement fluid pH value as clinical features, the models attempted to classify patients who received regional citrate anticoagulation into the correct outcome category. Four models, AdaBoost, XGBoost, support vector machine (SVM) and a shallow neural network, were compared on their performance in predicting outcomes. Prediction results were evaluated using accuracy, precision, recall and F1-score. Results: For classifying patients at the early stages of citric acid treatment, the accuracy of the neural network model was higher than that of AdaBoost, XGBoost and SVM, and the F1-score of the shallow neural network (90.77%) exceeded those of the other models (88.40%, 82.17% and 88.96% for AdaBoost, XGBoost and SVM, respectively). Extended experiments and validation were further conducted using the MIMIC-III database: the F1-scores for the shallow neural network, AdaBoost, XGBoost and SVM were 80.00%, 80.46%, 80.37% and 78.90%, and the AUCs were 0.8638, 0.8086, 0.8466 and 0.7919, respectively. Conclusion: The results of this study demonstrated the feasibility and performance of machine learning methods for monitoring and adjusting regional citrate anticoagulation, and such methods can further provide decision-making recommendations to clinicians at the point of care.

Prediction of Hanwoo Cattle Phenotypes from Genotypes Using Machine Learning Methods

Animals ◽

10.3390/ani11072066 ◽

2021 ◽

Vol 11 (7) ◽

pp. 2066

Author(s):

Swati Srivastava ◽

Bryan Irvine Lopez ◽

Himansu Kumar ◽

Myoungjin Jang ◽

Han-Ha Chai ◽

...

Keyword(s):

Machine Learning ◽

Support Vector ◽

Learning Methods ◽

Eye Muscle ◽

Important Species ◽

Machine Learning Methods ◽

Extreme Gradient Boosting ◽

Boosting Method ◽

Predictive Correlation ◽

Hanwoo Cattle

Hanwoo was originally raised for draft purposes, but the increase in local demand for red meat turned that purpose into full-scale meat-type cattle rearing; it is now considered one of the most economically important species and a vital food source for Koreans. The application of genomic selection in Hanwoo breeding programs in recent years was expected to lead to higher genetic progress. However, better statistical methods that can improve the genomic prediction accuracy are required. Hence, this study aimed to compare the predictive performance of three machine learning methods, namely, random forest (RF), extreme gradient boosting method (XGB), and support vector machine (SVM), when predicting the carcass weight (CWT), marbling score (MS), backfat thickness (BFT) and eye muscle area (EMA). Phenotypic and genotypic data (53,866 SNPs) from 7324 commercial Hanwoo cattle that were slaughtered at the age of around 30 months were used. The results showed that the boosting method XGB showed the highest predictive correlation for CWT and MS, followed by GBLUP, SVM, and RF. Meanwhile, the best predictive correlation for BFT and EMA was delivered by GBLUP, followed by SVM, RF, and XGB. Although XGB presented the highest predictive correlations for some traits, we did not find an advantage of XGB or any machine learning methods over GBLUP according to the mean squared error of prediction. Thus, we still recommend the use of GBLUP in the prediction of genomic breeding values for carcass traits in Hanwoo cattle.
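The closing point of this abstract, that a method can have the highest predictive correlation without having the lowest mean squared error of prediction, can be illustrated with a small sketch. The helper names `pearson` and `mse` and all numbers are made up for illustration; they are not Hanwoo data or the authors' evaluation code.

```python
# Minimal sketch: predictive correlation and mean squared error can rank
# two sets of predictions differently. All numbers are synthetic.
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def mse(observed, predicted):
    """Mean squared error of prediction."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


if __name__ == "__main__":
    observed = [1.0, 2.0, 3.0, 4.0]
    biased = [11.0, 12.0, 13.0, 14.0]   # perfectly correlated but shifted by 10
    close = [1.2, 1.8, 3.3, 3.9]        # slightly noisy but on the right scale
    print(pearson(observed, biased), mse(observed, biased))
    print(pearson(observed, close), mse(observed, close))
```

Correlation ignores bias and scale in the predictions while the squared error does not, which is why the two criteria can favor different methods.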
