High-throughput image segmentation and machine learning approaches in the plant sciences across multiple scales

Emerging Topics in Life Sciences ◽

10.1042/etls20200273 ◽

2021 ◽

Author(s):

Eli Buckner ◽

Haonan Tong ◽

Chanae Ottley ◽

Cranos Williams

Keyword(s):

Machine Learning ◽

Image Segmentation ◽

High Throughput ◽

Stress Responses ◽

High Performance ◽

Multiple Scales ◽

Cell Communication ◽

Plant Diseases ◽

Learning Approaches ◽

Plant Image

Agriculture has benefited greatly from the rise of big data and high-performance computing. The acquisition and analysis of data across biological scales have resulted in strategies modeling interactions between plant genotype and environment, models of root architecture that provide insight into resource utilization, and the elucidation of cell-to-cell communication mechanisms that are instrumental in plant development. Image segmentation and machine learning approaches for interpreting plant image data are among the many computational methodologies that have evolved to address challenging agricultural and biological problems. These approaches have led to contributions such as the accelerated identification of genes that modulate stress responses in plants and automated high-throughput phenotyping for early detection of plant diseases. The continued acquisition of high-throughput imaging across multiple biological scales provides opportunities to push the boundaries of our understanding faster than ever before. In this review, we explore the current state-of-the-art methodologies in plant image segmentation and machine learning at the agricultural, organ, and cellular scales in plants. We show how the methodologies for segmentation and classification differ due to the diversity of physical characteristics found at these different scales. We also discuss the hardware technologies most commonly used at these different scales, the types of quantitative metrics that can be extracted from these images, and how the biological mechanisms by which plants respond to abiotic/biotic stresses or genotypic modifications can be extracted from these approaches.

Environmental Sound Recognition on Embedded Systems: From FPGAs to TPUs

Electronics ◽

10.3390/electronics10212622 ◽

2021 ◽

Vol 10 (21) ◽

pp. 2622

Author(s):

Jurgen Vandendriessche ◽

Nick Wouters ◽

Bruno da Silva ◽

Mimoun Lamrini ◽

Mohamed Yassin Chkouri ◽

...

Keyword(s):

Machine Learning ◽

High Performance ◽

Machine Learning Techniques ◽

Sound Recognition ◽

Learning Approaches ◽

Environmental Sound ◽

Embedded Devices ◽

Power Efficient ◽

Computationally Intensive ◽

Environmental Sound Recognition

In recent years, Environmental Sound Recognition (ESR) has become a relevant capability for urban monitoring applications. The techniques for automated sound recognition often rely on machine learning approaches, which have increased in complexity in order to achieve higher accuracy. Nonetheless, such machine learning techniques often have to be deployed on resource- and power-constrained embedded devices, which has become a challenge with the adoption of deep learning approaches based on Convolutional Neural Networks (CNNs). Field-Programmable Gate Arrays (FPGAs) are power efficient and highly suitable for computationally intensive algorithms like CNNs. By fully exploiting their parallel nature, they have the potential to reduce inference time compared to other embedded devices. Similarly, dedicated architectures to accelerate Artificial Intelligence (AI) such as Tensor Processing Units (TPUs) promise to deliver high accuracy while achieving high performance. In this work, we evaluate existing tool flows to deploy CNN models on FPGAs as well as on TPU platforms. We propose and adjust several CNN-based sound classifiers to be embedded on such hardware accelerators. The results demonstrate the maturity of the existing tools and how FPGAs can be exploited to outperform TPUs.

(Invited) Using Computational High-Throughput Screening and Machine Learning for the Data Design of High Performance Nanoporous Materials

ECS Meeting Abstracts ◽

10.1149/ma2020-01482712mtgabs ◽

2020 ◽

Vol MA2020-01 (48) ◽

pp. 2712-2712

Author(s):

Tom Kwong Woo ◽

Peter George Boyd

Keyword(s):

Machine Learning ◽

High Throughput ◽

High Throughput Screening ◽

High Performance ◽

Nanoporous Materials ◽

Data Design

PARROT is a flexible recurrent neural network framework for analysis of large protein datasets

eLife ◽

10.7554/elife.70576 ◽

2021 ◽

Vol 10 ◽

Author(s):

Daniel Griffith ◽

Alex S Holehouse

Keyword(s):

Neural Network ◽

Machine Learning ◽

Deep Learning ◽

High Throughput ◽

Recurrent Neural Network ◽

Transcriptional Activation ◽

Network Architecture ◽

Learning Approaches ◽

Large Protein ◽

Protein Datasets

The rise of high-throughput experiments has transformed how scientists approach biological questions. The ubiquity of large-scale assays that can test thousands of samples in a day has necessitated the development of new computational approaches to interpret this data. Among these tools, machine learning approaches are increasingly being utilized due to their ability to infer complex nonlinear patterns from high-dimensional data. Despite their effectiveness, machine learning (and in particular deep learning) approaches are not always accessible or easy to implement for those with limited computational expertise. Here we present PARROT, a general framework for training and applying deep learning-based predictors on large protein datasets. Using an internal recurrent neural network architecture, PARROT is capable of tackling both classification and regression tasks while only requiring raw protein sequences as input. We showcase the potential uses of PARROT on three diverse machine learning tasks: predicting phosphorylation sites, predicting transcriptional activation function of peptides generated by high-throughput reporter assays, and predicting the fibrillization propensity of amyloid beta with data generated by deep mutational scanning. Through these examples, we demonstrate that PARROT is easy to use, performs comparably to state-of-the-art computational tools, and is applicable for a wide array of biological problems.

An Experimental Comparison between Deep Learning and Classical Machine Learning Approaches for Writer Identification in Medieval Documents

Journal of Imaging ◽

10.3390/jimaging6090089 ◽

2020 ◽

Vol 6 (9) ◽

pp. 89

Author(s):

Nicole Dalia Cilia ◽

Claudio De Stefano ◽

Francesco Fontanella ◽

Claudio Marrocco ◽

Mario Molinara ◽

...

Keyword(s):

Machine Learning ◽

Deep Learning ◽

High Performance ◽

Ad Hoc ◽

Digital Images ◽

Experimental Comparison ◽

Learning Approaches ◽

Test Bed ◽

Ancient Manuscripts ◽

Ancient Documents

In the framework of palaeography, the availability of both effective image analysis algorithms and high-quality digital images has favored the development of new applications for the study of ancient manuscripts and has provided new tools for decision-making support systems. The quality of the results provided by such applications, however, is strongly influenced by the selection of effective features, which should be able to capture the distinctive aspects in which the palaeography expert is interested. This process is very difficult to generalize due to the enormous variability in the type of ancient documents, produced in different historical periods with different languages and styles. As a result, it is very difficult to define standard techniques that are general enough to be effectively used in any case, and this is why ad hoc systems, generally built according to palaeographers' suggestions, have been designed for the analysis of ancient manuscripts. In recent years, there has been a growing scientific interest in the use of techniques based on deep learning (DL) for the automatic processing of ancient documents. This interest is due not only to their capability of designing high-performance pattern recognition systems, but also to their ability to automatically extract features from raw data, without using any a priori knowledge. Moving from these considerations, the aim of this study is to verify whether DL-based approaches may actually represent a general methodology for automatically designing machine learning systems for palaeography applications. To this end, we compared the performance of a DL-based approach with that of a "classical" machine learning one, in a particularly unfavorable case for DL, namely that of highly standardized schools. The rationale of this choice is to compare the obtainable results even when context information is present and discriminating: this information is ignored by DL approaches, while it is used by machine learning methods, making the comparison more significant. The experimental results refer to the use of a large set of digital images extracted from an entire 12th-century Bible, the "Avila Bible". This manuscript, produced by several scribes who worked in different periods and in different places, represents a severe test bed to evaluate the efficiency of scribe identification systems.

The Active Segmentation Platform for Microscopic Image Classification and Segmentation

Brain Sciences ◽

10.3390/brainsci11121645 ◽

2021 ◽

Vol 11 (12) ◽

pp. 1645

Author(s):

Sumit K. Vohra ◽

Dimiter Prodanov

Keyword(s):

Machine Learning ◽

Image Segmentation ◽

Image Classification ◽

Domain Knowledge ◽

Feature Space ◽

Ground Truth ◽

Classification Problem ◽

Data Sets ◽

Learning Approaches ◽

Data Set

Image segmentation still represents an active area of research since no universal solution can be identified. Traditional image segmentation algorithms are problem-specific and limited in scope. On the other hand, machine learning offers an alternative paradigm where predefined features are combined into different classifiers, providing pixel-level classification and segmentation. However, machine learning alone cannot address the question as to which features are appropriate for a certain classification problem. The article presents an automated image segmentation and classification platform, called Active Segmentation, which is based on ImageJ. The platform integrates expert domain knowledge, providing partial ground truth, with geometrical feature extraction based on multi-scale signal processing combined with machine learning. The segmentation approach is exemplified on the ISBI 2012 image segmentation challenge data set. As a second application, we demonstrate whole-image classification functionality based on the same principles. This approach is exemplified using the HeLa and HEp-2 data sets. The results obtained indicate that feature space enrichment properly balanced with feature selection functionality can achieve performance comparable to deep learning approaches. In summary, differential geometry can substantially improve the outcome of machine learning since it can enrich the underlying feature space with new geometrical invariant objects.

High-performance data analytics of hybrid rocket fuel combustion data using different machine learning approaches

AIAA Scitech 2020 Forum ◽

10.2514/6.2020-1161 ◽

2020 ◽

Author(s):

Charlotte Debus ◽

Alexander Ruettgers ◽

Anna Petrarolo ◽

Mario Kobald ◽

Martin Siggel

Keyword(s):

Machine Learning ◽

Data Analytics ◽

High Performance ◽

Fuel Combustion ◽

Performance Data ◽

Learning Approaches ◽

Hybrid Rocket ◽

Rocket Fuel

A Comparison of Machine Learning Approaches to Improve Free Topography Data for Flood Modelling

Remote Sensing ◽

10.3390/rs13020275 ◽

2021 ◽

Vol 13 (2) ◽

pp. 275

Author(s):

Michael Meadows ◽

Matthew Wilson

Keyword(s):

Neural Network ◽

Machine Learning ◽

Spatial Patterns ◽

Large Scale ◽

Multiple Scales ◽

Flood Hazard ◽

Training Data ◽

Learning Approaches ◽

Testing Dataset ◽

Topography Data

Given the high financial and institutional cost of collecting and processing accurate topography data, many large-scale flood hazard assessments continue to rely instead on freely available global Digital Elevation Models, despite the significant vertical biases known to affect them. To predict (and thereby reduce) these biases, we apply a fully convolutional neural network (FCN), a form of artificial neural network originally developed for image segmentation which is capable of learning from multi-variate spatial patterns at different scales. We assess its potential by training such a model on a wide variety of remote-sensed input data (primarily multi-spectral imagery), using high-resolution, LiDAR-derived Digital Terrain Models published by the New Zealand government as the reference topography data. In parallel, two more widely used machine learning models are also trained, in order to provide benchmarks against which the novel FCN may be assessed. We find that the FCN outperforms the other models (reducing root mean square error in the testing dataset by 71%), likely due to its ability to learn from spatial patterns at multiple scales, rather than only on a pixel-by-pixel basis. Significantly for flood hazard modelling applications, corrections were found to be especially effective along rivers and their floodplains. However, our results also suggest that models are likely to be biased towards the land cover and relief conditions most prevalent in their training data, with further work required to assess the importance of limiting training data inputs to those most representative of the intended application area(s).
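The headline metric in this abstract, a percentage reduction in root mean square error (RMSE), can be illustrated with a minimal sketch. The elevation values below are made up for illustration and are not taken from the paper; the function names `rmse` and `percent_reduction` are hypothetical helpers, not part of any published code.

```python
import math


def rmse(predicted, reference):
    """Root mean square error between two equal-length sequences."""
    if len(predicted) != len(reference):
        raise ValueError("sequences must have the same length")
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


def percent_reduction(baseline_rmse, corrected_rmse):
    """Relative RMSE reduction, in percent, of a corrected model over a baseline."""
    return 100.0 * (baseline_rmse - corrected_rmse) / baseline_rmse


# Illustrative (made-up) elevations in metres, NOT data from the study.
reference = [10.0, 12.0, 15.0, 11.0]    # stand-in for a LiDAR-derived reference
uncorrected = [14.0, 16.0, 19.0, 15.0]  # stand-in for a biased global DEM
corrected = [11.0, 13.0, 14.0, 10.0]    # stand-in for a model-corrected DEM

baseline = rmse(uncorrected, reference)
improved = rmse(corrected, reference)
reduction = percent_reduction(baseline, improved)
```

With these toy numbers the baseline RMSE is 4 m and the corrected RMSE is 1 m, a 75% reduction. The 71% figure in the abstract is computed the same way in principle, but on the study's own testing dataset.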

Ensemble-AMPPred: Robust AMP Prediction and Recognition Using the Ensemble Learning Method with a New Hybrid Feature for Differentiating AMPs

Genes ◽

10.3390/genes12020137 ◽

2021 ◽

Vol 12 (2) ◽

pp. 137

Author(s):

Supatcha Lertampaiporn ◽

Tayvich Vorapreeda ◽

Apiradee Hongsthong ◽

Chinae Thammarongtham

Keyword(s):

Machine Learning ◽

High Performance ◽

Predictive Accuracy ◽

Antimicrobial Activities ◽

Ensemble Model ◽

Learning Approaches ◽

Ensemble Machine Learning ◽

Screening And Identification ◽

Feature Based ◽

Natural Peptides

Antimicrobial peptides (AMPs) are natural peptides possessing antimicrobial activities. These peptides are important components of the innate immune system. They are found in various organisms. AMP screening and identification by experimental techniques are laborious and time-consuming tasks. Alternatively, computational methods based on machine learning have been developed to screen potential AMP candidates prior to experimental verification. Although various AMP prediction programs are available, there is still a need for improvement to reduce false positives (FPs) and to increase the predictive accuracy. In this work, several well-known single and ensemble machine learning approaches have been explored and evaluated based on balanced training datasets and two large testing datasets. We have demonstrated that the developed program with various predictive models has high performance in differentiating between AMPs and non-AMPs. Thus, we describe the development of a program for the prediction and recognition of AMPs using MaxProbVote, which is an ensemble model. Moreover, to increase prediction efficiency, the ensemble model was integrated with a new hybrid feature based on logistic regression. The ensemble model integrated with the hybrid feature can effectively increase the prediction sensitivity of the developed program called Ensemble-AMPPred, resulting in overall improvements in terms of both sensitivity and specificity compared to those of currently available programs.
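Sensitivity and specificity, the two metrics this abstract uses for comparison, follow directly from confusion-matrix counts. The sketch below shows the standard definitions on made-up labels, where 1 stands for AMP and 0 for non-AMP. It does not reproduce Ensemble-AMPPred or MaxProbVote, and the helper names are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels where 1 is the positive class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def sensitivity(tp, fn):
    """True positive rate: fraction of actual positives that were detected."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negative rate: fraction of actual negatives that were rejected."""
    return tn / (tn + fp)


# Illustrative (made-up) labels, NOT results from the paper.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
```

Reducing false positives, the need the abstract identifies, raises specificity, while recovering missed AMPs raises sensitivity.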

Frontiers in the Solicitation of Machine Learning Approaches in Vegetable Science Research

Sustainability ◽

10.3390/su13158600 ◽

2021 ◽

Vol 13 (15) ◽

pp. 8600

Author(s):

Meenakshi Sharma ◽

Prashant Kaushik ◽

Aakash Chawade

Keyword(s):

Machine Learning ◽

Genome Sequencing ◽

Crop Production ◽

Seed Quality ◽

Raw Materials ◽

Science Research ◽

Machine Learning Algorithms ◽

Plant Diseases ◽

Vegetable Production ◽

Learning Approaches

Along with essential nutrients and trace elements, vegetables provide raw materials for the food processing industry. Despite this, plant diseases and unfavorable weather patterns continue to threaten the delicate balance between vegetable production and consumption. It is critical to utilize machine learning (ML) in this setting because it provides context for decision-making related to breeding goals. Cutting-edge technologies for crop genome sequencing and phenotyping, combined with advances in computer science, are currently fueling a revolution in vegetable science and technology. Additionally, various ML techniques such as prediction, classification, and clustering are frequently used to forecast vegetable crop production in the field. In the vegetable seed industry, machine learning algorithms are used to assess seed quality before germination and have the potential to significantly improve the production of vegetables with desired features. In plant disease detection and management, ML approaches can improve decision-support systems that assist in converting massive amounts of data into valuable recommendations. Similarly, in vegetable breeding, ML approaches are helpful in predicting treatment results, such as what will happen if a gene is silenced. Furthermore, ML approaches can help compensate for the insufficient coverage and noisy data generated by various omics platforms. This article examines ML models in the field of vegetable sciences, which encompasses breeding, biotechnology, and genome sequencing.

Virtual-screening workflow tutorials and prospective results from the Teach-Discover-Treat competition 2014 against malaria

F1000Research ◽

10.12688/f1000research.11905.1 ◽

2017 ◽

Vol 6 ◽

pp. 1136 ◽

Cited By ~ 5

Author(s):

Sereina Riniker ◽

Gregory A. Landrum ◽

Floriane Montanari ◽

Santiago D. Villalba ◽

Julie Maier ◽

...

Keyword(s):

Machine Learning ◽

Virtual Screening ◽

High Throughput ◽

Learning Approaches ◽

High Throughput Screen ◽

Hit Rate

The first challenge in the 2014 competition launched by the Teach-Discover-Treat (TDT) initiative asked for the development of a tutorial for ligand-based virtual screening, based on data from a primary phenotypic high-throughput screen (HTS) against malaria. The resulting Workflows were applied to select compounds from a commercial database, and a subset of those were purchased and tested experimentally for anti-malaria activity. Here, we present the two most successful Workflows, both using machine-learning approaches, and report the results for the 114 compounds tested in the follow-up screen. Excluding the two known anti-malarials quinidine and amodiaquine and 31 compounds already present in the primary HTS, a high hit rate of 57% was found.
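The arithmetic behind the reported hit rate can be sketched as follows. This assumes the 57% is taken over the compounds that remain after the stated exclusions. The abstract does not report the number of hits, so the count derived below is an inference from the rounded percentage, not a figure from the paper.

```python
# Figures stated in the abstract.
tested = 114             # compounds tested in the follow-up screen
known_antimalarials = 2  # quinidine and amodiaquine
already_in_hts = 31      # compounds already present in the primary HTS
reported_hit_rate = 0.57

# Assumption: the hit rate is computed over the remaining compounds.
remaining = tested - known_antimalarials - already_in_hts

# Inferred, not reported: the hit count closest to the rounded 57%.
implied_hits = round(reported_hit_rate * remaining)
implied_rate_percent = round(100 * implied_hits / remaining)
```

Under this assumption 81 compounds remain, and about 46 hits among them would round to the reported 57%.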
