Randomized lasso associates freshwater lake-system specific bacterial taxa with heterotrophic production through flow cytometry

Mapping Intimacies ◽

10.1101/392852 ◽

2018 ◽

Author(s):

Peter Rubbens ◽

Marian L. Schmidt ◽

Ruben Props ◽

Bopaiah A. Biddanda ◽

Nico Boon ◽

...

Keyword(s):

Machine Learning ◽

Flow Cytometry ◽

Cell Density ◽

Functional Groups ◽

Ecosystem Functioning ◽

rRNA Gene ◽

Freshwater Lakes ◽

Bacterial Physiology ◽

Lake System ◽

Heterotrophic Production

Abstract:

High- (HNA) and low-nucleic acid (LNA) bacteria are two operational groups identified by flow cytometry (FCM) in aquatic systems. HNA cell density often correlates strongly with heterotrophic production, while LNA cell density does not. However, which taxa are specifically associated with these groups, and by extension, productivity has remained elusive. Here, we addressed this knowledge gap by using a machine learning-based variable selection approach that integrated FCM and 16S rRNA gene sequencing data collected from 14 freshwater lakes spanning a broad range in physicochemical conditions. There was a strong association between bacterial heterotrophic production and HNA absolute cell abundances (R² = 0.65), but not with the more abundant LNA cells. This solidifies findings, mainly from marine systems, that HNA and LNA could be considered separate functional groups, the former contributing a disproportionately large share of carbon cycling. Taxa selected by the models could predict HNA and LNA absolute cell abundances at all taxonomic levels, with the highest performance at the OTU level. Selected OTUs ranged from low to high relative abundance and were mostly lake system-specific (89.5%-99.2%). A subset of selected OTUs was associated with both LNA and HNA groups (12.5%-33.3%), suggesting either phenotypic plasticity or within-OTU genetic and physiological heterogeneity. These findings may lead to the identification of systems-specific putative ecological indicators for heterotrophic productivity. Generally, our approach allows for the association of OTUs with specific functional groups in diverse ecosystems in order to improve our understanding of (microbial) biodiversity-ecosystem functioning relationships.

Importance:

A major goal in microbial ecology is to understand how microbial community structure influences ecosystem functioning. Research is limited by the ability to readily culture most bacteria present in the environment and the difference in bacterial physiology in situ compared to in laboratory culture. Various methods to directly associate bacterial taxa to functional groups in the environment are being developed. In this study, we applied machine learning methods to relate taxonomic data obtained from marker gene surveys to functional groups identified by flow cytometry. This allowed us to identify the taxa that are associated with heterotrophic productivity in freshwater lakes and indicated that the key contributors were highly system-specific, regularly rare members of the community, and that some could switch between being low and high contributors. Our approach provides a promising framework to identify taxa that contribute to ecosystem functioning and can be further developed to explore microbial contributions beyond heterotrophic production.
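The variable selection idea named in the title (randomized lasso) can be illustrated with a minimal sketch. This is not the authors' pipeline: the matrix `X` (a hypothetical taxa-abundance table), the target `y` (hypothetical HNA or LNA cell counts), the penalty `alpha`, the `weakness` factor, and the small coordinate-descent solver are all assumptions chosen for illustration. The sketch refits a lasso on random half-samples with randomly down-weighted features and reports how often each feature receives a nonzero coefficient.

```python
import numpy as np

def lasso_cd(X, y, alpha, n_iter=100):
    """Coordinate-descent lasso: minimise (1/2n)||y - Xw||^2 + alpha * ||w||_1."""
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.copy()
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] < 1e-12:
                continue  # skip (near-)constant columns
            resid += X[:, j] * w[j]          # add feature j's contribution back
            rho = X[:, j] @ resid / n
            # soft-thresholding update for coefficient j
            w[j] = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]
            resid -= X[:, j] * w[j]
    return w

def randomized_lasso_scores(X, y, alpha=0.1, weakness=0.5,
                            n_resamples=100, seed=0):
    """Fraction of resampled fits in which each feature is selected."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_resamples):
        rows = rng.choice(n, size=n // 2, replace=False)   # random half-sample
        # Scaling a column by s is equivalent to dividing its penalty by s,
        # so s = weakness penalises that feature more heavily in this fit.
        scale = rng.choice([weakness, 1.0], size=p)
        Xs = X[rows] * scale
        Xs = Xs - Xs.mean(axis=0)
        ys = y[rows] - y[rows].mean()
        counts += lasso_cd(Xs, ys, alpha) != 0
    return counts / n_resamples
```

Features with a selection frequency close to 1 are stable candidates; where to put the cut-off is a judgment call that this sketch leaves open.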


Randomized Lasso Links Microbial Taxa with Aquatic Functional Groups Inferred from Flow Cytometry

mSystems ◽

10.1128/msystems.00093-19 ◽

2019 ◽

Vol 4 (5) ◽

Cited By ~ 2

Author(s):

Peter Rubbens ◽

Marian L. Schmidt ◽

Ruben Props ◽

Bopaiah A. Biddanda ◽

Nico Boon ◽

...

Keyword(s):

Machine Learning ◽

Flow Cytometry ◽

Nucleic Acid ◽

Cell Density ◽

Functional Groups ◽

Ecosystem Functioning ◽

Marker Gene ◽

rRNA Gene ◽

Freshwater Lakes ◽

Heterotrophic Production

Abstract:

High-nucleic-acid (HNA) and low-nucleic-acid (LNA) bacteria are two operational groups identified by flow cytometry (FCM) in aquatic systems. A number of reports have shown that HNA cell density correlates strongly with heterotrophic production, while LNA cell density does not. However, which taxa are specifically associated with these groups, and by extension, productivity has remained elusive. Here, we addressed this knowledge gap by using a machine learning-based variable selection approach that integrated FCM and 16S rRNA gene sequencing data collected from 14 freshwater lakes spanning a broad range in physicochemical conditions. There was a strong association between bacterial heterotrophic production and HNA absolute cell abundances (R² = 0.65), but not with the more abundant LNA cells. This solidifies findings, mainly from marine systems, that HNA and LNA bacteria could be considered separate functional groups, the former contributing a disproportionately large share of carbon cycling. Taxa selected by the models could predict HNA and LNA absolute cell abundances at all taxonomic levels. Selected operational taxonomic units (OTUs) ranged from low to high relative abundance and were mostly lake system specific (89.5% to 99.2%). A subset of selected OTUs was associated with both LNA and HNA groups (12.5% to 33.3%), suggesting either phenotypic plasticity or within-OTU genetic and physiological heterogeneity. These findings may lead to the identification of system-specific putative ecological indicators for heterotrophic productivity. Generally, our approach allows for the association of OTUs with specific functional groups in diverse ecosystems in order to improve our understanding of (microbial) biodiversity-ecosystem functioning relationships.

Importance:

A major goal in microbial ecology is to understand how microbial community structure influences ecosystem functioning. Various methods to directly associate bacterial taxa to functional groups in the environment are being developed. In this study, we applied machine learning methods to relate taxonomic data obtained from marker gene surveys to functional groups identified by flow cytometry. This allowed us to identify the taxa that are associated with heterotrophic productivity in freshwater lakes and indicated that the key contributors were highly system specific, regularly rare members of the community, and that some could possibly switch between being low and high contributors. Our approach provides a promising framework to identify taxa that contribute to ecosystem functioning and can be further developed to explore microbial contributions beyond heterotrophic production.


Quantifying cell densities and biovolumes of phytoplankton communities and functional groups using scanning flow cytometry, machine learning and unsupervised clustering

10.1101/274357 ◽

2018 ◽

Author(s):

Mridul K. Thomas ◽

Simone Fontana ◽

Marta Reyes ◽

Francesco Pomati

Keyword(s):

Machine Learning ◽

Flow Cytometry ◽

Functional Groups ◽

Clustering Algorithm ◽

Unsupervised Clustering ◽

Learning Tools ◽

Time Resolved ◽

Automated Method ◽

Phytoplankton Communities ◽

Cell Densities

Abstract:

Scanning flow cytometry (SFCM) is characterized by the measurement of time-resolved pulses of fluorescence and scattering, enabling the high-throughput quantification of phytoplankton morphology and pigmentation. Quantifying variation at the single cell and colony level improves our ability to understand dynamics in natural communities. Automated high-frequency monitoring of these communities is presently limited by the absence of repeatable, rapid protocols to analyse SFCM datasets, where images of individual particles are not available. Here we demonstrate a repeatable, semi-automated method to (1) rapidly clean SFCM data from a phytoplankton community by removing signals that do not belong to live phytoplankton cells, (2) classify individual cells into trait clusters that correspond to functional groups, and (3) quantify the biovolumes of individual cells, the total biovolume of the whole community and the total biovolumes of the major functional groups. Our method involves the development of training datasets using lab cultures, the use of an unsupervised clustering algorithm to identify trait clusters, and machine learning tools (random forests) to (1) evaluate variable importance, (2) classify data points, and (3) estimate biovolumes of individual cells. We provide example datasets and R code for our analytical approach that can be adapted for analysis of datasets from other flow cytometers or scanning flow cytometers.


Quantifying cell densities and biovolumes of phytoplankton communities and functional groups using scanning flow cytometry, machine learning and unsupervised clustering

PLoS ONE ◽

10.1371/journal.pone.0196225 ◽

2018 ◽

Vol 13 (5) ◽

pp. e0196225 ◽

Cited By ~ 11

Author(s):

Mridul K. Thomas ◽

Simone Fontana ◽

Marta Reyes ◽

Francesco Pomati

Keyword(s):

Machine Learning ◽

Flow Cytometry ◽

Functional Groups ◽

Unsupervised Clustering ◽

Phytoplankton Communities ◽

Cell Densities


Semi-automated classification of colonial Microcystis by FlowCAM imaging flow cytometry in mesocosm experiment reveals high heterogeneity during seasonal bloom

Scientific Reports ◽

10.1038/s41598-021-88661-2 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Yersultan Mirasbekov ◽

Adina Zhumakhanova ◽

Almira Zhantuyakova ◽

Kuanysh Sarkytbayev ◽

Dmitry V. Malashenkov ◽

...

Keyword(s):

Machine Learning ◽

Flow Cytometry ◽

Spatial Resolution ◽

Mesocosm Experiment ◽

Imaging Flow Cytometry ◽

Leibler Divergence ◽

Temporal And Spatial ◽

High Level ◽

Training Sets

Abstract:

A machine learning approach was employed to detect and quantify Microcystis colonial morphospecies using FlowCAM-based imaging flow cytometry. The system was trained and tested using samples from a long-term mesocosm experiment (LMWE, Central Jutland, Denmark). The statistical validation of the classification approaches was performed using Hellinger distances, Bray–Curtis dissimilarity, and Kullback–Leibler divergence. The semi-automatic classification based on well-balanced training sets from Microcystis seasonal bloom provided a high level of intergeneric accuracy (96–100%) but relatively low intrageneric accuracy (67–78%). Our results provide a proof-of-concept of how machine learning approaches can be applied to analyze colonial microalgae. This approach allowed us to evaluate Microcystis seasonal bloom in individual mesocosms with a high level of temporal and spatial resolution. The observation that some Microcystis morphotypes completely disappeared and re-appeared along the mesocosm experiment timeline supports the hypothesis of the main transition pathways of colonial Microcystis morphoforms. We demonstrated that accurate classification of Microcystis spp. required significant changes in the training sets of colonial images even between time points that differed by only two weeks, owing to the high phenotypic heterogeneity of Microcystis during the bloom. We conclude that automatic methods not only reach the performance level of a human taxonomist, and can thus be a valuable time-saving tool in the routine identification of colonial phytoplankton taxa, but can also be applied to increase the temporal and spatial resolution of a study.
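The three validation measures named in this abstract have standard textbook definitions, sketched below in plain Python. How the study applied them (which count vectors were compared, and any smoothing of zero counts) is not stated here, so the inputs, the `eps` guard, and the function names are assumptions for illustration only.

```python
import math

def normalize(counts):
    """Turn raw class counts into proportions that sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]

def hellinger(p, q):
    """Hellinger distance between two discrete distributions (range 0 to 1)."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2
                               for a, b in zip(p, q)))

def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-negative count vectors."""
    return (sum(abs(a - b) for a, b in zip(u, v))
            / sum(a + b for a, b in zip(u, v)))

def kl_divergence(p, q, eps=1e-12):
    """Kullback-Leibler divergence D(p || q) in nats; eps guards against q = 0."""
    return sum(a * math.log(a / max(b, eps)) for a, b in zip(p, q) if a > 0)
```

For example, `normalize` could be applied to per-class counts from a manual and an automated classification of the same sample before passing them to `hellinger` or `kl_divergence`, while `bray_curtis` can take the raw counts directly.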


New interpretable machine learning method for single-cell data reveals correlates of clinical response to cancer immunotherapy

10.1101/702118 ◽

2019 ◽

Cited By ~ 3

Author(s):

Evan Greene ◽

Greg Finak ◽

Leonard A. D’Amico ◽

Nina Bhardwaj ◽

Candice D. Church ◽

...

Keyword(s):

Machine Learning ◽

Flow Cytometry ◽

T Cell ◽

Single Cell ◽

Cancer Immunotherapy ◽

Effector Memory ◽

Machine Learning Method ◽

Learning Method ◽

Modeling Framework ◽

Interpretable Machine Learning

Abstract:

High-dimensional single-cell cytometry is routinely used to characterize patient responses to cancer immunotherapy and other treatments. This has produced a wealth of datasets ripe for exploration but whose biological and technical heterogeneity make them difficult to analyze with current tools. We introduce a new interpretable machine learning method for single-cell mass and flow cytometry studies, FAUST, that robustly performs unbiased cell population discovery and annotation. FAUST processes data on a per-sample basis and returns biologically interpretable cell phenotypes that can be compared across studies, making it well-suited for the analysis and integration of complex datasets. We demonstrate how FAUST can be used for candidate biomarker discovery and validation by applying it to a flow cytometry dataset from a Merkel cell carcinoma anti-PD-1 trial and discover new CD4+ and CD8+ effector-memory T cell correlates of outcome co-expressing PD-1, HLA-DR, and CD28. We then use FAUST to validate these correlates in an independent CyTOF dataset from a published metastatic melanoma trial. Importantly, existing state-of-the-art computational discovery approaches as well as prior manual analysis did not detect these or any other statistically significant T cell sub-populations associated with anti-PD-1 treatment in either data set. We further validate our methodology by using FAUST to replicate the discovery of a previously reported myeloid correlate in a different published melanoma trial, and validate the correlate by identifying it de novo in two additional independent trials. FAUST’s phenotypic annotations can be used to perform cross-study data integration in the presence of heterogeneous data and diverse immunophenotyping staining panels, enabling hypothesis-driven inference about cell sub-population abundance through a multivariate modeling framework we call Phenotypic and Functional Differential Abundance (PFDA). We demonstrate this approach on data from myeloid and T cell panels across multiple trials. Together, these results establish FAUST as a powerful and versatile new approach for unbiased discovery in single-cell cytometry.


Infinity Flow: High-Throughput Single-Cell Quantification of 100s of Proteins Using Conventional Flow Cytometry and Machine Learning

SSRN Electronic Journal ◽

10.2139/ssrn.3656603 ◽

2020 ◽

Author(s):

Etienne Becht ◽

Daniel Tolstrup ◽

Charles-Antoine Dutertre ◽

Florent Ginhoux ◽

Evan W. Newell ◽

...

Keyword(s):

Machine Learning ◽

Flow Cytometry ◽

Single Cell ◽

High Throughput ◽

Cell Quantification


Functional Group Identification for FTIR Spectra Using Image-Based Machine Learning Models

10.26434/chemrxiv.14188679 ◽

2021 ◽

Author(s):

Abigail Enders ◽

Nicole North ◽

Chase Fensore ◽

Juan Velez-Alvarez ◽

Heather Allen

Keyword(s):

Machine Learning ◽

Gas Phase ◽

Functional Groups ◽

Functional Group ◽

Spectroscopic Method ◽

Group Identification ◽

Spectroscopic Technique ◽

FTIR Spectra ◽

Spectral Interpretation ◽

Time Required

Fourier Transform Infrared Spectroscopy (FTIR) is a ubiquitous spectroscopic technique. Spectral interpretation is a time-consuming process, but it yields important information about functional groups present in compounds and in complex substances. We develop a generalizable model via a machine learning (ML) algorithm using Convolutional Neural Networks (CNNs) to identify the presence of functional groups in gas phase FTIR spectra. The ML models will reduce the amount of time required to analyze functional groups and facilitate interpretation of FTIR spectra. Through web scraping, we acquire intensity-frequency data from 8728 gas phase organic molecules within the NIST spectral database and transform the data into images. We successfully train models for 15 of the most common organic functional groups, which we then use to identify those groups in spectra not seen during training. These models serve to expand the application of FTIR measurements for facile analysis of organic samples. Our approach was designed so that broad functional group models run inference in tandem to provide a full interpretation of a spectrum. We present the first implementation of ML using image-based CNNs for predicting functional groups from a spectroscopic method.


Infinity Flow: High-throughput single-cell quantification of 100s of proteins using conventional flow cytometry and machine learning

10.1101/2020.06.17.152926 ◽

2020 ◽

Author(s):

Etienne Becht ◽

Daniel Tolstrup ◽

Charles-Antoine Dutertre ◽

Florent Ginhoux ◽

Evan W. Newell ◽

...

Keyword(s):

Machine Learning ◽

Flow Cytometry ◽

Single Cell ◽

Low Cost ◽

Expression Patterns ◽

Cell Types ◽

Cellular Heterogeneity ◽

Supervised Machine Learning ◽

Melanoma Metastasis ◽

Immunologic Research

Abstract:

Modern immunologic research increasingly requires high-dimensional analyses in order to understand the complex milieu of cell types that comprise the tissue microenvironments of disease. To achieve this, we developed Infinity Flow, which combines hundreds of overlapping flow cytometry panels using machine learning to enable the simultaneous analysis of the co-expression patterns of 100s of surface-expressed proteins across millions of individual cells. In this study, we demonstrate that this approach allows the comprehensive analysis of the cellular constituency of the steady-state murine lung and the identification of novel cellular heterogeneity in the lungs of melanoma metastasis-bearing mice. We show that by using supervised machine learning, Infinity Flow enhances the accuracy and depth of clustering or dimensionality reduction algorithms. Infinity Flow is a highly scalable, low-cost and accessible solution to single cell proteomics in complex tissues.
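The idea of combining overlapping panels with supervised learning can be sketched as a per-panel regression: markers shared by every panel (called the backbone here, an assumed name) serve as predictors, and each panel's extra marker is the target. The abstract does not specify the model, so the ordinary least-squares fit below is a deliberately simple stand-in, and `fit_linear`, `predict_linear`, and `impute_markers` are hypothetical helpers rather than part of any published package.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares fit of y on X with an intercept term."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, X):
    """Apply coefficients from fit_linear to new backbone measurements."""
    return np.column_stack([X, np.ones(len(X))]) @ coef

def impute_markers(panels, backbone_all):
    """Predict every panel-specific marker for every cell.

    panels: dict mapping marker name -> (backbone matrix of the cells in
            that panel, measured values of the marker for those cells).
    backbone_all: backbone matrix for all cells pooled across panels.
    """
    return {name: predict_linear(fit_linear(Xb, y), backbone_all)
            for name, (Xb, y) in panels.items()}
```

The result is one imputed expression vector per marker, each covering all pooled cells, which could then feed clustering or dimensionality reduction.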


Feasibility study of stain-free classification of cell apoptosis based on diffraction imaging flow cytometry and supervised machine learning techniques

APOPTOSIS ◽

10.1007/s10495-018-1454-y ◽

2018 ◽

Vol 23 (5-6) ◽

pp. 290-298 ◽

Cited By ~ 4

Author(s):

Jingwen Feng ◽

Tong Feng ◽

Chengwen Yang ◽

Wei Wang ◽

Yu Sa ◽

...

Keyword(s):

Machine Learning ◽

Flow Cytometry ◽

Cell Apoptosis ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Imaging Flow Cytometry ◽

Learning Techniques ◽

Diffraction Imaging ◽

Free Classification


Studying Z-ring formation following nucleoid partitioning in Escherichia coli by microscopy flow-cytometry and machine learning

Advances in Robotics & Automation ◽

10.4172/2168-9695-c4-024 ◽

2018 ◽

Vol 07 ◽

Author(s):

Bilena Almeida ◽

Vatsala Chauhan ◽

Andre S Ribeiro

Keyword(s):

Machine Learning ◽

Escherichia Coli ◽

Flow Cytometry ◽

Ring Formation ◽

Z Ring
