Merging Mixture Components for Cell Population Identification in Flow Cytometry

Advances in Bioinformatics ◽

10.1155/2009/247646 ◽

2009 ◽

Vol 2009 ◽

pp. 1-12 ◽

Cited By ~ 55

Author(s):

Greg Finak ◽

Ali Bashashati ◽

Ryan Brinkman ◽

Raphaël Gottardo

Keyword(s):

Flow Cytometry ◽

Gaussian Mixture Models ◽

Gaussian Mixture ◽

Model Fit ◽

Flow Cytometry Data ◽

Bioconductor Project ◽

Distinct Cell ◽

Cell Subpopulations ◽

Fcm Analysis ◽

Selection Of

We present a framework for the identification of cell subpopulations in flow cytometry data based on merging mixture components using the flowClust methodology. We show that the cluster merging algorithm under our framework improves model fit and provides a better estimate of the number of distinct cell subpopulations than either Gaussian mixture models or flowClust, especially for complicated flow cytometry data distributions. Our framework allows the automated selection of the number of distinct cell subpopulations and we are able to identify cases where the algorithm fails, thus making it suitable for application in a high throughput FCM analysis pipeline. Furthermore, we demonstrate a method for summarizing complex merged cell subpopulations in a simple manner that integrates with the existing flowClust framework and enables downstream data analysis. We demonstrate the performance of our framework on simulated and real FCM data. The software is available in the flowMerge package through the Bioconductor project.
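To make the idea of merging mixture components concrete, the following minimal Python sketch collapses two one-dimensional Gaussian components into a single moment-matched component. It is not the flowMerge algorithm: the abstract does not state the merging criterion, and the function name `merge_components` and the `(weight, mean, variance)` tuple layout are assumptions made purely for illustration.

```python
def merge_components(a, b):
    """Merge two 1-D Gaussian components given as (weight, mean, variance).

    Illustrative moment matching only (hypothetical helper); deciding
    *which* components to merge is outside the scope of this sketch.
    """
    w1, m1, v1 = a
    w2, m2, v2 = b
    w = w1 + w2
    # Weighted mean of the two component means.
    mean = (w1 * m1 + w2 * m2) / w
    # Within-component variance plus the spread of the means around the new mean.
    var = (w1 * (v1 + (m1 - mean) ** 2) + w2 * (v2 + (m2 - mean) ** 2)) / w
    return (w, mean, var)


merged = merge_components((0.5, 0.0, 1.0), (0.5, 2.0, 1.0))
```

The merged component keeps the total weight, mean, and variance of the pair, which is one simple way to summarize a merged population with a single component.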

PhenoGMM: Gaussian Mixture Modeling of Cytometry Data Quantifies Changes in Microbial Community Structure

mSphere ◽

10.1128/msphere.00530-20 ◽

2021 ◽

Vol 6 (1) ◽

Author(s):

Peter Rubbens ◽

Ruben Props ◽

Frederiek-Maarten Kerckhof ◽

Nico Boon ◽

Willem Waegeman

Keyword(s):

Flow Cytometry ◽

Community Structure ◽

Microbial Community ◽

Microbial Community Structure ◽

Gaussian Mixture Models ◽

Gaussian Mixture ◽

Community Diversity ◽

Data Sets ◽

Natural Ecosystems ◽

Flow Cytometry Data

ABSTRACT Microbial flow cytometry can rapidly characterize the status of microbial communities. Upon measurement, large amounts of quantitative single-cell data are generated, which need to be analyzed appropriately. Cytometric fingerprinting approaches are often used for this purpose. Traditional approaches either require a manual annotation of regions of interest, do not fully consider the multivariate characteristics of the data, or result in many community-describing variables. To address these shortcomings, we propose an automated model-based fingerprinting approach based on Gaussian mixture models, which we call PhenoGMM. The method successfully quantifies changes in microbial community structure based on flow cytometry data, which can be expressed in terms of cytometric diversity. We evaluate the performance of PhenoGMM using data sets from both synthetic and natural ecosystems and compare the method with a generic binning fingerprinting approach. PhenoGMM supports the rapid and quantitative screening of microbial community structure and dynamics.

IMPORTANCE Microorganisms are vital components in various ecosystems on Earth. In order to investigate the microbial diversity, researchers have largely relied on the analysis of 16S rRNA gene sequences from DNA. Flow cytometry has been proposed as an alternative technology to characterize microbial community diversity and dynamics. The technology enables a fast measurement of optical properties of individual cells. So-called fingerprinting techniques are needed in order to describe microbial community diversity and dynamics based on flow cytometry data. In this work, we propose a more advanced fingerprinting strategy based on Gaussian mixture models. We evaluated our workflow on data sets from both synthetic and natural ecosystems, illustrating its general applicability for the analysis of microbial flow cytometry data. PhenoGMM supports a rapid and quantitative analysis of microbial community structure using flow cytometry.
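As a rough illustration of a mixture-based fingerprint, the sketch below assigns each cell to its most probable component, counts cells per component, and turns the counts into a diversity value. Several parts are assumptions rather than details from the abstract: the components are set by hand instead of being fitted, the data are one-dimensional, and the Hill number of order 1 is just one possible way to express "cytometric diversity". All function names are hypothetical.

```python
import math


def gaussian_pdf(x, mean, sd):
    """Density of a 1-D normal distribution."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fingerprint(cells, components):
    """Count cells per component using hard assignment (assumed scheme)."""
    counts = [0] * len(components)
    for x in cells:
        scores = [w * gaussian_pdf(x, m, s) for (w, m, s) in components]
        counts[scores.index(max(scores))] += 1
    return counts


def hill_diversity_q1(counts):
    """Exponential of the Shannon entropy of the count proportions."""
    total = sum(counts)
    props = [c / total for c in counts if c > 0]
    return math.exp(-sum(p * math.log(p) for p in props))


# Hand-set (weight, mean, sd) components; a real workflow would fit these.
components = [(0.5, 0.0, 1.0), (0.5, 5.0, 1.0)]
cells = [-0.2, 0.1, 0.3, 4.8, 5.1, 5.3]
counts = fingerprint(cells, components)
diversity = hill_diversity_q1(counts)
```

With three cells landing in each of the two components, the fingerprint is evenly split and the diversity value is close to 2, the number of equally abundant groups.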

Using a Genetic Algorithm for Selection of Starting Conditions for the EM Algorithm for Gaussian Mixture Models

Advances in Intelligent Systems and Computing - Proceedings of the 9th International Conference on Computer Recognition Systems CORES 2015 ◽

10.1007/978-3-319-26227-7_12 ◽

2016 ◽

pp. 125-134

Author(s):

Wojciech Kwedlo

Keyword(s):

Genetic Algorithm ◽

Em Algorithm ◽

Mixture Models ◽

Gaussian Mixture Models ◽

Gaussian Mixture ◽

The Em Algorithm ◽

Starting Conditions ◽

Selection Of

Automatic selection of ROIs in functional imaging using Gaussian mixture models

Neuroscience Letters ◽

10.1016/j.neulet.2009.05.039 ◽

2009 ◽

Vol 460 (2) ◽

pp. 108-111 ◽

Cited By ~ 34

Author(s):

J.M. Górriz ◽

A. Lassl ◽

J. Ramírez ◽

D. Salas-Gonzalez ◽

C.G. Puntonet ◽

...

Keyword(s):

Mixture Models ◽

Functional Imaging ◽

Gaussian Mixture Models ◽

Gaussian Mixture ◽

Automatic Selection ◽

Selection Of

A Novel Automated Analysis Of Flow Cytometry Data Identifies Distinct Cell Signatures In The Immune System Of CLL Patients

Blood ◽

10.1182/blood.v122.21.2864.2864 ◽

2013 ◽

Vol 122 (21) ◽

pp. 2864-2864

Author(s):

Jens Rueter ◽

Vivek Philip ◽

Krishna Karuturi ◽

Zaher Oueida ◽

Margaret Chavaree ◽

...

Keyword(s):

Flow Cytometry ◽

Computational Methods ◽

Peripheral Blood ◽

Automated Analysis ◽

Cell Types ◽

Analysis Procedure ◽

Individual Sample ◽

Flow Cytometry Data ◽

Distinct Cell ◽

Common Cell

Abstract

Introduction: Recent developments of novel immunotherapeutic drugs have shown promising results for patients with hematologic malignancies; however, an unmet need for accurate and specific biomarkers persists. To address this need, we developed a novel integrative analysis procedure for the automated analysis of multidimensional flow cytometry data obtained from the peripheral blood of patients with chronic lymphocytic leukemia (CLL). State-of-the-art flow cytometry analysis is accomplished by manual sequential segmentation, or gating, of cell populations based on similarities in fluorescence and light scatter characteristics through visualization of the data in one- or two-dimensional plots. This approach has a number of limitations, including the subjective nature of the gating and the inability to fully utilize the high-dimensional data. Recent efforts have produced sophisticated computational methods that overcome many of these limitations; however, these newer computational methods have not been rigorously tested in a clinical context and have focused on the rigorous and automated analysis of samples from individual patients, with substantially less effort towards the analysis of patient populations. The ultimate goal of our analysis is to develop computational approaches that will enable an identification of subsets of patients with distinct immunological markers.

Methods: We developed a novel analysis framework that facilitates automated identification of both common cell types and patient population subgroups, based on post-processing of individual sample analysis with the FLOCK program. FLOCK identifies clusters of putatively similar cells in an individual sample by multidimensional clustering of the fluorescence marker and light-scattering measurements. We developed a rigorous hierarchical clustering approach to identify common “cell signatures” across multiple patients. The cell signatures were then mapped back onto the individual patient samples and used in a second clustering that identified patient subgroups based on similar abundances of specific cell types.

Results: We used our analytic framework to analyze multidimensional flow cytometry data (26 cell surface markers in 4 different antibody cocktails) from peripheral blood specimens of a heterogeneous group of 55 CLL patients and 13 healthy controls. Our analysis revealed distinct differences between controls and CLL patients. Analyzing the non-malignant peripheral blood cell types, we were furthermore able to differentiate between distinct clinical subpopulations of patients (e.g., distinguish treatment-naïve patients from those that had previously undergone chemotherapy).

Conclusion/Discussion: Using a novel integrative analysis procedure to analyze complex flow cytometry data of the peripheral blood from CLL patients, we are able to identify distinct cell type distributions. We propose that this information is a marker for the overall health/disease status of the corresponding patient, and could ultimately be used for diagnosis, prognosis, and selection of optimal treatment. In the context of multiple novel treatment options for CLL patients, such a tool will be crucial for defining individual patient prognosis, and defining an accurately matched treatment plan.

Disclosures: No relevant conflicts of interest to declare.

Model-based cell clustering and population tracking for time-series flow cytometry data

10.1101/690081 ◽

2019 ◽

Author(s):

Kodai Minoura ◽

Ko Abe ◽

Yuka Maeda ◽

Hiroyoshi Nishikawa ◽

Teppei Shimamura

Keyword(s):

Flow Cytometry ◽

Time Series ◽

Population Dynamics ◽

Cell Population ◽

Mixture Distribution ◽

Gaussian Mixture ◽

Time Dependent ◽

Cell Populations ◽

Flow Cytometry Data ◽

Multivariate Gaussian

Abstract

Motivation: Modern flow cytometry technology has enabled the simultaneous analysis of multiple cell markers at the single-cell level, and it is widely used in a broad field of research. The detection of cell populations in flow cytometry data has long been dependent on “manual gating” by visual inspection. Recently, numerous software tools have been developed for automatic, computationally guided detection of cell populations; however, they are not designed for time-series flow cytometry data. Time-series flow cytometry data are indispensable for investigating the dynamics of cell populations that could not be elucidated by static time-point analysis. Therefore, there is a great need for tools to systematically analyze time-series flow cytometry data.

Results: We propose a simple and efficient statistical framework, named CYBERTRACK (CYtometry-Based Estimation and Reasoning for TRACKing cell populations), to perform clustering and cell population tracking for time-series flow cytometry data. CYBERTRACK assumes that flow cytometry data are generated from a multivariate Gaussian mixture distribution with its mixture proportion at the current time dependent on that at a previous timepoint. Using simulation data, we evaluate the performance of CYBERTRACK when estimating parameters for a multivariate Gaussian mixture distribution, tracking time-dependent transitions of mixture proportions, and detecting change-points in the overall mixture proportion. The CYBERTRACK performance is validated using two real flow cytometry datasets, which demonstrate that the population dynamics detected by CYBERTRACK are consistent with our prior knowledge of lymphocyte behavior.

Conclusions: Our results indicate that CYBERTRACK offers a better understanding of time-dependent cell population dynamics to cytometry users by systematically analyzing time-series flow cytometry data.

FlowFP: A Bioconductor Package for Fingerprinting Flow Cytometric Data

Advances in Bioinformatics ◽

10.1155/2009/193947 ◽

2009 ◽

Vol 2009 ◽

pp. 1-11 ◽

Cited By ~ 26

Author(s):

Wade T. Rogers ◽

Herbert A. Holyst

Keyword(s):

Flow Cytometry ◽

Gaussian Mixture ◽

Empirical Modeling ◽

Data Sets ◽

Automated Classification ◽

Data Quality Control ◽

Computationally Efficient ◽

Flow Cytometry Data ◽

Flow Cytometric Data ◽

Modeling Software

A new software package called flowFP for the analysis of flow cytometry data is introduced. The package, which is tightly integrated with other Bioconductor software for analysis of flow cytometry, provides tools to transform raw flow cytometry data into a form suitable for direct input into conventional statistical analysis and empirical modeling software tools. The approach of flowFP is to generate a description of the multivariate probability distribution function of flow cytometry data in the form of a “fingerprint.” As such, it is independent of a presumptive functional form for the distribution, in contrast with model-based methods such as Gaussian Mixture Modeling. FlowFP is computationally efficient and able to handle extremely large flow cytometry data sets of arbitrary dimensionality. Algorithms and software implementation of the package are described. Use of the software is exemplified with applications to data quality control and to the automated classification of Acute Myeloid Leukemia.
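The notion of a model-free "fingerprint" can be sketched in a few lines of Python: derive bins from a reference sample, then describe any other sample by the fraction of its events falling in each bin. This is a deliberately simplified, one-dimensional illustration with equal-population bins, not flowFP's implementation or API; the function names `quantile_edges` and `bin_fingerprint` are hypothetical.

```python
import bisect


def quantile_edges(reference, n_bins):
    """Interior bin edges so each bin holds about the same number of reference events."""
    ordered = sorted(reference)
    n = len(ordered)
    return [ordered[(i * n) // n_bins] for i in range(1, n_bins)]


def bin_fingerprint(sample, edges):
    """Fraction of events per bin; no distributional form is assumed."""
    counts = [0] * (len(edges) + 1)
    for x in sample:
        counts[bisect.bisect_right(edges, x)] += 1
    return [c / len(sample) for c in counts]


reference = list(range(100))
edges = quantile_edges(reference, 4)
baseline = bin_fingerprint(reference, edges)
shifted = bin_fingerprint([x + 25 for x in reference], edges)
```

The reference sample yields a flat fingerprint by construction, while the shifted sample piles up in the upper bins, so comparing the two vectors exposes the shift without fitting any model.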

Fast Forward Feature Selection of Hyperspectral Images for Classification With Gaussian Mixture Models

IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing ◽

10.1109/jstars.2015.2441771 ◽

2015 ◽

Vol 8 (6) ◽

pp. 2824-2831 ◽

Cited By ~ 20

Author(s):

Mathieu Fauvel ◽

Clement Dechesne ◽

Anthony Zullo ◽

Frederic Ferraty

Keyword(s):

Feature Selection ◽

Mixture Models ◽

Gaussian Mixture Models ◽

Gaussian Mixture ◽

Hyperspectral Images ◽

Selection Of

Unsupervised Selection of Optimal Operating Parameters for Visual Place Recognition Algorithms Using Gaussian Mixture Models

IEEE Robotics and Automation Letters ◽

10.1109/lra.2020.3043171 ◽

2021 ◽

Vol 6 (2) ◽

pp. 343-350

Author(s):

James Mount ◽

Ming Xu ◽

Les Dawes ◽

Michael Milford

Keyword(s):

Mixture Models ◽

Gaussian Mixture Models ◽

Gaussian Mixture ◽

Operating Parameters ◽

Place Recognition ◽

Optimal Operating ◽

Recognition Algorithms ◽

Visual Place Recognition ◽

Selection Of

PhenoGMM: Gaussian mixture modelling of microbial cytometry data enables efficient predictions of biodiversity

10.1101/641464 ◽

2019 ◽

Cited By ~ 2

Author(s):

Peter Rubbens ◽

Ruben Props ◽

Frederiek-Maarten Kerckhof ◽

Nico Boon ◽

Willem Waegeman

Keyword(s):

Flow Cytometry ◽

Microbial Community ◽

16S Rrna ◽

16S Rrna Gene ◽

Microbial Diversity ◽

Amplicon Sequencing ◽

Gaussian Mixture ◽

Community Diversity ◽

Rrna Gene ◽

Flow Cytometry Data

Abstract: Microbial flow cytometry allows the rapid characterization of microbial communities. Recent research has demonstrated a moderate to strong connection between the cytometric diversity and taxonomic diversity based on 16S rRNA gene amplicon sequencing data. This creates the opportunity to integrate both types of data to study and predict the microbial community diversity in an automated and efficient way. However, microbial flow cytometry data give rise to a number of unique challenges that need to be addressed. The results of our work are threefold: i) We expand current microbial cytometry fingerprinting approaches by proposing and validating a model-based fingerprinting approach based upon Gaussian Mixture Models, which we called PhenoGMM. ii) We show that microbial diversity can be rapidly estimated by PhenoGMM. In combination with a supervised machine learning model, diversity estimations based on 16S rRNA gene amplicon sequencing data can be predicted. iii) We evaluate our method extensively by using multiple datasets from different ecosystems and compare its predictive power with a generic binning fingerprinting approach that is commonly used in microbial flow cytometry. These results demonstrate the strong connection between the genetic make-up of a microbial community and its phenotypic properties as measured by flow cytometry. Our workflow facilitates the study of microbial diversity and community dynamics using flow cytometry in a fast and quantitative way.

Importance: Microorganisms are vital components in various ecosystems on Earth. In order to investigate the microbial diversity, researchers have largely relied on the analysis of 16S rRNA gene sequences from DNA. Flow cytometry has been proposed as an alternative technique to characterize microbial community diversity and dynamics. It is an optical technique, able to rapidly characterize a number of phenotypic properties of individual cells. So-called fingerprinting techniques are needed in order to describe microbial community diversity and dynamics based on flow cytometry data. In this work, we propose a more advanced fingerprinting strategy based on Gaussian Mixture Models. When samples have been analyzed by both flow cytometry and 16S rRNA gene amplicon sequencing, we show that supervised machine learning models can be used to find the relationship between the two types of data. We evaluate our workflow on datasets from different ecosystems, illustrating its general applicability for the analysis of microbial flow cytometry data. PhenoGMM facilitates the rapid characterization and predictive modelling of microbial diversity using flow cytometry.

Model-based cell clustering and population tracking for time-series flow cytometry data

BMC Bioinformatics ◽

10.1186/s12859-019-3294-3 ◽

2019 ◽

Vol 20 (S23) ◽

Cited By ~ 2

Author(s):

Kodai Minoura ◽

Ko Abe ◽

Yuka Maeda ◽

Hiroyoshi Nishikawa ◽

Teppei Shimamura

Keyword(s):

Flow Cytometry ◽

Time Series ◽

Population Dynamics ◽

Cell Population ◽

Mixture Distribution ◽

Gaussian Mixture ◽

Time Dependent ◽

Cell Populations ◽

Flow Cytometry Data ◽

Multivariate Gaussian

Abstract

Background: Modern flow cytometry technology has enabled the simultaneous analysis of multiple cell markers at the single-cell level, and it is widely used in a broad field of research. The detection of cell populations in flow cytometry data has long been dependent on “manual gating” by visual inspection. Recently, numerous software tools have been developed for automatic, computationally guided detection of cell populations; however, they are not designed for time-series flow cytometry data. Time-series flow cytometry data are indispensable for investigating the dynamics of cell populations that could not be elucidated by static time-point analysis. Therefore, there is a great need for tools to systematically analyze time-series flow cytometry data.

Results: We propose a simple and efficient statistical framework, named CYBERTRACK (CYtometry-Based Estimation and Reasoning for TRACKing cell populations), to perform clustering and cell population tracking for time-series flow cytometry data. CYBERTRACK assumes that flow cytometry data are generated from a multivariate Gaussian mixture distribution with its mixture proportion at the current time dependent on that at a previous timepoint. Using simulation data, we evaluate the performance of CYBERTRACK when estimating parameters for a multivariate Gaussian mixture distribution, tracking time-dependent transitions of mixture proportions, and detecting change-points in the overall mixture proportion. The CYBERTRACK performance is validated using two real flow cytometry datasets, which demonstrate that the population dynamics detected by CYBERTRACK are consistent with our prior knowledge of lymphocyte behavior.

Conclusions: Our results indicate that CYBERTRACK offers a better understanding of time-dependent cell population dynamics to cytometry users by systematically analyzing time-series flow cytometry data.
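The idea of tracking mixture proportions over time and flagging change-points can be illustrated with a small Python sketch. It is not CYBERTRACK's method: the abstract describes a probabilistic model in which proportions depend on the previous timepoint, whereas this sketch assumes hard cluster labels are already available and uses a fixed distance threshold. The function names and the threshold value are hypothetical.

```python
def proportions(labels, n_clusters):
    """Fraction of cells assigned to each cluster at one timepoint."""
    counts = [0] * n_clusters
    for k in labels:
        counts[k] += 1
    return [c / len(labels) for c in counts]


def flag_changes(series, n_clusters, threshold):
    """Indices t whose proportions differ from t-1 by more than `threshold`.

    Distance is half the L1 difference between consecutive proportion
    vectors (an assumed, simple choice).
    """
    props = [proportions(labels, n_clusters) for labels in series]
    flagged = []
    for t in range(1, len(props)):
        distance = 0.5 * sum(abs(p - q) for p, q in zip(props[t], props[t - 1]))
        if distance > threshold:
            flagged.append(t)
    return flagged


# Four timepoints of hard cluster labels (toy data, two clusters).
series = [
    [0, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 1, 1, 1],
    [0, 1, 1, 1],
]
changes = flag_changes(series, n_clusters=2, threshold=0.25)
```

Here the proportions jump from (0.75, 0.25) to (0.25, 0.75) between the second and third timepoints, so only index 2 is flagged.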
