Machine learning based classification of cells into chronological stages using single-cell transcriptomics

Mapping Intimacies ◽

10.1101/303214 ◽

2018 ◽

Cited By ~ 1

Author(s):

Sumeet Pal Singh ◽

Sharan Janjuha ◽

Samata Chaudhuri ◽

Susanne Reinhardt ◽

Sevina Dietz ◽

...

Keyword(s):

Machine Learning ◽

Single Cell ◽

Predictive Power ◽

Single Cells ◽

Premature Aging ◽

Calorie Intake ◽

Learning Framework ◽

Pancreatic Cells ◽

Age Related ◽

The Impact

Abstract: Age-associated deterioration of cellular physiology leads to pathological conditions. The ability to detect premature aging could provide a window for preventive therapies against age-related diseases. However, the techniques for determining cellular age are limited, as they rely on a small set of histological markers and lack predictive power. Here, we implement GERAS (GEnetic Reference for Age of Single-cell), a machine learning based framework capable of assigning individual cells to chronological stages based on their transcriptomes. GERAS displays greater than 90% accuracy in classifying the chronological stage of zebrafish and human pancreatic cells. The framework demonstrates robustness against biological and technical noise, as evaluated by its performance on independent samplings of single cells. Additionally, GERAS determines the impact of differences in calorie intake and BMI on the aging of zebrafish and human pancreatic cells, respectively. We further harness the predictive power of GERAS to identify genome-wide molecular factors that correlate with aging. We show that one of these factors, junb, is necessary to maintain the proliferative state of juvenile beta-cells. Our results showcase the applicability of a machine learning framework to classify the chronological stage of heterogeneous cell populations, while enabling the detection of pro-aging factors and candidate genes associated with aging.

Quantitative insights into age-associated DNA-repair inefficiency in single cells

10.1101/628909 ◽

2019 ◽

Author(s):

Thomas Z. Young ◽

Ping Liu ◽

Murat Acar

Keyword(s):

Single Cell ◽

Single Cells ◽

Strand Break ◽

Cellular Aging ◽

Dsb Repair ◽

Phase Duration ◽

Age Related ◽

Repair Efficiency ◽

Single Strand Annealing ◽

The Impact

Abstract: The double strand break (DSB) is a highly toxic form of DNA damage that is thought to be both a driver and consequence of age-related dysfunction. Although DSB repair is essential for a cell’s survival, little is known about how DSB repair mechanisms are affected by cellular age. Here we characterize the impact of cellular aging on the efficiency of single-strand annealing (SSA), a repair mechanism for DSBs occurring between direct repeats. Using a single-cell reporter of SSA repair, we measure SSA repair efficiency in young and old cells, and report a 23.4% decline in repair efficiency. This decline is not due to increased usage of non-homologous end joining (NHEJ). Instead, we identify increased G1-phase duration in old cells as a factor responsible for the decreased SSA repair efficiency. We further explore how SSA repair efficiency is affected by sequence heterology and find that heteroduplex rejection remains high in old cells. Our work provides novel quantitative insights into the links between cellular aging and DSB repair efficiency at single-cell resolution in replicatively aging cells.

Morphodynamic signatures of MDA-MB-231 single cells and cell doublets undergoing invasion in confined microenvironments

Scientific Reports ◽

10.1038/s41598-021-85640-5 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Xingjian Zhang ◽

Trevor Chan ◽

Michael Mak

Keyword(s):

Single Cell ◽

Single Cells ◽

Leading Edge ◽

Cell Group ◽

Collective State ◽

Long Term Effects ◽

Cell Metastasis ◽

Geometric Confinement ◽

Short And Long Term ◽

The Impact

Abstract: Cancer cell metastasis is a major factor in cancer-related mortality. During the process of metastasis, cancer cells exhibit migratory phenotypes and invade through pores in the dense extracellular matrix. However, the characterization of morphological and subcellular features of cells in similar migratory phenotypes and the effects of geometric confinement on cell morphodynamics are not well understood. Here, we investigate the phenotypes of highly aggressive MDA-MB-231 cells in single cell and cell doublet (an initial and simplified collective state) forms in confined microenvironments. We group phenotypically similar single cells and cell doublets and characterize related morphological and subcellular features. We further detect two distinct migratory phenotypes, fluctuating and non-fluctuating, within the fast migrating single cell group. In addition, we demonstrate an increase in the number of protrusions formed at the leading edge of cells after invasion through geometric confinement. Finally, we track the short and long term effects of varied degrees of confinement on protrusion formation. Overall, our findings elucidate the underlying morphological and subcellular features associated with different single cell and cell doublet phenotypes and the impact of invasion through confined geometry on cell behavior.

Quality assessment of single-cell RNA sequencing data by coverage skewness analysis

10.1101/2019.12.31.890269 ◽

2019 ◽

Author(s):

Imad Abugessaisa ◽

Shuhei Noguchi ◽

Melissa Cardon ◽

Akira Hasegawa ◽

Kazuhide Watanabe ◽

...

Keyword(s):

Quality Assessment ◽

Single Cell ◽

Rna Sequencing ◽

Single Cells ◽

Assessment Method ◽

Poor Quality ◽

Sequencing Data ◽

Single Cell Rna Sequencing ◽

Gene Coverage ◽

The Impact

Abstract: Analysis and interpretation of single-cell RNA-sequencing (scRNA-seq) experiments are compromised by the presence of poor quality cells. For meaningful analyses, such poor quality cells should be excluded to avoid biases and large variation. However, no clear guidelines exist. We introduce SkewC, a novel quality-assessment method to identify poor quality single cells in scRNA-seq experiments. The method is based on the assessment of gene coverage for each single cell and its skewness as a quality measure. To validate the method, we investigated the impact of poor quality cells on downstream analyses and compared biological differences between typical and poor quality cells. Moreover, we measured the ratio of intergenic expression, suggesting genomic contamination, and foreign organism contamination of single-cell samples. SkewC was tested on 37,993 single cells generated by 15 scRNA-seq protocols. We envision SkewC as an indispensable QC method to be incorporated into scRNA-seq experiments to preclude the possibility of scRNA-seq data misinterpretation.
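
To make the idea of "skewness of gene coverage as a quality measure" concrete, the sketch below computes the moment coefficient of skewness of a per-cell coverage profile. It is not the SkewC implementation: the helper name `coverage_skewness`, the assumption that the gene body is split into equal-width bins running from the 5' end to the 3' end, and the two example profiles are all hypothetical and chosen only for illustration.

```python
def coverage_skewness(coverage):
    """Skewness of read positions along the gene body (hypothetical helper).

    coverage[i] is the read coverage in bin i, where bin 0 is the 5' end
    and the last bin is the 3' end (an assumed layout, not taken from SkewC).
    """
    total = sum(coverage)
    if total <= 0:
        raise ValueError("coverage profile must contain some reads")
    # Treat the profile as a weighted distribution over bin positions.
    mean = sum(i * c for i, c in enumerate(coverage)) / total
    var = sum(c * (i - mean) ** 2 for i, c in enumerate(coverage)) / total
    if var == 0:
        return 0.0
    third = sum(c * (i - mean) ** 3 for i, c in enumerate(coverage)) / total
    return third / var ** 1.5


# Made-up profiles: even coverage versus coverage piled up at the 3' end.
even_profile = [1.0] * 10
biased_profile = [1, 1, 1, 1, 1, 1, 1, 1, 5, 10]

even_skew = coverage_skewness(even_profile)
biased_skew = coverage_skewness(biased_profile)
```

Under these assumptions an evenly covered cell has skewness close to zero, while a profile concentrated at one end has a clearly non-zero value; where to place a cut-off between typical and poor quality cells is left open here.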

Testing of solid oxide cells at high current densities

tm - Technisches Messen ◽

10.1515/teme-2021-0102 ◽

2021 ◽

Vol 0 (0) ◽

Author(s):

André Weber

Keyword(s):

Single Cell ◽

High Performance ◽

High Efficiency ◽

Electrochemical Impedance ◽

Single Cells ◽

Operating Conditions ◽

Solid Oxide ◽

Current Densities ◽

Cell Testing ◽

The Impact

Abstract: Solid Oxide Cells (SOCs) have gained increasing interest as electrochemical energy converters due to their high efficiency, fuel flexibility and capability for reversible fuel cell/electrolysis operation. During the development process as well as in quality assurance tests, the performance of single cells and cell stacks is commonly evaluated by means of current/voltage (CV) characteristics. Although measuring a CV characteristic seems simple compared to more complex, dynamic methods such as electrochemical impedance spectroscopy or current interrupt techniques, the resulting performance strongly depends on the test setup and the chosen operating conditions. In this paper, the impact of different single cell testing environments and operating conditions on the CV characteristic of high performance cells is discussed. The influence of cell size, contacting and current collection, contact pressure, fuel flow rate and composition on the achievable cell performance is presented, and limitations arising from the test bed and testing conditions are pointed out. As today’s high performance cells are capable of delivering current densities of several amperes per cm², special emphasis is placed on single cell testing in this current range.

Dissecting heterogeneous cell populations across drug and disease conditions with PopAlign

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.2005990117 ◽

2020 ◽

Vol 117 (46) ◽

pp. 28784-28794

Author(s):

Sisi Chen ◽

Paul Rivaud ◽

Jong H. Park ◽

Tiffany Tsou ◽

Emeric Charles ◽

...

Keyword(s):

Gene Expression ◽

Single Cell ◽

Large Scale ◽

Probabilistic Models ◽

Single Cells ◽

Measurement Techniques ◽

Patient Specific ◽

Cell Populations ◽

Cell Measurement ◽

The Impact

Single-cell measurement techniques can now probe gene expression in heterogeneous cell populations from the human body across a range of environmental and physiological conditions. However, new mathematical and computational methods are required to represent and analyze gene-expression changes that occur in complex mixtures of single cells as they respond to signals, drugs, or disease states. Here, we introduce a mathematical modeling platform, PopAlign, that automatically identifies subpopulations of cells within a heterogeneous mixture and tracks gene-expression and cell-abundance changes across subpopulations by constructing and comparing probabilistic models. Probabilistic models provide a low-error, compressed representation of single-cell data that enables efficient large-scale computations. We apply PopAlign to analyze the impact of 40 different immunomodulatory compounds on a heterogeneous population of donor-derived human immune cells as well as patient-specific disease signatures in multiple myeloma. PopAlign scales to comparisons involving tens to hundreds of samples, enabling large-scale studies of natural and engineered cell populations as they respond to drugs, signals, or physiological change.

Computation of Single-Cell Metabolite Distributions Using Mixture Models

Frontiers in Cell and Developmental Biology ◽

10.3389/fcell.2020.614832 ◽

2020 ◽

Vol 8 ◽

Author(s):

Mona K. Tonn ◽

Philipp Thomas ◽

Mauricio Barahona ◽

Diego A. Oyarzún

Keyword(s):

Single Cell ◽

Mixture Models ◽

Single Cells ◽

Gaussian Mixture Models ◽

Gaussian Mixture ◽

Stochastic Simulations ◽

Deterministic Models ◽

Metabolic Heterogeneity ◽

Cell Expression ◽

The Impact

Metabolic heterogeneity is widely recognized as the next challenge in our understanding of non-genetic variation. A growing body of evidence suggests that metabolic heterogeneity may result from the inherent stochasticity of intracellular events. However, metabolism has been traditionally viewed as a purely deterministic process, on the basis that highly abundant metabolites tend to filter out stochastic phenomena. Here we bridge this gap with a general method for prediction of metabolite distributions across single cells. By exploiting the separation of time scales between enzyme expression and enzyme kinetics, our method produces estimates for metabolite distributions without the lengthy stochastic simulations that would be typically required for large metabolic models. The metabolite distributions take the form of Gaussian mixture models that are directly computable from single-cell expression data and standard deterministic models for metabolic pathways. The proposed mixture models provide a systematic method to predict the impact of biochemical parameters on metabolite distributions. Our method lays the groundwork for identifying the molecular processes that shape metabolic heterogeneity and its functional implications in disease.
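
As a small illustration of what a metabolite distribution in Gaussian mixture form looks like, the sketch below evaluates a one-dimensional mixture density and its mean. The weights, means and standard deviations are invented numbers, and the helper names are hypothetical; the abstract does not say how the authors derive these parameters from expression data and deterministic pathway models, so that step is not shown.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and std."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, weights, means, stds):
    """Density of a Gaussian mixture: a weighted sum of normal densities."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    return sum(w * gaussian_pdf(x, m, s)
               for w, m, s in zip(weights, means, stds))


def mixture_mean(weights, means):
    """Mean of the mixture: the weighted average of the component means."""
    return sum(w * m for w, m in zip(weights, means))


# Hypothetical two-component example (for instance, two enzyme expression
# states giving two typical metabolite levels); the values are made up.
weights = [0.7, 0.3]
means = [2.0, 5.0]
stds = [0.5, 0.8]

# Crude Riemann-sum check that the density integrates to about 1.
step = 0.01
grid = [i * step for i in range(-500, 1500)]
area = sum(mixture_pdf(x, weights, means, stds) for x in grid) * step
avg = mixture_mean(weights, means)
```

Because the density is a closed-form sum, quantities such as the mean can be read off directly without running stochastic simulations, which is the practical appeal the abstract points to.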

Single-cell analysis of HIV-1 transcriptional activity reveals expression of proviruses in expanded clones during ART

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1617961114 ◽

2017 ◽

Vol 114 (18) ◽

pp. E3659-E3668 ◽

Cited By ~ 70

Author(s):

Ann Wiegand ◽

Jonathan Spindler ◽

Feiyu F. Hong ◽

Wei Shao ◽

Joshua C. Cyktor ◽

...

Keyword(s):

Single Cell ◽

Mononuclear Cells ◽

Single Cells ◽

Constitutive Expression ◽

Multiple Time ◽

Hiv Rna ◽

Infected Cells ◽

The Impact ◽

Hiv 1

Little is known about the fraction of human immunodeficiency virus type 1 (HIV-1) proviruses that express unspliced viral RNA in vivo or about the levels of HIV RNA expression within single infected cells. We developed a sensitive cell-associated HIV RNA and DNA single-genome sequencing (CARD-SGS) method to investigate fractional proviral expression of HIV RNA (1.3-kb fragment of p6, protease, and reverse transcriptase) and the levels of HIV RNA in single HIV-infected cells from blood samples obtained from individuals with viremia or individuals on long-term suppressive antiretroviral therapy (ART). Spiking experiments show that the CARD-SGS method can detect a single cell expressing HIV RNA. Applying CARD-SGS to blood mononuclear cells in six samples from four HIV-infected donors (one with viremia and not on ART and three with viremia suppressed on ART) revealed that an average of 7% of proviruses (range: 2–18%) expressed HIV RNA. Levels of expression varied from one to 62 HIV RNA molecules per cell (median of 1). CARD-SGS also revealed the frequent expression of identical HIV RNA sequences across multiple single cells and across multiple time points in donors on suppressive ART consistent with constitutive expression of HIV RNA in infected cell clones. Defective proviruses were found to express HIV RNA at levels similar to those proviruses that had no obvious defects. CARD-SGS is a useful tool to characterize fractional proviral expression in single infected cells that persist despite ART and to assess the impact of experimental interventions on proviral populations and their expression.

Automated annotation of rare-cell types from single-cell RNA-sequencing data through synthetic oversampling

10.1101/2021.01.20.427486 ◽

2021 ◽

Author(s):

Saptarshi Bej ◽

Anne-Marie Galow ◽

Robert David ◽

Markus Wolfien ◽

Olaf Wolkenhauer

Keyword(s):

Machine Learning ◽

Single Cell ◽

Rna Sequencing ◽

Cell Types ◽

Classification Problem ◽

Use Case ◽

Cell Capture ◽

Sequencing Data ◽

Rare Cells ◽

The Impact

Abstract: The research landscape of single-cell and single-nuclei RNA sequencing is evolving rapidly, and one area that is enabled by this technology is the detection of rare cells. An automated, unbiased and accurate annotation of rare subpopulations is challenging. Once rare cells are identified in one dataset, it will usually be necessary to generate other datasets to enrich the analysis (e.g., with samples from other tissues). From a machine learning perspective, the challenge arises from the fact that rare cell subpopulations constitute an imbalanced classification problem. We here introduce a Machine Learning (ML)-based oversampling method that uses gene expression counts of already identified rare cells as an input to generate synthetic cells to then identify similar (rare) cells in other publicly available experiments. We utilize single-cell synthetic oversampling (sc-SynO), which is based on the Localized Random Affine Shadowsampling (LoRAS) algorithm. The algorithm corrects for the overall imbalance ratio of the minority and majority class. We demonstrate the effectiveness of the method for two independent use cases, each consisting of two published datasets. The first use case identifies cardiac glial cells in snRNA-Seq data (17 nuclei out of 8,635). This use case was designed to take a larger imbalance ratio (∼1 to 500) into account and only uses single-nuclei data. The second use case was designed to jointly use snRNA-Seq data and scRNA-Seq on a lower imbalance ratio (∼1 to 26) for the training step to likewise investigate the potential of the algorithm to consider both single cell capture procedures and the impact of “less” rare-cell types. For validation purposes, all datasets have also been analyzed in a traditional manner using common data analysis approaches, such as the Seurat3 workflow. Our algorithm identifies rare-cell populations with a high accuracy and low false positive detection rate. A striking benefit of our algorithm is that it can be readily implemented in other and existing workflows. The code base is publicly available at FairdomHub (https://fairdomhub.org/assays/1368) and can easily be transferred to train other customized approaches.
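
The imbalance ratio quoted for the first use case can be checked with a line of arithmetic, and the general idea of synthetic oversampling can be sketched in a few more. The code below is a deliberately simplified stand-in and not LoRAS or sc-SynO: it builds each synthetic cell as a random convex combination of noisy copies of minority cells and omits the neighbourhood selection that the real algorithm relies on. The function names, `noise_std`, `n_shadow` and the toy expression vectors are all assumptions.

```python
import random


def imbalance_ratio(n_minority, n_total):
    """Majority-to-minority ratio for a two-class problem."""
    return (n_total - n_minority) / n_minority


def synthetic_samples(minority, n_new, noise_std=0.05, n_shadow=3, seed=0):
    """Generate n_new synthetic points from minority-class points.

    Simplified illustration only: each synthetic point is a convex
    combination of n_shadow noisy copies of randomly chosen minority points.
    """
    rng = random.Random(seed)
    dim = len(minority[0])
    out = []
    for _ in range(n_new):
        # Noisy copies ("shadow" points) of randomly chosen minority cells.
        shadows = [
            [v + rng.gauss(0.0, noise_std) for v in rng.choice(minority)]
            for _ in range(n_shadow)
        ]
        # Random positive weights normalized to sum to 1.
        raw = [rng.random() + 1e-12 for _ in range(n_shadow)]
        total = sum(raw)
        weights = [r / total for r in raw]
        out.append([
            sum(w * s[j] for w, s in zip(weights, shadows))
            for j in range(dim)
        ])
    return out


# 17 rare nuclei out of 8,635, as stated for the first use case.
ratio = imbalance_ratio(17, 8635)

# Toy minority class with three made-up 4-gene expression vectors.
rare_cells = [[1.0, 0.0, 2.0, 0.5], [1.2, 0.1, 1.8, 0.4], [0.9, 0.0, 2.1, 0.6]]
new_cells = synthetic_samples(rare_cells, n_new=5)
```

With 17 rare nuclei among 8,635 the majority-to-minority ratio is 8,618 / 17, roughly 507, which matches the "∼1 to 500" figure in the abstract.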

Development of a Novel Deep Transfer Learning Framework to Characterize Inter- and Intra-Tumor Heterogeneity in Myeloma Patients

Blood ◽

10.1182/blood-2019-130452 ◽

2019 ◽

Vol 134 (Supplement_1) ◽

pp. 3075-3075

Author(s):

Travis S Johnson ◽

Christina Y Yu ◽

Chuanpeng Dong ◽

Tongxin Wang ◽

Mohammad Issam Abu Zaid ◽

...

Keyword(s):

Gene Expression ◽

Single Cell ◽

Gene Expression Profiling ◽

Transfer Learning ◽

Microarray Data ◽

Expression Profiling ◽

Research Funding ◽

Single Cells ◽

Learning Framework ◽

Patient Level

Background: Clonal heterogeneity is a known issue in multiple myeloma (MM) and the emergence of drug resistant clones is responsible for the incurability of the disease. Multiple studies of bulk CD138+ bone marrow samples have attempted to stratify MM patients into smaller, more distinct, patient risk groups based on molecular phenotypes. Recently, single cell RNA sequencing (scRNA-seq) technology has been applied in MM to identify cell clones. This leads to a new question: can we classify patients with scRNA-seq data guided by previously defined subtypes, and how do the single cell results correspond with the classification?

Methods: We developed a novel, deep transfer learning framework to predict MM patient subtypes in patients with scRNA-seq based on patient classifications from microarray data. While the problem of scRNA-seq batch corrections has been intensively studied using transfer learning, there has been less work on similar comparisons between scRNA-seq and patient-level data. To address this issue, we utilized domain adaptation, a specific transfer learning approach, to combine scRNA-seq profiles and patient-level microarray data using a multitask learning framework. Figure 1 illustrates our computational framework. Its aim is to classify both cells and patients (with scRNA-seq data) according to patient level classifications derived from previous gene expression profiling studies for MM. Specifically, we adopted the 10-subtype classifications derived from microarray data [1]. Patients with scRNA-seq were summarized into a single vector by averaging gene counts across all the cells. Gene expression profiling data (including scRNA-seq and microarray) for MM patients from multiple studies were input into the transfer learning network consisting of 5 hidden layers. The last hidden layer was used to calculate the maximum mean discrepancy (MMD) between the patients from scRNA-seq and microarray to integrate the datasets. The datasets in this study are summarized in Table 1. Two microarray datasets (GSE19784, GSE2658) and one scRNA-seq dataset (GSE117156) were obtained from NCBI Gene Expression Omnibus. IUSM data were locally generated. One microarray and one scRNA-seq dataset were used in training and testing. GSE19784 was split into 80% training and 20% testing. GSE117156, due to the smaller sample size (11 patients), was split into 90% training and 10% testing. We ran 20 rounds of random cross validation using TensorFlow on a GTX1080 GPU. The expression profiles of patients and single cells from all datasets (GSE19784, GSE117156, GSE2658, IUSM) were input into the trained model after each round of cross validation to produce low-dimensional representations and predictions for each training, testing, and validation sample.

Results: We found that our model was able to identify signals in the data based on expression profiles from patient-level and single cell data. The patient classification labels can be consistently reproduced in a held-out test set of patients as well as in a validation cohort of microarray data from 559 MM patients (GSE2658) and scRNA-seq from 4 MM patients from IUSM (Figure 2). These results show that the model can learn the subtypes across multiple datasets and platforms. The 4 IUSM patients tended to cluster similarly to their individual CD138+ cells after training, while GSE2658 patients still maintained some separation between MM subtype clusters (Figure 3). The single cells from our cohort of 4 patients did not necessarily classify to the same subtype as their patient.

Conclusions: We found that a domain adaptive classifier can be trained across scRNA-seq and bulk gene expression profiling data from MM patients to integrate data and transfer knowledge. These models showed that single cells within a patient do not necessarily match the patient level molecular characteristics. Not surprisingly, similar results have been found in other cancer types [2]. As our novel framework is further refined and more patients are sequenced, we expect more unique insights into both inter- and intra-tumor MM heterogeneity.

References:

1. Broyl A, Hose D, Lokhorst H, et al. Gene expression profiling for molecular classification of multiple myeloma in newly diagnosed patients. Blood. 2010;116(14):2543-2553.

2. Patel AP, Tirosh I, Trombetta JJ, et al. Single-cell RNA-seq highlights intratumoral heterogeneity in primary glioblastoma. Science. 2014;344(6190):1396-1401.

Disclosures: Abonour: Celgene: Consultancy, Research Funding; BMS: Consultancy; Takeda: Consultancy, Research Funding; Janssen: Consultancy, Research Funding. Roodman: Amgen: Membership on an entity's Board of Directors or advisory committees.
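
Two steps named in the Methods, summarizing a patient's single cells by averaging gene counts and measuring the maximum mean discrepancy (MMD) between two groups of samples, can be sketched in plain Python. This is an illustration under assumptions rather than the authors' TensorFlow code: the abstract does not state which kernel was used, so a Gaussian (RBF) kernel with an arbitrary `gamma` is assumed, the biased estimator of squared MMD is used for simplicity, and all function names and toy vectors are hypothetical.

```python
import math


def pseudo_bulk(cells):
    """Average gene counts across cells (rows) to get one vector per patient."""
    n = len(cells)
    return [sum(col) / n for col in zip(*cells)]


def rbf_kernel(x, y, gamma):
    """Gaussian (RBF) kernel; the kernel choice is an assumption."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def mmd_squared(xs, ys, gamma=1.0):
    """Biased estimate of squared MMD between two sets of vectors."""
    def mean_kernel(a, b):
        total = sum(rbf_kernel(p, q, gamma) for p in a for q in b)
        return total / (len(a) * len(b))
    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2 * mean_kernel(xs, ys)


# Toy data: two cells with two genes each, averaged into one patient vector.
patient_vector = pseudo_bulk([[1, 2], [3, 4]])

# Made-up hidden-layer representations for two platforms.
scrna_repr = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.1]]
array_repr = [[1.0, 1.1], [0.9, 1.0], [1.1, 0.9]]

same_platform = mmd_squared(scrna_repr, scrna_repr)
cross_platform = mmd_squared(scrna_repr, array_repr)
```

The estimate is zero when both sets are identical and grows as the two sets of representations drift apart, which is why a quantity of this kind can serve as a penalty that pulls two data sources together during training.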

A machine learning framework to quantify and assess the impact of COVID-19 on the power sector: An Indian context

Advances in Applied Energy ◽

10.1016/j.adapen.2021.100078 ◽

2021 ◽

pp. 100078

Author(s):

Manu Suvarna ◽

Apoorva Katragadda ◽

Ziying Sun ◽

Yun Bin Choh ◽

Qianyu Chen ◽

...

Keyword(s):

Machine Learning ◽

Power Sector ◽

Learning Framework ◽

Indian Context ◽

The Impact
