Decomposing cell identity for transfer learning across cellular measurements, platforms, tissues, and species

Mapping Intimacies ◽

10.1101/395004 ◽

2018 ◽

Cited By ~ 6

Author(s):

Genevieve L. Stein-O’Brien ◽

Brian S. Clark ◽

Thomas Sherman ◽

Cristina Zibetti ◽

Qiwen Hu ◽

...

Keyword(s):

Transfer Learning ◽

Large Scale ◽

A Priori ◽

Cell Types ◽

Neurosecretory Cells ◽

Specific Cell ◽

Cell Identity ◽

Molecular Features ◽

Species Specific ◽

Meaningful Relationships

ABSTRACT: New approaches are urgently needed to glean biological insights from the vast amounts of single cell RNA sequencing (scRNA-Seq) data now being generated. To this end, we propose that cell identity should map to a reduced set of factors which will describe both exclusive and shared biology of individual cells, and that the dimensions which contain these factors reflect biologically meaningful relationships across different platforms, tissues and species. To find a robust set of dependent factors in large-scale scRNA-Seq data, we developed a Bayesian non-negative matrix factorization (NMF) algorithm, scCoGAPS. Application of scCoGAPS to scRNA-Seq data obtained over the course of mouse retinal development identified gene expression signatures for factors associated with specific cell types and continuous biological processes. To test whether these signatures are shared across diverse cellular contexts, we developed projectR to map biologically disparate datasets into the factors learned by scCoGAPS. Because projection into these dimensions preserves relative distances between samples, biologically meaningful relationships/factors will stratify new data consistent with their underlying processes, allowing labels or information from one dataset to be used for annotation of the other (a machine learning concept called transfer learning). Using projectR, data from multiple datasets were used to annotate latent spaces and reveal novel parallels between developmental programs in other tissues, species and cellular assays. Using this approach, we are able to transfer cell type and state designations across datasets to rapidly annotate cellular features in a new dataset without a priori knowledge of their type, identify a species-specific signature of microglial cells, and identify a previously undescribed subpopulation of neurosecretory cells within the lung. Together, these algorithms define biologically meaningful dimensions of cellular identity, state, and trajectories that persist across technologies, molecular features, and species.
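The factor-then-project idea in this abstract can be illustrated with a minimal sketch. It assumes a plain multiplicative-update NMF in place of the Bayesian scCoGAPS sampler and ordinary least squares in place of whatever projectR does internally; the function names `nmf` and `project` are hypothetical and are not part of either package.

```python
import numpy as np

def nmf(X, k, n_iter=200, seed=0, eps=1e-9):
    """Plain multiplicative-update NMF: X (genes x cells) ~ A @ P.

    Stand-in only: this is NOT the Bayesian scCoGAPS algorithm.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_cells = X.shape
    A = rng.random((n_genes, k)) + eps   # gene weights per factor
    P = rng.random((k, n_cells)) + eps   # factor activity per cell
    for _ in range(n_iter):
        # Lee-Seung updates keep both matrices non-negative.
        P *= (A.T @ X) / (A.T @ A @ P + eps)
        A *= (X @ P.T) / (A @ P @ P.T + eps)
    return A, P

def project(A, X_new):
    """Least-squares projection of a new dataset onto learned gene weights A.

    Assumes rows of X_new are the same genes, in the same order, as rows of A.
    """
    P_new, *_ = np.linalg.lstsq(A, X_new, rcond=None)
    return P_new
```

In this sketch, labels attached to the factors in the source dataset could then be read off the projected activities `P_new` of the target dataset, which is the transfer-learning step the abstract describes.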


Histone Acetyltransferases and Stem Cell Identity

Cancers ◽

10.3390/cancers13102407 ◽

2021 ◽

Vol 13 (10) ◽

pp. 2407

Author(s):

Ruicen He ◽

Arthur Dantas ◽

Karl Riabowol

Keyword(s):

Stem Cells ◽

Stem Cell ◽

Cell Fate ◽

Epigenetic Modification ◽

Cell Types ◽

Histone Acetyltransferases ◽

Specific Cell ◽

Cell Fates ◽

Cell Identity ◽

Hematopoietic Stem

Acetylation of histones is a key epigenetic modification involved in transcriptional regulation. The addition of acetyl groups to histone tails generally reduces histone-DNA interactions in the nucleosome, leading to increased accessibility for transcription factors and core transcriptional machinery to bind their target sequences. There are approximately 30 histone acetyltransferases (HATs) and their corresponding complexes, each of which affects the expression of a subset of genes. Because cell identity is determined by gene expression profile, it is unsurprising that the HATs responsible for inducing expression of these genes play a crucial role in determining cell fate. Here, we explore the role of HATs in the maintenance and differentiation of various stem cell types. Several HAT complexes have been shown to play an important role in activating genes that allow stem cells to self-renew. Knockdown or loss of their activity leads to reduced expression and/or differentiation, while particular HATs drive differentiation towards specific cell fates. In this study, we review functions of the HAT complexes active in pluripotent stem cells, hematopoietic stem cells, muscle satellite cells, mesenchymal stem cells, neural stem cells, and cancer stem cells.


Role of Different Sponge Cell Types in Species Specific Cell Aggregation

Nature New Biology ◽

10.1038/newbio230126b0 ◽

1971 ◽

Vol 230 (12) ◽

pp. 126-128 ◽

Cited By ~ 12

Author(s):

H. A. JOHN ◽

M. S. CAMPO ◽

A. M. MACKENZIE ◽

R. B. KEMP

Keyword(s):

Cell Aggregation ◽

Cell Types ◽

Specific Cell ◽

Sponge Cell ◽

Species Specific


SCISSOR™: a single-cell inferred site-specific omics resource for tumor microenvironment association study

NAR Cancer ◽

10.1093/narcan/zcab037 ◽

2021 ◽

Vol 3 (3) ◽

Author(s):

Xiang Cui ◽

Fei Qin ◽

Xuanxuan Yu ◽

Feifei Xiao ◽

Guoshuai Cai

Keyword(s):

Tumor Microenvironment ◽

Single Cell ◽

Clinical Outcomes ◽

Large Scale ◽

Cell Types ◽

Cell Interaction ◽

Specific Cell ◽

Dynamic Visualization ◽

Tissue Specific ◽

Cell Composition

Abstract: Tumor tissues are heterogeneous, with different cell types in the tumor microenvironment that play an important role in tumorigenesis and tumor progression. Several computational algorithms and tools have been developed to infer the cell composition from bulk transcriptome profiles. However, they ignore tissue specificity, and thus a new resource for tissue-specific cell transcriptomic reference is needed for inferring cell composition in the tumor microenvironment and exploring its association with clinical outcomes and tumor omics. In this study, we developed SCISSOR™ (https://thecailab.com/scissor/), an online open resource to fulfill that demand by integrating five orthogonal omics data of >6031 large-scale bulk samples, patient clinical outcomes and 451,917 high-granularity tissue-specific single-cell transcriptomic profiles of 16 cancer types. SCISSOR™ provides five major analysis modules that enable flexible modeling with adjustable parameters and dynamic visualization approaches. SCISSOR™ is valuable as a new resource for promoting tumor heterogeneity and tumor–tumor microenvironment cell interaction research, by delineating cells in the tissue-specific tumor microenvironment and characterizing their associations with tumor omics and clinical outcomes.
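Inferring cell composition from a bulk profile, as mentioned above, is commonly framed as non-negative least squares against a cell-type reference. The sketch below is a generic illustration under that assumption, solved by projected gradient descent; it is not SCISSOR's implementation, and `estimate_fractions` is a hypothetical name.

```python
import numpy as np

def estimate_fractions(signature, bulk, n_iter=5000):
    """Estimate cell-type fractions f >= 0 such that signature @ f ~ bulk.

    signature: genes x cell_types reference profile (for example, a
    tissue-specific single-cell average); bulk: one bulk expression vector.
    """
    S = np.asarray(signature, dtype=float)
    b = np.asarray(bulk, dtype=float)
    # Step size 1/L, where L is the largest eigenvalue of S^T S.
    step = 1.0 / np.linalg.norm(S.T @ S, 2)
    f = np.full(S.shape[1], 1.0 / S.shape[1])
    for _ in range(n_iter):
        grad = S.T @ (S @ f - b)
        f = np.maximum(f - step * grad, 0.0)  # project back onto f >= 0
    total = f.sum()
    return f / total if total > 0 else f      # normalise to proportions
```

The choice of reference matters here: the abstract's argument is that a signature matrix built from the matching tissue should give more reliable fractions than a generic one.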


Coexpression reveals conserved mechanisms of transcriptional cell identity

10.1101/2020.11.10.375758 ◽

2020 ◽

Cited By ~ 1

Author(s):

Megan Crow ◽

Hamsini Suresh ◽

John Lee ◽

Jesse Gillis

Keyword(s):

Gene Regulation ◽

Cell Types ◽

Regulatory Evolution ◽

Closely Related Species ◽

Cell Identity ◽

Gene Coexpression ◽

Evolution Of Gene Regulation ◽

Orthology Prediction ◽

Species Specific ◽

Coexpression Networks

ABSTRACT: What makes a mouse a mouse, and not a hamster? The answer lies in the genome, and more specifically, in differences in gene regulation between the two organisms: where and when each gene is expressed. To quantify differences, a typical study will either compare functional genomics data from homologous tissues, limiting the approach to closely related species; or compare gene repertoires, limiting the resolution of the analysis to gross correlations between phenotypes and gene family size. As an alternative, gene coexpression networks provide a basis for studying the evolution of gene regulation without these constraints. By incorporating data from hundreds of independent experiments, meta-analytic coexpression networks reflect the convergent output of species-specific transcriptional regulation. In this work, we develop a measure of regulatory evolution based on gene coexpression. Comparing data from 14 species, we quantify the conservation of coexpression patterns 1) as a function of evolutionary time, 2) across orthology prediction algorithms, and 3) with reference to cell- and tissue-specificity. Strikingly, we uncover deeply conserved patterns of gradient-like expression across cell types from both the animal and plant kingdoms. These results suggest that ancient genes contribute to transcriptional cell identity through mechanisms that are independent of duplication and divergence.
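One simple way to compare coexpression between two species is to correlate each gene's coexpression profile across the two networks. The sketch below illustrates that idea only; it is not the measure defined in the paper, and it assumes the rows of both matrices are already matched 1:1 orthologs. The function names are hypothetical.

```python
import numpy as np

def coexpression(expr):
    """Gene-by-gene Pearson correlation; expr is genes x samples."""
    return np.corrcoef(expr)

def conservation_scores(expr_a, expr_b):
    """Per-gene similarity of coexpression profiles between two species.

    Assumes row i of both inputs is the same 1:1 ortholog.
    """
    net_a, net_b = coexpression(expr_a), coexpression(expr_b)
    n = net_a.shape[0]
    scores = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i  # drop the gene's self-correlation
        scores[i] = np.corrcoef(net_a[i, mask], net_b[i, mask])[0, 1]
    return scores
```

A score near 1 means a gene keeps the same coexpression partners in both species; scores near 0 suggest its regulatory context has diverged.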


Bulk and Single-Cell Transcriptomics Identify Tobacco-Use Disparity in Lung Gene Expression of ACE2, the Receptor of 2019-nCov

10.20944/preprints202002.0051.v2 ◽

2020 ◽

Cited By ~ 6

Author(s):

Guoshuai Cai

Keyword(s):

Gene Expression ◽

Single Cell ◽

Large Scale ◽

Cell Types ◽

Smoking History ◽

Normal Lung ◽

Specific Cell ◽

Susceptible Population ◽

Former Smokers ◽

Ace2 Gene

In the current severe global emergency of the 2019-nCov outbreak, it is imperative to identify vulnerable and susceptible groups for effective protection and care. Recently, studies found that 2019-nCov and SARS-nCov share the same receptor, ACE2. In this study, we analyzed four large-scale bulk transcriptomic datasets of normal lung tissue and two single-cell transcriptomic datasets to investigate the disparities related to race, age, gender and smoking status in ACE2 gene expression and its distribution among cell types. We did not find significant disparities in ACE2 gene expression between racial groups (Asian vs Caucasian), age groups (>60 vs <60) or gender groups (male vs female). However, we observed significantly higher ACE2 gene expression in the lungs of former smokers compared to those of non-smokers. Also, we found higher ACE2 gene expression in Asian current smokers compared to non-smokers but not in Caucasian current smokers, which may indicate the existence of a gene-smoking interaction. In addition, we found that the ACE2 gene is expressed in specific cell types related to smoking history and location. In bronchial epithelium, ACE2 is actively expressed in goblet cells of current smokers and club cells of non-smokers. In alveoli, ACE2 is actively expressed in remodelled AT2 cells of former smokers. Together, this study indicates that smokers, especially former smokers, may be more susceptible to 2019-nCov and have infection paths different from those of non-smokers. Thus, smoking history may provide valuable information for identifying susceptible populations and standardizing treatment regimens.


Altered cell and RNA isoform diversity in aging Down syndrome brains

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.2114326118 ◽

2021 ◽

Vol 118 (47) ◽

pp. e2114326118

Author(s):

Carter R. Palmer ◽

Christine S. Liu ◽

William J. Romanow ◽

Ming-Hsiang Lee ◽

Jerold Chun

Keyword(s):

Down Syndrome ◽

Large Scale ◽

Cell Types ◽

Chromosome 21 ◽

Specific Cell ◽

Sequencing Technologies ◽

Isoform Diversity ◽

Long Read ◽

Single Nucleus ◽

Altered Cell

Down syndrome (DS), trisomy of human chromosome 21 (HSA21), is characterized by lifelong cognitive impairments and the development of the neuropathological hallmarks of Alzheimer’s disease (AD). The cellular and molecular modifications responsible for these effects are not understood. Here we performed single-nucleus RNA sequencing (snRNA-seq) employing both short- (Illumina) and long-read (Pacific Biosciences) sequencing technologies on a total of 29 DS and non-DS control prefrontal cortex samples. In DS, the ratio of inhibitory-to-excitatory neurons was significantly increased, which was not observed in previous reports examining sporadic AD. DS microglial transcriptomes displayed AD-related aging and activation signatures in advance of AD neuropathology, with increased microglial expression of C1q complement genes (associated with dendritic pruning) and the HSA21 transcription factor gene RUNX1. Long-read sequencing detected vast RNA isoform diversity within and among specific cell types, including numerous sequences that differed between DS and control brains. Notably, over 8,000 genes produced RNAs containing intra-exonic junctions, including amyloid precursor protein (APP) that had previously been associated with somatic gene recombination. These and related results illuminate large-scale cellular and transcriptomic alterations as features of the aging DS brain.


Building an RNA Sequencing Transcriptome of the Central Nervous System

The Neuroscientist ◽

10.1177/1073858415610541 ◽

2016 ◽

Vol 22 (6) ◽

pp. 579-592 ◽

Cited By ~ 12

Author(s):

Xiaomin Dong ◽

Yanan You ◽

Jia Qian Wu

Keyword(s):

Gene Expression ◽

Central Nervous System ◽

Nervous System ◽

Rna Sequencing ◽

Large Scale ◽

Expression Profiles ◽

Cell Types ◽

Specific Cell ◽

Rna Seq ◽

The Central Nervous System

The composition and function of the central nervous system (CNS) are extremely complex. In addition to hundreds of subtypes of neurons, other cell types, including glia (astrocytes, oligodendrocytes, and microglia) and vascular cells (endothelial cells and pericytes), also play important roles in CNS function. Such heterogeneity makes the study of gene transcription in the CNS challenging. Transcriptomic studies, namely the analyses of the expression levels and structures of all genes, are essential for interpreting the functional elements and understanding the molecular constituents of the CNS. Microarrays have been a predominant method for large-scale gene expression profiling in the past. However, RNA-sequencing (RNA-Seq) technology developed in recent years has many advantages over microarrays, and has enabled building more quantitative, accurate, and comprehensive transcriptomes of the CNS and other systems. The discovery of novel genes, diverse alternative splicing events, and noncoding RNAs has remarkably expanded the complexity of gene expression profiles and will help us to understand intricate neural circuits. Here, we discuss the procedures and advantages of RNA-Seq technology in mammalian CNS transcriptome construction, and review the approaches of sample collection as well as recent progress in building RNA-Seq-based transcriptomes from tissue samples and specific cell types.


Large-scale determination and characterization of cell type-specific regulatory elements in the human genome

10.1101/176602 ◽

2017 ◽

Author(s):

Can Wang ◽

Shihua Zhang

Keyword(s):

Histone Modifications ◽

Large Scale ◽

Chromatin Modification ◽

Cell Types ◽

Regulatory Elements ◽

Cell Type ◽

Cell Identity ◽

Functional Roles ◽

Cancer Occurrence ◽

Cell Type Specific

Abstract: Histone modifications have been widely shown to play vital roles in gene regulation and cell identity. The Roadmap Epigenomics Consortium generated a reference catalogue of several key histone modifications across more than 100 human cell types and tissues. Decoding these epigenomes into functional regulatory elements is a challenging task in computational biology. To this end, we adopted a differential chromatin modification analysis framework to comprehensively determine and characterize cell type-specific regulatory elements (CSREs) and their histone modification codes in the human epigenomes of five histone modifications across 127 tissues or cell types. The CSREs show significant relevance to cell type-specific biological functions, diseases, and cell identity. Clustering of CSREs with their specificity signals reveals diverse histone codes, demonstrating the diversity of functional roles of CSREs within the same cell or tissue. Last but not least, dynamics of CSREs from closely related cell types or tissues can give a detailed view of developmental processes such as normal tissue development and cancer occurrence.


ExpressHeart: Web Portal to Visualize Transcriptome Profiles of Non-Cardiomyocyte Cells

International Journal of Molecular Sciences ◽

10.3390/ijms22168943 ◽

2021 ◽

Vol 22 (16) ◽

pp. 8943

Author(s):

Gang Li ◽

Changfei Luan ◽

Yanhan Dong ◽

Yifang Xie ◽

Scott C. Zentz ◽

...

Keyword(s):

Heart Development ◽

Heart Diseases ◽

Cell Types ◽

Marker Genes ◽

Specific Marker ◽

Cell Type ◽

Web Portal ◽

Molecular Features ◽

Species Specific ◽

Combine Information

Unveiling the molecular features in the heart is essential for the study of heart diseases. Non-cardiomyocytes (nonCMs) play critical roles in providing structural and mechanical support to the working myocardium. There is an increasing amount of single-cell RNA-sequencing (scRNA-seq) data characterizing the transcriptomic profiles of nonCM cells. However, no tool allows researchers to easily access the information. Thus, in this study, we develop an open-access web portal, ExpressHeart, to visualize scRNA-seq data of nonCMs from five laboratories encompassing three species. ExpressHeart enables comprehensive visualization of major cell types and subtypes in each study; visualizes gene expression in each cell type/subtype in various ways; and facilitates identifying cell-type-specific and species-specific marker genes. ExpressHeart also provides an interface to directly combine information across datasets, for example, generating lists of high-confidence differentially expressed genes (DEGs) by taking the intersection across different datasets. Moreover, ExpressHeart performs comparisons across datasets. We show that some homologous genes (e.g., Mmp14 in mice and mmp14b in zebrafish) are expressed in different cell types between mice and zebrafish, suggesting different functions across species. We expect ExpressHeart to serve as a valuable portal for investigators, shedding light on the roles of genes in heart development in nonCM cells.
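The intersection step mentioned above can be sketched in a few lines. This is an illustration of the idea only, not ExpressHeart's code; the function name and the dataset labels in the usage lines are hypothetical, and the sketch assumes gene identifiers have already been harmonised across datasets (cross-species comparisons would need an ortholog mapping first).

```python
def high_confidence_degs(deg_lists):
    """Return genes reported as differentially expressed in every dataset.

    deg_lists maps a dataset label to an iterable of gene identifiers.
    """
    gene_sets = [set(genes) for genes in deg_lists.values()]
    if not gene_sets:
        return []
    # Keep only genes present in all datasets, in a stable sorted order.
    return sorted(set.intersection(*gene_sets))

# Hypothetical usage with made-up dataset labels.
shared = high_confidence_degs({
    "lab_A": ["Col1a1", "Postn", "Mmp14"],
    "lab_B": ["Postn", "Mmp14", "Tcf21"],
})
```

Taking the intersection trades sensitivity for confidence: a gene missed by one dataset is dropped even if the others agree on it.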


CAGEd-oPOSSUM: motif enrichment analysis from CAGE-derived TSSs

10.1101/040667 ◽

2016 ◽

Cited By ~ 1

Author(s):

David J. Arenillas ◽

Alistair R.R. Forrest ◽

Hideya Kawaji ◽

Timo Lassman ◽

Wyeth W. Wasserman ◽

...

Keyword(s):

Large Scale ◽

Enrichment Analysis ◽

Cell Types ◽

Specific Cell ◽

Data Sets ◽

Transcription Start Sites ◽

Supplementary Material ◽

Supplementary Text ◽

Cap Analysis ◽

Genomic Regions

Abstract. Summary: With the emergence of large-scale Cap Analysis of Gene Expression (CAGE) data sets from individual labs and the FANTOM consortium, one can now analyze the cis-regulatory regions associated with gene transcription at an unprecedented level of refinement. By coupling transcription factor binding site (TFBS) enrichment analysis with CAGE-derived genomic regions, CAGEd-oPOSSUM can identify TFs that act as key regulators of genes involved in specific mammalian cell and tissue types. The web tool allows for the analysis of CAGE-derived transcription start sites (TSSs) either provided by the user or selected from ~1,300 mammalian samples from the FANTOM5 project, with pre-computed TFBSs predicted using JASPAR TF binding profiles. The tool helps power insights into the regulation of genes through the study of the specific usage of TSSs within specific cell types and/or under specific conditions. Availability and implementation: The CAGEd-oPOSSUM web tool is implemented in Perl, MySQL, and Apache and is available at http://cagedop.cmmt.ubc.ca/CAGEd_oPOSSUM. Supporting Information: Supplementary Text, Figures, and Data are available online at bioRxiv.
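TFBS enrichment of the kind described above is often assessed with an over-representation test that compares motif hits in target regions against a background set. The sketch below uses a one-sided hypergeometric (Fisher exact) p-value as a generic example; the statistics the web tool itself reports may differ, and `enrichment_p_value` is a hypothetical name.

```python
from math import comb

def enrichment_p_value(target_hits, target_total, background_hits, background_total):
    """One-sided hypergeometric p-value for motif over-representation.

    Returns the probability of seeing at least target_hits regions with a
    TFBS among target_total target regions, given the pooled hit count.
    """
    n_regions = target_total + background_total   # all regions
    n_hits = target_hits + background_hits        # all regions with a hit
    upper = min(n_hits, target_total)
    tail = sum(
        comb(n_hits, k) * comb(n_regions - n_hits, target_total - k)
        for k in range(target_hits, upper + 1)
    )
    return tail / comb(n_regions, target_total)
```

A small p-value indicates that the motif occurs in the CAGE-derived target regions more often than the background rate would predict; in practice the result would also need correction for testing many TF profiles.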
