scAlign: a tool for alignment, integration and rare cell identification from scRNA-seq data

2018
Author(s):
Nelson Johansen
Gerald Quon

Abstract: scRNA-seq dataset integration occurs in different contexts, such as the identification of cell type-specific differences in gene expression across conditions or species, or batch effect correction. We present scAlign, an unsupervised deep learning method for data integration that can incorporate partial, overlapping or a complete set of cell labels, and estimate per-cell differences in gene expression across datasets. scAlign performance is state-of-the-art and robust to cross-dataset variation in cell type-specific expression and cell type composition. We demonstrate that scAlign identifies a rare cell population likely to drive malaria transmission. Our framework is widely applicable to integration challenges in other domains.

2020
Author(s):
Devanshi Patel
Xiaoling Zhang
John J. Farrell
Jaeyoon Chung
Thor D. Stein
...

Abstract: Because regulation of gene expression is heritable and context-dependent, we investigated Alzheimer's disease (AD)-related gene expression patterns in specific cell types in blood and brain. Cis-expression quantitative trait locus (eQTL) mapping was performed genome-wide in blood from 5,257 Framingham Heart Study (FHS) participants and in brain tissue donated by 475 Religious Orders Study/Memory & Aging Project (ROSMAP) participants. The association of gene expression with genotypes for all cis SNPs within 1 Mb of each gene was evaluated using linear regression models for unrelated subjects and linear mixed models for related subjects. Cell type-specific eQTL (ct-eQTL) models included an interaction term for the expression of “proxy” genes that discriminate particular cell types. Ct-eQTL analysis identified 11,649 and 2,533 additional significant gene-SNP eQTL pairs in brain and blood, respectively, that were not detected in generic eQTL analysis. Of note, 386 unique target eGenes of significant eQTLs shared between blood and brain were enriched in apoptosis and Wnt signaling pathways. Five of these shared genes are established AD loci. The potential importance and relevance to AD of significant results in myeloid cell types is supported by two observations: a large portion of genome-wide significant (GWS) ct-eQTLs map within 1 Mb of established AD loci, and 58% (23/40) of the most significant eGenes in these eQTLs have previously been implicated in AD. This study identified cell type-specific expression patterns for established and potentially novel AD genes, found additional evidence for the role of myeloid cells in AD risk, and discovered potential novel blood and brain AD biomarkers that highlight the importance of cell type-specific analysis.
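The ct-eQTL model described above can be written out for a single gene-SNP pair. The form below is an assumption on our part: the abstract does not list the covariates or the genotype coding. Here $E_i$ is the gene's expression in subject $i$, $G_i$ is the SNP genotype dosage (0, 1 or 2), $P_i$ is the expression of the cell-type proxy gene, and $\mathbf{c}_i$ is a hypothetical vector of covariates. For related subjects, a random effect $u_i$ with kinship matrix $\mathbf{K}$ is included. For unrelated subjects it is dropped, which gives ordinary linear regression.

$$
E_i = \beta_0 + \beta_G\,G_i + \beta_P\,P_i + \beta_{GP}\,(G_i \times P_i) + \boldsymbol{\gamma}^{\top}\mathbf{c}_i + u_i + \varepsilon_i,
\qquad \mathbf{u} \sim \mathcal{N}(\mathbf{0}, \sigma_u^2 \mathbf{K}),\quad \varepsilon_i \sim \mathcal{N}(0, \sigma_\varepsilon^2)
$$

A gene-SNP pair is a ct-eQTL candidate when the interaction is significant, that is, when $H_0\!: \beta_{GP} = 0$ is rejected. In that case the SNP's effect on expression, $\beta_G + \beta_{GP} P_i$, changes with the abundance of the cell type that the proxy gene marks.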


2002
Vol 324 (2)
pp. 101-104
Author(s):
Yoshiteru Urai
Osamu Jinnouchi
Kyung Tak Kwak
Atsuhiko Suzue
Shinji Nagahiro
...

2018
Author(s):
Ken Jean-Baptiste
José L. McFaline-Figueroa
Cristina M. Alexandre
Michael W. Dorrity
Lauren Saunders
...

Abstract: Single-cell RNA-seq can yield high-resolution cell-type-specific expression signatures that reveal new cell types and the developmental trajectories of cell lineages. Here, we apply this approach to A. thaliana root cells to capture gene expression in 3,121 root cells. We analyze these data with Monocle 3, which orders single-cell transcriptomes in an unsupervised manner and uses machine learning to reconstruct single-cell developmental trajectories along pseudotime. We identify hundreds of genes with cell-type-specific expression, and pseudotime analysis of several cell lineages reveals both known and novel genes that are expressed along a developmental trajectory. We identify transcription factor motifs that are enriched in early and late cells, together with the corresponding candidate transcription factors that likely drive the observed expression patterns. We assess and interpret changes in total RNA expression along developmental trajectories and show that trajectory branch points mark developmental decisions. Finally, by applying heat stress to whole seedlings, we address the longstanding question of whether cell types differ in their response to an abiotic stress. Although the response of canonical heat shock genes dominates expression across cell types, subtle but significant differences in other genes can be detected among cell types. Taken together, our results demonstrate that single-cell transcriptomics holds promise for studying plant development and plant physiology with unprecedented resolution.


2020
Author(s):
Abolfazl Doostparast Torshizi
Jubao Duan
Kai Wang

Abstract: The importance of cell type-specific gene expression in disease-relevant tissues is increasingly recognized in genetic studies of complex diseases. However, the vast majority of gene expression studies are conducted on bulk tissues, which calls for computational approaches that can infer how specific cell types contribute to disease. Several computational methods are available for cell type deconvolution (that is, inference of cellular composition) from bulk RNA-seq data, but they cannot impute cell type-specific expression profiles. We hypothesize that, given external prior information such as single-cell RNA-seq (scRNA-seq) and population-wide expression profiles, estimating both cellular composition and cell type-specific expression from bulk RNA-seq data becomes computationally tractable and identifiable. Here we introduce CellR, which addresses cross-individual gene expression variation by employing genome-wide tissue-wise expression signatures from GTEx to adjust the weights of cell-specific gene markers. It then transforms the deconvolution problem into a linear programming model that takes inter- and intra-cellular correlations into account, and uses a multivariate stochastic search algorithm to estimate the expression level of each gene in each cell type. Extensive analyses of several complex diseases, such as schizophrenia, Alzheimer’s disease, Huntington’s disease, and type 2 diabetes, validated the efficiency of CellR while revealing how specific cell types contribute to different diseases. We conducted numerical simulations on human cerebellum to generate pseudo-bulk RNA-seq data and demonstrated CellR's efficiency in inferring cell-specific expression profiles. Moreover, we inferred cell-specific expression levels from bulk RNA-seq data on schizophrenia and computed differentially expressed genes within certain cell types.
Using the predicted gene expression profiles of excitatory neurons, we reproduced our recently published finding that TCF4 is a master regulator in schizophrenia, and showed how this gene and its targets are enriched in excitatory neurons. In summary, CellR compares favorably against competing approaches in inferring cellular composition from bulk RNA-seq data, in both accuracy and stability of inference. It also allows direct imputation of cell type-specific gene expression, opening new doors to re-analyzing gene expression data from bulk tissues in complex diseases.
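The deconvolution step that CellR and the other methods here build on can be illustrated with the usual mixture model: bulk expression ≈ signature matrix × cell-type proportions. CellR itself uses a linear programming model with a multivariate stochastic search. The sketch below instead substitutes non-negative least squares (NNLS), a common baseline, solved by cyclic coordinate descent. It is a minimal illustration under that assumption and is not CellR's algorithm. The function name `estimate_proportions` and the signature and bulk values are hypothetical.

```python
def estimate_proportions(signature, bulk, sweeps=500):
    """Estimate cell-type proportions p >= 0 minimizing ||signature @ p - bulk||^2.

    signature: genes x cell_types list of lists (hypothetical reference profiles)
    bulk: list of bulk expression values, one per gene
    Uses cyclic coordinate descent for non-negative least squares (NNLS),
    a swapped-in technique, not CellR's linear-programming formulation.
    """
    n_genes, k = len(signature), len(signature[0])
    # Normal-equation terms: A = S^T S (k x k) and c = S^T b (length k)
    A = [[sum(signature[g][i] * signature[g][j] for g in range(n_genes))
          for j in range(k)] for i in range(k)]
    c = [sum(signature[g][i] * bulk[g] for g in range(n_genes)) for i in range(k)]
    p = [0.0] * k
    for _ in range(sweeps):
        for j in range(k):
            if A[j][j] == 0:
                continue  # a cell type with an all-zero profile cannot be estimated
            # Gradient of 0.5 * ||S p - b||^2 with respect to p_j
            grad_j = sum(A[j][i] * p[i] for i in range(k)) - c[j]
            # Exact minimizer along coordinate j, clipped at zero
            p[j] = max(0.0, p[j] - grad_j / A[j][j])
    total = sum(p)
    if total == 0:
        raise ValueError("all estimated abundances are zero")
    # Rescale abundances so they sum to 1 and read as proportions
    return [x / total for x in p]


# Hypothetical 3-gene x 2-cell-type signature; the bulk sample is a 30/70 mixture
signature = [[10.0, 1.0], [1.0, 8.0], [5.0, 5.0]]
bulk = [3.7, 5.9, 5.0]
print(estimate_proportions(signature, bulk))  # close to [0.3, 0.7]
```

This sketch assumes the signature matrix is known. It only recovers composition. Imputing cell type-specific expression, as CellR does, needs the additional prior information the abstract describes.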


2020
Author(s):
Chong Jin
Mengjie Chen
Danyu Lin
Wei Sun

Abstract: Most tissue samples are composed of different cell types. Differential expression analysis that does not account for cell type composition cannot separate changes due to cell type composition from changes in cell type-specific expression. We propose a new framework to address these limitations: Cell Type Aware analysis of RNA-seq (CARseq). After evaluating its performance in simulations, we apply CARseq to compare gene expression in schizophrenia and autism subjects versus controls. Our results show that these two neurodevelopmental disorders differ from each other in terms of cell type composition changes and in differential expression associated with different types of neurotransmitter receptors. We also discover overlapping signals of differential expression in microglia, supporting the similarity of the two diseases through immune regulation.


2003
Vol 2 (3)
pp. 627-637
Author(s):
Mineko Maeda
Haruyo Sakamoto
Negin Iranfar
Danny Fuller
Toshinari Maruo
...

Abstract: We used microarrays carrying most of the genes that are developmentally regulated in Dictyostelium to discover those that are preferentially expressed in prestalk cells. Prestalk cells are localized at the front of slugs and play crucial roles in morphogenesis and slug migration. Using whole-mount in situ hybridization, we were able to verify 104 prestalk genes. Three of these were found to be expressed only in cells at the very front of slugs, the PstA cell type. Another 10 genes were found to be expressed in the small number of cells that form a central core at the anterior, the PstAB cell type. The rest of the prestalk-specific genes are expressed in PstO cells, which lie immediately posterior to PstA cells but anterior to the remaining 80% of the slug, which consists of prespore cells. Half of these are also expressed in PstA cells. At later stages of development, the expression patterns of a considerable number of these prestalk genes change significantly, allowing us to further subdivide them. Some are expressed at much higher levels during culmination, while others are repressed. These results demonstrate the extremely dynamic nature of cell-type-specific expression in Dictyostelium and further define the changing physiology of the cell types. One of the signals that affect gene expression in PstO cells is the hexaphenone DIF-1. We found that expression of about half of the PstO-specific genes was affected in a mutant that is unable to synthesize DIF-1, while the rest appeared to be DIF-independent. These results indicate that some aspects of PstO cell differentiation can occur in the absence of DIF-1.


2018
Author(s):
Kai Kang
Qian Meng
Igor Shats
David M. Umbach
Melissa Li
...

Abstract: The cell type composition of many biological tissues varies widely across samples. Such sample heterogeneity hampers efforts to probe the role of each cell type in the tissue microenvironment. Current approaches that address this issue have drawbacks. Cell sorting and single-cell experimental techniques disrupt in situ interactions and alter the physiological status of cells in tissues. Computational methods are flexible and promising, but they often estimate either the sample-specific proportions of each cell type or the cell-type-specific gene expression profiles, not both, because they require the other as input. We introduce CDSeq, a computational Complete Deconvolution method that uses bulk RNA-seq data only to estimate sample-specific proportions of each cell type and cell-type-specific gene expression profiles simultaneously. We assessed our method’s performance using several synthetic and experimental mixtures of varied but known cell type composition and compared it with two state-of-the-art deconvolution methods on the same mixtures. The results showed that CDSeq can estimate both the sample-specific proportions of each component cell type and the cell-type-specific gene expression profiles with high accuracy. CDSeq holds promise for computationally deciphering complex mixtures of cell types, each with differing expression profiles, using RNA-seq data measured in bulk tissue (MATLAB code is available at https://github.com/kkang7/CDSeq_011).
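"Complete" deconvolution estimates both factors of the mixture model from the bulk matrix alone. The abstract does not describe CDSeq's own statistical model. The sketch below swaps in non-negative matrix factorization (NMF) with Lee-Seung multiplicative updates, a different technique that shares the goal: factor a genes × samples matrix into cell-type profiles (W) and per-sample abundances (H). It is an illustration only, not CDSeq's method. All function names and the example matrix are hypothetical. NMF factors are identifiable only up to scaling and permutation, so the columns still need to be matched to cell types afterwards, for example with marker genes.

```python
import random


def matmul(A, B):
    """Plain-Python matrix product of A (n x t) and B (t x m)."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def frobenius_error(V, W, H):
    """Squared Frobenius norm of V - W H."""
    WH = matmul(W, H)
    return sum((V[i][j] - WH[i][j]) ** 2 for i in range(len(V)) for j in range(len(V[0])))


def nmf(V, k, iters=300, seed=0, eps=1e-12):
    """Factor nonnegative V (genes x samples) as W (genes x k) @ H (k x samples)
    using Lee-Seung multiplicative updates, which do not increase ||V - WH||_F^2."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)] for a in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)] for i in range(n)]
    return W, H


def profiles_and_proportions(W, H):
    """Rescale each column of W to sum to 1 (a relative expression profile),
    move that scale into H, then normalize each sample's column to proportions."""
    k, m = len(H), len(H[0])
    col_sums = [sum(row[a] for row in W) for a in range(k)]
    profiles = [[row[a] / col_sums[a] for a in range(k)] for row in W]
    scaled = [[H[a][j] * col_sums[a] for j in range(m)] for a in range(k)]
    proportions = []
    for j in range(m):
        total = sum(scaled[a][j] for a in range(k))
        proportions.append([scaled[a][j] / total for a in range(k)])
    return profiles, proportions


# Hypothetical bulk matrix: 4 genes x 3 samples, mixed from two cell types
V = [[2.8, 5.5, 9.1], [6.6, 4.5, 1.7], [5.0, 5.0, 5.0], [0.8, 1.25, 1.85]]
W, H = nmf(V, k=2)
profiles, proportions = profiles_and_proportions(W, H)
print(frobenius_error(V, W, H))
print(proportions)
```

The proportions this sketch returns describe each cell type's share of the modeled expression signal, not cell counts. Converting between the two would need cell-size or total-RNA assumptions that this illustration does not make.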

