bayNorm: Bayesian gene expression recovery, imputation and normalisation for single cell RNA-sequencing data

Mapping Intimacies ◽

10.1101/384586 ◽

2018 ◽

Cited By ~ 7

Author(s):

Wenhao Tang ◽

François Bertaux ◽

Philipp Thomas ◽

Claire Stefanelli ◽

Malika Saint ◽

...

Keyword(s):

Gene Expression ◽

Single Cell ◽

Rna Sequencing ◽

Single Molecule ◽

Empirical Bayes ◽

Missing Values ◽

Likelihood Function ◽

Differential Expression Analysis ◽

Sequencing Data ◽

Single Cell Rna Sequencing

Normalisation of single cell RNA sequencing (scRNA-seq) data is a prerequisite to their interpretation. The marked technical variability and high amounts of missing observations typical of scRNA-seq datasets make this task particularly challenging. Here, we introduce bayNorm, a novel Bayesian approach for scaling and inference of scRNA-seq counts. The method’s likelihood function follows a binomial model of mRNA capture, while priors are estimated from expression values across cells using an empirical Bayes approach. We demonstrate using publicly-available scRNA-seq datasets and simulated expression data that bayNorm allows robust imputation of missing values generating realistic transcript distributions that match single molecule FISH measurements. Moreover, by using priors informed by dataset structures, bayNorm improves accuracy and sensitivity of differential expression analysis and reduces batch effect compared to other existing methods. Altogether, bayNorm provides an efficient, integrated solution for global scaling normalisation, imputation and true count recovery of gene expression measurements from scRNA-seq data.
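The binomial capture model described in the abstract above can be illustrated with a small numerical sketch. It assumes, purely for illustration, a negative binomial prior on the true transcript count (the abstract does not name the prior family) and a known capture efficiency `beta`; the function names and parameter values are hypothetical and are not taken from the bayNorm package.

```python
import math

def log_nb_pmf(n, mu, size):
    # Log-probability of n under a negative binomial prior with mean mu
    # and dispersion parameter size (assumed prior family, for illustration).
    return (math.lgamma(n + size) - math.lgamma(size) - math.lgamma(n + 1)
            + size * math.log(size / (size + mu))
            + n * math.log(mu / (size + mu)))

def log_binom_pmf(x, n, beta):
    # Log-probability of observing x captured molecules out of n true
    # molecules when each is captured independently with probability beta.
    return (math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
            + x * math.log(beta) + (n - x) * math.log(1 - beta))

def posterior_true_count(x, beta, mu, size, n_max=500):
    # Posterior P(n | x) on the grid n = x..n_max, normalised numerically.
    support = range(x, n_max + 1)
    log_w = [log_nb_pmf(n, mu, size) + log_binom_pmf(x, n, beta) for n in support]
    shift = max(log_w)  # subtract the maximum to avoid underflow
    weights = [math.exp(v - shift) for v in log_w]
    total = sum(weights)
    return {n: w / total for n, w in zip(support, weights)}

# Hypothetical example: a zero observed count with 10% capture efficiency.
post = posterior_true_count(x=0, beta=0.1, mu=20.0, size=2.0)
post_mean = sum(n * p for n, p in post.items())
```

Under these assumptions, an observed zero does not force the inferred true count to zero: the posterior spreads mass over positive counts in proportion to the prior, which is the sense in which such a model can impute missing values.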

bayNorm: Bayesian gene expression recovery, imputation and normalization for single-cell RNA-sequencing data

Bioinformatics ◽

10.1093/bioinformatics/btz726 ◽

2019 ◽

Cited By ~ 2

Author(s):

Wenhao Tang ◽

François Bertaux ◽

Philipp Thomas ◽

Claire Stefanelli ◽

Malika Saint ◽

...

Keyword(s):

Gene Expression ◽

Single Cell ◽

Rna Sequencing ◽

Empirical Bayes ◽

Missing Values ◽

Likelihood Function ◽

Differential Expression Analysis ◽

Batch Effect ◽

Supplementary Information ◽

Single Cell Rna Sequencing

Motivation: Normalization of single-cell RNA-sequencing (scRNA-seq) data is a prerequisite to their interpretation. The marked technical variability, high amounts of missing observations and batch effect typical of scRNA-seq datasets make this task particularly challenging. There is a need for an efficient and unified approach for normalization, imputation and batch effect correction. Results: Here, we introduce bayNorm, a novel Bayesian approach for scaling and inference of scRNA-seq counts. The method’s likelihood function follows a binomial model of mRNA capture, while priors are estimated from expression values across cells using an empirical Bayes approach. We first validate our assumptions by showing this model can reproduce different statistics observed in real scRNA-seq data. We demonstrate using publicly available scRNA-seq datasets and simulated expression data that bayNorm allows robust imputation of missing values generating realistic transcript distributions that match single molecule fluorescence in situ hybridization measurements. Moreover, by using priors informed by dataset structures, bayNorm improves accuracy and sensitivity of differential expression analysis and reduces batch effect compared with other existing methods. Altogether, bayNorm provides an efficient, integrated solution for global scaling normalization, imputation and true count recovery of gene expression measurements from scRNA-seq data. Availability and implementation: The R package ‘bayNorm’ is published on Bioconductor at https://bioconductor.org/packages/release/bioc/html/bayNorm.html. The code for analyzing data in this article is available at https://github.com/WT215/bayNorm_papercode. Supplementary information: Supplementary data are available at Bioinformatics online.
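The empirical Bayes step mentioned in the abstract above, estimating prior parameters from expression values across cells, can be sketched with a method-of-moments estimator. This is a simplified stand-in and not the estimator implemented in the bayNorm package: it assumes a negative binomial prior with mean `mu` and dispersion `size`, and it relies on the fact that binomial thinning of a negative binomial count with a common capture efficiency leaves `size` unchanged. The function name and the example values are hypothetical.

```python
from statistics import mean, variance

def estimate_nb_prior(counts, betas):
    # counts: observed counts of one gene across cells.
    # betas:  assumed capture efficiency of each cell.
    # Prior mean: total observed count divided by total capture efficiency.
    mu = sum(counts) / sum(betas)
    # Dispersion from the excess of the sample variance over the sample mean.
    # This moment relation holds in expectation only when all betas are equal.
    m = mean(counts)
    v = variance(counts)
    excess = v - m
    size = m * m / excess if excess > 0 else float("inf")
    return mu, size

# Hypothetical gene observed in five cells with 10% capture efficiency each.
mu_hat, size_hat = estimate_nb_prior([0, 2, 4, 6, 8], [0.1] * 5)
```

An infinite `size` corresponds to a Poisson prior, which is what the moment estimator returns when the observed counts show no overdispersion.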

Accounting for technical noise in differential expression analysis of single-cell RNA sequencing data

Nucleic Acids Research ◽

10.1093/nar/gkx754 ◽

2017 ◽

Vol 45 (19) ◽

pp. 10978-10988 ◽

Cited By ~ 26

Author(s):

Cheng Jia ◽

Yu Hu ◽

Derek Kelly ◽

Junhyong Kim ◽

Mingyao Li ◽

...

Keyword(s):

Single Cell ◽

Differential Expression ◽

Rna Sequencing ◽

Expression Analysis ◽

Differential Expression Analysis ◽

Sequencing Data ◽

Technical Noise ◽

Single Cell Rna Sequencing

Single-Cell Transcriptome Analysis Reveals Dynamic Cell Populations and Differential Gene Expression Patterns in Control and Aneurysmal Human Aortic Tissue

Circulation ◽

10.1161/circulationaha.120.046528 ◽

2020 ◽

Vol 142 (14) ◽

pp. 1374-1388

Author(s):

Yanming Li ◽

Pingping Ren ◽

Ashley Dawson ◽

Hernan G. Vasquez ◽

Waleed Ageedi ◽

...

Keyword(s):

Gene Expression ◽

Single Cell ◽

Rna Sequencing ◽

Aortic Wall ◽

Genome Wide Association ◽

Aortic Tissue ◽

Sequencing Data ◽

Genome Wide ◽

Single Cell Rna Sequencing ◽

Differential Gene

Background: Ascending thoracic aortic aneurysm (ATAA) is caused by the progressive weakening and dilatation of the aortic wall and can lead to aortic dissection, rupture, and other life-threatening complications. To improve our understanding of ATAA pathogenesis, we aimed to comprehensively characterize the cellular composition of the ascending aortic wall and to identify molecular alterations in each cell population of human ATAA tissues. Methods: We performed single-cell RNA sequencing analysis of ascending aortic tissues from 11 study participants, including 8 patients with ATAA (4 women and 4 men) and 3 control subjects (2 women and 1 man). Cells extracted from aortic tissue were analyzed and categorized with single-cell RNA sequencing data to perform cluster identification. ATAA-related changes were then examined by comparing the proportions of each cell type and the gene expression profiles between ATAA and control tissues. We also examined which genes may be critical for ATAA by performing the integrative analysis of our single-cell RNA sequencing data with publicly available data from genome-wide association studies. Results: We identified 11 major cell types in human ascending aortic tissue; the high-resolution reclustering of these cells further divided them into 40 subtypes. Multiple subtypes were observed for smooth muscle cells, macrophages, and T lymphocytes, suggesting that these cells have multiple functional populations in the aortic wall. In general, ATAA tissues had fewer nonimmune cells and more immune cells, especially T lymphocytes, than control tissues did. Differential gene expression data suggested the presence of extensive mitochondrial dysfunction in ATAA tissues. In addition, integrative analysis of our single-cell RNA sequencing data with public genome-wide association study data and promoter capture Hi-C data suggested that the erythroblast transformation-specific related gene (ERG) plays an important role in maintaining normal aortic wall function. Conclusions: Our study provides a comprehensive evaluation of the cellular composition of the ascending aortic wall and reveals how the gene expression landscape is altered in human ATAA tissue. The information from this study makes important contributions to our understanding of ATAA formation and progression.

Lung transplantation for patients with severe COVID-19

Science Translational Medicine ◽

10.1126/scitranslmed.abe4282 ◽

2020 ◽

Vol 12 (574) ◽

pp. eabe4282 ◽

Cited By ~ 1

Author(s):

Ankit Bharat ◽

Melissa Querrey ◽

Nikolay S. Markov ◽

Samuel Kim ◽

Chitaru Kurihara ◽

...

Keyword(s):

Respiratory Failure ◽

Pulmonary Fibrosis ◽

Lung Transplantation ◽

Single Cell ◽

Rna Sequencing ◽

Lung Tissue ◽

Single Molecule ◽

Sequencing Data ◽

Native Lung ◽

Single Cell Rna Sequencing

Lung transplantation can potentially be a life-saving treatment for patients with nonresolving COVID-19–associated respiratory failure. Concerns limiting lung transplantation include recurrence of SARS-CoV-2 infection in the allograft, technical challenges imposed by viral-mediated injury to the native lung, and the potential risk for allograft infection by pathogens causing ventilator-associated pneumonia in the native lung. Additionally, the native lung might recover, resulting in long-term outcomes preferable to those of transplant. Here, we report the results of lung transplantation in three patients with nonresolving COVID-19–associated respiratory failure. We performed single-molecule fluorescence in situ hybridization (smFISH) to detect both positive and negative strands of SARS-CoV-2 RNA in explanted lung tissue from the three patients and in additional control lung tissue samples. We conducted extracellular matrix imaging and single-cell RNA sequencing on explanted lung tissue from the three patients who underwent transplantation and on warm postmortem lung biopsies from two patients who had died from COVID-19–associated pneumonia. Lungs from these five patients with prolonged COVID-19 disease were free of SARS-CoV-2 as detected by smFISH, but pathology showed extensive evidence of injury and fibrosis that resembled end-stage pulmonary fibrosis. Using machine learning, we compared single-cell RNA sequencing data from the lungs of patients with late-stage COVID-19 to that from the lungs of patients with pulmonary fibrosis and identified similarities in gene expression across cell lineages. Our findings suggest that some patients with severe COVID-19 develop fibrotic lung disease for which lung transplantation is their only option for survival.

Differential gene expression analysis in single-cell RNA sequencing data

2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) ◽

10.1109/bibm.2017.8217650 ◽

2017 ◽

Author(s):

Tianyu Wang ◽

Sheida Nabavi

Keyword(s):

Gene Expression ◽

Single Cell ◽

Rna Sequencing ◽

Differential Gene Expression ◽

Expression Analysis ◽

Gene Expression Analysis ◽

Sequencing Data ◽

Differential Gene Expression Analysis ◽

Single Cell Rna Sequencing ◽

Differential Gene

Inferring the kinetics of stochastic gene expression from single-cell RNA-sequencing data

Genome Biology ◽

10.1186/gb-2013-14-1-r7 ◽

2013 ◽

Vol 14 (1) ◽

pp. R7 ◽

Cited By ~ 100

Author(s):

Jong Kim ◽

John C Marioni

Keyword(s):

Gene Expression ◽

Single Cell ◽

Rna Sequencing ◽

Stochastic Gene Expression ◽

Sequencing Data ◽

Single Cell Rna Sequencing ◽

Kinetics Of

Abstract 4689: Subclone-specific evolution of tumor phenotypes – A framework to study subclone-specific gene expression from a combination of bulk DNA and single cell RNA sequencing data

10.1158/1538-7445.sabcs18-4689 ◽

2019 ◽

Author(s):

Yi Qiao ◽

Xiaomeng Huang ◽

Samuel Brady ◽

Andrea Bild ◽

David Bowtell ◽

...

Keyword(s):

Gene Expression ◽

Single Cell ◽

Rna Sequencing ◽

Specific Gene ◽

Sequencing Data ◽

Specific Gene Expression ◽

Single Cell Rna Sequencing ◽

Tumor Phenotypes

DoubletFinder: Doublet detection in single-cell RNA sequencing data using artificial nearest neighbors

10.1101/352484 ◽

2018 ◽

Cited By ~ 17

Author(s):

Christopher S. McGinnis ◽

Lyndsay M. Murrow ◽

Zev J. Gartner

Keyword(s):

Gene Expression ◽

Single Cell ◽

Rna Sequencing ◽

False Negative ◽

Droplet Microfluidics ◽

Cell Capture ◽

Sequencing Data ◽

Putative Gene ◽

Detection Tool ◽

Single Cell Rna Sequencing

Summary: Single-cell RNA sequencing (scRNA-seq) using droplet microfluidics occasionally produces transcriptome data representing more than one cell. These technical artifacts are caused by cell doublets formed during cell capture and occur at a frequency proportional to the total number of sequenced cells. The presence of doublets can lead to spurious biological conclusions, which justifies the practice of sequencing fewer cells to limit doublet formation rates. Here, we present a computational doublet detection tool, DoubletFinder, that identifies doublets based solely on gene expression features. DoubletFinder infers the putative gene expression profile of real doublets by generating artificial doublets from existing scRNA-seq data. Neighborhood detection in gene expression space then identifies sequenced cells with increased probability of being doublets based on their proximity to artificial doublets. DoubletFinder robustly identifies doublets across scRNA-seq datasets with variable numbers of cells and sequencing depth, and predicts false-negative and false-positive doublets defined using conventional barcoding approaches. We anticipate that DoubletFinder will aid in scRNA-seq data analysis and will increase the throughput and accuracy of scRNA-seq experiments.
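The two steps described in the summary above, generating artificial doublets and scoring real cells by their proximity to them, can be sketched as follows. This is a minimal illustration in plain Python, not the DoubletFinder implementation: it assumes that an artificial doublet is the average of two randomly chosen real profiles and that proximity is measured by Euclidean distance on raw profiles. The function names and toy data are hypothetical.

```python
import random

def simulate_doublets(cells, n_doublets, rng):
    # Build artificial doublets by averaging random pairs of real profiles.
    doublets = []
    for _ in range(n_doublets):
        a, b = rng.sample(cells, 2)
        doublets.append([(u + v) / 2 for u, v in zip(a, b)])
    return doublets

def artificial_neighbor_fraction(cells, doublets, k):
    # For each real cell, return the fraction of its k nearest neighbours
    # (excluding itself) that are artificial doublets.
    points = [(c, False) for c in cells] + [(d, True) for d in doublets]
    scores = []
    for i, cell in enumerate(cells):
        dists = []
        for j, (profile, is_artificial) in enumerate(points):
            if j == i:
                continue  # skip the cell itself
            d2 = sum((u - v) ** 2 for u, v in zip(cell, profile))
            dists.append((d2, is_artificial))
        dists.sort(key=lambda t: t[0])  # stable sort on squared distance
        scores.append(sum(flag for _, flag in dists[:k]) / k)
    return scores

# Toy data: two cell types plus one profile lying halfway between them.
rng = random.Random(0)
toy_cells = [[0.0, 0.0], [0.2, 0.1], [10.0, 10.0], [9.8, 10.1], [5.0, 5.0]]
toy_doublets = simulate_doublets(toy_cells, 20, rng)
toy_scores = artificial_neighbor_fraction(toy_cells, toy_doublets, k=3)
```

Cells with a high score sit in regions of expression space that are populated mainly by artificial doublets, which makes them candidates for removal; the threshold on the score is left to the user in this sketch.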

SPsimSeq: semi-parametric simulation of bulk and single cell RNA sequencing data

10.1101/677740 ◽

2019 ◽

Cited By ~ 1

Author(s):

Alemu Takele Assefa ◽

Jo Vandesompele ◽

Olivier Thas

Keyword(s):

Gene Expression ◽

Single Cell ◽

Rna Sequencing ◽

Empirical Distribution ◽

Supplementary Information ◽

Rna Seq ◽

Sequencing Data ◽

Actual Distribution ◽

Wide Range ◽

Single Cell Rna Sequencing

Summary: SPsimSeq is a semi-parametric simulation method for bulk and single cell RNA sequencing data. It simulates data from a good estimate of the actual distribution of a given real RNA-seq dataset. In contrast to existing approaches that assume a particular data distribution, our method constructs an empirical distribution of gene expression data from a given source RNA-seq experiment to faithfully capture the data characteristics of real data. Importantly, our method can be used to simulate a wide range of scenarios, such as single or multiple biological groups, systematic variations (e.g. confounding batch effects), and different sample sizes. It can also be used to simulate different gene expression units resulting from different library preparation protocols, such as read counts or UMI counts. Availability and implementation: The R package and associated documentation are available from https://github.com/CenterForStatistics-UGent/SPsimSeq. Supplementary information: Supplementary data are available at bioRxiv online.
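The idea of simulating from an empirical distribution rather than from an assumed parametric family can be sketched with a histogram and inverse-transform sampling. This is a deliberately simplified stand-in, not the density estimator used by SPsimSeq: the fixed-width bins, the function names and the toy values are all assumptions made for illustration.

```python
import bisect
import random

def fit_histogram(values, n_bins):
    # Summarise the source data as fixed-width bins and their cumulative
    # frequencies. Returns (lowest value, bin width, cumulative distribution).
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 if all values are equal
    counts = [0] * n_bins
    for v in values:
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    cdf, running = [], 0
    for c in counts:
        running += c
        cdf.append(running / len(values))
    return lo, width, cdf

def sample_histogram(model, n, rng):
    # Inverse-transform sampling: pick a bin according to the cumulative
    # distribution, then draw uniformly within that bin.
    lo, width, cdf = model
    draws = []
    for _ in range(n):
        idx = min(bisect.bisect_left(cdf, rng.random()), len(cdf) - 1)
        draws.append(lo + (idx + rng.random()) * width)
    return draws

# Hypothetical log-expression values of one gene in a source dataset.
source = [1.0, 2.0, 2.5, 3.0, 9.0]
model = fit_histogram(source, n_bins=4)
simulated = sample_histogram(model, 200, random.Random(1))
```

Because bins that are empty in the source data receive no probability mass, the simulated values follow the shape of the observed data, including gaps and multiple modes, without committing to a named distribution.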

G2S3: a gene graph-based imputation method for single-cell RNA sequencing data

10.1101/2020.04.01.020586 ◽

2020 ◽

Author(s):

Weimiao Wu ◽

Qile Dai ◽

Yunqing Liu ◽

Xiting Yan ◽

Zuoheng Wang

Keyword(s):

Gene Expression ◽

Single Cell ◽

Rna Sequencing ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Sequencing Data ◽

High Data ◽

Study Gene Expression ◽

Single Cell Rna Sequencing ◽

Novel Method

Abstract: Single-cell RNA sequencing provides an opportunity to study gene expression at single-cell resolution. However, prevalent dropout events result in high data sparsity and noise that may obscure downstream analyses. We propose a novel method, G2S3, that imputes dropouts by borrowing information from adjacent genes in a sparse gene graph learned from gene expression profiles across cells. We applied G2S3 and other existing methods to seven single-cell datasets to compare their performance. Our results demonstrated that G2S3 is superior in recovering true expression levels, identifying cell subtypes, improving differential expression analyses, and recovering gene regulatory relationships, especially for mildly expressed genes.
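Borrowing information from adjacent genes in a gene graph can be sketched as a weighted average over graph neighbours. The sketch below assumes the graph is already given as a non-negative adjacency matrix; learning the sparse graph from the data, which the abstract above describes as part of G2S3, is not shown. The function name, the `self_weight` parameter and the toy matrices are hypothetical.

```python
def impute_with_gene_graph(expr, adjacency, self_weight=0.5):
    # expr[g][c]: expression of gene g in cell c.
    # adjacency[g][h]: non-negative edge weight between genes g and h.
    # Each value becomes a mix of the gene's own value and the weighted
    # average of its neighbours' values in the same cell.
    n_genes = len(expr)
    n_cells = len(expr[0])
    imputed = []
    for g in range(n_genes):
        degree = sum(adjacency[g])
        row = []
        for c in range(n_cells):
            if degree == 0:
                row.append(expr[g][c])  # isolated gene: leave unchanged
                continue
            neighbour_avg = sum(
                adjacency[g][h] * expr[h][c] for h in range(n_genes)
            ) / degree
            row.append(self_weight * expr[g][c]
                       + (1 - self_weight) * neighbour_avg)
        imputed.append(row)
    return imputed

# Toy example: genes 0 and 1 are connected, gene 2 is isolated.
toy_expr = [[0, 4], [2, 4], [9, 9]]
toy_graph = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
toy_imputed = impute_with_gene_graph(toy_expr, toy_graph)
```

In the toy example the zero recorded for gene 0 in the first cell is pulled towards the value of its neighbour, gene 1, while the isolated gene 2 is left untouched.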
