mRIN for direct assessment of genome-wide and gene-specific mRNA integrity from large-scale RNA-sequencing data

Huijuan Feng; Xuegong Zhang; Chaolin Zhang

doi:10.1038/ncomms8816

CaSpER: Identification, visualization and integrative analysis of CNV events in multiscale resolution using single-cell or bulk RNA sequencing data

10.1101/426122 ◽

2018 ◽

Cited By ~ 1

Author(s):

Akdes Serin Harmancı ◽

Arif O. Harmanci ◽

Xiaobo Zhou

Keyword(s):

Single Cell ◽

RNA Sequencing ◽

Copy Number ◽

Large Scale ◽

Complete Characterization ◽

Integrative Analysis ◽

Sequencing Data ◽

Length Scales ◽

Expression Levels ◽

Genome Wide

Abstract: RNA sequencing experiments generate large amounts of information about expression levels of genes. Although they are mainly used for quantifying expression levels, they contain much more biologically important information such as copy number variants (CNVs). Here, we propose CaSpER, a signal processing approach for identification, visualization, and integrative analysis of focal and large-scale CNV events in multiscale resolution using either bulk or single-cell RNA sequencing data. CaSpER smooths the genome-wide RNA sequencing signal profiles at multiple resolutions, identifying CNV events at different length scales. CaSpER also employs a novel methodology for generating a genome-wide B-allele frequency (BAF) signal profile from the reads and uses it in a multiscale fashion to correct CNV calls. The shift in allelic signal is used to quantify the loss of heterozygosity (LOH), which is valuable for CNV identification. CaSpER uses Hidden Markov Models (HMMs) to assign copy number states to regions. The multiscale nature of CaSpER enables comprehensive analysis of focal and large-scale CNVs and LOH segments. CaSpER performs well in accuracy compared to gold-standard SNP genotyping arrays. In particular, analysis of single-cell glioblastoma (GBM) RNA sequencing data with CaSpER reveals novel mutually exclusive and co-occurring CNV sub-clones at different length scales. Moreover, CaSpER discovers gene expression signatures of CNV sub-clones, performs gene ontology (GO) enrichment analysis, and identifies potential therapeutic targets for the sub-clones. CaSpER increases the utility of RNA-sequencing datasets and complements other tools for complete characterization and visualization of the genomic and transcriptomic landscape of single-cell and bulk RNA sequencing data, especially in cancer research.
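As a rough illustration of the multiscale smoothing idea in the abstract above, the sketch below runs a sliding median over a toy expression signal at several window sizes. The choice of a median filter, the window sizes, the toy data, and the names `median_smooth` and `multiscale_smooth` are all assumptions made for illustration; this is not CaSpER's implementation.

```python
from statistics import median

def median_smooth(signal, window):
    """Sliding-median smoothing with an odd window; edges use a truncated window."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    return [median(signal[max(0, i - half): i + half + 1])
            for i in range(len(signal))]

def multiscale_smooth(signal, windows=(3, 5, 9)):
    """Return one smoothed copy of the signal per scale (window sizes are hypothetical)."""
    return {w: median_smooth(signal, w) for w in windows}

# Toy signal along the genome: baseline 0, a one-gene spike, and a five-gene shifted block
toy = [0, 0, 4, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0]
scales = multiscale_smooth(toy)

# At the smallest scale the isolated spike is removed while the five-gene block survives
print(scales[3])
```

Comparing the smoothed copies across window sizes is one simple way to separate isolated outliers from events that span several neighbouring genes.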


CaSpER identifies and visualizes CNV events by integrative analysis of single-cell or bulk RNA-sequencing data

Nature Communications ◽

10.1038/s41467-019-13779-x ◽

2020 ◽

Vol 11 (1) ◽

Cited By ~ 5

Author(s):

Akdes Serin Harmanci ◽

Arif O. Harmanci ◽

Xiaobo Zhou

Keyword(s):

Single Cell ◽

RNA Sequencing ◽

Large Scale ◽

Copy Number Variants ◽

Complete Characterization ◽

Integrative Analysis ◽

Sequencing Data ◽

Expression Levels ◽

Genome Wide ◽

A Genome

Abstract: RNA sequencing experiments generate large amounts of information about expression levels of genes. Although they are mainly used for quantifying expression levels, they contain much more biologically important information such as copy number variants (CNVs). Here, we present CaSpER, a signal processing approach for identification, visualization, and integrative analysis of focal and large-scale CNV events in multiscale resolution using either bulk or single-cell RNA sequencing data. CaSpER integrates the multiscale smoothing of expression signal and allelic shift signals for CNV calling. The allelic shift signal measures the loss of heterozygosity (LOH), which is valuable for CNV identification. CaSpER employs an efficient methodology for generating a genome-wide B-allele frequency (BAF) signal profile from the reads and uses it to correct CNV calls. CaSpER increases the utility of RNA-sequencing datasets and complements other tools for complete characterization and visualization of the genomic and transcriptomic landscape of single-cell and bulk RNA sequencing data.


LSTrAP-Crowd: Prediction of novel components of bacterial ribosomes with crowd-sourced analysis of RNA sequencing data

10.1101/2020.04.20.005249 ◽

2020 ◽

Author(s):

Benedict Hew ◽

Qiao Wen Tan ◽

William Goh ◽

Jonathan Wei Xiong Ng ◽

Kenny Koh ◽

...

Keyword(s):

Gene Expression ◽

Protein Synthesis ◽

RNA Sequencing ◽

Gene Expression Data ◽

Large Scale ◽

Bacterial Resistance ◽

Expression Data ◽

Sequencing Data ◽

Novel Proteins ◽

Novel Antibiotics

Abstract: Bacterial resistance to antibiotics is a growing problem that is projected to cause more deaths than cancer in 2050. Consequently, novel antibiotics are urgently needed. Since more than half of the available antibiotics target the bacterial ribosome, proteins involved in protein synthesis are prime targets for the development of novel antibiotics. However, experimental identification of these potential antibiotic target proteins can be labor-intensive and challenging, as these proteins are likely to be poorly characterized and specific to few bacteria. To identify these novel proteins, we established a Large-Scale Transcriptomic Analysis Pipeline in Crowd (LSTrAP-Crowd), in which 285 individuals processed 26 terabytes of RNA-sequencing data from the 17 most notorious bacterial pathogens. In total, the crowd processed 26,269 RNA-seq experiments and used the data to construct gene co-expression networks, which were used to identify more than a hundred uncharacterized genes that were transcriptionally associated with protein synthesis. We provide the identity of these genes together with the processed gene expression data. The data can be used to identify other vulnerabilities of bacteria, while our approach demonstrates how the processing of gene expression data can be easily crowdsourced.
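The co-expression step described above can be pictured with a small "guilt by association" sketch: genes whose expression profiles correlate strongly with a known protein-synthesis gene become candidates. The Pearson measure, the 0.9 threshold, the toy expression values, the gene labels, and the function names are assumptions for illustration only; the abstract does not specify how LSTrAP-Crowd builds its networks.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def coexpressed_with(expr, bait_genes, threshold=0.9):
    """Return non-bait genes whose profile correlates with any bait gene at or above threshold."""
    hits = set()
    for gene, profile in expr.items():
        if gene in bait_genes:
            continue
        if any(pearson(profile, expr[b]) >= threshold for b in bait_genes):
            hits.add(gene)
    return hits

# Hypothetical expression matrix: gene -> values across four samples
expr = {
    "rpsA":  [1.0, 2.0, 3.0, 4.0],   # bait: a ribosomal protein gene
    "geneX": [2.1, 4.0, 6.2, 7.9],   # uncharacterized, tracks the bait
    "geneY": [4.0, 1.0, 3.5, 1.2],   # uncharacterized, unrelated to the bait
}
candidates = coexpressed_with(expr, {"rpsA"})
print(candidates)
```

In practice the threshold and the correlation measure would need to be tuned to the number of samples and the noise level of the data.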


Single-Cell Transcriptome Analysis Reveals Dynamic Cell Populations and Differential Gene Expression Patterns in Control and Aneurysmal Human Aortic Tissue

Circulation ◽

10.1161/circulationaha.120.046528 ◽

2020 ◽

Vol 142 (14) ◽

pp. 1374-1388

Author(s):

Yanming Li ◽

Pingping Ren ◽

Ashley Dawson ◽

Hernan G. Vasquez ◽

Waleed Ageedi ◽

...

Keyword(s):

Gene Expression ◽

Single Cell ◽

RNA Sequencing ◽

Aortic Wall ◽

Genome Wide Association ◽

Aortic Tissue ◽

Sequencing Data ◽

Genome Wide ◽

Single Cell RNA Sequencing ◽

Differential Gene

Background: Ascending thoracic aortic aneurysm (ATAA) is caused by the progressive weakening and dilatation of the aortic wall and can lead to aortic dissection, rupture, and other life-threatening complications. To improve our understanding of ATAA pathogenesis, we aimed to comprehensively characterize the cellular composition of the ascending aortic wall and to identify molecular alterations in each cell population of human ATAA tissues. Methods: We performed single-cell RNA sequencing analysis of ascending aortic tissues from 11 study participants, including 8 patients with ATAA (4 women and 4 men) and 3 control subjects (2 women and 1 man). Cells extracted from aortic tissue were analyzed and categorized with single-cell RNA sequencing data to perform cluster identification. ATAA-related changes were then examined by comparing the proportions of each cell type and the gene expression profiles between ATAA and control tissues. We also examined which genes may be critical for ATAA by performing the integrative analysis of our single-cell RNA sequencing data with publicly available data from genome-wide association studies. Results: We identified 11 major cell types in human ascending aortic tissue; the high-resolution reclustering of these cells further divided them into 40 subtypes. Multiple subtypes were observed for smooth muscle cells, macrophages, and T lymphocytes, suggesting that these cells have multiple functional populations in the aortic wall. In general, ATAA tissues had fewer nonimmune cells and more immune cells, especially T lymphocytes, than control tissues did. Differential gene expression data suggested the presence of extensive mitochondrial dysfunction in ATAA tissues. In addition, integrative analysis of our single-cell RNA sequencing data with public genome-wide association study data and promoter capture Hi-C data suggested that the erythroblast transformation-specific related gene (ERG) plays an important role in maintaining normal aortic wall function. Conclusions: Our study provides a comprehensive evaluation of the cellular composition of the ascending aortic wall and reveals how the gene expression landscape is altered in human ATAA tissue. The information from this study makes important contributions to our understanding of ATAA formation and progression.


Genome-Wide Identification and Characterization of Long Non-Coding RNAs in Peanut

Genes ◽

10.3390/genes10070536 ◽

2019 ◽

Vol 10 (7) ◽

pp. 536 ◽

Cited By ~ 2

Author(s):

Xiaobo Zhao ◽

Liming Gan ◽

Caixia Yan ◽

Chunjuan Li ◽

Quanxi Sun ◽

...

Keyword(s):

Large Scale ◽

Target Genes ◽

Sequencing Data ◽

Regulatory Processes ◽

Genome Wide ◽

Non Coding RNAs ◽

Identification And Characterization ◽

Lower Expression ◽

Weighted Correlation

Long non-coding RNAs (lncRNAs) are involved in various regulatory processes although they do not encode protein. Presently, there is little information regarding the identification of lncRNAs in peanut (Arachis hypogaea Linn.). In this study, 50,873 lncRNAs of peanut were identified from large-scale published RNA sequencing data that belonged to 124 samples involving 15 different tissues. The average lengths of lncRNA and mRNA were 4335 bp and 954 bp, respectively. Compared to the mRNAs, the lncRNAs were shorter, with fewer exons and lower expression levels. The 4713 co-expression lncRNAs (expressed in all samples) were used to construct co-expression networks by using the weighted correlation network analysis (WGCNA). LncRNAs correlating with the growth and development of different peanut tissues were obtained, and target genes for 386 hub lncRNAs of all lncRNAs co-expressions were predicted. Taken together, these findings can provide a comprehensive identification of lncRNAs in peanut.


A map of tumor–host interactions in glioma at single-cell resolution

GigaScience ◽

10.1093/gigascience/giaa109 ◽

2020 ◽

Vol 9 (10) ◽

Cited By ~ 3

Author(s):

Francesca Pia Caruso ◽

Luciano Garofano ◽

Fulvio D'Angelo ◽

Kai Yu ◽

Fuchou Tang ◽

...

Keyword(s):

Single Cell ◽

RNA Sequencing ◽

Cross Talk ◽

Large Scale ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Sequencing Data ◽

Host Interaction ◽

Receptor Interactions ◽

Single Cell RNA Sequencing

Background: Single-cell RNA sequencing is the reference technique for characterizing the heterogeneity of the tumor microenvironment. The composition of the various cell types making up the microenvironment can significantly affect the way in which the immune system activates cancer rejection mechanisms. Understanding the cross-talk signals between immune cells and cancer cells is of fundamental importance for the identification of immuno-oncology therapeutic targets. Results: We present a novel method, single-cell Tumor–Host Interaction tool (scTHI), to identify significantly activated ligand–receptor interactions across clusters of cells from single-cell RNA sequencing data. We apply our approach to uncover the ligand–receptor interactions in glioma using 6 publicly available human glioma datasets encompassing 57,060 gene expression profiles from 71 patients. By leveraging this large-scale collection, we show that unexpected cross-talk partners are highly conserved across different datasets in the majority of the tumor samples. This suggests that shared cross-talk mechanisms exist in glioma. Conclusions: Our results provide a complete map of the active tumor–host interaction pairs in glioma that can be therapeutically exploited to reduce the immunosuppressive action of the microenvironment in brain tumors.


In silico genome wide mining of conserved and novel miRNAs in the brain and pineal gland of Danio rerio using small RNA sequencing data

Genomics Data ◽

10.1016/j.gdata.2015.11.013 ◽

2016 ◽

Vol 7 ◽

pp. 46-53 ◽

Cited By ~ 3

Author(s):

Suyash Agarwal ◽

Naresh Sahebrao Nagpure ◽

Prachi Srivastava ◽

Basdeo Kushwaha ◽

Ravindra Kumar ◽

...

Keyword(s):

Pineal Gland ◽

RNA Sequencing ◽

Danio Rerio ◽

Small RNA ◽

In Silico ◽

Small RNA Sequencing ◽

Sequencing Data ◽

Genome Wide ◽

The Brain


SSCC: a novel computational framework for rapid and accurate clustering large single cell RNA-seq data

10.1101/344242 ◽

2018 ◽

Cited By ~ 2

Author(s):

Xianwen Ren ◽

Liangtao Zheng ◽

Zemin Zhang

Keyword(s):

Single Cell ◽

RNA Sequencing ◽

Large Scale ◽

Random Projection ◽

RNA Seq ◽

Sequencing Data ◽

Computational Framework ◽

Human Blood Cells ◽

Single Cell RNA Sequencing ◽

Data Volume

Abstract: Clustering is a prevalent means of analyzing single-cell RNA sequencing data, but the rapidly expanding data volume can make this process computationally challenging. New methods for accurate and efficient clustering are urgently needed. Here we propose a new clustering framework based on random projection and feature construction for large-scale single-cell RNA sequencing data, which greatly improves clustering accuracy, robustness, and computational efficiency for various state-of-the-art algorithms benchmarked on multiple real datasets. On a dataset of 68,578 human blood cells, our method achieved a 20% improvement in clustering accuracy and a 50-fold acceleration while using only 66% of the memory required by the widely used software package SC3. Compared to k-means, the accuracy improvement can reach 3-fold, depending on the dataset. An R implementation of the framework is available from https://github.com/Japrin/sscClust.
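The abstract names random projection as one ingredient of the framework. A minimal sketch of a Gaussian random projection is given below, assuming a dense cells-by-genes matrix held as nested lists; the function names, the seed, and the toy matrix are hypothetical, and this is not the sscClust implementation (which is written in R).

```python
import random
from math import sqrt

def random_projection_matrix(n_features, n_components, seed=0):
    """Build a genes x components matrix of N(0, 1/n_components) entries."""
    rng = random.Random(seed)
    scale = 1.0 / sqrt(n_components)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n_components)]
            for _ in range(n_features)]

def project(cells, matrix):
    """Multiply a cells x genes matrix by a genes x components matrix."""
    n_components = len(matrix[0])
    return [[sum(value * matrix[g][k] for g, value in enumerate(cell))
             for k in range(n_components)]
            for cell in cells]

# Toy data: three cells measured over five genes, reduced to two dimensions
cells = [
    [5.0, 0.0, 1.0, 0.0, 2.0],
    [4.0, 0.0, 2.0, 0.0, 1.0],
    [0.0, 6.0, 0.0, 3.0, 0.0],
]
matrix = random_projection_matrix(n_features=5, n_components=2, seed=1)
reduced = project(cells, matrix)
print(len(reduced), len(reduced[0]))
```

Any clustering algorithm can then be run on `reduced` instead of the full matrix, which is where the speed and memory savings of such a scheme would come from.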


A bivariate zero-inflated negative binomial model for identifying underlying dependence with application to single cell RNA sequencing data

10.1101/2020.03.06.977728 ◽

2020 ◽

Author(s):

Hunyong Cho ◽

Chuwen Liu ◽

John S. Preisser ◽

Di Wu

Keyword(s):

Single Cell ◽

RNA Sequencing ◽

Latent Variable ◽

Large Scale ◽

Negative Binomial ◽

Model Fitting ◽

Sequencing Data ◽

Excess Zeros ◽

Binomial Distributions ◽

Single Cell RNA Sequencing

Summary: Measuring gene-gene dependence in single cell RNA sequencing (scRNA-seq) count data is often of interest and remains challenging, because an unidentified portion of the zero counts represent non-detected RNA due to technical reasons. Conventional statistical methods that fail to account for technical zeros incorrectly measure the dependence among genes. To address this problem, we propose a bivariate zero-inflated negative binomial (BZINB) model constructed using a bivariate Poisson-gamma mixture with dropout indicators for the technical (excess) zeros. Parameters are estimated based on the EM algorithm and are used to measure the underlying dependence by decomposing the two sources of zeros. Compared to existing models, the proposed BZINB model is specifically designed for estimating dependence and is more flexible, while preserving the marginal zero-inflated negative binomial distributions. Additionally, it has a simple latent variable framework, allowing parameters to have clear and intuitive interpretations, and its computation is feasible with large scale data. Using a recent scRNA-seq dataset, we illustrate model fitting and how the model-based measures can be different from naive measures. The inferential ability of the proposed model is evaluated in a simulation study. An R package ‘bzinb’ is available on CRAN.
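To see why technical zeros distort a naive dependence measure, the sketch below simulates two genes from a Poisson-gamma mixture that share one latent gamma component, then zeroes out counts at random. The parameter values, the independent 30% dropout per gene, and all function names are assumptions for illustration; the simulation does not reproduce the BZINB parameterization or the `bzinb` package.

```python
import random
from math import exp, sqrt

def poisson(rng, lam):
    """Draw a Poisson count with Knuth's multiplication method (fine for small lam)."""
    limit, k, p = exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def pearson(x, y):
    """Pearson correlation of two equal-length count vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def simulate(n_cells=5000, dropout=0.3, seed=7):
    """Return (true counts, observed counts) for two genes sharing a latent gamma rate."""
    rng = random.Random(seed)
    true1, true2, obs1, obs2 = [], [], [], []
    for _ in range(n_cells):
        shared = rng.gammavariate(4.0, 1.0)               # latent component common to both genes
        x1 = poisson(rng, shared + rng.gammavariate(1.0, 1.0))
        x2 = poisson(rng, shared + rng.gammavariate(1.0, 1.0))
        true1.append(x1)
        true2.append(x2)
        obs1.append(0 if rng.random() < dropout else x1)  # technical zero for gene 1
        obs2.append(0 if rng.random() < dropout else x2)  # technical zero for gene 2
    return (true1, true2), (obs1, obs2)

(true1, true2), (obs1, obs2) = simulate()
underlying = pearson(true1, true2)   # dependence before dropout
naive = pearson(obs1, obs2)          # dependence computed on the observed counts
print(round(underlying, 2), round(naive, 2))
```

Under these assumed settings the correlation computed on the observed counts is noticeably weaker than the one computed before dropout, which is the kind of gap a model that separates the two sources of zeros is meant to close.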


Population-level genome-wide STR typing in Plasmodium species reveals higher resolution population structure and genetic diversity relative to SNP typing

10.1101/2021.05.19.444768 ◽

2021 ◽

Author(s):

Jiru Han ◽

Jacob E Munro ◽

Anthony Kocoski ◽

Alyssa E Barry ◽

Melanie Bahlo

Keyword(s):

Genetic Diversity ◽

Large Scale ◽

Tandem Repeats ◽

Plasmodium Species ◽

Whole Genome Sequence ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Sequencing Data ◽

Genome Wide ◽

Field Samples

Short tandem repeats (STRs) are highly informative genetic markers that have been used extensively in population genetics analysis. They are an important source of genetic diversity and can also have functional impact. Despite the availability of bioinformatic methods that permit large-scale genome-wide genotyping of STRs from whole genome sequencing data, they have not previously been applied to sequencing data from large collections of malaria parasite field samples. Here, we have genotyped STRs using HipSTR in more than 3,000 Plasmodium falciparum and 174 Plasmodium vivax published whole-genome sequence data from samples collected across the globe. High levels of noise and variability in the resultant callset necessitated the development of a novel method for quality control of STR genotype calls. A set of high-quality STR loci (6,768 from P. falciparum and 3,496 from P. vivax) were used to study Plasmodium genetic diversity, population structures and genomic signatures of selection and these were compared to genome-wide single nucleotide polymorphism (SNP) genotyping data. In addition, the genome-wide information about genetic variation and other characteristics of STRs in P. falciparum and P. vivax have been made available in an interactive web-based R Shiny application PlasmoSTR (https://github.com/bahlolab/PlasmoSTR).
