Improving the diagnostic yield of exome-sequencing by predicting gene-phenotype associations using large-scale gene expression analysis

RNA sequencing data: hitchhiker's guide to expression analysis

DOI: 10.7287/peerj.preprints.27283
Year: 2018
Author(s): Koen Van Den Berge, Katharina Hembach, Charlotte Soneson, Simone Tiberi, Lieven Clement, ...
Keyword(s): Gene Expression; RNA Sequencing; Large Scale; Science Studies; RNA-seq; Sequencing Data; Data Types; The Past; Long Read; Statistical Approaches

Gene expression is the fundamental level at which the results of various genetic and regulatory programs are observable. The measurement of transcriptome-wide gene expression has convincingly switched from microarrays to sequencing in a matter of years. RNA sequencing (RNA-seq) provides a quantitative and open system for profiling transcriptional outcomes on a large scale and therefore facilitates a wide diversity of applications, from basic science studies to agricultural and clinical settings. In the past 10 years or so, much has been learned about the characteristics of RNA-seq datasets, as well as the performance of the myriad methods developed. In this review, we give an overview of the developments in RNA-seq data analysis, including experimental design, with an explicit focus on the quantification of gene expression and statistical approaches for differential expression. We also highlight emerging data types, such as single-cell RNA-seq and gene expression profiling using long-read technologies.
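
The quantification step the review focuses on can be made concrete with the two within-sample units most often reported, counts per million (CPM) and transcripts per million (TPM). This is an illustrative sketch, not code from the review: it assumes raw read counts and gene lengths in base pairs are already available, and the gene names and numbers are hypothetical.

```python
def cpm(counts):
    """Counts per million: divide each count by the library size, then scale to one million."""
    library_size = sum(counts.values())
    return {gene: c / library_size * 1e6 for gene, c in counts.items()}


def tpm(counts, lengths_bp):
    """Transcripts per million: correct for gene length first, then scale to one million."""
    # Reads per kilobase: length correction is applied before the per-sample scaling.
    rpk = {gene: counts[gene] / (lengths_bp[gene] / 1000) for gene in counts}
    scale = sum(rpk.values())
    return {gene: value / scale * 1e6 for gene, value in rpk.items()}


if __name__ == "__main__":
    counts = {"geneA": 100, "geneB": 300, "geneC": 600}  # hypothetical read counts
    lengths = {"geneA": 1000, "geneB": 3000, "geneC": 2000}  # hypothetical lengths (bp)
    print(cpm(counts))
    print(tpm(counts, lengths))
```

With these toy numbers geneA and geneB receive the same TPM (both have 100 reads per kilobase) although geneB has three times more reads, which is the length correction CPM leaves out. Both units only rescale within a sample, so count-based differential expression tools typically start from raw counts and apply their own between-sample normalization.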

Improving the diagnostic yield of exome- sequencing by predicting gene–phenotype associations using large-scale gene expression analysis

Journal: Nature Communications, Vol 10 (1)
DOI: 10.1038/s41467-019-10649-4
Year: 2019
Cited by: ~22
Author(s): Patrick Deelen, Sipko van Dam, Johanna C. Herkert, Juha M. Karjalainen, Harm Brugge, ...
Keyword(s): Gene Expression; Exome Sequencing; Expression Analysis; Large Scale; Gene Expression Analysis; Diagnostic Yield

Clinical implementation of RNA sequencing for Mendelian disease diagnostics

DOI: 10.1101/2021.04.01.21254633
Year: 2021
Author(s): Vicente A. Yepez, Mirjana Gusic, Robert Kopajtich, Christian Mertes, Nicholas H. Smith, ...
Keyword(s): Gene Expression; RNA Sequencing; Genetic Diagnosis; Allelic Expression; Disease Genes; Mendelian Disease; RNA-seq; Aberrant Splicing; Aberrant Expression; Routine Diagnostics

Lack of functional evidence hampers variant interpretation, leaving a large proportion of cases with a suspected Mendelian disorder without a genetic diagnosis after genome or whole-exome sequencing (WES). Research studies advocate additionally sequencing transcriptomes to directly and systematically probe gene expression defects. However, adopting RNA sequencing (RNA-seq) in routine diagnostics still requires the collection of additional biopsies, the establishment of lab workflows and analytical pipelines, and defined concepts for the clinical interpretation of aberrant gene expression. To address these issues, we implemented an automated RNA-seq protocol and a computational workflow with which we analyzed skin fibroblasts from 303 individuals with a suspected mitochondrial disease. We detected on average 12,500 genes per sample, including around 60% of disease genes, a coverage substantially higher than with whole blood, supporting the use of skin biopsies. We prioritized genes demonstrating aberrant expression, aberrant splicing, or mono-allelic expression. The pipeline required less than one week from sample preparation to result reporting and provided a median of eight disease genes per patient for inspection. A genetic diagnosis was established for 16% of the WES-inconclusive cases. Detection of aberrant expression was a major contributor to diagnosis, including instances of a 50% reduction in expression which, together with mono-allelic expression, allowed the diagnosis of dominant disorders caused by haploinsufficiency. Moreover, calling aberrant splicing and variants from RNA-seq data enabled the detection and validation of splice-disrupting variants, the majority of which fell outside WES-covered regions. Together, these results show that streamlined experimental and computational processes can accelerate the implementation of RNA-seq in routine diagnostics.
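
The abstract credits detection of aberrant expression, including roughly 50% reductions, as a major contributor to diagnoses. As a hedged illustration of what "aberrant" means in a cohort setting (the abstract does not describe the pipeline's statistical method, so this is not it), the sketch below scores one gene in each sample against all other samples using a z-score on log2-transformed, already-normalized counts. The function name, cutoff, and counts are hypothetical.

```python
import math
import statistics


def expression_outliers(norm_counts, z_cutoff=3.0):
    """Flag samples whose log2 expression of one gene deviates from the rest of the cohort.

    norm_counts: {sample_id: normalized count for this gene}.
    Returns {sample_id: (z_score, log2_fold_change_vs_cohort_median)} for flagged samples.
    """
    logs = {s: math.log2(c + 1) for s, c in norm_counts.items()}  # pseudocount avoids log(0)
    flagged = {}
    for sample, value in logs.items():
        # Leave-one-out reference, so an outlier cannot shift its own baseline.
        others = [v for s, v in logs.items() if s != sample]
        mu, sd = statistics.mean(others), statistics.stdev(others)
        z = (value - mu) / sd if sd > 0 else 0.0
        if abs(z) >= z_cutoff:
            # A log2 fold change near -1 corresponds to roughly a 50% reduction.
            flagged[sample] = (z, value - statistics.median(others))
    return flagged


if __name__ == "__main__":
    controls = [980, 1000, 1020, 990, 1010, 1005, 995, 985, 1015, 1000]
    cohort = {f"ctrl{i}": c for i, c in enumerate(controls)}
    cohort["patient"] = 500  # hypothetical sample with about half the typical expression
    print(expression_outliers(cohort))
```

With ten controls near 1,000 counts and a patient at 500, only the patient is flagged, with a log2 fold change close to -1. Real cohorts also need a count-noise model and correction for technical confounders, which this sketch ignores.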

RNA sequencing data: hitchhiker's guide to expression analysis

DOI: 10.7287/peerj.preprints.27283v1
Year: 2018
Cited by: ~1
Author(s): Koen Van Den Berge, Katharina Hembach, Charlotte Soneson, Simone Tiberi, Lieven Clement, ...
Keyword(s): Gene Expression; RNA Sequencing; Large Scale; Science Studies; RNA-seq; Sequencing Data; Data Types; The Past; Long Read; Statistical Approaches

RNA Sequencing Data: Hitchhiker's Guide to Expression Analysis

Journal: Annual Review of Biomedical Data Science, Vol 2 (1), pp. 139–173
DOI: 10.1146/annurev-biodatasci-072018-021255
Year: 2019
Cited by: ~23
Author(s): Koen Van den Berge, Katharina M. Hembach, Charlotte Soneson, Simone Tiberi, Lieven Clement, ...
Keyword(s): Gene Expression; RNA Sequencing; Large Scale; Science Studies; Data Sets; RNA-seq; Sequencing Data; Data Types; The Past; Long Read

Gene expression is the fundamental level at which the results of various genetic and regulatory programs are observable. The measurement of transcriptome-wide gene expression has convincingly switched from microarrays to sequencing in a matter of years. RNA sequencing (RNA-seq) provides a quantitative and open system for profiling transcriptional outcomes on a large scale and therefore facilitates a wide diversity of applications, from basic science studies to agricultural and clinical settings. In the past 10 years or so, much has been learned about the characteristics of RNA-seq data sets, as well as the performance of the myriad methods developed. In this review, we give an overview of the developments in RNA-seq data analysis, including experimental design, with an explicit focus on the quantification of gene expression and statistical approaches for differential expression. We also highlight emerging data types, such as single-cell RNA-seq and gene expression profiling using long-read technologies.
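
The statistical side of differential expression can be illustrated with a deliberately small, hypothetical pipeline: per-sample log2-CPM values, an exact permutation test per gene, and Benjamini–Hochberg adjustment for testing many genes at once. This is a teaching sketch under stated assumptions, not one of the count-based models the review covers. With three samples per group the smallest attainable permutation p-value is 0.1, which is one reason dedicated tools share information across genes instead.

```python
import itertools
import math
import statistics


def log_cpm_matrix(counts, prior=0.5):
    """counts: {gene: [count per sample]} -> {gene: [log2-CPM per sample]}."""
    n_samples = len(next(iter(counts.values())))
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    return {
        gene: [math.log2((c + prior) / lib_sizes[j] * 1e6) for j, c in enumerate(row)]
        for gene, row in counts.items()
    }


def permutation_pvalue(x, y):
    """Exact two-sided permutation p-value for a difference in group means."""
    pooled = x + y
    observed = abs(statistics.mean(x) - statistics.mean(y))
    n_extreme = n_total = 0
    for idx in itertools.combinations(range(len(pooled)), len(x)):
        chosen = set(idx)
        gx = [pooled[i] for i in chosen]
        gy = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        n_total += 1
        # Small tolerance so the observed split itself is always counted.
        if abs(statistics.mean(gx) - statistics.mean(gy)) >= observed - 1e-12:
            n_extreme += 1
    return n_extreme / n_total


def benjamini_hochberg(pvalues):
    """BH-adjusted p-values (false discovery rate), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # from the largest p-value down to the smallest
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def differential_expression(counts, group_a, group_b):
    """Return {gene: (log2FC of b vs a, p-value, BH-adjusted p)}; groups are column indices."""
    logcpm = log_cpm_matrix(counts)
    genes = list(logcpm)
    lfcs, pvals = [], []
    for gene in genes:
        a = [logcpm[gene][j] for j in group_a]
        b = [logcpm[gene][j] for j in group_b]
        lfcs.append(statistics.mean(b) - statistics.mean(a))
        pvals.append(permutation_pvalue(a, b))
    padj = benjamini_hochberg(pvals)
    return {g: (lfcs[i], pvals[i], padj[i]) for i, g in enumerate(genes)}
```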

RNA sequencing data: hitchhiker's guide to expression analysis

DOI: 10.7287/peerj.preprints.27283v2
Year: 2018
Cited by: ~1
Author(s): Koen Van Den Berge, Katharina Hembach, Charlotte Soneson, Simone Tiberi, Lieven Clement, ...
Keyword(s): Gene Expression; RNA Sequencing; Large Scale; Science Studies; RNA-seq; Sequencing Data; Data Types; The Past; Long Read; Statistical Approaches

An integrated approach in gene-expression landscape profiling to identify housekeeping and tissue-specific genes in cattle

Journal: Animal Production Science, Vol 61 (16), pp. 1643
DOI: 10.1071/an20689
Year: 2021
Author(s): Peng Li, Yun Zhu, Xiaolong Kang, Xingang Dan, Yun Ma, ...
Keyword(s): Gene Expression; Expression Analysis; Large Scale; Gene Expression Analysis; Housekeeping Genes; Analysis Tool; Expression Data; RNA-seq; Web Based; Tissue Specific

Context: High-throughput transcriptome sequencing (RNA-Seq) has been widely applied in cattle studies. Public databases such as the National Center for Biotechnology Information (NCBI) contain large collections of gene expression data from various cattle tissues that can be used in gene expression analysis research.
Aims: This study was conducted to investigate patterns of transcriptome variation across cattle tissues through large-scale identification of housekeeping genes (i.e. those crucial to maintaining basic cellular activity) and tissue-specific genes.
Methods: Using data available in the NCBI Sequence Read Archive database, we analysed 1377 transcriptome datasets from 60 bovine tissue types, identified tissue-specific and housekeeping genes, and set up a web-based bovine gene expression analysis tool.
Key results: We found 101 genes widely expressed in almost all tissues and identified five housekeeping genes: RPL35A, eIF4A2, GAPDH, IPO5 and PAK2. Focusing on 12 major organs, we found 861 genes specifically expressed in these tissues. Furthermore, 187 significantly differentially expressed genes were found among six types of muscle tissue. All expression data were made available at our new website http://cattleExp.org, which can be freely accessed for future gene expression analyses.
Conclusions: The housekeeping and tissue-specific genes identified will provide more information for researchers studying gene expression in cattle.
Implications: The web-based cattle gene expression analysis tool will make it easy for researchers to access large public datasets. Users can easily access all publicly available RNA data and upload their own RNA-Seq data.
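
The abstract does not state the criteria used to call housekeeping and tissue-specific genes. One widely used option is the tissue-specificity index tau, computed from a gene's mean expression per tissue as tau = sum over tissues of (1 - x_i / max(x)), divided by (n - 1); it is 0 for uniform expression and 1 for expression in a single tissue. The sketch below assumes that index, log2(x + 1)-transformed expression, and illustrative thresholds; tissue names, gene names, and values are hypothetical.

```python
import math


def tau(expression_by_tissue):
    """Tissue-specificity index on log2(x + 1) values; needs at least two tissues."""
    x = [math.log2(v + 1) for v in expression_by_tissue.values()]
    peak = max(x)
    if peak == 0:  # not expressed anywhere: treat as non-specific
        return 0.0
    return sum(1 - xi / peak for xi in x) / (len(x) - 1)


def classify(genes, tau_specific=0.85, tau_housekeeping=0.2, min_expr=1.0):
    """Label genes 'tissue-specific', 'housekeeping' or 'other' (hypothetical thresholds).

    genes: {gene: {tissue: mean expression}}.
    Housekeeping additionally requires expression >= min_expr in every tissue.
    """
    labels = {}
    for gene, by_tissue in genes.items():
        t = tau(by_tissue)
        if t >= tau_specific:
            labels[gene] = "tissue-specific"
        elif t <= tau_housekeeping and min(by_tissue.values()) >= min_expr:
            labels[gene] = "housekeeping"
        else:
            labels[gene] = "other"
    return labels
```

On a toy input, a gene expressed at similar levels in all tissues is labeled housekeeping, a gene expressed in one tissue only gets tau = 1 and is labeled tissue-specific, and an uneven profile falls in between.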

QUBIC2: A novel biclustering algorithm for large-scale bulk RNA-sequencing and single-cell RNA-sequencing data analysis

DOI: 10.1101/409961
Year: 2018
Cited by: ~5
Author(s): Juan Xie, Anjun Ma, Yu Zhang, Bingqiang Liu, Changlin Wan, ...
Keyword(s): Gene Expression; Transcriptional Regulation; Single Cell; RNA Sequencing; Spatial Data; Large Scale; Biological Information; Superior Performance; RNA-seq; Sequencing Data

The combination of biclustering and large-scale gene expression data holds promise for inferring condition-specific functional pathways and networks. However, existing biclustering tools do not perform satisfactorily on high-resolution RNA-sequencing (RNA-Seq) data, mainly because they lack (i) consideration of the high sparsity of RNA-Seq data, e.g., the massive number of zeros or lowly expressed genes, especially in single-cell RNA-Seq (scRNA-Seq) data, and (ii) an understanding of the transcriptional regulation signals underlying the observed gene expression values. Here we present a novel biclustering algorithm, QUBIC2, for the analysis of large-scale bulk RNA-Seq and scRNA-Seq data. Key novelties of the algorithm include (i) a truncated model to handle the unreliable quantification of genes with low or moderate expression, (ii) a mixture Gaussian distribution and an information-divergence objective function to capture shared transcriptional regulation signals among a set of genes, (iii) a Core-Dual strategy to identify biclusters and optimize relevant parameters, and (iv) a size-based P-value framework to evaluate the statistical significance of all identified biclusters. Validation on comprehensive bulk and single-cell RNA-Seq data sets suggests that QUBIC2 outperforms five other widely used biclustering tools in functional module detection and cell type classification. In addition, applications to temporal and spatial data demonstrated that QUBIC2 can derive meaningful biological information from scRNA-Seq data. The source code for QUBIC2 is freely available at https://github.com/maqin2001/qubic2.
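
To show what a bicluster is (a subset of genes that behave alike across a subset of conditions), here is a toy qualitative biclustering sketch in the general spirit of the QUBIC family but far simpler than QUBIC2. Assumptions, all hypothetical: each gene's values are discretized to -1/0/+1 using its own lower and upper quantiles, a bicluster is seeded from the gene pair that agrees on the most non-zero conditions, and it is extended with every gene matching that seed pattern on those conditions. QUBIC2's truncated model, mixture-Gaussian modeling, Core-Dual search, and size-based P-values are not reproduced.

```python
import itertools


def discretize(row, q=0.25):
    """Map values at or below the k-th smallest to -1, at or above the k-th largest to +1."""
    ranked = sorted(row)
    k = max(1, int(len(row) * q))
    low, high = ranked[k - 1], ranked[-k]
    if low >= high:  # (near-)constant gene: no informative up/down states
        return [0] * len(row)
    return [-1 if v <= low else 1 if v >= high else 0 for v in row]


def find_bicluster(matrix, min_columns=2):
    """matrix: {gene: [values per condition]} -> (set of genes, list of column indices)."""
    disc = {g: discretize(r) for g, r in matrix.items()}
    best_pair, best_cols = None, []
    # Seed: the gene pair sharing the same non-zero state on the most conditions.
    for g1, g2 in itertools.combinations(sorted(disc), 2):
        cols = [j for j, (a, b) in enumerate(zip(disc[g1], disc[g2])) if a != 0 and a == b]
        if len(cols) > len(best_cols):
            best_pair, best_cols = (g1, g2), cols
    if best_pair is None or len(best_cols) < min_columns:
        return set(), []
    # Extension: keep every gene whose states on the seed columns match the seed exactly.
    pattern = [disc[best_pair[0]][j] for j in best_cols]
    genes = {g for g, d in disc.items() if [d[j] for j in best_cols] == pattern}
    return genes, best_cols
```

On a toy matrix in which three genes share a high, high, low, low pattern on the first four of eight conditions and two further genes are noise, the function returns those three genes with columns [0, 1, 2, 3].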

LncGSEA: a versatile tool to infer lncRNA associated pathways from large-scale cancer transcriptome sequencing data

Journal: BMC Genomics, Vol 22 (1)
DOI: 10.1186/s12864-021-07900-y
Year: 2021
Author(s): Yanan Ren, Ting-You Wang, Leah C. Anderton, Qi Cao, Rendong Yang
Keyword(s): Gene Expression; Large Scale; Expression Profiles; Gene Expression Profiles; Clinical Samples; Sequencing Data; Multiple Cancer; Regulatory Pathways; Cancer Transcriptome; Versatile Tool

Background: Long non-coding RNAs (lncRNAs) are a growing focus in cancer research. Deciphering the pathways influenced by lncRNAs is important for understanding their role in cancer. Although knock-down or overexpression of lncRNAs followed by gene expression profiling in cancer cell lines is an established approach to this problem, such experimental data are not available for the majority of annotated lncRNAs.
Results: As a surrogate, we present lncGSEA, a convenient tool to predict lncRNA-associated pathways through Gene Set Enrichment Analysis of gene expression profiles from large-scale cancer patient samples. We demonstrate that lncGSEA is able to recapitulate lncRNA-associated pathways supported by the literature and by experimental validation in multiple cancer types.
Conclusions: LncGSEA allows researchers to infer lncRNA regulatory pathways directly from clinical samples in oncology. LncGSEA is written in R and is freely accessible at https://github.com/ylab-hi/lncGSEA.
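
The abstract describes lncGSEA as Gene Set Enrichment Analysis of expression profiles from patient samples. A minimal sketch of the core GSEA statistic, the weighted running-sum enrichment score, follows. It assumes genes are ranked by Pearson correlation with the lncRNA's expression across samples; whether lncGSEA ranks genes this way is an assumption, significance testing (normally by permutation) is omitted, and lncGSEA itself is an R tool, so this Python sketch is not its code. Gene names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rank_genes(lnc_expr, gene_expr):
    """Rank genes by correlation with the lncRNA across samples, most positive first."""
    scores = {g: pearson(lnc_expr, x) for g, x in gene_expr.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def enrichment_score(ranked, gene_set, p=1.0):
    """Signed maximum deviation of the GSEA running sum from zero."""
    hit_weight = sum(abs(s) ** p for g, s in ranked if g in gene_set)
    n_miss = sum(1 for g, _ in ranked if g not in gene_set)
    if hit_weight == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranking without covering all of it")
    running, best = 0.0, 0.0
    for g, s in ranked:
        if g in gene_set:
            running += abs(s) ** p / hit_weight  # step up, weighted by |correlation|
        else:
            running -= 1.0 / n_miss  # uniform step down for non-members
        if abs(running) > abs(best):
            best = running
    return best
```

A set whose members sit at the top of the ranking scores close to +1 (pathway genes co-vary positively with the lncRNA), while a set concentrated at the bottom gives a negative score.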

FC 011 KidneyNetwork: using kidney-derived gene expression data to predict and prioritize novel genes involved in kidney disease

Journal: Nephrology Dialysis Transplantation, Vol 36 (Supplement_1)
DOI: 10.1093/ndt/gfab131.001
Year: 2021
Author(s): Floranne Boulogne, Laura Claus, Henry Wiersma, Roy Oelen, Floor Schukking, ...
Keyword(s): Gene Expression; Kidney Disease; Candidate Gene; Exome Sequencing; RNA Sequencing; Expression Patterns; Genetic Diagnosis; Specific Gene; Sequencing Data; Exome Sequencing Data

Background and Aims: Genetic testing in patients with suspected hereditary kidney disease does not always reveal the genetic cause of the patient's disorder. Potentially pathogenic variants can reside in genes not yet known to be involved in kidney disease, which makes it difficult to prioritize these variants and interpret their relevance. As such, there is a clear need for methods that predict the phenotypic consequences of gene expression in a way that is as unbiased as possible. To help identify candidate genes, we developed KidneyNetwork, which uses tissue-specific expression to predict kidney-specific gene functions.
Method: We combined gene co-expression in 878 publicly available kidney RNA-sequencing samples with the co-expression of a multi-tissue RNA-sequencing dataset of 31,499 samples to build KidneyNetwork. The expression patterns were used to predict which genes have a kidney-related function, and which (disease) phenotypes might be caused when these genes are mutated. By integrating the information from the HPO database, in which known phenotypic consequences of disease genes are annotated, with the gene co-expression network, we obtained prediction scores for each gene per HPO term. As proof of principle, we applied KidneyNetwork to prioritize variants in exome-sequencing data from 13 kidney disease patients without a genetic diagnosis.
Results: We assessed the prediction performance of KidneyNetwork by comparing it to GeneNetwork, a multi-tissue co-expression network we previously developed. KidneyNetwork showed significantly improved prediction accuracy for kidney-related HPO terms, as well as an increase in the total number of significantly predicted kidney-related HPO terms (figure 1). To examine its clinical utility, we applied KidneyNetwork to 13 patients with a suspected hereditary kidney disease without a genetic diagnosis. Based on the HPO terms “Renal cyst” and “Hepatic cysts”, combined with a list of potentially damaging variants in one of the undiagnosed patients with mild ADPKD/PCLD, we identified ALG6 as a new candidate gene. ALG6 bears a high resemblance to other genes implicated in this phenotype in recent years. Through the 100,000 Genomes Project and collaborators, we identified three additional patients with kidney and/or liver cysts carrying a suspected deleterious variant in ALG6.
Conclusion: We present KidneyNetwork, a kidney-specific co-expression network that accurately predicts which genes have kidney-specific functions and may result in kidney disease when mutated. KidneyNetwork can predict gene–phenotype associations for genes not previously linked to kidney-related phenotypes. We show the added value of KidneyNetwork by applying it to exome-sequencing data of kidney disease patients without a molecular diagnosis, and consequently propose ALG6 as a promising candidate gene. KidneyNetwork can be applied to clinically unsolved kidney disease cases, but researchers can also use it to gain insight into individual genes to better understand kidney physiology and pathophysiology.
Acknowledgments: This research was made possible through access to the data and findings generated by the 100,000 Genomes Project; http://www.genomicsengland.co.uk.
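
The abstract says per-HPO-term prediction scores come from integrating HPO annotations with a co-expression network, and are then used to prioritize variants, but it does not say how the scores are computed. The sketch below is a hypothetical "guilt by association" stand-in, not the KidneyNetwork method: a gene's score for a term is its mean Pearson co-expression with the genes already annotated to that term, standardized across all genes, and genes carrying candidate variants are then ranked by that score. Gene names and expression values are invented.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def term_scores(expr, annotated):
    """Z-scored mean co-expression of every gene with the term's annotated genes.

    expr: {gene: [expression per sample]}; annotated: genes already linked to the
    HPO term (at least two, so each annotated gene still has a peer to compare with).
    """
    raw = {}
    for gene, profile in expr.items():
        peers = [a for a in annotated if a != gene and a in expr]  # leave the gene itself out
        raw[gene] = statistics.mean([pearson(profile, expr[a]) for a in peers])
    values = list(raw.values())
    mu, sd = statistics.mean(values), statistics.pstdev(values)
    return {g: (r - mu) / sd for g, r in raw.items()}


def prioritize(variant_genes, scores):
    """Order genes carrying candidate variants by term score, highest first."""
    return sorted(variant_genes, key=lambda g: scores.get(g, float("-inf")), reverse=True)
```

In a toy example, an unannotated gene whose profile tracks the annotated genes ends up with a positive score and is ranked first among the variant-carrying genes, which is the kind of signal that can point to a new candidate gene.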
