Scavenger: A pipeline for recovery of unaligned reads utilising similarity with aligned reads

2018 ◽  
Author(s):  
Andrian Yang ◽  
Joshua Y. S. Tang ◽  
Michael Troup ◽  
Joshua W. K. Ho

Abstract Motivation: Read alignment is an important step in RNA-seq analysis, as the result of alignment forms the basis for further downstream analyses. However, recent studies have shown that published alignment tools have variable mapping sensitivity and do not necessarily align reads which should have been aligned, a problem we term the false-negative non-alignment problem. Results: We have developed Scavenger, a pipeline for recovering unaligned reads using a novel mechanism which utilises information from aligned reads. Scavenger recovers an unaligned read by re-aligning it against a putative location derived from aligned reads that share sequence similarity with it. We show that Scavenger can successfully recover unaligned reads in both simulated and real RNA-seq datasets, including single-cell RNA-seq data. The recovered reads contain more genetic variants than previously aligned reads, indicating that divergence between personal and reference genomes plays a role in the false-negative non-alignment problem. We also explored the impact of read recovery on downstream analyses, in particular gene expression analysis, and showed that Scavenger is able both to recover genes which were previously non-expressed and to increase gene expression, with lowly expressed genes being the most affected by the addition of recovered reads. We also found that the majority of genes with >1 fold change in expression after recovery are categorised as pseudogenes, indicating that pseudogene expression can be affected by the false-negative non-alignment problem. Scavenger helps to solve the false-negative non-alignment problem through recovery of unaligned reads using information from previously aligned reads. Availability: Scavenger is available under an open source license at https://github.com/VCCRI/Scavenger/. Contact: [email protected]
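The recovery idea described in the abstract above can be illustrated with a toy sketch. This is not Scavenger's implementation: the shared k-mer similarity search, the Hamming-distance re-alignment, and the names `recover`, `k`, `flank` and `max_mismatches` are all assumptions made for illustration. The sketch finds the aligned read most similar to an unaligned read, takes that read's position as the putative location, and re-aligns the unaligned read in a small window around it with a more permissive mismatch threshold.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def recover(unaligned, aligned, reference, k=8, max_mismatches=3, flank=5):
    """Toy read recovery (illustrative only, not Scavenger's algorithm).

    aligned is a list of (sequence, reference_start) pairs.
    Returns (start, mismatches) for the best re-alignment, or None.
    """
    # Index the aligned reads by their k-mers.
    index = defaultdict(set)
    for i, (seq, _) in enumerate(aligned):
        for km in kmers(seq, k):
            index[km].add(i)

    # Score each aligned read by the number of k-mers it shares
    # with the unaligned read.
    votes = defaultdict(int)
    for km in kmers(unaligned, k):
        for i in index.get(km, ()):
            votes[i] += 1
    if not votes:
        return None  # no similar aligned read, nothing to recover

    # The most similar aligned read supplies the putative location.
    putative = aligned[max(votes, key=votes.get)][1]

    # Re-align in a small window around the putative location.
    n = len(unaligned)
    first = max(0, putative - flank)
    last = min(len(reference) - n, putative + flank)
    best = None
    for start in range(first, last + 1):
        mm = hamming(unaligned, reference[start:start + n])
        if mm <= max_mismatches and (best is None or mm < best[1]):
            best = (start, mm)
    return best
```

Restricting the second alignment to a window around the putative location is what allows a looser mismatch threshold than a genome-wide search could tolerate, which fits the abstract's observation that recovered reads carry more variants.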

F1000Research ◽  
2019 ◽  
Vol 8 ◽  
pp. 1587 ◽  
Author(s):  
Andrian Yang ◽  
Joshua Y. S. Tang ◽  
Michael Troup ◽  
Joshua W. K. Ho

Read alignment is an important step in RNA-seq analysis, as the result of alignment forms the basis for downstream analyses. However, recent studies have shown that published alignment tools have variable mapping sensitivity and do not necessarily align all the reads which should have been aligned, a problem we term the false-negative non-alignment problem. Here we present Scavenger, a Python-based bioinformatics pipeline for recovering unaligned reads using a novel mechanism in which a putative alignment location is discovered based on sequence similarity between aligned and unaligned reads. We showed that Scavenger could recover unaligned reads in a range of simulated and real RNA-seq datasets, including single-cell RNA-seq data. We found that recovered reads tend to contain more genetic variants with respect to the reference genome than previously aligned reads, indicating that divergence between personal and reference genomes plays a role in the false-negative non-alignment problem. Even when the number of recovered reads is relatively small compared to the total number of reads, the addition of these recovered reads can impact downstream analyses, especially in terms of estimating the expression and differential expression of lowly expressed genes, such as pseudogenes.


2021 ◽  
Vol 22 (3) ◽  
pp. 1222
Author(s):  
Cristina Cuello ◽  
Cristina A. Martinez ◽  
Josep M. Cambra ◽  
Inmaculada Parrilla ◽  
Heriberto Rodriguez-Martinez ◽  
...  

This study was designed to investigate the impact of vitrification on the transcriptome profile of blastocysts using a porcine (Sus scrofa) model and a microarray approach. Blastocysts were collected from weaned sows (n = 13). A total of 60 blastocysts were vitrified (treatment group). After warming, vitrified embryos were cultured in vitro for 24 h. Non-vitrified blastocysts (n = 40) were used as controls. After the in vitro culture period, the embryo viability was morphologically assessed. A total of 30 viable embryos per group (three pools of 10 from 4 different donors each) were subjected to gene expression analysis. A fold change cut-off of ±1.5 and a restrictive threshold at p-value < 0.05 were used to distinguish differentially expressed genes (DEGs). The survival rates of vitrified/warmed blastocysts were similar to those of the control (nearly 100%, n.s.). A total of 205 DEGs (112 upregulated and 93 downregulated) were identified in the vitrified blastocysts compared to the control group. The vitrification/warming impact was moderate, and it was mainly related to the pathways of cell cycle, cellular senescence, gap junction, and signaling for TGFβ, p53, Fox, and MAPK. In conclusion, vitrification modified the transcriptome of in vivo-derived porcine blastocysts, resulting in minor gene expression changes.
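The DEG criterion stated above (fold change cut-off of ±1.5 with p-value < 0.05) can be sketched as a simple filter. This is a minimal illustration, not the study's analysis code. It assumes a signed fold-change convention in which ratios below 1 are reported as negative reciprocals, and it assumes the cut-off is inclusive. The function names and the input tuple layout are hypothetical.

```python
def signed_fold_change(treated, control):
    """Signed fold change: +2.0 means doubled, -2.0 means halved.

    Assumes both expression values are positive and on a linear scale.
    """
    ratio = treated / control
    return ratio if ratio >= 1 else -1 / ratio


def call_degs(genes, fc_cutoff=1.5, alpha=0.05):
    """Split genes into up- and downregulated DEGs.

    genes is an iterable of (name, treated, control, p_value) tuples.
    A gene is a DEG when p_value < alpha and |fold change| >= fc_cutoff.
    """
    up, down = [], []
    for name, treated, control, p_value in genes:
        if p_value >= alpha:
            continue  # not statistically significant
        fc = signed_fold_change(treated, control)
        if fc >= fc_cutoff:
            up.append(name)
        elif fc <= -fc_cutoff:
            down.append(name)
    return up, down
```

Applying both thresholds together keeps genes whose change is both large enough to matter and unlikely to be noise; a gene with a significant p-value but a 1.2-fold change is excluded, as is a 4-fold change with p = 0.2.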


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Li Tong ◽  
Po-Yen Wu ◽  
John H. Phan ◽  
Hamid R. Hassazadeh ◽  
...  

Abstract To use next-generation sequencing technology such as RNA-seq for medical and health applications, choosing proper analysis methods for biomarker identification remains a critical challenge for most users. The US Food and Drug Administration (FDA) has led the Sequencing Quality Control (SEQC) project to conduct a comprehensive investigation of 278 representative RNA-seq data analysis pipelines consisting of 13 sequence mapping, three quantification, and seven normalization methods. In this article, we focused on the impact of the joint effects of RNA-seq pipelines on gene expression estimation as well as the downstream prediction of disease outcomes. First, we developed and applied three metrics (i.e., accuracy, precision, and reliability) to quantitatively evaluate each pipeline’s performance on gene expression estimation. We then investigated the correlation between the proposed metrics and the downstream prediction performance using two real-world cancer datasets (i.e., the SEQC neuroblastoma dataset and the NIH/NCI TCGA lung adenocarcinoma dataset). We found that RNA-seq pipeline components jointly and significantly impacted the accuracy of gene expression estimation, and that this impact extended to the downstream prediction of these cancer outcomes. Specifically, RNA-seq pipelines that produced more accurate, precise, and reliable gene expression estimation tended to perform better in the prediction of disease outcome. Finally, we provided scenarios as guidelines for using these three metrics to select sensible RNA-seq pipelines for improved accuracy, precision, and reliability of gene expression estimation, which in turn leads to improved downstream gene expression-based prediction of disease outcome.


2020 ◽  
Author(s):  
Colin Peter Singer Kruse ◽  
Alexander D Meyers ◽  
Proma Basu ◽  
Sarahann Hutchinson ◽  
Darron R Luesse ◽  
...  

Abstract Background: Understanding of gravity sensing and response is critical to long-term human habitation in space and can provide new advantages for terrestrial agriculture. To this end, the altered gene expression profile induced by microgravity has been repeatedly queried by microarray and RNA-seq experiments to understand gravitropism. However, the quantification of altered protein abundance in space has been minimally investigated. Results: Proteomic (iTRAQ-labelled LC-MS/MS) and transcriptomic (RNA-seq) analyses simultaneously quantified protein and transcript differential expression of three-day old, etiolated Arabidopsis thaliana seedlings grown aboard the International Space Station along with their ground control counterparts. Protein extracts were fractionated to isolate soluble and membrane proteins and analyzed to detect differentially phosphorylated peptides. In total, 968 RNAs, 107 soluble proteins, and 103 membrane proteins were identified as differentially expressed. In addition, the proteomic analyses identified 16 differential phosphorylation events. Proteomic data delivered novel insights and simultaneously provided new context to previously made observations of gene expression in microgravity. There is a sweeping shift in post-transcriptional mechanisms of gene regulation, including the RNA-decapping protein DCP5, the splicing factors GRP7 and GRP8, and AGO4. These data also indicate AHA2 and FERONIA as well as CESA1 and SHOU4 as central to the cell wall adaptations seen in spaceflight. Patterns of tubulin-α 1, 3, 4 and 6 phosphorylation further reveal an interaction of microtubule and redox homeostasis that mirrors osmotic response signaling elements. The absence of gravity also results in a seemingly wasteful dysregulation of plastid gene transcription.
Conclusions: The datasets gathered from Arabidopsis seedlings exposed to microgravity revealed marked impacts on post-transcriptional regulation, cell wall synthesis, redox/microtubule dynamics, and plastid gene transcription. The impact of post-transcriptional regulatory alterations represents an unstudied element of the plant microgravity response with the potential to significantly impact plant growth efficiency and beyond. What’s more, addressing the effects of microgravity on AHA2, CESA1, and alpha tubulins has the potential to enhance cytoskeletal organization and cell wall composition, thereby enhancing biomass production and growth in microgravity. Finally, understanding and manipulating the dysregulation of plastid gene transcription has further potential to address the goal of enhancing plant growth in the stressful conditions of microgravity.


2008 ◽  
Vol 17 (4) ◽  
pp. 200-206 ◽  
Author(s):  
Catherine I. Dumur ◽  
Sherjeel Sana ◽  
Amy C. Ladd ◽  
Andrea Ferreira-Gonzalez ◽  
David S. Wilkinson ◽  
...  

2012 ◽  
Vol 30 (30_suppl) ◽  
pp. 56-56
Author(s):  
Byung-In Lee ◽  
Kahuku Oades ◽  
Lien Vo ◽  
Jerry Lee ◽  
Mark Landers ◽  
...  

56 Background: Gene expression profiling has been shown to be effective in analyzing postoperative tumor samples in various cancers. However, in analyzing small specimens such as core biopsies, the limited amount of available material makes multi-gene analyses difficult or impossible. Microarray-based analyses also provide limited dynamic range. We describe the development of targeted RNA-sequencing methodology which combines the power of a universal RNA amplification with NGS for an ultra-deep expression analysis of multiple target genes, enabling <100 ng of sample input for multi-gene analysis in a single tube format. Methods: The gene expression patterns of triple-negative breast cancer FFPE samples were analyzed using a 96-gene breast cancer biomarker panel across three different platforms: Affymetrix Human Gene ST 1.0 microarrays, a pre-developed OncoScore qRT-PCR panel, and targeted RNA-seq. For targeted RNA-seq analysis, the 96-gene panel was amplified using a universal, single-tube “XP-PCR” amplification strategy followed by sequence analysis using the Ion-Torrent Personal Genome Machine. Results: Targeted RNA-seq provided the most sensitivity in terms of detection rates with <100 ng FFPE RNA input and provides unlimited dynamic range with increased sequencing depth. Expression ratio compression issues typically associated with a high number of pre-amplification cycles in standard multiplex-primed methods were not observed here. Low expressing genes, undetectable by qRT-PCR analysis from 1,000 ng input FFPE RNA, were detected and eligible for expression analysis with a significant number of sequencing reads. Alternative transcription/splicing analysis is also possible from sequence analysis of the target transcripts using targeted RNA-seq. Conclusions: By combining universally primed pre-amplification and NGS in multi-gene expression analysis, targeted RNA-seq provides the most sensitive gene expression analysis methodology.


2019 ◽  
Vol 37 (15_suppl) ◽  
pp. e21045-e21045
Author(s):  
Emma O'Connor ◽  
Eileen E. Parkes ◽  
Leeona Galligan ◽  
James Bradford ◽  
Shauna Lambe ◽  
...  

e21045 Background: Traditionally, gene expression signatures (GES) are used individually to classify patients into subgroups. Signatures targeting the same biology are often developed independently and may not classify identically. We developed the claraT software tool that uses consensus between multiple published GES categorised by the Hallmarks of Cancer (Hanahan & Weinberg, 2011) to classify cancers. As metastatic melanoma represents poor prognostic disease (5-yr survival 15-20%), we applied claraT to the TCGA melanoma dataset to identify targetable biologies, validated in a cohort of melanoma patients treated with Ipilimumab. Methods: TCGA RNA-seq data (n = 472) was analysed using the claraT platform including GES for immune (n = 14), angiogenesis (n = 9) and epithelial-mesenchymal transition (EMT) (n = 12) Hallmarks. Samples were clustered for the combined and individual Hallmarks. Median progression-free survival (PFS) and overall survival (OS) differences were analysed across identified subgroups. Analysis was validated in an Ipilimumab treated melanoma dataset (n = 42) (Van Allen, 2015). Results: Clustering the combined Hallmarks identified 4 subgroups in the TCGA cohort: 1) Immune active, 2) Immune-EMT active, 3) EMT-Angiogenesis active, 4) All inactive. Groups 1 & 2 had significantly improved OS compared to Groups 3 & 4 (HR = 0.50, p < 0.0001). Clustering using single Hallmarks revealed that immune-positive tumours had significantly improved OS (HR = 0.53, p < 0.0001) compared to immune-negative tumours. Angiogenesis-negative tumours displayed improved PFS (HR = 0.73, p = 0.03) and OS (HR = 0.53, p < 0.0001) compared to angiogenesis-positive tumours. Interestingly, the EMT Hallmark was not found to be individually prognostic. When validated in the Ipilimumab treated dataset, patients classified as immune-positive had improved OS (HR = 0.357, p = 0.010) when compared to immune-negative. Similar trends were also observed for angiogenesis and EMT Hallmarks.
Conclusions: This study demonstrates how simultaneous analysis of multiple GES (n = 35 in this study) can identify robust biologies through consensus expression. This platform may have value in the identification of reliable biomarkers for clinical trials and could inform how combination therapies targeting key biologies may be used in cancer treatment.

