The rate and spectrum of mosaic mutations during embryogenesis revealed by RNA sequencing of 49 tissues

2019 ◽  
Author(s):  
Francesc Muyas ◽  
Luis Zapata ◽  
Roderic Guigó ◽  
Stephan Ossowski

Abstract

Background: Mosaic mutations acquired during early embryogenesis can lead to severe early-onset genetic disorders and cancer predisposition, but are often undetectable in blood samples. The rate and mutational spectrum of embryonic mosaic mutations (EMMs) have been studied in only a few tissues, and their contribution to genetic disorders is unknown. We therefore investigated how frequently mosaic mutations occur during embryogenesis across all germ layers and tissues.

Results: Using RNA sequencing data from the Genotype-Tissue Expression (GTEx) cohort, comprising 49 normal tissues and 570 individuals, we found that newborns harbour on average 0.5–1 EMMs in the exome affecting multiple organs (1.3230 × 10⁻⁸ per nucleotide per individual), a frequency similar to that reported for germline de novo mutations. Our multi-tissue, multi-individual study design allowed us to distinguish mosaic mutations acquired during different stages of embryogenesis and adult life, and to provide insights into the rate and spectrum of mosaic mutations. We observed that EMMs are dominated by a mutational signature associated with spontaneous deamination of methylated cytosines and with the number of cell divisions. After birth, cells continue to accumulate somatic mutations, which can lead to the development of cancer. Investigation of the mutational spectrum of the gastrointestinal tract revealed a pattern associated with the food-borne carcinogen aflatoxin, a signature that has so far been reported only in liver cancer.

Conclusion: Our multi-tissue, multi-individual study reveals a surprisingly high number of embryonic mosaic mutations in coding regions, suggesting new hypotheses and diagnostic procedures for investigating genetic causes of disease and cancer predisposition.
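
The per-nucleotide rate and the per-individual count quoted above are linked by simple arithmetic: expected number of EMMs = rate × number of bases assessed. The sketch below illustrates that relationship only; the callable exome sizes (40 and 60 Mb) are hypothetical assumptions, not values stated in the abstract, and this is not presented as the authors' estimation procedure.

```python
def expected_mutations(rate_per_nt: float, callable_bases: int) -> float:
    """Expected mutation count = per-nucleotide rate x number of bases assessed."""
    return rate_per_nt * callable_bases


RATE = 1.3230e-8  # per nucleotide per individual, as reported in the abstract

# Hypothetical callable exome sizes (assumption, not taken from the study).
for size_mb in (40, 60):
    n = expected_mutations(RATE, size_mb * 1_000_000)
    print(f"{size_mb} Mb callable -> {n:.2f} expected EMMs")
```

Under these assumed sizes the expected counts fall between roughly 0.5 and 0.8, which shows how a rate on the order of 10⁻⁸ translates into about one mutation per exome.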

2021 ◽  
Author(s):  
Gelana Khazeeva ◽  
Karolis Sablauskas ◽  
Bart van der Sanden ◽  
Wouter Steyaert ◽  
Michael Kwint ◽  
...  

De novo mutations (DNMs) are an important cause of genetic disorders. The accurate identification of DNMs from sequencing data is therefore fundamental to rare disease research and diagnostics. Unfortunately, identifying reliable DNMs remains a major challenge due to sequencing errors, uneven coverage, and mapping artifacts. Here, we developed DeNovoCNN, a deep convolutional neural network (CNN) DNM caller that encodes the read alignments of a trio as 160 × 164-pixel images. DeNovoCNN was trained on DNMs from whole-exome sequencing (WES) of 2,003 trios, achieving on average 99.2% recall and 93.8% precision. Based on the Genome in a Bottle reference dataset, DeNovoCNN shows higher recall (sensitivity) and precision than existing de novo calling approaches (GATK, DeNovoGear, Samtools). Sanger validation of DNMs called in both exome and genome datasets confirms that DeNovoCNN outperforms existing methods. Most importantly, we show that DeNovoCNN is robust to different exome sequencing and analysis approaches, so it can be applied to other datasets. DeNovoCNN is freely available and can be run on existing alignment (BAM/CRAM) and variant calling (VCF) files from WES and WGS without the need for variant recalling.
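
Recall and precision, the two metrics reported for DeNovoCNN, are defined from true-positive (TP), false-positive (FP) and false-negative (FN) DNM calls. The sketch below shows these standard definitions and combines the reported averages into an F1 score. The F1 value is our own illustrative arithmetic, not a figure from the abstract, and an F1 computed from averaged precision and recall need not equal the average of per-dataset F1 scores.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of called DNMs that are true DNMs."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Fraction of true DNMs that were called (sensitivity)."""
    return tp / (tp + fn)


def f1(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


# Average precision and recall reported for DeNovoCNN on the WES trios.
p, r = 0.938, 0.992
print(f"F1 = {f1(p, r):.4f}")  # prints F1 = 0.9642
```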


Viruses ◽  
2020 ◽  
Vol 12 (6) ◽  
pp. 620
Author(s):  
Katarzyna Leskinen ◽  
Maria I. Pajunen ◽  
Miguel Vincente Gomez-Raya Vilanova ◽  
Saija Kiljunen ◽  
Andrew Nelson ◽  
...  

YerA41 is a Myoviridae bacteriophage that was originally isolated for its ability to infect Yersinia ruckeri, the causative agent of enteric redmouth disease in salmonid fish. Several attempts to determine its genomic DNA sequence using traditional and next-generation sequencing technologies failed, indicating that the phage genome is modified in a way that makes it an unsuitable template for PCR amplification and conventional sequencing. To determine the YerA41 genome sequence, we performed RNA sequencing of phage-infected Y. ruckeri cells at different time points post-infection. Reads derived from the host genome were subtracted, and de novo assembly was performed on the remaining unaligned reads. This yielded nine phage-specific scaffolds with a total length of 143 kb that shared only low-level, scattered identity with known sequences in DNA databases. Annotation of the sequences revealed 201 predicted genes, most of which had no homologs in the databases. Proteome studies identified a total of 63 phage particle-associated proteins. The RNA sequencing data were used to characterize the transcriptional control of YerA41 and to investigate its impact on bacterial gene expression. Overall, our results indicate that RNA sequencing can be used successfully to obtain the genome sequence of otherwise unsequenceable phages, while simultaneously providing information about phage–host interactions during infection.
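
The host-subtraction step can be pictured as a filter that discards every read resembling the host genome and passes the rest to the assembler. The sketch below is a simplified stand-in: it uses exact k-mer matching instead of the read alignment a real pipeline would normally use, and the function names, k = 21 default and toy sequences are hypothetical illustrations, not the authors' tools or parameters.

```python
from typing import Iterable, List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """All length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (non-ACGT characters kept as-is)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def subtract_host_reads(reads: Iterable[str], host_genome: str, k: int = 21,
                        min_shared: int = 1) -> List[str]:
    """Keep reads sharing fewer than min_shared k-mers with the host genome
    (either strand); survivors would go on to de novo assembly."""
    host = kmers(host_genome, k) | kmers(revcomp(host_genome), k)
    kept = []
    for read in reads:
        if len(kmers(read, k) & host) < min_shared:
            kept.append(read)
    return kept


# Toy example with a short k so the sequences stay readable.
toy_host = "AAAACCCCGGGGTTTT"
toy_reads = ["AAAACCCC", "GATTACAGATTACA"]
print(subtract_host_reads(toy_reads, toy_host, k=5))
```

Requiring only one shared k-mer is deliberately aggressive. A real pipeline would tune k and the threshold so that short chance matches do not remove genuine phage reads.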


2021 ◽  
Author(s):  
GiWon Shin ◽  
Hee Jung Koo ◽  
Mihwa Seo ◽  
Seung-Jae V. Lee ◽  
Hong Gil Nam ◽  
...  

Abstract Small RNAs that originate from transfer RNA (tRNA) species, known as tRNA-derived fragments (tRFs), have diverse biological functions, but little is known about their association with aging. Moreover, biochemical properties of tRNAs limit the discovery of functional tRFs by high-throughput sequencing. In particular, tRNA genes exist as multiple copies throughout the genome, and mature tRNAs carry various modified bases, both of which create ambiguities in RNA sequencing-based analysis of tRFs. Here, we report age-dependent changes in tRFs that may have functional impacts on aging in Caenorhabditis elegans. We first analyzed published RNA sequencing data using a new strategy for tRNA-associated sequencing reads. Our method used unique mature tRNA sequences as the reference for sequence alignment and filtered out false-positive tRF enrichment. Our analysis successfully distinguished de novo mutation sites from differences among homologous gene copies, and identified potential RNA modification sites. Importantly, we found that the major source of tRFs up-regulated during aging was tRNAs with high gene copy numbers. Together, our work raises the possibility that age-dependent changes in tRF levels play functional roles in the lifespan of animals, including C. elegans.
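
Aligning to unique mature tRNA sequences, rather than to every genomic copy, removes multi-mapping among identical copies and records each sequence's gene copy number along the way. The sketch below illustrates one way to build such a reference. It assumes introns have already been removed from the input sequences and appends the 3' CCA tail, which is added post-transcriptionally to eukaryotic tRNAs. The function name, IDs and sequences are hypothetical, not the authors' implementation.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def build_unique_mature_reference(
    genes: Dict[str, str], add_cca: bool = True
) -> Dict[str, Tuple[str, List[str]]]:
    """Collapse tRNA gene copies that yield the same mature sequence.

    genes maps a gene-copy ID to its intron-free sequence (assumption).
    Returns reference name -> (mature sequence, contributing gene copies);
    the length of that list is the gene copy number of the sequence.
    """
    by_seq: Dict[str, List[str]] = defaultdict(list)
    for gene_id, seq in genes.items():
        mature = seq.upper()
        # Append the 3' CCA tail unless the annotation already includes it.
        if add_cca and not mature.endswith("CCA"):
            mature += "CCA"
        by_seq[mature].append(gene_id)

    reference = {}
    for i, (mature, copies) in enumerate(sorted(by_seq.items()), start=1):
        reference[f"mature_tRNA_{i}"] = (mature, sorted(copies))
    return reference


# Hypothetical, truncated sequences for illustration only.
toy_genes = {"tRNA-Gly-1": "GCATTGG", "tRNA-Gly-2": "GCATTGG", "tRNA-Ala-1": "GGGGAAT"}
for name, (seq, copies) in build_unique_mature_reference(toy_genes).items():
    print(name, seq, "copies:", len(copies))
```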


PeerJ ◽  
2017 ◽  
Vol 5 ◽  
pp. e3998 ◽  
Author(s):  
Kishor Dhaygude ◽  
Kalevi Trontti ◽  
Jenni Paviala ◽  
Claire Morandin ◽  
Christopher Wheat ◽  
...  

Transcriptome resources for social insects have the potential to provide new insight into polyphenism, i.e., how divergent phenotypes arise from the same genome. Here we present a transcriptome based on paired-end RNA sequencing data for the ant Formica exsecta (Formicidae, Hymenoptera). The RNA sequencing libraries were constructed from samples of several life stages of both sexes and of the female castes (queens and workers), in order to maximize the representation of expressed genes. We first compared the performance of common assembly and scaffolding software (Trinity, Velvet-Oases, and SOAPdenovo-trans) in producing de novo assemblies. Second, we annotated the resulting expressed contigs against the currently published genomes of ants and other insects, including the honeybee, retaining genes with annotation evidence of being true genes. Our pipeline produced a final assembly of 39,262 mRNA transcripts, with an average coverage of >300X, belonging to 17,496 unique genes with annotation in related ant species. Of these genes, 536 were specific to a single caste or sex, highlighting the importance of comprehensive sampling. The final assembly also revealed multiple expressed splice variants for 6,975 genes, and we show that accounting for splice variants affects the outcome of downstream analyses such as gene ontology analysis. Our transcriptome provides an outstanding resource for future genetic studies on F. exsecta and other ant species, and the presented assembly approach can be adapted to any non-model species that has genomic resources available from a related taxon.
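
When comparing assemblers such as Trinity, Velvet-Oases and SOAPdenovo-trans, N50 is a commonly used contiguity statistic. The abstract does not say which metrics the authors used, so the sketch below is only an example of one standard measure, applied to hypothetical contig lengths.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Largest length L such that contigs of length >= L together cover
    at least half of the total assembly length."""
    sorted_lengths = sorted(lengths, reverse=True)
    total = sum(sorted_lengths)
    running = 0
    for length in sorted_lengths:
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no contigs given")


# Hypothetical contig lengths (bp) from two assemblers.
assembly_a = [100, 200, 300, 400]
assembly_b = [150, 150, 150, 150, 150, 250]
print("A:", n50(assembly_a), "B:", n50(assembly_b))
```

N50 rewards contiguity but says nothing about correctness or redundancy, which is one reason an annotation-based filter like the one described above remains necessary.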


2014 ◽  
Author(s):  
Gael P Alamancos ◽  
Amadís Pagès ◽  
Juan L Trincado ◽  
Nicolás Bellora ◽  
Eduardo Eyras

Alternative splicing plays an essential role in many cellular processes and is highly relevant to understanding multiple diseases, including cancer. High-throughput RNA sequencing allows genome-wide analyses of splicing across multiple conditions. However, the increasing number of available datasets poses a major challenge in terms of computation time and storage requirements. We describe SUPPA, a computational tool that calculates relative inclusion values of alternative splicing events by exploiting fast transcript quantification. Evaluated on simulated data, and on real RNA sequencing data compared against experimentally validated events, SUPPA's accuracy is comparable to, and sometimes better than, that of standard methods. We assess the variability arising from the choice of annotation and provide evidence that using complete transcripts, rather than more transcripts per gene, yields better estimates. Moreover, SUPPA coupled with de novo transcript reconstruction methods does not achieve accuracies as high as those obtained by quantifying known transcripts, but remains comparable to existing methods. Finally, we show that SUPPA is more than 1000 times faster than standard methods. Coupled with fast transcript quantification, SUPPA provides inclusion values much faster than existing methods without compromising accuracy, thereby facilitating systematic splicing analysis of large datasets with limited computational resources. The software is implemented in Python 2.7 and is available under the MIT license at https://bitbucket.org/regulatorygenomicsupf/suppa
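
Computing inclusion values from transcript quantification reduces to a ratio of abundances: the percent spliced-in (PSI) of an event is the abundance of transcripts carrying the inclusion form divided by the abundance of all transcripts that define the event. The sketch below illustrates that idea under stated assumptions. The transcript IDs and TPM values are hypothetical, and this is not SUPPA's actual code or command-line interface.

```python
from typing import Dict, Iterable


def event_psi(tpm: Dict[str, float], inclusion: Iterable[str],
              exclusion: Iterable[str]) -> float:
    """PSI = TPM of inclusion-form transcripts divided by the TPM of all
    transcripts defining the event (inclusion + exclusion).
    Returns NaN when none of the transcripts is expressed."""
    inc = sum(tpm.get(t, 0.0) for t in inclusion)
    exc = sum(tpm.get(t, 0.0) for t in exclusion)
    total = inc + exc
    return inc / total if total > 0 else float("nan")


# Hypothetical abundances for one exon-skipping event.
tpm = {"tx1": 30.0, "tx2": 10.0, "tx3": 20.0}
psi = event_psi(tpm, inclusion=["tx1", "tx2"], exclusion=["tx3"])
print(f"PSI = {psi:.3f}")
```

Because only a sum and a division per event are needed once transcripts are quantified, this style of calculation scales well, which is consistent with the speed argument made above.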


2021 ◽  
Author(s):  
Adelina Rabenius ◽  
Sajitha Chandrakumaran ◽  
Lea Sistonen ◽  
Anniina Vihervaara

Nascent RNA sequencing tracks transcription at nucleotide resolution. The genomic distribution of engaged transcription complexes, in turn, uncovers functional genomic regions. Here, we provide data-analytical steps to 1) identify transcribed regulatory elements de novo genome-wide, 2) quantify engaged transcription complexes at enhancers, promoter-proximal regions, divergent transcripts, gene bodies and termination windows, and 3) measure the distribution of transcription machineries and regulatory proteins across functional genomic regions. This protocol follows RNA synthesis and genome regulation in mammals, as demonstrated in human K562 erythroleukemia cells.
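
Step 2 of the protocol quantifies engaged complexes in windows such as the promoter-proximal region and the gene body. A common summary built from such counts is the pausing index: read density near the promoter divided by read density in the gene body. The sketch below shows that calculation for a single plus-strand gene. The 500-bp windows, function names and read positions are hypothetical assumptions, not parameters of the published protocol.

```python
from bisect import bisect_left
from typing import List


def count_in(positions: List[int], start: int, end: int) -> int:
    """Number of sorted read positions in the half-open window [start, end)."""
    return bisect_left(positions, end) - bisect_left(positions, start)


def pausing_index(positions: List[int], tss: int, gene_end: int,
                  pp_window: int = 500, body_offset: int = 500) -> float:
    """Promoter-proximal read density divided by gene-body read density
    for a plus-strand gene. Window sizes are illustrative assumptions."""
    positions = sorted(positions)
    pp_start, pp_end = tss, tss + pp_window
    body_start, body_end = tss + body_offset, gene_end
    if body_end <= body_start:
        raise ValueError("gene too short for the chosen gene-body window")
    pp_density = count_in(positions, pp_start, pp_end) / (pp_end - pp_start)
    body_density = count_in(positions, body_start, body_end) / (body_end - body_start)
    return pp_density / body_density if body_density > 0 else float("inf")


# Hypothetical 3'-end positions of nascent reads for one gene (TSS at 0).
reads = [10, 20, 30, 40, 600, 1100]
print(f"pausing index = {pausing_index(reads, tss=0, gene_end=1500):.1f}")
```

Minus-strand genes would need their windows mirrored around the TSS; that case is omitted here to keep the sketch short.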


Genes ◽  
2021 ◽  
Vol 12 (1) ◽  
pp. 120
Author(s):  
Yiyun Sun ◽  
Dandan Xu ◽  
Chundong Zhang ◽  
Yitao Wang ◽  
Lian Zhang ◽  
...  

We previously demonstrated that proline-rich protein 11 (PRR11) and spindle and kinetochore associated 2 (SKA2) constitute a head-to-head gene pair driven by a prototypical bidirectional promoter. This gene pair synergistically promoted the development of non-small cell lung cancer. However, the signaling pathways leading to the ectopic expression of this gene pair remain obscure. In the present study, we first analyzed lung squamous cell carcinoma (LSCC) RNA sequencing data from The Cancer Genome Atlas (TCGA) database using gene expression correlation analysis and gene set enrichment analysis (GSEA), which revealed that the PRR11-SKA2 correlated gene list closely resembled the Hedgehog (Hh) pathway activation-related gene set. Subsequently, inhibiting the Hh pathway in LSCC cells with the GLI1/2 inhibitor GANT-61 or with GLI1/2 siRNA concomitantly decreased the expression levels of PRR11 and SKA2. Furthermore, RNA sequencing of LSCC cells treated with GANT-61 revealed 397 differentially expressed genes (203 upregulated and 194 downregulated). Among them, one gene set, including BIRC5, NCAPG, CCNB2, and BUB1, was involved in cell division and interacted with both PRR11 and SKA2. RT-PCR confirmed the downregulation of these genes, and their high expression was significantly correlated with shorter overall survival of LSCC patients. Taken together, our results indicate that GLI1/2 mediates the expression of the PRR11-SKA2-centric gene set, which serves as an unfavorable prognostic indicator for LSCC patients, suggesting new combinatorial diagnostic and therapeutic strategies in LSCC.
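
A correlated gene list of the kind described above can be produced by scoring every gene's expression across samples against an anchor gene and sorting the scores. The resulting ranked list is the sort of input a preranked GSEA expects. The sketch below uses Pearson correlation and toy gene names. Both choices are assumptions, since the abstract does not specify the correlation measure or how the two anchor genes were combined.

```python
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("vectors must have equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("constant expression vector")
    return sxy / sqrt(sxx * syy)


def rank_by_correlation(expr: Dict[str, List[float]], anchor: str
                        ) -> List[Tuple[str, float]]:
    """Rank all other genes by correlation with the anchor gene, highest first."""
    ref = expr[anchor]
    scores = [(gene, pearson(ref, values))
              for gene, values in expr.items() if gene != anchor]
    return sorted(scores, key=lambda gs: gs[1], reverse=True)


# Hypothetical expression values for four genes across four samples.
toy_expr = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [4, 3, 2, 1], "D": [1, 3, 2, 4]}
for gene, r in rank_by_correlation(toy_expr, "A"):
    print(gene, round(r, 2))
```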


Author(s):  
Vincent M. Tutino ◽  
Haley R. Zebraski ◽  
Hamidreza Rajabzadeh-Oghaz ◽  
Lee Chaves ◽  
Adam A. Dmytriw ◽  
...  
