On the verge of diagnosis: Detection, reporting, and investigation of de novo variants in novel genes identified by clinical sequencing

2018 ◽  
Vol 39 (11) ◽  
pp. 1505-1516 ◽  
Author(s):  
Isabelle Thiffault ◽  
Maxime Cadieux‐Dion ◽  
Emily Farrow ◽  
Raymond Caylor ◽  
Neil Miller ◽  
...  
2021 ◽  
Author(s):  
Chris Papadopoulos ◽  
Isabelle Callebaut ◽  
Jean-Christophe Gelly ◽  
Isabelle Hatin ◽  
Olivier Namy ◽  
...  

The noncoding genome plays an important role in de novo gene birth and in the emergence of genetic novelty. Nevertheless, how the properties of noncoding sequences could promote the birth of novel genes and shape the evolution and structural diversity of proteins remains unclear. Therefore, by combining different bioinformatic approaches, we characterized the fold potential diversity of the amino acid sequences encoded by all intergenic open reading frames (ORFs) of S. cerevisiae with the aim of (i) exploring whether the large structural diversity observed in proteomes is already present in noncoding sequences, and (ii) estimating the potential of the noncoding genome to produce novel protein bricks that can either give rise to novel genes or be integrated into pre-existing proteins, thus participating in protein structure diversity and evolution. We showed that amino acid sequences encoded by most yeast intergenic ORFs contain the elementary building blocks of protein structures. Moreover, they encompass the large structural diversity of canonical proteins, and, strikingly, the majority are predicted to be foldable. Then, we investigated the early stages of de novo gene birth by identifying intergenic ORFs with a strong translation signal in ribosome profiling experiments and by reconstructing the ancestral sequences of 70 yeast de novo genes. This enabled us to highlight sequence and structural factors determining de novo gene emergence. Finally, we showed a strong correlation between the fold potential of de novo proteins and that of their ancestral amino acid sequences, reflecting the relationship between the noncoding genome and the protein structure universe.
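The abstract does not say how intergenic ORFs were delimited (start codon rule, minimum length, strand handling), so the following is only an illustrative sketch of what "ORF" means computationally, not the authors' procedure. It assumes one common convention: an ORF runs from an ATG to the first in-frame stop codon, scanned in all three frames on both strands. The `min_codons` threshold and the rule that only the first ATG after each stop opens an ORF are assumptions, and no filtering against gene annotations (the "intergenic" part) is done here.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 10) -> list[tuple[str, int, int]]:
    """Find ATG-to-stop ORFs on both strands (assumed convention).

    Returns (strand, start, end) tuples with 0-based, end-exclusive
    coordinates on the strand that was scanned; `end` includes the stop
    codon. `min_codons` counts sense codons, ATG included (hypothetical
    threshold). Nested in-frame ATGs do not open separate ORFs.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            # Step codon by codon; stop before a partial trailing codon.
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3))
                    start = None
    return orfs


if __name__ == "__main__":
    # Toy sequence: one short ORF (ATG AAA TAG) in forward frame 2.
    print(find_orfs("CCATGAAATAGCC", min_codons=2))
```

Real pipelines typically also report stop-to-stop ORFs or alternative start codons; which definition matters depends on whether the goal is to catalogue all potentially translatable sequence or only canonical-looking genes.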


2018 ◽  
Author(s):  
Leo Blondel ◽  
Tamsin E. M. Jones ◽  
Cassandra G. Extavour

Abstract
New cellular functions and developmental processes can evolve by modifying existing genes or creating novel genes. Novel genes can arise not only via duplication or mutation but also by acquiring foreign DNA, also called horizontal gene transfer (HGT). Here we show that HGT likely contributed to the creation of a novel gene indispensable for reproduction in some insects. Long considered a novel gene with unknown origin, oskar has evolved to fulfil a crucial role in insect germ cell formation. Our analysis of over 100 insect Oskar sequences suggests that Oskar arose de novo via fusion of eukaryotic and prokaryotic sequences. This work shows that highly unusual gene origin processes can give rise to novel genes that can facilitate evolution of novel developmental mechanisms.
One Sentence Summary: Our research shows that gene origin processes often considered highly unusual, including HGT and de novo coding region evolution, can give rise to novel genes that can both participate in pre-existing gene regulatory networks and facilitate the evolution of novel developmental mechanisms.


2021 ◽  
Author(s):  
Mrinalini Mrinalini ◽  
Nalini Puniamoorthy

Abstract
Background: Oxford Nanopore Technologies (ONT) long-read transcriptomes offer many advantages, including long reads (>10 kbp), end-to-end transcripts, structural variants, and isoform-level resolution of genes and expression. However, uptake of ONT transcriptomics is still low, largely due to high error rates (2 to 13%) and reliance on reference databases that are unavailable for many non-model species. Additionally, bioinformatics tools and pipelines for de novo ONT transcriptomics are still in early stages of development. Results: Here, we use de novo ONT GridION transcriptomics to discover novel genes from the male accessory glands (AG) of a widespread, non-model dung fly, Sepsis punctum. Insect AGs are of particular interest for this because they are hotspots for rapid evolution of novel reproductive genes, and they synthesize seminal fluid proteins that lack homology to any other known proteins. We implement a completely de novo ONT GridION transcriptome pipeline, incorporating quality filtering and rigorous error-correction procedures, to characterize this novel gene set and to quantify its expression. Specifically, we compare these ONT genes and their expression against de novo Illumina HiSeq transcriptome data. We find 40 high-quality, high-confidence ONT genes that cross-verify against Illumina genes, 26 of which are novel and specific to S. punctum. Read-count-based expression quantification in ONT samples is highly congruent with Illumina Transcripts Per Million (TPM), both in overall pattern and within functional categories. Novel genes account for an average of 81% of total gene expression, underscoring their functional importance in S. punctum AGs. Eighty percent of these genes are secretory in nature and are responsible for 74% of total gene expression.
Notably, median sequence similarities of ONT nucleotide and protein sequences match within-Illumina sequence similarities, indicating that our de novo ONT transcriptome pipeline successfully mitigated sequencing errors. Conclusions: This is the first study to adapt ONT transcriptomics for completely de novo characterization of novel genes in animals. Our study demonstrates that ONT long reads, constituting a quarter of the number of bases sequenced at less than a third of the cost of Illumina reads, can be a resource-friendly and cost-effective solution for end-to-end sequencing of unknown genes even in the absence of a reference database.
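The abstract compares ONT read-count-based expression with Illumina TPM but does not give the exact ONT quantification formula, so the sketch below is a hedged illustration of the two quantities rather than the study's pipeline. TPM divides each transcript's count by its length and rescales the resulting rates to sum to one million; the `counts_per_million` helper (a name introduced here) applies no length correction. One plausible reason the two can agree for long reads is that a full-length read corresponds to roughly one transcript molecule, so raw counts are less length-biased than short-read fragment counts; that rationale is an assumption, not a claim from the abstract.

```python
def tpm(counts: dict[str, float], lengths_bp: dict[str, float]) -> dict[str, float]:
    """Transcripts Per Million: length-normalized rates rescaled to sum to 1e6."""
    # Reads per base for each transcript; the length unit cancels out.
    rates = {t: counts[t] / lengths_bp[t] for t in counts}
    total = sum(rates.values())
    return {t: r / total * 1e6 for t, r in rates.items()}


def counts_per_million(counts: dict[str, float]) -> dict[str, float]:
    """Share of reads per transcript scaled to 1e6, with no length correction."""
    total = sum(counts.values())
    return {t: c / total * 1e6 for t, c in counts.items()}


if __name__ == "__main__":
    # Hypothetical toy data: equal read counts, transcript b twice as long.
    counts = {"a": 100, "b": 100}
    lengths = {"a": 1000, "b": 2000}
    print(tpm(counts, lengths))         # a gets twice b's TPM
    print(counts_per_million(counts))   # both get 500000.0
```

Both functions return values that sum to one million, which is what makes them comparable across samples; they differ only in whether transcript length enters the normalization.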


BMC Genomics ◽  
2011 ◽  
Vol 12 (1) ◽  
Author(s):  
Massimo Iorizzo ◽  
Douglas A Senalik ◽  
Dariusz Grzebelus ◽  
Megan Bowman ◽  
Pablo F Cavagnaro ◽  
...  

2017 ◽  
Vol 27 (3) ◽  
pp. 421-429 ◽  
Author(s):  
Venu Pullabhatla ◽  
Amy L Roberts ◽  
Myles J Lewis ◽  
Daniele Mauro ◽  
David L Morris ◽  
...  

2020 ◽  
Author(s):  
Vitor Lima Coelho ◽  
Tarcísio Fontenele de Brito ◽  
Ingrid Alexandre de Abreu Brito ◽  
Maira Arruda Cardoso ◽  
Mateus Antonio Berni ◽  
...  

Abstract
Rhodnius prolixus is a Triatominae insect species and a primary vector of Chagas disease. The genome of R. prolixus has been recently sequenced and partially assembled, but few transcriptome analyses have been performed to date. In this study, we describe the stage-specific transcriptomes obtained from previtellogenic stages of oogenesis and from mature eggs. By analyzing ~228 million paired-end RNA-Seq reads, we significantly improved the current genome annotations for 9,206 genes. We provide extended 5’ and 3’ UTRs, complete Open Reading Frames, and alternative transcript variants. Strikingly, using a combination of genome-guided and de novo transcriptome assembly we found more than two thousand novel genes, thus increasing the number of genes in R. prolixus from 15,738 to 17,864. We used the improved transcriptome to investigate stage-specific gene expression profiles during R. prolixus oogenesis. Our data reveal that 11,127 genes are expressed in the early previtellogenic stage of oogenesis and their transcripts are deposited in the developing egg, including key factors regulating germline development, genome integrity, and the maternal-zygotic transition. In addition, GO term analyses show that transcripts encoding components of the steroid hormone receptor pathway, cytoskeleton, and intracellular signaling are abundant in the mature eggs, where they likely control early embryonic development upon fertilization. Our results significantly improve the R. prolixus genome and transcriptome and provide novel insight into oogenesis and early embryogenesis in this medically relevant insect.
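As a quick arithmetic check on the gene counts reported above (using only the two totals given in the abstract), the increase works out to just over two thousand genes, roughly a 13.5% expansion of the previous annotation:

```latex
17{,}864 - 15{,}738 = 2{,}126 \ \text{genes}, \qquad
\frac{2{,}126}{15{,}738} \approx 0.135
```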


2021 ◽  
Author(s):  
Chris Papadopoulos ◽  
Isabelle Callebaut ◽  
Jean-Christophe Gelly ◽  
Isabelle Hatin ◽  
Olivier Namy ◽  
...  

The noncoding genome plays an important role in de novo gene birth and in the emergence of genetic novelty. Nevertheless, how the properties of noncoding sequences could promote the birth of novel genes and shape the evolution and structural diversity of proteins remains unclear. Therefore, by combining different bioinformatic approaches, we characterized the fold potential diversity of the amino acid sequences encoded by all intergenic open reading frames (ORFs) of S. cerevisiae with the aim of (1) exploring whether the diversity of structural states found in proteomes is already present in noncoding sequences, and (2) estimating the potential of the noncoding genome to produce novel protein bricks that could either give rise to novel genes or be integrated into pre-existing proteins, thus participating in protein structure diversity and evolution. We showed that amino acid sequences encoded by most yeast intergenic ORFs contain the elementary building blocks of protein structures. Moreover, they encompass the large diversity of structural states of canonical proteins, with the majority predicted to be foldable. Then, we investigated the early stages of de novo gene birth by reconstructing the ancestral sequences of 70 yeast de novo genes and characterized the sequence and structural properties of intergenic ORFs with a strong translation signal. This enabled us to highlight sequence and structural factors determining de novo gene emergence. Finally, we showed a strong correlation between the fold potential of de novo proteins and that of their ancestral amino acid sequences, reflecting the relationship between the noncoding genome and the protein structure universe.


2014 ◽  
Vol 5 (1) ◽  
Author(s):  
B. J. O'Roak ◽  
H. A. Stessman ◽  
E. A. Boyle ◽  
K. T. Witherspoon ◽  
B. Martin ◽  
...  
