Comparative genomic analyses highlight the contribution of pseudogenized protein-coding genes to human lincRNAs

Mapping Intimacies ◽

10.1101/163626 ◽

2017 ◽

Author(s):

Wan-Hsin Liu ◽

Zing Tsung-Yeh Tsai ◽

Huai-Kuang Tsai

Keyword(s):

Human Genome ◽

Noncoding RNA ◽

De Novo ◽

Systematic Investigation ◽

Comparative Genomic ◽

Protein Coding ◽

Protein Coding Genes ◽

Competing Endogenous RNAs ◽

Intergenic Regions ◽

The Relationship

Abstract. Background: The regulatory roles of long intergenic noncoding RNAs (lincRNAs) in humans have been revealed through the use of advanced sequencing technology. Recently, three possible scenarios of lincRNA origin have been proposed: de novo origination from intergenic regions, duplication from long noncoding RNA, and pseudogenization of protein-coding genes. The first two scenarios are largely studied and supported, yet few studies have focused on the evolution from pseudogenized protein-coding sequence to lincRNA. Because these three scenarios are not mutually exclusive, and because lincRNA origination still needs systematic investigation, we conducted a comparative genomics study to investigate the evolution of human lincRNAs. Results: Combining syntenic analysis with a stringent Blastn e-value cutoff, we found that the majority of lincRNAs align to the intergenic regions of other species. Interestingly, 193 human lincRNAs could have protein-coding orthologs in at least two of nine vertebrates. Transposable elements in these conserved regions of the human genome are much fewer than expected. Moreover, 19% of these lincRNAs overlap with or are close to pseudogenes in the human genome. Conclusions: We suggest that a notable portion of lincRNAs could be derived from pseudogenized protein-coding genes. Furthermore, based on our computational analysis, we hypothesize that a subset of these lincRNAs could have the potential to regulate their paralogs by functioning as competing endogenous RNAs. Our results provide evolutionary evidence of the relationship between human lincRNAs and protein-coding genes.

GENOMICS AND EPIGENOMICS IN MAIZE HYBRID KERNEL

IRAQI JOURNAL OF AGRICULTURAL SCIENCES ◽

10.36103/ijas.v49i6.129 ◽

2018 ◽

Vol 49 (6) ◽

Author(s):

Elsahookie et al.

Keyword(s):

Noncoding RNA ◽

Endosperm Development ◽

Imprinted Genes ◽

Specific Expression ◽

Protein Coding ◽

Maize Hybrid ◽

Preferential Expression ◽

Protein Coding Genes ◽

Stage Of Development ◽

Intergenic Regions

The endosperm in cereals supplies nutrients to the developing kernel and seedling, and it is the primary tissue in which gene imprinting occurs. Developing maize (Zea mays L.) endosperms were analysed for allelic gene expression in both reciprocal crosses of inbreds B73 and Mo17. High-throughput transcriptome sequencing of kernels at 0, 3 and up to 15 days after pollination (DAP) was performed for both reciprocals and revealed a gradual increase in paternal transcript expression in 3 and 5 DAP kernels. Meanwhile, in 7 DAP endosperm, most of the genes tested gave a 2:1 maternal:paternal ratio, suggesting that paternal genes are almost fully activated at 7 DAP. There were 300 paternally expressed genes (PEGs) and 499 maternally expressed genes (MEGs) identified across endosperm development stages. A total of 63 genes out of 116, 234 that exhibited parent-specific expression were identified at 7, 10 and 15 DAP. Most paternally expressed genes were found at 7 DAP, owing to deviation in paternal allele expression at this stage of development. For imprinted genes, the relative expression of maternal and paternal alleles differed by at least fivefold in both crosses. A total of 179 (1.6%) protein-coding genes expressed in the endosperm were imprinted; 68 of them showed maternal preferential expression and 111 paternal preferential expression. In addition, 38 long noncoding RNAs were found to be imprinted and transcribed in either the sense or antisense direction from intronic regions of normal protein-coding genes or from intergenic regions. Imprinted genes showed clustering around the genome. A total of 21 imprinted genes in the maize hybrid endosperm had differentially methylated regions (DMRs). All DMRs were found to be hypomethylated in maternal alleles and hypermethylated in paternal alleles. These results confirm that a complex mechanism controls imprinting, auxin activity, and developmental regulation in maize endosperm. Studying F2 kernels on F1 plants may shed new light on the control of kernel number and weight per unit area.

Diversity and evolution of the emerging Pandoraviridae family

10.1101/230904 ◽

2017 ◽

Cited By ~ 1

Author(s):

Matthieu Legendre ◽

Elisabeth Fabre ◽

Olivier Poirot ◽

Sandra Jeudy ◽

Audrey Lartigue ◽

...

Keyword(s):

Comparative Genomics ◽

De Novo ◽

Gene Duplications ◽

Statistical Features ◽

Strong Component ◽

Protein Coding ◽

Protein Coding Genes ◽

Intergenic Regions ◽

Comparative Genomics Analysis ◽

Horizontal Transfers

Abstract. With DNA genomes up to 2.5 Mb packed in particles of bacterium-like shape and dimension, the first two Acanthamoeba-infecting Pandoraviruses remained the most spectacular viruses since their description in 2013. Our isolation of three new strains from distant locations and environments allowed us to perform the first comparative genomics analysis of the emerging worldwide-distributed Pandoraviridae family. Thorough annotation of the genomes, combining transcriptomic, proteomic, and bioinformatic analyses, led to the discovery of many non-coding transcripts while significantly reducing the former set of predicted protein-coding genes. We found that the Pandoraviridae exhibit an open pan-genome, the enormous size of which is not adequately explained by gene duplications or horizontal transfers. As most of the strain-specific genes have no extant homolog and exhibit statistical features comparable to intergenic regions, we suggest that de novo gene creation is a strong component in the evolution of the giant Pandoravirus genomes.

Pandoravirus celtis illustrates the microevolution processes at work in the giant Pandoraviridae genomes

10.1101/500207 ◽

2018 ◽

Cited By ~ 1

Author(s):

Matthieu Legendre ◽

Jean-Marie Alempic ◽

Nadège Philippe ◽

Audrey Lartigue ◽

Sandra Jeudy ◽

...

Keyword(s):

De Novo ◽

Gene Repertoire ◽

Protein Coding ◽

Genomic Changes ◽

Coding Regions ◽

Protein Coding Genes ◽

Intergenic Regions ◽

Mere Existence ◽

Increasing Functions ◽

Similar Gene

Abstract. With genomes of up to 2.7 Mb propagated in µm-long oblong particles and initially predicted to encode more than 2000 proteins, members of the Pandoraviridae family display the most extreme features of the known viral world. The mere existence of such giant viruses raises fundamental questions about their origin and the processes governing their evolution. A previous analysis of six newly available isolates, independently confirmed by a study including 3 others, established that the Pandoraviridae pan-genome is open, meaning that each new strain exhibits protein-coding genes not previously identified in other family members. With an average increment of about 60 proteins, the gene repertoire shows no sign of reaching a limit and remains largely coding for proteins without recognizable homologs in other viruses or cells (ORFans). To explain these results, we proposed that most new protein-coding genes were created de novo, from pre-existing non-coding regions of the G+C rich pandoravirus genomes. The comparison of the gene content of a new isolate, P. celtis, closely related (96% identical genome) to the previously described P. quercus is now used to test this hypothesis by studying genomic changes in a microevolution range. Our results confirm that the differences between these two similar gene contents mostly consist of protein-coding genes without known homologs (ORFans), with statistical signatures close to that of intergenic regions. These newborn proteins are under slight negative selection, perhaps to maintain stable folds and prevent protein aggregation pending the eventual emergence of fitness-increasing functions. Our study also unraveled several insertion events mediated by a transposase of the hAT family, 3 copies of which are found in P. celtis and are presumably active. Members of the Pandoraviridae are presently the first viruses known to encode this type of transposase.
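
As a rough illustration of what an "open" pan-genome means in practice, the sketch below counts how many gene families each newly added strain contributes that were not seen in any earlier strain. The function name `pan_genome_increments`, the strain order, and the toy gene-family identifiers are hypothetical and are not taken from the study, which relied on full genome annotation rather than simple set arithmetic.

```python
from typing import Iterable, List, Set


def pan_genome_increments(strains: Iterable[Set[str]]) -> List[int]:
    """Return, for each strain in order, the number of gene families
    it adds to the pan-genome accumulated from the preceding strains."""
    seen: Set[str] = set()        # gene families observed so far
    increments: List[int] = []
    for genes in strains:
        new_genes = set(genes) - seen   # families not seen in earlier strains
        increments.append(len(new_genes))
        seen |= new_genes
    return increments


# Toy example with made-up gene-family labels (not real Pandoravirus data).
toy_strains = [{"a", "b"}, {"b", "c"}, {"c", "d", "e"}]
print(pan_genome_increments(toy_strains))  # [2, 1, 2]
```

If the increments stay well above zero as strains are added, the pan-genome behaves as open in this simplified sense; increments that shrink toward zero would instead suggest a closed one.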

Extreme purifying selection against point mutations in the human genome

10.1101/2021.08.23.457339 ◽

2021 ◽

Author(s):

Noah Dukler ◽

Mehreen R Mughal ◽

Ritika Ramani ◽

Yi-Fei Huang ◽

Adam Siepel

Keyword(s):

Human Genome ◽

De Novo ◽

Point Mutations ◽

Purifying Selection ◽

Selection Coefficient ◽

Sequencing Data ◽

Protein Coding ◽

Coding Regions ◽

Protein Coding Genes ◽

Selective Effects

Genome sequencing of tens of thousands of human individuals has recently enabled the measurement of large selective effects for mutations to protein-coding genes. Here we describe a new method, called ExtRaINSIGHT, for measuring similar selective effects at individual sites in noncoding as well as in coding regions of the human genome. ExtRaINSIGHT estimates the prevalence of strong purifying selection, or "ultraselection" (λs), as the fractional depletion of rare single-nucleotide variants (minor allele frequency <0.1%) in a target set of genomic sites relative to matched sites that are putatively neutrally evolving, in a manner that controls for local variation and neighbor-dependence in mutation rate. We show using simulations that, above an appropriate threshold, λs is closely related to the average site-specific selection coefficient against heterozygous point mutations, as predicted at mutation-selection balance. Applying ExtRaINSIGHT to 71,702 whole genome sequences from gnomAD v3, we find particularly strong evidence of ultraselection in evolutionarily ancient miRNAs and neuronal protein-coding genes, as well as at splice sites. Moreover, our estimated selection coefficient against heterozygous amino-acid replacements across the genome (at 1.4%) is substantially larger than previous estimates based on smaller sample sizes. By contrast, we find weak evidence of ultraselection in other noncoding RNAs and transcription factor binding sites, and only modest evidence in ultraconserved elements and human accelerated regions. We estimate that ~0.3-0.5% of the human genome is ultraselected, with one third to one half of ultraselected sites falling in coding regions. These estimates suggest ~0.3-0.4 lethal or nearly lethal de novo mutations per potential human zygote, together with ~2 de novo mutations that are more weakly deleterious. Overall, our study sheds new light on the genome-wide distribution of fitness effects for new point mutations by combining deep new sequencing data sets and classical theory from population genetics.
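
The idea of a "fractional depletion" of rare variants can be sketched with a naive ratio estimator. This is only an illustration under strong simplifying assumptions: it ignores the controls for local variation and neighbor-dependence in mutation rate that the abstract describes, and the function name `fractional_depletion` and the example counts are hypothetical, not part of ExtRaINSIGHT.

```python
def fractional_depletion(target_rare: int, target_sites: int,
                         neutral_rare: int, neutral_sites: int) -> float:
    """Naive estimate of the depletion of rare variants in target sites
    relative to matched, putatively neutral sites.

    Returns 1 - (rare-variant rate in target) / (rare-variant rate in neutral).
    A value near 0 means no depletion; a value near 1 means strong depletion.
    """
    if target_sites <= 0 or neutral_sites <= 0:
        raise ValueError("site counts must be positive")
    target_rate = target_rare / target_sites
    neutral_rate = neutral_rare / neutral_sites
    if neutral_rate == 0:
        raise ValueError("neutral rare-variant rate must be non-zero")
    return 1.0 - target_rate / neutral_rate


# Made-up counts: 80 rare variants per 1000 target sites versus
# 100 rare variants per 1000 matched neutral sites.
print(round(fractional_depletion(80, 1000, 100, 1000), 3))  # 0.2
```

In this toy case the target sites carry 20% fewer rare variants than the neutral reference, which is the kind of quantity that λs summarizes once mutation-rate differences are properly accounted for.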

De novo emergence of adaptive membrane proteins from thymine-rich intergenic sequences

10.1101/621532 ◽

2019 ◽

Author(s):

Nikolaos Vakirlis ◽

Omer Acar ◽

Brian Hsu ◽

Nelson Castilho Coelho ◽

S. Branden Van Oss ◽

...

Keyword(s):

De Novo ◽

Transmembrane Proteins ◽

Protein Coding ◽

Coding Sequences ◽

Beneficial Effects ◽

Protein Coding Genes ◽

Evolutionary Innovation ◽

Intergenic Sequences ◽

Intergenic Regions ◽

Novel Protein

Summary. Recent evidence demonstrates that novel protein-coding genes can arise de novo from intergenic loci. This evolutionary innovation is thought to be facilitated by the pervasive translation of intergenic transcripts, which exposes a reservoir of variable polypeptides to natural selection. Do intergenic translation events yield polypeptides with useful biochemical capacities? The answer to this question remains controversial. Here, we systematically characterized how de novo emerging coding sequences impact fitness. In budding yeast, overexpression of these sequences was enriched in beneficial effects, while their disruption was generally inconsequential. We found that beneficial emerging sequences have a strong tendency to encode putative transmembrane proteins, which appears to stem from a cryptic propensity for transmembrane signals throughout thymine-rich intergenic regions of the genome. These findings suggest that novel genes with useful biochemical capacities, such as transmembrane domains, tend to evolve de novo within intergenic loci that already harbored a blueprint for these capacities.

Comparative Genomics of Clinical Isolates of the Emerging Tick-Borne Pathogen Neoehrlichia mikurensis

Microorganisms ◽

10.3390/microorganisms9071488 ◽

2021 ◽

Vol 9 (7) ◽

pp. 1488

Author(s):

Anna Grankvist ◽

Daniel Jaén-Luchoro ◽

Linda Wass ◽

Per Sikora ◽

Christine Wennerås

Keyword(s):

Vascular Endothelium ◽

De Novo ◽

Phylogenetic Analyses ◽

Geographic Origin ◽

Comparative Genomic ◽

Whole Genome ◽

Illumina Hiseq ◽

Protein Coding ◽

Ehrlichia Ruminantium ◽

Protein Coding Genes

Tick-borne ‘Neoehrlichia (N.) mikurensis’ is the cause of neoehrlichiosis, an infectious vasculitis of humans. This strict intracellular pathogen is a member of the family Anaplasmataceae and has been unculturable until recently. The only available genetic data on this new pathogen are six partially sequenced housekeeping genes. The aim of this study was to advance the knowledge regarding ‘N. mikurensis’ genomic relatedness with other Anaplasmataceae members, intra-species genotypic variability and potential virulence factors explaining its tropism for vascular endothelium. Here, we present the de novo whole-genome sequences of three ‘N. mikurensis’ strains derived from Swedish patients diagnosed with neoehrlichiosis. The genomes were obtained by extraction of DNA from patient plasma, library preparation using 10x Chromium technology, and sequencing by Illumina Hiseq-4500. ‘N. mikurensis’ was found to have the next smallest genome of the Anaplasmataceae family (1.1 Mbp with 27% GC content), consisting of 845 protein-coding genes, one third of which are of unknown function. Comparative genomic analyses revealed that ‘N. mikurensis’ was more closely related to Ehrlichia chaffeensis than to Ehrlichia ruminantium, the opposite of what 16S rRNA sequence-based phylogenetic analyses determined. The genetic variability of the three whole-genome-sequenced ‘N. mikurensis’ strains was extremely low, between 0.14 and 0.22‰, a variation that was associated with geographic origin. No protein-coding genes exclusively shared by N. mikurensis and E. ruminantium were identified to explain their common tropism for vascular endothelium.

The DGCR5 long noncoding RNA may regulate expression of several schizophrenia-related genes

Science Translational Medicine ◽

10.1126/scitranslmed.aat6912 ◽

2018 ◽

Vol 10 (472) ◽

pp. eaat6912 ◽

Cited By ~ 21

Author(s):

Qingtuan Meng ◽

Kangli Wang ◽

Tonya Brunetti ◽

Yan Xia ◽

Chuan Jiao ◽

...

Keyword(s):

Noncoding RNA ◽

Genome Wide Association Study ◽

De Novo ◽

Copy Number Variations ◽

Postmortem Brain ◽

Psychiatric Disease ◽

Potential Contribution ◽

Protein Coding ◽

Protein Coding Genes ◽

Postmortem Brain Tissue

A number of studies indicate that rare copy number variations (CNVs) contribute to the risk of schizophrenia (SCZ). Most of these studies have focused on protein-coding genes residing in the CNVs. Here, we investigated long noncoding RNAs (lncRNAs) within 10 SCZ risk–associated CNV deletion regions (CNV-lncRNAs) and examined their potential contribution to SCZ risk. We used RNA sequencing transcriptome data derived from postmortem brain tissue from control individuals without psychiatric disease as part of the PsychENCODE BrainGVEX and Developmental Capstone projects. We carried out weighted gene coexpression network analysis to identify protein-coding genes coexpressed with CNV-lncRNAs in the human brain. We identified one neuronal function–related coexpression module shared by both datasets. This module contained a lncRNA called DGCR5 within the 22q11.2 CNV region, which was identified as a hub gene. Protein-coding genes associated with SCZ genome-wide association study signals, de novo mutations, or differential expression were also contained in this neuronal module. Using DGCR5 knockdown and overexpression experiments in human neural progenitor cells derived from human induced pluripotent stem cells, we identified a potential role for DGCR5 in regulating certain SCZ-related genes.

From de novo to ‘de nono’: The majority of novel protein coding genes identified with phylostratigraphy are old genes or recent duplicates

Genome Biology and Evolution ◽

10.1093/gbe/evy231 ◽

2018 ◽

Cited By ~ 2

Author(s):

Claudio Casola

Keyword(s):

De Novo ◽

Protein Coding ◽

Protein Coding Genes ◽

Novel Protein

Comparative genomic analysis of mitochondrial protein-coding genes in Veneroida clams: Analysis of superfamily-specific genomic and evolutionary features

Marine Genomics ◽

10.1016/j.margen.2015.08.004 ◽

2015 ◽

Vol 24 ◽

pp. 329-334 ◽

Cited By ~ 2

Author(s):

Jae Yeon Hwang ◽

Chang-kyu Lee ◽

Heebal Kim ◽

Bo-Hye Nam ◽

Cheul Min An ◽

...

Keyword(s):

Mitochondrial Protein ◽

Genomic Analysis ◽

Comparative Genomic Analysis ◽

Comparative Genomic ◽

Protein Coding ◽

Protein Coding Genes ◽

Evolutionary Features

Chromosome-level assembly of Drosophila bifasciata reveals important karyotypic transition of the X chromosome

10.1101/847558 ◽

2019 ◽

Author(s):

Ryan Bracewell ◽

Anita Tran ◽

Kamalakar Chatla ◽

Doris Bachtrog

Keyword(s):

X Chromosome ◽

Genome Assembly ◽

De Novo ◽

Pericentromeric Region ◽

Species Group ◽

Chromosome 15 ◽

Protein Coding ◽

Protein Coding Genes ◽

Long Read ◽

Chromosome Level

Abstract. The Drosophila obscura species group is one of the most studied clades of Drosophila and harbors multiple distinct karyotypes. Here we present a de novo genome assembly and annotation of D. bifasciata, a species which represents an important subgroup for which no high-quality chromosome-level genome assembly currently exists. We combined long-read sequencing (Nanopore) and Hi-C scaffolding to achieve a highly contiguous genome assembly approximately 193 Mb in size, with repetitive elements constituting 30.1% of the total length. Drosophila bifasciata harbors four large metacentric chromosomes and the small dot, and our assembly contains each chromosome in a single scaffold, including the highly repetitive pericentromeres, which were largely composed of Jockey and Gypsy transposable elements. We annotated a total of 12,821 protein-coding genes, and comparisons of synteny with D. athabasca orthologs show that the large metacentric pericentromeric regions of multiple chromosomes are conserved between these species. Importantly, Muller A (X chromosome) was found to be metacentric in D. bifasciata, and the pericentromeric region appears homologous to the pericentromeric region of the fused Muller A-AD (XL and XR) of pseudoobscura/affinis subgroup species. Our finding suggests that a metacentric ancestral X fused to a telocentric Muller D and created the large neo-X (Muller A-AD) chromosome ∼15 MYA. We also confirm the fusion of Muller C and D in D. bifasciata and show that it likely involved a centromere-centromere fusion.
