Genome-wide features of introns are evolutionary decoupled among themselves and from genome size throughout Eukarya

Mapping Intimacies ◽

10.1101/283549 ◽

2018 ◽

Cited By ~ 4

Author(s):

Irma Lozada-Chávez ◽

Peter F. Stadler ◽

Sonja J. Prohaska

Keyword(s):

Genome Size ◽

Evolutionary Dynamics ◽

Protein Coding ◽

Specific Level ◽

Spliceosomal Introns ◽

Major Mechanism ◽

Genome Wide ◽

A Genome ◽

Eukaryotic Gene ◽

The Impact

Abstract The impact of spliceosomal introns on genome and organismal evolution remains puzzling. Here, we investigated the correlative associations among genome-wide features of introns from protein-coding genes (e.g., size, density, genome-content, repeats), genome size and multicellular complexity across 461 eukaryotes. To this end, we formally distinguished simple from complex multicellular organisms (CMOs), and developed the program GenomeContent to systematically estimate genomic traits. We performed robust phylogenetically controlled analyses, taking into account significant uncertainties in the tree of eukaryotes and variation in genome size estimates. We found that changes in the variation of some intron features (such as size and repeat composition) scale only weakly with changes in genome size at the broadest phylogenetic scale, while other features measuring intron abundance (within and across genes) do not scale with it at all. Accordingly, the strength of these associations fluctuates at the lineage-specific level, and changes in the length and abundance of introns within a genome are found to evolve largely independently throughout Eukarya. Our findings thus disagree with previous estimates claiming a concerted evolution between genome size and introns across eukaryotes. We also observe that intron features vary homogeneously (with low repetitive composition) within fungi, plants and stramenopiles, but vary dramatically (with higher repetitive composition) within holozoans, chlorophytes, alveolates and amoebozoans. We also found that CMOs and their closest ancestral relatives are characterized by high intron-richness, regardless of their genome size. These patterns contrast with the narrow distribution of exon features found across eukaryotes. 
Collectively, our findings unveil spliceosomal introns as a dynamically evolving class of non-coding DNA and argue strongly against both a particular intron feature as the key determinant of eukaryotic gene architecture and a major mechanism (adaptive or non-adaptive) behind the evolutionary dynamics of introns over a large phylogenetic scale. We hypothesize that intron-richness is a precondition for the evolution of complex multicellularity.


Genome-wide sexually antagonistic variants reveal longstanding constraints on sexual dimorphism in the fruitfly

10.1101/117176 ◽

2017 ◽

Cited By ~ 1

Author(s):

Filip Ruzicka ◽

Mark S. Hill ◽

Tanya M. Pennell ◽

Ilona Flis ◽

Fiona C. Ingleby ◽

...

Keyword(s):

Sexual Dimorphism ◽

Evolutionary Dynamics ◽

Genome Wide Association Study ◽

Balancing Selection ◽

Sexual Antagonism ◽

Protein Coding ◽

Genome Wide ◽

Genomic Location ◽

Classic Theory ◽

A Genome

The evolution of sexual dimorphism is constrained by a shared genome, leading to ‘sexual antagonism’ where different alleles at given loci are favoured by selection in males and females. Despite its wide taxonomic incidence, we know little about the identity, genomic location and evolutionary dynamics of antagonistic genetic variants. To address these deficits, we use sex-specific fitness data from 202 fully sequenced hemiclonal D. melanogaster fly lines to perform a genome-wide association study of sexual antagonism. We identify ~230 chromosomal clusters of candidate antagonistic SNPs. In contradiction to classic theory, we find no clear evidence that the X chromosome is a hotspot for sexually antagonistic variation. Characterising antagonistic SNPs functionally, we find a large excess of missense variants but little enrichment in terms of gene function. We also assess the evolutionary persistence of antagonistic variants by examining extant polymorphism in wild D. melanogaster populations. Remarkably, antagonistic variants are associated with multiple signatures of balancing selection across the D. melanogaster distribution range, indicating widespread and evolutionarily persistent (>10,000 years) genomic constraints. Based on our results, we propose that antagonistic variation accumulates due to constraints on the resolution of sexual conflict over protein coding sequences, thus contributing to the long-term maintenance of heritable fitness variation.


A Nonsense Variant in Hephaestin Like 1 (HEPHL1) Is Responsible for Congenital Hypotrichosis in Belted Galloway Cattle

Genes ◽

10.3390/genes12050643 ◽

2021 ◽

Vol 12 (5) ◽

pp. 643

Author(s):

Thibaud Kuca ◽

Brandy M. Marron ◽

Joana G. P. Jacinto ◽

Julia M. Paris ◽

Christian Gerspach ◽

...

Keyword(s):

Genome Wide Association Study ◽

Homozygosity Mapping ◽

Mendelian Inheritance ◽

Large Animal Model ◽

Large Animal ◽

Loss Of Function ◽

Protein Coding ◽

Positional Candidate ◽

Genome Wide ◽

A Genome

Genodermatoses such as hair disorders mostly follow a monogenic mode of inheritance. Congenital hypotrichosis (HY) belongs to this group of disorders and is characterized by abnormally reduced hair since birth. The purpose of this study was to characterize the clinical phenotype of a breed-specific non-syndromic form of HY in Belted Galloway cattle and to identify the causative genetic variant for this recessive disorder. An affected calf born in Switzerland presented with multiple small to large areas of alopecia on the limbs and on the dorsal part of the head, neck, and back. A genome-wide association study using Swiss and US Belted Galloway cattle encompassing 12 cases and 61 controls revealed an association signal on chromosome 29. Homozygosity mapping in a subset of cases refined the HY locus to a 1.5 Mb critical interval, and subsequent Sanger sequencing of protein-coding exons of positional candidate genes revealed a stop-gain variant in the HEPHL1 gene, which encodes a multi-copper ferroxidase protein called hephaestin-like 1 (c.1684A>T; p.Lys562*). A perfect concordance between the homozygous presence of this most likely pathogenic loss-of-function variant and the HY phenotype was found. Genotyping of more than 700 purebred Swiss and US Belted Galloway cattle showed the global spread of the mutation. This study provides a molecular test that will permit the avoidance of risk matings by systematic genotyping of relevant breeding animals. This rare recessive HEPHL1-related form of hypotrichosis provides a novel large animal model for similar human conditions. The results have been incorporated in the Online Mendelian Inheritance in Animals (OMIA) database (OMIA 002230-9913).
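The variant notation in this abstract (c.1684A>T; p.Lys562*) can be checked for internal consistency with a little codon arithmetic. The following is a minimal sketch, not the authors' pipeline: the helper names `codon_of` and `substitute` are hypothetical, and because the abstract does not give the reference codon, both lysine codons of the standard genetic code are tried.

```python
# Lysine codons and stop codons in the standard genetic code.
LYS_CODONS = {"AAA", "AAG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codon_of(cds_pos):
    """Map a 1-based coding position to (codon number, position within codon)."""
    codon_number = (cds_pos - 1) // 3 + 1
    pos_in_codon = (cds_pos - 1) % 3 + 1
    return codon_number, pos_in_codon


def substitute(codon, pos_in_codon, alt):
    """Return the codon with the base at the 1-based position replaced by alt."""
    i = pos_in_codon - 1
    return codon[:i] + alt + codon[i + 1:]


# c.1684A>T: which codon is hit, and at which of its three positions?
codon_number, pos_in_codon = codon_of(1684)

# Assumption: the reference codon is unknown here, so apply A>T to both
# possible lysine codons and see what each becomes.
mutated = {substitute(c, pos_in_codon, "T") for c in LYS_CODONS}

print(codon_number, pos_in_codon, sorted(mutated))
```

Position 1684 falls on the first base of codon 562, and an A>T change there turns either lysine codon into a stop codon, which agrees with the p.Lys562* protein-level description.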


Machine-learning annotation of human splicing branchpoints

10.1101/094003 ◽

2016 ◽

Cited By ~ 3

Author(s):

Bethany Signal ◽

Brian S Gloss ◽

Marcel E Dinger ◽

Timothy R Mercer

Keyword(s):

Machine Learning ◽

Learning Algorithm ◽

Gene Splicing ◽

Genetic Encoding ◽

Genome Wide ◽

Common Genetic Variants ◽

A Genome ◽

Wide Scale ◽

The Impact ◽

Splicing Patterns

Abstract Background: The branchpoint element is required for the first lariat-forming reaction in splicing. However, due to the difficulty of experimentally mapping branchpoints at a genome-wide scale, current catalogues are incomplete. Results: We have developed a machine-learning algorithm trained with empirical human branchpoint annotations to identify branchpoint elements from primary genome sequence alone. Using this approach, we can accurately locate branchpoint elements in 85% of introns in current gene annotations. Consistent with branchpoints as basal genetic elements, we find our annotation is unbiased towards gene type and expression levels. A major fraction of introns was found to encode multiple branchpoints, raising the prospect that mutational redundancy is encoded in key genes. We also confirmed all deleterious branchpoint mutations annotated in clinical variant databases, and further identified thousands of clinical and common genetic variants with similar predicted effects. Conclusions: We propose that the broad annotation of branchpoints constitutes a valuable resource for further investigations into the genetic encoding of splicing patterns, and for interpreting the impact of common and disease-causing human genetic variation on gene splicing.


The Genomic Ecosystem of Transposable Elements in Maize

10.1101/559922 ◽

2019 ◽

Cited By ~ 18

Author(s):

Michelle C. Stitzer ◽

Sarah N. Anderson ◽

Nathan M. Springer ◽

Jeffrey Ross-Ibarra

Keyword(s):

Transposable Elements ◽

Evolutionary Dynamics ◽

Phenotypic Diversity ◽

Flowering Plant ◽

Genome Wide ◽

Family Level ◽

Genomic Environment ◽

Single Category ◽

The Impact

Transposable elements (TEs) constitute the majority of flowering plant DNA, reflecting their tremendous success in subverting, avoiding, and surviving the defenses of their host genomes to ensure their selfish replication. More than 85% of the sequence of the maize genome can be ascribed to past transposition, providing a major contribution to the structure of the genome. Evidence from individual loci has informed our understanding of how transposition has shaped the genome, and a number of individual TE insertions have been causally linked to dramatic phenotypic changes. But genome-wide analyses in maize and other taxa have frequently represented TEs as a relatively homogeneous class of fragmentary relics of past transposition, obscuring their evolutionary history and interaction with their host genome. Using an updated annotation of structurally intact TEs in the maize reference genome, we investigate the family-level ecological and evolutionary dynamics of TEs in maize. Integrating a variety of data, from descriptors of individual TEs like coding capacity, expression, and methylation, as well as similar features of the sequence they inserted into, we model the relationship between these attributes of the genomic environment and the survival of TE copies and families. Our analyses reveal a diversity of ecological strategies of TE families, each representing the evolution of a distinct ecological niche allowing survival of the TE family. In contrast to the wholesale relegation of all TEs to a single category of junk DNA, these differences generate a rich ecology of the genome, suggesting families of TEs that coexist in time and space compete and cooperate with each other. 
We conclude that while the impact of transposition is highly family- and context-dependent, a family-level understanding of the ecology of TEs in the genome can refine our ability to predict the role of TEs in generating genetic and phenotypic diversity. ‘Lumping our beautiful collection of transposons into a single category is a crime’ - Michael R. Freeling, Mar. 10, 2017


Meta-analysis of transcriptomic data reveals clusters of consistently deregulated gene and disease ontologies in Down syndrome

PLoS Computational Biology ◽

10.1371/journal.pcbi.1009317 ◽

2021 ◽

Vol 17 (9) ◽

pp. e1009317

Author(s):

Ilario De Toma ◽

Cesar Sierra ◽

Mara Dierssen

Keyword(s):

Down Syndrome ◽

Differential Expression ◽

Web Application ◽

Meta Analysis ◽

Chromosome 21 ◽

Complex Disorders ◽

Transcriptomic Data ◽

Genome Wide ◽

A Genome ◽

The Impact

Trisomy of human chromosome 21 (HSA21) causes Down syndrome (DS). The trisomy does not simply result in the upregulation of HSA21-encoded genes but also leads to a genome-wide transcriptomic deregulation, which affects each tissue and cell type differently as a result of epigenetic mechanisms and protein-protein interactions. We performed a meta-analysis integrating the differential expression (DE) analyses of all publicly available transcriptomic datasets, both in human and mouse, comparing trisomic and euploid transcriptomes from different sources. We integrated all these data in a “DS network”. We found that genome-wide deregulation as a consequence of trisomy 21 is not arbitrary, but involves deregulation of specific molecular cascades in which both HSA21 genes and HSA21 interactors are more consistently deregulated compared to other genes. In fact, gene deregulation happens in “clusters”, so that groups of 2 to 13 genes are found consistently deregulated. Most of these events of “co-deregulation” involve genes belonging to the same GO category, and genes associated with the same disease class. The most consistent changes are enriched in interferon-related categories and neutrophil activation, reinforcing the concept that DS is an inflammatory disease. Our results also suggest that the impact of the trisomy might diverge in each tissue due to the different gene set deregulation, even though the triplicated genes are the same. Our original method to integrate transcriptomic data not only confirmed the importance of known genes, such as SOD1, but also detected new ones that could be extremely useful for generating or confirming hypotheses and supporting new putative therapeutic candidates. We created “metaDEA”, an R package that uses our method to integrate every kind of transcriptomic data and could therefore be used with other complex disorders, such as cancer. 
We also created a user-friendly web application to query Ensembl gene IDs and retrieve all the information on their differential expression across the datasets.


A porcine brain-wide RNA editing landscape

10.21203/rs.3.rs-110949/v1 ◽

2020 ◽

Author(s):

Jinrong Huang ◽

Lin Lin ◽

Zhanying Dong ◽

Ling Yang ◽

Tianyu Zheng ◽

...

Keyword(s):

Rna Editing ◽

Repetitive Sequences ◽

Brain Regions ◽

Mammalian Brain ◽

Protein Coding ◽

Porcine Brain ◽

Coding Regions ◽

Pig Brain ◽

Genome Wide ◽

A Genome

Abstract Adenosine-to-inosine (A-to-I) RNA editing, catalyzed by ADAR enzymes, is an essential post-transcriptional modification. Although hundreds of thousands of RNA editing sites have been reported in mammals, brain-wide analysis of RNA editing in the mammalian brain remains rare. Here, a genome-wide RNA editing investigation is performed in 119 samples, representing 30 anatomically defined subregions in the pig brain. We identify a total of 682,037 A-to-I RNA editing sites, 97% of which have not been identified before. Within the pig brain, the cerebellum and olfactory bulb are the regions with the most edited transcripts. The editing level of sites residing in protein-coding regions is similar across brain regions, whereas region-distinct editing is observed in repetitive sequences. Highly edited conserved recoding events in pig and human brain are found in neurotransmitter receptors, demonstrating the evolutionary importance of RNA editing in neurotransmission functions. The porcine brain-wide RNA landscape provides a rich resource to better understand the evolutionary importance of post-transcriptional RNA editing.


Genome-Wide Copy Number Variation Association Study of Atrial Fibrillation Related Thromboembolic Stroke

Journal of Clinical Medicine ◽

10.3390/jcm8030332 ◽

2019 ◽

Vol 8 (3) ◽

pp. 332 ◽

Cited By ~ 4

Author(s):

Chia-Shan Hsieh ◽

Pang-Shuo Huang ◽

Sheng-Nan Chang ◽

Cho-Kai Wu ◽

Juey-Jen Hwang ◽

...

Keyword(s):

Atrial Fibrillation ◽

Signaling Pathway ◽

Copy Number ◽

Genetic Factors ◽

Copy Number Variations ◽

Nucleotide Polymorphisms ◽

Thromboembolic Stroke ◽

Genome Wide ◽

A Genome ◽

The Impact

Atrial fibrillation (AF) is a common cardiac arrhythmia and is one of the major causes of ischemic stroke. In addition to clinical factors such as the CHADS2 or CHA2DS2-VASc score, the impact of genetic factors on the risk of thromboembolic stroke in patients with AF has been largely unknown. Single-nucleotide polymorphisms in several genomic regions have been found to be associated with AF. However, these loci do not account for all the genetic risk of AF or AF-related thromboembolism, suggesting that there are other genetic factors or variants not yet discovered. In the human genome, copy number variations (CNVs) could also contribute to disease susceptibility. In the present study, we sought to identify CNVs determining the AF-related thromboembolic risk. Using a genome-wide approach in 109 patients with AF and thromboembolic stroke and 14,666 controls from the Taiwanese general population (Taiwan Biobank), we first identified deletions in chromosomal regions 1p36.32-1p36.33, 5p15.33, 8q24.3 and 19p13.3 and amplifications in 14q11.2 that were significantly associated with AF-related stroke in the Taiwanese population. In these regions, 148 genes were involved, including several microRNAs and long non-coding RNAs. Using a pathway analysis, we found deletions in the GNB1, PRKCZ, and GNG7 genes, which are related to the alpha-adrenergic receptor signaling pathway and play a major role in determining the risk of an AF-related stroke. In conclusion, CNVs may be genetic predictors of the risk of thromboembolic stroke for patients with AF, possibly pointing to an impaired alpha-adrenergic signaling pathway in the mechanism of AF-related thromboembolism.


Contribution of retrotransposition to developmental disorders

Nature Communications ◽

10.1038/s41467-019-12520-y ◽

2019 ◽

Vol 10 (1) ◽

Cited By ~ 10

Author(s):

Eugene J. Gardner ◽

Elena Prigmore ◽

Giuseppe Gallone ◽

Petr Danecek ◽

Kaitlin E. Samocha ◽

...

Keyword(s):

Developmental Disorders ◽

De Novo ◽

Purifying Selection ◽

Selective Constraint ◽

Protein Coding ◽

Genome Wide ◽

De Novo Gene ◽

The Impact ◽

Transcribed Sequences

Abstract Mobile genetic elements (MEs) are segments of DNA which can copy themselves and other transcribed sequences through the process of retrotransposition (RT). In humans, several disorders have been attributed to RT, but the role of RT in severe developmental disorders (DD) has not yet been explored. Here we identify RT-derived events in 9738 exome-sequenced trios with DD-affected probands. We ascertain 9 de novo MEs, 4 of which are likely causative of the patient’s symptoms (0.04%), as well as 2 de novo gene retroduplications. Beyond identifying likely diagnostic RT events, we estimate the genome-wide germline ME mutation rate and selective constraint and demonstrate that coding RT events have signatures of purifying selection equivalent to those of truncating mutations. Overall, our analysis represents a comprehensive interrogation of the impact of retrotransposition on protein-coding genes and a framework for future evolutionary and disease studies.
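The 0.04% figure quoted in this abstract follows directly from the counts it reports. The sketch below only reproduces that arithmetic; the variable names are illustrative, and the per-trio rate of observed de novo MEs is a raw ratio of the reported counts, not the genome-wide mutation rate estimated in the paper.

```python
# Counts taken from the abstract.
trios = 9738             # exome-sequenced trios with DD-affected probands
de_novo_mes = 9          # de novo mobile element insertions ascertained
likely_causative = 4     # de novo MEs judged likely causative

# Share of probands with a likely diagnostic de novo ME, in percent.
causative_pct = 100 * likely_causative / trios

# Raw number of observed de novo MEs per trio (exome data only).
observed_per_trio = de_novo_mes / trios

print(f"{causative_pct:.2f}% of trios, {observed_per_trio:.5f} de novo MEs per trio")
```

Rounded to two decimal places, the share of trios with a likely causative de novo ME comes out at 0.04%, matching the abstract.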


Identification of novel sources of resistance to ascochyta blight in a collection of wild Cicer accessions

Phytopathology ◽

10.1094/phyto-04-20-0137-r ◽

2020 ◽

Cited By ~ 1

Author(s):

Toby E. Newman ◽

Silke Jacques ◽

Christy Grime ◽

Fiona L. Kamphuis ◽

Robert C. Lee ◽

...

Keyword(s):

Genome Wide Association Study ◽

Ascochyta Blight ◽

Ascochyta Rabiei ◽

Sources Of Resistance ◽

Highly Pathogenic ◽

Cicer Echinospermum ◽

Genome Wide ◽

A Genome ◽

The Impact ◽

Chickpea Cultivars

Chickpea production is constrained worldwide by the necrotrophic fungal pathogen Ascochyta rabiei, the causal agent of ascochyta blight (AB). In order to reduce the impact of this disease, novel sources of resistance are required in chickpea cultivars. Here, we screened a new collection of wild Cicer accessions for AB resistance and identified accessions resistant to multiple, highly pathogenic isolates. In addition to this, analyses demonstrated that some collection sites of Cicer echinospermum harbour predominantly resistant accessions, knowledge that can inform future collection missions. Furthermore, a genome-wide association study identified regions of the Cicer reticulatum genome associated with AB resistance and investigation of these regions identified candidate resistance genes. Taken together, these results can be utilised to enhance the resistance of chickpea cultivars to this globally yield-limiting disease.


Identification of ERG Specific Target Genes by Genome-Wide Screening in T-Lymphoblastic Leukemia

Blood ◽

10.1182/blood.v112.11.3788.3788 ◽

2008 ◽

Vol 112 (11) ◽

pp. 3788-3788

Author(s):

Liliana H Mochmann ◽

Konrad Neumann ◽

Juliane Bock ◽

Jutta Ortiz Tanchez ◽

Arend Bohne ◽

...

Keyword(s):

Target Genes ◽

Lymphoblastic Leukemia ◽

Specific Treatment ◽

P Value ◽

Dna Templates ◽

Genome Wide ◽

A Genome ◽

On Chip ◽

T Cell Leukemogenesis ◽

The Impact

Abstract The ETS-related gene, ERG, encodes a transcription factor with a vital role in hematopoiesis. Recent findings have shown that ERG knockout mice require a minimum of one functional allele to ensure embryonic blood development and adult stem cell maintenance. Moreover, it was earlier reported that enforced expression of ERG induced oncogenic transformation in 3T3 cells. Overexpression of ERG, observed in a subset of acute T-lymphoblastic and acute myeloid leukemia patients, was associated with an inferior outcome. However, how ERG contributes to this unfavourable phenotype has yet to be determined, as downstream targets of ERG in leukemia remain unknown. Herein, we conducted a genome-wide analysis of ERG target genes in T-lymphoblastic leukemia. Chromatin immunoprecipitation-on-chip array (ChIP-on-chip) analyses were performed using two ERG-specific antibodies for the enrichment of ERG-bound DNA templates in T-lymphoblastic leukemia cells (Jurkat) with input DNA or IgG-precipitated DNA as controls. Enriched DNA templates and control DNA were differentially labelled and co-hybridized to high-resolution promoter chip arrays with 50–75mer probes (770,000) representing 29,000 annotated human transcripts (NimbleGen). Based on two independent ChIP-on-chip assays, bioinformatic analysis (ACME) yielded statistically significant enriched peaks (using a sliding window of 1000 bp and a P-value < 0.0001), identifying promoter regions of 365 potential ERG target genes. From these genes, clustering by functional annotation was performed using the DAVID database, and genes related to leukemia were subsequently selected for quantitative PCR validation. The design of promoter primers included the highly conserved ETS GGAA DNA binding site. 
Genes with greater than two-fold enrichment (ERG ChIP versus control) included WNT2 (17-fold), OLIG2 (14-fold), WNT11 (7-fold), CCND1 (5-fold), WNT9A (4-fold), CD7 (3-fold), EPO (3-fold), ERBB4 (3-fold), RPBJL (3-fold), TRADD (3-fold), PIWIL1 (2-fold), TNFRSF25 (2-fold), TWIST1 (2-fold), and HDAC4 (2-fold). Interestingly, enriched target genes involved in developmental processes (WNT2, WNT9A, WNT11, TWIST1, PIWIL1, ERBB4, and OLIG2) have shown oncogenic potential when mutated or overexpressed. Thus, we hypothesize that overexpression of ERG may contribute to T-cell leukemogenesis through the deregulation of these oncogenic targets. Further elucidation of ERG-directed downstream pathways may contribute to the design of specific treatment strategies (such as WNT inhibitors) with particular effectiveness in ERG-deregulated leukemia.
