Pseudogenes in the mouse lineage: transcriptional activity and strain-specific history

Mapping Intimacies ◽

10.1101/386656 ◽

2018 ◽

Author(s):

Cristina Sisu ◽

Paul Muir ◽

Adam Frankish ◽

Ian Fiddes ◽

Mark Diekhans ◽

...

Keyword(s):

Evolutionary History ◽

Age Distribution ◽

Automatic Annotation ◽

Strain Specificity ◽

Protein Coding ◽

Manual Curation ◽

Genome Wide ◽

Processed Pseudogenes ◽

Mouse Reference Genome ◽

Human And Mouse

Pseudogenes are ideal markers of genome remodeling. In turn, the mouse is an ideal platform for studying them, particularly with the availability of developmental transcriptional data and the sequencing of 18 strains. Here, we present a comprehensive genome-wide annotation of the pseudogenes in the mouse reference genome and associated strains. We compiled this by combining manual curation of over 10,000 pseudogenes with results from automatic annotation pipelines. Also, by comparing the human and mouse, we annotated 165 unitary pseudogenes in mouse, and 303 unitaries in human. We make all our annotation available through mouse.pseudogene.org. The overall mouse pseudogene repertoire (in the reference and strains) is similar to human in terms of overall size, biotype distribution (~80% processed/~20% duplicated) and top family composition (with many GAPDH and ribosomal pseudogenes). However, notable differences arise in the pseudogene age distribution, with multiple retro-transpositional bursts in mouse evolutionary history and only one in human. Furthermore, in each strain about a fifth of the pseudogenes are unique, reflecting strain-specific functions and evolution. Additionally, we find that ~15% of the pseudogenes are transcribed, a fraction similar to that for human, and that pseudogene transcription exhibits greater tissue and strain specificity compared to protein-coding genes. Finally, we show that highly transcribed parent genes tend to give rise to processed pseudogenes.

GENCODE 2021

Nucleic Acids Research ◽

10.1093/nar/gkaa1087 ◽

2020 ◽

Vol 49 (D1) ◽

pp. D916-D923

Author(s):

Adam Frankish ◽

Mark Diekhans ◽

Irwin Jungreis ◽

Julien Lagarde ◽

Jane E Loveland ◽

...

Keyword(s):

Reference Genome ◽

Ucsc Genome Browser ◽

Primary Data ◽

Protein Coding ◽

Bioinformatic Tools ◽

Automated Annotation ◽

First Pass ◽

Mouse Reference Genome ◽

Human And Mouse

The GENCODE project annotates human and mouse genes and transcripts supported by experimental data with high accuracy, providing a foundational resource that supports genome biology and clinical genomics. GENCODE annotation processes make use of primary data and bioinformatic tools and analysis generated both within the consortium and externally to support the creation of transcript structures and the determination of their function. Here, we present improvements to our annotation infrastructure, bioinformatics tools, and analysis, and the advances they support in the annotation of the human and mouse genomes including: the completion of first pass manual annotation for the mouse reference genome; targeted improvements to the annotation of genes associated with SARS-CoV-2 infection; collaborative projects to achieve convergence across reference annotation databases for the annotation of human and mouse protein-coding genes; and the first GENCODE manually supervised automated annotation of lncRNAs. Our annotation is accessible via Ensembl, the UCSC Genome Browser and https://www.gencodegenes.org.

Chromosome-level reference genome of the jellyfish Rhopilema esculentum

GigaScience ◽

10.1093/gigascience/giaa036 ◽

2020 ◽

Vol 9 (4) ◽

Cited By ~ 3

Author(s):

Yunfeng Li ◽

Lei Gao ◽

Yongjia Pan ◽

Meilin Tian ◽

Yulong Li ◽

...

Keyword(s):

Genome Assembly ◽

Evolutionary History ◽

Reference Genome ◽

Protein Coding ◽

Fishery Resource ◽

Rhopilema Esculentum ◽

Sequencing Technologies ◽

Genome Wide ◽

History Of ◽

Chromosome Level

Background: Jellyfish belong to the phylum Cnidaria, which occupies an important phylogenetic location in the early-branching Metazoa lineages. The jellyfish Rhopilema esculentum is an important fishery resource in China. However, the genome resource of R. esculentum has not been reported to date. Findings: In this study, we constructed a chromosome-level genome assembly of R. esculentum using Pacific Biosciences, Illumina, and Hi-C sequencing technologies. The final genome assembly was ∼275.42 Mb, with a contig N50 length of 1.13 Mb. Using Hi-C technology to identify the contacts among contigs, 260.17 Mb (94.46%) of the assembled genome were anchored onto 21 pseudochromosomes with a scaffold N50 of 12.97 Mb. We identified 17,219 protein-coding genes, with an average CDS length of 1,575 bp. The genome-wide phylogenetic analysis indicated that R. esculentum might have evolved more slowly than the other scyphozoan species used in this study. In addition, 127 toxin-like genes were identified, and 1 toxin-related “hub” was found by a genomic survey. Conclusions: We have generated a chromosome-level genome assembly of R. esculentum that could provide a valuable genomic background for studying the biology and pharmacology of jellyfish, as well as the evolutionary history of Cnidaria.

False gene and chromosome losses affected by assembly and sequence errors

10.1101/2021.04.09.438906 ◽

2021 ◽

Author(s):

Juwan Kim ◽

Chul Lee ◽

Byung June Ko ◽

DongAhn Yoo ◽

Sohyoung Won ◽

...

Keyword(s):

Genomic Sequence ◽

Protein Coding ◽

Manual Curation ◽

Genome Wide ◽

Long Reads ◽

High Gene ◽

Assembly Algorithms ◽

Genome Assemblies ◽

Regulatory Landscapes ◽

High Gene Density

Many genome assemblies have been found to be incomplete and contain mis-assemblies. The Vertebrate Genomes Project (VGP) has been producing assemblies with an emphasis on being as complete and error-free as possible, utilizing long reads, long-range scaffolding data, new assembly algorithms, and manual curation. Here we evaluate these new vertebrate genome assemblies relative to the previous references for the same species, including a mammal (platypus), two birds (zebra finch, Anna's hummingbird), and a fish (climbing perch). We found that 3 to 11% of genomic sequence was entirely missing in the previous reference assemblies, which included nearly entire GC-rich and repeat-rich microchromosomes with high gene density. Genome-wide, between 25 and 60% of the genes were either completely or partially missing in the previous assemblies, and this was in part due to a bias in GC-rich 5'-proximal promoters and 5' exon regions. Our findings reveal novel regulatory landscapes and protein-coding sequences that have been greatly underestimated in previous assemblies and are now present in the VGP assemblies.

Faculty Opinions recommendation of A systematic genome-wide analysis of zebrafish protein-coding gene function.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.718007453.793477821 ◽

2013 ◽

Author(s):

Martin Lowe

Keyword(s):

Gene Function ◽

Protein Coding ◽

Genome Wide Analysis ◽

Genome Wide

The open targets post-GWAS analysis pipeline

Bioinformatics ◽

10.1093/bioinformatics/btaa020 ◽

2020 ◽

Vol 36 (9) ◽

pp. 2936-2937 ◽

Cited By ~ 4

Author(s):

Gareth Peat ◽

William Jones ◽

Michael Nuhn ◽

José Carlos Marugán ◽

William Newell ◽

...

Keyword(s):

Drug Targets ◽

Gene Expression Regulation ◽

Association Studies ◽

Genome Wide Association Studies ◽

Protein Coding ◽

Data Resource ◽

Coding Regions ◽

Genome Wide ◽

Causal Genes ◽

Interactive Data

Motivation: Genome-wide association studies (GWAS) are a powerful method to detect even weak associations between variants and phenotypes; however, many of the identified associated variants are in non-coding regions, and presumably influence gene expression regulation. Identifying potential drug targets, i.e. causal protein-coding genes, therefore, requires crossing the genetics results with functional data. Results: We present a novel data integration pipeline that analyses GWAS results in the light of experimental epigenetic and cis-regulatory datasets, such as ChIP-Seq, Promoter-Capture Hi-C or eQTL, and presents them in a single report, which can be used for inferring likely causal genes. This pipeline was then fed into an interactive data resource. Availability and implementation: The analysis code is available at www.github.com/Ensembl/postgap and the interactive data browser at postgwas.opentargets.io.

Promoter Methylation of PRKCB, ADAMTS12, and NAALAD2 Is Specific to Prostate Cancer and Predicts Biochemical Disease Recurrence

International Journal of Molecular Sciences ◽

10.3390/ijms22116091 ◽

2021 ◽

Vol 22 (11) ◽

pp. 6091

Author(s):

Kristina Daniunaite ◽

Arnas Bakavicius ◽

Kristina Zukauskaite ◽

Ieva Rauluseviciute ◽

Juozas Rimantas Lazutka ◽

...

Keyword(s):

Prostate Cancer ◽

Clinical Practice ◽

Promoter Methylation ◽

Disease Recurrence ◽

The Cancer Genome Atlas ◽

Cancer Dataset ◽

Protein Coding ◽

Diagnosis And Prognosis ◽

Genome Wide ◽

Cancer Genome Atlas

The molecular diversity of prostate cancer (PCa) has been demonstrated by recent genome-wide studies, proposing a significant number of different molecular markers. However, only a few of them have been transferred into clinical practice so far. The present study aimed to identify and validate novel DNA methylation biomarkers for PCa diagnosis and prognosis. Microarray-based methylome data of well-characterized cancerous and noncancerous prostate tissue (NPT) pairs was used for the initial screening. Ten protein-coding genes were selected for validation in a set of 151 PCa, 51 NPT, as well as 17 benign prostatic hyperplasia samples. The Prostate Cancer Dataset (PRAD) of The Cancer Genome Atlas (TCGA) was utilized for independent validation of our findings. Methylation frequencies of ADAMTS12, CCDC181, FILIP1L, NAALAD2, PRKCB, and ZMIZ1 were up to 91% in our study. PCa specific methylation of ADAMTS12, CCDC181, NAALAD2, and PRKCB was demonstrated by qualitative and quantitative means (all p < 0.05). In agreement with PRAD, promoter methylation of these four genes was associated with the transcript down-regulation in the Lithuanian cohort (all p < 0.05). Methylation of ADAMTS12, NAALAD2, and PRKCB was independently predictive for biochemical disease recurrence, while NAALAD2 and PRKCB increased the prognostic power of multivariate models (all p < 0.01). The present study identified methylation of ADAMTS12, NAALAD2, and PRKCB as novel diagnostic and prognostic PCa biomarkers that might guide treatment decisions in clinical practice.

Alu RNA Modulates the Expression of Cell Cycle Genes in Human Fibroblasts

International Journal of Molecular Sciences ◽

10.3390/ijms20133315 ◽

2019 ◽

Vol 20 (13) ◽

pp. 3315 ◽

Cited By ~ 3

Author(s):

Simona Cantarella ◽

Davide Carnevali ◽

Marco Morselli ◽

Anastasia Conti ◽

Matteo Pellegrini ◽

...

Keyword(s):

Cell Cycle ◽

Cell Cycle Progression ◽

Human Fibroblasts ◽

Rna Polymerase Iii ◽

Protein Coding ◽

Non Coding Rna ◽

Genome Wide ◽

Hela Cell Lines ◽

Significant Enrichment ◽

Alu Rna

Alu retroelements, whose retrotransposition requires prior transcription by RNA polymerase III to generate Alu RNAs, represent the most numerous non-coding RNA (ncRNA) gene family in the human genome. Alu transcription is generally kept to extremely low levels by tight epigenetic silencing, but it has been reported to increase under different types of cell perturbation, such as viral infection and cancer. Alu RNAs, being able to act as gene expression modulators, may be directly involved in the mechanisms determining cellular behavior in such perturbed states. To directly address the regulatory potential of Alu RNAs, we generated IMR90 fibroblasts and HeLa cell lines stably overexpressing two slightly different Alu RNAs, and analyzed genome-wide the expression changes of protein-coding genes through RNA-sequencing. Among the genes that were upregulated or downregulated in response to Alu overexpression in IMR90, but not in HeLa cells, we found a highly significant enrichment of pathways involved in cell cycle progression and mitotic entry. Accordingly, Alu overexpression was found to promote transition from G1 to S phase, as revealed by flow cytometry. Therefore, increased Alu RNA may contribute to sustained cell proliferation, which is an important factor of cancer development and progression.

A Nonsense Variant in Hephaestin Like 1 (HEPHL1) Is Responsible for Congenital Hypotrichosis in Belted Galloway Cattle

Genes ◽

10.3390/genes12050643 ◽

2021 ◽

Vol 12 (5) ◽

pp. 643

Author(s):

Thibaud Kuca ◽

Brandy M. Marron ◽

Joana G. P. Jacinto ◽

Julia M. Paris ◽

Christian Gerspach ◽

...

Keyword(s):

Genome Wide Association Study ◽

Homozygosity Mapping ◽

Mendelian Inheritance ◽

Large Animal Model ◽

Large Animal ◽

Loss Of Function ◽

Protein Coding ◽

Positional Candidate ◽

Genome Wide ◽

A Genome

Genodermatoses such as hair disorders mostly follow a monogenic mode of inheritance. Congenital hypotrichosis (HY) belongs to this group of disorders and is characterized by abnormally reduced hair since birth. The purpose of this study was to characterize the clinical phenotype of a breed-specific non-syndromic form of HY in Belted Galloway cattle and to identify the causative genetic variant for this recessive disorder. An affected calf born in Switzerland presented with multiple small to large areas of alopecia on the limbs and on the dorsal part of the head, neck, and back. A genome-wide association study using Swiss and US Belted Galloway cattle encompassing 12 cases and 61 controls revealed an association signal on chromosome 29. Homozygosity mapping in a subset of cases refined the HY locus to a 1.5 Mb critical interval, and subsequent Sanger sequencing of protein-coding exons of positional candidate genes revealed a stop gain variant in the HEPHL1 gene, which encodes a multi-copper ferroxidase protein called hephaestin like 1 (c.1684A>T; p.Lys562*). A perfect concordance between the homozygous presence of this most likely pathogenic loss-of-function variant and the HY phenotype was found. Genotyping of more than 700 purebred Swiss and US Belted Galloway cattle showed the global spread of the mutation. This study provides a molecular test that will permit the avoidance of risk matings by systematic genotyping of relevant breeding animals. This rare recessive HEPHL1-related form of hypotrichosis provides a novel large animal model for similar human conditions. The results have been incorporated in the Online Mendelian Inheritance in Animals (OMIA) database (OMIA 002230-9913).

Learning a genome-wide score of human–mouse conservation at the functional genomics level

Nature Communications ◽

10.1038/s41467-021-22653-8 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Soo Bin Kwon ◽

Jason Ernst

Keyword(s):

Mouse Model ◽

Functional Genomics ◽

Functional Genomic ◽

Transcriptomic Data ◽

Model Studies ◽

Genome Wide ◽

A Genome ◽

Important Challenge ◽

Genomic Regions ◽

Human And Mouse

Identifying genomic regions with functional genomic properties that are conserved between human and mouse is an important challenge in the context of mouse model studies. To address this, we develop a method to learn a score of evidence of conservation at the functional genomics level by integrating information from a compendium of epigenomic, transcription factor binding, and transcriptomic data from human and mouse. The method, Learning Evidence of Conservation from Integrated Functional genomic annotations (LECIF), trains neural networks to generate this score for the human and mouse genomes. The resulting LECIF score highlights human and mouse regions with shared functional genomic properties and captures correspondence of biologically similar human and mouse annotations. Analysis with independent datasets shows the score also highlights loci associated with similar phenotypes in both species. LECIF will be a resource for mouse model studies by identifying loci whose functional genomic properties are likely conserved.
