PhyloCSF: a comparative genomics method to distinguish protein coding and non-coding regions

M. F. Lin; I. Jungreis; M. Kellis

doi:10.1093/bioinformatics/btr209

Evolutionarily conserved non-protein-coding regions in the chicken genome harbor functionally important variation

2020. doi:10.1101/2020.03.27.012005. Cited By ~ 1

Author(s): Christian Groß, Chiara Bortoluzzi, Dick de Ridder, Hendrik-Jan Megens, Martien AM Groenen, ...

Keyword(s): Comparative Genomics, Chicken Genome, Population Genomics, Purifying Selection, Disease Genes, Functional Importance, Protein Coding, Frequency Distributions, Functional Studies, Coding Regions

Abstract. The availability of genomes for many species has advanced our understanding of the non-protein-coding fraction of the genome. Comparative genomics has proven to be an invaluable approach for the systematic, genome-wide identification of conserved non-protein-coding elements (CNEs). However, for many non-mammalian model species, including chicken, our capability to interpret the functional importance of variants overlapping CNEs has been limited by current genomic annotations, which rely on a single information type (e.g. conservation). We here studied CNEs in chicken using a combination of population genomics and comparative genomics. To investigate the functional importance of variants found in CNEs, we developed ch(icken) Combined Annotation-Dependent Depletion (chCADD), a variant effect prediction tool first introduced for humans and later for mouse and pig. We show that 73 Mb of the chicken genome has been conserved across more than 280 million years of vertebrate evolution. The vast majority of the conserved elements are in non-protein-coding regions, which display SNP densities and allele frequency distributions characteristic of genomic regions constrained by purifying selection. By annotating SNPs with the chCADD score, we are able to pinpoint specific subregions of the CNEs that are of higher functional importance, as supported by the finding that SNPs in these subregions are associated with known disease genes in humans, mice, and rats. Taken together, our findings indicate that CNEs harbor variants of functional significance that should be the object of further investigation along with protein-coding mutations. We therefore anticipate chCADD to be of great use to the scientific community and breeding companies in future functional studies in chicken.

PhyloCSF: a comparative genomics method to distinguish protein-coding and non-coding regions

Nature Precedings, 2010. doi:10.1038/npre.2010.4784.1. Cited By ~ 1

Author(s): Michael Lin, Irwin Jungreis, Manolis Kellis

Keyword(s): Comparative Genomics, Protein Coding, Coding Regions

Fitness Landscape of the Fission Yeast Genome

2018. doi:10.1101/398024

Author(s): Leanne Grech, Daniel Charlton Jeffares, Christoph Yves Sadée, María Rodríguez-López, Danny Asher Bitton, ...

Keyword(s): Comparative Genomics, Fitness Landscape, Yeast Genome, Fine Scale, Functional Elements, Protein Coding, Functional Roles, Coding Regions, Transposon Insertions, Diversity Studies

Abstract. Background: Non-protein-coding regions of eukaryotic genomes remain poorly understood. Diversity studies, comparative genomics and biochemical outputs of genomic sites can be indicators of functional elements, but none produce fine-scale genome-wide descriptions of all functional elements. Results: Towards the generation of a comprehensive description of functional elements in the haploid Schizosaccharomyces pombe genome, we generated transposon mutagenesis libraries to a density of one insertion per 13 nucleotides of the genome. We applied a five-state hidden Markov model (HMM) to characterise insertion-depleted regions at nucleotide-level resolution. HMM-defined functional constraint was consistent with genetic diversity, comparative genomics, gene-expression data and genome annotation. Conclusions: We infer that transposon insertions lead to fitness consequences in 90% of the genome, including 80% of the non-protein-coding regions, reflecting the presence of numerous non-coding elements in this compact genome that have functional roles. Display of this data in genome browsers provides fine-scale views of structure-function relationships within specific genes.

Evolutionary Analysis of DNA-Protein-Coding Regions Based on a Genetic Code Cube Metric

Current Topics in Medicinal Chemistry, 2014, Vol 14 (3), pp. 407-417. doi:10.2174/1568026613666131204110022

Author(s): Robersy Sanchez

Keyword(s): Genetic Code, Evolutionary Analysis, Protein Coding, Coding Regions

The open targets post-GWAS analysis pipeline

Bioinformatics, 2020, Vol 36 (9), pp. 2936-2937. doi:10.1093/bioinformatics/btaa020. Cited By ~ 4

Author(s): Gareth Peat, William Jones, Michael Nuhn, José Carlos Marugán, William Newell, ...

Keyword(s): Drug Targets, Gene Expression Regulation, Association Studies, Genome Wide Association Studies, Protein Coding, Data Resource, Coding Regions, Genome Wide, Causal Genes, Interactive Data

Abstract. Motivation: Genome-wide association studies (GWAS) are a powerful method to detect even weak associations between variants and phenotypes; however, many of the identified associated variants are in non-coding regions, and presumably influence gene expression regulation. Identifying potential drug targets, i.e. causal protein-coding genes, therefore, requires crossing the genetics results with functional data. Results: We present a novel data integration pipeline that analyses GWAS results in the light of experimental epigenetic and cis-regulatory datasets, such as ChIP-Seq, Promoter-Capture Hi-C or eQTL, and presents them in a single report, which can be used for inferring likely causal genes. This pipeline was then fed into an interactive data resource. Availability and implementation: The analysis code is available at www.github.com/Ensembl/postgap and the interactive data browser at postgwas.opentargets.io.

Comparative Genomics: Insights on the Pathogenicity and Lifestyle of Rhizoctonia solani

International Journal of Molecular Sciences, 2021, Vol 22 (4), pp. 2183. doi:10.3390/ijms22042183

Author(s): Nurhani Mat Razali, Siti Norvahida Hisham, Ilakiya Sharanee Kumar, Rohit Nandan Shukla, Melvin Lee, ...

Keyword(s): Comparative Genomics, Rhizoctonia Solani, Abiotic Factors, Biotic Factor, Protein Coding, Sustainable Food, Repeat Elements, Gene Sets, Core Genes

Proper management of agricultural disease is important to ensure sustainable food security. Staple food crops like rice, wheat, cereals, and other cash crops hold great export value for countries. Ensuring proper supply is critical; hence any biotic or abiotic factors contributing to the shortfall in yield of these crops should be alleviated. Rhizoctonia solani is a major biotic factor that results in yield losses in many agriculturally important crops. This paper focuses on genome informatics of our Malaysian Draft R. solani AG1-IA, and the comparative genomics (inter- and intra- AG) with four AGs including China AG1-IA (AG1-IA_KB317705.1), AG1-IB, AG3, and AG8. The genomic content of repeat elements, transposable elements (TEs), syntenic genomic blocks, functions of protein-coding genes as well as core orthologous genic information that underlies R. solani’s pathogenicity strategy were investigated. Our analyses show that all studied AGs have low content and varying profiles of TEs. All AGs were dominant for Class I TE, much like other basidiomycete pathogens. All AGs demonstrate dominance in Glycoside Hydrolase protein-coding gene assignments suggesting its importance in infiltration and infection of host. Our profiling also provides a basis for further investigation on lack of correlation observed between number of pathogenicity and enzyme-related genes with host range. Despite being grouped within the same AG with China AG1-IA, our Draft AG1-IA exhibits differences in terms of protein-coding gene proportions and classifications. This implies that strains from similar AG do not necessarily have to retain similar proportions and classification of TE but must have the necessary arsenal to enable successful infiltration and colonization of host. In a larger perspective, all the studied AGs essentially share core genes that are generally involved in adhesion, penetration, and host colonization. 
However, the different infiltration strategies will depend on the level of host resilience where this is clearly exhibited by the gene sets encoded for the process of infiltration, infection, and protection from host.

Novel exon 1 protein‐coding regions N‐terminally extend human KCNE3 and KCNE4

The FASEB Journal, 2016, Vol 30 (8), pp. 2959-2969. doi:10.1096/fj.201600467r. Cited By ~ 8

Author(s): Geoffrey W. Abbott

Keyword(s): Protein Coding, Coding Regions, Exon 1, Novel Exon

Protein-coding structured RNAs: A computational survey of conserved RNA secondary structures overlapping coding regions in drosophilids

Biochimie, 2011, Vol 93 (11), pp. 2019-2023. doi:10.1016/j.biochi.2011.07.023. Cited By ~ 8

Author(s): Sven Findeiß, Jan Engelhardt, Sonja J. Prohaska, Peter F. Stadler

Keyword(s): Secondary Structures, Protein Coding, RNA Secondary Structures, Coding Regions

Structure and expression of canary myc family genes

Molecular and Cellular Biology, 1991, Vol 11 (3), pp. 1770-1776. doi:10.1128/mcb.11.3.1770-1776.1991

Author(s): R G Collum, D F Clayton, F W Alt

Keyword(s): Untranslated Region, Untranslated Regions, Coding Region, Protein Coding, Coding Regions, Neuronal Precursors, Myc Gene, Mature Neurons

We found that the canary N-myc gene is highly related to mammalian N-myc genes in both the protein-coding region and the long 3' untranslated region. Examined coding regions of the canary c-myc gene were also highly related to their mammalian counterparts, but in contrast to N-myc, the canary and mammalian c-myc genes were quite divergent in their 3' untranslated regions. We readily detected N-myc and c-myc expression in the adult canary brain and found N-myc expression both at sites of proliferating neuronal precursors and in mature neurons.

Sequence and phylogenetic analysis of the non-structural 3A and 3B protein-coding regions of foot-and-mouth disease virus subtype A Iran 05

Journal of Veterinary Science, 2010, Vol 11 (3), pp. 243. doi:10.4142/jvs.2010.11.3.243

Author(s): Saber Jelokhani-Niaraki, Majid Esmaelizad, Morteza Daliri, Rasoul Vaez-Torshizi, Morteza Kamalzadeh, ...

Keyword(s): Phylogenetic Analysis, Disease Virus, Foot And Mouth Disease, Mouth Disease, Protein Coding, Coding Regions, Mouth Disease Virus, Foot And Mouth, Subtype A, Virus Subtype
