PhyloCSF: a comparative genomics method to distinguish protein coding and non-coding regions

2011 ◽  
Vol 27 (13) ◽  
pp. i275-i282 ◽  
Author(s):  
M. F. Lin ◽  
I. Jungreis ◽  
M. Kellis
Author(s):  
Christian Groß ◽  
Chiara Bortoluzzi ◽  
Dick de Ridder ◽  
Hendrik-Jan Megens ◽  
Martien AM Groenen ◽  
...  

Abstract. The availability of genomes for many species has advanced our understanding of the non-protein-coding fraction of the genome. Comparative genomics has proven to be an invaluable approach for the systematic, genome-wide identification of conserved non-protein-coding elements (CNEs). However, for many non-mammalian model species, including chicken, our capability to interpret the functional importance of variants overlapping CNEs has been limited by current genomic annotations, which rely on a single information type (e.g. conservation). Here we studied CNEs in chicken using a combination of population genomics and comparative genomics. To investigate the functional importance of variants found in CNEs, we developed ch(icken) Combined Annotation-Dependent Depletion (chCADD), a variant effect prediction tool first introduced for humans and later for mouse and pig. We show that 73 Mb of the chicken genome has been conserved across more than 280 million years of vertebrate evolution. The vast majority of the conserved elements are in non-protein-coding regions, which display SNP densities and allele frequency distributions characteristic of genomic regions constrained by purifying selection. By annotating SNPs with the chCADD score, we are able to pinpoint specific subregions of the CNEs that are of higher functional importance, as supported by the finding that SNPs in these subregions are associated with known disease genes in humans, mice, and rats. Taken together, our findings indicate that CNEs harbor variants of functional significance that should be the object of further investigation along with protein-coding mutations. We therefore anticipate chCADD to be of great use to the scientific community and breeding companies in future functional studies in chicken.


2018 ◽  
Author(s):  
Leanne Grech ◽  
Daniel Charlton Jeffares ◽  
Christoph Yves Sadée ◽  
María Rodríguez-López ◽  
Danny Asher Bitton ◽  
...  

Abstract. Background: Non-protein-coding regions of eukaryotic genomes remain poorly understood. Diversity studies, comparative genomics and biochemical outputs of genomic sites can be indicators of functional elements, but none produce fine-scale genome-wide descriptions of all functional elements. Results: Towards the generation of a comprehensive description of functional elements in the haploid Schizosaccharomyces pombe genome, we generated transposon mutagenesis libraries to a density of one insertion per 13 nucleotides of the genome. We applied a five-state hidden Markov model (HMM) to characterise insertion-depleted regions at nucleotide-level resolution. HMM-defined functional constraint was consistent with genetic diversity, comparative genomics, gene-expression data and genome annotation. Conclusions: We infer that transposon insertions lead to fitness consequences in 90% of the genome, including 80% of the non-protein-coding regions, reflecting the presence of numerous non-coding elements in this compact genome that have functional roles. Display of this data in genome browsers provides fine-scale views of structure-function relationships within specific genes.
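
The idea of using an HMM to flag insertion-depleted regions can be illustrated with a much smaller model than the one in the abstract. The sketch below is not the authors' five-state HMM: it assumes only two hypothetical states (0 = insertion-depleted, 1 = insertion-tolerant), Poisson-distributed insertion counts per window, and made-up values for the rates and the `stay` probability. It decodes the most likely state path with the standard Viterbi algorithm.

```python
import math


def viterbi(counts, rates, stay=0.9):
    """Return the most likely state path for per-window insertion counts.

    counts: observed transposon insertions in each window
    rates:  assumed expected insertions per window, one per state (hypothetical)
    stay:   assumed probability of remaining in the same state (hypothetical)
    """
    n_states = len(rates)
    switch = (1.0 - stay) / (n_states - 1)

    def log_emit(state, k):
        # Poisson log-probability of seeing k insertions under this state's rate
        lam = rates[state]
        return k * math.log(lam) - lam - math.lgamma(k + 1)

    def log_trans(a, b):
        return math.log(stay if a == b else switch)

    # Uniform start distribution over states
    score = [math.log(1.0 / n_states) + log_emit(s, counts[0])
             for s in range(n_states)]
    back = []
    for k in counts[1:]:
        prev = score
        pointers, score = [], []
        for s in range(n_states):
            # Best predecessor state for landing in state s at this window
            best = max(range(n_states), key=lambda p: prev[p] + log_trans(p, s))
            pointers.append(best)
            score.append(prev[best] + log_trans(best, s) + log_emit(s, k))
        back.append(pointers)

    # Trace the best path backwards from the best final state
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Toy data: a run of near-empty windows flanked by insertion-rich windows
counts = [8, 7, 9, 0, 0, 1, 0, 0, 8, 9]
path = viterbi(counts, rates=[0.5, 8.0])
print(path)
```

On this toy input the five low-count windows in the middle are assigned to state 0, including the window with a single insertion, because the transition penalty discourages switching states for one noisy observation. A real analysis would fit the rates and transition probabilities from data rather than fixing them by hand.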


2020 ◽  
Vol 36 (9) ◽  
pp. 2936-2937 ◽  
Author(s):  
Gareth Peat ◽  
William Jones ◽  
Michael Nuhn ◽  
José Carlos Marugán ◽  
William Newell ◽  
...  

Abstract. Motivation: Genome-wide association studies (GWAS) are a powerful method to detect even weak associations between variants and phenotypes; however, many of the identified associated variants are in non-coding regions, and presumably influence gene expression regulation. Identifying potential drug targets, i.e. causal protein-coding genes, therefore, requires crossing the genetics results with functional data. Results: We present a novel data integration pipeline that analyses GWAS results in the light of experimental epigenetic and cis-regulatory datasets, such as ChIP-Seq, Promoter-Capture Hi-C or eQTL, and presents them in a single report, which can be used for inferring likely causal genes. This pipeline was then fed into an interactive data resource. Availability and implementation: The analysis code is available at www.github.com/Ensembl/postgap and the interactive data browser at postgwas.opentargets.io.


2021 ◽  
Vol 22 (4) ◽  
pp. 2183
Author(s):  
Nurhani Mat Razali ◽  
Siti Norvahida Hisham ◽  
Ilakiya Sharanee Kumar ◽  
Rohit Nandan Shukla ◽  
Melvin Lee ◽  
...  

Proper management of agricultural disease is important to ensure sustainable food security. Staple food crops like rice, wheat, cereals, and other cash crops hold great export value for countries. Ensuring proper supply is critical; hence any biotic or abiotic factors contributing to the shortfall in yield of these crops should be alleviated. Rhizoctonia solani is a major biotic factor that results in yield losses in many agriculturally important crops. This paper focuses on genome informatics of our Malaysian Draft R. solani AG1-IA, and the comparative genomics (inter- and intra-AG) with four AGs including China AG1-IA (AG1-IA_KB317705.1), AG1-IB, AG3, and AG8. The genomic content of repeat elements, transposable elements (TEs), syntenic genomic blocks, functions of protein-coding genes as well as core orthologous genic information that underlies R. solani's pathogenicity strategy were investigated. Our analyses show that all studied AGs have low content and varying profiles of TEs. Class I TEs were dominant in all AGs, much like in other basidiomycete pathogens. All AGs demonstrate dominance in Glycoside Hydrolase protein-coding gene assignments, suggesting its importance in infiltration and infection of the host. Our profiling also provides a basis for further investigation of the lack of correlation observed between the number of pathogenicity and enzyme-related genes and host range. Despite being grouped within the same AG as China AG1-IA, our Draft AG1-IA exhibits differences in terms of protein-coding gene proportions and classifications. This implies that strains from a similar AG do not necessarily have to retain similar proportions and classification of TEs but must have the necessary arsenal to enable successful infiltration and colonization of the host. In a larger perspective, all the studied AGs essentially share core genes that are generally involved in adhesion, penetration, and host colonization. However, the different infiltration strategies will depend on the level of host resilience, as clearly exhibited by the gene sets encoded for the processes of infiltration, infection, and protection from the host.


Biochimie ◽  
2011 ◽  
Vol 93 (11) ◽  
pp. 2019-2023 ◽  
Author(s):  
Sven Findeiß ◽  
Jan Engelhardt ◽  
Sonja J. Prohaska ◽  
Peter F. Stadler

1991 ◽  
Vol 11 (3) ◽  
pp. 1770-1776
Author(s):  
R G Collum ◽  
D F Clayton ◽  
F W Alt

We found that the canary N-myc gene is highly related to mammalian N-myc genes in both the protein-coding region and the long 3' untranslated region. Examined coding regions of the canary c-myc gene were also highly related to their mammalian counterparts, but in contrast to N-myc, the canary and mammalian c-myc genes were quite divergent in their 3' untranslated regions. We readily detected N-myc and c-myc expression in the adult canary brain and found N-myc expression both at sites of proliferating neuronal precursors and in mature neurons.


2010 ◽  
Vol 11 (3) ◽  
pp. 243
Author(s):  
Saber Jelokhani-Niaraki ◽  
Majid Esmaelizad ◽  
Morteza Daliri ◽  
Rasoul Vaez-Torshizi ◽  
Morteza Kamalzadeh ◽  
...  
