Comparative Gene Prediction Based on Gene Structure Conservation

Gene structure conservation aids similarity based gene prediction

Nucleic Acids Research ◽

10.1093/nar/gkh211 ◽

2004 ◽

Vol 32 (2) ◽

pp. 776-783 ◽

Cited By ~ 53

Author(s):

I. M. Meyer

Keyword(s):

Gene Structure ◽

Gene Prediction ◽

Structure Conservation

A benchmark study of ab initio gene prediction methods in diverse eukaryotic organisms

10.21203/rs.2.19444/v1 ◽

2019 ◽

Author(s):

Nicolas Scalzitti ◽

Anne Jeannin-Girardon ◽

Pierre Collet ◽

Olivier Poch ◽

Julie Dawn Thompson

Keyword(s):

Ab Initio ◽

Gene Structure ◽

Structure Prediction ◽

Gene Prediction ◽

Draft Genome ◽

Prediction Methods ◽

Eukaryotic Genes ◽

Test Sets ◽

Structure Complexity ◽

Genome Assemblies

Background: The draft genome assemblies produced by new sequencing technologies present important challenges for automatic gene prediction pipelines, leading to less accurate gene models. New benchmark methods are needed to evaluate the accuracy of gene prediction methods in the face of incomplete genome assemblies, low genome coverage and quality, complex gene structures, or a lack of suitable sequences for evidence-based annotations. Results: We describe the construction of a new benchmark, called G3PO (benchmark for Gene and Protein Prediction PrOgrams), designed to represent many of the typical challenges faced by current genome annotation projects. The benchmark is based on a carefully validated and curated set of real eukaryotic genes from 147 phylogenetically disperse organisms, and a number of test sets are defined to evaluate the effects of different features, including genome sequence quality, gene structure complexity, protein length, etc. We used the benchmark to perform an independent comparative analysis of the most widely used ab initio gene prediction programs and identified the main strengths and weaknesses of the programs. More importantly, we highlight a number of features that could be exploited in order to improve the accuracy of current prediction tools. Conclusions: The experiments showed that ab initio gene structure prediction is a very challenging task, which should be further investigated. We believe that the baseline results associated with the complex gene test sets in G3PO provide useful guidelines for future studies.

Whokaryote: distinguishing eukaryotic and prokaryotic contigs in metagenomes based on gene structure

10.1101/2021.11.15.468626 ◽

2021 ◽

Author(s):

Lotte J U Pronk ◽

Marnix H Medema

Keyword(s):

Microbial Community ◽

Gene Structure ◽

Gene Prediction ◽

Gene Clusters ◽

Intergenic Distance ◽

Biosynthetic Gene Clusters ◽

Functional Potential ◽

Eukaryotic Genes ◽

Eukaryotic Gene ◽

Eukaryotic Microbes

Metagenomics has become a prominent technology to study the functional potential of all organisms in a microbial community. Most studies focus on the bacterial content of these communities, while ignoring eukaryotic microbes. Indeed, many metagenomics analysis pipelines silently assume that all contigs in a metagenome are prokaryotic. However, because of marked differences in gene structure, prokaryotic gene prediction tools fail to accurately predict eukaryotic genes. Here, we developed a classifier that distinguishes eukaryotic from prokaryotic contigs based on foundational differences between these taxa in gene structure. We first developed a random forest classifier that uses intergenic distance, gene density and gene length as the most important features. We show that, with an estimated accuracy of 97%, this classifier with principled features grounded in biology can perform almost as well as the classifiers EukRep and Tiara, which use k-mer frequencies as features. By re-training our classifier with Tiara predictions as an additional feature, weaknesses of both types of classifiers are compensated; the result is an enhanced classifier that outperforms all individual classifiers, with an F1-score of 1.00 on precision, recall and accuracy for both eukaryotes and prokaryotes, while still being fast. In a reanalysis of metagenome data from a disease-suppressive plant endosphere microbial community, we show how using Whokaryote to select contigs for eukaryotic gene prediction facilitates the discovery of several biosynthetic gene clusters that were missed in the original study. Our enhanced classifier, which we call 'Whokaryote', is wrapped in an easily installable package and is freely available from https://git.wageningenur.nl/lotte.pronk/whokaryote.

A benchmark study of ab initio gene prediction methods in diverse eukaryotic organisms

10.21203/rs.2.19444/v2 ◽

2020 ◽

Author(s):

Nicolas Scalzitti ◽

Anne Jeannin-Girardon ◽

Pierre Collet ◽

Olivier Poch ◽

Julie Dawn Thompson

Keyword(s):

Ab Initio ◽

Gene Structure ◽

Structure Prediction ◽

Gene Prediction ◽

Draft Genome ◽

Prediction Methods ◽

Eukaryotic Genes ◽

Test Sets ◽

Structure Complexity ◽

Genome Assemblies

Background: The draft genome assemblies produced by new sequencing technologies present important challenges for automatic gene prediction pipelines, leading to less accurate gene models. New benchmark methods are needed to evaluate the accuracy of gene prediction methods in the face of incomplete genome assemblies, low genome coverage and quality, complex gene structures, or a lack of suitable sequences for evidence-based annotations. Results: We describe the construction of a new benchmark, called G3PO (benchmark for Gene and Protein Prediction PrOgrams), designed to represent many of the typical challenges faced by current genome annotation projects. The benchmark is based on a carefully validated and curated set of real eukaryotic genes from 147 phylogenetically disperse organisms, and a number of test sets are defined to evaluate the effects of different features, including genome sequence quality, gene structure complexity, protein length, etc. We used the benchmark to perform an independent comparative analysis of the most widely used ab initio gene prediction programs and identified the main strengths and weaknesses of the programs. More importantly, we highlight a number of features that could be exploited in order to improve the accuracy of current prediction tools. Conclusions: The experiments showed that ab initio gene structure prediction is a very challenging task, which should be further investigated. We believe that the baseline results associated with the complex gene test sets in G3PO provide useful guidelines for future studies.

Definition of the Gene Content of the Human Genome: The Need for Deep Experimental Verification

Comparative and Functional Genomics ◽

10.1002/cfg.81 ◽

2001 ◽

Vol 2 (3) ◽

pp. 169-175 ◽

Cited By ~ 2

Author(s):

Andrew J. G. Simpson ◽

Sandro J. de Souza ◽

Anamaria A. Camargo ◽

Ricardo R. Brentani

Keyword(s):

Human Genome ◽

Gene Structure ◽

Experimental Verification ◽

Human Gene ◽

Sequence Data ◽

Gene Prediction ◽

Human Genes ◽

Number Of Genes ◽

Definition Of

Based on the analysis of the drafts of the human genome sequence, it is being speculated that our species may possess an unexpectedly low number of genes. The quality of the drafts, the impossibility of accurate gene prediction and the lack of sufficient transcript sequence data, however, render such speculations very premature. The complexity of human gene structure requires additional and extensive experimental verification of transcripts that may result in major revisions of these early estimates of the number of human genes.

Leukocyte Common Antigen-Related Phosphatase (LRP) Gene Structure: Conservation of the Genomic Organization of Transmembrane Protein Tyrosine Phosphatases

Genomics ◽

10.1006/geno.1993.1279 ◽

1993 ◽

Vol 17 (1) ◽

pp. 33-38 ◽

Cited By ~ 12

Author(s):

Edward C.C. Wong ◽

Jerald E. Mullersman ◽

Matthew L. Thomas

Keyword(s):

Gene Structure ◽

Genomic Organization ◽

Transmembrane Protein ◽

Protein Tyrosine Phosphatases ◽

Common Antigen ◽

Tyrosine Phosphatases ◽

Leukocyte Common Antigen ◽

Protein Tyrosine ◽

Structure Conservation

Molecular cloning, complete nucleotide sequence, and gene structure of the provirus genome of a retrovirus produced in a human lymphoblastoid cell line

Virology ◽

10.1016/s0042-6822(88)90109-2 ◽

1988 ◽

Vol 167 (2) ◽

pp. 468-476 ◽

Cited By ~ 1

Author(s):

T ODA ◽

S IKEDA ◽

S WATANABE ◽

M HATSUSHIKA ◽

K AKIYAMA ◽

...

Keyword(s):

Nucleotide Sequence ◽

Cell Line ◽

Molecular Cloning ◽

Gene Structure ◽

Complete Nucleotide Sequence ◽

Lymphoblastoid Cell Line ◽

Human Lymphoblastoid Cell ◽

Human Lymphoblastoid Cell Line

Murine CD70: cDNA cloning and gene structure

Immunology Letters ◽

10.1016/s0165-2478(97)87049-6 ◽

1997 ◽

Vol 56 (1-3) ◽

pp. 56

Author(s):

K Tesselaar

Keyword(s):

cDNA Cloning ◽

Gene Structure

Glucokinase gene structure. Functional implications of molecular genetic studies

Diabetes ◽

10.2337/diabetes.39.5.523 ◽

1990 ◽

Vol 39 (5) ◽

pp. 523-527 ◽

Cited By ~ 21

Author(s):

M. A. Magnuson

Keyword(s):

Gene Structure ◽

Molecular Genetic ◽

Glucokinase Gene ◽

Genetic Studies ◽

Functional Implications

Human hexokinase II mRNA and gene structure

Diabetes ◽

10.2337/diabetes.44.3.290 ◽

1995 ◽

Vol 44 (3) ◽

pp. 290-294 ◽

Cited By ~ 10

Author(s):

R. L. Printz ◽

H. Ardehali ◽

S. Koch ◽

D. K. Granner

Keyword(s):

Gene Structure ◽

Hexokinase II
