An undergraduate bioinformatics curriculum that teaches eukaryotic gene structure

Metagenomics has become a prominent technology for studying the functional potential of all organisms in a microbial community. Most studies focus on the bacterial content of these communities and ignore eukaryotic microbes. Indeed, many metagenomics analysis pipelines silently assume that all contigs in a metagenome are prokaryotic. However, because of marked differences in gene structure, prokaryotic gene prediction tools fail to accurately predict eukaryotic genes. Here, we developed a classifier that distinguishes eukaryotic from prokaryotic contigs based on foundational differences in gene structure between these taxa. We first developed a random forest classifier whose most important features are intergenic distance, gene density and gene length. We show that, with an estimated accuracy of 97%, this classifier, whose principled features are grounded in biology, performs almost as well as the classifiers EukRep and Tiara, which use k-mer frequencies as features. Re-training our classifier with Tiara predictions as an additional feature compensates for the weaknesses of both types of classifier. The result is an enhanced classifier that outperforms all individual classifiers, reaching 1.00 for precision, recall, accuracy and F1-score for both eukaryotes and prokaryotes, while remaining fast. We call this enhanced classifier 'Whokaryote'. In a reanalysis of metagenome data from a disease-suppressive plant endosphere microbial community, we show how using Whokaryote to select contigs for eukaryotic gene prediction facilitates the discovery of several biosynthetic gene clusters that were missed in the original study. Whokaryote is wrapped in an easily installable package and is freely available from https://git.wageningenur.nl/lotte.pronk/whokaryote.
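The abstract names intergenic distance, gene density and gene length as the most important features of the random forest. As a rough sketch, assuming genes on a contig are available as start/end coordinates from some upstream gene predictor, such features could be computed per contig as below. The `Gene` class, the `gene_structure_features` function and the exact feature definitions are hypothetical illustrations, not taken from Whokaryote's code.

```python
import math
from dataclasses import dataclass
from statistics import fmean


@dataclass(frozen=True)
class Gene:
    """A predicted gene on a contig (1-based, inclusive coordinates)."""
    start: int
    end: int


def gene_structure_features(genes: list[Gene], contig_length: int) -> dict[str, float]:
    """Compute illustrative gene-structure features for one contig.

    These definitions are assumptions for illustration, not Whokaryote's:
      mean_gene_length          mean of (end - start + 1), in bp
      gene_density              number of genes per kilobase of contig
      mean_intergenic_distance  mean gap between consecutive genes, in bp;
                                negative values mean neighbouring genes overlap
    Undefined values (no genes, or no neighbouring pair) are returned as NaN.
    """
    if contig_length <= 0:
        raise ValueError("contig_length must be positive")
    ordered = sorted(genes, key=lambda g: g.start)
    lengths = [g.end - g.start + 1 for g in ordered]
    # Gap between the end of one gene and the start of the next one.
    gaps = [nxt.start - cur.end - 1 for cur, nxt in zip(ordered, ordered[1:])]
    return {
        "mean_gene_length": fmean(lengths) if lengths else math.nan,
        "gene_density": len(ordered) / (contig_length / 1000),
        "mean_intergenic_distance": fmean(gaps) if gaps else math.nan,
    }


# Three closely spaced genes on a 4 kb contig (made-up coordinates).
features = gene_structure_features(
    [Gene(1, 900), Gene(951, 2150), Gene(2191, 3090)], contig_length=4000
)
print(features)
# → {'mean_gene_length': 1000.0, 'gene_density': 0.75, 'mean_intergenic_distance': 45.0}
```

Compact prokaryotic genomes typically yield high gene density and short, sometimes negative, intergenic distances, while eukaryotic contigs tend to show lower gene density and longer intergenic regions. Feature vectors like this one, possibly extended with a Tiara prediction as an extra column, are the kind of input a random forest would be trained on.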


Evolution of Eukaryotic Gene Repertoire and Gene Structure: Discovering the Unexpected Dynamics of Genome Evolution

Cold Spring Harbor Symposia on Quantitative Biology, Vol 68, pp. 293-302 (2003). DOI: 10.1101/sqb.2003.68.293

Cited by ~2

Author(s): I.B. Rogozin, V.N. Babenko, N.D. Fedorova, J.D. Jackson, A.R. Jacobs, ...

Keyword(s): Genome Evolution, Gene Structure, Gene Repertoire, Eukaryotic Gene


Role of small nuclear RNAs in eukaryotic gene expression

Essays in Biochemistry, Vol 54, pp. 79-90 (2013). DOI: 10.1042/bse0540079

Cited by ~34

Author(s): Saba Valadkhan, Lalith S. Gunawardane

Keyword(s): Gene Expression, Protein Interactions, RNA Stability, Splice Sites, Branch Site, Small Nuclear RNAs, Eukaryotic Gene Expression, Eukaryotic Gene, RNA Biogenesis

Eukaryotic cells contain small, highly abundant, nuclear-localized non-coding RNAs [snRNAs (small nuclear RNAs)] that play important roles in splicing introns out of primary genomic transcripts. Through a combination of RNA–RNA and RNA–protein interactions, two of the snRNPs (small nuclear ribonucleoproteins, each an snRNA with its bound proteins), U1 and U2, recognize the splice sites and the branch site of introns. A complex remodelling of RNA–RNA and protein-based interactions follows, resulting in the assembly of catalytically competent spliceosomes, in which the snRNAs and their bound proteins play central roles. This process involves the formation of extensive base-pairing interactions between U2 and U6, between U6 and the 5′ splice site, and between U5 and the exonic sequences immediately adjacent to the 5′ and 3′ splice sites. These RNA–RNA interactions involving U2, U5 and U6 thus help position the reacting groups for the first and second steps of splicing. In addition, U6 is thought to participate in forming the spliceosomal active site. Furthermore, emerging evidence suggests additional roles for snRNAs in regulating various aspects of RNA biogenesis, from transcription to polyadenylation and RNA stability. These snRNP-mediated regulatory roles probably help coordinate the different processes involved in RNA biogenesis and point to the central importance of snRNAs in eukaryotic gene expression.
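The abstract notes that U1 and U2 recognize the splice sites and the branch site partly through RNA–RNA base pairing. The toy sketch below scores how well a candidate 5′ splice site could base-pair with a complementary snRNA segment. It is an illustration under stated assumptions: the idealized site CAG|GUAAGU, the nine-nucleotide window and the helper names `reverse_complement` and `count_base_pairs` are hypothetical; only Watson–Crick pairs are counted (G·U wobble pairs are ignored); the pairing partner is derived programmatically rather than quoted from a real U1 sequence; and the RNA–protein interactions the abstract also mentions are not modelled.

```python
# Watson-Crick pairs only; G-U wobble pairs are deliberately not counted.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the strand (written 5'->3') that pairs perfectly with `rna`."""
    return "".join(COMPLEMENT[base] for base in reversed(rna.upper()))


def count_base_pairs(site: str, partner: str) -> int:
    """Count Watson-Crick pairs between two equal-length RNA strands, both
    written 5'->3' and aligned antiparallel: site[i] faces partner[-1 - i]."""
    if len(site) != len(partner):
        raise ValueError("strands must have equal length")
    site, partner = site.upper(), partner.upper()
    return sum((a, b) in WATSON_CRICK for a, b in zip(site, reversed(partner)))


# Hypothetical idealized 5' splice site: exon ...CAG | GUAAGU... intron.
consensus_site = "CAGGUAAGU"
partner = reverse_complement(consensus_site)  # "ACUUACCUG"
print(count_base_pairs(consensus_site, partner))  # 9: perfect complementarity
print(count_base_pairs("AAGGUGAGU", partner))     # 7: weaker, variant site
```

A higher count suggests stronger potential complementarity, but in this simplified model it is only a proxy: real splice-site recognition also depends on wobble pairs, flanking sequence context and the protein components of the snRNPs.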
