EnTAP: Bringing Faster and Smarter Functional Annotation to Non-Model Eukaryotic Transcriptomes

2018 ◽  
Author(s):  
Alexander J. Hart ◽  
Samuel Ginzburg ◽  
Muyang (Sam) Xu ◽  
Cera R. Fisher ◽  
Nasim Rahmatpour ◽  
...  

ABSTRACT EnTAP (Eukaryotic Non-Model Transcriptome Annotation Pipeline) was designed to improve the accuracy, speed, and flexibility of functional gene annotation for de novo assembled transcriptomes in non-model eukaryotes. This software package addresses the fragmentation and related assembly issues that result in inflated transcript estimates and poor annotation rates, while focusing primarily on protein-coding transcripts. Following filters applied through assessment of true expression and frame selection, open-source tools are leveraged to functionally annotate the translated proteins. Downstream features include fast similarity search across three repositories, protein domain assignment, orthologous gene family assessment, and Gene Ontology term assignment. The final annotation integrates across multiple databases and selects an optimal assignment from a combination of weighted metrics describing similarity search score, taxonomic relationship, and informativeness. Researchers have the option to include additional filters to identify and remove contaminants, identify associated pathways, and prepare the transcripts for enrichment analysis. This fully featured pipeline is easy to install and configure, and runs significantly faster than comparable annotation packages. EnTAP is optimized to generate extensive functional information for the gene space of organisms with limited or poorly characterized genomic resources.
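The selection step described above, choosing one assignment from weighted metrics, can be illustrated with a minimal sketch. This is not EnTAP's actual scoring scheme: the `Hit` fields, the `WEIGHTS` values, and the normalization are all hypothetical and only show how a similarity score, taxonomic relationship, and informativeness might be combined into a single ranking.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hit:
    """One similarity-search hit for a transcript (hypothetical structure)."""
    database: str
    description: str
    bit_score: float
    taxonomic_distance: int  # assumption: 0 = target species, larger = more distant
    informative: bool        # assumption: False for e.g. "hypothetical protein"


# Hypothetical weights, chosen for illustration only.
WEIGHTS = {"score": 0.5, "taxonomy": 0.3, "informative": 0.2}


def combined_score(hit: Hit, max_bit_score: float, max_distance: int) -> float:
    """Combine the three metrics into one value in [0, 1]."""
    score_term = hit.bit_score / max_bit_score if max_bit_score else 0.0
    # Closer taxa score higher; if all hits are equally close, give full credit.
    taxonomy_term = (1.0 - hit.taxonomic_distance / max_distance) if max_distance else 1.0
    informative_term = 1.0 if hit.informative else 0.0
    return (WEIGHTS["score"] * score_term
            + WEIGHTS["taxonomy"] * taxonomy_term
            + WEIGHTS["informative"] * informative_term)


def select_best_hit(hits: List[Hit]) -> Optional[Hit]:
    """Return the hit with the highest combined score, or None if there are no hits."""
    if not hits:
        return None
    max_bit = max(h.bit_score for h in hits)
    max_dist = max(h.taxonomic_distance for h in hits)
    return max(hits, key=lambda h: combined_score(h, max_bit, max_dist))


if __name__ == "__main__":
    candidates = [
        Hit("refseq", "hypothetical protein", 300.0, 1, False),
        Hit("uniprot", "heat shock protein 70", 280.0, 1, True),
    ]
    print(select_best_hit(candidates).description)
```

With these illustrative weights, a slightly lower-scoring but informative hit can outrank an uninformative one, which is the behavior the abstract's "informativeness" metric suggests.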

2017 ◽  
Vol 5 (28) ◽  
Author(s):  
Su-Yeon Lee ◽  
Ji-eun An ◽  
Sun-Hwa Ryu ◽  
Myungkil Kim

ABSTRACT Polyporus brumalis is able to synthesize several sesquiterpenes during fungal growth. Using a single-molecule real-time sequencing platform, we present the 53-Mb draft genome of P. brumalis, which contains 6,231 protein-coding genes. Gene annotation and isolation support genetic information, which can increase the understanding of sesquiterpene metabolism in P. brumalis.


Genes ◽  
2019 ◽  
Vol 10 (9) ◽  
pp. 677 ◽  
Author(s):  
Chuang Zhou ◽  
Hongmei Tu ◽  
Haoran Yu ◽  
Shuai Zheng ◽  
Bo Dai ◽  
...  

The Sichuan partridge (Arborophila rufipectus, Phasianidae, Galliformes) is distributed in south-west China and is classified as endangered. To examine the evolution and genomic features of the Sichuan partridge, we de novo assembled the Sichuan partridge reference genome. The final draft assembly consisted of approximately 1.09 Gb, and had a scaffold N50 of 4.57 Mb. About 1.94 million heterozygous single-nucleotide polymorphisms (SNPs) were detected, 17,519 protein-coding genes were predicted, and 9.29% of the genome was identified as repetitive elements. A total of 56 olfactory receptor (OR) genes were found in the Sichuan partridge, and conserved motifs were detected. Comparisons between the Sichuan partridge genome and the chicken genome revealed a conserved genome structure, and phylogenetic analysis demonstrated that Arborophila possessed a basal phylogenetic position within Phasianidae. Gene Ontology (GO) enrichment analysis of positively selected genes (PSGs) in the Sichuan partridge showed over-represented GO functions related to environmental adaptation, such as energy metabolism and behavior. Pairwise sequentially Markovian coalescent analysis revealed the recent demographic trajectory for the Sichuan partridge. Our data and findings provide valuable genomic resources not only for studying the evolutionary adaptation, but also for supporting the long-term conservation and genetic diversity of this endangered species.
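For readers unfamiliar with the scaffold N50 statistic quoted above, the following is a minimal sketch of the standard definition: the length of the scaffold at which the running total, taken from longest to shortest, first reaches half of the assembly size. The function name `n50` and the toy lengths are illustrative and are not taken from the study.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one scaffold length is required")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first scaffold that pushes the running total to half
        # of the assembly size defines the N50.
        if running >= half_total:
            return length
    return ordered[-1]  # not reached for non-empty input; kept for completeness


if __name__ == "__main__":
    # Toy assembly of 32 units in total; the two longest scaffolds cover half.
    print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```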


Insects ◽  
2021 ◽  
Vol 12 (4) ◽  
pp. 281
Author(s):  
Haixia Zhan ◽  
Youssef Dewer ◽  
Cheng Qu ◽  
Shiyong Yang ◽  
Chen Luo ◽  
...  

Donacia provosti (Fairmaire, 1885) is a major pest of aquatic crops. It is widely distributed worldwide and causes extensive damage to lotus and rice plants. Changes in gene regulation may play an important role in adaptive evolution, particularly during adaptation to feeding and living habits. However, little is known about the evolution and molecular mechanisms underlying the adaptation of D. provosti to its lifestyle and living habits. To address this question, we generated the first larval transcriptome of D. provosti. A total of 20,692 unigenes were annotated from the seven public databases and around 18,536 protein-coding genes have been predicted from the analysis of the D. provosti transcriptome. About 5036 orthologous clusters were identified among four species, and 494 unique clusters were identified from D. provosti larvae, including clusters associated with visual perception. Furthermore, to reveal the molecular difference between D. provosti and the Colorado potato beetle Leptinotarsa decemlineata, a comparison between the CDS of the two beetles was conducted and 6627 orthologous gene pairs were identified. Based on the ratio of nonsynonymous to synonymous substitutions, 93 orthologous gene pairs were found to be evolving under positive selection. Interestingly, our results also show that 4 of the 93 orthologous gene pairs were associated with the “mTOR signaling pathway”, and these are predicted to be involved in the molecular mechanism of D. provosti adaptation to the underwater environment. This study will provide an important scientific basis for building an effective prevention and control system for the aquatic leaf beetle Donacia provosti.
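The ratio of nonsynonymous to synonymous substitutions mentioned above is commonly written Ka/Ks (or dN/dS), and a ratio above 1 is conventionally read as a sign of positive selection. The sketch below only shows that conventional reading. The abstract does not state the estimation method, threshold, or statistical test used in the study, so the function name, labels, and cutoff here are assumptions.

```python
from typing import Optional


def classify_selection(ka: float, ks: float) -> Optional[str]:
    """Label a gene pair by its Ka/Ks ratio using the conventional cutoff of 1.

    Returns None when Ks is zero or negative, because the ratio is undefined.
    """
    if ks <= 0:
        return None
    omega = ka / ks
    if omega > 1:
        return "positive"
    if omega < 1:
        return "purifying"
    return "neutral"


if __name__ == "__main__":
    # Hypothetical Ka and Ks values for three orthologous gene pairs.
    pairs = {"pairA": (0.30, 0.10), "pairB": (0.05, 0.50), "pairC": (0.10, 0.0)}
    for name, (ka, ks) in pairs.items():
        print(name, classify_selection(ka, ks))
```

In practice, a raw ratio is usually supplemented by a significance test before a gene pair is reported as positively selected.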


2020 ◽  
Author(s):  
Jianing Gao ◽  
Huan Zhang ◽  
Xiaohua Jiang ◽  
Asim Ali ◽  
Daren Zhao ◽  
...  

Abstract The genetic basis of human infertility is currently under intensive investigation. However, only a handful of genes are validated in animal models as disease-causing genes in infertile men. Thus, to better understand the genetic basis of spermatogenesis in humans and to bridge the knowledge gap between humans and other animal species, we have constructed the FertilityOnline database, a resource that integrates the functional genes reported in the literature related to spermatogenesis into an existing spermatogenic database, SpermatogenesisOnline 1.0. Additional features, such as functional annotation and statistical analysis of genetic variants of human genes, are also incorporated into FertilityOnline. By searching this database, users can focus on the top candidate genes associated with infertility and can perform enrichment analysis to instantly refine the number of candidates in a user-friendly web interface. Clinical validation of this database is established by the identification of novel causative mutations in SYCE1 and STAG3 in azoospermic men. In conclusion, FertilityOnline is not only an integrated resource for analysis of spermatogenic genes, but also a useful tool that facilitates the study of the underlying genetic basis of male infertility. Availability: FertilityOnline can be freely accessed at http://mcg.ustc.edu.cn/bsc/spermgenes2.0/index.html.


2021 ◽  
Vol 3 (2) ◽  
Author(s):  
Alexandre Lomsadze ◽  
Christophe Bonny ◽  
Francesco Strozzi ◽  
Mark Borodovsky

Abstract Computational reconstruction of nearly complete genomes from metagenomic reads may identify thousands of new uncultured candidate bacterial species. We have shown that reconstructed prokaryotic genomes along with genomes of sequenced microbial isolates can be used to support more accurate gene prediction in novel metagenomic sequences. We have proposed an approach that used three types of gene prediction algorithms and found for all contigs in a metagenome nearly optimal models of protein-coding regions either in libraries of pre-computed models or constructed de novo. The model selection process and gene annotation were done by the new GeneMark-HM pipeline. We have created a database of the species level pan-genomes for the human microbiome. To create a library of models representing each pan-genome we used a self-training algorithm GeneMarkS-2. Genes initially predicted in each contig served as queries for a fast similarity search through the pan-genome database. The best matches led to selection of the model for gene prediction. Contigs not assigned to pan-genomes were analyzed by crude, but still accurate models designed for sequences with particular GC compositions. Tests of GeneMark-HM on simulated metagenomes demonstrated improvement in gene annotation of human metagenomic sequences in comparison with the current state-of-the-art gene prediction tools.


2018 ◽  
Vol 19 (9) ◽  
pp. 2525 ◽  
Author(s):  
Wei Liu ◽  
LianFu Chen ◽  
YingLi Cai ◽  
QianQian Zhang ◽  
YinBing Bian

Morchella is a popular edible fungus worldwide due to its rich nutrition and unique flavor. Many research efforts have been made on the domestication and cultivation of Morchella all over the world. In recent years, the cultivation of Morchella was successfully commercialized in China. However, its biology is not well understood, which restricts the further development of the morel cultivation industry. In this paper, we performed de novo sequencing and assembly of the genomes of two monospores with different mating types (M04M24 and M04M26) isolated from the commercially cultivated strain M04. Gene annotation and comparative genome analysis were performed to study differences in CAZyme (carbohydrate-active enzyme) content, transcription factors, duplicated sequences, structure of mating type sites, and differences at the gene and functional levels between the two monospore strains of M. importuna. Results showed that the de novo assembled haploid M04M24 and M04M26 genomes were 48.98 and 51.07 Mb, respectively. A complete fine physical map of M. importuna was obtained from genome coverage and gene completeness evaluation. A total of 10,852 and 10,902 common genes and 667 and 868 endemic genes were identified from the two monospore strains, respectively. The Gene Ontology (GO) and KAAS (KEGG Automatic Annotation Server) enrichment analyses showed that the endemic genes performed different functions. The two monospore strains had 99.22% collinearity with each other, accompanied by certain position and rearrangement events. Analysis of complete mating-type loci revealed that the two monospore M. importuna strains contained an independent mating-type structure and remained conserved in sequence and location. The phylogeny and divergence time of M. importuna were analyzed at the whole-genome level for the first time. The bifurcation time of morel and tuber was estimated to be 201.14 million years ago (Mya); the two monospore strains with different mating types represented the evolution of different nuclei, and the single copy homologous genes between them were also different due to a genetic differentiation distance of about 0.65 Mya. Compared with truffles, M. importuna had an extension of 28 clusters of orthologous genes (COGs) and a contraction of two COGs. The two different polar nuclei with different degrees of contraction and expansion suggested that they might have undergone different evolutionary processes. The different mating-type structures, together with the functional clustering and enrichment analysis results of the endemic genes of the two different polar nuclei, imply that M. importuna might be a heterothallic fungus and the interaction between the endemic genes may be necessary for its complete life history. Studies on the genome of M. importuna facilitate a better understanding of morel biology and evolution.


2021 ◽  
Vol 43 (8) ◽  
Author(s):  
Guobao Wang ◽  
Li Qin

Abstract Q. liaotungensis is an important drought-resistant tree species in Northeast China, where the climate is dry with little rainfall. In this study, we performed deep transcriptomic sequencing of Q. liaotungensis leaves, including de novo assembly and functional annotation, to screen for candidate genes involved in drought avoidance. A total of 25,593 unigenes were obtained from the Illumina sequencing platform. According to Gene Ontology annotation and KEGG pathway enrichment analysis, we screened a series of candidate genes encoding SOD, POD, CAT, DREB, MYB, WRKY, bZIP, and NAC from the Q. liaotungensis leaf transcriptome, all of which are potentially involved in drought resistance. The results of this study expanded the genetic resources of Q. liaotungensis and provided a theoretical basis for further exploring the functional gene information of Q. liaotungensis.


2021 ◽  
Vol 11 ◽  
Author(s):  
Ruitao Liu ◽  
Yiming Wang ◽  
Peng Li ◽  
Lei Sun ◽  
Jianfu Jiang ◽  
...  

Grape white rot caused by Coniella diplodiella (Speg.) affects the production and quality of grapevine in China and other grapevine-growing countries. Despite the importance of C. diplodiella as a serious disease-causing agent in grape, the genome information and molecular mechanisms underlying its pathogenicity are poorly understood. To bridge this gap, 40.93 Mbp of C. diplodiella strain WR01 was de novo assembled. A total of 9,403 putative protein-coding genes were predicted. Among these, 608 and 248 genes potentially encode secreted proteins and candidate effector proteins (CEPs), respectively. Additionally, the transcriptome of C. diplodiella was analyzed after feeding with crude grapevine leaf homogenates, which revealed the transcriptional expression of 9,115 genes. Gene ontology enrichment analysis indicated that the highly enriched genes are related to carbohydrate metabolism and secondary metabolite synthesis. Forty-three putative effectors were cloned from C. diplodiella and used for further functional analysis. Among them, one protein exhibited a strong effect in suppressing the BCL2-associated X (BAX)-induced hypersensitive response after transient expression in Nicotiana benthamiana leaves. This work provides a valuable genetic basis for understanding the molecular mechanism underlying the C. diplodiella-grapevine interaction.


2017 ◽  
Author(s):  
James M. Havrilla ◽  
Brent S. Pedersen ◽  
Ryan M. Layer ◽  
Aaron R. Quinlan

ABSTRACT Deep catalogs of genetic variation collected from many thousands of humans enable the detection of intraspecies constraint by revealing coding regions with a scarcity of variation. While existing techniques summarize constraint for entire genes, single metrics cannot capture the fine-scale variability in constraint within each protein-coding gene. To provide greater resolution, we have created a detailed map of constrained coding regions (CCRs) in the human genome by leveraging coding variation observed among 123,136 humans from the Genome Aggregation Database (gnomAD). The most constrained coding regions in our map are enriched for both pathogenic variants in ClinVar and de novo mutations underlying developmental disorders. CCRs also reveal protein domain families under high constraint, suggest unannotated or incomplete protein domains, and facilitate the prioritization of previously unseen variation in studies of disease. Finally, a subset of CCRs with the highest constraint likely exist within genes that cause yet unobserved human phenotypes owing to strong purifying selection.


Insects ◽  
2021 ◽  
Vol 12 (7) ◽  
pp. 652
Author(s):  
Hongwei Tan ◽  
Muhammad Naeem ◽  
Hussain Ali ◽  
Muhammad Shakeel ◽  
Haiou Kuang ◽  
...  

In Pakistan, Apis cerana, the Asian honeybee, has been used for honey production and pollination services. However, its genomic makeup and phylogenetic relationship with those in other countries are still unknown. We collected A. cerana samples from the main cerana-keeping region in Pakistan and performed whole genome sequencing. A total of 28 Gb of Illumina shotgun reads were generated, which were used to assemble the genome. The obtained genome assembly had a total length of 214 Mb, with a GC content of 32.77%. The assembly had a scaffold N50 of 2.85 Mb and a BUSCO completeness score of 99%, suggesting a remarkably complete genome sequence for A. cerana in Pakistan. A MAKER pipeline was employed to annotate the genome sequence, and a total of 11,864 protein-coding genes were identified. Of them, 6750 genes were assigned at least one GO term, and 8813 genes were annotated with at least one protein domain. Genome-scale phylogeny analysis indicated an unexpectedly close relationship between A. cerana in Pakistan and those in China, suggesting a potential human introduction of the species between the two countries. Our results will facilitate the genetic improvement and conservation of A. cerana in Pakistan.
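As a small illustration of one assembly statistic reported above, GC content is the percentage of G and C bases among the unambiguous bases of a sequence. The sketch below is a generic calculation, not the study's pipeline. The function name `gc_content` and the choice to ignore ambiguous bases such as N are assumptions.

```python
def gc_content(sequence: str) -> float:
    """Return GC content as a percentage of unambiguous bases (A, C, G, T).

    Ambiguous characters such as N are ignored (an assumption of this sketch).
    Returns 0.0 when the sequence contains no unambiguous bases.
    """
    bases = [base for base in sequence.upper() if base in "ACGT"]
    if not bases:
        return 0.0
    gc_count = sum(1 for base in bases if base in "GC")
    return 100.0 * gc_count / len(bases)


if __name__ == "__main__":
    # Toy sequence; a real assembly would be read from a FASTA file.
    print(f"{gc_content('ATGCNNATAT'):.2f}%")
```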

