annotation pipeline
Recently Published Documents



Author(s):  
Natascha van Lieshout ◽  
Martijn van Kaauwen ◽  
Linda Kodde ◽  
Paul Arens ◽  
Marinus J M Smulders ◽  
...  

Abstract Chrysanthemum is among the top ten cut, potted and perennial garden flowers in the world. Despite this, to date, only the genomes of two wild diploid chrysanthemums have been sequenced and assembled. Here we present the most complete and contiguous chrysanthemum de novo assembly published so far, as well as a corresponding ab initio annotation. The cultivated hexaploid varieties are thought to originate from a hybrid of wild chrysanthemums, among which the diploid Chrysanthemum makinoi has been mentioned. Using a combination of Oxford Nanopore long reads, Pacific Biosciences long reads, Illumina short reads, Dovetail sequences and a genetic map, we assembled 3.1 Gb of its sequence into 9 pseudochromosomes, with an N50 of 330 Mb and BUSCO complete score of 92.1%. Our ab initio annotation pipeline predicted 95 074 genes and marked 80.0% of the genome as repetitive. This genome assembly of C. makinoi provides an important step forward in understanding the chrysanthemum genome, evolution and history.
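The N50 reported above is a contiguity statistic: sort the assembled sequences from longest to shortest, and the N50 is the length of the sequence at which the running total first reaches half of the assembly size. A minimal Python sketch of that definition follows; the function name `n50` and the toy lengths are illustrative assumptions, not values or code from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    The N50 is the length L of the sequence at which the cumulative
    sum of lengths, taken from longest to shortest, first reaches
    half of the total assembly size.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy assembly (arbitrary units), not data from the paper.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_lengths))  # prints 70: 80 + 70 = 150, half of the 300 total
```

A higher N50 for the same total size means that more of the assembly sits in long sequences, which is why it is usually quoted together with a completeness score such as BUSCO.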


2021 ◽  
Vol 10 (29) ◽  
Author(s):  
Hatim Almutairi ◽  
Michael D. Urbaniak ◽  
Michelle D. Bates ◽  
Narissara Jariyapan ◽  
Godwin Kwakye-Nuako ◽  
...  

We present the LGAAP computational pipeline, which was successfully used to assemble six genomes of the parasite subfamily Leishmaniinae to chromosome-scale completeness from a combination of long- and short-read sequencing data. LGAAP is open source, and we suggest that it may easily be ported for assembly of any genome of comparable size (∼35 Mb).


2021 ◽  
Author(s):  
Natascha van Lieshout ◽  
Martijn van Kaauwen ◽  
Linda Kodde ◽  
Paul Arens ◽  
Marinus J. M. Smulders ◽  
...  

Chrysanthemum is among the top ten cut, potted and perennial garden flowers in the world. Despite this, to date, only the genomes of two wild diploid chrysanthemums have been sequenced and assembled. Here we present the most complete and contiguous chrysanthemum de novo assembly published so far, as well as a corresponding ab initio annotation. The wild diploid Chrysanthemum makinoi is thought to be one of the ancestors of the cultivated hexaploid varieties which are currently grown all around the world. Using a combination of Oxford Nanopore long reads, Pacific Biosciences long reads, Illumina short reads, Dovetail sequences and a genetic map, we assembled 3.1 Gb of its sequence into 9 pseudochromosomes, with an N50 of 330 Mb and BUSCO complete score of 92.1%. Our ab initio annotation pipeline predicted 95 074 genes and marked 80.0% of the genome as repetitive. This genome assembly of C. makinoi provides an important step forward in understanding the chrysanthemum genome, evolution and history.


Author(s):  
Chong Tang ◽  
Yeming Xie ◽  
Mei Guo ◽  
Wei Yan

Abstract Small noncoding RNA deep sequencing (sncRNA-Seq) has become routine for sncRNA detection and quantification. However, the software packages currently available for sncRNA annotation can neither recognize sncRNA variants in the sequencing reads nor annotate all known sncRNAs simultaneously. Here, we report a novel anchor alignment-based small RNA annotation (AASRA) software package (https://github.com/biogramming/AASRA). AASRA represents an all-in-one sncRNA annotation pipeline, which allows for high-speed, simultaneous annotation of all known sncRNA species, with the capability to distinguish mature from precursor miRNAs and to identify novel sncRNA variants in the sncRNA-Seq reads.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Artem V. Luzhin ◽  
Arkadiy K. Golov ◽  
Alexey A. Gavrilov ◽  
Artem K. Velichko ◽  
Sergey V. Ulianov ◽  
...  

Abstract Chromatin loops represent one of the major levels of hierarchical folding of the genome. Although the situation is evolving, current methods have various difficulties with the accurate mapping of loops even in mammalian Hi-C data, and most of them fail to identify chromatin loops in animal species with substantially different genome architecture. This paper presents the loop and significant contact annotation (LASCA) pipeline, which uses Weibull distribution-based modeling to effectively identify loops and enhancer–promoter interactions in Hi-C data from evolutionarily distant species: from yeast and worms to mammals. Available at: https://github.com/ArtemLuzhin/LASCA_pipeline.
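To give a feel for what Weibull distribution-based modeling of contacts can look like, the sketch below scores an observed contact value by its upper-tail probability under a Weibull distribution, using the closed-form survival function S(x) = exp(-(x/scale)^shape). This is not LASCA's implementation: the shape and scale values are assumed to have been fitted elsewhere, and the function names, contact labels, values and threshold are all hypothetical.

```python
import math


def weibull_upper_tail(x, shape, scale):
    """Upper-tail probability P(X >= x) for a Weibull(shape, scale) variable."""
    if shape <= 0 or scale <= 0:
        raise ValueError("shape and scale must be positive")
    if x <= 0:
        return 1.0
    return math.exp(-((x / scale) ** shape))


def significant_contacts(contacts, shape, scale, alpha=0.01):
    """Return the contacts whose upper-tail probability falls below alpha."""
    return {
        name: value
        for name, value in contacts.items()
        if weibull_upper_tail(value, shape, scale) < alpha
    }


# Hypothetical contact frequencies at one genomic distance (illustrative only).
contacts = {"bin_10-bin_42": 3.1, "bin_11-bin_43": 0.9, "bin_12-bin_44": 1.4}
# Assumed, pre-fitted parameters; a real analysis would estimate them from data.
print(significant_contacts(contacts, shape=1.5, scale=1.0))
```

A real analysis would also need to fit the distribution separately for each genomic distance and to correct for multiple testing, neither of which is shown here.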


2020 ◽  
Vol 49 (D1) ◽  
pp. D1020-D1028
Author(s):  
Wenjun Li ◽  
Kathleen R O’Neill ◽  
Daniel H Haft ◽  
Michael DiCuccio ◽  
Vyacheslav Chetvernin ◽  
...  

Abstract The Reference Sequence (RefSeq) project at the National Center for Biotechnology Information (NCBI) contains nearly 200 000 bacterial and archaeal genomes and 150 million proteins with up-to-date annotation. Changes in the Prokaryotic Genome Annotation Pipeline (PGAP) since 2018 have resulted in a substantial reduction in spurious annotation. The hierarchical collection of protein family models (PFMs) used by PGAP as evidence for structural and functional annotation was expanded to over 35 000 protein profile hidden Markov models (HMMs), 12 300 BlastRules and 36 000 curated CDD architectures. As a result, >122 million or 79% of RefSeq proteins are now named based on a match to a curated PFM. Gene symbols, Enzyme Commission numbers or supporting publication attributes are available on over 40% of the PFMs and are inherited by the proteins and features they name, facilitating multi-genome analyses and connections to the literature. In adherence with the principles of FAIR (findable, accessible, interoperable, reusable), the PFMs are available in the Protein Family Models Entrez database to any user. Finally, the reference and representative genome set, a taxonomically diverse subset of RefSeq prokaryotic genomes, is now recalculated regularly and available for download and homology searches with BLAST. RefSeq is found at https://www.ncbi.nlm.nih.gov/refseq/.


2020 ◽  
Vol 21 (23) ◽  
pp. 9029
Author(s):  
Olivia J. Veatch ◽  
Merlin G. Butler ◽  
Sarah H. Elsea ◽  
Beth A. Malow ◽  
James S. Sutcliffe ◽  
...  

Human genetic studies have implicated more than a hundred genes in Autism Spectrum Disorder (ASD). Understanding how variation in implicated genes influences the expression of co-occurring conditions and drug response can inform more effective, personalized approaches for treatment of individuals with ASD. Rapidly translating this information into the clinic requires efficient algorithms to sort through the myriad of genes implicated by rare gene-damaging single nucleotide and copy number variants, and common variation detected in genome-wide association studies (GWAS). To pinpoint genes that are more likely to have clinically relevant variants, we developed a functional annotation pipeline. We defined clinical relevance in this project as any ASD-associated gene with evidence indicating a patient may have a complex, co-occurring condition that requires direct intervention (e.g., sleep and gastrointestinal disturbances, attention deficit hyperactivity, anxiety, seizures, depression), or is relevant to drug development and/or approaches to maximizing efficacy and minimizing adverse events (i.e., pharmacogenomics). Starting with a list of all candidate genes implicated in all manifestations of ASD (i.e., idiopathic and syndromic), this pipeline uses databases that represent multiple lines of evidence to identify genes: (1) expressed in the human brain, (2) involved in ASD-relevant biological processes and resulting in analogous phenotypes in mice, (3) whose products are targeted by approved pharmaceutical compounds or possessing pharmacogenetic variation and (4) whose products directly interact with those of genes with variants recommended to be tested for by the American College of Medical Genetics (ACMG). Compared with 1000 gene sets, each with a random selection of human protein coding genes, more genes in the ASD set were annotated for each category evaluated (p ≤ 1.99 × 10⁻²). Of the 956 ASD-implicated genes in the full set, 18 were flagged based on evidence in all categories. Fewer genes from randomly drawn sets were annotated in all categories (x̄ = 8.02, sd = 2.56, p = 7.75 × 10⁻⁴). Notably, none of the prioritized genes are represented among the 59 genes compiled by the ACMG, and 78% had a pathogenic or likely pathogenic variant in ClinVar. Results from this work should rapidly prioritize potentially actionable results from genetic studies and, in turn, inform future work toward clinical decision support for personalized care based on genetic testing.
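The comparison against randomly drawn gene sets can be illustrated with a small resampling sketch: draw many random sets of the same size from a background list, count how many genes in each fall into an annotation category, and ask how often a random set matches or exceeds the observed count. This is one common way to run such a comparison, not necessarily the authors' exact procedure; the gene identifiers, set sizes and function names below are hypothetical.

```python
import random


def count_annotated(gene_set, annotated):
    """Number of genes in gene_set that carry the annotation."""
    return len(set(gene_set) & annotated)


def null_distribution(background, annotated, set_size, n_sets, seed=0):
    """Annotation counts for n_sets random gene sets drawn from background."""
    rng = random.Random(seed)
    return [
        count_annotated(rng.sample(background, set_size), annotated)
        for _ in range(n_sets)
    ]


def empirical_p_value(observed, null_counts):
    """Share of random sets with a count >= observed (with +1 correction)."""
    at_least = sum(1 for count in null_counts if count >= observed)
    return (at_least + 1) / (len(null_counts) + 1)


# Hypothetical toy data: 1000 background genes, 100 of them annotated.
background = [f"gene_{i}" for i in range(1000)]
annotated = set(background[:100])
candidate_set = background[:30] + background[500:520]  # 30 of 50 annotated

observed = count_annotated(candidate_set, annotated)
null_counts = null_distribution(background, annotated, set_size=50, n_sets=1000)
print(observed, empirical_p_value(observed, null_counts))
```

The +1 correction keeps the estimate from being exactly zero when no random set reaches the observed count, so the smallest reportable value is bounded by the number of random sets drawn.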


2020 ◽  
Vol 49 (D1) ◽  
pp. D751-D763 ◽  
Author(s):  
I-Min A Chen ◽  
Ken Chu ◽  
Krishnaveni Palaniappan ◽  
Anna Ratner ◽  
Jinghua Huang ◽  
...  

Abstract The Integrated Microbial Genomes & Microbiomes system (IMG/M: https://img.jgi.doe.gov/m/) contains annotated isolate genome and metagenome datasets sequenced at the DOE’s Joint Genome Institute (JGI), submitted by external users, or imported from public sources such as NCBI. IMG v 6.0 includes advanced search functions and a new tool for statistical analysis of mixed sets of genomes and metagenome bins. The new IMG web user interface also has a new Help page with additional documentation and webinar tutorials to help users better understand how to use various IMG functions and tools for their research. New datasets have been processed with the prokaryotic annotation pipeline v.5, which includes extended protein family assignments.


2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Alexander Eng ◽  
Adrian J. Verster ◽  
Elhanan Borenstein

Abstract Background: Microbial communities have become an important subject of research across multiple disciplines in recent years. These communities are often examined via shotgun metagenomic sequencing, a technology which can offer unique insights into the genomic content of a microbial community. Functional annotation of shotgun metagenomic data has become an increasingly popular method for identifying the aggregate functional capacities encoded by the community’s constituent microbes. Currently available metagenomic functional annotation pipelines, however, suffer from several shortcomings, including limited pipeline customization options, lack of standard raw sequence data pre-processing, and insufficient capabilities for integration with distributed computing systems. Results: Here we introduce MetaLAFFA, a functional annotation pipeline designed to take unfiltered shotgun metagenomic data as input and generate functional profiles. MetaLAFFA is implemented as a Snakemake pipeline, which enables convenient integration with distributed computing clusters, allowing users to take full advantage of available computing resources. Default pipeline settings allow new users to run MetaLAFFA according to common practices while a Python module-based configuration system provides advanced users with a flexible interface for pipeline customization. MetaLAFFA also generates summary statistics for each step in the pipeline so that users can better understand pre-processing and annotation quality. Conclusions: MetaLAFFA is a new end-to-end metagenomic functional annotation pipeline with distributed computing compatibility and flexible customization options. MetaLAFFA source code is available at https://github.com/borenstein-lab/MetaLAFFA and can be installed via Conda as described in the accompanying documentation.


Viruses ◽  
2020 ◽  
Vol 12 (8) ◽  
pp. 892 ◽  
Author(s):  
Adriano de Bernardi Schneider ◽  
Denis Jacob Machado ◽  
Sayal Guirales ◽  
Daniel A. Janies

Responding to the ongoing and severe public health threat of viruses of the family Flaviviridae, including dengue, hepatitis C, West Nile, yellow fever, and Zika, demands a greater understanding of how these viruses emerge and spread. Updated phylogenies are central to this understanding. Most cladograms of Flaviviridae focus on specific lineages and ignore outgroups, hampering the ability of the analysis to test ingroup monophyly and relationships. This is due to the lack of annotated Flaviviridae genomes, which vary in gene content among genera. This variation makes analysis without partitioning difficult. Therefore, we developed an annotation pipeline for the genera of Flaviviridae (Flavivirus, Hepacivirus, Pegivirus, and Pestivirus), named “Fast Loci Annotation of Viruses” (FLAVi; http://flavi-web.com/), that combines ab initio and homology-based strategies. FLAVi recovered 100% of the genes in Flavivirus and Hepacivirus genomes. In Pegivirus and Pestivirus, annotation efficiency was 100% except for one partition each. There were no false positives. The combined phylogenetic analysis of multiple genes made possible by annotation has a clear impact on the tree topology compared to phylogenies that we inferred without outgroups or data partitioning. The final tree is largely congruent with previous hypotheses and adds evidence supporting the close phylogenetic relationship between dengue and Zika.

