A weighted sequence alignment strategy for gene structure annotation lift over from reference genome to a newly sequenced individual

2019 ◽  
Author(s):  
Baoxing Song ◽  
Qing Sang ◽  
Hai Wang ◽  
Huimin Pei ◽  
Fen Wang ◽  
...  

Abstract Genome sequences and gene structure annotations are essential for genomic analysis, yet only the reference gene structure annotation is widely used across investigations of different naturally varying individuals. Herein, we report GEAN, a software tool that lifts over the reference gene structure annotation to other individuals of the same or closely related species whose genome sequences were determined by whole-genome resequencing or de novo assembly. We found that inconsistent sequence alignment makes coordinate lift-over between individual genomes unreliable, which in turn compromises the lift-over of gene structure annotations and the functional prediction of genomic variants. To refine gene structure lift-over, we designed a zebraic dynamic programming (ZDP) algorithm that assigns different weights to different genetic features. Using the lifted-over gene structure annotation as anchors, we implemented a base-pair-resolution, whole-genome sequence alignment and variant-calling pipeline for de novo assemblies. Taking Arabidopsis thaliana as an example, we show that the expression levels of natural variant alleles of genes related to apoptotic cell death and defence response might be better quantified using GEAN. GEAN can also be used to refine the functional annotation of genetic variants, annotate de novo assembled genome sequences, detect syntenic blocks, improve the quantification of gene expression levels from RNA-seq data, and encode genomic variants for population genetic analysis. We expect GEAN to become a standard tool for gene structure annotation lift-over and genome sequence alignment in the coming era of de novo assembly-based population genetics.
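
The abstract names the zebraic dynamic programming (ZDP) algorithm but does not give its recurrences. As a rough illustration of the general idea it describes, giving different weights to different genetic features, here is a plain Needleman-Wunsch global alignment in which every column score at a reference position is multiplied by the weight of the feature annotated there. The feature labels, the values in `FEATURE_WEIGHT`, the base scores, and the function names are all hypothetical; this is a sketch under those assumptions, not GEAN's implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical feature weights (NOT taken from GEAN): columns in coding
# sequence or at splice sites count more than intronic/intergenic columns.
FEATURE_WEIGHT: Dict[str, float] = {
    "CDS": 3.0,
    "splice": 3.0,
    "intron": 1.0,
    "intergenic": 0.5,
}
MATCH, MISMATCH, GAP = 1.0, -1.0, -2.0  # unweighted base scores


def _ins_weight(w: List[float], i: int) -> float:
    """Weight for a query base inserted between ref[i-1] and ref[i]:
    take the heavier neighbour so insertions inside exons cost more."""
    left = w[i - 1] if i > 0 else 0.0
    right = w[i] if i < len(w) else 0.0
    return max(left, right) or 1.0  # fall back to 1.0 for an empty reference


def weighted_global_align(
    ref: str, features: List[str], query: str
) -> Tuple[float, str, str]:
    """Needleman-Wunsch global alignment in which the score of each column
    is scaled by the weight of the reference feature at that position."""
    if len(ref) != len(features):
        raise ValueError("need exactly one feature label per reference base")
    w = [FEATURE_WEIGHT[f] for f in features]
    n, m = len(ref), len(query)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    move = [[""] * (m + 1) for _ in range(n + 1)]  # traceback pointers
    for i in range(1, n + 1):
        score[i][0], move[i][0] = score[i - 1][0] + GAP * w[i - 1], "up"
    for j in range(1, m + 1):
        score[0][j], move[0][j] = score[0][j - 1] + GAP * _ins_weight(w, 0), "left"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = MATCH if ref[i - 1] == query[j - 1] else MISMATCH
            options = [
                (score[i - 1][j - 1] + sub * w[i - 1], "diag"),
                (score[i - 1][j] + GAP * w[i - 1], "up"),  # ref base deleted
                (score[i][j - 1] + GAP * _ins_weight(w, i), "left"),  # query base inserted
            ]
            # max() returns the first maximal option, so ties prefer "diag"
            score[i][j], move[i][j] = max(options, key=lambda opt: opt[0])
    # Trace back from the bottom-right cell to rebuild the alignment
    ref_aln: List[str] = []
    qry_aln: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        step = move[i][j]
        if step == "diag":
            ref_aln.append(ref[i - 1])
            qry_aln.append(query[j - 1])
            i, j = i - 1, j - 1
        elif step == "up":
            ref_aln.append(ref[i - 1])
            qry_aln.append("-")
            i -= 1
        else:
            ref_aln.append("-")
            qry_aln.append(query[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(ref_aln)), "".join(reversed(qry_aln))
```

For example, `weighted_global_align("AAAA", ["CDS", "CDS", "intron", "intron"], "AAA")` must delete one reference base. Because a deletion is cheaper in the intron, it returns a score of 5.0 with the query aligned as `AA-A`, instead of opening the gap inside the coding sequence. A production aligner would typically also use affine gap penalties, which this sketch omits.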

Author(s):  
Sabyasachi Mukherjee ◽  
Zexi Cai ◽  
Anupama Mukherjee ◽  
Imsusosang Longkumer ◽  
Moonmoon Mech ◽  
...  

2017 ◽  
Vol 5 (46) ◽  
Author(s):  
Pushpa Lata ◽  
Subramaniam S. Govindarajan ◽  
Feng Qi ◽  
Jian-Liang Li ◽  
Santosh K. Maurya ◽  
...  

ABSTRACT Pantoea americana strain VS1, an extended-spectrum β-lactamase-producing epibiont, was isolated from Magnolia grandiflora in central Florida, USA. Here, we report the de novo whole-genome sequence of this strain, which consists of 191 contigs totaling 5,412,831 bp, has a GC content of 57.3%, and contains 4,836 predicted coding sequences.
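
Contig count, total length, and GC content, as reported in these genome announcements, are simple summaries of the assembled sequences. Below is a minimal sketch of how such figures could be computed from a multi-FASTA assembly. The function names are hypothetical, and the abstracts do not say whether ambiguous bases (N) were excluded from the GC denominator, so excluding them here is an assumption.

```python
from typing import Dict, Iterable, List


def read_fasta(lines: Iterable[str]) -> List[str]:
    """Return the sequences of a multi-FASTA file (headers are dropped)."""
    records: List[List[str]] = []
    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            records.append([])  # a header opens a new contig
        elif line and records:
            records[-1].append(line)  # sequence lines belong to the last header
    return ["".join(parts) for parts in records]


def assembly_stats(contigs: List[str]) -> Dict[str, float]:
    """Contig count, total span in bp, and GC % over unambiguous bases only."""
    total_bp = sum(len(c) for c in contigs)
    counts = {base: 0 for base in "ACGT"}
    for contig in contigs:
        upper = contig.upper()
        for base in counts:
            counts[base] += upper.count(base)
    acgt = sum(counts.values())  # N and other IUPAC codes are excluded
    gc = counts["G"] + counts["C"]
    return {
        "contigs": len(contigs),
        "total_bp": total_bp,
        "gc_percent": 100.0 * gc / acgt if acgt else 0.0,
    }
```

On a real file this would be called as `assembly_stats(read_fasta(open("assembly.fasta")))`, where the file name is a placeholder. Large assemblies are usually summarised with dedicated tools, but the arithmetic is the same.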


F1000Research ◽  
2020 ◽  
Vol 9 ◽  
pp. 751
Author(s):  
Kathleen O'Neill ◽  
Stacy Pirro

The Sweetleaf (Stevia rebaudiana: Asteraceae) is widely grown for use as a sweetener. We present the whole-genome sequence and annotation of this species. A total of 146,838,888 paired-end reads comprising 22.2 Gb were obtained by sequencing one leaf from a commercially grown seedling. The reads were assembled de novo, followed by alignment to related species. Annotation was performed with GeneMark-ES. The raw and assembled data are publicly available via GenBank: Sequence Read Archive (SRR6792730) and Assembly (GCA_009936405).
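
The two yield figures can be cross-checked with simple arithmetic. Dividing the reported bases by the reported read count gives about 151 bases per read. If the 146,838,888 figure counts read pairs rather than individual reads, the same number is instead the average per pair, which the abstract does not clarify. Expected depth also needs a genome size, and the abstract does not give one, so `genome_size_bp` is left as an input. The function names are illustrative.

```python
def mean_bases_per_read(total_bases: float, reads: int) -> float:
    """Average number of sequenced bases per read (or per pair, if pairs are counted)."""
    return total_bases / reads


def expected_depth(total_bases: float, genome_size_bp: float) -> float:
    """Lander-Waterman style mean depth: total sequenced bases / genome size."""
    return total_bases / genome_size_bp


READS = 146_838_888   # paired-end reads, as reported
TOTAL_BASES = 22.2e9  # "22.2G bases", read here as 22.2 x 10^9 bases

print(round(mean_bases_per_read(TOTAL_BASES, READS), 1))  # 151.2
```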


2021 ◽  
Author(s):  
Tofazzal Islam ◽  
Nadia Afroz ◽  
ChuShin Koh ◽  
M. Nazmul Haque ◽  
Md. Jillur Rahman ◽  
...  

Abstract Background: Jackfruit (Artocarpus heterophyllus Lam.) is a tropical and subtropical fruit tree distributed across Asia, Africa, and South America. It is the national fruit of Bangladesh and normally bears fruit only in summer. However, BARI Kanthal-3, a year-round variety developed by the Bangladesh Agricultural Research Institute (BARI), provides fruit from September to June. This study aimed to evaluate the agronomic performance of BARI Kanthal-3 and to generate a draft whole-genome sequence to gain molecular insights into this important and unique variety.

Results: The number of fruits, average fruit weight, fruit yield per plant, edible portion of the fruit, and β-carotene content of BARI Kanthal-3 (n = 5) were 422/plant/year, 5.60 kg, 236.32 kg/year, 53.5%, and 3614 mg/100 g, respectively. De novo assembly scaffolded 817.7 Mb of the BARI Kanthal-3 genome, whereas reference-guided assembly scaffolded almost 843 Mb. In the BUSCO assessment, 97.2% of the core genes were represented in the assembly, while 1.3% were fragmented and 1.5% were missing. A comparison of the single-copy orthologues (SCOs) of BARI Kanthal-3 with those of three closely related species and one distantly related species identified 706 SCOs shared across all five genomes. Phylogenetic analysis of the shared SCOs showed that A. heterophyllus is the species closest to BARI Kanthal-3. The estimated genome size of BARI Kanthal-3 was 1.04 gigabase pairs (Gbp), with a heterozygosity rate of 1.62% and an estimated GC content of 34.10%. Variant analysis revealed 5.7 M (35%) simple and 10.4 M (65%) heterozygous single nucleotide polymorphisms (SNPs) in BARI Kanthal-3, and about 90% of these polymorphisms are located in intergenic regions.

Conclusion: The whole-genome sequence of A. heterophyllus cv. BARI Kanthal-3 reveals an extremely high number of single nucleotide polymorphisms in intergenic regions. These findings will help improve understanding of the evolution, domestication, phylogenetic relationships, and year-round fruiting of this highly nutritious fruit crop, and will support marker development for its molecular breeding.
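
The percentages in the Results follow directly from the reported counts. This short consistency check is an illustration only. It recomputes the SNP class shares from the 5.7 M and 10.4 M counts and confirms that the three BUSCO categories add up to 100%. The helper name `percent_shares` is hypothetical.

```python
import math
from typing import Dict


def percent_shares(counts: Dict[str, float]) -> Dict[str, float]:
    """Express each category as a percentage of the total."""
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}


# SNP counts as reported in the abstract
snp_shares = percent_shares({"simple": 5.7e6, "heterozygous": 10.4e6})
print({k: round(v) for k, v in snp_shares.items()})  # {'simple': 35, 'heterozygous': 65}

# BUSCO categories (percent of core genes) should cover all core genes
busco = {"represented": 97.2, "fragmented": 1.3, "missing": 1.5}
print(math.isclose(sum(busco.values()), 100.0))  # True
```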


BMC Genomics ◽  
2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Sabyasachi Mukherjee ◽  
Zexi Cai ◽  
Anupama Mukherjee ◽  
Imsusosang Longkumer ◽  
Moonmoon Mech ◽  
...  

2017 ◽  
Vol 5 (28) ◽  
Author(s):  
Pushpa Lata ◽  
Subramaniam S. Govindarajan ◽  
Feng Qi ◽  
Jian-Liang Li ◽  
Santosh K. Maurya ◽  
...  

ABSTRACT Pantoea latae strain AS1 was isolated from the rhizosphere of a cycad, Zamia floridana, in central Florida, USA. Here, we report the de novo whole-genome sequence of this strain, which consists of 83 contigs totaling 4,960,415 bp, has a G+C content of 59.6%, and contains 4,527 predicted coding sequences.


2018 ◽  
Vol 6 (11) ◽  
Author(s):  
Khawla Seddiki ◽  
François Godart ◽  
Riccardo Aiese Cigliano ◽  
Walter Sanseverino ◽  
Mohamed Barakat ◽  
...  

ABSTRACT Thraustochytrids are ecologically and biotechnologically relevant marine species. We report here the de novo assembly and annotation of the whole-genome sequence of a new thraustochytrid strain, CCAP_4062/3. The genome size was estimated at 38.7 Mb, with 11,853 predicted coding sequences and a GC content of 57%.


2020 ◽  
Vol 98 (Supplement_4) ◽  
pp. 342-342
Author(s):  
Younes Miar ◽  
Graham Plastow ◽  
Zhiquan Wang ◽  
Mehdi Sargolzaei

Abstract The fur industry is one of the oldest and most historically significant industries in Canada. For decades, the industry has relied on American mink (Neovison vison) as its major source of fur because of the animals' high-quality fur and wide range of colours. This project will seek to (1) create the first accurate whole-genome sequence assembly of mink using next-generation sequencing technology, to help understand the biology and evolution of the order Carnivora; (2) design a robust and informative SNP assay for genomic discovery in mink; (3) discover genome structure and signatures of selection, and identify new genetic variants that explain variation in economically important traits; and (4) identify the genetic relationships among these traits, including feed efficiency, Aleutian disease resilience, fur quality, reproductive performance, growth rate, and pelt size. One hundred mink DNA samples from the Canadian Centre for Fur Animal Research at Dalhousie Agriculture Campus (Truro, Nova Scotia) and from one breeding population (Millbank Fur Farm Limited, Rockwood, Ontario) were sequenced by next-generation whole-genome sequencing at more than 30x coverage to create the first SNP assay for American mink. A DNA panel of these sequenced mink, representing five color types, was assembled to identify the most homozygous individual to serve as the reference animal for the whole-genome sequence assembly. Phenotypic data and DNA samples from 3,323 animals were collected and will be genotyped using the customized assay. The ultimate objective is to develop new tools for implementing marker-assisted selection or genomic selection in mink breeding programs, to produce superior, highly efficient, and healthy animals. This approach will help improve the overall performance of the North American mink industry, which is currently struggling because of several economic factors, such as high feed prices, declining fur prices, and the prevalence of disease.
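
The abstract does not say how "most homozygous" was measured. One simple proxy, assumed here, is the fraction of called SNP genotypes that are homozygous. Genotypes are coded as alternate-allele dosage (0, 1, 2) with `None` for a missing call. Because an animal with many missing calls can look artificially homozygous, the sketch also supports an optional call-rate filter. The animal IDs, the coding scheme, and the threshold are hypothetical.

```python
from typing import Dict, List, Optional

Genotype = Optional[int]  # 0 = hom ref, 1 = het, 2 = hom alt, None = missing


def call_rate(calls: List[Genotype]) -> float:
    """Fraction of sites with a non-missing genotype call."""
    return sum(g is not None for g in calls) / len(calls) if calls else 0.0


def homozygosity(calls: List[Genotype]) -> float:
    """Fraction of called sites that are homozygous (0 or 2)."""
    called = [g for g in calls if g is not None]
    if not called:
        return 0.0
    return sum(1 for g in called if g != 1) / len(called)


def most_homozygous(panel: Dict[str, List[Genotype]], min_call_rate: float = 0.0) -> str:
    """Pick the animal with the highest homozygosity among those passing the call-rate filter."""
    eligible = {a: c for a, c in panel.items() if call_rate(c) >= min_call_rate}
    if not eligible:
        raise ValueError("no animal passes the call-rate filter")
    return max(eligible, key=lambda animal: homozygosity(eligible[animal]))
```

With the toy panel `{"mink_A": [0, 1, 2, 1], "mink_B": [0, 0, 2, 1], "mink_C": [2, None, 0, 0]}`, `mink_C` wins on homozygosity alone. Requiring a call rate of 0.9 excludes it and selects `mink_B`.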


2019 ◽  
Author(s):  
Raúl A. González-Pech ◽  
Yibi Chen ◽  
Timothy G. Stephens ◽  
Sarah Shah ◽  
Amin R. Mohamed ◽  
...  

Abstract Dinoflagellates of the family Symbiodiniaceae (order Suessiales) are predominantly symbiotic, and many are known for their association with corals. The genetic and functional diversity among Symbiodiniaceae is well acknowledged, but the genome-wide sequence divergence among these lineages remains poorly understood. Here, we present de novo genome assemblies of five isolates from the basal genus Symbiodinium, encompassing distinct ecological niches. Incorporating existing data from Symbiodiniaceae and other Suessiales (15 genome datasets in total), we investigated genome features that are common or unique to these Symbiodiniaceae, to the genus Symbiodinium, and to the individual species S. microadriaticum and S. tridacnidorum. Our whole-genome comparisons reveal extensive sequence divergence, with no sequence regions common to all 15 genomes. Based on the similarity of k-mers from whole-genome sequences, the distances among Symbiodinium isolates are similar to those between isolates of distinct genera. We observed extensive structural rearrangements among symbiodiniacean genomes; the genomes of two distinct Symbiodinium species share the most syntenic gene blocks (853). Functions enriched in genes core to Symbiodiniaceae are also enriched in those core to Symbiodinium. Gene functions related to symbiosis and stress response show similar relative abundance in all analysed genomes. Our results suggest that structural rearrangements contribute to genome sequence divergence in Symbiodiniaceae, even within the same species, but that gene functions have remained largely conserved in Suessiales. This is the first comprehensive comparison of Symbiodiniaceae based on whole-genome sequence data, including comparisons at the intra-genus and intra-species levels.
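
The abstract does not name the k-mer method behind its genome distances. One common approach, assumed here, compares the sets of canonical k-mers of two genomes by their Jaccard index and converts it into a Mash-style distance, -ln(2J / (1 + J)) / k. The sketch below uses exact k-mer sets, which only suits small inputs. Tools built for whole genomes estimate J from MinHash sketches instead. The function names, the default `k = 21`, and the choice to return 1.0 when no k-mers are shared are assumptions of this sketch.

```python
import math
from typing import Set

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq: str, k: int) -> Set[str]:
    """Canonical k-mers: the lexicographic minimum of each k-mer and its
    reverse complement. K-mers containing non-ACGT characters are skipped."""
    seq = seq.upper()
    out: Set[str] = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            rc = kmer.translate(_COMPLEMENT)[::-1]
            out.add(min(kmer, rc))
    return out


def jaccard(a: str, b: str, k: int = 21) -> float:
    """Jaccard index of the two canonical k-mer sets."""
    ka, kb = canonical_kmers(a, k), canonical_kmers(b, k)
    union = ka | kb
    return len(ka & kb) / len(union) if union else 1.0


def mash_distance(a: str, b: str, k: int = 21) -> float:
    """Mash-style distance -ln(2J / (1 + J)) / k; this sketch returns 1.0
    when the sequences share no k-mers."""
    j = jaccard(a, b, k)
    if j == 0:
        return 1.0
    return -math.log(2 * j / (1 + j)) / k
```

Because k-mers are canonicalised, a sequence and its reverse complement come out identical. For example, `jaccard("AAAA", "TTTT", 2)` is 1.0.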

