genome sequence assembly
Recently Published Documents



2021 ◽  
Vol 182 (2) ◽  
pp. 63-71
Author(s):  
M. M. Agakhanov ◽  
E. A. Grigoreva ◽  
E. K. Potokina ◽  
P. S. Ulianich ◽  
Y. V. Ukhatova

The immune North American grapevine species Vitis rotundifolia Michaux (subgen. Muscadinia Planch.) is regarded as a potential donor of disease resistance genes, as it withstands such dangerous grape diseases as powdery and downy mildews. The cultivar ‘Dixie’ is the only representative of this species preserved ex situ in Russia: it is maintained by the N.I. Vavilov All-Russian Institute of Plant Genetic Resources (VIR) in the orchards of its branch, Krymsk Experiment Breeding Station. Third-generation sequencing on the MinION platform was performed to obtain information on the primary structure of the cultivar’s genomic DNA, also employing the results of Illumina sequencing available in databases. A detailed description of the technique, with modifications at various stages, is presented as it was used for grapevine genome sequencing and whole-genome sequence assembly. The modified technique included the main stages of the original protocol recommended by the MinION producer: 1) DNA extraction; 2) preparation of libraries for sequencing; 3) MinION sequencing and bioinformatic data processing; 4) de novo whole-genome sequence assembly using only MinION data or hybrid assembly (MinION+Illumina data); and 5) functional annotation of the whole-genome assembly. Stage 4 included not only de novo sequencing, but also the analysis of the available bioinformatic data, thus minimizing errors and increasing precision during the assembly of the studied genome. The DNA isolated from the leaves of cv. ‘Dixie’ was sequenced using two MinION flow cells (R9.4.1).


2021 ◽  
Author(s):  
Kenta Shirasawa ◽  
Nobuo Kobayashi ◽  
Akira Nakatsuka ◽  
Hideya Ohta ◽  
Sachiko Isobe

To enhance the genomics and genetics of azalea, the whole-genome sequences of two species of Rhododendron were determined and analyzed in this study: Rhododendron ripense, the cytoplasmic donor and ancestral species of large-flowered and evergreen azalea cultivars, respectively; and Rhododendron kiyosumense, a native of Chiba prefecture (Japan) that is seldom bred or cultivated. A chromosome-level genome sequence assembly of R. ripense was constructed by single-molecule real-time (SMRT) sequencing and genetic mapping, while the genome sequence of R. kiyosumense was assembled using the single-tube long fragment read (stLFR) sequencing technology. The R. ripense genome assembly contained 319 contigs (506.7 Mb; N50 length: 2.5 Mb) and was assigned to the genetic map to establish 13 pseudomolecule sequences. On the other hand, the genome of R. kiyosumense was assembled into 32,308 contigs (601.9 Mb; N50 length: 245.7 kb). A total of 34,606 genes were predicted in the R. ripense genome, while 35,785 flower and 48,041 leaf transcript isoforms were identified in R. kiyosumense through Iso-Seq analysis. Overall, the genome sequence information generated in this study enhances our understanding of genome evolution in the Ericales and reveals the phylogenetic relationship of closely related species. This information will also facilitate the development of phenotypically attractive azalea cultivars.
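The N50 lengths quoted in this abstract summarize assembly contiguity: N50 is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembled bases. A minimal Python sketch of that calculation follows; the function name `n50` and the toy contig lengths are illustrative assumptions, not data or code from the study.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (in bases).

    Contigs are visited from longest to shortest; the N50 is the length
    of the contig at which the running total first reaches half of the
    total assembly size.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # Compare doubled running sum to avoid fractional halves.
        if running * 2 >= total:
            return length


# Toy example with made-up lengths (not from the Rhododendron assemblies).
toy_lengths = [10, 20, 30, 40]
print(n50(toy_lengths))
```

In practice the lengths would be read from the assembly FASTA file; the abstract reports only the summary values, so the individual contig lengths are not reproduced here.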


Author(s):  
Kazuaki Yamaguchi ◽  
Mitsutaka Kadota ◽  
Osamu Nishimura ◽  
Yuta Ohishi ◽  
Yuki Naito ◽  
...  

The recent development of ecological studies has been fueled by the introduction of massive information based on chromosome-scale genome sequences, even for species for which genetic linkage information is not available. This was enabled mainly by the application of Hi-C, a method for genome-wide chromosome conformation capture that was originally developed for investigating long-range chromatin interactions. Performing genomic scaffolding using Hi-C data is highly resource-demanding and employs elaborate laboratory steps for sample preparation. It starts with building a primary genome sequence assembly as an input, which is followed by computation for genome scaffolding using Hi-C data, requiring careful validation. This article presents technical considerations for obtaining optimal Hi-C scaffolding results and provides a test case of its application to a reptile species, the Madagascar ground gecko (Paroedura picta). Among the metrics that are frequently used for evaluating scaffolding results, we investigate the validity of the completeness assessment of chromosome-scale genome assemblies using single-copy reference orthologs, and report problems with the widely used program pipeline BUSCO.


2021 ◽  
Vol 10 (17) ◽  
Author(s):  
Thidathip Wongsurawat ◽  
Nuntaya Punyadee ◽  
Piroon Jenjaroenpun ◽  
Dumrong Mairiang ◽  
Nattaya Tangthawornchaikul ◽  
...  

We present RNA sequencing data sets and the corresponding genome sequence assemblies for a dengue virus that was isolated from a patient with dengue hemorrhagic fever and serially propagated in Vero cells. RNA sequencing data obtained from the first, third, and fifth passages and their corresponding whole-genome sequences are provided in this work.


Genes ◽  
2021 ◽  
Vol 12 (4) ◽  
pp. 518
Author(s):  
Zequn Chen ◽  
Xiwu Qi ◽  
Xu Yu ◽  
Ying Zheng ◽  
Zhiqi Liu ◽  
...  

Terpenoids are a diverse class of natural products, and terpene synthases (TPSs) play a key role in their biosynthesis. Mentha plants are rich in essential oils, whose main components are terpenoids, and their biosynthetic pathways have been largely elucidated. However, there is a lack of systematic identification and study of TPSs in Mentha plants. In this work, we performed a genome-wide identification and analysis of the TPS gene family in Mentha longifolia, a model plant for functional genomic research in the genus Mentha. A total of 63 TPS genes were identified in the M. longifolia genome sequence assembly, which could be divided into six subfamilies. The TPS-b subfamily had the largest number of genes, which might be related to the abundant monoterpenoids in Mentha plants. The TPS-e subfamily had 18 members and showed a significant species-specific expansion compared with other sequenced Lamiaceae plant species. The 63 TPS genes could be mapped to nine scaffolds of the M. longifolia genome sequence assembly, and their distribution across these scaffolds is uneven. Tandem duplicates and fragment duplicates contributed greatly to the increase in the number of TPS genes in M. longifolia. The conserved motifs (RR(X)8W, NSE/DTE, RXR, and DDXXD) were analyzed in M. longifolia TPSs, and significant differentiation was found between different subfamilies. Adaptive evolution analysis showed that M. longifolia TPSs were subjected to purifying selection after the species-specific expansion, and some amino acid residues under positive selection were identified. Furthermore, we also cloned and analyzed the catalytic activity of a single terpene synthase, MlongTPS29, which belongs to the TPS-b subfamily. MlongTPS29 could encode a limonene synthase and catalyze the biosynthesis of limonene, an important precursor of essential oils from the genus Mentha. This study provides useful information for the biosynthesis of terpenoids in the genus Mentha.
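As an illustration of how conserved motifs such as those named above can be located in a protein sequence, the following Python sketch scans for three of them with regular expressions, reading X as "any residue". It is not the authors' method: the function name `find_motifs` and the toy sequence are assumptions made for this example, and the NSE/DTE motif is left out because its consensus is not spelled out in the abstract.

```python
import re

# Motif names from the abstract mapped to regular expressions.
# "X" is read as any single residue; "(X)8" as exactly eight residues.
# Each pattern sits inside a lookahead so overlapping hits are reported.
MOTIF_PATTERNS = {
    "RR(X)8W": re.compile(r"(?=RR.{8}W)"),
    "RXR": re.compile(r"(?=R.R)"),
    "DDXXD": re.compile(r"(?=DD..D)"),
}


def find_motifs(protein_sequence):
    """Return {motif name: [0-based start positions]} for one sequence."""
    sequence = protein_sequence.upper()
    return {
        name: [match.start() for match in pattern.finditer(sequence)]
        for name, pattern in MOTIF_PATTERNS.items()
    }


# Made-up toy sequence, not a real Mentha longifolia TPS protein.
toy_sequence = "MRRAAAAAAAAWGGRLRGGDDIYDAA"
for motif, positions in find_motifs(toy_sequence).items():
    print(motif, positions)
```

Exact pattern matching like this is only a first pass; comparing motif conservation between subfamilies would normally rely on alignments of the full protein set.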


Author(s):  
Kazuaki Yamaguchi ◽  
Mitsutaka Kadota ◽  
Osamu Nishimura ◽  
Yuta Ohishi ◽  
Yuki Naito ◽  
...  

Recent development of ecological studies has been fueled by the introduction of massive information based on chromosome-scale genome sequences, even for species whose genetic linkage was previously not accessible. This was enabled mainly by the application of Hi-C, a method for genome-wide chromosome conformation capture that was originally developed for investigating long-range chromatin interactions. Genomic scaffolding using Hi-C data is highly resource-demanding: it involves elaborate laboratory steps for sequencing sample preparation, the building of a primary genome sequence assembly as an input, and computation for genome scaffolding using Hi-C data, followed by careful validation. This article summarizes existing solutions for these steps and provides a test case of its application to a reptile species, the Madagascar ground gecko (Paroedura picta). Among the metrics frequently used for evaluating scaffolding results, we investigate the validity of completeness assessment using single-copy reference orthologs and report problems with the widely used program pipeline BUSCO.


Gigabyte ◽  
2020 ◽  
Vol 2020 ◽  
pp. 1-13
Author(s):  
Shivraj Braich ◽  
Rebecca C. Baillie ◽  
German C. Spangenberg ◽  
Noel O. I. Cogan

Cannabis is a diploid species (2n = 20); the haploid genome sizes of the female and male plants, estimated using flow cytometry, are 818 and 843 Mb, respectively. Although the genome of Cannabis has been sequenced (from hemp, wild and high-THC strains), all assemblies have significant gaps. In addition, there are inconsistencies in the chromosome numbering, which limit their use. A new comprehensive draft genome sequence assembly (∼900 Mb) has been generated using long-read sequencing from the medicinal cannabis strain Cannbio-2, which produces a balanced ratio of cannabidiol and delta-9-tetrahydrocannabinol. The assembly was subsequently analysed for completeness by ordering the contigs into chromosome-scale pseudomolecules using a reference genome assembly approach, then annotated and compared to other existing reference genome assemblies. Based on nucleotides assembled and BUSCO evaluation, the Cannbio-2 genome sequence assembly was found to be the most complete genome sequence available for Cannabis sativa, with a comprehensive genome annotation. The new draft genome sequence is an advancement in Cannabis genomics, permitting pan-genome analysis, genomic selection and genome editing.




2020 ◽  
Vol 98 (Supplement_4) ◽  
pp. 342-342
Author(s):  
Younes Miar ◽  
Graham Plastow ◽  
Zhiquan Wang ◽  
Mehdi Sargolzaei

The fur industry is one of the oldest and the most historically significant industries in Canada. The industry has used American mink (Neovison vison) as the major source of fur for decades because of their high-quality fur and wide range of colours. This project will seek to (1) create the first accurate whole-genome sequence assembly of mink using next-generation sequencing technology to help understand the biology and evolution of the order Carnivora, (2) design a robust and informative SNP assay for genomics discovery in mink, (3) discover genome structure and signatures of selection as well as identify new genetic variants explaining variation in economically important traits, and (4) identify the genetic relationships among these traits, including feed efficiency, Aleutian disease resilience, fur quality, reproductive performance, growth rate and pelt size. One hundred mink DNA samples from the Canadian Centre for Fur Animal Research at Dalhousie Agriculture Campus (Truro, Nova Scotia) and one breeding population (Millbank Fur Farm Limited, Rockwood, Ontario) were sequenced using next-generation whole-genome sequencing with more than 30x coverage to create the first SNP assay for American mink. A DNA panel composed of these sequenced mink from five color-types was assembled to identify the most homozygous individual as the reference animal for whole-genome sequence assembly development. The phenotypic data and DNA samples from 3,323 animals were collected and will be genotyped using the customized assay. The ultimate objective is to develop new tools for implementation of marker-assisted selection or genomic selection in mink breeding programs for development of superior, highly efficient, and healthy animals. This approach will help improve the overall performance of the North American mink industry, which is now in difficulty due to several economic factors such as the high price of feed, declining price of fur and prevalence of diseases.


2020 ◽  
Vol 20 (1) ◽  
Author(s):  
Rim El Jeni ◽  
Kais Ghedira ◽  
Monia El Bour ◽  
Sonia Abdelhak ◽  
Alia Benkahla ◽  
...  

Background: Whole-genome sequencing using high-throughput technologies has revolutionized and sped up the scientific investigation of bacterial genetics, biochemistry, and molecular biology. Lactic acid bacteria (LAB) have been extensively used in fermentation and more recently as probiotics in food products that promote health. Genome sequencing and functional genomics investigations of LAB varieties provide rapid and important information about their diversity and their evolution, revealing a significant molecular basis. This study investigated the whole-genome sequence of the Enterococcus faecium strain (HG937697), isolated from the mucus of freshwater fish in Tunisian dams. Genomic DNA was extracted using the Quick-GDNA kit and sequenced using the Illumina HiSeq2500 system. Sequence quality assessment was performed using FastQC software. The complete genome annotation was carried out with the Rapid Annotation using Subsystem Technology (RAST) web server and then NCBI PGAAP. Results: The Enterococcus faecium R.A73 genome was assembled into 28 contigs consisting of 2,935,283 bp. The genome annotation revealed 2884 genes in total, including 2834 coding sequences and 50 RNAs, of which 3 are rRNAs (one 16S rRNA, one 23S rRNA and one 5S rRNA) and 47 are tRNAs. Twenty-two genes implicated in bacteriocin production were identified within the Enterococcus faecium R.A73 strain. Conclusion: The data obtained provide insights to further investigate an effective strategy for testing this Enterococcus faecium R.A73 strain in the industrial manufacturing process. Studying its metabolism with bioinformatics tools represents the future challenge and a contribution to improving the utilization of these multi-purpose bacteria in food.

