Coverage-versus-Length plots, a simple quality control step for de novo yeast genome sequence assemblies

Mapping Intimacies

10.1101/421347

2018

Author(s): Alexander P. Douglass, Caoimhe E. O’Brien, Benjamin Offei, Aisling Y. Coughlan, Raúl A. Ortiz-Merino, ...

Keyword(s): Genome Sequence, Genome Assembly, De Novo, Yeast Genome, Coverage Level, Microsoft Excel, Simple Method, A Genome, Low Coverage, Yeast Genomes

Abstract

Illumina sequencing has revolutionized yeast genomics, with prices for commercial draft genome sequencing now below $200. The popular SPAdes assembler makes it simple to generate a de novo genome assembly for any yeast species. However, whereas making genome assemblies has become routine, understanding what they contain is still challenging. Here, we show how graphing the length and coverage that SPAdes reports for each scaffold can be used to investigate the nature of an assembly and to diagnose possible problems. Scaffolds derived from mitochondrial DNA, ribosomal DNA, and yeast plasmids can be identified by their high coverage. Contaminating data, such as cross-contamination from other samples in a multiplex sequencing run, can be identified by its low coverage. Scaffolds derived from the bacteriophage PhiX174 and Lambda DNAs that are frequently used as molecular standards in Illumina protocols can also be detected. Assemblies of highly heterozygous yeast genomes, such as interspecies hybrids, often contain two types of scaffold: regions where the two alleles assembled into two separate scaffolds, each with coverage level C, and regions where the two alleles co-assembled (collapsed) into a single scaffold with coverage level 2C. Visualizing the data with Coverage-versus-Length (CVL) plots, which can be done using Microsoft Excel or Google Sheets, provides a simple way to understand the structure of a genome assembly and to detect aberrant scaffolds or contigs. We provide a Python script that allows assemblies to be filtered to remove contaminants identified in CVL plots.

100-word article summary

We describe a simple new method, Coverage-versus-Length plots, for examining de novo genome sequence assemblies.
These plots enable researchers to detect scaffolds with unusually high or unusually low coverage, revealing contaminants as well as scaffolds that come from atypical parts of the organism’s DNA complement. We show that contaminants are common in yeast genomes sequenced in multiplex Illumina runs. We provide instructions for making plots using Microsoft Excel or Google Sheets, and software for filtering assemblies to remove contaminants. Contaminants can be detected and removed even without knowing their source.
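CVL plots rely on the length and coverage values that SPAdes writes into every contig and scaffold name, in headers of the form `NODE_<n>_length_<bp>_cov_<coverage>` (the coverage SPAdes reports is k-mer coverage). The sketch below is not the authors' script; it is a minimal illustration, assuming a SPAdes-style FASTA, of how those values could be exported as a CSV for charting in Excel or Google Sheets and how low-coverage scaffolds could then be dropped. All function names and the example threshold are hypothetical; in practice the cut-off would be read off the CVL plot for each assembly.

```python
import csv
import io
import re

# SPAdes names each contig/scaffold NODE_<n>_length_<bp>_cov_<k-mer coverage>;
# any trailing suffix after the coverage value is ignored by this pattern.
SPADES_HEADER = re.compile(r"^NODE_\d+_length_(\d+)_cov_([0-9.]+)")


def parse_spades_fasta(text):
    """Yield (header, length, coverage, sequence) for each record in SPAdes FASTA text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield _make_record(header, chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield _make_record(header, chunks)


def _make_record(header, chunks):
    """Pull length and coverage out of a SPAdes-style header."""
    match = SPADES_HEADER.match(header)
    if match is None:
        raise ValueError(f"not a SPAdes-style header: {header!r}")
    return header, int(match.group(1)), float(match.group(2)), "".join(chunks)


def cvl_table(records):
    """Return CSV text (scaffold, length, coverage) for plotting in a spreadsheet."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["scaffold", "length", "coverage"])
    for header, length, coverage, _ in records:
        writer.writerow([header, length, coverage])
    return out.getvalue()


def filter_by_coverage(records, min_cov, max_cov=float("inf"), min_len=0):
    """Keep records with min_cov <= coverage <= max_cov and length >= min_len.

    The cut-offs are hypothetical inputs: choose them by inspecting the CVL plot.
    """
    return [r for r in records if min_cov <= r[2] <= max_cov and r[1] >= min_len]


def to_fasta(records, width=60):
    """Write records back out as FASTA, wrapping sequences at `width` characters."""
    lines = []
    for header, _, _, seq in records:
        lines.append(">" + header)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


# Toy assembly: one typical scaffold, one low-coverage (contaminant-like)
# scaffold, and one very high-coverage (e.g. mitochondrial-like) scaffold.
EXAMPLE = """>NODE_1_length_12_cov_40.5
ACGTACGTACGT
>NODE_2_length_8_cov_2.1
ACGTACGT
>NODE_3_length_6_cov_950.0
ACGTAC
"""

records = list(parse_spades_fasta(EXAMPLE))
print(cvl_table(records))
# Drop only the low-coverage scaffold; high-coverage scaffolds may be genuine
# mitochondrial, rDNA, or plasmid sequence and are kept here.
kept = filter_by_coverage(records, min_cov=10)
print(to_fasta(kept))
```

Only a lower bound is applied in the usage example because, as described above, high coverage usually marks real organellar or repetitive DNA rather than contamination; an upper bound can be passed through `max_cov` if a plot shows, for example, PhiX174 scaffolds to be removed.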

Genomic Characterization Provides an Insight into the Pathogenicity of the Poplar Canker Bacterium Lonsdalea populi

Genes

10.3390/genes12020246

2021

Vol 12 (2)

pp. 246

Author(s): Xiaomeng Chen, Rui Li, Yonglin Wang, Aining Li

Keyword(s): Genome Sequence, Extracellular Enzymes, De Novo, Whole Genome Sequence, Hybrid Poplars, A Genome, Conserved Genes, Genomic Characterization, Molecular Bases, Insight Into

An emerging poplar canker caused by the gram-negative bacterium Lonsdalea populi has led to high mortality of hybrid poplars (Populus × euramericana) in China and Europe. The molecular bases of the pathogenicity and bark adaptation of L. populi have become a focus of recent research. This study determined the whole-genome sequence of L. populi and identified its putative virulence factors. A high-quality L. populi genome sequence was assembled de novo, with a genome size of 3,859,707 bp, containing approximately 3434 genes and 107 RNAs (75 tRNA, 22 rRNA, and 10 ncRNA). The L. populi genome contained 380 virulence-associated genes, mainly related to adhesion, extracellular enzymes, secretion systems, and two-component signal transduction systems. The genome had 110 carbohydrate-active enzyme (CAZy)-coding genes and putative secreted proteins. Annotation against an antibiotic-resistance database predicted that L. populi is resistant to penicillin, fluoroquinolone, and kasugamycin. Comparative genomic analysis found that L. populi shares the highest homology with the L. britannica genome, and that L. populi encompassed 1905 specific genes, 1769 dispensable genes, and 1381 conserved genes, suggesting high evolutionary diversity and genomic plasticity. Moreover, the pan-genome analysis revealed that the N-5-1 genome is an open genome. These findings provide important resources for understanding the molecular basis of the pathogenicity and biology of L. populi and of the poplar-bacterium interaction.
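The specific/dispensable/conserved split above follows the usual pan-genome convention, which the sketch below assumes: a gene family present in every compared genome is core (conserved), one present in some but not all genomes is dispensable, and one present in a single genome only is specific. This is an illustration under that assumption, not the study's pipeline; clustering genes into families is taken as already done, and the genome and family names in the example are made up.

```python
from collections import Counter


def partition_pan_genome(genomes):
    """Split gene families into core, dispensable, and genome-specific sets.

    `genomes` maps a genome name to the set of gene-family IDs it contains.
    """
    n = len(genomes)
    if n < 2:
        raise ValueError("need at least two genomes to partition a pan-genome")
    # Number of genomes in which each gene family occurs (sets avoid double counting).
    counts = Counter(family for families in genomes.values() for family in families)
    core = {f for f, c in counts.items() if c == n}
    specific = {f for f, c in counts.items() if c == 1}
    dispensable = {f for f, c in counts.items() if 1 < c < n}
    return core, dispensable, specific


# Hypothetical presence/absence data for three genomes.
genomes = {
    "genome_A": {"fam1", "fam2", "fam3"},
    "genome_B": {"fam1", "fam2", "fam4"},
    "genome_C": {"fam1", "fam5"},
}
core, dispensable, specific = partition_pan_genome(genomes)
print(sorted(core), sorted(dispensable), sorted(specific))
```

The three categories are disjoint, so their sizes add up to the total number of gene families in the pan-genome.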

The genome sequence of the European peacock butterfly, Aglais io (Linnaeus, 1758)

Wellcome Open Research

10.12688/wellcomeopenres.17204.1

2021

Vol 6

pp. 258

Author(s): Konrad Lohse, Alexander Mackintosh, Roger Vila, ...

Keyword(s): Genome Sequence, Genome Assembly, Sex Chromosome, Gene Annotation, Protein Coding, Individual Male, Protein Coding Genes, A Genome, Inachis Io

We present a genome assembly from an individual male Aglais io (also known as Inachis io and Nymphalis io) (the European peacock; Arthropoda; Insecta; Lepidoptera; Nymphalidae). The genome sequence is 384 megabases in span. The majority (99.91%) of the assembly is scaffolded into 31 chromosomal pseudomolecules, with the Z sex chromosome assembled. Gene annotation of this assembly on Ensembl has identified 11,420 protein coding genes.

The genome sequence of the common malachite beetle, Malachius bipustulatus (Linnaeus, 1758)

Wellcome Open Research

10.12688/wellcomeopenres.17381.1

2021

Vol 6

pp. 322

Author(s): Liam Crowley, ...

Keyword(s): Genome Sequence, Genome Assembly, Sex Chromosome, Individual Female, A Genome, The Common

We present a genome assembly from an individual female Malachius bipustulatus (the common malachite beetle; Arthropoda; Insecta; Coleoptera; Melyridae). The genome sequence is 544 megabases in span. The majority (99.70%) of the assembly is scaffolded into 10 chromosomal pseudomolecules, with the X sex chromosome assembled.

Gene Annotation and Transcriptome Delineation on a De Novo Genome Assembly for the Reference Leishmania major Friedlin Strain

Genes

10.3390/genes12091359

2021

Vol 12 (9)

pp. 1359

Author(s): Esther Camacho, Sandra González-de la Fuente, Jose C. Solana, Alberto Rastrojo, Fernando Carrasco-Ramiro, ...

Keyword(s): Genome Sequence, Genome Assembly, Molecular Mechanisms, High Throughput Sequencing, Leishmania Major, De Novo, Gene Annotation, Leishmania Species, De Novo Genome Assembly, Sequencing Platforms

Leishmania major is the main causative agent of cutaneous leishmaniasis in humans. The Friedlin strain of this species (LmjF) was chosen when a multi-laboratory consortium set out to decipher the first genome sequence of a parasite of the genus Leishmania. The objective was attained in 2005, a milestone for Leishmania molecular biology studies around the world. Although the LmjF genome was sequenced using a shotgun strategy and classical Sanger sequencing, the results were excellent, and this genome assembly served as the reference for subsequent genome assemblies of other Leishmania species. Here, we present a new assembly of the genome of this strain (named LMJFC for clarity), generated by combining two high-throughput sequencing platforms: Illumina short-read sequencing and PacBio Single Molecule Real-Time (SMRT) sequencing, which provides long reads. Apart from resolving uncertain nucleotide positions, several genomic regions were reorganized and a more precise composition of tandemly repeated gene loci was attained. Additionally, the genome annotation was improved by adding 542 genes and by defining more accurate coding sequences for around two hundred genes, based on the transcriptome delimitation also carried out in this work. As a result, we provide gene models (including untranslated regions and introns) for 11,238 genes. Genomic information ultimately determines the biology of every organism; therefore, our understanding of molecular mechanisms depends on the availability of precise genome sequences and accurate gene annotations. In this regard, this work provides an improved genome sequence and updated transcriptome annotations for the reference L. major Friedlin strain.

The genome sequence of the European golden eagle, Aquila chrysaetos chrysaetos Linnaeus 1758

Wellcome Open Research

10.12688/wellcomeopenres.16631.1

2021

Vol 6

pp. 112

Author(s): Dan Mead, Rob Ogden, Anna Meredith, Gabriela Peniche, Michelle Smith, ...

Keyword(s): Genome Sequence, Sex Chromosomes, Genome Assembly, Golden Eagle, Aquila Chrysaetos, Individual Female, A Genome

We present a genome assembly from an individual female Aquila chrysaetos chrysaetos (the European golden eagle; Chordata; Aves; Accipitridae). The genome sequence is 1.23 gigabases in span. The majority of the assembly is scaffolded into 28 chromosomal pseudomolecules, including the W and Z sex chromosomes.

The genome sequence of the eastern grey squirrel, Sciurus carolinensis Gmelin, 1788

Wellcome Open Research

10.12688/wellcomeopenres.15721.1

2020

Vol 5

pp. 27

Author(s): Dan Mead, Kathryn Fingland, Rachel Cripps, Roberto Portela Miguez, Michelle Smith, ...

Keyword(s): Genome Sequence, Sex Chromosomes, Genome Assembly, Sciurus Carolinensis, Grey Squirrel, Individual Male, A Genome

We present a genome assembly from an individual male Sciurus carolinensis (the eastern grey squirrel; Vertebrata; Mammalia; Eutheria; Rodentia; Sciuridae). The genome sequence is 2.82 gigabases in span. The majority of the assembly (92.3%) is scaffolded into 21 chromosomal-level scaffolds, with both X and Y sex chromosomes assembled.

Genome assembly of the JD17 soybean provides a new reference genome for comparative genomics

10.1101/2021.11.23.469778

2021

Author(s): Xinxin Yi, Jing Liu, Shengcai Chen, Hao Wu, Min Liu, ...

Keyword(s): Nitrogen Fixation, Genome Assembly, Reference Genome, De Novo, Genomic Analysis, Comparative Genomic, High Quality, Genome Wide, A Genome, Cultivated Soybean

Cultivated soybean (Glycine max) is an important source of protein and oil. Many elite cultivars with different traits have been developed for different conditions. Each soybean strain has its own genetic diversity, and the availability of more high-quality soybean genomes can enhance comparative genomic analysis for identifying the genetic underpinnings of their unique traits. In this study, we constructed a high-quality de novo assembly of the elite soybean cultivar Jidou 17 (JD17) with chromosome-level contiguity and high accuracy. We annotated 52,840 gene models and reconstructed 74,054 high-quality full-length transcripts. We performed a genome-wide comparative analysis between the JD17 reference genome and three published soybean genomes (WM82, ZH13 and W05), which identified five large inversions and two large translocations specific to JD17, 20,984–46,912 PAVs spanning 13.1–46.9 Mb, and 5–53 large PAV clusters longer than 500 kb. Between JD17 and these three genomes, 1,695,741–3,664,629 SNPs and 446,689–800,489 indels were identified and annotated. Symbiotic nitrogen fixation (SNF) genes were identified, and the effects of these variants on them were further evaluated. The coding sequences of nine nitrogen fixation-related genes were found to be greatly affected. The high-quality genome assembly of JD17 can serve as a valuable reference for soybean functional genomics research.

De novo whole-genome assembly in interspecific hybrid table grape, ‘Shine Muscat’

10.1101/730762

2019

Cited By ~ 2

Author(s): Kenta Shirasawa, Akifumi Azuma, Fumiya Taniguchi, Toshiya Yamamoto, Akihiko Sato, ...

Keyword(s): Genome Sequence, Genome Assembly, De Novo, Sequence Data, Genome Structure, Table Grape, Sequencing Analysis, Entire Genome, Table Grapes, Eukaryotic Genes

Abstract

This study presents the first genome sequence of an interspecific grape hybrid, ‘Shine Muscat’ (Vitis labruscana × V. vinifera), an elite table grape cultivar bred in Japan. The complexity of the genome structure arising from the interspecific hybridization necessitated a sophisticated genome assembly pipeline for the short-read genome sequence data. The resulting genome assemblies consisted of two types of sequences: a haplotype-phased sequence of the highly heterozygous genomes and an unphased sequence representing a “haploid” genome. The unphased sequences spanned 490.1 Mb, 99.4% of the estimated genome size, in 8,696 scaffolds with an N50 length of 13.2 Mb. The phased sequences comprised 15,650 scaffolds spanning 1.0 Gb with an N50 of 4.2 Mb. The two sequences contained 94.7% and 96.3% of the core eukaryotic genes, respectively, indicating that the entire genome of ‘Shine Muscat’ was represented. Examination of genome structures revealed possible genome rearrangements between the genomes of ‘Shine Muscat’ and a V. vinifera line. Furthermore, full-length transcriptome sequencing analysis revealed 13,947 gene loci on the ‘Shine Muscat’ genome, from which 26,199 transcript isoforms were transcribed. These genome resources provide new insights that could inform cultivation and breeding strategies for producing more high-quality table grapes such as ‘Shine Muscat’.
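N50, quoted above for both the unphased (13.2 Mb) and phased (4.2 Mb) sequences, is the length L such that sequences of length L or longer together make up at least half of the total assembly length. A minimal Python sketch of that standard definition follows; it is not the authors' pipeline, and the scaffold lengths in the example are invented for illustration.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sort lengths from longest to shortest and accumulate them; the N50 is the
    length at which the running total first reaches half of the total length.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("lengths must be a non-empty collection")


# Invented scaffold lengths: total 300, so half is 150;
# 100 + 80 = 180 is the first running total >= 150, giving N50 = 80.
print(n50([100, 80, 60, 40, 20]))
```

Because N50 depends only on the length distribution, two assemblies of the same genome can share a total span yet differ greatly in N50, which is why the unphased and phased values above are reported separately.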

The genome sequence of the sycamore, Acronicta aceris (Linnaeus, 1758)

Wellcome Open Research

10.12688/wellcomeopenres.17354.1

2021

Vol 6

pp. 326

Author(s): Douglas Boyes, Liam Crowley, Peter W.H. Holland, ...

Keyword(s): Genome Sequence, Genome Assembly, Sex Chromosome, Individual Female, A Genome, Lepidoptera Noctuidae

We present a genome assembly from an individual female Acronicta aceris (the sycamore; Arthropoda; Insecta; Lepidoptera; Noctuidae). The genome sequence is 466 megabases in span. The complete assembly is scaffolded into 32 chromosomal pseudomolecules, with the W and Z sex chromosomes assembled.

Highly Contiguous Nanopore Genome Assembly of Chlamydomonas reinhardtii CC-1690

Microbiology Resource Announcements

10.1128/mra.00726-20

2020

Vol 9 (37)

Author(s): Samuel O’Donnell, Frederic Chaux, Gilles Fischer

Keyword(s): Chlamydomonas Reinhardtii, Genome Size, Genome Assembly, Reference Genome, De Novo, Content Type, Oxford Nanopore, A Genome, Reference Quality

Abstract

The current Chlamydomonas reinhardtii reference genome remains fragmented due to gaps stemming from large repetitive regions. To close the vast majority of these gaps, publicly available Oxford Nanopore Technologies data were used to create a new reference-quality de novo genome assembly containing only 21 contigs, 30/34 telomeric ends, and a genome size of 111 Mb.
