Genome of an iconic Australian bird: Chromosome-scale assembly and linkage map of the superb fairy-wren (Malurus cyaneus)

Mapping Intimacies ◽

10.1101/742965 ◽

2019 ◽

Author(s):

Joshua V. Peñalba ◽

Yuan Deng ◽

Qi Fang ◽

Leo Joseph ◽

Craig Moritz ◽

...

Keyword(s):

Linkage Map ◽

Genetic Map ◽

Genome Assembly ◽

Genetic Maps ◽

Comparative Genomic ◽

High Quality ◽

Long Reads ◽

Mate Pair ◽

Malurus Cyaneus ◽

Genome Assemblies

Abstract The superb fairy-wren, Malurus cyaneus, is one of the most iconic Australian passerine species. It belongs to an endemic Australasian clade, Meliphagides, which diversified early in the evolution of the oscine passerines. Today, the oscine passerines comprise almost half of all avian species diversity. Despite the rapid increase in available bird genome assemblies, this part of the avian tree has not yet been represented by a high-quality reference. To fill that gap, we present the first chromosome-scale genome assembly of a Meliphagides representative: the superb fairy-wren. We combined Illumina shotgun and mate-pair sequences, PacBio long reads, and a genetic linkage map from an intensively sampled pedigree of a wild population to generate this genome assembly. Of the final 1.07 Gb assembly, 894 Mb (84.8%) was anchored onto 25 chromosomes, giving a final scaffold N50 of 68.11 Mb. This is also one of only a handful of high-quality bird genome assemblies accompanied by a genetic map and recombination landscape. In comparison with other pedigree-based bird genetic maps, we find that the zebra finch (Taeniopygia) genetic map more closely resembles the fairy-wren map than it resembles the map of the more closely related Ficedula flycatcher. Lastly, we provide a predictive gene and repeat annotation of the genome assembly. This new high-quality, annotated genome assembly will be an invaluable resource not only for the superb fairy-wren and its relatives but also across the avian tree, providing a new reference point for comparative genomic analyses.
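A genetic map anchored to a chromosome-scale assembly is what makes a recombination landscape possible: for each marker the physical position (Mb) and the genetic position (cM) are both known, and the local recombination rate is the slope of one against the other (a Marey map). The sketch below computes that slope between adjacent markers. The marker coordinates are hypothetical, and published landscapes are often smoothed over windows or with splines rather than reported as raw adjacent-marker slopes.

```python
from typing import List, Tuple


def local_recombination_rates(markers: List[Tuple[float, float]]) -> List[float]:
    """Recombination rate (cM/Mb) between each pair of adjacent markers.

    markers: (physical position in Mb, genetic position in cM) for markers
    anchored on one chromosome; they are sorted by physical position here.
    """
    ordered = sorted(markers)
    rates = []
    for (mb1, cm1), (mb2, cm2) in zip(ordered, ordered[1:]):
        span_mb = mb2 - mb1
        if span_mb <= 0:
            continue  # co-located markers carry no slope information
        # Slope of the genetic-vs-physical (Marey) curve for this interval;
        # a negative value flags disagreement between map order and assembly order.
        rates.append((cm2 - cm1) / span_mb)
    return rates


# Hypothetical markers: (Mb, cM)
print(local_recombination_rates([(0.0, 0.0), (10.0, 5.0), (20.0, 6.0), (25.0, 16.0)]))
# [0.5, 0.1, 2.0]
```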


LongStitch: High-quality genome assembly correction and scaffolding using long reads

10.1101/2021.06.17.448848 ◽

2021 ◽

Author(s):

Lauren Coombe ◽

Janet X Li ◽

Theodora Lo ◽

Johnathan Wong ◽

Vladimir Nikolic ◽

...

Keyword(s):

Genome Assembly ◽

De Novo ◽

Draft Genome ◽

Model Organisms ◽

High Quality ◽

De Novo Genome Assembly ◽

Long Reads ◽

Long Read ◽

Genomic Regions ◽

Genome Assemblies

Background Generating high-quality de novo genome assemblies is foundational to the genomic study of model and non-model organisms. In recent years, long-read sequencing has greatly benefited genome assembly and scaffolding, a process by which assembled sequences are ordered and oriented through the use of long-range information. Long reads span repetitive genomic regions better than short reads do, and thus have tremendous utility for resolving problematic regions and helping generate more complete draft assemblies. Here, we present LongStitch, a scalable pipeline that corrects and scaffolds draft genome assemblies exclusively using long reads. Results LongStitch incorporates multiple tools developed by our group and runs in up to three stages, which include initial assembly correction (Tigmint-long), followed by two incremental scaffolding stages (ntLink and ARKS-long). Tigmint-long and ARKS-long are misassembly correction and scaffolding utilities, respectively, originally developed for linked reads and adapted here for long reads. We describe the LongStitch pipeline and introduce our new long-read scaffolder, ntLink, which utilizes lightweight minimizer mappings to join contigs. LongStitch was tested on short- and long-read assemblies of three different human individuals using corresponding nanopore long-read data, and improves the contiguity of each assembly from 2.0-fold up to 304.6-fold (as measured by NGA50 length). Furthermore, LongStitch generates more contiguous and correct assemblies than the state-of-the-art long-read scaffolder LRScaf in most tests, and consistently runs in under five hours using less than 23 GB of RAM. Conclusions Due to its effectiveness and efficiency in improving draft assemblies using long reads, we expect LongStitch to benefit a wide variety of de novo genome assembly projects. The LongStitch pipeline is freely available at https://github.com/bcgsc/longstitch.
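ntLink is described as joining contigs through lightweight minimizer mappings. As a rough illustration of what a minimizer is (not ntLink's implementation), the sketch below keeps the smallest k-mer in each window of `w` consecutive k-mers; a long read and a contig end that share minimizers are candidates for overlapping, which is the kind of signal a minimizer-based scaffolder can use to link contigs. Lexicographic ordering is an assumption made for readability; production tools usually rank hashed, canonical (strand-independent) k-mers instead.

```python
from typing import List, Set, Tuple


def minimizers(seq: str, k: int, w: int) -> List[Tuple[int, str]]:
    """(position, k-mer) of the smallest k-mer in every window of w consecutive k-mers.

    Ties are broken by taking the leftmost k-mer; consecutive windows that
    select the same k-mer occurrence are reported once.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    selected: List[Tuple[int, str]] = []
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        offset = min(range(w), key=lambda j: window[j])  # leftmost minimum
        hit = (start + offset, window[offset])
        if not selected or selected[-1] != hit:
            selected.append(hit)
    return selected


def shared_minimizers(a: str, b: str, k: int, w: int) -> Set[str]:
    """Minimizer k-mers common to two sequences, e.g. a long read and a contig end."""
    return {m for _, m in minimizers(a, k, w)} & {m for _, m in minimizers(b, k, w)}


print(minimizers("ACGTACGT", 3, 2))
# [(0, 'ACG'), (1, 'CGT'), (2, 'GTA'), (4, 'ACG')]
```

Shared minimizers are only the raw evidence; a scaffolder would still have to check the order, orientation and spacing of the hits before joining two contigs.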


Restriction site-associated DNA sequencing for SNP discovery and high-density genetic map construction in southern catfish (Silurus meridionalis)

Royal Society Open Science ◽

10.1098/rsos.172054 ◽

2018 ◽

Vol 5 (5) ◽

pp. 172054 ◽

Cited By ~ 5

Author(s):

Mimi Xie ◽

Yao Ming ◽

Feng Shao ◽

Jianbo Jian ◽

Yaoguang Zhang ◽

...

Keyword(s):

Linkage Map ◽

Genetic Map ◽

Restriction Site ◽

Genome Structure ◽

High Density ◽

Genetic Maps ◽

High Quality ◽

Silurus Meridionalis ◽

Map Construction ◽

High Density Linkage Map

Single-nucleotide polymorphism (SNP) markers and high-density genetic maps are important resources for marker-assisted selection, mapping of quantitative trait loci (QTLs) and genome structure analysis. Although linkage maps have been obtained for certain catfish species, high-density maps remain unavailable for the economically important southern catfish (Silurus meridionalis). Recently developed restriction site-associated DNA (RAD) markers have proved to be a promising tool for SNP detection and genetic map construction. The objective of the present study was to construct a high-density linkage map using SNPs generated by next-generation RAD sequencing in S. meridionalis for future genetic and genomic studies. An F1 population of 100 individuals was obtained by intraspecific crossing of two wild heterozygous individuals. In total, 77 634 putative high-quality bi-allelic SNPs between the parents were discovered by mapping the parents' paired-end RAD reads onto the reference contigs from both parents, of which 54.7% were transitions and 45.3% were transversions (transition/transversion ratio of 1.2). Finally, 26 714 high-quality RAD markers were grouped into 29 linkage groups using de novo clustering (Stacks). Among these markers, 4514 were linked to the female genetic map, 23 718 to the male map, and 6715 effective loci to the integrated map, which spans 5918.31 centimorgans (cM) with an average marker interval of 0.89 cM. High-resolution genetic maps are a useful tool for both marker-assisted breeding and various genome investigations in catfish, such as sequence assembly, gene localization, QTL detection and genome structure comparison. Hence, such a high-density linkage map will serve as a valuable resource for comparative genomics and fine-scale QTL mapping in catfish species.
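The transition/transversion split follows from base chemistry: a transition swaps a purine for a purine (A/G) or a pyrimidine for a pyrimidine (C/T), and every other single-base substitution is a transversion. A minimal sketch of the tally over hypothetical (ref, alt) allele pairs; fed the reported proportions (54.7% transitions, 45.3% transversions) it reproduces a ratio of about 1.2.

```python
from typing import Iterable, Tuple

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def is_transition(ref: str, alt: str) -> bool:
    """True for purine<->purine (A/G) or pyrimidine<->pyrimidine (C/T) changes."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or not {ref, alt} <= PURINES | PYRIMIDINES:
        raise ValueError(f"not a bi-allelic nucleotide SNP: {ref}>{alt}")
    return {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES


def ts_tv_ratio(snps: Iterable[Tuple[str, str]]) -> float:
    """Transition/transversion ratio for (ref, alt) allele pairs."""
    transitions = transversions = 0
    for ref, alt in snps:
        if is_transition(ref, alt):
            transitions += 1
        else:
            transversions += 1
    if transversions == 0:
        raise ValueError("no transversions; ratio undefined")
    return transitions / transversions


snps = [("A", "G")] * 547 + [("G", "T")] * 453  # hypothetical 54.7% / 45.3% split
print(round(ts_tv_ratio(snps), 1))  # 1.2
```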


LongStitch: high-quality genome assembly correction and scaffolding using long reads

BMC Bioinformatics ◽

10.1186/s12859-021-04451-7 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Lauren Coombe ◽

Janet X. Li ◽

Theodora Lo ◽

Johnathan Wong ◽

Vladimir Nikolic ◽

...

Keyword(s):

Genome Assembly ◽

De Novo ◽

Draft Genome ◽

Model Organisms ◽

High Quality ◽

De Novo Genome Assembly ◽

Long Reads ◽

Long Read ◽

Genomic Regions ◽

Genome Assemblies

Abstract Background Generating high-quality de novo genome assemblies is foundational to the genomic study of model and non-model organisms. In recent years, long-read sequencing has greatly benefited genome assembly and scaffolding, a process by which assembled sequences are ordered and oriented through the use of long-range information. Long reads span repetitive genomic regions better than short reads do, and thus have tremendous utility for resolving problematic regions and helping generate more complete draft assemblies. Here, we present LongStitch, a scalable pipeline that corrects and scaffolds draft genome assemblies exclusively using long reads. Results LongStitch incorporates multiple tools developed by our group and runs in up to three stages, which include initial assembly correction (Tigmint-long), followed by two incremental scaffolding stages (ntLink and ARKS-long). Tigmint-long and ARKS-long are misassembly correction and scaffolding utilities, respectively, originally developed for linked reads and adapted here for long reads. We describe the LongStitch pipeline and introduce our new long-read scaffolder, ntLink, which utilizes lightweight minimizer mappings to join contigs. LongStitch was tested on short- and long-read assemblies of Caenorhabditis elegans, Oryza sativa, and three different human individuals using corresponding nanopore long-read data, and improves the contiguity of each assembly from 1.2-fold up to 304.6-fold (as measured by NGA50 length). Furthermore, LongStitch generates more contiguous and correct assemblies than the state-of-the-art long-read scaffolder LRScaf in most tests, and consistently improves upon human assemblies in under five hours using less than 23 GB of RAM. Conclusions Due to its effectiveness and efficiency in improving draft assemblies using long reads, we expect LongStitch to benefit a wide variety of de novo genome assembly projects. The LongStitch pipeline is freely available at https://github.com/bcgsc/longstitch.


High-Density Genetic Map Construction and Identification of QTLs Controlling Leaf Abscission Trait in Poncirus trifoliata

International Journal of Molecular Sciences ◽

10.3390/ijms22115723 ◽

2021 ◽

Vol 22 (11) ◽

pp. 5723

Author(s):

Yuan-Yuan Xu ◽

Sheng-Rui Liu ◽

Zhi-Meng Gan ◽

Ren-Fang Zeng ◽

Jin-Zhi Zhang ◽

...

Keyword(s):

Qtl Mapping ◽

Linkage Map ◽

Genetic Map ◽

Expression Patterns ◽

High Density ◽

Poncirus Trifoliata ◽

Leaf Abscission ◽

Comparative Genomic ◽

Map Comparison ◽

Genomic Studies

A high-density genetic linkage map is essential for genetic and genomic studies including QTL mapping, genome assembly, and comparative genomic analysis. Here, we constructed a citrus high-density linkage map using SSR and SNP markers, which are evenly distributed across the citrus genome. The integrated linkage map contains 4163 markers with an average distance of 1.12 cM. The female and male linkage maps contain 1478 and 2976 markers with genetic lengths of 1093.90 cM and 1227.03 cM, respectively. Meanwhile, a genetic map comparison demonstrates that the linear order of common markers is highly conserved between the clementine mandarin and Poncirus trifoliata. Based on this high-density integrated citrus genetic map and two years of deciduous phenotypic data, association analysis detected two loci conferring leaf abscission phenotypic variation, on scaffold 1 (including 36 genes) and scaffold 8 (including 107 genes). Moreover, the expression patterns of 30 candidate genes were investigated under cold stress, because cold temperature is closely linked with the deciduous trait. The high-density genetic map developed here will facilitate QTL mapping and genomic studies, and localizing the leaf abscission trait will be valuable both for understanding the mechanism of this deciduous trait and for citrus breeding.
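Conservation of marker order between two maps can be quantified in several ways. One simple option, assumed here rather than taken from the study, is Spearman's rank correlation between the positions of the markers shared by the same linkage group in both maps: a value near 1 means the order is collinear, near -1 means the whole group is inverted. The marker names and positions below are hypothetical.

```python
from typing import Dict


def marker_order_correlation(map_a: Dict[str, float], map_b: Dict[str, float]) -> float:
    """Spearman's rho between the positions (cM) of markers shared by two maps.

    Assumes both maps describe the same linkage group and that no two shared
    markers have tied positions within a map (the shortcut formula below is
    exact only without ties).
    """
    common = sorted(set(map_a) & set(map_b))
    n = len(common)
    if n < 2:
        raise ValueError("need at least two shared markers")
    rank_a = {m: r for r, m in enumerate(sorted(common, key=lambda m: map_a[m]))}
    rank_b = {m: r for r, m in enumerate(sorted(common, key=lambda m: map_b[m]))}
    d_squared = sum((rank_a[m] - rank_b[m]) ** 2 for m in common)
    return 1 - 6 * d_squared / (n * (n * n - 1))


clementine = {"m1": 0.0, "m2": 5.0, "m3": 9.0, "m4": 20.0}  # hypothetical
poncirus = {"m1": 4.0, "m2": 2.0, "m3": 9.0, "m4": 20.0}    # m1/m2 swapped
print(round(marker_order_correlation(clementine, poncirus), 3))  # 0.8
```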


Heterogeneity in Rates of Recombination Across the Mouse Genome

Genetics ◽

10.1093/genetics/142.2.537 ◽

1996 ◽

Vol 142 (2) ◽

pp. 537-548 ◽

Cited By ~ 2

Author(s):

Michael W Nachman ◽

Gary A Churchill

Keyword(s):

Linkage Map ◽

Genetic Map ◽

Large Scale ◽

Physical Map ◽

Genetic Maps ◽

Recombination Rates ◽

Physical Maps ◽

Genomic Patterns ◽

Mary Lyon ◽

Microsatellite Linkage

Abstract If loci are randomly distributed on a physical map, the density of markers on a genetic map will be inversely proportional to recombination rate. This idea was first proposed by Mary Lyon, and we have used it to estimate recombination rates from the Drosophila melanogaster linkage map. These results were compared with results of two other studies that estimated regional recombination rates in D. melanogaster using both physical and genetic maps. The three methods were largely concordant in identifying large-scale genomic patterns of recombination. The marker density method was then applied to the Mus musculus microsatellite linkage map. The distribution of microsatellites provided evidence for heterogeneity in recombination rates. Centromeric regions of several mouse chromosomes had significantly more markers than expected, suggesting that recombination rates were lower in these regions. In contrast, most telomeric regions contained significantly fewer markers than expected. This indicates that recombination rates are elevated at the telomeres of many mouse chromosomes and is consistent with a comparison of the genetic and cytogenetic maps in these regions. The density of markers on a genetic map may provide a generally useful way to estimate regional recombination rates in species for which genetic, but not physical, maps are available.
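The marker-density argument can be made concrete. If markers are scattered uniformly along the physical map, the number in a genetic interval is proportional to its physical size, so under uniform recombination each interval should hold a share of markers proportional to its length in cM. The sketch below, a simplified reading of the method under that assumption rather than the authors' exact procedure, compares observed counts with that binomial expectation: an excess of markers (as at the mouse centromeres) points to locally reduced recombination, a deficit (as at many telomeres) to elevated recombination. The interval lengths and counts are made up, and no multiple-testing correction is applied.

```python
import math
from typing import Dict, List


def binom_pmf(k: int, n: int, p: float) -> float:
    """Binomial probability P(X = k), computed in log space to avoid overflow."""
    if p <= 0.0:
        return 1.0 if k == 0 else 0.0
    if p >= 1.0:
        return 1.0 if k == n else 0.0
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)


def marker_density_scan(interval_cm: List[float], counts: List[int]) -> List[Dict[str, float]]:
    """Compare observed marker counts per genetic interval with a uniform-recombination null."""
    total_cm, total = sum(interval_cm), sum(counts)
    results = []
    for length, observed in zip(interval_cm, counts):
        p = length / total_cm  # share of markers expected if recombination were uniform
        expected = total * p
        results.append({
            "expected": expected,
            # Relative recombination rate = expected/observed (< 1 means suppressed).
            "relative_rate": expected / observed if observed else math.inf,
            # One-sided binomial tail probabilities for an excess or a deficit.
            "p_excess": sum(binom_pmf(k, total, p) for k in range(observed, total + 1)),
            "p_deficit": sum(binom_pmf(k, total, p) for k in range(0, observed + 1)),
        })
    return results


# Hypothetical chromosome: four 10-cM intervals, the first (centromeric) crowded with markers.
for row in marker_density_scan([10.0, 10.0, 10.0, 10.0], [40, 10, 10, 10]):
    print(row)
```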


Hapo-G, haplotype-aware polishing of genome assemblies with accurate reads

NAR Genomics and Bioinformatics ◽

10.1093/nargab/lqab034 ◽

2021 ◽

Vol 3 (2) ◽

Author(s):

Jean-Marc Aury ◽

Benjamin Istace

Keyword(s):

Single Molecule ◽

Direct Consequence ◽

High Quality ◽

Sequencing Errors ◽

Coding Regions ◽

Sequencing Technologies ◽

Long Reads ◽

Oxford Nanopore ◽

Long Read ◽

Genome Assemblies

Abstract Single-molecule sequencing technologies have recently been commercialized by Pacific Biosciences and Oxford Nanopore with the promise of sequencing long DNA fragments (on the order of kilobases to megabases) and, using efficient algorithms, of providing high-quality assemblies in terms of contiguity and completeness of repetitive regions. However, the error rate of long-read technologies is higher than that of short-read technologies. This has a direct consequence on the base quality of genome assemblies, particularly in coding regions, where sequencing errors can disrupt the reading frame of genes. In the case of diploid genomes, the consensus of a given gene can be a mixture of the two haplotypes and can lead to premature stop codons. Several methods have been developed to polish genome assemblies using short reads; they generally inspect the assembly one nucleotide at a time and provide a correction for each nucleotide of the input assembly. As a result, these algorithms cannot properly process diploid genomes and typically switch from one haplotype to another. Here we propose Hapo-G (Haplotype-Aware Polishing Of Genomes), a new algorithm capable of incorporating phasing information from high-quality reads (short or long reads) to polish genome assemblies, in particular assemblies of diploid and heterozygous genomes.
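The haplotype-switching problem can be reproduced with a toy example. The sketch below is a naive column-by-column majority-vote polisher, the kind of nucleotide-by-nucleotide correction the abstract contrasts with Hapo-G, not Hapo-G itself. With uneven coverage of the two haplotypes, the column-wise winner comes from haplotype A at one heterozygous site and from haplotype B at the next, so the polished sequence matches neither. The reads are hypothetical and pre-aligned, with `-` marking columns a read does not cover.

```python
from collections import Counter
from typing import List


def majority_consensus(pileup: List[str]) -> str:
    """Naive column-by-column polishing: each base is the most common read base.

    pileup: equal-length, pre-aligned reads; '-' marks a column a read does not cover.
    """
    consensus = []
    for col in range(len(pileup[0])):
        bases = [read[col] for read in pileup if read[col] != "-"]
        if not bases:
            consensus.append("N")  # no coverage
            continue
        consensus.append(Counter(bases).most_common(1)[0][0])
    return "".join(consensus)


hap_a, hap_b = "ACGTA", "ATGCA"  # heterozygous at columns 1 and 3
pileup = [
    "ACGTA", "ACG--", "ACG--",  # haplotype A, mostly covering the left het site
    "ATGCA", "--GCA", "--GCA",  # haplotype B, mostly covering the right het site
]
polished = majority_consensus(pileup)
print(polished, polished in (hap_a, hap_b))  # ACGCA False
```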


An enhanced molecular marker based genetic map of perennial ryegrass (Lolium perenne) reveals comparative relationships with other Poaceae genomes

Genome ◽

10.1139/g01-144 ◽

2002 ◽

Vol 45 (2) ◽

pp. 282-295 ◽

Cited By ~ 175

Author(s):

Elizabeth S Jones ◽

Natalia L Mahoney ◽

Michael D Hayward ◽

Ian P Armstead ◽

J Gilbert Jones ◽

...

Keyword(s):

Molecular Marker ◽

Linkage Map ◽

Genetic Map ◽

Lolium Perenne ◽

Perennial Ryegrass ◽

Genetic Linkage Map ◽

Population Based ◽

Genetic Maps ◽

Linkage Groups ◽

Integrated Genetic Map

A molecular-marker linkage map has been constructed for perennial ryegrass (Lolium perenne L.) using a one-way pseudo-testcross population based on the mating of a multiple heterozygous individual with a doubled haploid genotype. RFLP, AFLP, isoenzyme, and EST data from four collaborating laboratories within the International Lolium Genome Initiative were combined to produce an integrated genetic map containing 240 loci covering 811 cM on seven linkage groups. The map contained 124 codominant markers, of which 109 were heterologous anchor RFLP probes from wheat, barley, oat, and rice, allowing comparative relationships between perennial ryegrass and other Poaceae species to be inferred. The genetic maps of perennial ryegrass and the Triticeae cereals are highly conserved in terms of synteny and colinearity. This observation was supported by the general agreement of the syntenic relationships between perennial ryegrass, oat, and rice and those between the Triticeae and these species. A lower level of synteny and colinearity was observed between perennial ryegrass and oat compared with the Triticeae, despite the closer taxonomic affinity between these species. It is proposed that the linkage groups of perennial ryegrass be numbered in accordance with these syntenic relationships, to correspond to the homoeologous groups of the Triticeae cereals.

Key words: Lolium perenne, genetic linkage map, RFLP, AFLP, conserved synteny.
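In a one-way pseudo-testcross, markers heterozygous in one parent segregate 1:1 among the progeny, so the recombination fraction between two such markers can be estimated directly as the proportion of progeny carrying a recombinant combination. The sketch below does that and converts the fraction to a map distance with Kosambi's mapping function. The choice of Kosambi (rather than, say, Haldane), the 0/1 genotype coding, and the phase handling are assumptions for illustration, not details reported for this map; the genotypes are hypothetical.

```python
import math
from typing import Optional, Sequence


def recombination_fraction(m1: Sequence[Optional[int]], m2: Sequence[Optional[int]]) -> float:
    """Fraction of progeny with different parental-haplotype codes at two markers.

    Codes are 0/1 for the two haplotypes of the heterozygous parent; None is
    missing data. Linkage phase is unknown in a pseudo-testcross, so an estimate
    above 0.5 is taken as the repulsion phase and reflected to 1 - r.
    """
    pairs = [(a, b) for a, b in zip(m1, m2) if a is not None and b is not None]
    if not pairs:
        raise ValueError("no progeny genotyped at both markers")
    r = sum(a != b for a, b in pairs) / len(pairs)
    return min(r, 1.0 - r)


def kosambi_cm(r: float) -> float:
    """Kosambi map distance in centimorgans for recombination fraction r (0 <= r < 0.5)."""
    if not 0.0 <= r < 0.5:
        raise ValueError("r must be in [0, 0.5)")
    return 25.0 * math.log((1 + 2 * r) / (1 - 2 * r))


m1 = [0, 0, 0, 0, 1, 1, 1, 1, 0, 1]  # hypothetical progeny genotypes
m2 = [0, 0, 0, 1, 1, 1, 1, 0, 0, 1]  # two recombinants out of ten
r = recombination_fraction(m1, m2)
print(r, round(kosambi_cm(r), 2))  # 0.2 21.18
```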


A highly contiguous nuclear genome assembly of the mandarinfish Synchiropus splendidus (Syngnathiformes: Callionymidae)

G3 Genes|Genome|Genetics ◽

10.1093/g3journal/jkab306 ◽

2021 ◽

Author(s):

Martin Stervander ◽

William A Cresko

Keyword(s):

Genome Assembly ◽

Nuclear Genome ◽

Phylogenetic Position ◽

Blue Color ◽

Long Reads ◽

The Family ◽

Commercially Important ◽

Genomic Resource ◽

Genome Assemblies ◽

Important Fish

Abstract The fish order Syngnathiformes has been referred to as a collection of misfit fishes, comprising commercially important fish such as red mullets as well as the highly diverse seahorses, pipefishes, and seadragons (the well-known family Syngnathidae), with their unique adaptations including male pregnancy. Another ornate member of this order is the mandarinfish. No fewer than two types of chromatophores have been discovered in the spectacularly colored mandarinfish: the cyanophore (producing blue color) and the dichromatic cyano-erythrophore (producing blue and red). The phylogenetic position of the mandarinfish in Syngnathiformes, and its promise of additional genetic discoveries beyond the chromatophores, made it an appealing target for whole-genome sequencing. We used linked sequences to create synthetic long reads, producing a highly contiguous genome assembly for the mandarinfish. The genome assembly comprises 483 Mbp (longest scaffold 29 Mbp), has an N50 of 12 Mbp, and an L50 of 14 scaffolds. The assembly completeness is also high, with 92.6% complete, 4.4% fragmented, and 2.9% missing out of 4,584 BUSCO genes found in ray-finned fishes. Outside the family Syngnathidae, the mandarinfish represents one of the most contiguous syngnathiform genome assemblies to date. The mandarinfish genomic resource will likely serve as a high-quality outgroup to syngnathid fish, and furthermore for research on the genomic underpinnings of the evolution of novel pigmentation.
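The contiguity statistics quoted here follow a standard definition: sort scaffolds from longest to shortest and accumulate their lengths; N50 is the length of the scaffold at which the running total first reaches half the assembly size, and L50 is the number of scaffolds needed to get there. A minimal sketch with made-up scaffold lengths (tools differ slightly in edge-case conventions, e.g. whether "reaches" means at least or strictly more than half):

```python
from typing import List, Tuple


def n50_l50(lengths: List[int]) -> Tuple[int, int]:
    """Return (N50, L50) for a list of scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:  # first scaffold at which half the assembly is covered
            return length, count
    raise ValueError("no scaffolds")


print(n50_l50([100, 80, 50, 30, 20, 10, 10]))  # (80, 2)
```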


Identifying the causes and consequences of assembly gaps using a multiplatform genome assembly of a bird-of-paradise

10.1101/2019.12.19.882399 ◽

2019 ◽

Cited By ~ 5

Author(s):

Valentina Peona ◽

Mozes P.K. Blom ◽

Luohao Xu ◽

Reto Burri ◽

Shawn Sullivan ◽

...

Keyword(s):

Dark Matter ◽

Genome Assembly ◽

Sex Chromosome ◽

De Novo ◽

Model Organism ◽

Technology Choice ◽

High Quality ◽

Sequencing Technologies ◽

Downstream Analysis ◽

Genome Assemblies

Abstract Genome assemblies are currently being produced at an impressive rate by consortia and individual laboratories. The low costs and increasing efficiency of sequencing technologies have opened up a whole new world of genomic biodiversity. Although these technologies generate high-quality genome assemblies, some genomic regions remain difficult to assemble, such as repetitive elements and GC-rich regions (genomic “dark matter”). In this study, we compare the efficiency of currently used sequencing technologies (short/linked/long reads and proximity ligation maps), and combinations thereof, in assembling genomic dark matter starting from the same sample. By adopting different de novo assembly strategies, we were able to compare each individual draft assembly to a curated multiplatform one and identify the nature of the previously missing dark matter, with a particular focus on transposable elements, multi-copy MHC genes, and GC-rich regions. Thanks to this multiplatform approach, we demonstrate the feasibility of producing a high-quality chromosome-level assembly for a non-model organism (paradise crow) for which only suboptimal samples are available. Our approach was able to reconstruct complex chromosomes such as the repeat-rich W sex chromosome and several GC-rich microchromosomes. Telomere-to-telomere assemblies are not yet a reality for most organisms, but by leveraging technology choice it is possible to minimize genome assembly gaps for downstream analysis. We provide a roadmap to tailor sequencing projects around the completeness of both the coding and non-coding parts of the genomes.
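Assembly gaps are conventionally written into FASTA scaffolds as runs of N. As a small, hypothetical starting point for this kind of gap analysis (not the study's pipeline), the sketch below locates N-runs in a scaffold and reports the GC fraction of the sequence flanking each gap, since GC-rich regions are one of the "dark matter" classes examined. The flank size and example sequence are arbitrary.

```python
import re
from typing import List, Tuple

GAP_RUN = re.compile(r"[Nn]+")


def find_gaps(seq: str, min_len: int = 1) -> List[Tuple[int, int]]:
    """Half-open (start, end) coordinates of N-runs at least min_len long."""
    return [(m.start(), m.end()) for m in GAP_RUN.finditer(seq) if m.end() - m.start() >= min_len]


def gc_fraction(seq: str) -> float:
    """GC share among unambiguous bases; NaN if there are none."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        return float("nan")
    return sum(b in "GC" for b in bases) / len(bases)


def gap_flank_gc(seq: str, flank: int = 100, min_len: int = 1) -> List[Tuple[int, int, float]]:
    """(start, end, GC fraction of up to `flank` bases on each side) for each gap."""
    report = []
    for start, end in find_gaps(seq, min_len):
        left = seq[max(0, start - flank):start]
        right = seq[end:end + flank]
        report.append((start, end, gc_fraction(left + right)))
    return report


scaffold = "ATATAT" + "N" * 5 + "GCGCGC" + "nn" + "AAAA"  # hypothetical scaffold
print(gap_flank_gc(scaffold, flank=3))  # [(6, 11, 0.5), (17, 19, 0.5)]
```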


AFLAP: Assembly-Free Linkage Analysis Pipeline using k-mers from whole genome sequencing data

10.1101/2020.09.14.296525 ◽

2020 ◽

Author(s):

Kyle Fletcher ◽

Lin Zhang ◽

Juliana Gil ◽

Rongkui Han ◽

Keri Cavanaugh ◽

...

Keyword(s):

Linkage Analysis ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

Genetic Map ◽

Genotyping By Sequencing ◽

Genetic Maps ◽

Whole Genome ◽

Sequencing Data ◽

Analysis Pipeline ◽

Genome Assemblies

Abstract Background Genetic maps are an important resource for validation of genome assemblies, trait discovery, and breeding. Next-generation sequencing has enabled production of high-density genetic maps constructed with 10,000s of markers. Most current approaches require a genome assembly to identify markers. Our Assembly Free Linkage Analysis Pipeline (AFLAP) removes this requirement by using uniquely segregating k-mers as markers to rapidly construct a genotype table and perform subsequent linkage analysis. This avoids potential biases including preferential read alignment and variant calling. Results The performance of AFLAP was determined in simulations and contrasted to a conventional workflow. We tested AFLAP using 100 F2 individuals of Arabidopsis thaliana, sequenced to low coverage. Genetic maps generated using k-mers contained over 130,000 markers that were concordant with the genomic assembly. The utility of AFLAP was then demonstrated by generating an accurate genetic map using genotyping-by-sequencing data of 235 recombinant inbred lines of Lactuca spp. AFLAP was then applied to 83 F1 individuals of the oomycete Bremia lactucae, sequenced to >5x coverage. The genetic map contained over 90,000 markers ordered in 19 large linkage groups. This genetic map was used to fragment, order, orient, and scaffold the genome, resulting in a much-improved reference assembly. Conclusions AFLAP can be used to generate high-density linkage maps and improve genome assemblies of any organism for which a mapping population is available, using whole genome sequencing or genotyping-by-sequencing data. Genetic maps produced for B. lactucae were accurately aligned to the genome and guided significant improvements of the reference assembly.
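The central idea, using k-mers that segregate between parents as markers without any assembly, can be sketched in a few lines. The version below is a simplification under stated assumptions, not AFLAP's implementation: it counts forward-strand k-mers only (no canonical k-mers), treats a k-mer seen at least `min_count` times in one parent and never in the other as a marker, and scores each progeny individual as present (1) or absent (0) for each marker. AFLAP's actual filtering of uniquely segregating k-mers is more involved; the reads and names below are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set


def kmer_counts(reads: Iterable[str], k: int) -> Counter:
    """Count forward-strand k-mers (no reverse-complement handling)."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def parent_specific_kmers(parent: Counter, other_parent: Counter, min_count: int = 2) -> Set[str]:
    """k-mers seen at least min_count times in one parent and never in the other."""
    return {kmer for kmer, n in parent.items() if n >= min_count and kmer not in other_parent}


def genotype_table(markers: Set[str], progeny: Dict[str, Counter], min_count: int = 1) -> Dict[str, List[int]]:
    """Presence (1) / absence (0) of each marker k-mer, in sorted marker order, per individual."""
    ordered = sorted(markers)
    return {name: [1 if counts[kmer] >= min_count else 0 for kmer in ordered]
            for name, counts in progeny.items()}


parent_a = kmer_counts(["ACGTTGCA", "ACGTTGCA"], 4)
parent_b = kmer_counts(["ACGTAGCA", "ACGTAGCA"], 4)
markers = parent_specific_kmers(parent_a, parent_b)
progeny = {"child1": kmer_counts(["ACGTTGCA"], 4), "child2": kmer_counts(["ACGTAGCA"], 4)}
print(genotype_table(markers, progeny))  # {'child1': [1, 1, 1, 1], 'child2': [0, 0, 0, 0]}
```

The resulting 0/1 table is the kind of genotype matrix that standard linkage-analysis software can then group and order.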
