Chicago and Dovetail Hi-C proximity ligation yield chromosome length scaffolds of Ixodes scapularis genome

Mapping Intimacies ◽

10.1101/392126 ◽

2018 ◽

Cited By ~ 3

Author(s):

Andrew B. Nuss ◽

Arvind Sharma ◽

Monika Gulia-Nuss

Keyword(s):

Molecular Level ◽

Ixodes Scapularis ◽

Repetitive Sequences ◽

Chromosome Length ◽

Genome Architecture ◽

High Quality ◽

Proximity Ligation ◽

Sequencing Technologies ◽

Functional Gene Analysis ◽

High Quality Genome

Abstract A high-quality genome sequence is essential for understanding an organism at the molecular level. However, larger genomes with substantial repetitive sequences are challenging to assemble with existing sequencing technologies. The Hi-C technique is changing the genome architecture landscape by providing links across a variety of length scales, spanning even whole chromosomes. The Ixodes scapularis haploid genome is 2.1 Gbp, and the current assembly consists of 369,495 scaffolds representing 57% of the genome. The fragmented genome poses challenges for functional gene analysis, and an improved assembly is needed. We therefore used the Hi-C technique to achieve a chromosome-level assembly of the tick genome. With Chicago and Dovetail Hi-C assemblies, we obtained 28 sequences longer than 10 Mb that correspond to 28 chromosomes in I. scapularis.

Twelve quick steps for genome assembly and annotation in the classroom

PLoS Computational Biology ◽

10.1371/journal.pcbi.1008325 ◽

2020 ◽

Vol 16 (11) ◽

pp. e1008325

Author(s):

Hyungtaek Jung ◽

Tomer Ventura ◽

J. Sook Chung ◽

Woo-Jin Kim ◽

Bo-Hye Nam ◽

...

Keyword(s):

Genome Assembly ◽

De Novo ◽

Repetitive Sequences ◽

Genome Project ◽

Model Organisms ◽

High Quality ◽

Sequencing Technologies ◽

A Genome ◽

Sequencing Platforms ◽

High Quality Genome

Eukaryotic genome sequencing and de novo assembly, once the exclusive domain of well-funded international consortia, have become increasingly affordable, thus fitting the budgets of individual research groups. Third-generation long-read DNA sequencing technologies are increasingly used, providing extensive genomic toolkits that were once reserved for a few select model organisms. Generating high-quality genome assemblies and annotations for many aquatic species still presents significant challenges due to their large genome sizes, complexity, and high chromosome numbers. Indeed, selecting the most appropriate sequencing and software platforms and annotation pipelines for a new genome project can be daunting because tools often only work in limited contexts. In genomics, generating a high-quality genome assembly/annotation has become an indispensable tool for better understanding the biology of any species. Herein, we state 12 steps to help researchers get started in genome projects by presenting guidelines that are broadly applicable (to any species), sustainable over time, and cover all aspects of genome assembly and annotation projects from start to finish. We review some commonly used approaches, including practical methods to extract high-quality DNA and choices for the best sequencing platforms and library preparations. In addition, we discuss the range of potential bioinformatics pipelines, including structural and functional annotations (e.g., transposable elements and repetitive sequences). This paper also includes information on how to build a wide community for a genome project, the importance of data management, and how to make the data and results Findable, Accessible, Interoperable, and Reusable (FAIR) by submitting them to a public repository and sharing them with the research community.

SIGAR: Inferring features of genome architecture and DNA rearrangements by split read mapping

10.1101/2020.05.05.079426 ◽

2020 ◽

Author(s):

Yi Feng ◽

Leslie Y. Beh ◽

Wei-Jen Chang ◽

Laura F. Landweber

Keyword(s):

Genome Assembly ◽

Repetitive Sequences ◽

Genome Architecture ◽

DNA Rearrangements ◽

High Quality ◽

Microbial Eukaryotes ◽

Ciliate Species ◽

Split Read ◽

High Level ◽

Genome Assemblies

Abstract Ciliates are microbial eukaryotes with distinct somatic and germline genomes. Post-zygotic development involves extensive remodeling of the germline genome to form somatic chromosomes. Ciliates therefore offer a valuable model for studying the architecture and evolution of programmed genome rearrangements. Current studies usually focus on a few model species, where rearrangement features are annotated by aligning reference germline and somatic genomes. While many high-quality somatic genomes have been assembled, a high-quality germline genome assembly is difficult to obtain due to its smaller DNA content and abundance of repetitive sequences. To overcome these hurdles, we propose a new pipeline, SIGAR (Split-read Inference of Genome Architecture and Rearrangements), to infer germline genome architecture and rearrangement features without a germline genome assembly, requiring only short germline DNA sequencing reads. As a proof of principle, 93% of rearrangement junctions identified by SIGAR in the ciliate Oxytricha trifallax were validated by the existing germline assembly. We then applied SIGAR to six diverse ciliate species without germline genome assemblies, including Ichthyophthirius multifiliis, a fish pathogen. Despite the high level of somatic DNA contamination in each sample, SIGAR successfully inferred rearrangement junctions, short eliminated sequences, and potential scrambled genes in each species. This pipeline enables pilot surveys or exploration of DNA rearrangements in species with limited DNA material access, thereby providing new insights into the evolution of chromosome rearrangements.

High-quality genome assembly, annotation and evolutionary analysis of the mungbean (Vigna radiata) genome

10.22541/au.160587196.63922177/v1 ◽

2020 ◽

Author(s):

Qiang Yan ◽

Qiong Wang ◽

Cheng Xuzhen ◽

Lixia Wang ◽

Prakit Somta ◽

...

Keyword(s):

Genome Assembly ◽

Vigna Radiata ◽

Crop Improvement ◽

Repetitive Sequences ◽

Gene Families ◽

Close Relative ◽

Specific Gene ◽

Evolutionary Analysis ◽

High Quality ◽

High Quality Genome

Mungbean (Vigna radiata [L.]) is an important economic crop grown in South and East Asia. The low contiguity of the current assembly of the V. radiata genome has limited its application. Here, we report a high-quality chromosome-scale genome assembly of V. radiata to facilitate the investigation of its genome characteristics and evolution. By combining Nanopore long reads, Illumina short reads, and Hi-C data, we generated a high-quality genome assembly of V. radiata, with 473.67 megabases assembled into 11 chromosomes and contig N50 and scaffold N50 values of 11.3 and 42.4 megabases, respectively. A total of 52.8% of the genome was annotated as repetitive sequences, among which LTRs (long terminal repeats) were predominant (33.9%). The genome of V. radiata was predicted to contain 33,924 genes, 32,470 (95.7%) of which could be functionally annotated. Evolutionary analysis revealed an estimated divergence time of V. radiata from its close relative V. angularis of ~11.66 million years ago. In addition, 277 V. radiata-specific gene families and 18 positively selected genes were detected and functionally annotated. This high-quality mungbean genome will provide valuable resources for further genetic analysis and crop improvement of mungbean and other legume species.
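
The contig N50 and scaffold N50 figures quoted above follow a standard definition: sort the sequences from longest to shortest and report the length of the sequence at which the running total first reaches half of the assembly size. The snippet below is a minimal sketch of that definition only. The function name `n50` and the toy lengths are illustrative assumptions, not values or code from the mungbean study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy assembly (arbitrary units), not real data.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # total is 300; 80 + 70 reaches 150, so this prints 70
```

Applied to contig lengths this gives the contig N50, and applied to scaffold lengths it gives the scaffold N50, which is why an assembly can report two different values.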

Easy Hi-C: A simple efficient protocol for 3D genome mapping in small cell populations

10.1101/245688 ◽

2018 ◽

Cited By ~ 6

Author(s):

Leina Lu ◽

Xiaoxiao Liu ◽

Jun Peng ◽

Yan Li ◽

Fulai Jin

Keyword(s):

Genome Organization ◽

Genome Mapping ◽

Mammalian Genome ◽

Small Cell ◽

Genome Architecture ◽

Enzymatic Reactions ◽

High Quality ◽

3D Genome ◽

Proximity Ligation ◽

Genome Wide

Despite growing interest in mammalian genome organization, mapping DNA contacts genome-wide remains challenging. Here we present easy Hi-C (eHi-C), a highly efficient method for unbiased mapping of 3D genome architecture. The eHi-C protocol involves only a series of enzymatic reactions and maximizes the recovery of DNA products from proximity ligation. We show that eHi-C can be performed with 0.1 million cells and yields high-quality libraries comparable to Hi-C.

High-quality genome assembly of Pseudopestalotiopsis theae, the pathogenic fungus of tea grey blight

Plant Disease ◽

10.1094/pdis-02-21-0318-a ◽

2021 ◽

Author(s):

Shiqin Zheng ◽

Ruiqi Chen ◽

Zhe Wang ◽

Juan Liu ◽

Yan Cai ◽

...

Keyword(s):

Genome Assembly ◽

Pathogenic Fungus ◽

High Quality ◽

Tea Tree ◽

Sequencing Technologies ◽

Host Interaction ◽

Long Read ◽

Infection Mechanisms ◽

Grey Blight ◽

High Quality Genome

Tea grey blight, caused by the plant pathogenic fungus Pseudopestalotiopsis theae, is one of the most serious foliar diseases of tea tree and can affect the production and quality of tea worldwide. We generated a highly contiguous 50.41 Mbp genome assembly (N50 1.30 Mbp) of P. theae strain CYF27 by combining PacBio long-read and Illumina short-read sequencing technologies. We identified a total of 15,626 gene models, of which 1,038 genes encode putative secreted proteins. The high-quality genome assembly and annotation resource reported here will be useful for the study of fungal infection mechanisms and pathogen-host interaction.

SIGAR: Inferring Features of Genome Architecture and DNA Rearrangements by Split-Read Mapping

Genome Biology and Evolution ◽

10.1093/gbe/evaa147 ◽

2020 ◽

Vol 12 (10) ◽

pp. 1711-1718

Author(s):

Yi Feng ◽

Leslie Y Beh ◽

Wei-Jen Chang ◽

Laura F Landweber

Keyword(s):

Genome Assembly ◽

Repetitive Sequences ◽

Genome Architecture ◽

DNA Rearrangements ◽

High Quality ◽

Microbial Eukaryotes ◽

Ciliate Species ◽

Split Read ◽

High Level ◽

Genome Assemblies

Abstract Ciliates are microbial eukaryotes with distinct somatic and germline genomes. Postzygotic development involves extensive remodeling of the germline genome to form somatic chromosomes. Ciliates therefore offer a valuable model for studying the architecture and evolution of programed genome rearrangements. Current studies usually focus on a few model species, where rearrangement features are annotated by aligning reference germline and somatic genomes. Although many high-quality somatic genomes have been assembled, a high-quality germline genome assembly is difficult to obtain due to its smaller DNA content and abundance of repetitive sequences. To overcome these hurdles, we propose a new pipeline, SIGAR (Split-read Inference of Genome Architecture and Rearrangements), to infer germline genome architecture and rearrangement features without a germline genome assembly, requiring only short DNA sequencing reads. As a proof of principle, 93% of rearrangement junctions identified by SIGAR in the ciliate Oxytricha trifallax were validated by the existing germline assembly. We then applied SIGAR to six diverse ciliate species without germline genome assemblies, including Ichthyophthirius multifiliis, a fish pathogen. Despite the high level of somatic DNA contamination in each sample, SIGAR successfully inferred rearrangement junctions, short eliminated sequences, and potential scrambled genes in each species. This pipeline enables pilot surveys or exploration of DNA rearrangements in species with limited DNA material access, thereby providing new insights into the evolution of chromosome rearrangements.

Pseudo-chromosome–length genome assembly of a double haploid “Bartlett” pear (Pyrus communis L.)

GigaScience ◽

10.1093/gigascience/giz138 ◽

2019 ◽

Vol 8 (12) ◽

Cited By ~ 11

Author(s):

Gareth Linsmith ◽

Stephane Rombauts ◽

Sara Montanari ◽

Cecilia H Deng ◽

Jean-Marc Celton ◽

...

Keyword(s):

Double Haploid ◽

Gene Annotation ◽

Repetitive Sequences ◽

Chromosome Length ◽

Pyrus Communis ◽

Chromatin Interaction ◽

Haploid Plant ◽

High Quality ◽

European Pear ◽

Pyrus Communis L

Abstract Background: We report an improved assembly and scaffolding of the European pear (Pyrus communis L.) genome (referred to as BartlettDHv2.0), obtained using a combination of Pacific Biosciences RSII long-read sequencing, Bionano optical mapping, chromatin interaction capture (Hi-C), and genetic mapping. The sample selected for sequencing is a double haploid derived from the same “Bartlett” reference pear that was previously sequenced. Sequencing of di-haploid plants makes assembly more tractable in highly heterozygous species such as P. communis. Findings: A total of 496.9 Mb, corresponding to 97% of the estimated genome size, was assembled into 494 scaffolds. Hi-C data and a high-density genetic map allowed us to anchor and orient 87% of the sequence on the 17 pear chromosomes. Approximately 50% (247 Mb) of the genome consists of repetitive sequences. Gene annotation confirmed the presence of 37,445 protein-coding genes, which is 13% fewer than previously predicted. Conclusions: We showed that the use of a doubled-haploid plant is an effective solution to the problems presented by high levels of heterozygosity and duplication for the generation of high-quality genome assemblies. We present a high-quality chromosome-scale assembly of the European pear Pyrus communis and demonstrate its high degree of synteny with the genomes of Malus × domestica and Pyrus × bretschneideri.

Faculty Opinions recommendation of How can a high-quality genome assembly help plant breeders?

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.735958664.793561468 ◽

2019 ◽

Author(s):

Dirk Hincha

Keyword(s):

Genome Assembly ◽

High Quality ◽

High Quality Genome

High-quality genome assembly of Huazhan and Tianfeng, the parents of an elite rice hybrid Tian-you-hua-zhan

Science China Life Sciences ◽

10.1007/s11427-020-1940-9 ◽

2021 ◽

Author(s):

Hui Zhang ◽

Yuexing Wang ◽

Ce Deng ◽

Sheng Zhao ◽

Peng Zhang ◽

...

Keyword(s):

Genome Assembly ◽

High Quality ◽

Rice Hybrid ◽

High Quality Genome

Hapo-G, haplotype-aware polishing of genome assemblies with accurate reads

NAR Genomics and Bioinformatics ◽

10.1093/nargab/lqab034 ◽

2021 ◽

Vol 3 (2) ◽

Author(s):

Jean-Marc Aury ◽

Benjamin Istace

Keyword(s):

Single Molecule ◽

Direct Consequence ◽

High Quality ◽

Sequencing Errors ◽

Coding Regions ◽

Sequencing Technologies ◽

Long Reads ◽

Oxford Nanopore ◽

Long Read ◽

Genome Assemblies

Abstract Single-molecule sequencing technologies have recently been commercialized by Pacific Biosciences and Oxford Nanopore with the promise of sequencing long DNA fragments (on the order of kilobases to megabases) and then, using efficient algorithms, providing high-quality assemblies in terms of contiguity and completeness of repetitive regions. However, the error rate of long-read technologies is higher than that of short-read technologies. This has a direct consequence on the base quality of genome assemblies, particularly in coding regions, where sequencing errors can disrupt the coding frame of genes. In the case of diploid genomes, the consensus of a given gene can be a mixture of the two haplotypes and can lead to premature stop codons. Several methods have been developed to polish genome assemblies using short reads; generally, they inspect nucleotides one by one and provide a correction for each nucleotide of the input assembly. As a result, these algorithms are not able to properly process diploid genomes, and they typically switch from one haplotype to another. Herein we propose Hapo-G (Haplotype-Aware Polishing Of Genomes), a new algorithm capable of incorporating phasing information from high-quality reads (short or long reads) to polish genome assemblies, in particular assemblies of diploid and heterozygous genomes.
