de novo genome assembly
Recently Published Documents



Author(s):  
Stephanie H. Chen ◽  
Maurizio Rossetto ◽  
Marlien Merwe ◽  
Patricia Lu‐Irving ◽  
Jia‐Yee S. Yap ◽  
...  

2022 ◽  
Vol 9 ◽  
Author(s):  
Na Liu ◽  
Yongchao Niu ◽  
Guwen Zhang ◽  
Zhijuan Feng ◽  
Yuanpeng Bo ◽  
...  

Vegetable soybean is one of the most important vegetables in China, and worldwide demand for it has increased markedly over the past two decades. Here, we present a high-quality de novo genome assembly of the vegetable soybean cultivar Zhenong 6 (ZN6), one of the most popular cultivars in China. The 20 pseudochromosomes cover 94.57% of the total 1.01 Gb assembly size, with a contig N50 of 3.84 Mb and a scaffold N50 of 48.41 Mb. A total of 55,517 protein-coding genes were annotated. Approximately 54.85% of the assembled genome was annotated as repetitive sequences, of which long terminal repeat transposable elements were the most abundant. Comparative genomic and phylogenetic analyses with the grain soybean Williams 82, six other Fabaceae species, and Arabidopsis thaliana highlight the differences between ZN6 and the other genomes. Furthermore, we resequenced 60 vegetable soybean accessions. Together with 103 previously resequenced wild soybean and 155 previously resequenced grain soybean accessions, we analyzed the population structure and selective sweeps of vegetable, grain, and wild soybean. The accessions were clearly divided into three clades. Compared with the wild soybean population, we found 1112 genes under selection in the vegetable soybean population and 1047 in the grain soybean population. Among them, 134 selected genes were shared between the vegetable and grain soybean populations. Additionally, we report four sucrose synthase genes, one sucrose-phosphate synthase gene, and four sugar transport genes as candidate genes related to important traits such as seed sweetness and seed size in vegetable soybean. This study provides essential genomic resources to promote evolutionary and functional genomics studies and genomically informed breeding for vegetable soybean.
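
The contig and scaffold N50 figures reported in abstracts like the one above follow a standard definition: sort the sequence lengths from longest to shortest and report the length at which the running total first reaches half of the assembly size. The sketch below is illustrative only; the function name `n50` and the toy lengths are assumptions for this example, not data or code from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembly size.
    """
    if not lengths:
        raise ValueError("at least one sequence length is required")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy example (hypothetical lengths, in bp): the total is 32, half is 16,
# and the two longest sequences (8 + 8) already reach it.
toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(toy_lengths))
```

Because N50 depends only on lengths, a higher value indicates better contiguity but says nothing about correctness; a misjoined scaffold raises N50 just as a correct one does.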


Life ◽  
2021 ◽  
Vol 12 (1) ◽  
pp. 30
Author(s):  
Konstantina Athanasopoulou ◽  
Michaela A. Boti ◽  
Panagiotis G. Adamopoulos ◽  
Paraskevi C. Skourou ◽  
Andreas Scorilas

Although next-generation sequencing (NGS) technology revolutionized sequencing, offering tremendous sequencing capacity with groundbreaking depth and accuracy, it continues to show serious limitations. In the early 2010s, the introduction of a novel set of sequencing methodologies on two platforms, Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT), gave birth to third-generation sequencing (TGS). These long-read technologies make genome sequencing an easy-to-handle procedure by greatly reducing the average time of library construction workflows, and the long reads they generate simplify de novo genome assembly. Long sequencing reads produced by both TGS methodologies have already facilitated transcriptional profiling, since they enable the identification of full-length transcripts without the need for assembly or sophisticated bioinformatics tools. Long-read technologies have also provided new insights into the field of epitranscriptomics by allowing the direct detection of RNA modifications on native RNA molecules. This review highlights the advantageous features of the newly introduced TGS technologies, discusses their limitations, and provides an in-depth comparison of their scientific background and available protocols as well as their potential utility in research and clinical applications.


Plants ◽  
2021 ◽  
Vol 10 (12) ◽  
pp. 2740
Author(s):  
Yuya Liang ◽  
Shichen Wang ◽  
Chersty L. Harper ◽  
Nithya K. Subramanian ◽  
Rodante E. Tabien ◽  
...  

Global climate change has increased the number of severe flooding events that affect agriculture, including rice production in the U.S. and internationally. Heavy rainfall can cause rice plants to be completely submerged, which can significantly reduce grain yield or completely destroy the plants. Recently, qSub8.1, a major-effect QTL for submergence tolerance during the vegetative stage that originated from Ciherang-Sub1, was identified in a mapping population derived from a cross between Ciherang-Sub1 and IR10F365. Ciherang-Sub1 was, in turn, derived from a cross between Ciherang and IR64-Sub1. Here, we characterize the qSub8.1 region by analyzing the sequence information of Ciherang-Sub1 and its two parents (Ciherang and IR64-Sub1) and compare the whole-genome profile of these varieties with the Nipponbare and Minghui 63 (MH63) reference genomes. The three rice varieties were sequenced with 150 bp paired-end whole-genome shotgun sequencing (Illumina HiSeq4000) and assembled with the Trimmomatic-SOAPdenovo2-MUMmer3 pipeline, resulting in approximate genome sizes of 354.4, 343.7, and 344.7 Mb, with N50 values of 25.1, 25.4, and 26.1 kb, respectively. The results showed that the Ciherang-Sub1 genome is composed of 59–63% Ciherang, 22–24% IR64-Sub1, and 15–17% sequence from unknown sources. The genome profile revealed a more detailed genomic composition than previous marker-assisted breeding and showed that the qSub8.1 region is mostly from Ciherang, with some introgressed segments from IR64-Sub1 and currently unknown source(s).


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Yu Chen ◽  
Yixin Zhang ◽  
Amy Y. Wang ◽  
Min Gao ◽  
Zechen Chong

Long-read de novo genome assembly continues to advance rapidly. However, there is a lack of effective tools to accurately evaluate the assembly results, especially for structural errors. We present Inspector, a reference-free long-read de novo assembly evaluator which faithfully reports types of errors and their precise locations. Notably, Inspector can correct the assembly errors based on consensus sequences derived from raw reads covering erroneous regions. Based on in silico and long-read assembly results from multiple long-read data and assemblers, we demonstrate that in addition to providing generic metrics, Inspector can accurately identify both large-scale and small-scale assembly errors.


2021 ◽  
Author(s):  
Andrea Minio ◽  
Noe Cochetel ◽  
Amanda M Vondras ◽  
Melanie Massonnet ◽  
Dario Cantu

De novo genome assembly is essential for genomic research. High-quality genomes assembled into phased pseudomolecules are challenging to produce and often contain assembly errors caused by repeats, heterozygosity, or the chosen assembly strategy. Although algorithms exist that produce partially phased assemblies, haploid draft assemblies that may lack biological information remain favored because they are easier to generate and use. We developed HaploSync, a suite of tools that produces fully phased, chromosome-scale diploid genome assemblies and performs extensive quality control to limit assembly artifacts. HaploSync uses a genetic map and/or the genome of a closely related species to guide the scaffolding of a diploid assembly into phased pseudomolecules for each chromosome. It compares alternative haplotypes to identify and correct misassemblies independent of a reference, fills assembly gaps with unplaced sequences, and resolves collapsed homozygous regions. In a series of plant, fungal, and animal kingdom case studies, we demonstrate that HaploSync increases the assembly contiguity of phased chromosomes, improves completeness by filling gaps, corrects scaffolding, and correctly phases highly heterozygous, complex regions.


2021 ◽  
Vol 1 (2) ◽  
pp. 42-48
Author(s):  
Raissa Graciano ◽  
Rafael Sachetto Oliveira ◽  
Isllas Miguel dos Santos ◽  
Gabriel de Menezes Yazbeck

The predicted sequences of thousands of genes, revealed by a preliminary low-coverage genome assembly, are presented for Brycon orbignyanus, an endangered migratory fish. Neotropical migratory fish stocks have been drastically reduced due to accumulated environmental pressure. Brycon orbignyanus, once one of the main fisheries species in the Platine Basin, is now very rare in nature and relies on spawning programs and a few well-preserved or still untouched sites. High-throughput DNA sequencing has not yet been used to obtain functional genome information for B. orbignyanus. To help bridge this gap, we present a dataset resulting from the first functional annotation of a de novo genome assembly for B. orbignyanus, built from short reads (90 bp) obtained on the HiSeq 2000 platform (Illumina). The annotation was performed for scaffolds over 10 kb using the Maker pipeline, with reference sequences taken from the NCBI for the order Characiformes. This annotation resulted in the prediction of 12,734 genes, classified with the aid of PANTHER. The data presented here can facilitate the development of basic research in this threatened species, along with practical biotechnological tools for different areas, such as commercial and environmental fish spawning operations (e.g. hormonal induction, growth) and human health.


2021 ◽  
Vol 12 ◽  
Author(s):  
Yu Xia ◽  
Zhi-Yuan Wei ◽  
Rui He ◽  
Jia-Huan Li ◽  
Zhi-Xin Wang ◽  
...  

Our previous study identified a new β-galactosidase in Erwinia sp. E602. To further understand lactose metabolism in this strain, de novo genome assembly was conducted using a strategy combining Illumina and PacBio sequencing technologies. The whole genome of Erwinia sp. E602 includes a 4.8 Mb chromosome and a large 326 kb plasmid. A total of 4,739 genes, including 4,543 protein-coding genes, 25 rRNA genes, 82 tRNA genes, and 7 other ncRNA genes, were annotated. The plasmid is the largest characterized in the genus Erwinia to date, and it contains a number of genes and pathways responsible for lactose metabolism and regulation. Moreover, a new plasmid-borne lac operon that lacks a typical β-galactoside transacetylase (lacA) gene was identified in the strain. Phylogenetic analysis showed that the genes lacY and lacZ in the operon were under positive selection, indicating the adaptation of lactose metabolism to the environment in Erwinia sp. E602. Our current study demonstrates that hybrid de novo genome assembly using Illumina and PacBio sequencing technologies, combined with metabolic pathway analysis, provides a useful strategy for better understanding the evolution of undiscovered microbial species or strains.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Lauren Coombe ◽  
Janet X. Li ◽  
Theodora Lo ◽  
Johnathan Wong ◽  
Vladimir Nikolic ◽  
...  

Background: Generating high-quality de novo genome assemblies is foundational to the genomics study of model and non-model organisms. In recent years, long-read sequencing has greatly benefited genome assembly and scaffolding, a process by which assembled sequences are ordered and oriented through the use of long-range information. Long reads are better able to span repetitive genomic regions compared to short reads, and thus have tremendous utility for resolving problematic regions and helping generate more complete draft assemblies. Here, we present LongStitch, a scalable pipeline that corrects and scaffolds draft genome assemblies exclusively using long reads.
Results: LongStitch incorporates multiple tools developed by our group and runs in up to three stages, which include initial assembly correction (Tigmint-long), followed by two incremental scaffolding stages (ntLink and ARKS-long). Tigmint-long and ARKS-long are misassembly correction and scaffolding utilities, respectively, previously developed for linked reads, that we adapted for long reads. Here, we describe the LongStitch pipeline and introduce our new long-read scaffolder, ntLink, which utilizes lightweight minimizer mappings to join contigs. LongStitch was tested on short and long-read assemblies of Caenorhabditis elegans, Oryza sativa, and three different human individuals using corresponding nanopore long-read data, and improves the contiguity of each assembly from 1.2-fold up to 304.6-fold (as measured by NGA50 length). Furthermore, LongStitch generates more contiguous and correct assemblies compared to state-of-the-art long-read scaffolder LRScaf in most tests, and consistently improves upon human assemblies in under five hours using less than 23 GB of RAM.
Conclusions: Due to its effectiveness and efficiency in improving draft assemblies using long reads, we expect LongStitch to benefit a wide variety of de novo genome assembly projects.
The LongStitch pipeline is freely available at https://github.com/bcgsc/longstitch.
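
The abstract above mentions that ntLink joins contigs using lightweight minimizer mappings. As a rough illustration of the general minimizer idea only, the sketch below picks, for every window of `w` consecutive k-mers, the smallest k-mer. It is not ntLink's implementation: the lexicographic ordering, the function name `minimizers`, and the toy sequence are all assumptions made for this example (real tools typically order k-mers by a hash value and also handle reverse complements).

```python
def minimizers(seq, k, w):
    """Return (position, k-mer) pairs chosen as window minimizers.

    For each window of w consecutive k-mers, the lexicographically
    smallest k-mer is selected (leftmost on ties). Consecutive windows
    that select the same position are reported once.
    """
    if k <= 0 or w <= 0:
        raise ValueError("k and w must be positive")
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    picked = []
    for start in range(len(kmers) - w + 1):
        # Index of the smallest k-mer in this window; ties go to the left.
        pos = min(range(start, start + w), key=lambda i: (kmers[i], i))
        if not picked or picked[-1][0] != pos:
            picked.append((pos, kmers[pos]))
    return picked


# Toy example (hypothetical sequence): 3-mers, windows of 2 k-mers.
for position, kmer in minimizers("ACGTACGT", k=3, w=2):
    print(position, kmer)
```

Because two sequences that share a long enough stretch will select the same minimizers within it, comparing these sparse sets is much cheaper than comparing every k-mer, which is what makes such a mapping lightweight.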

