A Robust Method for Finding the Automated Best Matched Genes Based on Grouping Similar Fragments of Large-Scale References for Genome Assembly

Symmetry ◽  
2017 ◽  
Vol 9 (9) ◽  
pp. 192 ◽  
Author(s):  
Jaehee Jung ◽  
Jong Kim ◽  
Young-Sik Jeong ◽  
Gangman Yi
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Yu Chen ◽  
Yixin Zhang ◽  
Amy Y. Wang ◽  
Min Gao ◽  
Zechen Chong

Abstract: Long-read de novo genome assembly continues to advance rapidly. However, there is a lack of effective tools to accurately evaluate the assembly results, especially for structural errors. We present Inspector, a reference-free long-read de novo assembly evaluator that faithfully reports the types of errors and their precise locations. Notably, Inspector can correct assembly errors using consensus sequences derived from the raw reads covering the erroneous regions. Based on in silico and long-read assembly results from multiple long-read datasets and assemblers, we demonstrate that, in addition to providing generic metrics, Inspector can accurately identify both large-scale and small-scale assembly errors.


2019 ◽  
Author(s):  
Priyanka Ghosh ◽  
Sriram Krishnamoorthy ◽  
Ananth Kalyanaraman

Abstract: De novo genome assembly is a fundamental problem in bioinformatics that aims to assemble the DNA sequence of an unknown genome from numerous short DNA fragments (aka reads) obtained from it. With the advent of high-throughput sequencing technologies, billions of reads can be generated in a matter of hours, necessitating efficient parallelization of the assembly process. While multiple parallel solutions have been proposed in the past, conducting large-scale assembly remains a challenging problem because of the inherent complexities associated with data movement and the irregular access footprints of memory and I/O operations. In this paper, we present a novel algorithm, called PaKman, to address the problem of performing large-scale genome assemblies on a distributed memory parallel computer. Our approach focuses on improving performance through a combination of novel data structures and algorithmic strategies for reducing the communication and I/O footprint during the assembly process. PaKman presents a solution for the two most time-consuming phases in the full genome assembly pipeline, namely, k-mer counting and contig generation. A key aspect of our algorithm is its graph data structure, which comprises fat nodes (or what we call “macro-nodes”) that reduce the communication burden during contig generation. We present an extensive performance and qualitative evaluation of our algorithm, including comparisons to other state-of-the-art parallel assemblers. Our results demonstrate the ability to achieve near-linear speedups on up to 8K cores (tested); outperform state-of-the-art distributed memory and shared memory tools in performance while delivering comparable (if not better) quality; and reduce time to solution significantly. For instance, PaKman is able to generate a high-quality set of assembled contigs for complex genomes such as the human and wheat genomes in a matter of minutes on 8K cores.
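For readers unfamiliar with the k-mer counting phase named in this abstract, the following is a minimal serial sketch in Python. It is not PaKman's distributed algorithm and does not model its macro-node graph; the function name `count_kmers` and the toy reads are assumptions made purely for illustration.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring (k-mer) across a list of reads.

    Illustrative serial version only; names here are hypothetical and
    not taken from PaKman.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    counts = Counter()
    for read in reads:
        # A read of length n contributes n - k + 1 k-mers (none if n < k).
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


# Toy input: two overlapping reads (made-up data).
reads = ["ACGTAC", "CGTACG"]
kmer_counts = count_kmers(reads, 3)
```

In a distributed setting, each k-mer would typically be assigned to an owning process, which is where the communication cost described in the abstract arises; that step is omitted from this sketch.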


2019 ◽  
Author(s):  
Weihua Pan ◽  
Tao Jiang ◽  
Stefano Lonardi

Abstract: Due to the current limitations of sequencing technologies, de novo genome assembly is typically carried out in two stages, namely contig (sequence) assembly and scaffolding. While scaffolding is computationally easier than sequence assembly, the scaffolding problem can be challenging due to the high repetitive content of eukaryotic genomes, possible mis-joins in assembled contigs and inaccuracies in the linkage information. Genome scaffolding tools use either paired-end/mate-pair/linked/Hi-C reads or genome-wide maps (optical, physical or genetic) as linkage information. Optical maps (in particular Bionano Genomics maps) have been extensively used in many recent large-scale genome assembly projects (e.g., goat, apple, barley, maize, quinoa, sea bass, among others). However, the most commonly used scaffolding tools have a serious limitation: they can only deal with one optical map at a time, forcing users to alternate or iterate over multiple maps. In this paper, we introduce a novel scaffolding algorithm called OMGS that for the first time can take advantage of multiple optical maps. OMGS solves several optimization problems to generate scaffolds with optimal contiguity and correctness. Extensive experimental results demonstrate that our tool outperforms existing methods when multiple optical maps are available, and produces comparable scaffolds using a single optical map. OMGS can be obtained from https://github.com/ucrbioinfo/OMGS


2019 ◽  
Author(s):  
Mats E. Pettersson ◽  
Christina M. Rochus ◽  
Fan Han ◽  
Junfeng Chen ◽  
Jason Hill ◽  
...  

Abstract: The Atlantic herring is a model species for exploring the genetic basis for ecological adaptation, due to its huge population size and extremely low genetic differentiation at selectively neutral loci. However, such studies have so far been hampered by a highly fragmented genome assembly. Here, we deliver a chromosome-level genome assembly based on a hybrid approach combining a de novo PacBio assembly with Hi-C-supported scaffolding. The assembly comprises 26 autosomes with sizes ranging from 12.4 to 33.1 Mb and a total size, in chromosomes, of 726 Mb. The development of a high-resolution linkage map confirmed the global chromosome organization and the linear order of genomic segments along the chromosomes. A comparison of the herring genome assembly with other high-quality assemblies from bony fishes revealed few interchromosomal but frequent intrachromosomal rearrangements. The improved assembly makes the analysis of previously intractable large-scale structural variation more feasible, allowing, for example, the detection of a 7.8 Mb inversion on chromosome 12 underlying ecological adaptation. This supergene shows strong genetic differentiation between populations from the northern and southern parts of the species distribution. The chromosome-based assembly also markedly improves the interpretation of previously detected signals of selection, allowing us to reveal hundreds of independent loci associated with ecological adaptation in the Atlantic herring.


2014 ◽  
Vol 43 (D1) ◽  
pp. D690-D697 ◽  
Author(s):  
G. dos Santos ◽  
A. J. Schroeder ◽  
J. L. Goodman ◽  
V. B. Strelets ◽  
M. A. Crosby ◽  
...  
