ISEA: Iterative Seed-Extension Algorithm for De Novo Assembly Using Paired-End Information and Insert Size Distribution

Author(s):  
Min Li ◽  
Zhongxiang Liao ◽  
Yiming He ◽  
Jianxin Wang ◽  
Junwei Luo ◽  
...  
2015 ◽  
Author(s):  
Kristoffer Sahlin ◽  
Mattias Frånberg ◽  
Lars Arvestad

Insert size distributions from paired read protocols are used for inference in bioinformatic applications such as genome assembly and structural variation detection. However, many of the models that are being used are subject to bias. This bias arises when we assume that all insert sizes within a distribution are equally likely to be observed, when in fact, size matters. These systematic errors exist in popular software even when the assumptions made about data are true. We have previously shown that bias occurs for scaffolders in genome assembly. Here, we generalize the theory and demonstrate that it is applicable in other contexts. We provide examples of bias in state-of the-art software and improve them using our model. One key application of our theory is structural variation detection using read pairs. We show that an incorrect null-hypothesis is commonly used in popular tools and can be corrected using our theory. Furthermore, we approximate the smallest size of indels that are possible to discover given an insert size distribution. Two other applications are inference of insert size distribution on \emph{de novo} genome assemblies and error correction of genome assemblies using mated reads. Our theory is implemented in a tool called GetDistr (\url{https://github.com/ksahlin/GetDistr}).


2014 ◽  
Vol 31 (6) ◽  
pp. 825-833 ◽  
Author(s):  
Junwei Luo ◽  
Jianxin Wang ◽  
Zhen Zhang ◽  
Fang-Xiang Wu ◽  
Min Li ◽  
...  
Keyword(s):  
De Novo ◽  

2012 ◽  
Vol 24 (2) ◽  
pp. 660-675 ◽  
Author(s):  
Anna Stengel ◽  
Irene L. Gügel ◽  
Daniel Hilger ◽  
Birgit Rengstl ◽  
Heinrich Jung ◽  
...  

2021 ◽  
Vol 18 (2) ◽  
pp. 170-175 ◽  
Author(s):  
Haoyu Cheng ◽  
Gregory T. Concepcion ◽  
Xiaowen Feng ◽  
Haowen Zhang ◽  
Heng Li
Keyword(s):  

Author(s):  
Seyoung Mun ◽  
Songmi Kim ◽  
Wooseok Lee ◽  
Keunsoo Kang ◽  
Thomas J. Meyer ◽  
...  

AbstractAdvances in next-generation sequencing (NGS) technology have made personal genome sequencing possible, and indeed, many individual human genomes have now been sequenced. Comparisons of these individual genomes have revealed substantial genomic differences between human populations as well as between individuals from closely related ethnic groups. Transposable elements (TEs) are known to be one of the major sources of these variations and act through various mechanisms, including de novo insertion, insertion-mediated deletion, and TE–TE recombination-mediated deletion. In this study, we carried out de novo whole-genome sequencing of one Korean individual (KPGP9) via multiple insert-size libraries. The de novo whole-genome assembly resulted in 31,305 scaffolds with a scaffold N50 size of 13.23 Mb. Furthermore, through computational data analysis and experimental verification, we revealed that 182 TE-associated structural variation (TASV) insertions and 89 TASV deletions contributed 64,232 bp in sequence gain and 82,772 bp in sequence loss, respectively, in the KPGP9 genome relative to the hg19 reference genome. We also verified structural differences associated with TASVs by comparative analysis with TASVs in recent genomes (AK1 and TCGA genomes) and reported their details. Here, we constructed a new Korean de novo whole-genome assembly and provide the first study, to our knowledge, focused on the identification of TASVs in an individual Korean genome. Our findings again highlight the role of TEs as a major driver of structural variations in human individual genomes.


Author(s):  
Guangtu Gao ◽  
Susana Magadan ◽  
Geoffrey C Waldbieser ◽  
Ramey C Youngblood ◽  
Paul A Wheeler ◽  
...  

Abstract Currently, there is still a need to improve the contiguity of the rainbow trout reference genome and to use multiple genetic backgrounds that will represent the genetic diversity of this species. The Arlee doubled haploid line was originated from a domesticated hatchery strain that was originally collected from the northern California coast. The Canu pipeline was used to generate the Arlee line genome de-novo assembly from high coverage PacBio long-reads sequence data. The assembly was further improved with Bionano optical maps and Hi-C proximity ligation sequence data to generate 32 major scaffolds corresponding to the karyotype of the Arlee line (2 N = 64). It is composed of 938 scaffolds with N50 of 39.16 Mb and a total length of 2.33 Gb, of which ∼95% was in 32 chromosome sequences with only 438 gaps between contigs and scaffolds. In rainbow trout the haploid chromosome number can vary from 29 to 32. In the Arlee karyotype the haploid chromosome number is 32 because chromosomes Omy04, 14 and 25 are divided into six acrocentric chromosomes. Additional structural variations that were identified in the Arlee genome included the major inversions on chromosomes Omy05 and Omy20 and additional 15 smaller inversions that will require further validation. This is also the first rainbow trout genome assembly that includes a scaffold with the sex-determination gene (sdY) in the chromosome Y sequence. The utility of this genome assembly is demonstrated through the improved annotation of the duplicated genome loci that harbor the IGH genes on chromosomes Omy12 and Omy13.


2018 ◽  
Vol 19 (2) ◽  
pp. 520 ◽  
Author(s):  
Le Zhao ◽  
Xinmei Zhang ◽  
Zhongying Qiu ◽  
Yuan Huang
Keyword(s):  

Sign in / Sign up

Export Citation Format

Share Document