ISEA: Iterative Seed-Extension Algorithm for De Novo Assembly Using Paired-End Information and Insert Size Distribution

Min Li; Zhongxiang Liao; Yiming He; Jianxin Wang; Junwei Luo; Yi Pan

doi:10.1109/tcbb.2016.2550433

IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2017, Vol 14 (4), pp. 916-925

doi:10.1109/tcbb.2016.2550433

Cited By ~ 13

Author(s): Min Li, Zhongxiang Liao, Yiming He, Jianxin Wang, Junwei Luo, ...

Keyword(s): Size Distribution, De Novo Assembly, De Novo, Insert Size, Insert Size Distribution

Correcting bias from stochastic insert size in read pair data — applications to structural variation detection and genome assembly

2015

doi:10.1101/023929

Cited By ~ 1

Author(s): Kristoffer Sahlin, Mattias Frånberg, Lars Arvestad

Keyword(s): Size Distribution, Genome Assembly, Structural Variation, De Novo, State Of The Art, Size Distributions, Insert Size, Genome Assemblies, Paired Read, Insert Size Distribution

Insert size distributions from paired read protocols are used for inference in bioinformatic applications such as genome assembly and structural variation detection. However, many of the models that are being used are subject to bias. This bias arises when we assume that all insert sizes within a distribution are equally likely to be observed, when in fact, size matters. These systematic errors exist in popular software even when the assumptions made about data are true. We have previously shown that bias occurs for scaffolders in genome assembly. Here, we generalize the theory and demonstrate that it is applicable in other contexts. We provide examples of bias in state-of-the-art software and improve them using our model. One key application of our theory is structural variation detection using read pairs. We show that an incorrect null-hypothesis is commonly used in popular tools and can be corrected using our theory. Furthermore, we approximate the smallest size of indels that are possible to discover given an insert size distribution. Two other applications are inference of insert size distribution on de novo genome assemblies and error correction of genome assemblies using mated reads. Our theory is implemented in a tool called GetDistr (https://github.com/ksahlin/GetDistr).
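
The bias described in this abstract can be illustrated with a minimal Python sketch. It is not the GetDistr implementation. It assumes a discretised normal library distribution, long contigs on both sides of a gap, and a simple placement count; the function names (`normal_pmf`, `spanning_pmf`, `mean_size`) and all parameter values are hypothetical. The sketch shows that read pairs observed spanning a gap are not a uniform sample of the library, because larger inserts have more ways to span it.

```python
import math


def normal_pmf(mu, sd, lo, hi):
    """Discretised normal over integer insert sizes lo..hi, normalised to sum to 1."""
    weights = {x: math.exp(-0.5 * ((x - mu) / sd) ** 2) for x in range(lo, hi + 1)}
    total = sum(weights.values())
    return {x: w / total for x, w in weights.items()}


def spanning_pmf(library_pmf, gap, read_len):
    """Insert size distribution among pairs that span a gap of the given size.

    Assumption: both flanking contigs are long, so an insert of size x has
    max(0, x - gap - 2 * read_len + 1) start positions that put one read
    entirely on each side of the gap.
    """
    weighted = {
        x: p * max(0, x - gap - 2 * read_len + 1) for x, p in library_pmf.items()
    }
    total = sum(weighted.values())
    if total == 0:
        raise ValueError("no insert in the library is long enough to span the gap")
    return {x: w / total for x, w in weighted.items()}


def mean_size(pmf):
    """Expected insert size under a probability mass function."""
    return sum(x * p for x, p in pmf.items())


# Hypothetical library: mean 500 bp, standard deviation 50 bp, 100 bp reads.
library = normal_pmf(mu=500, sd=50, lo=300, hi=700)
observed = spanning_pmf(library, gap=300, read_len=100)

print("library mean:", mean_size(library))
print("mean among gap-spanning pairs:", mean_size(observed))
```

Under these assumptions the mean among gap-spanning pairs is larger than the library mean, so treating spanning pairs as if they followed the library distribution would misestimate the gap size.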

EPGA: de novo assembly using the distributions of reads and insert size

Bioinformatics, 2014, Vol 31 (6), pp. 825-833

doi:10.1093/bioinformatics/btu762

Cited By ~ 15

Author(s): Junwei Luo, Jianxin Wang, Zhen Zhang, Fang-Xiang Wu, Min Li, ...

Keyword(s): De Novo Assembly, De Novo, Insert Size

Faculty Opinions recommendation of Efficient de novo assembly of single-cell bacterial genomes from short-read data sets.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature, 2011

doi:10.3410/f.13296960.14657061

Author(s): Steven Salzberg

Keyword(s): Single Cell, De Novo Assembly, De Novo, Data Sets, Bacterial Genomes, Short Read

Faculty Opinions recommendation of The sequence and de novo assembly of the giant panda genome.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature, 2010

doi:10.3410/f.2367956.1997054

Author(s): Victoria Prince

Keyword(s): De Novo Assembly, Giant Panda, De Novo

Initial Steps of Photosystem II de Novo Assembly and Preloading with Manganese Take Place in Biogenesis Centers in Synechocystis

The Plant Cell, 2012, Vol 24 (2), pp. 660-675

doi:10.1105/tpc.111.093914

Cited By ~ 54

Author(s): Anna Stengel, Irene L. Gügel, Daniel Hilger, Birgit Rengstl, Heinrich Jung, ...

Keyword(s): Photosystem II, De Novo Assembly, De Novo

Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm

Nature Methods, 2021, Vol 18 (2), pp. 170-175

doi:10.1038/s41592-020-01056-5

Cited By ~ 2

Author(s): Haoyu Cheng, Gregory T. Concepcion, Xiaowen Feng, Haowen Zhang, Heng Li

Keyword(s): De Novo Assembly, De Novo

A study of transposable element-associated structural variations (TASVs) using a de novo-assembled Korean genome

Experimental & Molecular Medicine, 2021

doi:10.1038/s12276-021-00586-y

Author(s): Seyoung Mun, Songmi Kim, Wooseok Lee, Keunsoo Kang, Thomas J. Meyer, ...

Keyword(s): Genome Sequencing, Genome Assembly, De Novo, Personal Genome, Human Populations, Whole Genome, Structural Variations, Insert Size, Human Genomes, Next Generation Sequencing (NGS)

Advances in next-generation sequencing (NGS) technology have made personal genome sequencing possible, and indeed, many individual human genomes have now been sequenced. Comparisons of these individual genomes have revealed substantial genomic differences between human populations as well as between individuals from closely related ethnic groups. Transposable elements (TEs) are known to be one of the major sources of these variations and act through various mechanisms, including de novo insertion, insertion-mediated deletion, and TE–TE recombination-mediated deletion. In this study, we carried out de novo whole-genome sequencing of one Korean individual (KPGP9) via multiple insert-size libraries. The de novo whole-genome assembly resulted in 31,305 scaffolds with a scaffold N50 size of 13.23 Mb. Furthermore, through computational data analysis and experimental verification, we revealed that 182 TE-associated structural variation (TASV) insertions and 89 TASV deletions contributed 64,232 bp in sequence gain and 82,772 bp in sequence loss, respectively, in the KPGP9 genome relative to the hg19 reference genome. We also verified structural differences associated with TASVs by comparative analysis with TASVs in recent genomes (AK1 and TCGA genomes) and reported their details. Here, we constructed a new Korean de novo whole-genome assembly and provide the first study, to our knowledge, focused on the identification of TASVs in an individual Korean genome. Our findings again highlight the role of TEs as a major driver of structural variations in human individual genomes.

Corrigendum to “Transcriptome de novo assembly and analysis of differentially expressed genes related to cytoplasmic male sterility in onion” [Plant Physiol. Biochem. 125 (2018) 35–44]

Plant Physiology and Biochemistry, 2018, Vol 129, pp. 437

doi:10.1016/j.plaphy.2018.06.038

Author(s): Qiaoling Yuan, Ce Song, Luyao Gao, Huihui Zhang, Cuicui Yang, ...

Keyword(s): Cytoplasmic Male Sterility, Male Sterility, Differentially Expressed Genes, De Novo Assembly, De Novo, Differentially Expressed, Onion Plant

A long reads-based de-novo assembly of the genome of the Arlee homozygous line reveals chromosomal rearrangements in rainbow trout

G3 Genes|Genome|Genetics, 2021

doi:10.1093/g3journal/jkab052

Author(s): Guangtu Gao, Susana Magadan, Geoffrey C Waldbieser, Ramey C Youngblood, Paul A Wheeler, ...

Keyword(s): Rainbow Trout, Chromosome Number, Genome Assembly, De Novo Assembly, De Novo, Sequence Data, Structural Variations, High Coverage, Haploid Chromosome Number, Long Reads

Currently, there is still a need to improve the contiguity of the rainbow trout reference genome and to use multiple genetic backgrounds that will represent the genetic diversity of this species. The Arlee doubled haploid line originated from a domesticated hatchery strain that was originally collected from the northern California coast. The Canu pipeline was used to generate the Arlee line genome de-novo assembly from high coverage PacBio long-reads sequence data. The assembly was further improved with Bionano optical maps and Hi-C proximity ligation sequence data to generate 32 major scaffolds corresponding to the karyotype of the Arlee line (2N = 64). It is composed of 938 scaffolds with N50 of 39.16 Mb and a total length of 2.33 Gb, of which ∼95% was in 32 chromosome sequences with only 438 gaps between contigs and scaffolds. In rainbow trout the haploid chromosome number can vary from 29 to 32. In the Arlee karyotype the haploid chromosome number is 32 because chromosomes Omy04, 14 and 25 are divided into six acrocentric chromosomes. Additional structural variations that were identified in the Arlee genome included the major inversions on chromosomes Omy05 and Omy20 and 15 additional smaller inversions that will require further validation. This is also the first rainbow trout genome assembly that includes a scaffold with the sex-determination gene (sdY) in the chromosome Y sequence. The utility of this genome assembly is demonstrated through the improved annotation of the duplicated genome loci that harbor the IGH genes on chromosomes Omy12 and Omy13.

De Novo Assembly and Characterization of the Xenocatantops brachycerus Transcriptome

International Journal of Molecular Sciences, 2018, Vol 19 (2), pp. 520

doi:10.3390/ijms19020520

Cited By ~ 5

Author(s): Le Zhao, Xinmei Zhang, Zhongying Qiu, Yuan Huang

Keyword(s): De Novo Assembly, De Novo
