Alternative applications of whole genome de novo assembly in animal genomics

2017 ◽  
Author(s):  
◽  
Lynsey Whitacre

Genome sequencing is the process of determining the sequence of deoxyribonucleic acid (DNA) residues that comprise the genome, the complete set of genetic material of an organism or individual. Downstream analysis of genome sequencing data requires that short reads be compiled into contiguous sequences. The methods for doing so, called de novo assembly, are based on statistical methods and graph theory. In addition to genome assembly, the research presented in this dissertation demonstrates alternative uses of these methods. Using these novel approaches, de novo assembly algorithms can be utilized to gain insight into commensal and parasitic organisms of livestock, genes containing candidate mutations for genetic defects, and population-level and species-level variation in poorly studied organisms.
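The abstract above says that de novo assembly rests on graph theory but does not name a specific algorithm. As an illustration only, the following is a minimal sketch of one common graph-based approach, a de Bruijn graph built from k-mers. The function names, the toy reads, and the choice of k are all hypothetical; sequencing errors, reverse complements, and repeats are ignored.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def extend_contig(graph, start):
    """Walk from `start` for as long as the path is unambiguous."""
    # Count incoming edges so the walk stops where two paths merge.
    in_degree = defaultdict(int)
    for successors in graph.values():
        for node in successors:
            in_degree[node] += 1

    contig = start
    node = start
    visited = {start}
    # Continue only while the current node has exactly one successor.
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in visited or in_degree[nxt] != 1:
            break  # cycle or merge point: stop rather than guess
        contig += nxt[-1]
        visited.add(nxt)
        node = nxt
    return contig


# Toy, error-free reads (hypothetical data) that overlap end to end.
reads = ["ATGGCGT", "GGCGTGC", "GTGCAAT"]
graph = build_de_bruijn(reads, k=4)
contig = extend_contig(graph, "ATG")
print(contig)
```

Real assemblers add error correction, coverage-based filtering, and graph simplification on top of this basic idea.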

2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Jian-Jun Jin ◽  
Wen-Bin Yu ◽  
Jun-Bo Yang ◽  
Yu Song ◽  
Claude W. dePamphilis ◽  
...  

GetOrganelle is a state-of-the-art toolkit to accurately assemble organelle genomes from whole genome sequencing data. It recruits organelle-associated reads using a modified “baiting and iterative mapping” approach, conducts de novo assembly, filters and disentangles the assembly graph, and produces all possible configurations of circular organelle genomes. For 50 published plant datasets, we are able to reassemble the circular plastomes from 47 datasets using GetOrganelle. GetOrganelle assemblies are more accurate than published and/or NOVOPlasty-reassembled plastomes as assessed by mapping. We also assemble complete mitochondrial genomes using GetOrganelle. GetOrganelle is freely released under a GPL-3 license (https://github.com/Kinggerm/GetOrganelle).
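The general idea of baiting with iterative read recruitment can be sketched in a few lines. This is not GetOrganelle's implementation or API; it is a simplified illustration that assumes exact shared k-mers stand in for mapping, ignores reverse complements, and uses hypothetical names and toy data throughout.

```python
def kmers(seq, k):
    """Return the set of all k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def recruit_reads(seed, reads, k, max_rounds=10):
    """Iteratively recruit reads that share at least one k-mer with the bait pool.

    The bait pool starts as the k-mers of the seed and grows with every
    recruited read, so later rounds can reach reads far from the seed.
    """
    bait = kmers(seed, k)
    recruited = set()
    for _ in range(max_rounds):
        new = [
            i for i, read in enumerate(reads)
            if i not in recruited and kmers(read, k) & bait
        ]
        if not new:
            break  # converged: no further reads overlap the pool
        for i in new:
            recruited.add(i)
            bait |= kmers(reads[i], k)
    return [reads[i] for i in sorted(recruited)]


# Hypothetical toy data: three reads chain outward from the seed, one is unrelated.
seed = "ATGGCG"
reads = ["GGCGTGCA", "TGCAATCC", "TTTTTTTT", "ATCCGGAA"]
hits = recruit_reads(seed, reads, k=4)
print(hits)
```

In this toy run the second and fourth reads share no k-mer with the seed itself and are reached only in later rounds, which is the point of iterating.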


Genes ◽  
2018 ◽  
Vol 9 (10) ◽  
pp. 486 ◽  
Author(s):  
Adam Ameur ◽  
Huiwen Che ◽  
Marcel Martin ◽  
Ignas Bunikis ◽  
Johan Dahlberg ◽  
...  

The current human reference sequence (GRCh38) is a foundation for large-scale sequencing projects. However, recent studies have suggested that GRCh38 may be incomplete and give a suboptimal representation of specific population groups. Here, we performed a de novo assembly of two Swedish genomes that revealed over 10 Mb of sequences absent from the human GRCh38 reference in each individual. Around 6 Mb of these novel sequences (NS) are shared with a Chinese personal genome. The NS are highly repetitive, have an elevated GC-content, and are primarily located in centromeric or telomeric regions. Up to 1 Mb of NS can be assigned to chromosome Y, and large segments are also missing from GRCh38 at chromosomes 14, 17, and 21. Inclusion of NS into the GRCh38 reference radically improves the alignment and variant calling from short-read whole-genome sequencing data at several genomic loci. A re-analysis of a Swedish population-scale sequencing project yields > 75,000 putative novel single nucleotide variants (SNVs) and removes > 10,000 false positive SNV calls per individual, some of which are located in protein coding regions. Our results highlight that the GRCh38 reference is not yet complete and demonstrate that personal genome assemblies from local populations can improve the analysis of short-read whole-genome sequencing data.


2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Wei Zhou ◽  
Qi Chen ◽  
Xiao-Bing Wang ◽  
Tyler O. Hughes ◽  
Jian-Jun Liu ◽  
...  

An amendment to this paper has been published and can be accessed via a link at the top of the paper.


BMC Genomics ◽  
2014 ◽  
Vol 15 (1) ◽  
pp. 453 ◽  
Author(s):  
Steven A Yates ◽  
Martin T Swain ◽  
Matthew J Hegarty ◽  
Igor Chernukin ◽  
Matthew Lowe ◽  
...  

BMC Genomics ◽  
2011 ◽  
Vol 12 (1) ◽  
Author(s):  
Yanliang Jiang ◽  
Jianguo Lu ◽  
Eric Peatman ◽  
Huseyin Kucuktas ◽  
Shikai Liu ◽  
...  

2017 ◽  
Author(s):  
Adriana Munoz ◽  
Boris Yamrom ◽  
Yoon-ha Lee ◽  
Peter Andrews ◽  
Steven Marks ◽  
...  

Copy number profiling and whole-exome sequencing have allowed us to make remarkable progress in our understanding of the genetics of autism over the past ten years, but major aspects of the genetics remain unresolved. Through whole-genome sequencing, additional types of genetic variants can be observed. These variants are abundant, and determining which are functional is challenging. We have analyzed whole-genome sequencing data from 510 of the Simons Simplex Collection quad families and focused our attention on intronic variants. Within the introns of 546 high-quality autism target genes, we identified 63 de novo indels in the affected siblings and only 37 in the unaffected siblings. The difference of 26 events is significantly larger than expected (p-value = 0.01), and reasonable extrapolation shows that de novo intronic indels can contribute to at least 10% of simplex autism. The significance increases if we restrict the analysis to the half of the autism targets that are intolerant to damaging variants in the normal human population, a half we expect to be even more enriched for autism genes. For these 273 targets we observe 43 and 20 events in affected and unaffected siblings, respectively (p-value = 0.005). There was no significant signal in the number of de novo intronic indels in any of the control sets of genes analyzed. We see no signal from de novo substitutions in the introns of target genes.
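The abstract does not state which statistical test produced its p-values. One simple way to compare event counts between affected and unaffected siblings, shown here purely as an assumption, is an exact one-sided binomial test under the null hypothesis that each de novo indel is equally likely to fall in either sibling. The function name is hypothetical, and the result need not match the reported values exactly.

```python
from math import comb


def binomial_upper_tail(successes, trials, p=0.5):
    """Exact P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, i) * p**i * (1 - p) ** (trials - i)
        for i in range(successes, trials + 1)
    )


# Counts taken from the abstract: (affected, unaffected) de novo intronic indels.
all_targets = (63, 37)         # 546 target genes
intolerant_targets = (43, 20)  # 273 targets intolerant to damaging variants

for label, (affected, unaffected) in [
    ("all targets", all_targets),
    ("intolerant targets", intolerant_targets),
]:
    p_value = binomial_upper_tail(affected, affected + unaffected)
    print(f"{label}: {affected} vs {unaffected}, one-sided p = {p_value:.4f}")
```

A two-sided version, or a test that accounts for family structure, would give different numbers; the sketch only illustrates the kind of comparison being made.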

