HISAT-genotype: Next Generation Genomic Analysis Platform on a Personal Computer

2018 ◽  
Author(s):  
Daehwan Kim ◽  
Joseph Paggi ◽  
Steven L. Salzberg

Abstract: Rapid advances in next-generation sequencing technologies have dramatically changed our ability to perform genome-scale analyses of human genomes. The human reference genome used for most genomic analyses represents only a small number of individuals, limiting its usefulness for genotyping. We designed a novel method, HISAT-genotype, for representing and searching an expanded model of the human reference genome, in which a comprehensive catalogue of known genomic variants and haplotypes is incorporated into the data structure used for searching and alignment. This strategy for representing a population of genomes, along with a very fast and memory-efficient search algorithm, enables more detailed and accurate variant analyses than previous methods. We demonstrate HISAT-genotype’s accuracy for HLA typing, a critical task in human organ transplantation, and for the DNA fingerprinting tests widely used in forensics. In both applications, HISAT-genotype not only improves upon earlier computational methods, but matches or exceeds the accuracy of laboratory-based assays.

One Sentence Summary: HISAT-genotype is a software platform that has the ability to genotype all the genes in an individual’s genome within a few hours on a desktop computer.

2016 ◽  
Author(s):  
Eric Disdero ◽  
Jonathan Filée

Abstract

Motivation: Population genomic analysis of transposable elements has greatly benefited from recent advances in sequencing technologies. However, the propensity of transposable elements to nest in highly repeated regions of genomes limits the efficiency of bioinformatic tools when short-read sequencing technology is used.

Results: LoRTE is the first tool able to use PacBio long read sequences to identify transposon deletions and insertions between a reference genome and genomes of different strains or populations. Tested against Drosophila melanogaster PacBio datasets, LoRTE appears to be a reliable and broadly applicable tool to study the dynamic and evolutionary impact of transposable elements using low coverage, long read sequences.

Availability and Implementation: LoRTE is available at http://www.egce.cnrs-gif.fr/?p=6422. It is written in Python 2.7 and only requires the NCBI BLAST+ package. LoRTE can be used on a standard computer with limited RAM resources and reasonable running time even with large [email protected]


2014 ◽  
Author(s):  
Rajiv C McCoy ◽  
Ryan W Taylor ◽  
Timothy A Blauwkamp ◽  
Joanna L Kelley ◽  
Michael Kertesz ◽  
...  

High-throughput DNA sequencing technologies have revolutionized genomic analysis, including the de novo assembly of whole genomes. Nevertheless, assembly of complex genomes remains challenging, in part due to the presence of dispersed repeats which introduce ambiguity during genome reconstruction. Transposable elements (TEs) can be particularly problematic, especially for TE families exhibiting high sequence identity, high copy number, or present in complex genomic arrangements. While TEs strongly affect genome function and evolution, most current de novo assembly approaches cannot resolve long, identical, and abundant families of TEs. Here, we applied a novel Illumina technology called TruSeq synthetic long-reads, which are generated through highly parallel library preparation and local assembly of short read data and achieve lengths of 1.5-18.5 Kbp with an extremely low error rate (∼0.03% per base). To test the utility of this technology, we sequenced and assembled the genome of the model organism Drosophila melanogaster (reference genome strain y;cn,bw,sp), achieving an N50 contig size of 69.7 Kbp and covering 96.9% of the euchromatic chromosome arms of the current reference genome. TruSeq synthetic long-read technology enables placement of individual TE copies in their proper genomic locations as well as accurate reconstruction of TE sequences. We entirely recovered and accurately placed 4,229 (77.8%) of the 5,434 annotated transposable elements with perfect identity to the current reference genome. As TEs are ubiquitous features of genomes of many species, TruSeq synthetic long-reads, and likely other methods that generate long reads, offer a powerful approach to improve de novo assemblies of whole genomes.
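
For readers unfamiliar with the N50 statistic quoted in this abstract, the following is a minimal sketch of how it is conventionally computed: the contig length at which the cumulative sum of lengths, taken from longest to shortest, first reaches half of the total assembly length. The function name `n50` and the toy contig lengths are hypothetical illustrations, not data or code from the paper. The last line only re-derives the 77.8% recovery figure from the two counts stated in the abstract.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (conventional definition)."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("need at least one contig length")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        # The first contig that pushes the running sum to half the total is the N50.
        if running >= half_total:
            return length


# Hypothetical toy assembly (not from the paper): total length 300, half is 150.
toy_lengths = [100, 80, 60, 40, 20]
toy_n50 = n50(toy_lengths)

# Recovery percentage recomputed from the counts given in the abstract.
recovered_pct = round(100 * 4229 / 5434, 1)
```

With the toy lengths, the running sum reaches 180 after the second-longest contig, so `toy_n50` is 80. A higher N50 generally indicates a more contiguous assembly, which is why the abstract reports it alongside reference coverage.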


BMC Genomics ◽  
2014 ◽  
Vol 15 (1) ◽  
pp. 685 ◽  
Author(s):  
Yu Liu ◽  
Mehmet Koyutürk ◽  
Sean Maxwell ◽  
Min Xiang ◽  
Martina Veigl ◽  
...  

2020 ◽  
Author(s):  
Rachel Howard-Till ◽  
Claudia E. Osorio ◽  
Bradley J. Till

Abstract: Genetic characterization of wild and cultivated plants provides valuable knowledge for conservation and agriculture. DNA sequencing technologies are improving and costs are dropping. Yet, analysis of many species is hindered because they grow in regions that lack infrastructure for advanced molecular biology. We developed and adapted low-cost methods that address these issues. Tissue is collected and stored in silica-gel, avoiding the need for liquid nitrogen and freezers. We have optimized low-cost home-made DNA extraction to increase yields, reduce costs, and produce DNA suitable for next generation sequencing. We also describe how to build a gel documentation system for DNA quantification. As a proof of principle, we use these methods to evaluate wild Berberis darwinii, native to Southern Chile.

Method summary: We describe a suite of low-cost do-it-yourself methods for field collection of plant tissues, extraction of genomic DNA suitable for next generation sequencing, and home-made agarose gel documentation suitable for DNA quantification. These methods enable the collection and preparation of samples for genomic analysis in regions with limited infrastructure.


2018 ◽  
Author(s):  
Raja Appuswamy ◽  
Jacques Fellay ◽  
Nimisha Chaturvedi

Abstract: Rapid advances in sequencing technologies are producing genomic data on an unprecedented scale. The first, and often one of the most time consuming, step of genomic data analysis is sequence alignment, where sequenced reads must be aligned to a reference genome. Several years of research on alignment algorithms has led to the development of several state-of-the-art sequence aligners that can map tens of thousands of reads per second.

In this work, we answer the question “How do sequence aligners utilize modern processors?” We examine four state-of-the-art aligners running on an Intel processor and identify that all aligners leave the processor substantially underutilized. We perform an in-depth microarchitectural analysis to explore the interaction between aligner software and processor hardware. We identify bottlenecks that lead to processor underutilization and discuss the implications of our analysis on next-generation sequence aligner design.


2009 ◽  
Vol 11 (1) ◽  
pp. 31-46 ◽  
Author(s):  
Michael L. Metzker
