HISAT-genotype: Next Generation Genomic Analysis Platform on a Personal Computer

Mapping Intimacies ◽

10.1101/266197 ◽

2018 ◽

Cited By ~ 6

Author(s):

Daehwan Kim ◽

Joseph Paggi ◽

Steven L. Salzberg

Keyword(s):

Reference Genome ◽

Search Algorithm ◽

Genomic Analysis ◽

HLA Typing ◽

Next Generation ◽

Desktop Computer ◽

Human Organ ◽

Human Reference Genome ◽

Sequencing Technologies ◽

Analysis Platform

Abstract

Rapid advances in next-generation sequencing technologies have dramatically changed our ability to perform genome-scale analyses of human genomes. The human reference genome used for most genomic analyses represents only a small number of individuals, limiting its usefulness for genotyping. We designed a novel method, HISAT-genotype, for representing and searching an expanded model of the human reference genome, in which a comprehensive catalogue of known genomic variants and haplotypes is incorporated into the data structure used for searching and alignment. This strategy for representing a population of genomes, along with a very fast and memory-efficient search algorithm, enables more detailed and accurate variant analyses than previous methods. We demonstrate HISAT-genotype’s accuracy for HLA typing, a critical task in human organ transplantation, and for the DNA fingerprinting tests widely used in forensics. In both applications, HISAT-genotype not only improves upon earlier computational methods, but matches or exceeds the accuracy of laboratory-based assays.

One Sentence Summary

HISAT-genotype is a software platform that can genotype all the genes in an individual’s genome within a few hours on a desktop computer.
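To make the idea of a reference "expanded" with known variants concrete, the following toy sketch compares exact matching against a plain linear reference with matching against the same reference augmented by a set of known SNP alleles. It is a hypothetical illustration, not HISAT-genotype's data structure: it uses a brute-force scan instead of an index, models only single-base substitutions (no indels or haplotype linkage), and the sequence, variant, and function names are all invented for the example.

```python
from typing import Dict, List, Set


def build_allowed(reference: str, snps: Dict[int, Set[str]]) -> List[Set[str]]:
    """For each reference position, the set of bases the augmented model accepts.

    Hypothetical helper: `snps` maps a 0-based position to known alternate bases.
    """
    allowed = [{base} for base in reference]
    for pos, alts in snps.items():
        allowed[pos] |= alts  # accept the reference base or any known alternate
    return allowed


def find_matches(read: str, allowed: List[Set[str]]) -> List[int]:
    """Return every 0-based start where the read matches exactly (brute-force scan)."""
    hits = []
    for start in range(len(allowed) - len(read) + 1):
        if all(read[i] in allowed[start + i] for i in range(len(read))):
            hits.append(start)
    return hits


if __name__ == "__main__":
    reference = "ACGTACGTTA"
    snps = {4: {"G"}}  # hypothetical known variant: A>G at position 4
    read = "GTGCG"     # read carrying the alternate allele
    print(find_matches(read, build_allowed(reference, snps)))  # [2]
    print(find_matches(read, [{b} for b in reference]))        # []
```

The read that carries the known alternate allele is placed only when the variant is part of the searched model; against the linear reference alone it fails to match exactly. A real implementation would replace the linear scan with an index over the variant-augmented sequence.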

LoRTE: Detecting transposon-induced genomic variants using low coverage PacBio long read sequences

10.1101/073551 ◽

2016 ◽

Author(s):

Eric Disdero ◽

Jonathan Filée

Keyword(s):

Transposable Elements ◽

Reference Genome ◽

Genomic Analysis ◽

Bioinformatic Tools ◽

Sequencing Technologies ◽

Population Genomic ◽

Long Read ◽

Different Strains ◽

Low Coverage ◽

NCBI BLAST

Abstract

Motivation

Population genomic analysis of transposable elements has greatly benefited from recent advances in sequencing technologies. However, the propensity of transposable elements to nest in highly repeated regions of genomes limits the efficiency of bioinformatic tools when short-read sequencing technology is used.

Results

LoRTE is the first tool able to use PacBio long-read sequences to identify transposon deletions and insertions between a reference genome and the genomes of different strains or populations. Tested against Drosophila melanogaster PacBio datasets, LoRTE appears to be a reliable and broadly applicable tool for studying the dynamic and evolutionary impact of transposable elements using low-coverage, long-read sequences.

Availability and Implementation

LoRTE is available at http://www.egce.cnrs-gif.fr/?p=6422. It is written in Python 2.7 and requires only the NCBI BLAST+ package. LoRTE can be used on a standard computer with limited RAM resources and reasonable running time even with large [email protected]

Illumina TruSeq synthetic long-reads empower de novo assembly and resolve complex, highly repetitive transposable elements

10.1101/001834 ◽

2014 ◽

Cited By ~ 1

Author(s):

Rajiv C McCoy ◽

Ryan W Taylor ◽

Timothy A Blauwkamp ◽

Joanna L Kelley ◽

Michael Kertesz ◽

...

Keyword(s):

Transposable Elements ◽

Reference Genome ◽

De Novo ◽

Model Organism ◽

Genomic Analysis ◽

High Sequence Identity ◽

Current Reference ◽

Sequencing Technologies ◽

Long Reads ◽

Whole Genomes

High-throughput DNA sequencing technologies have revolutionized genomic analysis, including the de novo assembly of whole genomes. Nevertheless, assembly of complex genomes remains challenging, in part due to the presence of dispersed repeats, which introduce ambiguity during genome reconstruction. Transposable elements (TEs) can be particularly problematic, especially TE families that exhibit high sequence identity or high copy number, or that occur in complex genomic arrangements. While TEs strongly affect genome function and evolution, most current de novo assembly approaches cannot resolve long, identical, and abundant families of TEs. Here, we applied a novel Illumina technology called TruSeq synthetic long-reads, which are generated through highly parallel library preparation and local assembly of short-read data and achieve lengths of 1.5-18.5 Kbp with an extremely low error rate (∼0.03% per base). To test the utility of this technology, we sequenced and assembled the genome of the model organism Drosophila melanogaster (reference genome strain y; cn, bw, sp), achieving an N50 contig size of 69.7 Kbp and covering 96.9% of the euchromatic chromosome arms of the current reference genome. TruSeq synthetic long-read technology enables placement of individual TE copies in their proper genomic locations as well as accurate reconstruction of TE sequences. We entirely recovered and accurately placed 4,229 (77.8%) of the 5,434 annotated transposable elements with perfect identity to the current reference genome. As TEs are ubiquitous features of the genomes of many species, TruSeq synthetic long-reads, and likely other methods that generate long reads, offer a powerful approach to improve de novo assemblies of whole genomes.
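Two of the headline figures above are standard assembly statistics. The sketch below shows one common definition of N50 (the length L such that contigs of length at least L together cover at least half of the total assembled length) and re-derives the 77.8% recovery figure from the counts given in the abstract. The contig lengths in the usage example are hypothetical and are not data from the study; the function names are illustrative only.

```python
from typing import Sequence


def n50(lengths: Sequence[int]) -> int:
    """Largest length L such that contigs of length >= L cover >= half the total length."""
    if not lengths:
        raise ValueError("need at least one contig")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: the full sum always covers half the total")


def percent(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


if __name__ == "__main__":
    contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]  # hypothetical contig lengths (Kbp)
    print(n50(contigs))            # 8
    print(percent(4229, 5434))     # 77.8, matching the abstract
```

Sorting in descending order and stopping at the first contig that brings the cumulative length to half the total is the usual way N50 is computed; some tools instead report N50 relative to an expected genome size (NG50), which this sketch does not do.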

OR22 Towards clinical NGS HLA-typing: A performance comparison of next-generation sequencing technologies for DNA HLA typing in a clinical diagnostic environment

Human Immunology ◽

10.1016/j.humimm.2015.07.014 ◽

2015 ◽

Vol 76 ◽

pp. 7

Author(s):

Karen Sherwood ◽

Jennifer Beckrud ◽

Lenka Allan ◽

Alex Lindell ◽

Ali Crawford ◽

...

Keyword(s):

Next Generation Sequencing ◽

Performance Comparison ◽

HLA Typing ◽

Next Generation ◽

Sequencing Technologies ◽

Clinical Diagnostic ◽

Generation Sequencing ◽

A Performance

Discovery of common sequences absent in the human reference genome using pooled samples from next generation sequencing

BMC Genomics ◽

10.1186/1471-2164-15-685 ◽

2014 ◽

Vol 15 (1) ◽

pp. 685 ◽

Cited By ~ 15

Author(s):

Yu Liu ◽

Mehmet Koyutürk ◽

Sean Maxwell ◽

Min Xiang ◽

Martina Veigl ◽

...

Keyword(s):

Next Generation Sequencing ◽

Reference Genome ◽

Next Generation ◽

Human Reference Genome ◽

Pooled Samples ◽

Generation Sequencing

A low-cost platform suitable for sequencing-based recovery of natural variation in understudied plants

10.1101/2020.06.24.169276 ◽

2020 ◽

Author(s):

Rachel Howard-Till ◽

Claudia E. Osorio ◽

Bradley J. Till

Keyword(s):

Next Generation Sequencing ◽

Low Cost ◽

Genomic Analysis ◽

DNA Quantification ◽

Cultivated Plants ◽

Next Generation ◽

Documentation System ◽

Sequencing Technologies ◽

Do It Yourself ◽

Generation Sequencing

Abstract

Genetic characterization of wild and cultivated plants provides valuable knowledge for conservation and agriculture. DNA sequencing technologies are improving and costs are dropping. Yet analysis of many species is hindered because they grow in regions that lack infrastructure for advanced molecular biology. We developed and adapted low-cost methods that address these issues. Tissue is collected and stored in silica gel, avoiding the need for liquid nitrogen and freezers. We have optimized low-cost home-made DNA extraction to increase yields, reduce costs, and produce DNA suitable for next-generation sequencing. We also describe how to build a gel documentation system for DNA quantification. As a proof of principle, we use these methods to evaluate wild Berberis darwinii, native to Southern Chile.

Method summary

We describe a suite of low-cost do-it-yourself methods for field collection of plant tissues, extraction of genomic DNA suitable for next-generation sequencing, and home-made agarose gel documentation suitable for DNA quantification. These methods enable the collection and preparation of samples for genomic analysis in regions with limited infrastructure.

Sequence Alignment Through the Looking Glass

10.1101/256859 ◽

2018 ◽

Author(s):

Raja Appuswamy ◽

Jacques Fellay ◽

Nimisha Chaturvedi

Keyword(s):

Data Analysis ◽

Sequence Alignment ◽

Reference Genome ◽

State Of The Art ◽

Genomic Data ◽

Next Generation ◽

Sequencing Technologies ◽

Alignment Algorithms ◽

Genomic Data Analysis ◽

Looking Glass

Abstract

Rapid advances in sequencing technologies are producing genomic data on an unprecedented scale. The first, and often one of the most time-consuming, steps of genomic data analysis is sequence alignment, in which sequenced reads must be aligned to a reference genome. Several years of research on alignment algorithms have led to the development of several state-of-the-art sequence aligners that can map tens of thousands of reads per second.

In this work, we answer the question “How do sequence aligners utilize modern processors?” We examine four state-of-the-art aligners running on an Intel processor and find that all of them leave the processor substantially underutilized. We perform an in-depth microarchitectural analysis to explore the interaction between aligner software and processor hardware. We identify bottlenecks that lead to processor underutilization and discuss the implications of our analysis for next-generation sequence aligner design.