Rapid Multiplexed Genotyping of Simple Tandem Repeats using Capture and High-Throughput Sequencing

Audrey Guilmatre; Gareth Highnam; Christelle Borel; David Mittelman; Andrew J. Sharp

doi:10.1002/humu.22359


Human Mutation, 2013, Vol 34 (9), pp. 1304-1311

Cited By ~ 20


Keyword(s):

High Throughput; Tandem Repeats; High Throughput Sequencing; Simple Tandem Repeats


GtTR: Bayesian estimation of absolute tandem repeat copy number using sequence capture and high throughput sequencing

doi:10.1101/246108

2018

Cited By ~ 1

Devika Ganesamoorthy; Minh Duc Cao; Tania Duarte; Wenhan Chen; Lachlan Coin

Keyword(s):

High Throughput; Tandem Repeat; Copy Number; Tandem Repeats; High Throughput Sequencing; Sequence Data; Complex Diseases; Sequencing Analysis; Reference Dataset; Long Read

Abstract

Background: Tandem repeats make up a significant proportion of the human genome, including coding and regulatory regions. Because of their repetitive and unstable nature, they are highly prone to repeat-number variation and nucleotide mutation, making them a major source of genomic variation between individuals. Despite recent advances in high-throughput sequencing, analysis of tandem repeats in the context of complex diseases is still hindered by technical limitations.

Methods: We report a novel targeted sequencing approach that allows simultaneous analysis of hundreds of repeats. We developed a Bayesian algorithm, GtTR, which combines information from a reference long-read dataset with a short-read counting approach to genotype tandem repeats at population scale. PCR sizing analysis was used for validation.

Results: We used a PacBio long-read sequenced sample to generate a reference tandem repeat genotype dataset with an average absolute deviation of 13% from PCR sizing results. Using this reference dataset, GtTR estimated VNTR copy number with 68% and 83% accuracy within 95% highest posterior density (HPD) intervals for capture sequence data and 200X WGS data, respectively, improving to 87% and 94% with use of a PCR reference. Genotype resolution increases with sequencing depth: the median 95% HPD interval lies within 25%, 14%, 12% and 8% of its midpoint copy number value for 30X WGS, 200X WGS, 395X capture and 800X capture sequence data, respectively. We validated nine targets by PCR sizing analysis, and genotype estimates from sequencing correlated well with PCR results.

Conclusions: The genotyping approach described here is a new, cost-effective method for exploring a previously unrecognized class of repeat variation in population-level GWAS of complex diseases. Accuracy can be improved further by improving the accuracy of the reference dataset.
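The abstract does not specify GtTR's likelihood model. As a rough illustration of how a short-read counting approach can produce a copy-number posterior and a 95% HPD interval, the sketch below assumes (hypothetically) that reads overlapping a repeat follow a Poisson distribution whose mean scales linearly with copy number, calibrated from a reference sample of known copy number, with a uniform prior over integer copy numbers. The function names and the model are illustrative assumptions, not GtTR's implementation.

```python
import math


def log_poisson(k: int, lam: float) -> float:
    """Log-probability of observing k reads under a Poisson with mean lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def copy_number_posterior(sample_reads: int, ref_reads: int, ref_copies: int,
                          max_copies: int = 200) -> dict[int, float]:
    """Posterior over integer copy numbers 1..max_copies (hypothetical model).

    Assumes reads overlapping the repeat scale linearly with copy number, so a
    reference sample with known copy number gives the expected reads per copy.
    Depth normalisation between samples is omitted for brevity.
    """
    reads_per_copy = ref_reads / ref_copies
    copies = range(1, max_copies + 1)
    # With a uniform prior, the log-posterior equals the log-likelihood up to a constant.
    log_post = [log_poisson(sample_reads, c * reads_per_copy) for c in copies]
    peak = max(log_post)  # subtract the maximum before exp() for numerical stability
    weights = [math.exp(lp - peak) for lp in log_post]
    total = sum(weights)
    return {c: w / total for c, w in zip(copies, weights)}


def hpd_interval(posterior: dict[int, float], mass: float = 0.95) -> tuple[int, int]:
    """Smallest set of most probable copy numbers holding `mass` posterior probability.

    The posterior above is unimodal in copy number, so this set is contiguous and
    can be reported as (lowest, highest) copy number.
    """
    chosen, accumulated = [], 0.0
    for c, p in sorted(posterior.items(), key=lambda kv: kv[1], reverse=True):
        chosen.append(c)
        accumulated += p
        if accumulated >= mass:
            break
    return min(chosen), max(chosen)


if __name__ == "__main__":
    post = copy_number_posterior(sample_reads=600, ref_reads=400, ref_copies=20)
    lo, hi = hpd_interval(post)
    mode = max(post, key=post.get)
    # Half-width of the interval relative to its midpoint copy number.
    print(mode, (lo, hi), (hi - lo) / (hi + lo))
```

In the demo, the reference calibrates 20 reads per copy, so 600 sample reads put the posterior mode at 30 copies. The ratio `(hi - lo) / (hi + lo)` is the interval's half-width relative to its midpoint, the kind of resolution measure the abstract reports; deeper sequencing narrows the Poisson likelihood and shrinks that ratio.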


GrigoraSNPs: Optimized HTS DNA Forensic SNP Analysis

doi:10.1101/173716

2017

Cited By ~ 3

Darrell O. Ricke; Anna Shcherbina; Adam Michaleas; Philip Fremont-Smith

Keyword(s):

High Throughput; Tandem Repeats; High Throughput Sequencing; DNA Analysis; Sequence Data; SNP Analysis; Analysis Pipeline; Sequencing Technologies; High Throughput DNA Sequencing

Abstract

High-throughput DNA sequencing technologies enable improved characterization of forensic DNA samples, giving greater insight into the DNA contributor(s). Current DNA forensics techniques rely on allele sizing of short tandem repeats by capillary electrophoresis, whereas high-throughput sequencing (HTS) enables forensic samples to be characterized at large numbers of single nucleotide polymorphism (SNP) loci. The slowest computational component of the DNA forensics analysis pipeline is the characterization of raw sequence data. This paper optimizes the SNP-calling module of the DNA analysis pipeline, with runtimes that scale linearly with the number of HTS sequences (patent pending) [1]. GrigoraSNPs can analyze 100 million reads in less than 5 minutes using 3 threads on a 4.0 GHz Intel i7-6700K laptop CPU.
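The abstract does not describe how GrigoraSNPs achieves linear scaling, and its method is patent pending. One generic way to make SNP allele counting linear in the number of reads is a single pass over each read with constant-time hash lookups of SNP-spanning k-mers. The sketch below illustrates that idea under stated assumptions: exact k-mer matches centred on each SNP, a hypothetical `panel` format and `snp_demo` identifier, and naive depth and allele-fraction thresholds. It is not GrigoraSNPs' algorithm.

```python
from collections import Counter, defaultdict

# Complement table for building reverse complements.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def build_allele_index(panel: dict, flank: int) -> dict[str, tuple[str, str]]:
    """Map each SNP-centred k-mer (both strands) to (snp_id, allele).

    `panel` is a hypothetical format: snp_id -> (left_flank, ref, alt, right_flank),
    with k = 2 * flank + 1 so the SNP base sits in the middle of the k-mer.
    """
    index = {}
    for snp_id, (left, ref, alt, right) in panel.items():
        for allele in (ref, alt):
            kmer = left[-flank:] + allele + right[:flank]
            index[kmer] = (snp_id, allele)
            index[revcomp(kmer)] = (snp_id, allele)
    return index


def count_alleles(reads, index: dict, k: int) -> dict[str, Counter]:
    """One pass over every read; each k-mer is a single hash lookup.

    For bounded read length, total work grows linearly with the number of reads.
    """
    counts = defaultdict(Counter)
    for read in reads:
        for i in range(len(read) - k + 1):
            hit = index.get(read[i:i + k])
            if hit is not None:
                snp_id, allele = hit
                counts[snp_id][allele] += 1
    return counts


def call_genotype(allele_counts: Counter, ref: str, alt: str,
                  min_depth: int = 10, min_minor_fraction: float = 0.2):
    """Naive threshold call; both thresholds are illustrative assumptions."""
    depth = allele_counts[ref] + allele_counts[alt]
    if depth < min_depth:
        return None  # too few reads to call
    alt_fraction = allele_counts[alt] / depth
    if alt_fraction < min_minor_fraction:
        return f"{ref}/{ref}"
    if alt_fraction > 1 - min_minor_fraction:
        return f"{alt}/{alt}"
    return f"{ref}/{alt}"


if __name__ == "__main__":
    panel = {"snp_demo": ("ACGTACGTAC", "A", "G", "TTGCATGCAA")}
    flank = 5
    index = build_allele_index(panel, flank)
    reads = ["ACGTACGTACATTGCATGCAA", "ACGTACGTACGTTGCATGCAA"]
    reads.append(revcomp(reads[1]))  # same alt-allele read from the opposite strand
    counts = count_alleles(reads, index, k=2 * flank + 1)
    print(dict(counts["snp_demo"]))
```

Keying the index on whole k-mers keeps each lookup constant-time, which is what makes the scan linear in the number of reads. Exact matching ignores sequencing errors near the SNP, which a real caller would need to tolerate.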
