A massively parallel strategy for STR marker development, capture, and genotyping

Mapping Intimacies ◽

10.1101/063727 ◽

2016 ◽

Author(s):

Logan Kistler ◽

Stephen M. Johnson ◽

Mitchell T. Irwin ◽

Edward E. Louis ◽

Aakrosh Ratan ◽

...

Keyword(s):

Reference Genome ◽

Massively Parallel Sequencing ◽

Genetic Research ◽

Test Group ◽

Massively Parallel ◽

Fecal Dna ◽

Marker Development ◽

Sequencing Data ◽

Str Loci ◽

Link Type

Abstract: Short tandem repeat (STR, or microsatellite) variants are highly polymorphic markers that facilitate powerful, high-precision population genetic analyses. STRs are especially valuable in conservation and ecological genetic research, yielding detailed information on population structure and short-term demographic flux. However, STR marker development and analysis by conventional PCR-based methods impose a workflow bottleneck and are suboptimal for noninvasive sampling strategies such as fecal DNA recovery. While massively parallel sequencing has not previously been leveraged for scalable, efficient STR recovery, here we present a pipeline for developing STR markers directly from high-throughput shotgun sequencing data without requiring a reference genome assembly, and a methodological approach for highly parallel recovery of enriched STR loci. We first employed our approach to design and capture a panel of 5,000 STR loci from a test group of diademed sifakas (Propithecus diadema, n=3), endangered Malagasy rainforest lemurs, and we report extremely efficient recovery of targeted loci: 97.3-99.6% of STRs were characterized with ≥10x non-redundant coverage. Second, we tested our STR capture strategy on a P. diadema fecal DNA preparation, and report robust initial results and methodological suggestions for future implementations. In addition to STR targets, this approach also generates large, genome-wide single nucleotide polymorphism (SNP) panels from regions flanking the STR loci. Our method provides a cost-effective and highly scalable solution for rapid recovery of large STR and SNP datasets in any species without need for a reference genome, and can be used even with suboptimal DNA, which is more easily acquired in conservation and ecological genetic studies.

Data Deposition: Raw sequencing data are available under Study Accession numbers SRP073167 (genomic shotgun data for Oberon and Tatiana) and SRP076225 (targeted re-sequencing data) from the NCBI Sequence Read Archive. BaitSTR software is available at GitHub (core BaitSTR programs: https://github.com/aakrosh/BaitSTR; BaitSTR_type.pl companion script for genotyping and block manipulation: https://github.com/lkistler/BaitSTR_type).
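To make the idea of finding STRs directly in shotgun reads more concrete, the following is a minimal sketch, not the BaitSTR implementation. It assumes perfect repeats only, and the function names, the 2-6 bp motif length, and the five-copy minimum are illustrative choices that do not come from the abstract. A second helper shows how a recovery figure such as "fraction of loci with ≥10x coverage" could be computed from per-locus depths.

```python
import re

# Illustrative thresholds (assumptions, not from the paper):
# motifs of 2-6 bp repeated at least 5 times in a row.
STR_PATTERN = re.compile(r"([ACGT]{2,6}?)\1{4,}")


def find_strs(read):
    """Return (start, motif, copies) for each perfect tandem repeat in a read."""
    hits = []
    for match in STR_PATTERN.finditer(read.upper()):
        motif = match.group(1)
        # Skip homopolymer runs such as AAAAAAAAAA, which the pattern
        # would otherwise report as a dinucleotide repeat.
        if len(set(motif)) == 1:
            continue
        copies = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, copies))
    return hits


def fraction_at_depth(depths, min_depth=10):
    """Fraction of loci whose coverage depth is at least min_depth."""
    if not depths:
        return 0.0
    return sum(1 for d in depths if d >= min_depth) / len(depths)
```

For example, `find_strs("TTG" + "AC" * 6 + "AGGT")` reports a single AC repeat with six copies starting at offset 3. A real pipeline would also need to handle imperfect repeats, sequencing errors, and flanking-sequence extraction, none of which this sketch attempts.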


Estimating number of contributors in massively parallel sequencing data of STR loci

Forensic Science International Genetics ◽

10.1016/j.fsigen.2018.09.007 ◽

2019 ◽

Vol 38 ◽

pp. 15-22 ◽

Cited By ~ 4

Author(s):

Brian A Young ◽

Katherine Butler Gettings ◽

Bruce McCord ◽

Peter M. Vallone

Keyword(s):

Massively Parallel Sequencing ◽

Massively Parallel ◽

Sequencing Data ◽

Parallel Sequencing ◽

Str Loci


Noninvasive Prenatal Diagnosis of Monogenic Diseases by Targeted Massively Parallel Sequencing of Maternal Plasma: Application to β-Thalassemia

Clinical Chemistry ◽

10.1373/clinchem.2012.189589 ◽

2012 ◽

Vol 58 (10) ◽

pp. 1467-1475 ◽

Cited By ~ 112

Author(s):

Kwan-Wood G Lam ◽

Peiyong Jiang ◽

Gary J W Liao ◽

K C Allen Chan ◽

Tak Y Leung ◽

...

Keyword(s):

Prenatal Diagnosis ◽

Massively Parallel Sequencing ◽

Globin Gene ◽

Maternal Plasma ◽

Massively Parallel ◽

Sequencing Data ◽

Plasma Dna ◽

Parallel Sequencing ◽

Noninvasive Prenatal Diagnosis ◽

Monogenic Diseases

Abstract

Background: A genomewide genetic and mutational profile of a fetus was recently determined via deep sequencing of maternal plasma DNA. This technology could have important applications for noninvasive prenatal diagnosis (NIPD) of many monogenic diseases. Relative haplotype dosage (RHDO) analysis, a core step of this procedure, would allow one to elucidate the maternally inherited half of the fetal genome. For clinical applications, the cost and complexity of data analysis might be reduced via targeted application of this approach to selected genomic regions containing disease-causing genes. There is thus a need to explore the feasibility of performing RHDO analysis in a targeted manner.

Methods: We performed target enrichment by using solution-phase hybridization followed by massively parallel sequencing of the β-globin gene region in 2 families undergoing prenatal diagnosis for β-thalassemia. We used digital PCR strategies to physically deduce parental haplotypes. Finally, we performed RHDO analysis with target-enriched sequencing data and parental haplotypes to reveal the β-thalassemic status for the fetuses.

Results: A mean sequencing depth of 206-fold was achieved in the β-globin gene region by targeted sequencing of maternal plasma DNA. RHDO analysis was successful for the sequencing data obtained from the target-enriched samples, including a region in one of the families in which the parents had similar haplotype structures. Data analysis revealed that both fetuses were heterozygous carriers of β-thalassemia.

Conclusions: Targeted sequencing of maternal plasma DNA for NIPD of monogenic diseases is feasible.


SICTIN: Rapid footprinting of massively parallel sequencing data

BioData Mining ◽

10.1186/1756-0381-3-4 ◽

2010 ◽

Vol 3 (1) ◽

Cited By ~ 7

Author(s):

Stefan Enroth ◽

Robin Andersson ◽

Claes Wadelius ◽

Jan Komorowski

Keyword(s):

Massively Parallel Sequencing ◽

Massively Parallel ◽

Sequencing Data ◽

Parallel Sequencing


High polymorphism detected by massively parallel sequencing of autosomal STRs using old blood samples from a Chinese Han population

Scientific Reports ◽

10.1038/s41598-019-55282-9 ◽

2019 ◽

Vol 9 (1) ◽

Author(s):

Wenshen Dai ◽

Yajiao Pan ◽

Xiaochen Sun ◽

Riga Wu ◽

Luo Li ◽

...

Keyword(s):

Massively Parallel Sequencing ◽

Chinese Han Population ◽

Massively Parallel ◽

Library Preparation ◽

Chinese Han ◽

Discrimination Power ◽

Blood Samples ◽

Parallel Sequencing ◽

Han Population ◽

Str Loci

Abstract: The development of massively parallel sequencing (MPS) has quickly changed forensic short tandem repeat (STR) genotyping. By providing detailed sequence information, MPS technology may be used as an alternative or additional method to overcome the limitations of capillary electrophoresis-based STR profiling. Most current NGS processes are labour-intensive with regard to library preparation and require high-quality DNA template. In this study, a 16-plex STR typing system (SeqType®R16) was used to achieve direct library preparation without DNA extraction and adaptor ligation. The efficiency of this system was tested in 601 individuals, including 593 old blood samples from the Chinese Han population and eight positive controls. It took approximately 4 hours for library preparation, including blood direct multiplex PCR (1.5 hours), mixing of the product (15 minutes), single tube purification (2 hours) and quantification (15 minutes). The results showed that MPS presented a broader allele range and higher discrimination power. Except for FGA and D19S433, the allele number almost doubled or more than doubled at all complex STR loci and simple STR loci, including D13S317, D16S539, D5S818, and D7S820. The range of discrimination power increased from 0.8008–0.9572 to 0.8401–0.9753, and the culminated matching probability decreased from 1.7 × 10⁻¹⁵ to 1.1 × 10⁻¹⁷.
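The discrimination power and matching probability figures quoted above can be related through the standard forensic definitions; the abstract does not state which estimator the authors used, so the sketch below is an assumption. It takes the per-locus matching probability as the sum of squared genotype frequencies, the discrimination power as one minus that value, and the multi-locus matching probability as the product over independent loci. The function names and the toy frequencies are hypothetical.

```python
from math import prod


def matching_probability(genotype_freqs):
    """Chance that two random individuals share a genotype at one locus."""
    return sum(f * f for f in genotype_freqs)


def discrimination_power(genotype_freqs):
    """Chance that two random individuals differ at one locus."""
    return 1.0 - matching_probability(genotype_freqs)


def combined_matching_probability(per_locus_genotype_freqs):
    """Product of per-locus matching probabilities, assuming independent loci."""
    return prod(matching_probability(f) for f in per_locus_genotype_freqs)


# Toy data: two loci, each with observed genotype frequencies summing to 1.
toy_loci = [[0.5, 0.5], [0.25, 0.25, 0.5]]
```

Under these definitions, resolving more alleles per locus (as sequencing does relative to length-based typing) spreads genotype frequencies more evenly, which raises discrimination power and lowers the combined matching probability.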


HTSeq-Hadoop: Extending HTSeq for Massively Parallel Sequencing Data Analysis Using Hadoop

2014 IEEE 10th International Conference on e-Science ◽

10.1109/escience.2014.27 ◽

2014 ◽

Cited By ~ 2

Author(s):

Alexey Siretskiy ◽

Ola Spjuth

Keyword(s):

Data Analysis ◽

Massively Parallel Sequencing ◽

Massively Parallel ◽

Sequencing Data ◽

Parallel Sequencing ◽

Sequencing Data Analysis


Detection of CYP2C19 Genetic Variants in Malaysian Orang Asli from Massively Parallel Sequencing Data

PLoS ONE ◽

10.1371/journal.pone.0164169 ◽

2016 ◽

Vol 11 (10) ◽

pp. e0164169 ◽

Cited By ~ 3

Author(s):

Geik Yong Ang ◽

Choo Yee Yu ◽

Vinothini Subramaniam ◽

Mohd Ikhmal Hanif Abdul Khalid ◽

Tuan Azlin Tuan Abdu Aziz ◽

...

Keyword(s):

Genetic Variants ◽

Massively Parallel Sequencing ◽

Massively Parallel ◽

Sequencing Data ◽

Parallel Sequencing ◽

Orang Asli


State of the art de novo assembly of human genomes from massively parallel sequencing data

Human Genomics ◽

10.1186/1479-7364-4-4-271 ◽

2010 ◽

Vol 4 (4) ◽

pp. 271 ◽

Cited By ~ 49

Author(s):

Yingrui Li ◽

Yujie Hu ◽

Lars Bolund ◽

Jun Wang

Keyword(s):

De Novo Assembly ◽

De Novo ◽

State Of The Art ◽

Massively Parallel Sequencing ◽

Massively Parallel ◽

Sequencing Data ◽

Parallel Sequencing ◽

Human Genomes


Questioning the prevalence and reliability of human mitochondrial DNA heteroplasmy from massively parallel sequencing data

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1413478111 ◽

2014 ◽

Vol 111 (43) ◽

pp. E4546-E4547 ◽

Cited By ~ 11

Author(s):

R. S. Just ◽

J. A. Irwin ◽

W. Parson

Keyword(s):

Mitochondrial Dna ◽

Massively Parallel Sequencing ◽

Massively Parallel ◽

Sequencing Data ◽

Human Mitochondrial Dna ◽

Parallel Sequencing ◽

Mitochondrial Dna Heteroplasmy


Whisper: Read sorting allows robust mapping of sequencing data

10.1101/240358 ◽

2017 ◽

Author(s):

Sebastian Deorowicz ◽

Agnieszka Debudaj-Grabysz ◽

Adam Gudyś ◽

Szymon Grabowski

Keyword(s):

Reference Genome ◽

Variant Calling ◽

Real Data ◽

Supplementary Information ◽

Sequencing Data ◽

Suffix Arrays ◽

Link Type ◽

Mapping Tool ◽

Reverse Complement ◽

Comparable Accuracy

Abstract

Motivation: Mapping reads to a reference genome is often the first step in a sequencing data analysis pipeline. Mistakes made at this computationally challenging stage cannot be recovered easily.

Results: We present Whisper, an accurate, high-performance mapping tool based on the idea of sorting reads and then mapping them against suffix arrays for the reference genome and its reverse complement. Employing task and data parallelism, as well as storing temporary data on disk, results in superior time efficiency at reasonable memory requirements. Whisper excels at large NGS read collections, in particular Illumina reads with typical WGS coverage. Experiments with real data indicate that our solution works in about 15% of the time needed by the well-known Bowtie2 and BWA-MEM tools at a comparable accuracy (validated in a variant calling pipeline).

Availability: Whisper is available for free from https://github.com/refresh-bio/Whisper or http://sun.aei.polsl.pl/REFRESH/Whisper/

Contact: [email protected]

Supplementary information: Supplementary data are available at the publisher's Web site.
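As a rough illustration of the idea described in the Results, the following toy sketch sorts reads and looks each one up in suffix arrays built for a reference and for its reverse complement. It is not Whisper's algorithm: it handles exact matches only, builds the suffix arrays naively, and all function names are hypothetical. In this toy the sort has no performance effect and only mirrors the order of operations named in the abstract.

```python
def reverse_complement(seq):
    """Reverse complement of an uppercase ACGT string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def build_suffix_array(text):
    """Naive suffix array: start offsets sorted by suffix (fine for tiny inputs)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_exact(text, suffix_array, read):
    """Binary-search the suffix array for all exact occurrences of read."""
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        start = suffix_array[mid]
        if text[start:start + len(read)] < read:
            lo = mid + 1
        else:
            hi = mid
    hits = []
    # Matching suffixes are contiguous in the array from the lower bound on.
    while lo < len(suffix_array):
        start = suffix_array[lo]
        if text[start:start + len(read)] != read:
            break
        hits.append(start)
        lo += 1
    return sorted(hits)


def map_reads(reference, reads):
    """Map each read to forward-strand coordinates on either strand."""
    rc = reverse_complement(reference)
    sa_fwd = build_suffix_array(reference)
    sa_rev = build_suffix_array(rc)
    results = {}
    for read in sorted(reads):
        fwd = [(pos, "+") for pos in find_exact(reference, sa_fwd, read)]
        # Convert positions on the reverse complement back to forward coordinates.
        rev = [(len(reference) - pos - len(read), "-")
               for pos in find_exact(rc, sa_rev, read)]
        results[read] = fwd + rev
    return results
```

For instance, with the reference `GATTACAGG`, the read `TTAC` maps to offset 2 on the forward strand, and `GTAA` (its reverse complement) maps to the same offset on the reverse strand. A production mapper additionally has to tolerate mismatches and indels, which this sketch does not.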
