GtTR: Bayesian estimation of absolute tandem repeat copy number using sequence capture and high throughput sequencing

Mapping Intimacies ◽

10.1101/246108 ◽

2018 ◽

Cited By ~ 1

Author(s):

Devika Ganesamoorthy ◽

Minh Duc Cao ◽

Tania Duarte ◽

Wenhan Chen ◽

Lachlan Coin

Keyword(s):

High Throughput ◽

Tandem Repeat ◽

Copy Number ◽

Tandem Repeats ◽

High Throughput Sequencing ◽

Sequence Data ◽

Complex Diseases ◽

Sequencing Analysis ◽

Reference Dataset ◽

Long Read

ABSTRACT. Background: Tandem repeats comprise a significant proportion of the human genome, including coding and regulatory regions. They are highly prone to repeat number variation and nucleotide mutation due to their repetitive and unstable nature, making them a major source of genomic variation between individuals. Despite recent advances in high throughput sequencing, analysis of tandem repeats in the context of complex diseases is still hindered by technical limitations. Methods: We report a novel targeted sequencing approach, which allows simultaneous analysis of hundreds of repeats. We developed a Bayesian algorithm, GtTR, which combines information from a reference long-read dataset with a short-read counting approach to genotype tandem repeats at population scale. PCR sizing analysis was used for validation. Results: We used a PacBio long-read sequenced sample to generate a reference tandem repeat genotype dataset with, on average, 13% absolute deviation from PCR sizing results. Using this reference dataset, GtTR generated estimates of VNTR copy number with accuracy within 95% high posterior density (HPD) intervals of 68% and 83% for capture sequence data and 200X WGS data respectively, improving to 87% and 94% with use of a PCR reference. We show that the genotype resolution increases as a function of depth, such that the median 95% HPD interval lies within 25%, 14%, 12% and 8% of its midpoint copy number value for 30X, 200X WGS, 395X and 800X capture sequence data respectively. We validated nine targets by PCR sizing analysis, and genotype estimates from sequencing results correlated well with PCR results. Conclusions: The novel genotyping approach described here presents a new cost-effective method to explore a previously unrecognized class of repeat variation in GWAS studies of complex diseases at the population level. Further improvements in accuracy can be obtained by improving the accuracy of the reference dataset.
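
The abstract describes a read-counting Bayesian estimate of copy number reported with 95% HPD intervals that narrow as depth increases. The sketch below illustrates that general idea only; it is not the GtTR algorithm. It assumes a flat prior over integer copy numbers and a Poisson likelihood for the read count in the repeat region. The function names and the `reads_per_copy` parameter (expected reads contributed by one repeat unit, which in practice would come from a reference sample or flanking coverage) are hypothetical.

```python
import math


def copy_number_posterior(repeat_reads, reads_per_copy, max_copies=200):
    """Posterior over integer copy numbers 1..max_copies.

    Assumptions (illustrative, not taken from the paper):
    flat prior, and repeat_reads ~ Poisson(reads_per_copy * copy_number).
    """
    copy_numbers = range(1, max_copies + 1)
    log_post = []
    for cn in copy_numbers:
        lam = reads_per_copy * cn
        # Poisson log-likelihood; lgamma(k + 1) is log(k!)
        log_post.append(
            repeat_reads * math.log(lam) - lam - math.lgamma(repeat_reads + 1)
        )
    # Subtract the peak before exponentiating to avoid underflow.
    peak = max(log_post)
    weights = [math.exp(lp - peak) for lp in log_post]
    total = sum(weights)
    return {cn: w / total for cn, w in zip(copy_numbers, weights)}


def hpd_interval(posterior, mass=0.95):
    """Range spanned by the highest-probability copy numbers covering `mass`."""
    ranked = sorted(posterior.items(), key=lambda kv: kv[1], reverse=True)
    chosen, covered = [], 0.0
    for cn, p in ranked:
        chosen.append(cn)
        covered += p
        if covered >= mass:
            break
    return min(chosen), max(chosen)


if __name__ == "__main__":
    # Same true copy number (20) at two hypothetical depths.
    for reads, per_copy in [(200, 10.0), (2000, 100.0)]:
        post = copy_number_posterior(reads, per_copy)
        print(reads, per_copy, hpd_interval(post))
```

With more reads per repeat unit the posterior concentrates, so the interval printed for the deeper run is no wider than the shallow one, which mirrors the depth trend reported in the abstract.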


Human-specific tandem repeat expansion and differential gene expression during primate evolution

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1912175116 ◽

2019 ◽

Vol 116 (46) ◽

pp. 23243-23253 ◽

Cited By ~ 13

Author(s):

Arvis Sulovari ◽

Ruiyang Li ◽

Peter A. Audano ◽

David Porubsky ◽

Mitchell R. Vollger ◽

...

Keyword(s):

Tandem Repeat ◽

Tandem Repeats ◽

Sequence Data ◽

Variable Number ◽

Specific Expression ◽

Sequence Composition ◽

Transcription Profiles ◽

Long Read ◽

Repeat Expansions ◽

Human Specific

Short tandem repeats (STRs) and variable number tandem repeats (VNTRs) are important sources of natural and disease-causing variation, yet they have been problematic to resolve in reference genomes and genotype with short-read technology. We created a framework to model the evolution and instability of STRs and VNTRs in apes. We phased and assembled 3 ape genomes (chimpanzee, gorilla, and orangutan) using long-read and 10x Genomics linked-read sequence data for 21,442 human tandem repeats discovered in 6 haplotype-resolved assemblies of Yoruban, Chinese, and Puerto Rican origin. We define a set of 1,584 STRs/VNTRs expanded specifically in humans, including large tandem repeats affecting coding and noncoding portions of genes (e.g., MUC3A, CACNA1C). We show that short interspersed nuclear element–VNTR–Alu (SVA) retrotransposition is the main mechanism for distributing GC-rich human-specific tandem repeat expansions throughout the genome but with a bias against genes. In contrast, we observe that VNTRs not originating from retrotransposons have a propensity to cluster near genes, especially in the subtelomere. Using tissue-specific expression from human and chimpanzee brains, we identify genes where transcript isoform usage differs significantly, likely caused by cryptic splicing variation within VNTRs. Using single-cell expression from cerebral organoids, we observe a strong effect for genes associated with transcription profiles analogous to intermediate progenitor cells. Finally, we compare the sequence composition of some of the largest human-specific repeat expansions and identify 52 STRs/VNTRs with at least 40 uninterrupted pure tracts as candidates for genetically unstable regions associated with disease.


GtTR: Bayesian estimation of absolute tandem repeat copy number using sequence capture and high throughput sequencing

BMC Bioinformatics ◽

10.1186/s12859-018-2282-3 ◽

2018 ◽

Vol 19 (1) ◽

Cited By ~ 2

Author(s):

Devika Ganesamoorthy ◽

Minh Duc Cao ◽

Tania Duarte ◽

Wenhan Chen ◽

Lachlan Coin

Keyword(s):

Bayesian Estimation ◽

High Throughput ◽

Tandem Repeat ◽

Copy Number ◽

High Throughput Sequencing ◽

Repeat Copy Number ◽

Sequence Capture ◽

Repeat Copy


GrigoraSNPs: Optimized HTS DNA Forensic SNP Analysis

10.1101/173716 ◽

2017 ◽

Cited By ~ 3

Author(s):

Darrell O. Ricke ◽

Anna Shcherbina ◽

Adam Michaleas ◽

Philip Fremont-Smith

Keyword(s):

High Throughput ◽

Tandem Repeats ◽

High Throughput Sequencing ◽

Dna Analysis ◽

Sequence Data ◽

Snp Analysis ◽

Analysis Pipeline ◽

Sequencing Technologies ◽

High Throughput Dna Sequencing

Abstract: High throughput DNA sequencing technologies enable improved characterization of forensic DNA samples, providing greater insight into DNA contributor(s). Current DNA forensics techniques rely upon allele sizing of short tandem repeats by capillary electrophoresis. High throughput sequencing enables forensic sample characterization for large numbers of single nucleotide polymorphism loci. The slowest computational component of the DNA forensics analysis pipeline is the characterization of raw sequence data. This paper optimizes the SNP calling module of the DNA analysis pipeline with runtime results that scale linearly with the number of HTS sequences (patent pending)[1]. GrigoraSNPs can analyze 100 million reads in less than 5 minutes using 3 threads on a 4.0 GHz Intel i7-6700K laptop CPU.


Accurate Microbiome Sequencing with Synthetic Long Read Sequencing

10.1101/2020.10.02.324038 ◽

2020 ◽

Author(s):

Nico Chung ◽

Marc W. Van Goethem ◽

Melanie A. Preston ◽

Filip Lhota ◽

Leona Cerna ◽

...

Keyword(s):

16S Rrna ◽

16S Rrna Gene ◽

High Throughput ◽

High Throughput Sequencing ◽

Sequence Data ◽

Rrna Gene ◽

Microbial Composition ◽

Short Read ◽

Long Read ◽

Phylogenetic Resolution

Abstract: The microbiome plays a central role in biochemical cycling and nutrient turnover of most ecosystems. Because it can comprise myriad microbial prokaryotes, eukaryotes and viruses, microbiome characterization requires high-throughput sequencing to attain an accurate identification and quantification of such co-existing microbial populations. Short-read next-generation-sequencing (srNGS) revolutionized the study of microbiomes and remains the most widely used approach, yet read lengths spanning only a few of the nine hypervariable regions of the 16S rRNA gene limit phylogenetic resolution, leading to misclassification or failure to classify in a high percentage of cases. Here we evaluate a synthetic long-read (SLR) NGS approach for full-length 16S rRNA gene sequencing that is high-throughput, highly accurate and low-cost. The sequencing approach is amenable to highly multiplexed sequencing and provides microbiome sequence data that surpass existing short and long-read modalities in terms of accuracy and phylogenetic resolution. We validated this commercially available technology, termed LoopSeq, by characterizing the microbial composition of well-established mock microbiome communities and diverse real-world samples. SLR sequencing revealed differences in aquatic community complexity associated with environmental gradients, resolved species-level community composition of uterine lavage from subjects with histories of misconception, and accurately detected strain differences, multiple copies of the 16S rRNA gene in a single strain’s genome, as well as low-level contamination in soil cyanobacterial cultures. This approach has implications for widespread adoption of high-resolution, accurate long-read microbiome sequencing, as the data are generated on popular short-read sequencing platforms without the need for additional infrastructure.


Application of Copy Number Variation Sequencing in Genetic Analysis of Miscarriages in Early and Middle Pregnancy

Cytogenetic and Genome Research ◽

10.1159/000512801 ◽

2020 ◽

Vol 160 (11-12) ◽

pp. 634-642

Author(s):

Shiqiang Luo ◽

Xingyuan Chen ◽

Tizhen Yan ◽

Jiaolian Ya ◽

Zehui Xu ◽

...

Keyword(s):

Copy Number Variation ◽

High Throughput ◽

Copy Number ◽

High Throughput Sequencing ◽

Chromosomal Abnormalities ◽

Pregnancy Termination ◽

Mendelian Inheritance ◽

Copy Number Variations ◽

Abnormal Chromosome ◽

Number Variation

High-throughput sequencing based on copy number variation (CNV-seq) is commonly used to detect chromosomal abnormalities. This study identifies chromosomal abnormalities in aborted embryos/fetuses in early and middle pregnancy and explores the application value of CNV-seq in determining the causes of pregnancy termination. High-throughput sequencing was used to detect chromosome copy number variations (CNVs) in 116 aborted embryos in early and middle pregnancy. The detection data were compared with the Database of Genomic Variants (DGV), the Database of Chromosomal Imbalance and Phenotype in Humans using Ensembl Resources (DECIPHER), and the Online Mendelian Inheritance in Man (OMIM) database to determine the CNV type and the clinical significance. High-throughput sequencing results were successfully obtained in 109 out of 116 specimens, with a detection success rate of 93.97%. In brief, there were 64 cases with abnormal chromosome numbers and 23 cases with CNVs, of which 10 were pathogenic mutations and 13 were variants of uncertain significance. An abnormal chromosome number is the most important reason for embryo termination in early and middle pregnancy, followed by pathogenic chromosome CNVs. CNV-seq can quickly and accurately detect chromosome abnormalities and identify microdeletion and microduplication CNVs that cannot be detected by conventional chromosome analysis, which is convenient and efficient for genetic etiology diagnosis in miscarriage.


Chromosome 22q11.21 and 11p15.4 microdeletions confirmed by high‐throughput sequencing analysis in one patient with asymmetric cry syndrome: Case report and review of the literature

Clinical Case Reports ◽

10.1002/ccr3.4072 ◽

2021 ◽

Author(s):

Yonghong Pang ◽

Yang Yu ◽

Xiaoyi Deng ◽

Qian Liu ◽

Junmei Yan ◽

...

Keyword(s):

Case Report ◽

High Throughput ◽

High Throughput Sequencing ◽

Sequencing Analysis ◽

Review Of The Literature


High-throughput sequencing analysis of microbial community diversity in response to indica and japonica bar-transgenic rice paddy soils

PLoS ONE ◽

10.1371/journal.pone.0222191 ◽

2019 ◽

Vol 14 (9) ◽

pp. e0222191 ◽

Cited By ~ 1

Author(s):

Meidan He ◽

Jiachao Zhang ◽

Linbo Shen ◽

Lixin Xu ◽

Wenjie Luo ◽

...

Keyword(s):

Microbial Community ◽

Transgenic Rice ◽

High Throughput ◽

High Throughput Sequencing ◽

Rice Paddy ◽

Paddy Soils ◽

Community Diversity ◽

Sequencing Analysis ◽

Microbial Community Diversity ◽

Rice Paddy Soils


High-Throughput Sequencing Analysis of the Actinobacterial Spatial Diversity in Moonmilk Deposits

Antibiotics ◽

10.3390/antibiotics7020027 ◽

2018 ◽

Vol 7 (2) ◽

pp. 27 ◽

Cited By ~ 8

Author(s):

Marta Maciejewska ◽

Magdalena Całusińska ◽

Luc Cornet ◽

Delphine Adam ◽

Igor Pessi ◽

...

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

Spatial Diversity ◽

Sequencing Analysis


Diversity of microbial communities in hot springs of Sri Lanka as revealed by 16S rRNA gene high-throughput sequencing analysis

Gene ◽

10.1016/j.gene.2021.146103 ◽

2021 ◽

pp. 146103

Author(s):

Dilini Sadeepa ◽

Kosala Sirisena ◽

Pathmalal M. Manage

Keyword(s):

Sri Lanka ◽

16S Rrna ◽

16S Rrna Gene ◽

Microbial Communities ◽

High Throughput ◽

High Throughput Sequencing ◽

Hot Springs ◽

Rrna Gene ◽

Sequencing Analysis


High-throughput sequencing analysis of Euphorbia fischeriana Steud provides insights into the molecular mechanism of pharmaceutical ingredient biosynthesis

3 Biotech ◽

10.1007/s13205-018-1475-9 ◽

2018 ◽

Vol 8 (10) ◽

Cited By ~ 1

Author(s):

Ming Jiang ◽

Hui Li

Keyword(s):

Molecular Mechanism ◽

High Throughput ◽

High Throughput Sequencing ◽

Sequencing Analysis ◽

Euphorbia Fischeriana
