Classifying Y-Short Tandem Repeat Data: A Decision Tree Approach

2013 ◽  
Vol 06 (11) ◽  
Author(s):  
Ali Seman
2013 ◽  
Vol 2013 ◽  
pp. 1-9 ◽  
Author(s):  
Ali Seman ◽  
Zainab Abu Bakar ◽  
Mohamed Nizam Isa

The Y-chromosome short tandem repeat (Y-STR) data presented here were collected mainly to benchmark the performance of clustering methods. There are six Y-STR datasets, divided into two categories: Y-STR surname data and Y-haplogroup data. The Y-STR data are categorical, but they differ from other categorical data in that they contain many identical and nearly identical objects. This characteristic has caused problems for existing clustering algorithms when they are applied to Y-STR data.


2020 ◽  
Author(s):  
Indhu-Shree Rajan-Babu ◽  
Junran Peng ◽  
Readman Chiu ◽  
Arezoo Mohajeri ◽  
Egor Dolzhenko ◽  
...  

ABSTRACT: Short tandem repeat (STR) expansions cause several neurological and neuromuscular disorders. Screening for STR expansions in genome-wide (exome and genome) sequencing data can enable diagnosis, optimal clinical management/treatment, and accurate genetic counselling of patients with repeat expansion disorders. We assessed the performance of lobSTR, HipSTR, RepeatSeq, ExpansionHunter, TREDPARSE, GangSTR, STRetch, and exSTRa – bioinformatics tools that have been developed to detect and/or genotype STR expansions – on experimental and simulated genome sequence data with known STR expansions aligned using two different aligners, Isaac and BWA. We then adjusted the parameter settings to optimize the sensitivity and specificity of the STR tools and fed the optimized results into a machine-learning decision tree classifier to determine the best combination of tools to detect full mutation expansions with high diagnostic sensitivity and specificity. The decision tree model supported using ExpansionHunter’s full mutation calls with those of either STRetch or exSTRa for detection of full mutations with precision, recall, and F1-score of 90%, 100%, and 95%, respectively. We used this pipeline to screen the BWA-aligned exome or genome sequence data of 306 families of children with suspected genetic disorders for pathogenic expansions of known disease STR loci. We identified 27 samples, 17 with an apparent full-mutation expansion of the AR, ATXN1, ATXN2, ATXN8, DMPK, FXN, HTT, or TBP locus, nine with an intermediate or premutation allele in the FMR1 locus, and one with a borderline allele in the ATXN2 locus. We report the concordance between our bioinformatics findings and the clinical PCR results in a subset of these samples. Implementation of our bioinformatics workflow can improve the detection of disease STR expansions in exome and genome sequence diagnostics and enhance clinical outcomes for patients with repeat expansion disorders.
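The reported metrics can be checked against each other, since the F1-score is the harmonic mean of precision and recall. The sketch below does that arithmetic and also illustrates one possible reading of the tool combination: a full-mutation call from ExpansionHunter that is supported by STRetch or exSTRa. The function names and the Boolean rule are assumptions made for illustration; the abstract does not state the exact rule learned by the decision tree.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def combined_full_mutation_call(expansion_hunter: bool,
                                stretch: bool,
                                exstra: bool) -> bool:
    """Hypothetical combination rule (an assumed reading of the abstract):
    accept a full-mutation call only when ExpansionHunter reports it and
    at least one of STRetch or exSTRa agrees."""
    return expansion_hunter and (stretch or exstra)


# Precision 90% and recall 100%, as reported in the abstract.
reported_f1 = f1_score(0.90, 1.00)
print(f"{reported_f1:.3f}")  # 0.947, which rounds to the reported 95%
```

Under this assumed rule, a call made by ExpansionHunter alone would be rejected, which trades some sensitivity for fewer false positives.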

