Population-Scale Sequencing Data Enables Precise Estimates of Y-STR Mutation Rates

2016 ◽  
Author(s):  
Thomas Willems ◽  
Melissa Gymrek ◽  
G. David Poznik ◽  
Chris Tyler-Smith ◽  
Yaniv Erlich ◽  
...  

Abstract Short Tandem Repeats (STRs) are mutation-prone loci that span nearly 1% of the human genome. Previous studies have estimated the mutation rates of highly polymorphic STRs using capillary electrophoresis and pedigree-based designs. While this work has provided insights into the mutational dynamics of highly mutable STRs, the mutation rates of most others remain unknown. Here, we harnessed whole-genome sequencing data to estimate the mutation rates of Y-chromosome STRs (Y-STRs) with 2-6 base pair repeat units that are accessible to Illumina sequencing. We genotyped 4,500 Y-STRs using data from the 1000 Genomes Project and the Simons Genome Diversity Project. Next, we developed MUTEA, an algorithm that infers STR mutation rates from population-scale data using a high-resolution SNP-based phylogeny. After extensive intrinsic and extrinsic validations, we harnessed MUTEA to derive mutation rate estimates for 702 polymorphic STRs by tracing each locus over 222,000 meioses, resulting in the largest collection of Y-STR mutation rates to date. Using our estimates, we identified determinants of STR mutation rates and built a model to predict rates for STRs across the genome. These predictions indicate that the load of de novo STR mutations is at least 75 mutations per generation, rivaling the load of all other known variant types. Finally, we identified Y-STRs with potential applications in forensics and genetic genealogy, assessed the ability to differentiate between the Y-chromosomes of father-son pairs, and imputed Y-STR genotypes.

2016 ◽  
Author(s):  
Thomas Willems ◽  
Dina Zielinski ◽  
Assaf Gordon ◽  
Melissa Gymrek ◽  
Yaniv Erlich

Abstract Short tandem repeats (STRs) are highly variable elements that play a pivotal role in multiple genetic diseases, population genetics applications, and forensic casework. However, STRs have proven problematic to genotype from high-throughput sequencing data. Here, we describe HipSTR, a novel haplotype-based method for robustly genotyping, haplotyping, and phasing STRs from whole-genome sequencing data, and report a genome-wide analysis and validation of de novo STR mutations.


Genes ◽  
2018 ◽  
Vol 9 (10) ◽  
pp. 486 ◽  
Author(s):  
Adam Ameur ◽  
Huiwen Che ◽  
Marcel Martin ◽  
Ignas Bunikis ◽  
Johan Dahlberg ◽  
...  

The current human reference sequence (GRCh38) is a foundation for large-scale sequencing projects. However, recent studies have suggested that GRCh38 may be incomplete and give a suboptimal representation of specific population groups. Here, we performed a de novo assembly of two Swedish genomes that revealed over 10 Mb of sequences absent from the human GRCh38 reference in each individual. Around 6 Mb of these novel sequences (NS) are shared with a Chinese personal genome. The NS are highly repetitive, have an elevated GC-content, and are primarily located in centromeric or telomeric regions. Up to 1 Mb of NS can be assigned to chromosome Y, and large segments are also missing from GRCh38 at chromosomes 14, 17, and 21. Inclusion of NS into the GRCh38 reference radically improves the alignment and variant calling from short-read whole-genome sequencing data at several genomic loci. A re-analysis of a Swedish population-scale sequencing project yields > 75,000 putative novel single nucleotide variants (SNVs) and removes > 10,000 false positive SNV calls per individual, some of which are located in protein coding regions. Our results highlight that the GRCh38 reference is not yet complete and demonstrate that personal genome assemblies from local populations can improve the analysis of short-read whole-genome sequencing data.


2021 ◽  
Author(s):  
Cody J Steely ◽  
Scott Watkins ◽  
Lisa Baird ◽  
Lynn Jorde

Short tandem repeats (STRs) are tandemly repeated sequences of 1-6 bp motifs. STRs compose approximately 3% of the genome, and mutations at STR loci have been linked to dozens of human diseases including amyotrophic lateral sclerosis, Friedreich ataxia, Huntington disease, and fragile X syndrome. Improving our understanding of these mutations would increase our knowledge of the mutational dynamics of the genome and may uncover additional loci that contribute to disease. Here, to estimate the genome-wide pattern of mutations at STR loci, we analyzed blood-derived whole-genome sequencing data for 544 individuals from 29 three-generation CEPH pedigrees. These pedigrees contain both sets of grandparents, the parents, and an average of 9 grandchildren per family. Using HipSTR, we identified de novo STR mutations in the 2nd generation of these pedigrees. Analyzing ~1.6 million STR loci, we estimate the empirical de novo STR mutation rate to be 5.24 × 10⁻⁵ mutations per locus per generation. We find that perfect repeats mutate ~2x more often than imperfect repeats. De novo STR mutations are significantly enriched in Alu elements (p < 2.2e-16). Approximately 30% of STR mutations occur within Alu elements, which compose only ~11% of the genome, and ~10% are found in LINE-1 insertions, which compose ~17% of the genome. Phasing these de novo mutations to the parent of origin shows that parental transmission biases vary among families. We estimate the average number of de novo genome-wide STR mutations per individual to be ~85, which is similar to the average number of observed de novo single nucleotide variants.
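The figures quoted in this abstract can be cross-checked with simple arithmetic. The sketch below is only a back-of-envelope illustration using the rounded values from the abstract; the function and variable names are hypothetical, and the authors' own estimate of ~85 mutations per individual presumably rests on details not given here.

```python
# Back-of-envelope check of the rounded figures quoted in the abstract.
# Names are illustrative; this is not the authors' analysis pipeline.

LOCI_ANALYZED = 1.6e6      # "~1.6 million STR loci"
RATE_PER_LOCUS = 5.24e-5   # de novo mutations per locus per generation


def expected_mutations(n_loci: float, rate: float) -> float:
    """Expected number of de novo mutations across n_loci in one generation."""
    return n_loci * rate


def fold_enrichment(share_of_mutations: float, share_of_genome: float) -> float:
    """Ratio of the observed share of mutations to the share of genome occupied."""
    return share_of_mutations / share_of_genome


expected = expected_mutations(LOCI_ANALYZED, RATE_PER_LOCUS)
alu_fold = fold_enrichment(0.30, 0.11)    # ~30% of mutations, ~11% of genome
line1_fold = fold_enrichment(0.10, 0.17)  # ~10% of mutations, ~17% of genome

print(f"Expected de novo STR mutations per generation: {expected:.1f}")
print(f"Alu fold enrichment: {alu_fold:.2f}")
print(f"LINE-1 fold enrichment: {line1_fold:.2f}")
```

The product of the locus count and the per-locus rate is about 84, in line with the reported ~85. On these rounded shares, Alu elements carry roughly 2.7 times the mutations expected from their genomic footprint, while LINE-1 insertions carry fewer than expected.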


2018 ◽  
Author(s):  
Adam Ameur ◽  
Huiwen Che ◽  
Marcel Martin ◽  
Ignas Bunikis ◽  
Johan Dahlberg ◽  
...  

Abstract We have performed de novo assembly of two Swedish genomes using long-read sequencing and optical mapping, resulting in total assembly sizes of nearly 3 Gb and hybrid scaffold N50 values of over 45 Mb. A further analysis revealed over 10 Mb of sequences absent from the human GRCh38 reference in each individual. Around 6 Mb of these novel sequences (NS) are shared with a Chinese personal genome. The NS are highly repetitive, have elevated GC-content and are primarily located in centromeric or telomeric regions. A BLAST search showed that 31% of the NS are different from any sequences deposited in nucleotide databases. The remaining NS correspond to human (62%) or primate (6%) nucleotide entries, while 1% of hits show the highest similarity to other species, including mouse and a few different classes of parasitic worms. Up to 1 Mb of NS can be assigned to chromosome Y, and large segments are missing from GRCh38 also at chromosomes 14, 17 and 21. Inclusion of these novel sequences into the GRCh38 reference radically improves the alignment and variant calling of whole-genome sequencing data at several genomic loci. Through a re-analysis of 200 samples from a Swedish population-scale sequencing project, we obtained over 75,000 putative novel SNVs per individual when using a custom version of GRCh38 extended with 17.3 Mb of NS. In addition, about 10,000 false positive SNV calls per individual were removed from the GRCh38 autosomes and sex chromosomes in the re-analysis, with some of them located in protein coding regions.


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Lilian J. Gehrke ◽  
Maulik Upadhyay ◽  
Kristin Heidrich ◽  
Elisabeth Kunz ◽  
Daniela Klaus-Halla ◽  
...  

Abstract Polledness in cattle is an autosomal dominant trait. Previous studies have revealed allelic heterogeneity at the polled locus, and four different variants were identified, all in intergenic regions. In this study, we report a case of a polled bull (FV-Polled1) born to horned parents, indicating a de novo origin of this polled condition. Using 50K genotyping and whole-genome sequencing data, we identified on chromosome 2 an 11-bp deletion (AC_000159.1:g.52364063_52364073del; Del11) in the second exon of the ZEB2 gene as the causal mutation for this de novo polled condition. We predicted that the deletion would shorten the protein product of ZEB2 by almost 91%. Moreover, we showed that all animals carrying the Del11 mutation displayed symptoms similar to Mowat-Wilson syndrome (MWS) in humans, which is also associated with genetic variations in ZEB2. The symptoms in cattle include delayed maturity, small body stature and an abnormally shaped skull. This is the first report of a de novo dominant mutation affecting only ZEB2 and associated with a genetic absence of horns. Therefore, our results demonstrate undoubtedly that ZEB2 plays an important role in the process of horn ontogenesis as well as in the regulation of overall development and growth of animals.


2016 ◽  
Vol 98 (5) ◽  
pp. 919-933 ◽  
Author(s):  
Thomas Willems ◽  
Melissa Gymrek ◽  
G. David Poznik ◽  
Chris Tyler-Smith ◽  
Yaniv Erlich

BMC Genomics ◽  
2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Clémentine Escouflaire ◽  
Emmanuelle Rebours ◽  
Mathieu Charles ◽  
Sébastien Orellana ◽  
Margarita Cano ◽  
...  

Abstract Background: In mammals, hypohidrotic ectodermal dysplasia (HED) is a genetic disorder that is characterized by sparse hair, tooth abnormalities, and defects in cutaneous glands. Only four genes, EDA, EDAR, EDARADD and WNT10A, account for more than 90% of HED cases, and EDA, on chromosome X, is involved in 50% of the cases. In this study, we explored an isolated case of a female Holstein calf with symptoms similar to HED. Results: Clinical examination confirmed the diagnosis. The affected female showed homogeneous hypotrichosis and oligodontia as previously observed in bovine EDAR homozygous and EDA hemizygous mutants. Under light microscopy, the hair follicles were thinner and located higher in the dermis of the frontal skin in the affected animal than in the control. Moreover, the affected animal showed a five-fold increase in the number of hair follicles and a four-fold decrease in the diameter of the pilary canals. Pedigree analysis revealed that the coefficient of inbreeding of the affected calf (4.58%) was not higher than the average population inbreeding coefficient (4.59%). This animal had ten common ancestors in its paternal and maternal lineages. By estimating the number of affected cases that would be expected if any of these common ancestors carried a recessive mutation, we concluded that, if they existed, other cases of HED should have been reported in France, which is not the case. Therefore, we assumed that the causal mutation was dominant and de novo. By analyzing whole-genome sequencing data, we identified a large chromosomal inversion with breakpoints located in the first introns of the EDA and XIST genes. Genotyping the case and its parents by PCR-electrophoresis allowed us to demonstrate the de novo origin of this inversion. Finally, using various sources of information, we present a body of evidence that supports the hypothesis that this mutation is responsible for a skewed inactivation of X, and that only the normal X can be inactivated.
Conclusions: In this article, we report a unique case of an X-linked HED-affected Holstein female calf with an assumed full inactivation of the normal X-chromosome, thus leading to a severe phenotype similar to that of hemizygous males.


2017 ◽  
Author(s):  
Adriana Munoz ◽  
Boris Yamrom ◽  
Yoon-ha Lee ◽  
Peter Andrews ◽  
Steven Marks ◽  
...  

Abstract Copy number profiling and whole-exome sequencing have allowed us to make remarkable progress in our understanding of the genetics of autism over the past ten years, but major aspects of the genetics remain unresolved. Through whole-genome sequencing, additional types of genetic variants can be observed. These variants are abundant, and determining which are functional is challenging. We have analyzed whole-genome sequencing data from 510 of the Simons Simplex Collection quad families and focused our attention on intronic variants. Within the introns of 546 high-quality autism target genes, we identified 63 de novo indels in the affected and only 37 in the unaffected siblings. The difference of 26 events is significantly larger than expected (p-value = 0.01), and reasonable extrapolation shows that de novo intronic indels can contribute to at least 10% of simplex autism. The significance increases if we restrict to the half of the autism targets that are intolerant to damaging variants in the normal human population, a half we expect to be even more enriched for autism genes. For these 273 targets we observe 43 and 20 events in affected and unaffected siblings, respectively (p-value = 0.005). There was no significant signal in the number of de novo intronic indels in any of the control sets of genes analyzed. We see no signal from de novo substitutions in the introns of target genes.
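The abstract does not say which statistical test produced its p-values. As a rough illustration only, the sketch below applies an exact two-sided binomial test to the reported counts, assuming that under the null hypothesis each de novo indel is equally likely to fall in the affected or the unaffected sibling. The function name is hypothetical and the test choice is an assumption, not the authors' stated method.

```python
from math import comb


def binom_two_sided_p(k: int, n: int) -> float:
    """Exact two-sided binomial p-value for k successes in n trials under p = 0.5.

    With p = 0.5 the distribution is symmetric, so the two-sided value is
    twice the upper tail starting at the more extreme of k and n - k.
    """
    extreme = max(k, n - k)
    tail = sum(comb(n, i) for i in range(extreme, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Counts reported in the abstract: affected vs. unaffected siblings.
p_all_targets = binom_two_sided_p(63, 63 + 37)  # 546 target genes
p_intolerant = binom_two_sided_p(43, 43 + 20)   # 273 intolerant targets

print(f"All targets: p = {p_all_targets:.4f}")
print(f"Intolerant half: p = {p_intolerant:.4f}")
```

Under these assumptions both values land near the reported 0.01 and 0.005, which suggests the counts and p-values in the abstract are mutually consistent, though a different test may have been used.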

