Finding and extending ancient simple sequence repeat-derived regions in the human genome

2019 ◽  
Author(s):  
Jonathan A. Shortt ◽  
Robert P. Ruggiero ◽  
Corey Cox ◽  
Aaron C. Wacholder ◽  
David D. Pollock

Abstract

Background: Previously, 3% of the human genome has been annotated as simple sequence repeats (SSRs), similar to the proportion annotated as protein coding. The origin of much of the genome is not well annotated, however, and some of the unidentified regions are likely to be ancient SSR-derived regions not identified by current methods. The identification of these regions is complicated because SSRs appear to evolve through complex cycles of expansion and contraction, often interrupted by mutations that alter both the repeated motif and mutation rate. We applied an empirical, kmer-based approach to identify genome regions that are likely derived from SSRs.

Results: The sequences flanking annotated SSRs are enriched for similar sequences and for SSRs with similar motifs, suggesting that the evolutionary remains of SSR activity abound in regions near obvious SSRs. Using our previously described P-clouds approach, we identified ‘SSR-clouds’, groups of similar kmers (or ‘oligos’) that are enriched near a training set of unbroken SSR loci, and then used the SSR-clouds to detect likely SSR-derived regions throughout the genome.

Conclusions: Our analysis indicates that the amount of likely SSR-derived sequence in the human genome is 6.77%, over twice as much as previous estimates, including millions of newly identified ancient SSR-derived loci. SSR-clouds identified poly-A sequences adjacent to transposable element termini in over 74% of the oldest class of Alu (roughly, AluJ), validating the sensitivity of the approach. Poly-A’s annotated by SSR-clouds also had a length distribution that was more consistent with their poly-A origins, with a mean of about 35 bp even in older Alus. This work demonstrates that the high sensitivity provided by SSR-clouds improves the detection of SSR-derived regions and will enable deeper analysis of how decaying repeats contribute to genome structure.
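
The idea of kmers being "enriched near a training set" can be illustrated with a toy calculation. The sketch below is not the P-clouds or SSR-clouds algorithm, which the abstract does not specify. It only counts kmers in a training sequence and a background sequence and keeps those that are relatively more frequent in the training set. The function names, the `min_ratio` threshold, and the add-one smoothing are all assumptions made for illustration.

```python
from collections import Counter


def count_kmers(seq, k):
    """Count every overlapping k-mer in a DNA string (case-insensitive)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def enriched_kmers(training, background, k=4, min_ratio=2.0):
    """Return {kmer: ratio} for k-mers over-represented in `training`.

    The ratio compares relative frequencies. A crude add-one smoothing
    (an assumption of this sketch) avoids division by zero for k-mers
    that never occur in the background.
    """
    t_counts = count_kmers(training, k)
    b_counts = count_kmers(background, k)
    t_total = sum(t_counts.values())
    b_total = sum(b_counts.values())
    result = {}
    for kmer, n in t_counts.items():
        t_freq = n / t_total
        b_freq = (b_counts.get(kmer, 0) + 1) / (b_total + 1)
        ratio = t_freq / b_freq
        if ratio >= min_ratio:
            result[kmer] = ratio
    return result


if __name__ == "__main__":
    ssr_like = "ACACACACACACACAC"                       # toy "training" locus
    other = "ACGTTGCATGCCGATAGCTAGGCTTAACGGAT"          # toy background
    print(sorted(enriched_kmers(ssr_like, other)))
```

In this toy input only the two kmers that make up the AC repeat pass the threshold. A real analysis would need a principled statistical model and far larger training data.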

Author(s):  
Júlia Halász ◽  
Noémi Makovics-Zsohár ◽  
Ferenc Szőke ◽  
Sezai Ercisli ◽  
Attila Hegedűs

Abstract: Polyploid Prunus spinosa (2n = 4x) and P. domestica subsp. insititia (2n = 6x) represent enormous genetic potential in Central Europe, which can be exploited in breeding programs. In Hungary, 16 cultivar candidates and the recognized cultivar ‘Zempléni’ were selected from wild-growing populations, including ten P. spinosa, four P. domestica subsp. insititia, and three P. spinosa × P. domestica hybrids (2n = 5x). Genotyping at eleven simple sequence repeat (SSR) loci and the multiallelic S-locus was used to characterize genetic variability and achieve a reliable identification of the tested accessions. Nine SSR loci proved to be polymorphic and eight of those were highly informative (PIC values > 0.7). A total of 129 SSR alleles were identified, an average of 14.3 alleles per locus, and all accessions except two clones could be discriminated based on unique SSR fingerprints. A total of 23 S-RNase alleles were identified, and complete and partial S-genotypes were determined for 10 and 7 accessions, respectively. The DNA sequence was determined for a total of 17 fragments representing 11 S-RNase alleles. ‘Zempléni’ was confirmed to be self-compatible, carrying at least one non-functional S-RNase allele (SJ). Our results indicate that the S-allele pools of wild-growing P. spinosa and P. domestica subsp. insititia overlap in Hungary. Phylogenetic and principal component analyses confirmed the high level of diversity and genetic differentiation present within the analysed accessions and indicated putative ancestor–descendant relationships. Our data confirm that S-locus genotyping is suitable for diversity studies in polyploid Prunus species, but non-related accessions sharing common S-alleles may distort phylogenetic inferences.
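
For readers unfamiliar with PIC (polymorphism information content), the sketch below computes it from allele frequencies using the commonly cited formula of Botstein et al. (1980), PIC = 1 − Σ p_i² − Σ_{i<j} 2 p_i² p_j². The abstract does not say which estimator was used, and estimating allele frequencies in polyploids is more involved than this, so treat the function `pic` and its inputs as illustrative assumptions.

```python
from itertools import combinations


def pic(freqs):
    """Polymorphism information content for one locus.

    `freqs` is a list of allele frequencies that must sum to 1.
    Uses PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2.
    """
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    homozygosity = sum(p * p for p in freqs)
    cross_term = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - cross_term


if __name__ == "__main__":
    print(pic([0.5, 0.5]))     # two equally common alleles
    print(pic([0.25] * 4))     # four equally common alleles
```

With two equally frequent alleles the value is 0.375. Four equally frequent alleles give about 0.703, just above the 0.7 cutoff quoted in the abstract, which shows why only loci with many well-balanced alleles count as highly informative.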


2011 ◽  
Vol 136 (2) ◽  
pp. 116-128 ◽  
Author(s):  
Xinwang Wang ◽  
Phillip A. Wadl ◽  
Cecil Pounders ◽  
Robert N. Trigiano ◽  
Raul I. Cabrera ◽  
...  

Genetic diversity was estimated for 51 Lagerstroemia indica L. cultivars, five Lagerstroemia fauriei Koehne cultivars, and 37 interspecific hybrids using 78 simple sequence repeat (SSR) markers. SSR loci were highly variable among the cultivars, detecting an average of 6.6 alleles (amplicons) per locus. Each locus detected 13.6 genotypes on average. Cluster analysis identified three main groups that consisted of individual cultivars from L. indica, L. fauriei, and their interspecific hybrids. However, only 18.1% of the overall variation was the result of differences between these groups, which may be attributable to pedigree-based breeding strategies that use current cultivars as parents for future selections. Clustering within each group generally reflected breeding pedigrees but was not supported by bootstrap replicates. Low statistical support was likely the result of low genetic diversity estimates, which indicated that only 25.5% of the total allele size variation was attributable to differences between the species L. indica and L. fauriei. Most allele size variation, or 74.5%, was common to L. indica and L. fauriei. Thus, introgression of other Lagerstroemia species such as Lagerstroemia limii Merr. (L. chekiangensis Cheng), Lagerstroemia speciosa (L.) Pers., and Lagerstroemia subcostata Koehne may significantly expand crapemyrtle breeding programs. This study verified relationships between existing cultivars and identified potentially untapped sources of germplasm.


Plants ◽  
2020 ◽  
Vol 9 (8) ◽  
pp. 965 ◽  
Author(s):  
Xian-Lin Guo ◽  
Hong-Yi Zheng ◽  
Megan Price ◽  
Song-Dong Zhou ◽  
Xing-Jin He

Chamaesium H. Wolff (Apiaceae, Apioideae) is a small genus mainly distributed in the Hengduan Mountains and the Himalayas. Ten species of Chamaesium have been described and nine species are distributed in China. Recent advances in molecular phylogenetics have revolutionized our understanding of Chinese Chamaesium taxonomy and evolution. However, accurate phylogenetic relationships within Chamaesium based on second-generation sequencing technology remain poorly understood. Here, we newly assembled nine plastid genomes from the nine Chinese Chamaesium species and combined these genomes with eight other species from five genera to perform a phylogenetic analysis by maximum likelihood (ML) using the complete plastid genome, and analyzed genome structure, GC content, species pairwise Ka/Ks ratios and the simple sequence repeat (SSR) component. We found that the nine species’ plastid genomes ranged from 152,703 bp (C. thalictrifolium) to 155,712 bp (C. mallaeanum), and contained 133 genes, 34 SSR types and 585 SSR loci. We also found 20,953–21,115 codons from 53 coding sequence (CDS) regions, 38.4–38.7% GC content of the total genome and low Ka/Ks (0.27–0.43) ratios of 53 aligned CDS. These results will facilitate our further understanding of the evolution of the genus Chamaesium.
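
As a small illustration of one of the summary statistics mentioned above, the sketch below computes GC content as a percentage of unambiguous bases. It is a generic calculation, not the authors' pipeline. The function name `gc_content` and the choice to ignore ambiguous symbols such as N are assumptions of this sketch.

```python
def gc_content(seq):
    """Return GC content in percent, ignoring non-ACGT symbols."""
    seq = seq.upper()
    unambiguous = sum(seq.count(base) for base in "ACGT")
    if unambiguous == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * (seq.count("G") + seq.count("C")) / unambiguous


if __name__ == "__main__":
    print(gc_content("ATGC"))       # half of the bases are G or C
    print(gc_content("GGCCNNAT"))   # the two N symbols are ignored
```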


2008 ◽  
Vol 2008 ◽  
pp. 1-9 ◽  
Author(s):  
Luciano Carlos da Maia ◽  
Dario Abel Palmieri ◽  
Velci Queiroz de Souza ◽  
Mauricio Marini Kopp ◽  
Fernando Irajá Félix de Carvalho ◽  
...  

Microsatellites or SSRs (simple sequence repeats) are ubiquitous short tandem duplications occurring in eukaryotic organisms. These sequences are among the best marker technologies applied in plant genetics and breeding. The abundant genomic, BAC, and EST sequences available in databases make it possible to survey the presence and location of SSR loci. Plant geneticists and breeders also need information on primer sequences. In this paper, we describe a utility that integrates SSR searches, frequency of occurrence of motifs and arrangements, primer design, and PCR simulation against other databases. This simulation allows the performance of global alignments and identity and homology searches between different amplified sequences, that is, amplicons. To validate the tool's functions, SSR discovery searches were performed in a database containing 28 469 nonredundant rice cDNA sequences.
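
A minimal sketch of what an SSR search and motif tally might look like is given below. It is not the utility described in the abstract. It finds perfect tandem repeats with a regular expression and counts how often each motif occurs. The function names, the motif length range of 1 to 6 bp, and the minimum of four repeat units are assumptions chosen for illustration.

```python
import re
from collections import Counter


def find_ssrs(seq, min_motif=1, max_motif=6, min_repeats=4):
    """Find perfect tandem repeats in a DNA string.

    Returns a list of (start, end, motif, repeat_count) tuples with
    0-based, end-exclusive coordinates. The lazy quantifier makes the
    regex try the shortest motif first at each start position.
    """
    if min_repeats < 2:
        raise ValueError("min_repeats must be at least 2")
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_motif, max_motif, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        repeats = len(match.group(0)) // len(motif)
        hits.append((match.start(), match.end(), motif, repeats))
    return hits


def motif_frequencies(sequences):
    """Count how many SSR loci carry each motif across many sequences."""
    counts = Counter()
    for seq in sequences:
        counts.update(motif for _, _, motif, _ in find_ssrs(seq))
    return counts


if __name__ == "__main__":
    print(find_ssrs("GGATATATATATCC"))
    print(motif_frequencies(["GGATATATATATCC", "TTTTTTGCA"]))
```

Only perfect repeats are reported. Imperfect or compound SSRs would need a tolerance for mismatches, which this sketch deliberately leaves out.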


PLoS ONE ◽  
2015 ◽  
Vol 10 (5) ◽  
pp. e0127812 ◽  
Author(s):  
Jing Xiao ◽  
Jin Zhao ◽  
Mengjun Liu ◽  
Ping Liu ◽  
Li Dai ◽  
...  

Genome ◽  
1996 ◽  
Vol 39 (4) ◽  
pp. 628-633 ◽  
Author(s):  
J. E. Bowers ◽  
G. S. Dangl ◽  
R. Vignani ◽  
C. P. Meredith

Four new simple sequence repeat (SSR) loci (designated VVMD5, VVMD6, VVMD7, and VVMD8) were characterized in grape and analyzed by silver staining in 77 cultivars of Vitis vinifera. Amplification products ranged in size from 141 to 263 base pairs (bp). The number of alleles observed per locus ranged from 5 to 11 and the number of diploid genotypes per locus ranged from 13 to 27. At each locus at least 75% of the cultivars were heterozygous. Alleles differing in length by only 1 bp could be distinguished by silver staining, and size estimates were within 1 or 2 bp, depending on the locus, of those obtained by fluorescence detection at previously reported loci. Allele frequencies were generally similar in wine grapes and table grapes, with some exceptions. Some alleles were found only in one of the two groups of cultivars. All 77 cultivars were distinguished by the four loci with the exception of four wine grapes considered to be somatic variants of the same cultivar, 'Pinot noir', 'Pinot gris', 'Pinot blanc', and 'Meunier'; two table grapes that are known to be synonymous, 'Keshmesh' and 'Thompson Seedless'; and three table grapes, 'Dattier', 'Rhazaki Arhanon', and 'Markandi', the first two of which have been suggested to be synonymous. Although the high polymorphism at grape SSR loci suggests that very few loci would theoretically be needed to separate all cultivars, the economic and legal significance of grape variety identification requires the increased resolution that can be provided by a larger number of loci. The ease with which SSR markers and data can be shared internationally should encourage their broad use, which will in turn increase the power of these markers for both identification and genetic analysis of grape. Key words: grape, Vitis, microsatellite, simple sequence repeat, DNA typing, identification.


Genome ◽  
2000 ◽  
Vol 43 (2) ◽  
pp. 293-297 ◽  
Author(s):  
Muhammad H Rahman ◽  
S Dayanandan ◽  
Om P Rajora

Markers for eight new microsatellite DNA or simple sequence repeat (SSR) loci were developed and characterized in trembling aspen (Populus tremuloides) from a partial genomic library. Informativeness of these microsatellite DNA markers was examined by determining polymorphisms in 38 P. tremuloides individuals. Inheritance of selected markers was tested in progenies of controlled crosses. Six characterized SSR loci were of dinucleotide repeats (two perfect and four imperfect), and one each of trinucleotide and tetranucleotide repeats. The monomorphic SSR locus (PTR15) was of a compound imperfect dinucleotide repeat. The primers of one highly polymorphic SSR locus (PTR7) amplified two loci, and alleles could not be assigned to a specific locus. At the other six polymorphic loci, 25 alleles were detected in 38 P. tremuloides individuals; the number of alleles ranged from 2 to 7, with an average of 4.2 alleles per locus, and the observed heterozygosity ranged from 0.05 to 0.61, with an average of 0.36 per locus. The two perfect dinucleotide and one trinucleotide microsatellite DNA loci were the most informative. Microsatellite DNA variants of four SSR loci characterized previously followed a single-locus Mendelian inheritance pattern, whereas those of PTR7 from the present study showed a two-locus Mendelian inheritance pattern in controlled crosses. The microsatellite DNA markers developed and reported here could be used for assisting various genetic, breeding, biotechnology, genome mapping, conservation, and sustainable forest management programs in poplars. Key words: poplar, microsatellites, genetic mapping, simple sequence repeat (SSR) markers, DNA fingerprinting.
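
The per-locus statistics quoted above are simple to compute from genotype data. The sketch below shows observed heterozygosity and mean allele number for diploid genotypes. The data layout (a list of allele-size pairs, with `None` for missing genotypes) and the function names are assumptions of this sketch, not the authors' format.

```python
def observed_heterozygosity(genotypes):
    """Fraction of typed diploid individuals carrying two different alleles.

    `genotypes` is a list of (allele_a, allele_b) tuples. Entries that
    are None are treated as missing data and skipped.
    """
    typed = [g for g in genotypes if g is not None]
    if not typed:
        raise ValueError("no typed individuals")
    return sum(a != b for a, b in typed) / len(typed)


def mean_alleles_per_locus(alleles_per_locus):
    """Average number of distinct alleles across loci."""
    return sum(alleles_per_locus) / len(alleles_per_locus)


if __name__ == "__main__":
    # Hypothetical allele sizes in bp for four individuals at one locus.
    locus = [(100, 102), (100, 100), (102, 104), (100, 100)]
    print(observed_heterozygosity(locus))
```

As a check on the abstract's arithmetic, 25 alleles over six polymorphic loci gives 25/6 ≈ 4.2 alleles per locus, matching the reported average.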


2005 ◽  
Vol 71 (8) ◽  
pp. 4888-4892 ◽  
Author(s):  
Hong Lin ◽  
Edwin L. Civerolo ◽  
Rong Hu ◽  
Samuel Barros ◽  
Marta Francis ◽  
...  

Abstract: A genome-wide search was performed to identify simple sequence repeat (SSR) loci among the available sequence databases from four strains of Xylella fastidiosa (strains causing Pierce's disease, citrus variegated chlorosis, almond leaf scorch, and oleander leaf scorch). Thirty-four SSR loci were selected for SSR primer design and were validated in PCR experiments. These multilocus SSR primers, distributed across the X. fastidiosa genome, clearly differentiated and clustered X. fastidiosa strains collected from grape, almond, citrus, and oleander. They are well suited for differentiating strains and studying X. fastidiosa epidemiology and population genetics.

