Identification of meiotic recombination through gamete genome reconstruction using whole genome linked-reads

2018 ◽  
Author(s):  
Peng Xu ◽  
Zechen Chong ◽  

Abstract: Meiotic recombination (MR), which transmits exchanged genetic material between homologous chromosomes to offspring, plays a crucial role in shaping genomic diversity in eukaryotic organisms. In humans, thousands of meiotic recombination hotspots have been mapped by population genetics approaches. However, direct identification of MR events in individuals remains challenging because of the difficulty of resolving the haplotypes of homologous chromosomes and reconstructing the gamete genome. Whole genome linked-read sequencing (lrWGS) can generate haplotype sequences of mega-base pairs (N50 ~2.5 Mb) after computational phasing. However, the haplotype information is still isolated in a large number of fragmented genomic regions and limited by switch errors, impeding its further application in chromosome-scale analysis. In this study, we developed a tool, MRLR (Meiotic Recombination identification by Linked-Read sequencing), for the analysis of individual MR events. By leveraging trio pedigree information with lrWGS haplotypes, our pipeline is sufficient to reconstruct the whole human gamete genome with 99.8% haplotyping accuracy. By analyzing the haplotype exchange between homologous chromosomes, MRLR identified 462 high-resolution MR events in 6 human trio samples from the Genome In A Bottle (GIAB) and the Human Genome Structural Variation Consortium (HGSVC). In three datasets of the HGSVC, our results recapitulated 149 (92%) previously identified high-confidence MR events and discovered 85 novel events. About half (40) of the new events are supported by single-cell template strand sequencing (Strand-seq) results. We found that 332 (71.9%) MR events co-localize with recombination hotspots (>10 cM/Mb) in human populations, and MR breakpoint regions are enriched in PRDM9 and DMC1 binding sites. In addition, 48% (221) of breakpoint regions were detected inside a gene, indicating these MRs can directly affect the haplotype diversity of genic regions. Taken together, our approach provides new opportunities in the haplotype-based genomic analysis of individual meiotic recombination. The MRLR software is implemented in Perl and is freely available at https://github.com/ChongLab/MRLR.
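
The idea of locating MR events from haplotype exchange can be illustrated with a minimal Python sketch. It is not the MRLR implementation (which is written in Perl); the function name `find_crossovers`, the 0/1 allele encoding, and the `min_run` filter for isolated mismatches are all assumptions made for illustration. Given a parent's two phased haplotypes and the gamete haplotype transmitted to the child, it labels each informative site by parental origin and reports the intervals where the origin switches.

```python
from typing import List, Sequence, Tuple

def find_crossovers(
    positions: Sequence[int],
    gamete: Sequence[int],
    hap_a: Sequence[int],
    hap_b: Sequence[int],
    min_run: int = 2,  # hypothetical filter against isolated phasing/genotyping errors
) -> List[Tuple[int, int]]:
    """Return (left_pos, right_pos) intervals where the gamete switches parental haplotype."""
    # Label each informative site with the parental haplotype the gamete matches.
    labels = []
    for pos, g, a, b in zip(positions, gamete, hap_a, hap_b):
        if a == b:
            continue  # parent is homozygous here: site is not informative
        if g == a:
            labels.append((pos, "A"))
        elif g == b:
            labels.append((pos, "B"))
        # a gamete allele matching neither haplotype is skipped as a likely error

    # Collapse labels into runs: [label, first_pos, last_pos, n_sites].
    runs: List[list] = []
    for pos, lab in labels:
        if runs and runs[-1][0] == lab:
            runs[-1][2] = pos
            runs[-1][3] += 1
        else:
            runs.append([lab, pos, pos, 1])

    # Drop runs shorter than min_run, merging neighbours that share a label.
    kept: List[list] = []
    for run in runs:
        if run[3] < min_run:
            continue
        if kept and kept[-1][0] == run[0]:
            kept[-1][2] = run[2]
            kept[-1][3] += run[3]
        else:
            kept.append(list(run))

    # Each boundary between kept runs is a candidate breakpoint interval.
    return [(left[2], right[1]) for left, right in zip(kept, kept[1:])]

if __name__ == "__main__":
    pos = [100, 200, 300, 400, 500, 600, 700, 800]
    hap_a = [0] * 8
    hap_b = [1] * 8
    gam = [0, 0, 0, 1, 0, 1, 1, 1]  # two isolated mismatches around the switch
    print(find_crossovers(pos, gam, hap_a, hap_b))
```

Dropping short runs makes the reported interval conservative (wider) around noisy sites; a real pipeline would also need to handle phase-block boundaries, which this sketch ignores.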

2003 ◽  
Vol 23 (2) ◽  
pp. 733-743 ◽  
Author(s):  
Jeremy M. Stark ◽  
Maria Jasin

ABSTRACT Loss of heterozygosity (LOH) is a common genetic alteration in tumors and often extends several megabases to encompass multiple genetic loci or even whole chromosome arms. Based on marker and karyotype analysis of tumor samples, a significant fraction of LOH events appears to arise from mitotic recombination between homologous chromosomes, reminiscent of recombination during meiosis. As DNA double-strand breaks (DSBs) initiate meiotic recombination, a potential mechanism leading to LOH in mitotically dividing cells is DSB repair involving homologous chromosomes. We therefore sought to characterize the extent of LOH arising from DSB-induced recombination between homologous chromosomes in mammalian cells. To this end, a recombination reporter was introduced into a mouse embryonic stem cell line that has nonisogenic maternal and paternal chromosomes, as is the case in human populations, and then a DSB was introduced into one of the chromosomes. Recombinants involving alleles on homologous chromosomes were readily obtained at a frequency of 4.6 × 10⁻⁵; however, this frequency was substantially lower than that of DSB repair by nonhomologous end joining or the inferred frequency of homologous repair involving sister chromatids. Strikingly, the majority of recombinants had LOH restricted to the site of the DSB, with a minor class of recombinants having LOH that extended to markers 6 kb from the DSB. Furthermore, we found no evidence of LOH extending to markers 1 centimorgan or more from the DSB. In addition, crossing over, which can lead to LOH of a whole chromosome arm, was not observed, implying that there are key differences between mitotic and meiotic recombination mechanisms. These results indicate that extensive LOH is normally suppressed during DSB-induced allelic recombination in dividing mammalian cells.


2020 ◽  
Author(s):  
Petra Bulánková ◽  
Mirna Sekulić ◽  
Denis Jallet ◽  
Charlotte Nef ◽  
Tom Delmont ◽  
...  

Abstract: Diatoms, an evolutionarily successful group of microalgae, display high levels of intraspecific variability in natural populations. However, the process generating such diversity is unknown. Here we estimated the variability within a natural diatom population and subsequently mapped the genomic changes arising within cultures clonally propagated from single diatom cells. We demonstrate that genome rearrangements and mitotic recombination between homologous chromosomes underlie clonal variability, resulting in haplotype diversity accompanied by the appearance of novel protein variants and loss of heterozygosity resulting in the fixation of alleles. The frequency of interhomolog mitotic recombination exceeds 4 out of 100 cell divisions and increases under environmental stress. We propose that this plastic response in the interhomolog mitotic recombination rate increases the evolutionary potential of diatoms, contributing to their ecological success.
One Sentence Summary: Recombination between homologous chromosomes in diatom vegetative cells leads to extensive genomic diversity in clonal populations.


2008 ◽  
Vol 190 (20) ◽  
pp. 6881-6893 ◽  
Author(s):  
David A. Rasko ◽  
M. J. Rosovitz ◽  
Garry S. A. Myers ◽  
Emmanuel F. Mongodin ◽  
W. Florian Fricke ◽  
...  

ABSTRACT Whole-genome sequencing has been skewed toward bacterial pathogens as a consequence of the prioritization of medical and veterinary diseases. However, it is becoming clear that in order to accurately measure genetic variation within and between pathogenic groups, multiple isolates, as well as commensal species, must be sequenced. This study examined the pangenomic content of Escherichia coli. Six distinct E. coli pathovars can be distinguished using molecular or phenotypic markers, but only two of the six pathovars have been subjected to any genome sequencing previously. Thus, this report provides a seminal description of the genomic contents and unique features of three unsequenced pathovars, enterotoxigenic E. coli, enteropathogenic E. coli, and enteroaggregative E. coli. We also determined the first genome sequence of a human commensal E. coli isolate, E. coli HS, which will undoubtedly provide a new baseline from which workers can examine the evolution of pathogenic E. coli. Comparison of 17 E. coli genomes, 8 of which are new, resulted in identification of ∼2,200 genes conserved in all isolates. We were also able to identify genes that were isolate and pathovar specific. Fewer pathovar-specific genes were identified than anticipated, suggesting that each isolate may have independently developed virulence capabilities. Pangenome calculations indicate that E. coli genomic diversity represents an open pangenome model containing a reservoir of more than 13,000 genes, many of which may be uncharacterized but important virulence factors. This comparative study of the species E. coli, while descriptive, should provide the basis for future functional work on this important group of pathogens.
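
The pangenome calculation mentioned above can be sketched in a few lines of Python. This is an illustrative toy, not the authors' method: the function name `pan_core_curve` and the gene-family identifiers are hypothetical, and it assumes genes have already been clustered into families. It tracks how the pan-genome (union of gene families) and core genome (intersection) change as genomes are added; a pan-genome count that keeps growing is what "open" refers to.

```python
from typing import Dict, List, Set, Tuple

def pan_core_curve(
    gene_sets: Dict[str, Set[str]], order: List[str]
) -> List[Tuple[int, int]]:
    """Return (pan_size, core_size) after adding each genome in the given order."""
    pan: Set[str] = set()
    core: Set[str] = set()
    curve: List[Tuple[int, int]] = []
    for i, name in enumerate(order):
        genes = gene_sets[name]
        pan |= genes  # pan-genome: every gene family seen so far
        # core genome: gene families present in all genomes seen so far
        core = set(genes) if i == 0 else core & genes
        curve.append((len(pan), len(core)))
    return curve

if __name__ == "__main__":
    # Toy gene-family content for three hypothetical isolates.
    toy = {
        "isolate1": {"famA", "famB", "famC"},
        "isolate2": {"famA", "famB", "famD"},
        "isolate3": {"famA", "famE"},
    }
    for step, (pan, core) in enumerate(pan_core_curve(toy, list(toy)), start=1):
        print(f"genomes={step} pan={pan} core={core}")
```

In practice the curve depends on the order in which genomes are added, so analyses of this kind typically average over many random orderings before fitting a growth model.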


2018 ◽  
Author(s):  
Weiling Li ◽  
Lin Lin ◽  
Raunaq Malhotra ◽  
Lei Yang ◽  
Raj Acharya ◽  
...  

Abstract: Human Endogenous Retrovirus type K (HERV-K) is the only HERV known to be insertionally polymorphic. It is possible that HERV-Ks contribute to human disease because people differ in both number and genomic location of these retroviruses. Indeed, viral transcripts, proteins, and antibodies against HERV-K are detected in cancers, auto-immune, and neurodegenerative diseases. However, attempts to link a polymorphic HERV-K with any disease have been frustrated, in part because the population frequency of the HERV-K provirus at each site is lacking and it is challenging to identify closely related elements such as HERV-K from short read sequence data. We present an integrated and computationally robust approach that uses whole genome short read data to determine the occupation status at all sites reported to contain a HERV-K provirus. Our method estimates the proportion of fixed length genomic sequences (k-mers) from whole genome sequence data matching a reference set of k-mers unique to each HERV-K locus and applies mixture model-based clustering to account for low depth sequence data. Our analysis of 1000 Genomes Project Data (KGP) reveals numerous differences among the five KGP super-populations in the frequency of individual and co-occurring HERV-K proviruses; we provide a visualization tool to easily depict the prevalence of any combination of HERV-K among KGP populations. Further, the genome burden of polymorphic HERV-K is variable in humans, with East Asian (EAS) individuals having the fewest integration sites. Our study identifies population-specific sequence variation for several HERV-K proviruses. We expect these resources will advance research on HERV-K contributions to human diseases.
Author summary: Human Endogenous Retrovirus type K (HERV-K) is the youngest of the retrovirus families in the human genome and is the only group that is polymorphic; a HERV-K can be present in one individual but absent from others. HERV-Ks could contribute to disease risk, but establishing a link between a polymorphic HERV-K and a specific disease has been difficult. We develop an easy-to-use method that reveals the considerable variation existing among global populations in the frequency of individual and co-occurring polymorphic HERV-K, and in the total number of HERV-K that any individual has in their genome. Our study provides a global reference set of HERV-K genomic diversity and the tools needed to determine the genomic landscape of HERV-K in any patient population.
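
The k-mer proportion described in the abstract can be illustrated with a short Python sketch. It is a simplification under stated assumptions, not the published tool: the helper names `kmers` and `kmer_proportion` are hypothetical, reverse complements and sequencing errors are ignored, and the mixture model-based clustering step is omitted. It computes the fraction of a locus's reference k-mers that are observed in a set of reads.

```python
from typing import Iterable, Set

def kmers(seq: str, k: int) -> Set[str]:
    """All distinct substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def kmer_proportion(reads: Iterable[str], ref_kmers: Set[str], k: int) -> float:
    """Fraction of reference k-mers found in at least one read."""
    if not ref_kmers:
        raise ValueError("reference k-mer set is empty")
    seen: Set[str] = set()
    for read in reads:
        # keep only k-mers that belong to the locus-specific reference set
        seen |= kmers(read, k) & ref_kmers
    return len(seen) / len(ref_kmers)

if __name__ == "__main__":
    k = 4
    ref = kmers("ACGTACGA", k)   # toy stand-in for k-mers unique to one locus
    reads = ["ACGTAC", "TTTTTT"]  # toy short reads
    print(kmer_proportion(reads, ref, k))
```

A proportion near 1 would suggest the provirus is present at that site and a proportion near 0 that it is absent; intermediate values at low sequencing depth are the case the abstract's clustering step is meant to resolve.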


2020 ◽  
Vol 8 (10) ◽  
pp. 1561
Author(s):  
Roshan Kumar ◽  
Karen Register ◽  
Jane Christopher-Hennings ◽  
Paolo Moroni ◽  
Gloria Gioia ◽  
...  

Among more than twenty species belonging to the class Mollicutes, Mycoplasma bovis is the most common cause of bovine mycoplasmosis in North America and Europe. Bovine mycoplasmosis causes significant economic loss in the cattle industry. The number of M. bovis positive herds has recently increased in North America and Europe. Since antibiotic treatment is ineffective and no efficient vaccine is available, M. bovis induced mycoplasmosis is primarily controlled by herd management measures such as restricting the movement of infected animals out of the herds and culling of infected animals or shedders of M. bovis. To better understand the population structure and genomic factors that may contribute to its transmission, we sequenced 147 M. bovis strains isolated from four different countries, viz. USA (n = 121), Canada (n = 22), Israel (n = 3) and Lithuania (n = 1). All except two of the isolates (KRB1 and KRB8) were isolated from two host types, i.e., bovine (n = 75) and bison (n = 70). We performed a large-scale comparative analysis of M. bovis genomes by integrating 103 publicly available genomes and our dataset (250 total genomes). Whole genome single nucleotide polymorphism (SNP) based phylogeny using M. agalactiae as an outgroup revealed that the M. bovis population structure is composed of five different clades. USA isolates showed a high degree of genomic divergence in comparison to the Australian isolates. Based on host of origin, all the isolates in clade IV were of bovine origin, whereas the majority of the isolates in clades III and V were of bison origin. Our comparative genome analysis also revealed that M. bovis has an open pangenome with a large breadth of unexplored diversity of genes. The function-based analysis of autogenous vaccine candidates (n = 10) included in this study revealed that their functional diversity does not span the genomic diversity observed in all five clades identified in this study. Our study also found that the M. bovis genome harbors a large number of IS elements and their number increases significantly (p = 7.8 × 10⁻⁶) as the genome size increases. Collectively, the genome data and the whole genome-based population analysis in this study may help to develop a better understanding of M. bovis induced mycoplasmosis in cattle.


2017 ◽  
Author(s):  
Njlaa Bakhsh ◽  
Fatimah Jackson ◽  
Latifa Jackson ◽  
Christopher Cross

The Arabian Peninsula (AP) is the first site of human migration and habitation outside of Africa. As a major crossroad for human populations, the AP provides an opportunity to better understand early to modern changes in human demographic patterns through selection, admixture, gene flow, and migration. Dramatic climatic fluctuations have been recorded in the AP that contributed to contractions and expansions in water availability. These climatological perturbations are thought to have shaped genomic variations in this population. Recent reports indicate that a number of Arab nation-states have committed significant resources to genetically type the national population, with the overall goal of determining the degree of genomic diversity in the AP. We sought to characterize currently typed genomic variation in Arabian populations to support the rationale for our proposed analyses of Saudi Arabian genomic diversity. Interestingly, in contrast to published claims, a comprehensive search of peer-reviewed reports on genomic analysis (N=20 papers) revealed no genomic data from four national genomic projects (Qatar, Saudi Arabia, Kuwait, and the United Arab Emirates). Our analysis demonstrates that while much fanfare and presumably resources have been devoted to defining the genomic landscape of the Arabian peoples, little actual data is available to either substantiate or support such an investment.


2014 ◽  
Author(s):  
Michael C Schatz ◽  
Lyza G Maron ◽  
Joshua C Stein ◽  
Alejandro Hernandez Wences ◽  
James Gurtowski ◽  
...  

The use of high throughput genome-sequencing technologies has uncovered a large extent of structural variation in eukaryotic genomes that makes important contributions to genomic diversity and phenotypic variation. Currently, when the genomes of different strains of a given organism are compared, whole genome resequencing data are aligned to an established reference sequence. However, when the reference differs in significant structural ways from the individuals under study, the analysis is often incomplete or inaccurate. Here, we use rice as a model to explore the extent of structural variation among strains adapted to different ecologies and geographies, and show that this variation can be significant, often matching or exceeding the variation present in closely related human populations or other mammals. We demonstrate how improvements in sequencing and assembly technology allow rapid and inexpensive de novo assembly of next generation sequence data into high-quality assemblies that can be directly compared to provide an unbiased assessment. Using this approach, we are able to accurately assess the “pan-genome” of three divergent rice varieties and document several megabases of each genome absent in the other two. Many of the genome-specific loci are annotated to contain genes, reflecting the potential for new biological properties that would be missed by standard resequencing approaches. We further provide a detailed analysis of several loci associated with agriculturally important traits, illustrating the utility of our approach for biological discovery. All of the data and software are openly available to support further breeding and functional studies of rice and other species.


2017 ◽  
Author(s):  
Céline Adam ◽  
Raphaël Guérois ◽  
Anna Citarella ◽  
Laura Verardi ◽  
Florine Adolphe ◽  
...  

Abstract: Histone H3K4 methylation is a feature of meiotic recombination hotspots shared by many organisms including plants and mammals. Meiotic recombination is initiated by programmed double-strand break (DSB) formation that in budding yeast takes place in gene promoters and is promoted by histone H3K4 di/trimethylation. This histone modification is recognized by Spp1, a PHD-finger containing protein that belongs to the conserved histone H3K4 methyltransferase Set1 complex. During meiosis, Spp1 binds H3K4me3 and interacts with a DSB protein, Mer2, to promote DSB formation close to gene promoters. How the Set1 complex-related and Mer2-related functions of Spp1 are connected is not clear. Here, combining genome-wide localization analyses, biochemical approaches and the use of separation of function mutants, we show that Spp1 is present within two distinct complexes in meiotic cells, the Set1 and the Mer2 complexes. Disrupting the Spp1-Set1 interaction mildly decreases H3K4me3 levels and does not affect meiotic recombination initiation. Conversely, the Spp1-Mer2 interaction is required for normal meiotic recombination initiation, but dispensable for Set1 complex-mediated histone H3K4 methylation. Finally, we provide evidence that Spp1 preserves normal H3K4me3 levels independently of the Set1 complex. We propose a model where the three populations of Spp1 work sequentially to promote recombination initiation: first by depositing histone H3K4 methylation (Set1 complex), next by “reading” and protecting histone H3K4 methylation, and finally by making the link with the chromosome axis (Mer2-Spp1 complex). This work deciphers the precise roles of Spp1 in meiotic recombination and opens perspectives to study its functions in other organisms where H3K4me3 is also present at recombination hotspots.
Author summary: Meiotic recombination is a conserved pathway of sexual reproduction that is required to faithfully segregate homologous chromosomes and produce viable gametes. Recombination events between homologous chromosomes are triggered by the programmed formation of DNA breaks, which occur preferentially at places called hotspots. In many organisms, these hotspots are located close to a particular chromatin modification, the methylation of lysine 4 of histone H3 (H3K4me3). It was previously shown in the budding yeast model that one protein, Spp1, plays an important function in this process. We further explored the functional link between Spp1 and its interacting partners, and show that Spp1 has genetically separable functions: depositing the H3K4me3 mark on the chromatin, “reading” and protecting it, and linking it to the recombination proteins. We provide evidence that Spp1 is in three independent complexes to perform these functions. This work opens perspectives for understanding the process in other eukaryotes such as mammals, where most of the proteins involved are conserved.

