Inferring the landscape of recombination using recurrent neural networks

2019
Author(s):
Jeffrey R. Adrion
Jared G. Galloway
Andrew D. Kern

Abstract Accurately inferring the genome-wide landscape of recombination rates in natural populations is a central aim in genomics, as patterns of linkage influence everything from genetic mapping to understanding evolutionary history. Here we describe ReLERNN, a deep learning method for estimating a genome-wide recombination map that is accurate even with small numbers of pooled or individually sequenced genomes. Rather than use summaries of linkage disequilibrium as its input, ReLERNN takes columns from a genotype alignment, which are then modeled as a sequence across the genome using a recurrent neural network. We demonstrate that ReLERNN improves accuracy and reduces bias relative to existing methods and maintains high accuracy in the face of demographic model misspecification, missing genotype calls, and genome inaccessibility. We apply ReLERNN to natural populations of African Drosophila melanogaster and show that genome-wide recombination landscapes, while largely correlated among populations, exhibit important population-specific differences. Lastly, we connect the inferred patterns of recombination with the frequencies of major inversions segregating in natural Drosophila populations.

2020
Vol 37 (6)
pp. 1790-1808
Author(s):
Jeffrey R Adrion
Jared G Galloway
Andrew D Kern

Abstract Accurately inferring the genome-wide landscape of recombination rates in natural populations is a central aim in genomics, as patterns of linkage influence everything from genetic mapping to understanding evolutionary history. Here, we describe recombination landscape estimation using recurrent neural networks (ReLERNN), a deep learning method for estimating a genome-wide recombination map that is accurate even with small numbers of pooled or individually sequenced genomes. Rather than use summaries of linkage disequilibrium as its input, ReLERNN takes columns from a genotype alignment, which are then modeled as a sequence across the genome using a recurrent neural network. We demonstrate that ReLERNN improves accuracy and reduces bias relative to existing methods and maintains high accuracy in the face of demographic model misspecification, missing genotype calls, and genome inaccessibility. We apply ReLERNN to natural populations of African Drosophila melanogaster and show that genome-wide recombination landscapes, although largely correlated among populations, exhibit important population-specific differences. Lastly, we connect the inferred patterns of recombination with the frequencies of major inversions segregating in natural Drosophila populations.
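The following Python sketch gives a rough picture of the input representation described for ReLERNN, in which genotype alignment columns are read as a sequence by a recurrent network. It runs an untrained, single-layer Elman RNN over the columns of a toy 0/1 haplotype alignment and reads out one scalar. Everything in it is an illustrative assumption: the function name `elman_rnn_forward`, the encoding, the random weights, and the layer type. It is not ReLERNN's architecture or training procedure, which this abstract does not specify. The output is not a recombination-rate estimate.

```python
import math
import random


def elman_rnn_forward(genotypes, hidden_size=4, seed=0):
    """Run an untrained Elman RNN over the columns (sites) of a genotype alignment.

    Hypothetical toy encoding: `genotypes` is a list of rows, one per sampled
    haplotype, each row a list of 0/1 alleles with one entry per segregating site.
    Returns a single scalar read out from the final hidden state.
    """
    n_haps = len(genotypes)
    n_sites = len(genotypes[0])
    rng = random.Random(seed)
    # Random weights stand in for trained parameters (illustration only).
    w_in = [[rng.gauss(0, 0.5) for _ in range(n_haps)] for _ in range(hidden_size)]
    w_rec = [[rng.gauss(0, 0.5) for _ in range(hidden_size)] for _ in range(hidden_size)]
    w_out = [rng.gauss(0, 0.5) for _ in range(hidden_size)]

    h = [0.0] * hidden_size
    for site in range(n_sites):
        # One alignment column: the allele carried by every haplotype at this site.
        x = [genotypes[hap][site] for hap in range(n_haps)]
        # Elman update: h_t = tanh(W_in x_t + W_rec h_{t-1}); the old h is used
        # throughout the comprehension and only replaced once it finishes.
        h = [
            math.tanh(
                sum(w_in[j][i] * x[i] for i in range(n_haps))
                + sum(w_rec[j][k] * h[k] for k in range(hidden_size))
            )
            for j in range(hidden_size)
        ]
    # Linear readout of the final hidden state.
    return sum(w_out[j] * h[j] for j in range(hidden_size))


# Three haplotypes by four sites (toy data).
alignment = [[0, 1, 0, 1], [1, 1, 0, 0], [0, 0, 1, 1]]
print(elman_rnn_forward(alignment))
```

Each alignment column updates the hidden state in turn, so the readout can depend on the order of sites along the chromosome. A trained model would instead learn weights that map this sequence of columns to a target quantity.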


2017
Vol 7 (7)
pp. 2391-2403
Author(s):
Amanda S Lobell
Rachel R Kaspari
Yazmin L Serrano Negron
Susan T Harbison

Abstract Ovariole number has a direct role in the number of eggs produced by an insect, suggesting that it is a key morphological fitness trait. Many studies have documented the variability of ovariole number and its relationship to other fitness and life-history traits in natural populations of Drosophila. However, the genes contributing to this variability are largely unknown. Here, we conducted a genome-wide association study of ovariole number in a natural population of flies. Using mutations and RNAi-mediated knockdown, we confirmed the effects of 24 candidate genes on ovariole number, including a novel gene, anneboleyn (formerly CG32000), that impacts both ovariole morphology and numbers of offspring produced. We also identified pleiotropic genes between ovariole number traits and sleep and activity behavior. While few polymorphisms overlapped between sleep parameters and ovariole number, 39 candidate genes were nevertheless in common. We verified the effects of seven genes on both ovariole number and sleep: bin3, blot, CG42389, kirre, slim, VAChT, and zfh1. Linkage disequilibrium among the polymorphisms in these common genes was low, suggesting that these polymorphisms may evolve independently.


2019
Vol 110 (3)
pp. 361-369
Author(s):
Katherine L Bell
Chris C Nice
Darrin Hulsey

Abstract In recent decades, an increased understanding of molecular ecology has led to a reinterpretation of the role of gene flow during the evolution of reproductive isolation and biological novelty. For example, even in the face of ongoing gene flow, strong selection may maintain divergent polymorphisms, or gene flow may introduce novel biological diversity via hybridization and introgression from a divergent species. Herein, we elucidate the evolutionary history and genomic basis of a trophically polymorphic trait in a species of cichlid fish, Herichthys minckleyi. We explored genetic variation at three hierarchical levels: between H. minckleyi (n = 69) and a closely related species, Herichthys cyanoguttatus (n = 10); between H. minckleyi individuals from two geographic locations; and finally between individuals with alternate morphotypes, at both a genome-wide and a locus-specific scale. We found limited support for the hypothesis that the H. minckleyi polymorphism is the result of ongoing hybridization between the two species. Within H. minckleyi, we found evidence of geographic genetic structure, and traditional population genetic analyses indicated that individuals of alternate morphotypes within a pool appear to be panmictic. However, when we used a locus-specific approach to examine the relationship between multi-locus genotype, tooth size, and geographic sampling, we found the first evidence for molecular genetic differences between the H. minckleyi morphotypes.


2018
Vol 116 (3)
pp. 900-908
Author(s):
Hamutal Arbel
Sumanta Basu
William W. Fisher
Ann S. Hammonds
Kenneth H. Wan
...

Identifying functional enhancer elements in metazoan systems is a major challenge. Large-scale validation of enhancers predicted by ENCODE reveals false-positive rates of at least 70%. We used the pregastrula-patterning network of Drosophila melanogaster to demonstrate that loss in accuracy in held-out data results from heterogeneity of functional signatures in enhancer elements. We show that at least two classes of enhancers are active during early Drosophila embryogenesis and that by focusing on a single, relatively homogeneous class of elements, greater than 98% prediction accuracy can be achieved in a balanced, completely held-out test set. The class of well-predicted elements is composed predominantly of enhancers driving multistage segmentation patterns, which we designate segmentation driving enhancers (SDE). Prediction is driven by the DNA occupancy of early developmental transcription factors, with almost no additional power derived from histone modifications. We further show that improved accuracy is not a property of a particular prediction method: after conditioning on the SDE set, naïve Bayes and logistic regression perform as well as more sophisticated tools. Applying this method to a genome-wide scan, we predict 1,640 SDEs that cover 1.6% of the genome. An analysis of 32 SDEs, chosen from throughout our prediction rank-list and assayed by whole-mount embryonic imaging of stably integrated reporter constructs, showed that >90% drove expression patterns. We achieved 86.7% precision on a genome-wide scan, with an estimated recall of at least 98%, indicating high accuracy and completeness in annotating this class of functional elements.


2019
Author(s):
Andrew D. Foote
Michael D. Martin
Marie Louis
George Pacheco
Kelly M. Robertson
...

Abstract Reconstruction of the demographic and evolutionary history of populations assuming a consensus tree-like relationship can mask more complex scenarios, which are prevalent in nature. An emerging genomic toolset, which has been most comprehensively harnessed in the reconstruction of human evolutionary history, enables molecular ecologists to elucidate complex population histories. Killer whales have limited extrinsic barriers to dispersal and have radiated globally, and are therefore a good candidate model for the application of such tools. Here, we analyse a global dataset of killer whale genomes in a rare attempt to elucidate global population structure in a non-human species. We identify a pattern of genetic homogenisation at lower latitudes and the greatest differentiation at high latitudes, even between currently sympatric lineages. The processes underlying the major axis of structure include high drift at the edge of species’ range, likely associated with founder effects and allelic surfing during post-glacial range expansion. Divergence between Antarctic and non-Antarctic lineages is further driven by ancestry segments with up to four-fold older coalescence time than the genome-wide average; these are relicts of a previous vicariance during an earlier glacial cycle. Our study further supports the view that episodic gene flow is ubiquitous in natural populations, and can occur across great distances and after substantial periods of isolation between populations. Thus, understanding the evolutionary history of a species requires comprehensive geographic sampling and genome-wide data to sample the variation in ancestry within individuals.


2021
Author(s):
Ho Namkoong
Ryuya Edahiro
Koichi Fukunaga
Yuya Shirai
Kyuto Sonehara
...

Elucidating the host genetic loci that affect the severity of SARS-CoV-2 infection, or coronavirus disease 2019 (COVID-19), is an emerging issue in the face of the current devastating pandemic. Here, we report a genome-wide association study (GWAS) of COVID-19 in a Japanese population, led by the Japan COVID-19 Task Force, as one of the initial discovery GWAS performed on a non-European population. Enrolling a total of 2,393 cases and 3,289 controls, we not only replicated previously reported COVID-19 risk variants (e.g., LZTFL1, FOXP4, ABO, and IFNAR2) but also found a variant on 5q35 (rs60200309-A at DOCK2) that was significantly associated with severe COVID-19 in younger (<65 years of age) patients, with a genome-wide significant p-value of 1.2 × 10⁻⁸ (odds ratio = 2.01, 95% confidence interval = 1.58-2.55). This risk allele was prevalent in East Asians, including Japanese (minor allele frequency [MAF] = 0.097), but rarely found in Europeans. Cross-population Mendelian randomization analysis inferred causal effects of a number of complex human traits on COVID-19; in particular, obesity had a significant impact on severe COVID-19. The presence of the population-specific risk allele underscores the need for non-European studies of COVID-19 host genetics.
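The reported odds ratio and confidence interval can be cross-checked against the reported p-value. The sketch below makes an assumption that the abstract does not state: that the 95% CI is a Wald interval, symmetric on the log-odds scale. Under that assumption, the standard error is recovered from the CI width, and the two-sided normal p-value follows from z = ln(OR)/SE. The helper name `wald_p_from_or_ci` is hypothetical.

```python
import math


def wald_p_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.959963984540054):
    """Approximate two-sided p-value implied by an odds ratio and its 95% CI.

    Assumes (hypothetically) a Wald interval on the log-odds scale:
    ln(ci_high) - ln(ci_low) = 2 * z_crit * SE.
    """
    log_or = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = log_or / se
    # Two-sided standard-normal tail: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


p = wald_p_from_or_ci(2.01, 1.58, 2.55)
print(f"{p:.1e}")
```

With OR = 2.01 and CI 1.58-2.55, this gives SE ≈ 0.122 and z ≈ 5.7. The resulting p is on the order of 10⁻⁸, which is consistent with the reported 1.2 × 10⁻⁸ once the two-decimal rounding of the CI bounds is taken into account.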


2020
Vol 20 (2)
pp. 544-559
Author(s):
Ingerid J. Hagen
Sigbjørn Lien
Anna M. Billing
Tore O. Elgvin
Cassandra Trier
...

2020
Author(s):
Ator Ashoti
Francesco Limone
Melissa van Kranenburg
Anna Alemany
Mirna Baak
...

Abstract Facioscapulohumeral muscular dystrophy (FSHD) is a fundamentally complex muscle disorder that thus far remains untreatable. As the name implies, FSHD starts in the muscles of the face and shoulder girdle. The main driver of the disease is the pioneer transcription factor DUX4, which is misexpressed in affected tissues due to a failure in epigenetic repressive mechanisms. In pursuit of unraveling the underlying mechanism of FSHD and finding potential therapeutic targets or treatment options, we performed an exhaustive genome-wide CRISPR/Cas9 phenotypic rescue screen to identify modulators of DUX4 cytotoxicity. We found no key effectors other than DUX4 itself, suggesting that treatment efforts in FSHD should be directed towards its direct modulation. The screen did, however, reveal some rare and unexpected Cas9-induced genomic events that may provide important considerations for planning future CRISPR/Cas9 knockout screens.


2018
Author(s):
Yujun Cui
Chao Yang
Hongling Qiu
Hui Wang
Ruifu Yang
...

Abstract Investigating fitness interactions in natural populations remains a considerable challenge. We take advantage of the unique population structure of Vibrio parahaemolyticus, a bacterial pathogen of humans and shrimp, to perform a genome-wide screen for coadapted genetic elements. We identified 90 interaction groups involving 1,560 coding genes. Of these, 82 interaction groups are between accessory genes, many of which have functions related to carbohydrate transport and metabolism. Only eight interaction groups involve both the core and accessory genomes. The largest includes 1,540 SNPs in 82 genes and 338 accessory genome elements, many involved in lateral flagella and cell wall biogenesis. The interactions have a complex hierarchical structure encoding at least four distinct ecological strategies. Preliminary experiments imply that the strategies influence biofilm formation and bacterial growth rate in vitro. One strategy involves a divergent profile in multiple genome regions, implying that strains have irreversibly specialized, while the others involve fewer genes and are more plastic. Our results imply that most genetic alliances are ephemeral but that increasingly complex strategies can evolve and eventually cause speciation.


2020
Vol 37 (10)
pp. 2825-2837
Author(s):
Paolo Franchini
Andreas F Kautt
Alexander Nater
Gloria Antonini
Riccardo Castiglia
...

Abstract Chromosomal evolution is widely considered to be an important driver of speciation, as karyotypic reorganization can bring about the establishment of reproductive barriers between incipient species. One textbook example of a genetic mechanism of speciation is large-scale chromosomal rearrangement, such as Robertsonian (Rb) fusions. These are a common class of structural variants that can drastically change the recombination landscape by suppressing crossing-over, and can influence gene expression by altering regulatory networks. Here, we explore the population structure and demographic patterns of a well-known house mouse Rb system in the Aeolian archipelago in Southern Italy using genome-wide data. By analyzing chromosomal regions characterized by different levels of recombination, we trace the evolutionary history of a set of Rb chromosomes occurring in different geographical locations and test whether chromosomal fusions have a single shared origin or occurred multiple times. Using a combination of phylogenetic and population genetic approaches, we find support for multiple, independent origins of three focal Rb chromosomes. The elucidation of the demographic patterns of the mouse populations within the Aeolian archipelago shows that an interplay between fixation of newly formed Rb chromosomes and hybridization events has contributed to shaping their current karyotypic distribution. Overall, our results illustrate that chromosome structure is much more dynamic than anticipated and emphasize the importance of large-scale chromosomal translocations in speciation.

