Introgression patterns between house mouse subspecies and species reveal genomic windows of frequent exchange

Mapping Intimacies ◽

10.1101/168328 ◽

2017 ◽

Cited By ~ 3

Author(s):

Kristian Karsten Ullrich ◽

Miriam Linnenbrink ◽

Diethard Tautz

Keyword(s):

Genetic Material ◽

Innate Immune ◽

Whole Genome Sequencing Data ◽

Sequencing Data ◽

Mus Spretus ◽

Coding Regions ◽

Genomic Conflict ◽

Frequent Exchange ◽

Genomic Regions ◽

Active Genes

Abstract Based on whole genome sequencing data, we have studied the patterns of introgression in a phylogenetically well-defined set of populations, subspecies and species of mice (Mus m. domesticus, Mus m. musculus, Mus m. castaneus and Mus spretus). We find that many discrete genomic regions are subject to repeated and mutual introgression and exchange. The majority of these regions code for genes that are involved in parasite defense or genomic conflict. They include genes involved in adaptive immunity, such as the MHC region or antibody coding regions, but also genes involved in innate immune reactions of the epidermis. We also find clusters of KRAB zinc finger proteins that control the spread of transposable elements and genes that are involved in meiotic drive. These findings suggest that even well-separated populations and species maintain the capacity to exchange genetic material in a special set of evolutionarily active genes.

Identity-by-descent analyses for measuring population dynamics and selection in recombining pathogens

10.1101/088039 ◽

2016 ◽

Cited By ~ 3

Author(s):

Lyndal Henden ◽

Stuart Lee ◽

Ivo Mueller ◽

Alyssa Barry ◽

Melanie Bahlo

Keyword(s):

Drug Resistance ◽

Positive Selection ◽

Whole Genome Sequencing Data ◽

Population Bottlenecks ◽

Human Pathogens ◽

Sequencing Data ◽

Antimalarial Drug Resistance ◽

Genomic Regions ◽

Genomic Locations ◽

Critical Regions

Abstract Identification of genomic regions that are identical by descent (IBD) has proven useful for human genetic studies where analyses have led to the discovery of familial relatedness and fine-mapping of disease critical regions. Unfortunately, IBD analyses have been underutilized in the analysis of other organisms, including human pathogens. This is in part due to the lack of statistical methodologies for non-diploid genomes in addition to the added complexity of multiclonal infections. As such, we have developed an IBD methodology, called isoRelate, for analysis of haploid recombining microorganisms in the presence of multiclonal infections. Using the inferred IBD status at genomic locations, we have also developed a novel statistic for identifying loci under positive selection and propose relatedness networks as a means of exploring shared haplotypes within populations. We evaluate the performance of our methodologies for detecting IBD and selection, including comparisons with existing tools, then perform an exploratory analysis of whole genome sequencing data from a global Plasmodium falciparum dataset of more than 2500 genomes. This analysis identifies Southeast Asia as having many highly related isolates, possibly as a result of both reduced transmission from intensified control efforts and population bottlenecks following the emergence of antimalarial drug resistance. Many signals of selection are also identified, most of which overlap genes that are known to be associated with drug resistance, in addition to two novel signals observed in multiple countries that have yet to be explored in detail. Additionally, we investigate relatedness networks over the selected loci and determine that one of these sweeps has spread between continents while the other has arisen independently in different countries.
IBD analysis of microorganisms using isoRelate can be used for exploring population structure, positive selection and haplotype distributions, and will be a valuable tool for monitoring disease control and elimination efforts of many diseases.

Sensitive, Highly Multiplexed Sequencing of Microhaplotypes From the Plasmodium falciparum Heterozygome

The Journal of Infectious Diseases ◽

10.1093/infdis/jiaa527 ◽

2020 ◽

Cited By ~ 1

Author(s):

Sofonias K Tessema ◽

Nicholas J Hathaway ◽

Noam B Teyssier ◽

Maxwell Murphy ◽

Anna Chen ◽

...

Keyword(s):

Plasmodium Falciparum ◽

Low Cost ◽

Dried Blood Spots ◽

Whole Genome Sequencing Data ◽

Sequencing Data ◽

Flexible Tool ◽

High Coverage ◽

Targeted Next Generation Sequencing ◽

Genomic Regions ◽

High Diversity

Abstract Background: Targeted next-generation sequencing offers the potential for consistent, deep coverage of information-rich genomic regions to characterize polyclonal Plasmodium falciparum infections. However, methods to identify and sequence these genomic regions are currently limited. Methods: A bioinformatic pipeline and multiplex methods were developed to identify and simultaneously sequence 100 targets and applied to dried blood spot (DBS) controls and field isolates from Mozambique. For comparison, whole-genome sequencing data were generated for the same controls. Results: Using publicly available genomes, 4465 high-diversity genomic regions suited for targeted sequencing were identified, representing the P. falciparum heterozygome. For this study, 93 microhaplotypes with high diversity (median expected heterozygosity = 0.7) were selected along with 7 drug resistance loci. The sequencing method achieved very high coverage (median 99%), specificity (99.8%), and sensitivity (90% for haplotypes with 5% within-sample frequency in dried blood spots with 100 parasites/µL). In silico analyses revealed that microhaplotypes provided much higher resolution to discriminate related from unrelated polyclonal infections than biallelic single-nucleotide polymorphism barcodes. Conclusions: The bioinformatic and laboratory methods outlined here provide a flexible tool for efficient, low-cost, high-throughput interrogation of the P. falciparum genome, and can be tailored to simultaneously address multiple questions of interest in various epidemiological settings.
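
For readers unfamiliar with the diversity measure quoted in this abstract, expected heterozygosity at a locus is commonly defined as one minus the sum of squared allele (here, microhaplotype) frequencies. The sketch below is illustrative only: the function name and the toy haplotype labels are hypothetical, and it uses the uncorrected form, which may differ from the estimator the authors applied.

```python
from collections import Counter


def expected_heterozygosity(haplotypes):
    """Return He = 1 - sum(p_i^2) over haplotype frequencies at one locus.

    Uncorrected estimator (no n/(n-1) sample-size adjustment); an
    assumption for illustration, not necessarily the paper's estimator.
    """
    counts = Counter(haplotypes)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no haplotypes observed")
    return 1.0 - sum((count / total) ** 2 for count in counts.values())


# Hypothetical observations of one microhaplotype locus across four samples.
observed = ["A", "A", "B", "C"]
he = expected_heterozygosity(observed)
print(f"expected heterozygosity: {he:.3f}")
```

A locus with many haplotypes at similar frequencies scores close to 1, which is why high-heterozygosity loci are attractive for telling infections apart.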

Pan-cancer analysis of non-coding recurrent mutations and their possible involvement in cancer pathogenesis

NAR Cancer ◽

10.1093/narcan/zcab008 ◽

2021 ◽

Vol 3 (1) ◽

Author(s):

Chie Kikutake ◽

Minako Yoshihara ◽

Mikita Suyama

Keyword(s):

The Cancer Genome Atlas ◽

Whole Genome Sequencing Data ◽

Sequencing Data ◽

Huge Number ◽

Protein Coding ◽

Coding Regions ◽

Cancer Pathogenesis ◽

Recurrent Mutations ◽

Cancer Genome Atlas ◽

Pan Cancer

Abstract Cancer-related mutations have been mainly identified in protein-coding regions. Recent studies have demonstrated that mutations in non-coding regions of the genome could also be a risk factor for cancer. However, the non-coding regions comprise 98% of the total length of the human genome and contain a huge number of mutations, making it difficult to interpret their impact on the pathogenesis of cancer. To comprehensively identify cancer-related non-coding mutations, we focused on recurrent mutations in non-coding regions using somatic mutation data from COSMIC and whole-genome sequencing data from The Cancer Genome Atlas (TCGA). We identified 21 574 recurrent mutations in non-coding regions that were shared by at least two different samples from both COSMIC and TCGA databases. Among them, 580 candidate cancer-related non-coding recurrent mutations were identified based on epigenomic and chromatin structure datasets. One such mutation was located in an RREB1 binding site that is thought to interact with the TEAD1 promoter. Our results suggest that mutations may disrupt the binding of RREB1 to the candidate enhancer region and increase TEAD1 expression levels. Our findings demonstrate that non-coding recurrent mutations and coding mutations may contribute to the pathogenesis of cancer.

Application of deep learning algorithm on whole genome sequencing data uncovers structural variants associated with multiple mental disorders in African American patients

Molecular Psychiatry ◽

10.1038/s41380-021-01418-1 ◽

2022 ◽

Author(s):

Yichuan Liu ◽

Hui-Qi Qu ◽

Frank D. Mentch ◽

Jingchun Qu ◽

Xiao Chang ◽

...

Keyword(s):

Deep Learning ◽

Mental Disorders ◽

Mental Disorder ◽

Genome Sequencing ◽

Learning Algorithm ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Sequencing Data ◽

Coding Regions ◽

Deep Learning Algorithm

Abstract Mental disorders present a global health concern, while the diagnosis of mental disorders can be challenging. The diagnosis is even harder for patients who have more than one type of mental disorder, especially for young toddlers who are not able to complete questionnaires or standardized rating scales for diagnosis. In the past decade, multiple genomic association signals have been reported for mental disorders, some of which present attractive drug targets. Concurrently, machine learning algorithms, especially deep learning algorithms, have been successful in the diagnosis and/or labeling of complex diseases, such as attention deficit hyperactivity disorder (ADHD) or cancer. In this study, we focused on eight common mental disorders, including ADHD, depression, anxiety, autism, intellectual disabilities, speech/language disorder, developmental delays, and oppositional defiant disorder in the ethnic minority of African Americans. Blood-derived whole genome sequencing data from 4179 individuals were generated, including 1384 patients with the diagnosis of at least one mental disorder. The burden of genomic variants in coding/non-coding regions was applied as feature vectors in the deep learning algorithm. Our model showed ~65% accuracy in differentiating patients from controls. The ability to label patients with multiple disorders was similarly successful, with a Hamming loss score less than 0.3, while exact diagnostic matches are around 10%. Genes in genomic regions with the highest weights showed enrichment of biological pathways involved in immune responses, antigen/nucleic acid binding, chemokine signaling pathway, and G-protein receptor activities.
A notable finding is that variants in non-coding regions (e.g., ncRNA, intronic, and intergenic) performed as well as variants in coding regions; however, unlike coding-region variants, variants in non-coding regions do not show genomic hotspots and have much narrower standard deviations, indicating they probably serve as alternative markers.
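
The abstract reports a Hamming loss alongside a much lower exact-match rate. As a rough illustration of how the two multi-label metrics can diverge, the sketch below computes both on hypothetical toy labels; the function names and data are assumptions for illustration, not the authors' code or results.

```python
def hamming_loss(y_true, y_pred):
    """Fraction of individual label slots that are predicted incorrectly."""
    wrong = 0
    slots = 0
    for true_row, pred_row in zip(y_true, y_pred):
        if len(true_row) != len(pred_row):
            raise ValueError("label rows must have equal length")
        wrong += sum(t != p for t, p in zip(true_row, pred_row))
        slots += len(true_row)
    return wrong / slots


def exact_match_ratio(y_true, y_pred):
    """Fraction of samples whose full label set is predicted exactly."""
    matches = sum(t == p for t, p in zip(y_true, y_pred))
    return matches / len(y_true)


# Hypothetical data: two patients, four disorder labels each (1 = diagnosed).
y_true = [[1, 0, 1, 0], [0, 1, 0, 0]]
y_pred = [[1, 0, 0, 0], [0, 1, 0, 1]]

loss = hamming_loss(y_true, y_pred)
exact = exact_match_ratio(y_true, y_pred)
print(f"Hamming loss: {loss:.2f}, exact matches: {exact:.2f}")
```

In this toy case each patient has one wrong label out of four, so the Hamming loss is modest while no patient is matched exactly, which is the same qualitative pattern the abstract describes.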

Prioritizing non-coding regions based on human genomic constraint and sequence context with deep learning

Nature Communications ◽

10.1038/s41467-021-21790-4 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Dimitrios Vitsios ◽

Ryan S. Dhindsa ◽

Lawrence Middleton ◽

Ayal B. Gussow ◽

Slavé Petrovski

Keyword(s):

Deep Learning ◽

Genomic Sequence ◽

Strong Predictor ◽

Whole Genome Sequencing Data ◽

Disease Genes ◽

Mendelian Disease ◽

Human Lineage ◽

Sequencing Data ◽

Coding Regions ◽

Residual Variation

Abstract Elucidating functionality in non-coding regions is a key challenge in human genomics. It has been shown that intolerance to variation of coding and proximal non-coding sequence is a strong predictor of human disease relevance. Here, we integrate intolerance to variation, functional genomic annotations and primary genomic sequence to build JARVIS: a comprehensive deep learning model to prioritize non-coding regions, outperforming other human lineage-specific scores. Despite being agnostic to evolutionary conservation, JARVIS performs comparably to or outperforms conservation-based scores in classifying pathogenic single-nucleotide and structural variants. In constructing JARVIS, we introduce the genome-wide residual variation intolerance score (gwRVIS), applying a sliding-window approach to whole genome sequencing data from 62,784 individuals. gwRVIS distinguishes Mendelian disease genes from more tolerant CCDS regions and highlights ultra-conserved non-coding elements as the most intolerant regions in the human genome. Both JARVIS and gwRVIS capture previously inaccessible human-lineage constraint information and will enhance our understanding of the non-coding genome.

Germline de novo mutation clusters arise during oocyte aging in genomic regions with increased double-strand break incidence

10.1101/140111 ◽

2017 ◽

Cited By ~ 2

Author(s):

Jakob M. Goldmann ◽

Vladimir B. Seplyarskiy ◽

Wendy S.W. Wong ◽

Thierry Vilboux ◽

Dale L. Bodian ◽

...

Keyword(s):

De Novo ◽

Underlying Mechanism ◽

Strand Break ◽

De Novo Mutation ◽

Whole Genome Sequencing Data ◽

Sequencing Data ◽

Induced Mutations ◽

Cancer Genomes ◽

Oocyte Aging ◽

Genomic Regions

Clustering of mutations has been found both in somatic mutations from cancer genomes and in germline de novo mutations (DNMs). We identified 1,755 clustered DNMs (cDNMs) within whole-genome sequencing data from 1,291 parent-offspring trios and investigated the underlying mutational mechanisms. We found that the number of clusters on the maternal allele was positively correlated with maternal age and that these consist of more individual mutations with larger intra-mutational distances compared to paternal clusters. More than 50% of maternal clusters were located on chromosomes 8, 9 and 16, in regions with an overall increased maternal mutation rate. Maternal clusters in these regions showed a distinct mutation signature characterized by C>G mutations. Finally, we found that maternal clusters associate with processes involving double-strand breaks (DSBs) such as meiotic gene conversions and de novo deletion events. These findings suggest accumulation of DSB-induced mutations throughout oocyte aging as an underlying mechanism leading to maternal mutation clusters.

Computational methods for the discovery and annotation of viral integrations

10.1101/2021.08.28.458009 ◽

2021 ◽

Author(s):

Umberto Palatini ◽

Elisa Pischedda ◽

Mariangela Bonizzoni

Keyword(s):

Genetic Material ◽

Eukaryotic Cells ◽

Whole Genome Sequencing Data ◽

Sequencing Data ◽

Genotoxic Effects ◽

Dna Viruses ◽

Widespread Occurrence ◽

Persistent Viral Infection ◽

A Genome ◽

Endogenous Viral Elements

The transfer of genetic material between viruses and eukaryotic cells is pervasive. Somatic integrations of DNA viruses and retroviruses have been linked to persistent viral infection and genotoxic effects. Integrations into germline cells, referred to as Endogenous Viral Elements (EVEs), can be co-opted for host functions. Besides DNA viruses and retroviruses, EVEs can also derive from nonretroviral RNA viruses, which have often been observed in piRNA clusters. Here, we describe a bioinformatic framework to annotate EVEs in a genome assembly, study their widespread occurrence and polymorphism and identify sample-specific viral integrations using whole-genome sequencing data.

No rare deleterious variants from STK32B, PPARGC1A, and CTNNA3 are associated with essential tremor

Neurology Genetics ◽

10.1212/nxg.0000000000000195 ◽

2017 ◽

Vol 3 (5) ◽

pp. e195

Author(s):

Gabrielle Houle ◽

Amirthagowri Ambalavanan ◽

Jean-François Schmouth ◽

Claire S. Leblond ◽

Dan Spiegelman ◽

...

Keyword(s):

Essential Tremor ◽

Genome Wide Association Study ◽

Rare Variants ◽

Whole Genome Sequencing Data ◽

Sequencing Data ◽

Cumulative Impact ◽

Coding Regions ◽

Genome Wide ◽

Whole Exome ◽

Or Gene

Objective: To assess the contribution of variants in STK32B, PPARGC1A, and CTNNA3 as essential tremor (ET) predisposing factors following their association in a 2-stage genome-wide association study (GWAS). Methods: The coding regions of these genes were examined for the presence of rare variants using two approaches: (1) examining whole-exome and whole-genome sequencing data of 14 autosomal dominant multiplex ET families, and (2) conducting targeted massively parallel sequencing to examine the three genes in cohorts of 269 ET cases and 287 control individuals. The cumulative impact of rare variants was assessed using SKAT-O analyses using (1) all variants, (2) only rare variants, and (3) only the rare variants altering the mRNA. Results: Thirty-four variants were identified. No difference emerged regarding the distributions of individual variants (or genes) between cases and controls. Conclusion: No rare exonic variants further validated any of these genes as a risk factor for ET. The recent GWAS offers promising avenues, but the genetic heterogeneity of ET is nonetheless challenging for the validation of risk factors, and ultimately larger cohorts of cases should help to overcome this challenge.

Genetic architecture and marker-assisted breeding for salt tolerance in soybean

10.32469/10355/68890 ◽

2018 ◽

Author(s):

Tuyen Duc Do

Keyword(s):

Salt Tolerance ◽

Qtl Mapping ◽

Genome Wide Association Study ◽

Major Gene ◽

Snp Markers ◽

Whole Genome Sequencing Data ◽

Salt Tolerant ◽

Sequencing Data ◽

Content Ratio ◽

Genomic Regions

Salinity is one of the major abiotic stresses that inhibits plant growth and causes seed yield loss in soybean. Although a major gene for salt tolerance on chromosome (Chr.) 3 was mapped, cloned and characterized, it does not fully explain genetic variability for tolerance in soybean. Two mapping approaches, quantitative trait loci (QTL) mapping and genome-wide association study (GWAS), can complement each other to identify genomic regions and molecular markers associated with traits of interest. QTL mapping is more suitable to map traits governed by rare alleles in a designed population while GWAS is better in mapping traits underlain by few genes of large effect in the natural population. This study was performed to identify additional loci and new sources for salt tolerance by using both approaches. For bi-parental QTL mapping, salt tolerance of 132 F2 families was evaluated by assessing leaf scorch score (LSS), chlorophyll content ratio (CCR), leaf sodium content (LSC), and leaf chloride content (LCC). Their genotypes were obtained using the Illumina Infinium SoySNP6K BeadChip assay to map salt tolerance gene(s). A major locus significantly associated with LSS, CCR, LSC, and LCC was mapped to Chr. 3 with LOD scores of 19.1, 11.0, 7.7, and 25.6, respectively. In addition, a second locus associated with salt tolerance for LSC was also detected and mapped on Chr. 13 with a LOD score of 4.6 and an R² of 0.115. The evaluation of salt tolerance of an F5 population derived from the same cross showed that combining salt tolerant alleles of major and minor loci significantly increased salt tolerance. On the other hand, GWAS for salt tolerance was conducted using SNPs of two datasets, the SoySNP50K iSelect BeadChip and a 3.7M SNP dataset (from whole-genome sequencing data), across 305 soybean accessions of a diverse panel. The known gene on Chr. 3 was confirmed by three gene-based markers (GBMs) that were integrated into both datasets.
Other genomic regions significantly associated with salt tolerance were identified on Chrs. 1, 2, 5, 6, 8, 14, 18, and 19 by analyzing the 3.7M SNP dataset, in which the position on Chr. 8 strongly predicted a new minor locus for salt tolerance. The genotype-phenotype correlation using three GBMs discovered six new salt tolerant sources that may carry novel gene(s) for salt tolerance. By complementation tests and segregation analysis of salt tolerance among F2 plants developed from a cross of Fiskeby III and a salt-tolerant accession, PI 468908, it was speculated that salt tolerance from PI 468908 was possibly controlled by a new gene instead of the known gene on Chr. 3. These significant loci in new salt tolerant sources coupled with significant SNP markers could be useful for marker-assisted selection in molecular breeding programs to improve salt tolerance in soybean.

Detection of shared balancing selection in the absence of trans-species polymorphism

10.1101/320390 ◽

2018 ◽

Author(s):

Xiaoheng Cheng ◽

Michael DeGiorgio

Keyword(s):

Balancing Selection ◽

Single Species ◽

Ease Of Use ◽

Whole Genome Sequencing Data ◽

Sequencing Data ◽

Model Based ◽

Higher Power ◽

Multiple Species ◽

Genomic Regions

Abstract Trans-species polymorphism has been widely used as a key sign of long-term balancing selection across multiple species. However, such sites are often rare in the genome, and could result from mutational processes or technical artifacts. Few methods are yet available to specifically detect footprints of trans-species balancing selection without using trans-species polymorphic sites. In this study, we develop summary- and model-based approaches that are each specifically tailored to uncover regions of long-term balancing selection shared by a set of species by using genomic patterns of intra-specific polymorphism and inter-specific fixed differences. We demonstrate that our trans-species statistics have substantially higher power than single-species approaches to detect footprints of trans-species balancing selection, and are robust to those that do not affect all tested species. We further apply our model-based methods to human and chimpanzee whole genome sequencing data. In addition to the previously established MHC and malaria resistance-associated FREM3/GYPE regions, we also find outstanding genomic regions involved in barrier integrity and innate immunity, such as the GRIK1/CLDN17 intergenic region, and the SLC35F1 and ABCA13 genes. Our findings not only echo the significance of pathogen defense, but also reveal novel candidates in maintaining balanced polymorphisms across human and chimpanzee lineages. Finally, we show that these trans-species statistics can be applied to and work well for an arbitrary number of species, and integrate them into open-source software packages for ease of use by the scientific community.
