Ancestry-dependent Enrichment of Deleterious Homozygotes in Runs of Homozygosity

Mapping Intimacies ◽

10.1101/382721 ◽

2018 ◽

Author(s):

Zachary A. Szpiech ◽

Angel C.Y. Mak ◽

Marquitta J. White ◽

Donglei Hu ◽

Celeste Eng ◽

...

Keyword(s):

Native American ◽

Mexican American ◽

Complex Disease ◽

Disease Risk ◽

African Ancestry ◽

Population History ◽

Whole Genome Sequencing Data ◽

Runs Of Homozygosity ◽

Sequencing Data ◽

Local Ancestry

Abstract: Runs of homozygosity (ROH) are important genomic features that manifest when an individual inherits two haplotypes that are identical-by-descent. Their length distributions are informative about population history, and their genomic locations are useful for mapping recessive loci contributing to both Mendelian and complex disease risk. We have previously shown that ROH, and especially long ROH that are likely the result of recent parental relatedness, are enriched for homozygous deleterious coding variation in a worldwide sample of outbred individuals. However, the distribution of ROH in admixed populations and their relationship to deleterious homozygous genotypes is understudied. Here we analyze whole genome sequencing data from 1,441 individuals from self-identified African American, Puerto Rican, and Mexican American populations. These populations are three-way admixed between European, African, and Native American ancestries and provide an opportunity to study the distribution of deleterious alleles partitioned by local ancestry and ROH. We recapitulate previous findings that long ROH are enriched for deleterious variation genome-wide. We then partition by local ancestry and show that deleterious homozygotes arise at a higher rate when ROH overlap African ancestry segments than when they overlap European or Native American ancestry segments of the genome. These results suggest that, while ROH on any haplotype background are associated with an inflation of deleterious homozygous variation, African haplotype backgrounds may play a particularly important role in the genetic architecture of complex diseases for admixed individuals, highlighting the need for further study of these populations.
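
As a rough illustration of what a run of homozygosity is, the sketch below scans a vector of genotype calls and reports stretches of consecutive homozygous sites. It is a toy under stated assumptions, not the method used in the study: the function name `find_roh`, the alternate-allele-count encoding, and the `min_len` threshold are all hypothetical, and real ROH callers typically work with physical or genetic length and tolerate occasional genotyping errors.

```python
from typing import List, Tuple


def find_roh(genotypes: List[int], min_len: int = 3) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) index pairs of homozygous runs.

    genotypes: alternate-allele counts per site
               (0 or 2 = homozygous, 1 = heterozygous).
    min_len:   hypothetical minimum number of consecutive homozygous sites
               for a stretch to be reported as a run.
    """
    runs: List[Tuple[int, int]] = []
    start = None  # index where the current homozygous stretch began
    for i, g in enumerate(genotypes):
        if g in (0, 2):
            if start is None:
                start = i
        else:
            # A heterozygous site ends the current stretch.
            if start is not None and i - start >= min_len:
                runs.append((start, i - 1))
            start = None
    # Close a stretch that reaches the end of the vector.
    if start is not None and len(genotypes) - start >= min_len:
        runs.append((start, len(genotypes) - 1))
    return runs


example = [0, 2, 2, 1, 0, 0, 1, 2, 2, 2, 2]
print(find_roh(example, min_len=3))
```

In the example vector, the two-site stretch between the heterozygous calls is too short to be reported, which mirrors how a length threshold separates short background homozygosity from longer runs.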

Genetic profiles of 103,106 individuals in the Taiwan Biobank provide insights into the health and history of Han Chinese

npj Genomic Medicine ◽

10.1038/s41525-021-00178-9 ◽

2021 ◽

Vol 6 (1) ◽

Author(s):

Chun-Yu Wei ◽

Jenn-Hwai Yang ◽

Erh-Chan Yeh ◽

Ming-Fang Tsai ◽

Hsiao-Jung Kao ◽

...

Keyword(s):

Large Scale ◽

Disease Risk ◽

Clinical Care ◽

Full Range ◽

Han Chinese ◽

Population History ◽

Whole Genome Sequencing Data ◽

Mendelian Disease ◽

Sequencing Data ◽

Functional Variants

Abstract: Personalized medical care focuses on prediction of disease risk and response to medications. To build the risk models, access to both large-scale genomic resources and human genetic studies is required. The Taiwan Biobank (TWB) has generated high-coverage, whole-genome sequencing data from 1492 individuals and genome-wide SNP data from 103,106 individuals of Han Chinese ancestry using custom SNP arrays. Principal components analysis of the genotyping data showed that the full range of Han Chinese genetic variation was found in the cohort. The arrays also include thousands of known functional variants, allowing for simultaneous ascertainment of Mendelian disease-causing mutations and variants that affect drug metabolism. We found that 21.2% of the population are mutation carriers of autosomal recessive diseases, 3.1% have mutations in cancer-predisposing genes, and 87.3% carry variants that affect drug response. We highlight how TWB data provide insight into both population history and disease burden, while showing how widespread genetic testing can be used to improve clinical care.

Local Ancestry Adjusted Allelic Association Analysis Robustly Captures Tuberculosis Susceptibility Loci

Frontiers in Genetics ◽

10.3389/fgene.2021.716558 ◽

2021 ◽

Vol 12 ◽

Author(s):

Yolandi Swart ◽

Caitlin Uren ◽

Paul D. van Helden ◽

Eileen G. Hoal ◽

Marlo Möller

Keyword(s):

South Africa ◽

Complex Disease ◽

Disease Risk ◽

African Ancestry ◽

False Negative ◽

Genetic Admixture ◽

Allelic Association ◽

Local Ancestry ◽

Asian Populations ◽

Risk Alleles

Pulmonary tuberculosis (TB), caused by Mycobacterium tuberculosis, is a complex disease. The risk of developing active TB is in part determined by host genetic factors. Most genetic studies investigating TB susceptibility fail to replicate association signals, particularly across diverse populations. South African populations arose because of multi-wave genetic admixture from the indigenous KhoeSan, Bantu-speaking Africans, Europeans, Southeast Asian, and East Asian populations. This has led to complex genetic admixture with heterogeneous patterns of linkage disequilibrium and associated traits. As a result, precise estimation of both global and local ancestry is required to prevent both false-positive and false-negative associations. Here, 820 individuals from South Africa were genotyped on the SNP-dense Illumina Multi-Ethnic Genotyping Array (∼1.7M SNPs) followed by local and global ancestry inference using RFMix. Local ancestry adjusted allelic association (LAAA) models were utilized owing to the extensive genetic heterogeneity present in this population. Hence, an interaction term, comprising the identification of the minor allele that corresponds to the ancestry present at the specific locus under investigation, was included as a covariate. One SNP (rs28647531) located on chromosome 4q22 was significantly associated with TB susceptibility and displayed a SNP minor allelic effect (G allele, frequency = 0.204) whilst correcting for local ancestry for Bantu-speaking African ancestry (p-value = 5.518 × 10^−7; OR = 3.065; SE = 0.224). Although no other variants passed the significance threshold, clear differences were observed between the lead variants identified for each ancestry. Furthermore, the LAAA model robustly captured the source of association signals in multi-way admixed individuals from South Africa and allowed the identification of ancestry-specific disease risk alleles associated with TB susceptibility that have previously been missed.
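
The reported odds ratio, standard error, and p-value for rs28647531 can be checked against each other with a short Wald-test calculation. The sketch below assumes that the reported SE is the standard error of the log odds ratio and that the p-value is two-sided under a normal approximation; neither is stated in the abstract, and the helper name `wald_summary` is hypothetical.

```python
import math


def wald_summary(odds_ratio: float, se_log_or: float, z_crit: float = 1.96):
    """Return (z, two-sided p, (ci_low, ci_high)) for an odds ratio.

    Assumes se_log_or is the standard error of ln(OR) and that the
    test statistic is approximately standard normal.
    """
    log_or = math.log(odds_ratio)
    z = log_or / se_log_or
    # Two-sided tail probability of a standard normal: erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    ci = (
        math.exp(log_or - z_crit * se_log_or),
        math.exp(log_or + z_crit * se_log_or),
    )
    return z, p, ci


# Values quoted in the abstract: OR = 3.065, SE = 0.224.
z, p, ci = wald_summary(3.065, 0.224)
print(f"z = {z:.3f}, p = {p:.3e}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

Under these assumptions the computed p-value is of the same order as the reported 5.518 × 10^−7; small differences are expected from rounding of the published OR and SE.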

Whole genome sequencing reveals high differentiation, low levels of genetic diversity and short runs of homozygosity among Swedish wels catfish

Heredity ◽

10.1038/s41437-021-00438-5 ◽

2021 ◽

Author(s):

Axel Jensen ◽

Mette Lillie ◽

Kristofer Bergström ◽

Per Larsson ◽

Jacob Höglund

Keyword(s):

Genetic Diversity ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

Whole Genome Sequencing Data ◽

Peripheral Populations ◽

Whole Genome ◽

Runs Of Homozygosity ◽

Sequencing Data ◽

Isolated Populations ◽

Native Populations

Abstract: The use of genetic markers in the context of conservation is largely being outcompeted by whole-genome data. Comparative studies between the two are sparse, and the knowledge about potential effects of this methodology shift is limited. Here, we used whole-genome sequencing data to assess the genetic status of peripheral populations of the wels catfish (Silurus glanis), and discuss the results in light of a recent microsatellite study of the same populations. The Swedish populations of the wels catfish have suffered from severe declines during the last centuries and persist in only a few isolated water systems. Fragmented populations generally are at greater risk of extinction, for example due to loss of genetic diversity, and may thus require conservation actions. We sequenced individuals from the three remaining native populations (Båven, Emån, and Möckeln) and one reintroduced population of admixed origin (Helge å), and found that genetic diversity was highest in Emån but low overall, with strong differentiation among the populations. No signature of recent inbreeding was found, but a considerable number of short runs of homozygosity were present in all populations, likely linked to historically small population sizes and bottleneck events. Genetic substructure within any of the native populations was at best weak. Individuals from the admixed population Helge å shared most genetic ancestry with the Båven population (72%). Our results are largely in agreement with the microsatellite study, and stress the need to protect these isolated populations at the northern edge of the distribution of the species.

Population Genomics of American Mink Using Whole Genome Sequencing Data

Genes ◽

10.3390/genes12020258 ◽

2021 ◽

Vol 12 (2) ◽

pp. 258

Author(s):

Karim Karimi ◽

Duy Ngoc Do ◽

Mehdi Sargolzaei ◽

Younes Miar

Keyword(s):

Population Genomics ◽

Association Studies ◽

American Mink ◽

Population History ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Nucleotide Polymorphisms ◽

Sequencing Data ◽

Effective Population ◽

Cross Validation Error

Characterizing the genetic structure and population history can facilitate the development of genomic breeding strategies for the American mink. In this study, we used the whole genome sequences of 100 mink from the Canadian Centre for Fur Animal Research (CCFAR) at the Dalhousie Faculty of Agriculture (Truro, NS, Canada) and Millbank Fur Farm (Rockwood, ON, Canada) to investigate their population structure, genetic diversity and linkage disequilibrium (LD) patterns. Analysis of molecular variance (AMOVA) indicated that the variation among color-types was significant (p < 0.001) and accounted for 18% of the total variation. The admixture analysis revealed that assuming three ancestral populations (K = 3) provided the lowest cross-validation error (0.49). The effective population size (Ne) at five generations ago was estimated to be 99 and 50 for CCFAR and Millbank Fur Farm, respectively. The LD patterns revealed that the average r² reduced to <0.2 at genomic distances of >20 kb and >100 kb in CCFAR and Millbank Fur Farm, suggesting that densities of 120,000 and 24,000 single nucleotide polymorphisms (SNPs), respectively, would provide adequate accuracy of genomic evaluation in these populations. These results indicated that accounting for admixture is critical for designing the SNP panels for genotype-phenotype association studies of American mink.
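
The two SNP densities quoted above are consistent with placing one marker per LD-decay distance along the genome. The sketch below reproduces that arithmetic under one loudly stated assumption: a genome length of 2.4 Gb, which is not given in the abstract and is simply the value implied by both pairs of numbers (120,000 × 20 kb and 24,000 × 100 kb). The function name `markers_needed` is hypothetical.

```python
import math

# Assumed genome length in base pairs; inferred from the abstract's figures,
# not reported in it.
ASSUMED_GENOME_BP = 2_400_000_000


def markers_needed(genome_bp: int, spacing_bp: int) -> int:
    """Number of evenly spaced markers when one marker covers spacing_bp."""
    return math.ceil(genome_bp / spacing_bp)


# LD-decay distances from the abstract: 20 kb (CCFAR), 100 kb (Millbank).
print(markers_needed(ASSUMED_GENOME_BP, 20_000))
print(markers_needed(ASSUMED_GENOME_BP, 100_000))
```

Even spacing is a simplification; actual panel design would also account for variation in LD along the genome and for marker informativeness.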

Discovery of structural deletions in breast cancer predisposition genes using whole genome sequencing data from > 2000 women of African-ancestry

Human Genetics ◽

10.1007/s00439-021-02342-8 ◽

2021 ◽

Author(s):

Zhishan Chen ◽

Xingyi Guo ◽

Jirong Long ◽

Jie Ping ◽

Bingshan Li ◽

...

Keyword(s):

Breast Cancer ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

African Ancestry ◽

Cancer Predisposition ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Sequencing Data ◽

Breast Cancer Predisposition ◽

Predisposition Genes

Local Ancestry Prediction with PyLAE

10.1101/2020.11.13.380105 ◽

2020 ◽

Author(s):

Alexander Smetanin ◽

Nikita Moshkov ◽

Tatiana V. Tatarinova

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Computational Efficiency ◽

Source Code ◽

High Density ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Sequencing Data ◽

Local Ancestry ◽

A Genome

Abstract: Summary: We developed PyLAE, a new tool for determining local ancestry along a genome using whole-genome sequencing data or high-density genotyping experiments. PyLAE can process an arbitrarily large number of ancestral populations (with or without an informative prior). Since PyLAE does not involve estimation of many parameters, it can process thousands of genomes within a day. Computational efficiency, straightforward presentation of results, and ease of installation make PyLAE a useful tool to study admixed populations. Availability and implementation: The source code and installation manual are available at https://github.com/smetam/pylae.

Cryptic Native American ancestry recapitulates population-specific migration and settlement of the continental United States

10.1101/333609 ◽

2018 ◽

Cited By ~ 1

Author(s):

I. King Jordan ◽

Lavanya Rishishwar ◽

Andrew B. Conley

Keyword(s):

African American ◽

Native American ◽

Mexican American ◽

African Ancestry ◽

Westward Expansion ◽

Native American Ancestry ◽

Southeastern US ◽

Jewish Ancestry ◽

The US ◽

African Descendants

Abstract: European and African descendants settled the continental US during the 17th-19th centuries, coming into contact with established Native American populations. The resulting admixture among these groups yielded a significant reservoir of cryptic Native American ancestry in the modern US population. We analyzed the patterns of Native American admixture seen for the three largest genetic ancestry groups in the US population: African American, European American, and Hispanic/Latino. The three groups show distinct Native American ancestry profiles, which are indicative of their historical patterns of migration and settlement across the country. Native American ancestry in the modern African American population does not coincide with local geography, instead forming a monophyletic group with origins in the southeastern US, consistent with the Great Migration of the early 20th century. European Americans show Native American ancestry that tracks their geographic origins across the US, indicative of ongoing contact during westward expansion, and Native American ancestry can resolve Hispanic/Latino individuals into distinct local groups formed by more recent migration from Mexico and Puerto Rico. We found an anomalous pattern of Native American ancestry from the US southwest, which most likely corresponds to the Nuevomexicano descendants of early Spanish settlers to the region. We addressed a number of controversies surrounding this population, including the extent of Sephardic Jewish ancestry. Nuevomexicanos are less admixed than nearby Mexican-American individuals, with more European and less Native American and African ancestry, and while they do show demonstrable Sephardic Jewish ancestry, the fraction is no greater than seen for other Hispanic/Latino populations.

Characterizing the genetic architecture of Parkinson’s disease in Latinos

10.1101/2020.11.09.20227124 ◽

2020 ◽

Author(s):

Douglas Loesch ◽

Andrea R. V. R. Horimoto ◽

Karl Heilbron ◽

Elif Irem Sarihan ◽

Miguel Inca-Martinez ◽

...

Keyword(s):

Parkinson’s Disease ◽

Native American ◽

Genetic Architecture ◽

Association Studies ◽

African Ancestry ◽

Population History ◽

Genome Wide Association Studies ◽

Association Testing ◽

Significant Locus

Abstract: To date, over 90 Parkinson’s disease (PD) risk variants have been reported from genome-wide association studies (GWAS). However, these GWAS efforts have been limited to individuals of European and East Asian ancestry. We performed the first GWAS of Latino PD patients from South America, comparing 807 cases against 690 controls followed by association testing of suggestive loci in a replication cohort of 1,234 cases and 439,522 controls. We demonstrated that SNCA plays a significant role in PD etiology in a Latino cohort and identified a suggestive locus near NRROS on chromosome 3 that appeared to be driven by Peruvian subjects. We also characterized the overlap of PD genetic architecture between Europeans and Latinos with a replication of significant variants identified by Nalls et al. in their 2019 GWAS, finding 80% concordance in direction of effect. We then leveraged the population history of Latinos via admixture mapping, identifying a significant locus on chromosome 14 in a joint test of ancestries, driven by the Native American ancestral background, and a significant locus on chromosome 6 in our test of African ancestry, containing the genes STXBP6 and RPS6KA2, respectively. Ultimately, our work reflects the most comprehensive characterization of PD genetic architecture in Latinos to date.

The impact of rare variation on gene expression across tissues

10.1101/074443 ◽

2016 ◽

Cited By ~ 10

Author(s):

Xin Li ◽

Yungil Kim ◽

Emily K. Tsang ◽

Joe R. Davis ◽

Farhan N. Damani ◽

...

Keyword(s):

Gene Expression ◽

Rare Variants ◽

Disease Risk ◽

Whole Genome Sequencing Data ◽

Personal Genome ◽

Sequencing Data ◽

Potential Health ◽

Rare Variation ◽

Functional Variants ◽

The Impact

Abstract: Rare genetic variants are abundant in humans yet their functional effects are often unknown and challenging to predict. The Genotype-Tissue Expression (GTEx) project provides a unique opportunity to identify the functional impact of rare variants through combined analyses of whole genomes and multi-tissue RNA-sequencing data. Here, we identify gene expression outliers, or individuals with extreme expression levels, across 44 human tissues, and characterize the contribution of rare variation to these large changes in expression. We find 58% of underexpression and 28% of overexpression outliers have underlying rare variants compared with 9% of non-outliers. Large expression effects are enriched for proximal loss-of-function, splicing, and structural variants, particularly variants near the TSS and at evolutionarily conserved sites. Known disease genes have expression outliers, underscoring that rare variants can contribute to genetic disease risk. To prioritize functional rare regulatory variants, we develop RIVER, a Bayesian approach that integrates RNA and whole genome sequencing data from the same individual. RIVER predicts functional variants significantly better than models using genomic annotations alone, and is an extensible tool for personal genome interpretation. Overall, we demonstrate that rare variants contribute to large gene expression changes across tissues with potential health consequences, and provide an integrative method for interpreting rare variants in individual genomes.

An Increased Burden of Highly Active Retrotransposition Competent L1s Is Associated with Parkinson’s Disease Risk and Progression in the PPMI Cohort

International Journal of Molecular Sciences ◽

10.3390/ijms21186562 ◽

2020 ◽

Vol 21 (18) ◽

pp. 6562 ◽

Cited By ~ 1

Author(s):

Abigail L. Pfaff ◽

Vivien J. Bubb ◽

John P. Quinn ◽

Sulev Koks

Keyword(s):

Parkinson’s Disease ◽

Disease Risk ◽

Whole Genome Sequencing Data ◽

Sequencing Data ◽

Highly Active ◽

Germline Variation ◽

Progression Markers ◽

Long Interspersed Element

Long interspersed element-1 (LINE-1/L1s) contributes 17% of the human genome with more than 1 million elements present; however, fewer than 100 of these have evidence for being retrotransposition competent (RC). In addition to those RC-L1s present in the reference genome, there are a small number of known non-reference L1 insertions that are also retrotransposition competent. L1 activity, whether through the potentially detrimental effects of their mRNA or protein expression or somatic retrotransposition events, has been linked to several neurological conditions. The polymorphic nature of both reference and non-reference RC-L1s in terms of their presence or absence will result in individuals harboring a different combination of these elements and it is currently unknown if this type of germline variation contributes to the risk of neurological disease. Here, we utilized whole-genome sequencing data from 178 healthy controls and 372 Parkinson’s disease (PD) subjects from the Parkinson’s Progression Markers Initiative (PPMI) to investigate the role of RC-L1s in PD. In the PPMI cohort, we identified 22 reference and 50 non-reference polymorphic RC-L1 loci. Focusing on 16 highly active RC-L1 loci, an increased burden of these elements (≥9) was associated with PD (OR 1.25, 95% CI 1.03–1.51, p = 0.02). In addition, we identified significant associations of progression markers of PD and the burden of highly active RC-L1s. This study has identified a novel type of genetic element associated with PD risk and disease progression.
