Ancient ancestry informative markers for identifying fine-scale ancient population structure in Eurasians

2018
Author(s): Umberto Esposito, Ranajit Das, Mehdi Pirooznia, Eran Elhaik

Abstract: The rapid accumulation of ancient human genomes from various areas and time periods potentially allows the expansion of studies of biodiversity, biogeography, forensics, population history, and epidemiology into past populations. However, most ancient DNA (aDNA) data were generated through microarrays designed for modern-day populations, which are known to misrepresent the population structure. Past studies addressed these problems using ancestry informative markers (AIMs). However, it is unclear whether AIMs derived from contemporary human genomes can capture ancient population structure and whether AIM-finding methods are applicable to aDNA, given that the high missingness rates in ancient, oftentimes haploid, DNA can also distort the population structure. Here, we define ancient AIMs (aAIMs) and develop a framework to evaluate established and novel AIM-finding methods in identifying the most informative markers. We show that aAIMs identified by a novel principal component analysis (PCA)-based method outperform all competing methods in classifying ancient individuals into populations and identifying admixed individuals. In some cases, predictions made using the aAIMs were more accurate than those made with a complete marker set. We discuss the features of the ancient Eurasian population structure and strategies to identify aAIMs. This work informs the design of population microarrays and the interpretation of aDNA results.

Genes, 2018, Vol 9 (12), pp. 625
Author(s): Umberto Esposito, Ranajit Das, Syakir Syed, Mehdi Pirooznia, Eran Elhaik

The rapid accumulation of ancient human genomes from various areas and time periods potentially enables the expansion of studies of biodiversity, biogeography, forensics, population history, and epidemiology into past populations. However, most ancient DNA (aDNA) data were generated through microarrays designed for modern-day populations, which are known to misrepresent the population structure. Past studies addressed these problems by using ancestry informative markers (AIMs). However, it is unclear whether AIMs derived from contemporary human genomes can capture ancient population structures, and whether AIM-finding methods are applicable to aDNA, given that the high missingness rates in ancient, and oftentimes haploid, DNA can also distort the population structure. Here, we define ancient AIMs (aAIMs) and develop a framework to evaluate established and novel AIM-finding methods in identifying the most informative markers. We show that aAIMs identified by a novel principal component analysis (PCA)-based method outperform all of the competing methods in classifying ancient individuals into populations and identifying admixed individuals. In some cases, predictions made using the aAIMs were more accurate than those made with a complete marker set. We discuss the features of the ancient Eurasian population structure and strategies to identify aAIMs. This work informs the design of single nucleotide polymorphism (SNP) microarrays and the interpretation of aDNA results, which enables population-wide testing of primordialist theories.
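To make the idea of a PCA-based marker ranking concrete, the following is a minimal sketch and not the authors' published method. It assumes genotypes are coded as allele counts with `np.nan` for missing calls, imputes missing calls with the per-marker mean (a simplifying assumption, chosen only because aDNA has high missingness), and scores each marker by its variance-weighted squared loadings on the leading principal components. The function name `rank_markers_by_pca_loading` and its parameters are hypothetical.

```python
import numpy as np


def rank_markers_by_pca_loading(genotypes, n_components=2):
    """Rank markers by their contribution to the leading principal components.

    genotypes: (individuals x markers) array of allele counts; np.nan = missing.
    Returns marker indices ordered from highest to lowest score.
    """
    g = np.asarray(genotypes, dtype=float)

    # Assumption: replace each missing call with the mean of its marker.
    marker_mean = np.nanmean(g, axis=0)
    g = np.where(np.isnan(g), marker_mean, g)

    # Centre each marker so the SVD corresponds to a covariance-based PCA.
    g = g - g.mean(axis=0)

    # Rows of vt are the principal axes (loadings); s holds singular values.
    _, s, vt = np.linalg.svd(g, full_matrices=False)
    k = min(n_components, len(s))

    # Score = sum over the top-k components of (singular value^2 * loading^2).
    score = ((s[:k, None] ** 2) * (vt[:k] ** 2)).sum(axis=0)
    return np.argsort(score)[::-1]


# Toy data: marker 1 separates two groups, marker 0 varies weakly,
# marker 2 is constant. One call of marker 1 is missing.
toy = np.array([
    [0, 0, 1],
    [0, 0, 1],
    [0, np.nan, 1],
    [0, 2, 1],
    [0, 2, 1],
    [1, 2, 1],
])
ranking = rank_markers_by_pca_loading(toy, n_components=2)
```

In this toy case the group-separating marker is ranked first and the constant marker last. A real evaluation would compare the top-ranked subset against the complete marker set on a classification task, as the abstract describes.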


BMC Genetics, 2020, Vol 21 (S1)
Author(s): Ranajit Das, Vladimir A. Ivanisenko, Anastasia A. Anashkina, Priyanka Upadhyai

Abstract: Background: The population structure of the Indian subcontinent is a tapestry of extraordinary diversity characterized by the amalgamation of autochthonous and immigrant ancestries and rigid enforcement of sociocultural stratification. Here we investigated the genetic origin and population history of the Kumhars, a group of people who inhabit large parts of northern India. We compared previously published SNP genotype data for 27 Kumhar samples from Uttar Pradesh in north India to various modern-day and ancient populations. Results: Various approaches, such as Principal Component Analysis (PCA), Admixture, and TreeMix, concurred that Kumhars have high Ancestral South Indian (ASI) ancestry, a minimal Steppe component, and high genomic proximity to the Kurchas, a small and relatively little-known population found ~2500 km away in Kerala, south India. Consistent with this, biogeographical mapping using Geographic Population Structure (GPS) assigned most Kumhar samples to areas neighboring those where Kurchas are found in south India. Conclusions: We hypothesize that the significant genomic similarity between two apparently distinct modern-day Indian populations that inhabit well-separated geographical areas with no known overlapping history or links likely alludes to their common origin during or after the decline of the Indus Valley Civilization (estimated by ALDER). Thereafter, while they dispersed towards opposite ends of the Indian subcontinent, their genomic integrity and likeness remained preserved due to endogamous social practices. Our findings illuminate the genomic history of two Indian populations, allowing a glimpse into one or a few of the numerous human migrations that likely occurred across the Indian subcontinent and helped shape its varied and vibrant evolutionary past.


2017
Author(s): Kridsadakorn Chaichoompu, Fentaw Abegaz Yazew, Sissades Tongsima, Philip James Shaw, Anavaj Sakuntabhai, ...

Abstract: Background: Resolving population genetic structure is challenging, especially when dealing with closely related or geographically confined populations. Although Principal Component Analysis (PCA)-based methods and genomic variation with single nucleotide polymorphisms (SNPs) are widely used to describe shared genetic ancestry, improvements can be made, especially when fine-scale population structure is the target. Results: This work presents an R package called IPCAPS, which uses SNP information for resolving possibly fine-scale population structure. The IPCAPS routines are built on the iterative pruning Principal Component Analysis (ipPCA) framework that systematically assigns individuals to genetically similar subgroups. In each iteration, our tool is able to detect and eliminate outliers, thereby avoiding severe misclassification errors. Conclusions: IPCAPS supports different measurement scales for variables used to identify substructure. Hence, panels of gene expression and methylation data can be accommodated as well. The tool can also be applied in patient sub-phenotyping contexts. IPCAPS is developed in R and is freely available from bio3.giga.ulg.ac.be/ipcaps
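The general idea of assigning individuals to subgroups by repeated PCA can be illustrated with a deliberately simplified sketch. This is not the IPCAPS implementation (which is an R package) and it omits outlier detection. It is written in Python, assumes a complete numeric matrix, bisects each group on the sign of the first principal component score, and uses a hypothetical stopping rule (`min_pc1_share`, the minimum share of variance on PC1) in place of the formal criteria used by ipPCA-style tools. All names below are illustrative.

```python
import numpy as np


def split_by_iterative_pca(x, min_size=2, min_pc1_share=0.5):
    """Recursively bisect individuals along PC1 until no clear split remains.

    x: (individuals x features) numeric array without missing values.
    Returns a list of index arrays, one per terminal subgroup.
    """
    x = np.asarray(x, dtype=float)
    min_size = max(1, min_size)  # guards against empty subgroups

    def recurse(idx):
        sub = x[idx] - x[idx].mean(axis=0)
        u, s, _ = np.linalg.svd(sub, full_matrices=False)
        total = float((s ** 2).sum())

        # Stop if the group is too small to split or has no variance left.
        if len(idx) < 2 * min_size or total <= 1e-12:
            return [idx]
        # Hypothetical stopping rule: PC1 must carry enough of the variance.
        if s[0] ** 2 / total < min_pc1_share:
            return [idx]

        # Bisect on the sign of the PC1 score of each individual.
        scores = u[:, 0] * s[0]
        left, right = idx[scores < 0], idx[scores >= 0]
        if len(left) < min_size or len(right) < min_size:
            return [idx]
        return recurse(left) + recurse(right)

    return recurse(np.arange(x.shape[0]))


# Toy data: two internally homogeneous groups of three individuals each.
toy = np.array([[0, 0, 0]] * 3 + [[2, 2, 2]] * 3)
groups = split_by_iterative_pca(toy, min_size=2)
```

On the toy matrix the procedure returns two subgroups, individuals 0-2 and 3-5 (their order depends on the sign of the singular vector). Real data would need a statistically grounded stopping criterion and the outlier handling the abstract mentions.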


2021, Vol 12 (1)
Author(s): Elena Arciero, Sufyan A. Dogra, Daniel S. Malawsky, Massimo Mezzavilla, Theofanis Tsismentzoglou, ...

Abstract: Previous genetic and public health research in the Pakistani population has focused on the role of consanguinity in increasing recessive disease risk, but little is known about its recent population history or the effects of endogamy. Here, we investigate fine-scale population structure, history and consanguinity patterns using genotype chip data from 2,200 British Pakistanis. We reveal strong recent population structure driven by the biraderi social stratification system. We find that all subgroups have had low recent effective population sizes (Ne), with some showing a decrease 15‒20 generations ago that has resulted in extensive identity-by-descent sharing and homozygosity, increasing the risk of recessive disorders. Our results from two orthogonal methods (one using machine learning and the other coalescent-based) suggest that the detailed reporting of parental relatedness for mothers in the cohort under-represents the true levels of consanguinity. These results demonstrate the impact of cultural practices on population structure and genomic diversity in Pakistanis, and have important implications for medical genetic studies.


2018
Author(s): Clare Bycroft, Ceres Fernandez-Rozadilla, Clara Ruiz-Ponte, Inés Quintela-García, Ángel Carracedo, ...

Genetic differences within or between human populations (population structure) have been studied using a variety of approaches over many years. Recently there has been an increasing focus on studying genetic differentiation at fine geographic scales, such as within countries. Identifying such structure allows the study of recent population history, and identifies the potential for confounding in association studies, particularly when testing rare, often recently arisen variants. The Iberian Peninsula is linguistically diverse, has a complex demographic history, and is unique among European regions in having a centuries-long period of Muslim rule. Previous genetic studies of Spain have examined either a small fraction of the genome or only a few Spanish regions. Thus, the overall pattern of fine-scale population structure within Spain remains uncharacterised. Here we analyse genome-wide genotyping array data for 1,413 Spanish individuals sampled from all regions of Spain. We identify extensive fine-scale structure, down to unprecedented scales, smaller than 10 km in some places. We observe a major axis of genetic differentiation that runs from east to west of the peninsula. In contrast, we observe remarkable genetic similarity in the north-south direction, and evidence of historical north-south population movement. Finally, without making particular prior assumptions about source populations, we show that modern Spanish people have regionally varying fractions of ancestry from a group most similar to modern north Moroccans. The north African ancestry results from an admixture event, which we date to 860-1120 CE, corresponding to the early half of Muslim rule. Our results indicate that it is possible to discern clear genetic impacts of the Muslim conquest and population movements associated with the subsequent Reconquista.


2017
Author(s): Caitlin Uren, Minju Kim, Alicia R. Martin, Dean Bobo, Christopher R. Gignoux, ...

Abstract: Recent genetic studies have established that the KhoeSan populations of southern Africa are distinct from all other African populations and have remained largely isolated during human prehistory until about 2,000 years ago. Dozens of different KhoeSan groups exist, belonging to three different language families, but very little is known about their population history. We examine new genome-wide polymorphism data and whole mitochondrial genomes for more than one hundred South Africans from the ≠Khomani San and Nama populations of the Northern Cape, analyzed in conjunction with 19 additional southern African populations. Our analyses reveal fine-scale population structure in and around the Kalahari Desert. Surprisingly, this structure does not always correspond to linguistic or subsistence categories as previously suggested, but rather reflects the role of geographic barriers and the ecology of the greater Kalahari Basin. Regardless of subsistence strategy, the indigenous Khoe-speaking Nama pastoralists and the N|u-speaking ≠Khomani (formerly hunter-gatherers) share ancestry with other Khoe-speaking forager populations that form a rim around the Kalahari Desert. We reconstruct earlier migration patterns and estimate that the southern Kalahari populations were among the last to experience gene flow from Bantu-speakers, approximately 14 generations ago. We conclude that local adoption of pastoralism, at least by the Nama, appears to have been primarily a cultural process with limited genetic impact from eastern Africa.

Data deposition: Data files are freely available on the Henn Lab website: http://ecoevo.stonybrook.edu/hennlab/data-software/

Summary: Distinct, spatially organized ancestries demonstrate fine-scale population structure in southern Africa, implying a more complex history of the KhoeSan than previously thought. Southern KhoeSan ancestry in the Nama and ≠Khomani is shared in a rim around the Kalahari Desert. We hypothesize that there was recent migration of pastoralists from East Africa into southern Africa, independent of the Bantu-expansion, but the spread of pastoralism within southern Africa occurred largely by cultural diffusion.


2020
Author(s): Elena Arciero, Sufyan A. Dogra, Massimo Mezzavilla, Theofanis Tsismentzoglou, Qin Qin Huang, ...

Abstract: Previous genetic and public health research in the Pakistani population has focused on the role of consanguinity in increasing recessive disease risk, but little is known about its recent population history or the effects of endogamy. Here, we investigate fine-scale population structure, history and consanguinity patterns using genetic and questionnaire data from >4,000 British Pakistani individuals, mostly with roots in Azad Kashmir and Punjab. We reveal strong recent population structure driven by the biraderi social stratification system. We find that all subgroups have had low effective population sizes (Ne) over the last 50 generations, with some showing a decrease in Ne 15-20 generations ago that has resulted in extensive identity-by-descent sharing and increased homozygosity. Using new theory, we show that the footprint of regions of homozygosity in the two largest subgroups is about twice that expected naively based on the self-reported consanguinity rates and the inferred historical Ne trajectory. These results demonstrate the impact of the cultural practices of endogamy and consanguinity on population structure and genomic diversity in British Pakistanis, and have important implications for medical genetic studies.


2016
Author(s): Caitlin Uren, Minju Kim, Alicia R Martin, Dean Bobo, Christopher R Gignoux, ...

Recent genetic studies have established that the KhoeSan populations of southern Africa are distinct from all other African populations and have remained largely isolated during human prehistory until about 2,000 years ago. Dozens of different KhoeSan groups exist, belonging to three different language families, but very little is known about population history within southern Africa. We examine new genome-wide polymorphism data and whole mitochondrial genomes for more than one hundred South Africans from the ≠Khomani San and Nama populations of the Northern Cape, analyzed in conjunction with 19 additional southern African populations. Our analyses reveal fine-scale population structure in and around the Kalahari Desert. Surprisingly, this structure does not always correspond to linguistic or subsistence categories as previously suggested, but rather reflects the role of geographic barriers and the ecology of the greater Kalahari Basin. Regardless of subsistence strategy, the indigenous Khoe-speaking Nama pastoralists and the N|u-speaking ≠Khomani (formerly hunter-gatherers) share recent ancestry with other Khoe-speaking forager populations that form a rim around the Kalahari Desert. We reconstruct earlier migration patterns and estimate that the southern Kalahari populations were among the last to experience gene flow from Bantu-speakers, approximately 14 generations ago. We conclude that local adoption of pastoralism, at least by the Nama, appears to have been primarily a cultural process with limited impact from eastern African genetic diffusion.


2020, Vol 11 (1)
Author(s): Olivier François, Flora Jay

Abstract: Recent years have seen a growing number of studies investigating evolutionary questions using ancient DNA. To address these questions, one of the most frequently used methods is principal component analysis (PCA). When PCA is applied to temporal samples, the sample dates are, however, ignored during analysis, leading to imperfect representations of samples in PC plots. Here, we present a factor analysis (FA) method in which individual scores are corrected for the effect of allele frequency drift over time. We obtained exact solutions for the estimates of corrected factors, and we provided a fast algorithm for their computation. Using computer simulations and ancient European samples, we compared geometric representations obtained from FA with PCA and with ancestry estimation programs. In admixture analyses, FA estimates agreed with tree-based statistics, and they were more accurate than those obtained from PCA projections and from ancestry estimation programs. A great advantage of FA over existing approaches is that it improves descriptive analyses of ancient DNA samples without requiring the inclusion of outgroup or present-day samples.

