Ancient ancestry informative markers for identifying fine-scale ancient population structure in Eurasians

Mapping Intimacies ◽

10.1101/333690 ◽

2018 ◽

Cited By ~ 1

Author(s):

Umberto Esposito ◽

Ranajit Das ◽

Mehdi Pirooznia ◽

Eran Elhaik

Keyword(s):

Population Structure ◽

Ancient Dna ◽

Principal Component ◽

Population History ◽

Fine Scale ◽

Ancestry Informative Markers ◽

Ancient Population ◽

Human Genomes ◽

Rapid Accumulation ◽

Time Periods

The rapid accumulation of ancient human genomes from various areas and time periods potentially allows the expansion of studies of biodiversity, biogeography, forensics, population history, and epidemiology into past populations. However, most ancient DNA (aDNA) data were generated through microarrays designed for modern-day populations, which are known to misrepresent the population structure. Past studies addressed these problems using ancestry informative markers (AIMs). However, it is unclear whether AIMs derived from contemporary human genomes can capture ancient population structure and whether AIM-finding methods are applicable to aDNA, given that the high missingness rates in ancient, oftentimes haploid, DNA can also distort the population structure. Here, we define ancient AIMs (aAIMs) and develop a framework to evaluate established and novel AIM-finding methods in identifying the most informative markers. We show that aAIMs identified by a novel principal component analysis (PCA)-based method outperform all competing methods in classifying ancient individuals into populations and identifying admixed individuals. In some cases, predictions made using the aAIMs were more accurate than those made with a complete marker set. We discuss the features of the ancient Eurasian population structure and strategies to identify aAIMs. This work informs the design of population microarrays and the interpretation of aDNA results.


Ancient Ancestry Informative Markers for Identifying Fine-Scale Ancient Population Structure in Eurasians

Genes ◽

10.3390/genes9120625 ◽

2018 ◽

Vol 9 (12) ◽

pp. 625 ◽

Cited By ~ 3

Author(s):

Umberto Esposito ◽

Ranajit Das ◽

Syakir Syed ◽

Mehdi Pirooznia ◽

Eran Elhaik

Keyword(s):

Population Structure ◽

Principal Component ◽

Population History ◽

Ancestry Informative Markers ◽

Single Nucleotide ◽

Ancient Population ◽

Human Genomes ◽

Snp Microarrays ◽

Population Structures ◽

Wide Testing

The rapid accumulation of ancient human genomes from various areas and time periods potentially enables the expansion of studies of biodiversity, biogeography, forensics, population history, and epidemiology into past populations. However, most ancient DNA (aDNA) data were generated through microarrays designed for modern-day populations, which are known to misrepresent the population structure. Past studies addressed these problems by using ancestry informative markers (AIMs). It is, however, unclear whether AIMs derived from contemporary human genomes can capture ancient population structures, and whether AIM-finding methods are applicable to aDNA, given that the high missingness rates in ancient, and oftentimes haploid, DNA can also distort the population structure. Here, we define ancient AIMs (aAIMs) and develop a framework to evaluate established and novel AIM-finding methods in identifying the most informative markers. We show that aAIMs identified by a novel principal component analysis (PCA)-based method outperform all of the competing methods in classifying ancient individuals into populations and identifying admixed individuals. In some cases, predictions made using the aAIMs were more accurate than those made with a complete marker set. We discuss the features of the ancient Eurasian population structure and strategies to identify aAIMs. This work informs the design of single nucleotide polymorphism (SNP) microarrays and the interpretation of aDNA results, which enables population-wide testing of primordialist theories.


The story of the lost twins: decoding the genetic identities of the Kumhar and Kurcha populations from the Indian subcontinent

BMC Genetics ◽

10.1186/s12863-020-00919-2 ◽

2020 ◽

Vol 21 (S1) ◽

Author(s):

Ranajit Das ◽

Vladimir A. Ivanisenko ◽

Anastasia A. Anashkina ◽

Priyanka Upadhyai

Keyword(s):

Population Structure ◽

South India ◽

Indian Subcontinent ◽

Uttar Pradesh ◽

Principal Component ◽

Population History ◽

North India ◽

Geographic Population ◽

Indian Populations ◽

History Of

Background: The population structure of the Indian subcontinent is a tapestry of extraordinary diversity characterized by the amalgamation of autochthonous and immigrant ancestries and rigid enforcement of sociocultural stratification. Here we investigated the genetic origin and population history of the Kumhars, a group of people who inhabit large parts of northern India. We compared SNP genotype data from 27 previously published Kumhar samples collected in Uttar Pradesh, north India, to various modern-day and ancient populations. Results: Various approaches, such as Principal Component Analysis (PCA), Admixture, and TreeMix, concurred that Kumhars have high Ancestral South Indian (ASI) ancestry, a minimal Steppe component, and high genomic proximity to the Kurchas, a small and relatively little-known population found ~2500 km away in Kerala, south India. Consistent with this, biogeographical mapping using Geographic Population Structure (GPS) assigned most Kumhar samples to areas neighboring those where Kurchas are found in south India. Conclusions: We hypothesize that the significant genomic similarity between two apparently distinct modern-day Indian populations that inhabit well-separated geographical areas with no known overlapping history or links likely alludes to their common origin during or after the decline of the Indus Valley Civilization (estimated by ALDER). Thereafter, while they dispersed towards opposite ends of the Indian subcontinent, their genomic integrity and likeness remained preserved due to endogamous social practices. Our findings illuminate the genomic history of two Indian populations, allowing a glimpse into one or a few of the numerous human migrations that likely occurred across the Indian subcontinent and contributed to shaping its varied and vibrant evolutionary past.


IPCAPS: an R package for iterative pruning to capture population structure

10.1101/186874 ◽

2017 ◽

Cited By ~ 3

Author(s):

Kridsadakorn Chaichoompu ◽

Fentaw Abegaz Yazew ◽

Sissades Tongsima ◽

Philip James Shaw ◽

Anavaj Sakuntabhai ◽

...

Keyword(s):

Principal Component Analysis ◽

Population Structure ◽

Principal Component ◽

R Package ◽

Component Analysis ◽

Genomic Variation ◽

Fine Scale ◽

Nucleotide Polymorphisms ◽

Measurement Scales ◽

Scale Population

Background: Resolving population genetic structure is challenging, especially when dealing with closely related or geographically confined populations. Although Principal Component Analysis (PCA)-based methods and genomic variation with single nucleotide polymorphisms (SNPs) are widely used to describe shared genetic ancestry, improvements can be made, especially when fine-scale population structure is the target. Results: This work presents an R package called IPCAPS, which uses SNP information for resolving possibly fine-scale population structure. The IPCAPS routines are built on the iterative pruning Principal Component Analysis (ipPCA) framework that systematically assigns individuals to genetically similar subgroups. In each iteration, our tool is able to detect and eliminate outliers, thereby avoiding severe misclassification errors. Conclusions: IPCAPS supports different measurement scales for variables used to identify substructure. Hence, panels of gene expression and methylation data can be accommodated as well. The tool can also be applied in patient sub-phenotyping contexts. IPCAPS is developed in R and is freely available from bio3.giga.ulg.ac.be/ipcaps
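
The general idea of iteratively splitting individuals with PCA can be illustrated with a minimal Python sketch. This is not the IPCAPS or ipPCA implementation: the function names `top_pc` and `iterative_split`, the sign-of-PC1 split, and the `min_size` and `min_ratio` stopping thresholds are all assumptions made for illustration, and the outlier-detection step mentioned in the abstract is left out.

```python
import numpy as np

def top_pc(X):
    """Return PC1 scores and the fraction of total variance PC1 explains."""
    Xc = X - X.mean(axis=0)                      # center each variable
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    var = s ** 2
    total = var.sum()
    ratio = var[0] / total if total > 0 else 0.0
    return U[:, 0] * s[0], ratio

def iterative_split(X, idx=None, min_size=10, min_ratio=0.3):
    """Recursively split samples on the sign of their PC1 score.

    Hypothetical stopping rule (an assumption, not the ipPCA criterion):
    stop when PC1 explains less than `min_ratio` of the variance or when
    a split would create a group smaller than `min_size`.
    """
    if idx is None:
        idx = np.arange(X.shape[0])
    if len(idx) < 2 * min_size:
        return [idx]
    scores, ratio = top_pc(X[idx])
    if ratio < min_ratio:
        return [idx]                             # no strong substructure left
    left, right = idx[scores < 0], idx[scores >= 0]
    if len(left) < min_size or len(right) < min_size:
        return [idx]
    return (iterative_split(X, left, min_size, min_ratio)
            + iterative_split(X, right, min_size, min_ratio))
```

Calling `iterative_split(X)` on a samples-by-variables matrix returns a list of index arrays, one per inferred subgroup. A real analysis would replace the variance-ratio threshold with a proper statistical test for substructure.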


Fine-scale population structure and demographic history of British Pakistanis

Nature Communications ◽

10.1038/s41467-021-27394-2 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Elena Arciero ◽

Sufyan A. Dogra ◽

Daniel S. Malawsky ◽

Massimo Mezzavilla ◽

Theofanis Tsismentzoglou ◽

...

Keyword(s):

Population Structure ◽

Disease Risk ◽

Demographic History ◽

Population History ◽

Genomic Diversity ◽

Fine Scale ◽

Effective Population ◽

Scale Population ◽

The Impact ◽

British Pakistanis

Previous genetic and public health research in the Pakistani population has focused on the role of consanguinity in increasing recessive disease risk, but little is known about its recent population history or the effects of endogamy. Here, we investigate fine-scale population structure, history and consanguinity patterns using genotype chip data from 2,200 British Pakistanis. We reveal strong recent population structure driven by the biraderi social stratification system. We find that all subgroups have had low recent effective population sizes (Ne), with some showing a decrease 15-20 generations ago that has resulted in extensive identity-by-descent sharing and homozygosity, increasing the risk of recessive disorders. Our results from two orthogonal methods (one using machine learning and the other coalescent-based) suggest that the detailed reporting of parental relatedness for mothers in the cohort under-represents the true levels of consanguinity. These results demonstrate the impact of cultural practices on population structure and genomic diversity in Pakistanis, and have important implications for medical genetic studies.


Patterns of genetic differentiation and the footprints of historical migrations in the Iberian Peninsula

10.1101/250191 ◽

2018 ◽

Cited By ~ 2

Author(s):

Clare Bycroft ◽

Ceres Fernandez-Rozadilla ◽

Clara Ruiz-Ponte ◽

Inés Quintela-García ◽

Ángel Carracedo ◽

...

Keyword(s):

Population Structure ◽

Genetic Differentiation ◽

Iberian Peninsula ◽

Association Studies ◽

Demographic History ◽

African Ancestry ◽

Population History ◽

Human Populations ◽

Fine Scale ◽

The North

Genetic differences within or between human populations (population structure) have been studied using a variety of approaches over many years. Recently there has been an increasing focus on studying genetic differentiation at fine geographic scales, such as within countries. Identifying such structure allows the study of recent population history, and identifies the potential for confounding in association studies, particularly when testing rare, often recently arisen variants. The Iberian Peninsula is linguistically diverse, has a complex demographic history, and is unique among European regions in having a centuries-long period of Muslim rule. Previous genetic studies of Spain have examined either a small fraction of the genome or only a few Spanish regions. Thus, the overall pattern of fine-scale population structure within Spain remains uncharacterised. Here we analyse genome-wide genotyping array data for 1,413 Spanish individuals sampled from all regions of Spain. We identify extensive fine-scale structure, down to unprecedented scales, smaller than 10 km in some places. We observe a major axis of genetic differentiation that runs from east to west of the peninsula. In contrast, we observe remarkable genetic similarity in the north-south direction, and evidence of historical north-south population movement. Finally, without making particular prior assumptions about source populations, we show that modern Spanish people have regionally varying fractions of ancestry from a group most similar to modern north Moroccans. The north African ancestry results from an admixture event, which we date to 860-1120 CE, corresponding to the early half of Muslim rule. Our results indicate that it is possible to discern clear genetic impacts of the Muslim conquest and population movements associated with the subsequent Reconquista.


Fine-scale human population structure in southern Africa reflects ecogeographic boundaries

10.1101/098095 ◽

2017 ◽

Author(s):

Caitlin Uren ◽

Minju Kim ◽

Alicia R. Martin ◽

Dean Bobo ◽

Christopher R. Gignoux ◽

...

Keyword(s):

Population Structure ◽

Southern Africa ◽

Population History ◽

Eastern Africa ◽

Fine Scale ◽

Kalahari Desert ◽

African Populations ◽

Scale Population ◽

Genetic Impact ◽

Northern Cape

Recent genetic studies have established that the KhoeSan populations of southern Africa are distinct from all other African populations and have remained largely isolated during human prehistory until about 2,000 years ago. Dozens of different KhoeSan groups exist, belonging to three different language families, but very little is known about their population history. We examine new genome-wide polymorphism data and whole mitochondrial genomes for more than one hundred South Africans from the ≠Khomani San and Nama populations of the Northern Cape, analyzed in conjunction with 19 additional southern African populations. Our analyses reveal fine-scale population structure in and around the Kalahari Desert. Surprisingly, this structure does not always correspond to linguistic or subsistence categories as previously suggested, but rather reflects the role of geographic barriers and the ecology of the greater Kalahari Basin. Regardless of subsistence strategy, the indigenous Khoe-speaking Nama pastoralists and the N|u-speaking ≠Khomani (formerly hunter-gatherers) share ancestry with other Khoe-speaking forager populations that form a rim around the Kalahari Desert. We reconstruct earlier migration patterns and estimate that the southern Kalahari populations were among the last to experience gene flow from Bantu-speakers, approximately 14 generations ago. We conclude that local adoption of pastoralism, at least by the Nama, appears to have been primarily a cultural process with limited genetic impact from eastern Africa.

Data deposition: Data files are freely available on the Henn Lab website: http://ecoevo.stonybrook.edu/hennlab/data-software/

Summary: Distinct, spatially organized ancestries demonstrate fine-scale population structure in southern Africa, implying a more complex history of the KhoeSan than previously thought. Southern KhoeSan ancestry in the Nama and ≠Khomani is shared in a rim around the Kalahari Desert. We hypothesize that there was recent migration of pastoralists from East Africa into southern Africa, independent of the Bantu-expansion, but the spread of pastoralism within southern Africa occurred largely by cultural diffusion.


Fine-scale population structure and demographic history of British Pakistanis

10.1101/2020.09.02.279190 ◽

2020 ◽

Author(s):

Elena Arciero ◽

Sufyan A. Dogra ◽

Massimo Mezzavilla ◽

Theofanis Tsismentzoglou ◽

Qin Qin Huang ◽

...

Keyword(s):

Population Structure ◽

Disease Risk ◽

Demographic History ◽

Population History ◽

Genomic Diversity ◽

Fine Scale ◽

Effective Population ◽

Scale Population ◽

The Impact ◽

British Pakistanis

Previous genetic and public health research in the Pakistani population has focused on the role of consanguinity in increasing recessive disease risk, but little is known about its recent population history or the effects of endogamy. Here, we investigate fine-scale population structure, history and consanguinity patterns using genetic and questionnaire data from >4,000 British Pakistani individuals, mostly with roots in Azad Kashmir and Punjab. We reveal strong recent population structure driven by the biraderi social stratification system. We find that all subgroups have had low effective population sizes (Ne) over the last 50 generations, with some showing a decrease in Ne 15-20 generations ago that has resulted in extensive identity-by-descent sharing and increased homozygosity. Using new theory, we show that the footprint of regions of homozygosity in the two largest subgroups is about twice that expected naively based on the self-reported consanguinity rates and the inferred historical Ne trajectory. These results demonstrate the impact of the cultural practices of endogamy and consanguinity on population structure and genomic diversity in British Pakistanis, and have important implications for medical genetic studies.


Fine-scale human population structure in southern Africa reflects ecological boundaries

10.1101/038729 ◽

2016 ◽

Cited By ~ 2

Author(s):

Caitlin Uren ◽

Minju Kim ◽

Alicia R Martin ◽

Dean Bobo ◽

Christopher R Gignoux ◽

...

Keyword(s):

Population Structure ◽

Southern Africa ◽

Population History ◽

Fine Scale ◽

Hunter Gatherers ◽

Kalahari Desert ◽

Genome Wide ◽

African Populations ◽

Scale Population ◽

Northern Cape

Recent genetic studies have established that the KhoeSan populations of southern Africa are distinct from all other African populations and have remained largely isolated during human prehistory until about 2,000 years ago. Dozens of different KhoeSan groups exist, belonging to three different language families, but very little is known about population history within southern Africa. We examine new genome-wide polymorphism data and whole mitochondrial genomes for more than one hundred South Africans from the ≠Khomani San and Nama populations of the Northern Cape, analyzed in conjunction with 19 additional southern African populations. Our analyses reveal fine-scale population structure in and around the Kalahari Desert. Surprisingly, this structure does not always correspond to linguistic or subsistence categories as previously suggested, but rather reflects the role of geographic barriers and the ecology of the greater Kalahari Basin. Regardless of subsistence strategy, the indigenous Khoe-speaking Nama pastoralists and the N|u-speaking ≠Khomani (formerly hunter-gatherers) share recent ancestry with other Khoe-speaking forager populations that form a rim around the Kalahari Desert. We reconstruct earlier migration patterns and estimate that the southern Kalahari populations were among the last to experience gene flow from Bantu-speakers, approximately 14 generations ago. We conclude that local adoption of pastoralism, at least by the Nama, appears to have been primarily a cultural process with limited impact from eastern African genetic diffusion.


Inferring the ancient population structure of the vulnerable albatross Phoebastria albatrus, combining ancient DNA, stable isotope, and morphometric analyses of archaeological samples

Conservation Genetics ◽

10.1007/s10592-011-0270-5 ◽

2011 ◽

Vol 13 (1) ◽

pp. 143-151 ◽

Cited By ~ 12

Author(s):

Masaki Eda ◽

Hiroko Koike ◽

Masaki Kuro-o ◽

Shozo Mihara ◽

Hiroshi Hasegawa ◽

...

Keyword(s):

Population Structure ◽

Stable Isotope ◽

Ancient Dna ◽

Ancient Population ◽

Morphometric Analyses


Factor analysis of ancient population genomic samples

Nature Communications ◽

10.1038/s41467-020-18335-6 ◽

2020 ◽

Vol 11 (1) ◽

Cited By ~ 2

Author(s):

Olivier François ◽

Flora Jay

Keyword(s):

Principal Component Analysis ◽

Factor Analysis ◽

Ancient Dna ◽

Principal Component ◽

Ancient Population ◽

Population Genomic ◽

Ancestry Estimation ◽

Geometric Representations ◽

Individual Scores ◽

Over Time

Recent years have seen a growing number of studies investigating evolutionary questions using ancient DNA. To address these questions, one of the most frequently used methods is principal component analysis (PCA). When PCA is applied to temporal samples, the sample dates are, however, ignored during analysis, leading to imperfect representations of samples in PC plots. Here, we present a factor analysis (FA) method in which individual scores are corrected for the effect of allele frequency drift over time. We obtained exact solutions for the estimates of corrected factors, and we provided a fast algorithm for their computation. Using computer simulations and ancient European samples, we compared geometric representations obtained from FA with PCA and with ancestry estimation programs. In admixture analyses, FA estimates agreed with tree-based statistics, and they were more accurate than those obtained from PCA projections and from ancestry estimation programs. A great advantage of FA over existing approaches is that it improves descriptive analyses of ancient DNA samples without requiring inclusion of outgroup or present-day samples.
