Full Bayesian Comparative Phylogeography from Genomic Data

Mapping Intimacies ◽

10.1101/324525 ◽

2018 ◽

Cited By ~ 1

Author(s):

Jamie R. Oaks

Keyword(s):

Bayesian Approach ◽

Genomic Data ◽

The Philippines ◽

Population History ◽

New Method ◽

Computational Time ◽

Nucleotide Polymorphisms ◽

Gene Trees ◽

Genomic Libraries ◽

Approximate Likelihood

A challenge to understanding biological diversification is accounting for community-scale processes that cause multiple, co-distributed lineages to co-speciate. Such processes predict non-independent, temporally clustered divergences across taxa. Approximate-likelihood Bayesian computation (ABC) approaches to inferring such patterns from comparative genetic data are very sensitive to prior assumptions and often biased toward estimating shared divergences. We introduce a full-likelihood Bayesian approach, ecoevolity, which takes full advantage of information in genomic data. By analytically integrating over gene trees, we are able to directly calculate the likelihood of the population history from genomic data, and efficiently sample the model-averaged posterior via Markov chain Monte Carlo algorithms. Using simulations, we find that the new method is much more accurate and precise at estimating the number and timing of divergence events across pairs of populations than existing approximate-likelihood methods. Our full Bayesian approach also requires several orders of magnitude less computational time than existing ABC approaches. We find that despite assuming unlinked characters (e.g., unlinked single-nucleotide polymorphisms), the new method performs better if this assumption is violated in order to retain the constant characters of whole linked loci. In fact, retaining constant characters allows the new method to robustly estimate the correct number of divergence events with high posterior probability in the face of character-acquisition biases, which commonly plague loci assembled from reduced-representation genomic libraries. We apply our method to genomic data from four pairs of insular populations of Gekko lizards from the Philippines that are not expected to have co-diverged. Despite all four pairs diverging very recently, our method strongly supports that they diverged independently, and these results are robust to very disparate prior assumptions.


Population Genomics of American Mink Using Whole Genome Sequencing Data

Genes ◽

10.3390/genes12020258 ◽

2021 ◽

Vol 12 (2) ◽

pp. 258

Author(s):

Karim Karimi ◽

Duy Ngoc Do ◽

Mehdi Sargolzaei ◽

Younes Miar

Keyword(s):

Population Genomics ◽

Association Studies ◽

American Mink ◽

Population History ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Nucleotide Polymorphisms ◽

Sequencing Data ◽

Effective Population ◽

Cross Validation Error

Characterizing the genetic structure and population history can facilitate the development of genomic breeding strategies for the American mink. In this study, we used the whole genome sequences of 100 mink from the Canadian Centre for Fur Animal Research (CCFAR) at the Dalhousie Faculty of Agriculture (Truro, NS, Canada) and Millbank Fur Farm (Rockwood, ON, Canada) to investigate their population structure, genetic diversity and linkage disequilibrium (LD) patterns. Analysis of molecular variance (AMOVA) indicated that the variation among color-types was significant (p < 0.001) and accounted for 18% of the total variation. The admixture analysis revealed that assuming three ancestral populations (K = 3) provided the lowest cross-validation error (0.49). The effective population size (Ne) five generations ago was estimated to be 99 and 50 for CCFAR and Millbank Fur Farm, respectively. The LD patterns revealed that the average r² fell below 0.2 at genomic distances of >20 kb and >100 kb in CCFAR and Millbank Fur Farm, respectively, suggesting that densities of 120,000 and 24,000 single nucleotide polymorphisms (SNPs) would provide adequate accuracy of genomic evaluation in these two populations. These results indicated that accounting for admixture is critical for designing the SNP panels for genotype-phenotype association studies of American mink.
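For readers unfamiliar with the LD statistic quoted above, the following is a minimal sketch of how r² between two biallelic SNPs can be computed. It assumes phased haplotypes coded as 0/1 alleles, which is a simplification; the abstract does not say how the authors computed LD, and the function name `ld_r2` is hypothetical.

```python
def ld_r2(hap_a, hap_b):
    """Return r^2 between two biallelic loci.

    hap_a and hap_b are equal-length lists of 0/1 alleles, one entry per
    phased haplotype (an assumption made for this sketch).
    """
    n = len(hap_a)
    if n == 0 or n != len(hap_b):
        raise ValueError("need two equal-length, non-empty haplotype lists")
    p_a = sum(hap_a) / n                                  # allele-1 frequency at locus A
    p_b = sum(hap_b) / n                                  # allele-1 frequency at locus B
    p_ab = sum(a * b for a, b in zip(hap_a, hap_b)) / n   # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b                                  # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a locus is monomorphic")
    return d * d / denom


if __name__ == "__main__":
    print(ld_r2([0, 0, 1, 1], [0, 0, 1, 1]))  # perfectly correlated loci
    print(ld_r2([0, 0, 1, 1], [0, 1, 0, 1]))  # independent loci
```

An LD decay curve like the one summarized in the abstract would average this quantity over SNP pairs binned by physical distance.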


Accurate age estimation in small-scale societies

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1619583114 ◽

2017 ◽

Vol 114 (31) ◽

pp. 8205-8210 ◽

Cited By ~ 8

Author(s):

Yoan Diekmann ◽

Daniel Smith ◽

Pascale Gerbault ◽

Mark Dyble ◽

Abigail E. Page ◽

...

Keyword(s):

Life History ◽

Bayesian Approach ◽

Age Estimation ◽

Human Life ◽

The Philippines ◽

Monte Carlo Algorithm ◽

Small Scale ◽

Hunter Gatherers ◽

Cultural Life ◽

Ranking Order

Precise estimation of age is essential in evolutionary anthropology, especially to infer population age structures and understand the evolution of human life history diversity. However, in small-scale societies, such as hunter-gatherer populations, time is often not referred to in calendar years, and accurate age estimation remains a challenge. We address this issue by proposing a Bayesian approach that accounts for age uncertainty inherent to fieldwork data. We developed a Gibbs sampling Markov chain Monte Carlo algorithm that produces posterior distributions of ages for each individual, based on a ranking order of individuals from youngest to oldest and age ranges for each individual. We first validate our method on 65 Agta foragers from the Philippines with known ages, and show that our method generates age estimations that are superior to previously published regression-based approaches. We then use data on 587 Agta collected during recent fieldwork to demonstrate how multiple partial age ranks coming from multiple camps of hunter-gatherers can be integrated. Finally, we exemplify how the distributions generated by our method can be used to estimate important demographic parameters in small-scale societies: here, age-specific fertility patterns. Our flexible Bayesian approach will be especially useful to improve cross-cultural life history datasets for small-scale societies for which reliable age records are difficult to acquire.
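The idea of sampling ages under a youngest-to-oldest ranking can be illustrated with a small sketch. It is not the authors' algorithm: it assumes a flat prior within each individual's age range and a single complete ranking, and the function name `gibbs_ages` and its parameters are hypothetical.

```python
import random


def gibbs_ages(ranges, n_iter=2000, burn_in=500, seed=0):
    """Sample ages consistent with a ranking and per-individual age ranges.

    ranges: list of (low, high) tuples, ordered from youngest to oldest.
    Returns a list of sampled age vectors collected after burn-in.
    """
    rng = random.Random(seed)
    # Initialise at the youngest ages compatible with ranking and ranges.
    ages = []
    floor = float("-inf")
    for lo, hi in ranges:
        floor = max(floor, lo)
        if floor > hi:
            raise ValueError("age ranges contradict the ranking")
        ages.append(floor)

    n = len(ages)
    samples = []
    for it in range(n_iter):
        for i, (lo, hi) in enumerate(ranges):
            # Full conditional under a flat prior: uniform between the
            # neighbouring ages, clipped to this individual's own range.
            lower = max(lo, ages[i - 1]) if i > 0 else lo
            upper = min(hi, ages[i + 1]) if i < n - 1 else hi
            ages[i] = rng.uniform(lower, upper)
        if it >= burn_in:
            samples.append(list(ages))
    return samples


if __name__ == "__main__":
    draws = gibbs_ages([(0, 10), (5, 15), (12, 30)])
    means = [sum(col) / len(col) for col in zip(*draws)]
    print(means)  # posterior mean age per individual
```

Each column of the returned samples approximates the posterior distribution of one individual's age, which is the kind of output the abstract describes feeding into demographic estimates.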


Nesting Monte Carlo for high-dimensional non-linear PDEs

Monte Carlo Methods and Applications ◽

10.1515/mcma-2018-2020 ◽

2018 ◽

Vol 24 (4) ◽

pp. 225-247 ◽

Cited By ~ 4

Author(s):

Xavier Warin

Keyword(s):

Monte Carlo ◽

Deep Learning ◽

High Dimension ◽

Analytical Solutions ◽

New Method ◽

Computational Time ◽

High Dimensional ◽

Learning Methods ◽

Lipschitz Constants ◽

Linear Pdes

A new method based on nesting Monte Carlo is developed to solve high-dimensional semi-linear PDEs. Depending on the type of non-linearity, different schemes are proposed and theoretically studied: variance errors are given and it is shown that the bias of the schemes can be controlled. The limitation of the method is that the maturity or the Lipschitz constants of the non-linearity should not be too high in order to avoid an explosion of the computational time. Many numerical results are given in high dimension for cases where analytical solutions are available or where some solutions can be computed by deep-learning methods.


Hybrid Method for Detecting Duplicate Image by Using Image Retrieval Technique in Data Mining

INTERNATIONAL JOURNAL OF MATHEMATICS AND COMPUTER RESEARCH ◽

10.47191/ijmcr/v9i4.02 ◽

2021 ◽

Vol 09 (04) ◽

Author(s):

Dr. S. Thavamani ◽

Keyword(s):

Data Mining ◽

Image Retrieval ◽

Hybrid Method ◽

Video Retrieval ◽

New Method ◽

Computational Time ◽

Storage Space ◽

Copy Detection ◽

Retrieval Technique ◽

Improved Accuracy

Duplicated images cause several problems on online sites and therefore demand special attention. To address the disadvantages of frame copy detection, the Hybrid Method of Detecting Duplicate Image by Using Image Retrieval Technique in Data Mining is proposed, and the new method is used here to eliminate duplicates. The method discards frames that are not relevant to the video, which makes video retrieval more precise and faster, with fewer duplicates. The technique is implemented with C# and SQL as the back end. The findings are tested and compared with the current SIFT process. The results show improved accuracy together with reduced storage space, computational time, and memory use.


A Bayesian approach for genotyping single nucleotide polymorphisms (SNPs)

International Journal of Data Mining and Bioinformatics ◽

10.1504/ijdmb.2018.094890 ◽

2018 ◽

Vol 20 (4) ◽

pp. 341

Author(s):

Ali Sheikhi ◽

David Ramsey

Keyword(s):

Single Nucleotide Polymorphisms ◽

Bayesian Approach ◽

Nucleotide Polymorphisms ◽

Single Nucleotide


The utility of single nucleotide polymorphisms in inferences of population history

Trends in Ecology & Evolution ◽

10.1016/s0169-5347(03)00018-1 ◽

2003 ◽

Vol 18 (5) ◽

pp. 249-256 ◽

Cited By ~ 393

Author(s):

Robb T. Brumfield ◽

Peter Beerli ◽

Deborah A. Nickerson ◽

Scott V. Edwards

Keyword(s):

Single Nucleotide Polymorphisms ◽

Population History ◽

Nucleotide Polymorphisms ◽

Single Nucleotide


Genome-wide single nucleotide polymorphisms reveal population history and adaptive divergence in wild guppies

Molecular Ecology ◽

10.1111/j.1365-294x.2010.04528.x ◽

2010 ◽

Vol 19 (5) ◽

pp. 968-984 ◽

Cited By ~ 103

Author(s):

EVA-MARIA WILLING ◽

PAUL BENTZEN ◽

COCK van OOSTERHOUT ◽

MARGARETE HOFFMANN ◽

JOANNE CABLE ◽

...

Keyword(s):

Single Nucleotide Polymorphisms ◽

Population History ◽

Adaptive Divergence ◽

Nucleotide Polymorphisms ◽

Single Nucleotide ◽

Genome Wide


EFFECTIVE ALGORITHMS FOR TAG SNP SELECTION

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720005001521 ◽

2005 ◽

Vol 03 (05) ◽

pp. 1089-1106 ◽

Cited By ~ 2

Author(s):

TIE-FEI LIU ◽

WING-KIN SUNG ◽

YI LI ◽

JIAN-JUN LIU ◽

ANKUSH MITTAL ◽

...

Keyword(s):

Search Algorithm ◽

Association Studies ◽

Strong Association ◽

Genetic Association Studies ◽

Approximate Solutions ◽

Computational Time ◽

Nucleotide Polymorphisms ◽

Tag Snps ◽

Tag Snp ◽

Tag Snp Selection

Single nucleotide polymorphisms (SNPs), due to their abundance and low mutation rate, are very useful genetic markers for genetic association studies. However, the current genotyping technology cannot afford to genotype all common SNPs in all the genes. By making use of linkage disequilibrium, we can reduce the experiment cost by genotyping a subset of SNPs, called Tag SNPs, which have a strong association with the ungenotyped SNPs while being as independent from each other as possible. The problem of selecting Tag SNPs is NP-complete. When there is a large number of SNPs, most existing Tag SNP selection methods avoid extremely long computational time by first partitioning the SNPs into blocks based on certain block definitions and then selecting Tag SNPs in each block by brute-force search. The size of the Tag SNP set obtained in this way can usually be reduced further because of the inter-dependency among blocks. This paper proposes two algorithms, TSSA and TSSD, to tackle the block-independent Tag SNP selection problem. TSSA is based on the A* search algorithm, and TSSD is a heuristic algorithm. Experiments show that TSSA can find the optimal solutions for medium-sized problems in reasonable time, while TSSD can handle very large problems and report approximate solutions very close to the optimal ones.
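To make the selection problem concrete, here is a minimal sketch of a generic greedy set-cover heuristic for choosing Tag SNPs. It is neither TSSA nor TSSD, whose details the abstract does not give; the pairwise r² matrix input, the 0.8 threshold, and the name `greedy_tag_snps` are all assumptions made for illustration.

```python
def greedy_tag_snps(r2, threshold=0.8):
    """Pick Tag SNPs greedily so every SNP is tagged by some chosen SNP.

    r2: square matrix (list of lists) of pairwise r^2 values.
    A SNP i is said to tag SNP j when r2[i][j] >= threshold (or i == j).
    Returns the indices of the chosen Tag SNPs, in selection order.
    """
    n = len(r2)
    # covers[i] = set of SNPs that SNP i tags, always including itself.
    covers = [
        {j for j in range(n) if j == i or r2[i][j] >= threshold}
        for i in range(n)
    ]
    uncovered = set(range(n))
    tags = []
    while uncovered:
        # Choose the SNP tagging the most still-uncovered SNPs;
        # ties are broken in favour of the smallest index.
        best = max(range(n), key=lambda i: (len(covers[i] & uncovered), -i))
        tags.append(best)
        uncovered -= covers[best]
    return tags


if __name__ == "__main__":
    r2 = [
        [1.0, 0.9, 0.1, 0.0],
        [0.9, 1.0, 0.85, 0.0],
        [0.1, 0.85, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]
    print(greedy_tag_snps(r2))  # indices of the selected Tag SNPs
```

A greedy pass like this runs quickly on large inputs but gives no optimality guarantee, which is the trade-off the abstract draws between exact search and heuristic selection.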


Genomic insights into the recent population history of Mapuche Native Americans

10.1101/2021.11.25.470066 ◽

2021 ◽

Author(s):

Lucas Vicuña ◽

Anastasia Mikhailova ◽

Tomás Norambuena ◽

Anna Ilina ◽

Olga Klimenkova ◽

...

Keyword(s):

Native American ◽

Native Americans ◽

Amazon Basin ◽

Genomic Data ◽

Population History ◽

Ancestral Population ◽

Effective Population ◽

Southern Patagonia ◽

History Of ◽

Demographic Shifts

The last few years have witnessed an explosive generation of genomic data from ancient and modern Native American populations. These data shed light on key demographic shifts that occurred in geographically diverse territories of South America, such as the Andean highlands, Southern Patagonia and the Amazon basin. We used genomic data to study the recent population history of the Mapuche, who are the major Native population from the Southern Cone (Chile and Argentina). We found evidence of specific shared genetic ancestry between the Mapuche and ancient populations from Southern Patagonia, Central Chile and the Argentine Pampas. Despite previous evidence of cultural influence of Inca and Tiwanaku polities over the Mapuche, we did not find evidence of specific shared ancestry between them, nor with Amazonian groups. We estimated the effective population size dynamics of the Mapuche ancestral population during the last millennia, identifying a population bottleneck around 1650 AD, coinciding with a period of Spanish invasions into the territory inhabited by the Mapuche. Finally, we show that admixed Chileans underwent post-admixture adaptation in their Mapuche subancestry component in genes related to lipid metabolism, suggesting adaptation to scarce food availability.


Quantifying the risk of hemiplasy in phylogenetic inference

10.1101/391391 ◽

2018 ◽

Cited By ~ 1

Author(s):

Rafael F. Guerrero ◽

Matthew W. Hahn

Keyword(s):

Risk Factor ◽

Convergent Evolution ◽

Phylogenetic Inference ◽

Species Tree ◽

New Method ◽

Gene Trees ◽

Arbitrary Length ◽

Wide Range

Convergent evolution is often inferred when a trait is incongruent with the species tree. However, trait incongruence can also arise from changes that occur on discordant gene trees, a process referred to as hemiplasy. Hemiplasy is rarely taken into account in studies of convergent evolution, despite the fact that phylogenomic studies have revealed rampant discordance. Here, we study the relative probabilities of homoplasy (including convergence and reversal) and hemiplasy for an incongruent trait. We derive expressions for the probabilities of the two events, showing that they depend on many of the same parameters. We find that hemiplasy is as likely as, or more likely than, homoplasy for a wide range of conditions, even when levels of discordance are low. We also present a new method to calculate the ratio of these two probabilities (the “hemiplasy risk factor”) along the branches of a phylogeny of arbitrary length. Such calculations can be applied to any tree in order to identify when and where incongruent traits may be more likely to be due to hemiplasy than homoplasy.
