Maximum likelihood estimation of species trees from gene trees in the presence of ancestral population structure

2019 ◽  
Author(s):  
Hillary Koch ◽  
Michael DeGiorgio

Abstract Though large multilocus genomic datasets have led to overall improvements in phylogenetic inference, they have posed the new challenge of addressing conflicting signals across the genome. In particular, ancestral population structure, which has been uncovered in a number of diverse species, can skew gene tree frequencies, thereby hindering the performance of species tree estimators. Here we develop a novel maximum likelihood method, termed TASTI, that can infer phylogenies under such scenarios, and find that it has increasing accuracy with increasing numbers of input gene trees, contrasting with the relatively poor performances of methods not tailored for ancestral structure. Moreover, we propose a supertree approach that allows TASTI to scale computationally with increasing numbers of input taxa. We use genetic simulations to assess TASTI’s performance in the four-taxon setting, and demonstrate the application of TASTI on a six-species Afrotropical mosquito dataset. Finally, we have implemented TASTI in an open-source software package for ease of use by the scientific community.

2020 ◽  
Vol 12 (2) ◽  
pp. 3977-3995 ◽  
Author(s):  
Hillary Koch ◽  
Michael DeGiorgio

Abstract Though large multilocus genomic data sets have led to overall improvements in phylogenetic inference, they have posed the new challenge of addressing conflicting signals across the genome. In particular, ancestral population structure, which has been uncovered in a number of diverse species, can skew gene tree frequencies, thereby hindering the performance of species tree estimators. Here we develop a novel maximum likelihood method, termed TASTI (Taxa with Ancestral structure Species Tree Inference), that can infer phylogenies under such scenarios, and find that it has increasing accuracy with increasing numbers of input gene trees, contrasting with the relatively poor performances of methods not tailored for ancestral structure. Moreover, we propose a supertree approach that allows TASTI to scale computationally with increasing numbers of input taxa. We use genetic simulations to assess TASTI’s performance in the three- and four-taxon settings and demonstrate the application of TASTI on a six-species Afrotropical mosquito data set. Finally, we have implemented TASTI in an open-source software package for ease of use by the scientific community.
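As a point of reference for how gene tree frequencies behave when there is no ancestral structure, the sketch below computes the three-taxon gene tree topology probabilities under the standard multispecies coalescent. This is the textbook baseline, not the TASTI model; the function name and the dictionary keys are hypothetical and chosen only for illustration.

```python
import math


def three_taxon_gene_tree_probs(t):
    """Gene tree topology probabilities for a rooted three-taxon species tree
    under the standard multispecies coalescent (no ancestral structure).

    t is the internal branch length in coalescent units. This is a baseline
    illustration only; it is not the structured model used by TASTI.
    """
    if t < 0:
        raise ValueError("branch length must be non-negative")
    # Probability that the two sister lineages fail to coalesce on the
    # internal branch, split evenly over the three possible topologies.
    each_discordant = math.exp(-t) / 3.0
    return {
        "concordant": 1.0 - 2.0 * each_discordant,
        "discordant_1": each_discordant,
        "discordant_2": each_discordant,
    }


probs = three_taxon_gene_tree_probs(1.0)
```

Under this baseline the two discordant topologies are always equally frequent, so a persistent asymmetry between them in real data is one sign that the baseline assumptions do not hold.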


Author(s):  
Anggis Sagitarisman ◽  
Aceng Komarudin Mutaqin

Abstract Car manufacturers in Indonesia need to set warranty costs that are reasonable and burden neither the company nor consumers. Several statistical approaches have been developed to analyze warranty costs. One of them is the Gertsbakh-Kordonsky method, which reduces the two-dimensional warranty problem to a one-dimensional one. In this research, we apply the Gertsbakh-Kordonsky method to estimate the warranty cost for car type A at company XYZ. The reduced one-dimensional data are tested for goodness of fit with the Kolmogorov-Smirnov test to determine their distribution, and the distribution parameters are estimated by maximum likelihood. Three approaches are used to estimate the parameters; they differ in how mileage is calculated for units with no claim within the warranty period. As an application, we use claim data for car type A. Data exploration indicates that failures of car type A are mostly due to vehicle age. The Kolmogorov-Smirnov test shows that the most appropriate distribution for the reduced data is the three-parameter Weibull. The estimate obtained with the Gertsbakh-Kordonsky method puts the warranty cost for car type A at around 3.54% of the selling price of the unit without warranty, i.e., around Rp. 4,248,000 per unit.
Keywords: warranty costs; Gertsbakh-Kordonsky method; maximum likelihood estimation; Kolmogorov-Smirnov test.


Genetics ◽  
2003 ◽  
Vol 164 (4) ◽  
pp. 1645-1656 ◽  
Author(s):  
Bruce Rannala ◽  
Ziheng Yang

Abstract The effective population sizes of ancestral as well as modern species are important parameters in models of population genetics and human evolution. The commonly used method for estimating ancestral population sizes, based on counting mismatches between the species tree and the inferred gene trees, is highly biased as it ignores uncertainties in gene tree reconstruction. In this article, we develop a Bayes method for simultaneous estimation of the species divergence times and current and ancestral population sizes. The method uses DNA sequence data from multiple loci and extracts information about conflicts among gene tree topologies and coalescent times to estimate ancestral population sizes. The topology of the species tree is assumed known. A Markov chain Monte Carlo algorithm is implemented to integrate over uncertain gene trees and branch lengths (or coalescence times) at each locus as well as species divergence times. The method can handle any species tree and allows different numbers of sequences at different loci. We apply the method to published noncoding DNA sequences from the human and the great apes. There are strong correlations between posterior estimates of speciation times and ancestral population sizes. With the use of an informative prior for the human-chimpanzee divergence date, the population size of the common ancestor of the two species is estimated to be ∼20,000, with a 95% credibility interval (8000, 40,000). Our estimates, however, are affected by model assumptions as well as data quality. We suggest that reliable estimates have yet to await more data and more realistic models.


Author(s):  
V.A. Ufaev

On the basis of the hypothesis that the measured and true values of the field strength amplitude are equal, an algebraic solution is obtained for estimating the unknown coordinates and the energy parameter of the emitter. First, an overdetermined system of linear equations is composed and solved by matrix pseudo-inversion, which gives the emitter coordinates under the assumption that the distance to the reference point does not depend on those coordinates. Then a quadratic equation in the distance to the reference point is composed and solved, followed by estimation of the coordinates and the energy parameter. The ambiguity of the algebraic solution is resolved by comparing the maximum likelihood functional and choosing the parameters at which its maximum is reached. According to a simulation of a cellular-type system in multiplicative noise, the simulated results of the algebraic maximum likelihood solutions are close to the calculated ones, except in a special zone where anomalous changes occur due to the limitations of the coordinate determination method. The algebraic solutions speed up maximum likelihood estimation about 500-fold. The proposed principle can be used to resolve the ambiguity of algebraic solutions in range-difference systems and in the inverse problem of self-positioning of a receiving point from the amplitude of the electromagnetic field of beacons with known locations. The article contains 4 figures and a reference list of 9 sources.


2022 ◽  
Vol 12 ◽  
Author(s):  
Martha Kandziora ◽  
Petr Sklenář ◽  
Filip Kolář ◽  
Roswitha Schmickl

A major challenge in phylogenetics and -genomics is to resolve young rapidly radiating groups. The fast succession of species increases the probability of incomplete lineage sorting (ILS), and different topologies of the gene trees are expected, leading to gene tree discordance, i.e., not all gene trees represent the species tree. Phylogenetic discordance is common in phylogenomic datasets, and apart from ILS, additional sources include hybridization, whole-genome duplication, and methodological artifacts. Despite a high degree of gene tree discordance, species trees are often well supported and the sources of discordance are not further addressed in phylogenomic studies, which can eventually lead to incorrect phylogenetic hypotheses, especially in rapidly radiating groups. We chose the high-Andean Asteraceae genus Loricaria to shed light on the potential sources of phylogenetic discordance and generated a phylogenetic hypothesis. By accounting for paralogy during gene tree inference, we generated a species tree based on hundreds of nuclear loci, using Hyb-Seq, and a plastome phylogeny obtained from off-target reads during target enrichment. We observed a high degree of gene tree discordance, which we found implausible at first sight, because the genus did not show evidence of hybridization in previous studies. We used various phylogenomic analyses (trees and networks) as well as the D-statistics to test for ILS and hybridization, which we developed into a workflow on how to tackle phylogenetic discordance in recent radiations. We found strong evidence for ILS and hybridization within the genus Loricaria. Low genetic differentiation was evident between species located in different Andean cordilleras, which could be indicative of substantial introgression between populations, promoted during Pleistocene glaciations, when alpine habitats shifted creating opportunities for secondary contact and hybridization.
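The D-statistic mentioned above is commonly computed from ABBA and BABA site pattern counts as D = (ABBA - BABA) / (ABBA + BABA). The sketch below is a minimal illustration of that definition, assuming four pre-aligned sequences in the order P1, P2, P3, outgroup; the function name and the toy sequences are hypothetical, and significance testing (for example a block jackknife) is deliberately left out. It is not the workflow used in the study.

```python
def d_statistic(p1, p2, p3, outgroup):
    """Patterson's D from four aligned sequences of equal length.

    The outgroup allele is treated as ancestral (A); the other allele at a
    biallelic site is derived (B). Illustrative sketch only.
    """
    if not (len(p1) == len(p2) == len(p3) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")
    abba = baba = 0
    for a, b, c, o in zip(p1, p2, p3, outgroup):
        if len({a, b, c, o}) != 2:
            continue  # keep only biallelic sites
        if a == o and b == c and b != o:
            abba += 1  # P1 ancestral, P2 and P3 derived
        elif b == o and a == c and a != o:
            baba += 1  # P2 ancestral, P1 and P3 derived
    if abba + baba == 0:
        raise ValueError("no informative ABBA or BABA sites")
    return (abba - baba) / (abba + baba)


# Toy alignment (hypothetical data): three ABBA sites and one BABA site.
d = d_statistic("ACTGAT", "TGCGCT", "TGTGCA", "ACCGAA")
```

Incomplete lineage sorting alone is expected to produce ABBA and BABA patterns in equal numbers, so a D value far from zero is usually read as a signal of gene flow between P3 and one of the two sister taxa.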


2021 ◽  
Author(s):  
Benoit Morel ◽  
Paul Schade ◽  
Sarah Lutteropp ◽  
Tom A. Williams ◽  
Gergely J. Szöllösi ◽  
...  

Species tree inference from gene family trees is becoming increasingly popular because it can account for discordance between the species tree and the corresponding gene family trees. In particular, methods that can account for multiple-copy gene families exhibit potential to leverage paralogy as informative signal. At present, there does not exist any widely adopted inference method for this purpose. Here, we present SpeciesRax, the first maximum likelihood method that can infer a rooted species tree from a set of gene family trees and can account for gene duplication, loss, and transfer events. By explicitly modelling events by which gene trees can depart from the species tree, SpeciesRax leverages the phylogenetic rooting signal in gene trees. SpeciesRax infers species tree branch lengths in units of expected substitutions per site and branch support values via paralogy-aware quartets extracted from the gene family trees. Using both empirical and simulated datasets we show that SpeciesRax is at least as accurate as the best competing methods while being one order of magnitude faster on large datasets at the same time. We used SpeciesRax to infer a biologically plausible rooted phylogeny of the vertebrates comprising 188 species from 31612 gene families in one hour using 40 cores. SpeciesRax is available under GNU GPL at https://github.com/BenoitMorel/GeneRax and on BioConda.


2016 ◽  
Author(s):  
Rui J. Costa ◽  
Hilde Wilkinson-Herbots

Abstract The isolation-with-migration (IM) model is commonly used to make inferences about gene flow during speciation, using polymorphism data. However, Becquet and Przeworski (2009) report that the parameter estimates obtained by fitting the IM model are very sensitive to the model's assumptions (including the assumption of constant gene flow until the present). This paper is concerned with the isolation-with-initial-migration (IIM) model of Wilkinson-Herbots (2012), which drops precisely this assumption. In the IIM model, one ancestral population divides into two descendant subpopulations, between which there is an initial period of gene flow and a subsequent period of isolation. We derive a very fast method of fitting an extended version of the IIM model, which also allows for asymmetric gene flow and unequal population sizes. This is a maximum-likelihood method, applicable to data on the number of segregating sites between pairs of DNA sequences from a large number of independent loci. In addition to obtaining parameter estimates, our method can also be used to distinguish between alternative models representing different evolutionary scenarios, by means of likelihood ratio tests. We illustrate the procedure on pairs of Drosophila sequences from approximately 30,000 loci. The computing time needed to fit the most complex version of the model to this data set is only a couple of minutes. The R code to fit the IIM model can be found in the supplementary files of this paper.
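The general shape of a likelihood ratio test between two nested models can be sketched as follows. This is a generic illustration, not the authors' R code: the function name and the log-likelihood values are hypothetical, and the chi-square reference distribution with one degree of freedom is an assumption that may not hold when the null hypothesis places a parameter on the boundary of its range (for example a migration rate of zero).

```python
import math


def lrt_one_df(loglik_null, loglik_alt):
    """Likelihood ratio test for nested models differing by one parameter.

    Returns the test statistic and a p-value from the chi-square
    distribution with 1 degree of freedom (assumed reference distribution).
    """
    stat = 2.0 * (loglik_alt - loglik_null)
    if stat < 0:
        raise ValueError("the alternative model should fit at least as well")
    # Survival function of chi-square(1): P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical maximized log-likelihoods for a null and an alternative model.
stat, p_value = lrt_one_df(-1234.5, -1230.0)
```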


2020 ◽  
Author(s):  
Matthew H Van Dam ◽  
James B Henderson ◽  
Lauren Esposito ◽  
Michelle Trautwein

Abstract Ultraconserved genomic elements (UCEs) are generally treated as independent loci in phylogenetic analyses. The identification pipeline for UCE probes does not require prior knowledge of genetic identity, only selecting loci that are highly conserved, single copy, without repeats, and of a particular length. Here, we characterized UCEs from 11 phylogenomic studies across the animal tree of life, from birds to marine invertebrates. We found that within vertebrate lineages, UCEs are mostly intronic and intergenic, while in invertebrates, the majority are in exons. We then curated four different sets of UCE markers by genomic category from five different studies including: birds, mammals, fish, Hymenoptera (ants, wasps, and bees), and Coleoptera (beetles). Of genes captured by UCEs, we find that many are represented by two or more UCEs, corresponding to nonoverlapping segments of a single gene. We considered these UCEs to be nonindependent, merged all UCEs that belonged to a particular gene, constructed gene and species trees, and then evaluated the subsequent effect of merging cogenic UCEs on gene and species tree reconstruction. Average bootstrap support for merged UCE gene trees was significantly improved across all data sets apparently driven by the increase in loci length. Additionally, we conducted simulations and found that gene trees generated from merged UCEs were more accurate than those generated by unmerged UCEs. As loci length improves gene tree accuracy, this modest degree of UCE characterization and curation impacts downstream analyses and demonstrates the advantages of incorporating basic genomic characterizations into phylogenomic analyses. [Anchored hybrid enrichment; ants; ASTRAL; bait capture; carangimorph; Coleoptera; conserved nonexonic elements; exon capture; gene tree; Hymenoptera; mammal; phylogenomic markers; songbird; species tree; ultraconserved elements; weevils.]


2014 ◽  
Vol 1070-1072 ◽  
pp. 2073-2078
Author(s):  
Xiu Ji ◽  
Hui Wang ◽  
Chuan Qi Zhao ◽  
Xu Ting Yan

Estimating the parameters of a Weibull distribution model by maximum likelihood with particle swarm optimization (PSO) is difficult, because PSO tends to converge prematurely and needs more variables. Ant colony algorithm theory was therefore introduced into the maximum likelihood method, and a parameter estimation method based on it was proposed. A simulated example verifies the feasibility and effectiveness of the method by comparing the ant colony algorithm with PSO.
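For orientation, the maximum likelihood problem for a two-parameter Weibull distribution can be reduced to a single equation in the shape parameter and solved with plain bisection. The sketch below does that; it uses neither an ant colony algorithm nor PSO and is not the method proposed in the paper. The function name, the bracketing interval, and the simulated data are assumptions made for illustration.

```python
import math
import random


def weibull_mle(samples, lo=1e-3, hi=100.0, iters=100):
    """Two-parameter Weibull MLE via bisection on the profile score equation.

    Solves sum(x^k ln x) / sum(x^k) - 1/k - mean(ln x) = 0 for the shape k,
    then sets scale = (mean(x^k))^(1/k). Samples must be strictly positive.
    """
    logs = [math.log(x) for x in samples]
    n = len(logs)
    mean_log = sum(logs) / n
    max_log = max(logs)

    def score(k):
        # Weights proportional to x_i^k, rescaled by the maximum to avoid overflow.
        w = [math.exp(k * (l - max_log)) for l in logs]
        weighted = sum(wi * l for wi, l in zip(w, logs)) / sum(w)
        return weighted - 1.0 / k - mean_log

    # The score is increasing in k, so bisection on [lo, hi] finds its root.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            hi = mid
        else:
            lo = mid
    k = 0.5 * (lo + hi)

    # log(mean(x^k)) computed in log space for numerical stability.
    rescaled = sum(math.exp(k * (l - max_log)) for l in logs) / n
    scale = math.exp((k * max_log + math.log(rescaled)) / k)
    return k, scale


# Simulated data with scale 3.0 and shape 2.0 (hypothetical example values).
rng = random.Random(0)
data = [rng.weibullvariate(3.0, 2.0) for _ in range(5000)]
shape_hat, scale_hat = weibull_mle(data)
```

Because the profile score is monotone in the shape parameter, a one-dimensional root finder suffices in this two-parameter case; population-based optimizers become more relevant when the likelihood surface has several parameters or local optima.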

