Fast and accurate statistical inference of phylogenetic networks using large-scale genomic sequence data

Mapping Intimacies ◽

10.1101/132795 ◽

2017 ◽

Cited By ~ 1

Author(s):

Hussein A. Hejase ◽

Natalie VandePol ◽

Gregory M. Bonito ◽

Kevin J. Liu

Keyword(s):

Gene Flow ◽

Large Scale ◽

Genomic Sequence ◽

State Of The Art ◽

Sequence Data ◽

Phylogenetic Network ◽

Phylogenetic Networks ◽

Divide And Conquer ◽

Performance Study ◽

Art Methods

Abstract An emerging discovery in phylogenomics is that interspecific gene flow has played a major role in the evolution of many different organisms. To what extent is the Tree of Life not truly a tree reflecting strict “vertical” divergence, but rather a more general graph structure known as a phylogenetic network which also captures “horizontal” gene flow? The answer to this fundamental question not only depends upon densely sampled and divergent genomic sequence data, but also computational methods which are capable of accurately and efficiently inferring phylogenetic networks from large-scale genomic sequence datasets. Recent methodological advances have attempted to address this gap. However, in the 2016 performance study of Hejase and Liu, state-of-the-art methods fell well short of the scalability requirements of existing phylogenomic studies. The methodological gap remains: how can phylogenetic networks be accurately and efficiently inferred using genomic sequence data involving many dozens or hundreds of taxa? In this study, we address this gap by proposing a new phylogenetic divide-and-conquer method which we call FastNet. We conduct a performance study involving a range of evolutionary scenarios, and we demonstrate that FastNet outperforms state-of-the-art methods in terms of computational efficiency and topological accuracy.

A Divide-and-Conquer Method for Scalable Phylogenetic Network Inference from Multi-locus Data

10.1101/587725 ◽

2019 ◽

Cited By ~ 1

Author(s):

Jiafan Zhu ◽

Xinhao Liu ◽

Huw A. Ogilvie ◽

Luay K. Nakhleh

Keyword(s):

Large Scale ◽

Network Inference ◽

Incomplete Lineage Sorting ◽

Phylogenetic Network ◽

Biological Data ◽

Phylogenetic Networks ◽

Divide And Conquer ◽

Lineage Sorting ◽

Step Method ◽

Sequence Alignments

Abstract Reticulate evolutionary histories, such as those arising in the presence of hybridization, are best modeled as phylogenetic networks. Recently developed methods allow for statistical inference of phylogenetic networks while also accounting for other processes, such as incomplete lineage sorting (ILS). However, these methods can only handle a small number of loci from a handful of genomes. In this paper, we introduce a novel two-step method for scalable inference of phylogenetic networks from the sequence alignments of multiple, unlinked loci. The method infers networks on subproblems and then merges them into a network on the full set of taxa. To reduce the number of trinets to infer, we formulate a Hitting Set version of the problem of finding a small number of subsets, and implement a simple heuristic to solve it. We studied their performance, in terms of both running time and accuracy, on simulated as well as on biological data sets. The two-step method accurately infers phylogenetic networks at a scale that is infeasible with existing methods. The results are a significant and promising step towards accurate, large-scale phylogenetic network inference. We implemented the algorithms in the publicly available software package PhyloNet (https://bioinfocs.rice.edu/PhyloNet).

A divide-and-conquer method for scalable phylogenetic network inference from multilocus data

Bioinformatics ◽

10.1093/bioinformatics/btz359 ◽

2019 ◽

Vol 35 (14) ◽

pp. i370-i378 ◽

Cited By ~ 5

Author(s):

Jiafan Zhu ◽

Xinhao Liu ◽

Huw A Ogilvie ◽

Luay K Nakhleh

Keyword(s):

Large Scale ◽

Network Inference ◽

Incomplete Lineage Sorting ◽

Phylogenetic Network ◽

Phylogenetic Networks ◽

Divide And Conquer ◽

Supplementary Information ◽

Lineage Sorting ◽

Step Method ◽

Sequence Alignments

Abstract Motivation: Reticulate evolutionary histories, such as those arising in the presence of hybridization, are best modeled as phylogenetic networks. Recently developed methods allow for statistical inference of phylogenetic networks while also accounting for other processes, such as incomplete lineage sorting. However, these methods can only handle a small number of loci from a handful of genomes. Results: In this article, we introduce a novel two-step method for scalable inference of phylogenetic networks from the sequence alignments of multiple, unlinked loci. The method infers networks on subproblems and then merges them into a network on the full set of taxa. To reduce the number of trinets to infer, we formulate a Hitting Set version of the problem of finding a small number of subsets, and implement a simple heuristic to solve it. We studied their performance, in terms of both running time and accuracy, on simulated as well as on biological datasets. The two-step method accurately infers phylogenetic networks at a scale that is infeasible with existing methods. The results are a significant and promising step towards accurate, large-scale phylogenetic network inference. Availability and implementation: We implemented the algorithms in the publicly available software package PhyloNet (https://bioinfocs.rice.edu/PhyloNet). Supplementary information: Supplementary data are available at Bioinformatics online.
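The abstract above mentions a Hitting Set formulation solved with a simple heuristic, without giving details. As a rough illustration only, the following is a minimal sketch of the classical greedy heuristic for the generic Hitting Set problem: given a collection of sets, choose a small group of elements so that every set contains at least one chosen element. The function name `greedy_hitting_set` and the toy input are assumptions made for this example; this is not PhyloNet's code, and the formulation used by the authors may differ.

```python
from typing import Hashable, Iterable, List, Set


def greedy_hitting_set(collection: Iterable[Iterable[Hashable]]) -> List[Hashable]:
    """Greedy heuristic for generic Hitting Set (hypothetical helper).

    Repeatedly picks the element that occurs in the largest number of
    not-yet-hit sets. The result hits every set but is not guaranteed
    to be of minimum size.
    """
    remaining: List[Set[Hashable]] = [set(s) for s in collection]
    if any(not s for s in remaining):
        raise ValueError("an empty set can never be hit")

    chosen: List[Hashable] = []
    while remaining:
        # Count how many unhit sets each element appears in.
        counts = {}
        for s in remaining:
            for element in s:
                counts[element] = counts.get(element, 0) + 1
        # Sort by repr first so that ties are broken deterministically.
        best = max(sorted(counts, key=repr), key=counts.get)
        chosen.append(best)
        # Discard every set that the new element hits.
        remaining = [s for s in remaining if best not in s]
    return chosen


if __name__ == "__main__":
    toy_sets = [{1, 2}, {2, 3}, {3, 4}]
    print(greedy_hitting_set(toy_sets))
```

The greedy choice is fast and easy to implement, which is why it is a common first heuristic for this NP-hard problem, but it can return more elements than an optimal solution would.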

FastNet: Fast and Accurate Statistical Inference of Phylogenetic Networks Using Large-Scale Genomic Sequence Data

Comparative Genomics - Lecture Notes in Computer Science ◽

10.1007/978-3-030-00834-5_14 ◽

2018 ◽

pp. 242-259 ◽

Cited By ~ 2

Author(s):

Hussein A. Hejase ◽

Natalie VandePol ◽

Gregory M. Bonito ◽

Kevin J. Liu

Keyword(s):

Statistical Inference ◽

Large Scale ◽

Genomic Sequence ◽

Sequence Data ◽

Phylogenetic Networks

Faculty Opinions recommendation of A likelihood ratio test of speciation with gene flow using genomic sequence data.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.3540959.3240060 ◽

2010 ◽

Author(s):

Nicolas Galtier ◽

Julien Dutheil

Keyword(s):

Gene Flow ◽

Likelihood Ratio ◽

Likelihood Ratio Test ◽

Genomic Sequence ◽

Sequence Data ◽

Ratio Test

A Phylogenomic Supertree of Birds

Diversity ◽

10.3390/d11070109 ◽

2019 ◽

Vol 11 (7) ◽

pp. 109 ◽

Cited By ~ 17

Author(s):

Rebecca T. Kimball ◽

Carl H. Oliveros ◽

Ning Wang ◽

Noor D. White ◽

F. Keith Barker ◽

...

Keyword(s):

Large Scale ◽

Sequence Data ◽

Bird Species ◽

Divide And Conquer ◽

Clear Understanding ◽

Whole Genome ◽

Efficient Manner ◽

Sequence Capture ◽

Branch Lengths ◽

Supertree Methods

It has long been appreciated that analyses of genomic data (e.g., whole genome sequencing or sequence capture) have the potential to reveal the tree of life, but it remains challenging to move from sequence data to a clear understanding of evolutionary history, in part due to the computational challenges of phylogenetic estimation using genome-scale data. Supertree methods solve that challenge because they facilitate a divide-and-conquer approach for large-scale phylogeny inference by integrating smaller subtrees in a computationally efficient manner. Here, we combined information from sequence capture and whole-genome phylogenies using supertree methods. However, the available phylogenomic trees had limited overlap so we used taxon-rich (but not phylogenomic) megaphylogenies to weave them together. This allowed us to construct a phylogenomic supertree, with support values, that included 707 bird species (~7% of avian species diversity). We estimated branch lengths using mitochondrial sequence data and we used these branch lengths to estimate divergence times. Our time-calibrated supertree supports radiation of all three major avian clades (Palaeognathae, Galloanseres, and Neoaves) near the Cretaceous-Paleogene (K-Pg) boundary. The approach we used will permit the continued addition of taxa to this supertree as new phylogenomic data are published, and it could be applied to other taxa as well.

Implementing Large Genomic Single Nucleotide Polymorphism Data Sets in Phylogenetic Network Reconstructions: A Case Study of Particularly Rapid Radiations of Cichlid Fish

Systematic Biology ◽

10.1093/sysbio/syaa005 ◽

2020 ◽

Vol 69 (5) ◽

pp. 848-862 ◽

Cited By ~ 2

Author(s):

Melisa Olave ◽

Axel Meyer

Keyword(s):

Single Nucleotide Polymorphism ◽

Gene Flow ◽

Genetic Material ◽

Cichlid Fish ◽

Phylogenetic Network ◽

Phylogenetic Networks ◽

Nucleotide Polymorphism ◽

Rapid Radiation ◽

Data Set ◽

Single Nucleotide

Abstract The Midas cichlids of the Amphilophus citrinellus spp. species complex from Nicaragua (13 species) are an extraordinary example of adaptive and rapid radiation (<24,000 years old). The evolutionary history of these cichlids is very challenging to infer in phylogenetic analyses, due to the apparent prevalence of incomplete lineage sorting (ILS), as well as past and current gene flow. Assuming solely a vertical transfer of genetic material from an ancestral lineage to new lineages is not appropriate in the many cases where genes are transferred horizontally in nature. Recently developed methods to infer phylogenetic networks under such circumstances might be able to circumvent these problems. These models accommodate not just ILS, but also gene flow, under the multispecies network coalescent (MSNC) model, processes that are at work in young, hybridizing, and/or rapidly diversifying lineages. There are currently only a few programs available that implement MSNC for estimating phylogenetic networks. Here, we present a novel way to incorporate single nucleotide polymorphism (SNP) data into the currently available PhyloNetworks program. Based on simulations, we demonstrate that SNPs can provide enough power to recover the true phylogenetic network. We also show that it can accurately infer the true network more often than other similar SNP-based programs (PhyloNet and HyDe). Moreover, our approach results in a faster algorithm compared to the original pipeline in PhyloNetworks, without losing power. We also applied our new approach to infer the phylogenetic network of Midas cichlid radiation. We implemented the most comprehensive genomic data set to date (RADseq data set of 679 individuals and >37K SNPs from 19 ingroup lineages) and present estimated phylogenetic networks for this extremely young and fast-evolving radiation of cichlid fish. We demonstrate that the MSNC is more appropriate than the multispecies coalescent alone for the analysis of this rapid radiation. [Genomics; multispecies network coalescent; phylogenetic networks; phylogenomics; RADseq; SNPs.]

Uniformity Attentive Learning-Based Siamese Network for Person Re-Identification

Sensors ◽

10.3390/s20123603 ◽

2020 ◽

Vol 20 (12) ◽

pp. 3603

Author(s):

Dasol Jeong ◽

Hasil Park ◽

Joongchol Shin ◽

Donggoo Kang ◽

Joonki Paik

Keyword(s):

Large Scale ◽

Body Shape ◽

State Of The Art ◽

The State ◽

Whole Body ◽

Distinctive Features ◽

Common Features ◽

Siamese Network ◽

Art Methods ◽

Triplet Loss

Person re-identification (Re-ID) suffers from problems that make learning difficult, such as misalignment and occlusion. To address these problems, it is important to focus on features that are robust to intra-class variation. Existing attention-based Re-ID methods focus only on common features without considering distinctive features. In this paper, we present a novel attentive learning-based Siamese network for person Re-ID. Unlike existing methods, we designed an attention module and an attention loss that use the properties of the Siamese network to concentrate attention on both common and distinctive features. The attention module consists of channel attention to select important channels and encoder-decoder attention to observe the whole body shape. We modified the triplet loss into an attention loss, called uniformity loss. The uniformity loss generates a unique attention map, which focuses on both common and discriminative features. Extensive experiments show that the proposed network compares favorably to the state-of-the-art methods on three large-scale benchmarks: the Market-1501, CUHK03 and DukeMTMC-ReID datasets.
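For readers unfamiliar with the triplet loss that the abstract above takes as its starting point, the following is a minimal sketch of the standard margin-based form, max(0, d(a, p) - d(a, n) + margin), with Euclidean distance. It is not the uniformity loss proposed in the paper, whose definition is not given here. The function names and the margin value of 0.3 are assumptions chosen for illustration.

```python
import math
from typing import Sequence


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(
    anchor: Sequence[float],
    positive: Sequence[float],
    negative: Sequence[float],
    margin: float = 0.3,  # illustrative default, not taken from the paper
) -> float:
    """Standard triplet loss for a single (anchor, positive, negative) triple.

    The loss is zero once the negative is farther from the anchor than
    the positive by at least `margin`.
    """
    d_ap = euclidean(anchor, positive)
    d_an = euclidean(anchor, negative)
    return max(0.0, d_ap - d_an + margin)


if __name__ == "__main__":
    # Positive is close to the anchor, negative is far: the loss vanishes.
    print(triplet_loss((0.0, 0.0), (0.0, 1.0), (3.0, 4.0)))
    # Roles swapped: the loss is positive.
    print(triplet_loss((0.0, 0.0), (3.0, 4.0), (0.0, 1.0), margin=1.0))
```

In practice the loss is averaged over many triples in a batch, and the embeddings come from the shared branches of the Siamese network.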

Large‐scale genomic sequence data resolve the deepest divergences in the legume phylogeny and support a near‐simultaneous evolutionary origin of all six subfamilies

New Phytologist ◽

10.1111/nph.16290 ◽

2019 ◽

Vol 225 (3) ◽

pp. 1355-1369 ◽

Cited By ~ 12

Author(s):

Erik J. M. Koenen ◽

Dario I. Ojeda ◽

Royce Steeves ◽

Jérémy Migliore ◽

Freek T. Bakker ◽

...

Keyword(s):

Large Scale ◽

Genomic Sequence ◽

Sequence Data ◽

Evolutionary Origin

A Bayesian Implementation of the Multispecies Coalescent Model with Introgression for Phylogenomic Analysis

Molecular Biology and Evolution ◽

10.1093/molbev/msz296 ◽

2019 ◽

Vol 37 (4) ◽

pp. 1211-1223 ◽

Cited By ~ 8

Author(s):

Tomáš Flouri ◽

Xiyun Jiao ◽

Bruce Rannala ◽

Ziheng Yang

Keyword(s):

Gene Flow ◽

Genomic Sequence ◽

Sequence Data ◽

Incomplete Lineage Sorting ◽

Mosquito Species ◽

Phylogenomic Analysis ◽

Data Sets ◽

Lineage Sorting ◽

Coalescent Model ◽

Multispecies Coalescent

Abstract Recent analyses suggest that cross-species gene flow or introgression is common in nature, especially during species divergences. Genomic sequence data can be used to infer introgression events and to estimate the timing and intensity of introgression, providing an important means to advance our understanding of the role of gene flow in speciation. Here, we implement the multispecies-coalescent-with-introgression model, an extension of the multispecies-coalescent model to incorporate introgression, in our Bayesian Markov chain Monte Carlo program Bpp. The multispecies-coalescent-with-introgression model accommodates deep coalescence (or incomplete lineage sorting) and introgression and provides a natural framework for inference using genomic sequence data. Computer simulation confirms the good statistical properties of the method, although hundreds or thousands of loci are typically needed to estimate introgression probabilities reliably. Reanalysis of data sets from the purple cone spruce confirms the hypothesis of homoploid hybrid speciation. We estimated the introgression probability using the genomic sequence data from six mosquito species in the Anopheles gambiae species complex, which varies considerably across the genome, likely driven by differential selection against introgressed alleles.

A Likelihood Ratio Test of Speciation with Gene Flow Using Genomic Sequence Data

Genome Biology and Evolution ◽

10.1093/gbe/evq011 ◽

2010 ◽

Vol 2 ◽

pp. 200-211 ◽

Cited By ~ 39

Author(s):

Ziheng Yang

Keyword(s):

Gene Flow ◽

Likelihood Ratio ◽

Likelihood Ratio Test ◽

Genomic Sequence ◽

Sequence Data ◽

Ratio Test
