New Methods to Calculate Concordance Factors for Phylogenomic Datasets

Bui Quang Minh; Matthew W Hahn; Robert Lanfear

doi:10.1093/molbev/msaa106

New Methods to Calculate Concordance Factors for Phylogenomic Datasets

Molecular Biology and Evolution ◽

10.1093/molbev/msaa106 ◽

2020 ◽

Vol 37 (9) ◽

pp. 2727-2733 ◽

Cited By ~ 17

Author(s):

Bui Quang Minh ◽

Matthew W Hahn ◽

Robert Lanfear

Keyword(s):

Full Description ◽

Data Sets ◽

The Novel ◽

Gene Trees ◽

Reference Tree ◽

New Methods ◽

Genealogical Concordance ◽

Branch Support ◽

Two Measures ◽

Wide Usage

Abstract We implement two measures for quantifying genealogical concordance in phylogenomic data sets: the gene concordance factor (gCF) and the novel site concordance factor (sCF). For every branch of a reference tree, gCF is defined as the percentage of “decisive” gene trees containing that branch. This measure is already in wide usage, but here we introduce a package that calculates it while accounting for variable taxon coverage among gene trees. sCF is a new measure defined as the percentage of decisive sites supporting a branch in the reference tree. gCF and sCF complement classical measures of branch support in phylogenetics by providing a full description of underlying disagreement among loci and sites. An easy to use implementation and tutorial is freely available in the IQ-TREE software package (http://www.iqtree.org/doc/Concordance-Factor, last accessed May 13, 2020).
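The gCF definition in this abstract can be illustrated with a minimal Python sketch. It is not the IQ-TREE implementation. It assumes that a gene tree is "decisive" for a branch when it contains at least one taxon from each of the four clades surrounding that branch, and that it is concordant when the branch's split, restricted to the gene tree's taxa, appears in the gene tree. The data layout (taxon sets plus sets of splits) and the function name are hypothetical.

```python
from typing import FrozenSet, Iterable, Optional, Set, Tuple

Split = FrozenSet[str]
# Hypothetical layout: (taxa present in the gene tree, one side of each internal split)
GeneTree = Tuple[Set[str], Set[Split]]


def gene_concordance_factor(
    clades: Tuple[Set[str], Set[str], Set[str], Set[str]],
    gene_trees: Iterable[GeneTree],
) -> Optional[float]:
    """Percentage of decisive gene trees that contain the branch AB|CD.

    `clades` holds the four clades (A, B, C, D) around the reference branch,
    with A and B on one side and C and D on the other (assumed definition).
    """
    a, b, c, d = clades
    decisive = 0
    concordant = 0
    for taxa, splits in gene_trees:
        # Assumed decisiveness rule: every one of the four clades is represented.
        if not all(clade & taxa for clade in (a, b, c, d)):
            continue
        decisive += 1
        # Restrict the reference split to the taxa this gene tree covers.
        side = frozenset((a | b) & taxa)
        other = frozenset(taxa) - side
        if side in splits or other in splits:
            concordant += 1
    if decisive == 0:
        return None  # no decisive gene tree, so gCF is undefined
    return 100.0 * concordant / decisive


# Reference branch separating {A, B} from {C, D, E}.
branch = ({"A"}, {"B"}, {"C"}, {"D", "E"})
trees = [
    ({"A", "B", "C", "D", "E"}, {frozenset({"A", "B"}), frozenset({"D", "E"})}),
    ({"A", "B", "C", "D"}, {frozenset({"A", "C"})}),  # decisive, discordant
    ({"A", "B", "C"}, set()),                         # not decisive: no D or E
    ({"A", "B", "C", "E"}, {frozenset({"C", "E"})}),  # decisive, concordant
]
print(round(gene_concordance_factor(branch, trees), 2))  # 66.67
```

The third gene tree is skipped rather than counted as discordant, which is how variable taxon coverage enters the calculation in this sketch.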


New methods to calculate concordance factors for phylogenomic datasets

10.1101/487801 ◽

2018 ◽

Cited By ~ 30

Author(s):

Bui Quang Minh ◽

Matthew W. Hahn ◽

Robert Lanfear

Keyword(s):

Software Package ◽

Full Description ◽

The Novel ◽

Gene Trees ◽

Reference Tree ◽

New Methods ◽

Genealogical Concordance ◽

Branch Support ◽

Two Measures ◽

Wide Usage

Abstract We implement two measures for quantifying genealogical concordance in phylogenomic datasets: the gene concordance factor (gCF) and the novel site concordance factor (sCF). For every branch of a reference tree, gCF is defined as the percentage of “decisive” gene trees containing that branch. This measure is already in wide usage, but here we introduce a package that calculates it while accounting for variable taxon coverage among gene trees. sCF is a new measure defined as the percentage of decisive sites supporting a branch in the reference tree. gCF and sCF complement classical measures of branch support in phylogenetics by providing a full description of underlying disagreement among loci and sites. An easy to use implementation and tutorial is freely available in the IQ-TREE software package (http://www.iqtree.org).


Computing the Internode Certainty and related measures from partial gene trees

10.1101/022053 ◽

2015 ◽

Cited By ~ 2

Author(s):

Kassian Kobert ◽

Leonidas Salichos ◽

Antonis Rokas ◽

Alexandros Stamatakis

Keyword(s):

Empirical Data ◽

Data Sets ◽

Gene Trees ◽

Data Set ◽

Reference Tree ◽

Full Species

Abstract We present, implement, and evaluate an approach to calculate the internode certainty and tree certainty on a given reference tree from a collection of partial gene trees. Previously, the calculation of these values was only possible from a collection of gene trees with exactly the same taxon set as the reference tree. An application to sets of partial gene trees requires mathematical corrections in the internode certainty and tree certainty calculations. We implement our methods in RAxML and test them on empirical data sets. These tests imply that the inclusion of partial trees does matter. However, in order to provide meaningful measurements, any data set should also include trees containing the full species set.
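For orientation, the sketch below computes internode certainty for the simple case of gene trees with full taxon sets. The abstract does not state the formula, so the code assumes the commonly cited entropy-based definition: IC = 1 + p1·log2(p1) + p2·log2(p2), where p1 and p2 are the relative frequencies of the reference bipartition and of its most frequent conflicting bipartition. The corrections for partial gene trees that the abstract describes are not reproduced here, and the function name is hypothetical.

```python
import math


def internode_certainty(freq_reference: float, freq_conflict: float) -> float:
    """Entropy-based internode certainty for one branch (assumed definition).

    freq_reference: how often the reference bipartition occurs in the gene trees.
    freq_conflict: how often the most frequent conflicting bipartition occurs.
    """
    total = freq_reference + freq_conflict
    if total <= 0:
        raise ValueError("at least one bipartition must be observed")
    ic = 1.0  # log2(2): the maximum entropy for two competing bipartitions
    for freq in (freq_reference, freq_conflict):
        p = freq / total
        if p > 0:  # treat 0 * log2(0) as 0
            ic += p * math.log2(p)
    return ic


print(internode_certainty(10, 0))  # 1.0, no conflict
print(internode_certainty(5, 5))   # 0.0, equal support for both bipartitions
```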


Distribution and asymptotic behavior of the phylogenetic transfer distance

10.1101/368993 ◽

2018 ◽

Cited By ~ 1

Author(s):

Miraine Dávila Felipe ◽

Jean-Baka Domelevo Entfellner ◽

Frédéric Lemoine ◽

Jakub Truszkowski ◽

Olivier Gascuel

Keyword(s):

Asymptotic Behavior ◽

Statistical Significance ◽

Null Model ◽

Large Data ◽

Data Sets ◽

Reference Tree ◽

Transfer Distance ◽

Branch Support ◽

Light Side ◽

Transfer Index

Abstract The transfer distance (TD) was introduced in the classification framework and studied in the context of phylogenetic tree matching. Recently, Lemoine et al. (2018) showed that TD can be a powerful tool to assess the branch support of phylogenies with large data sets, thus providing a relevant alternative to Felsenstein’s bootstrap. This distance allows a reference branch β in a reference tree 𝒯 to be compared to a branch b from another tree T, both on the same set of n taxa. The TD between these branches is the number of taxa that must be transferred from one side of b to the other in order to obtain β. By taking the minimum TD from β to all branches in T we define the transfer index, denoted by ϕ(β, T), measuring the degree of agreement of β with T. Let us consider a reference branch β having p tips on its light side and define the transfer support (TS) as 1 – ϕ(β, T)/(p – 1). The aim of this article is to provide evidence that p – 1 is a meaningful normalization constant in the definition of TS, and measure the statistical significance of TS, assuming that β is compared to a tree T drawn according to a null model. We obtain several results that shed light on these questions in a number of settings. In particular, we study the asymptotic behavior of TS when n tends to ∞, and fully characterize the distribution of ϕ when T is a caterpillar tree.
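The three quantities defined in this abstract (transfer distance, transfer index, transfer support) can be sketched in a few lines of Python. The sketch represents each branch by the set of taxa on one of its sides, which is an assumption of this example, and computes TD as the smaller of the symmetric difference and its complement, since either side of b may be matched to β. The function names and the brute-force minimum over all branches are illustrative only, not the authors' algorithm.

```python
from typing import FrozenSet, Iterable

Branch = FrozenSet[str]  # taxa on one side of a branch (hypothetical encoding)


def transfer_distance(beta: Branch, b: Branch, taxa: FrozenSet[str]) -> int:
    """Number of taxa to move across b so that it induces the same split as beta."""
    diff = len(beta ^ b)  # symmetric difference of the chosen sides
    return min(diff, len(taxa) - diff)


def transfer_index(beta: Branch, tree: Iterable[Branch], taxa: FrozenSet[str]) -> int:
    """Minimum transfer distance from beta to any branch of the tree T."""
    return min(transfer_distance(beta, b, taxa) for b in tree)


def transfer_support(beta: Branch, tree: Iterable[Branch], taxa: FrozenSet[str]) -> float:
    """TS = 1 - phi(beta, T) / (p - 1), with p the size of beta's light side."""
    p = min(len(beta), len(taxa) - len(beta))
    if p < 2:
        raise ValueError("beta must be an internal branch (p >= 2)")
    return 1 - transfer_index(beta, tree, taxa) / (p - 1)


taxa = frozenset("ABCDEF")
beta = frozenset("ABC")  # reference branch ABC|DEF, light side of size p = 3
# Branches of a caterpillar tree T: six leaf branches plus three internal ones.
tree = [frozenset(t) for t in taxa] + [
    frozenset("AB"), frozenset("ABD"), frozenset("EF"),
]
print(transfer_index(beta, tree, taxa))    # 1
print(transfer_support(beta, tree, taxa))  # 0.5
```

Including the leaf branches of T in the minimum keeps the transfer index at or below p – 1, so the resulting support stays within [0, 1].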


Quartet-based computations of internode certainty provide accurate and robust measures of phylogenetic incongruence

10.1101/168526 ◽

2017 ◽

Cited By ~ 8

Author(s):

Xiaofan Zhou ◽

Sarah Lutteropp ◽

Lucas Czech ◽

Alexandros Stamatakis ◽

Moritz von Looz ◽

...

Keyword(s):

Phylogenetic Relationships ◽

Phylogenetic Trees ◽

Phylogenetic Signal ◽

Simulated Data ◽

Data Sets ◽

Gene Trees ◽

Branch Support ◽

Robust Measures ◽

Genome Scale ◽

Scale Data

Abstract Incongruence, or topological conflict, is prevalent in genome-scale data sets but relatively few measures have been developed to quantify it. Internode Certainty (IC) and related measures were recently introduced to explicitly quantify the level of incongruence of a given internode (or internal branch) among a set of phylogenetic trees and complement regular branch support statistics in assessing the confidence of the inferred phylogenetic relationships. Since most phylogenomic studies contain data partitions (e.g., genes) with missing taxa and IC scores stem from the frequencies of bipartitions (or splits) on a set of trees, the calculation of IC scores requires adjusting the frequencies of bipartitions from these partial gene trees. However, when the proportion of missing data is high, current approaches that adjust bipartition frequencies in partial gene trees tend to overestimate IC scores and alternative adjustment approaches differ substantially from each other in their scores. To overcome these issues, we developed three new measures for calculating internode certainty that are based on the frequencies of quartets, which naturally apply to both comprehensive and partial trees. Our comparison of these new quartet-based measures to previous bipartition-based measures on simulated data shows that: 1) on comprehensive trees, both types of measures yield highly similar IC scores; 2) on partial trees, quartet-based measures generate more accurate IC scores; and 3) quartet-based measures are more robust to the absence of phylogenetic signal and errors in the phylogenetic relationships to be assessed. Additionally, analysis of 15 empirical phylogenomic data sets using our quartet-based measures suggests that numerous relationships remain unresolved despite the availability of genome-scale data. Finally, we provide an efficient open-source implementation of these quartet-based measures in the program QuartetScores, which is freely available at https://github.com/algomaus/QuartetScores.


Quartet-Based Computations of Internode Certainty Provide Robust Measures of Phylogenetic Incongruence

Systematic Biology ◽

10.1093/sysbio/syz058 ◽

2019 ◽

Vol 69 (2) ◽

pp. 308-324 ◽

Cited By ~ 7

Author(s):

Xiaofan Zhou ◽

Sarah Lutteropp ◽

Lucas Czech ◽

Alexandros Stamatakis ◽

Moritz Von Looz ◽

...

Keyword(s):

Phylogenetic Trees ◽

Phylogenetic Signal ◽

Simulated Data ◽

Data Sets ◽

Gene Trees ◽

Data Set ◽

Statistical Confidence ◽

Branch Support ◽

Robust Measures ◽

Genome Scale

Abstract Incongruence, or topological conflict, is prevalent in genome-scale data sets. Internode certainty (IC) and related measures were recently introduced to explicitly quantify the level of incongruence of a given internal branch among a set of phylogenetic trees and complement regular branch support measures (e.g., bootstrap, posterior probability) that instead assess the statistical confidence of inference. Since most phylogenomic studies contain data partitions (e.g., genes) with missing taxa and IC scores stem from the frequencies of bipartitions (or splits) on a set of trees, IC score calculation typically requires adjusting the frequencies of bipartitions from these partial gene trees. However, when the proportion of missing taxa is high, the scores yielded by current approaches that adjust bipartition frequencies in partial gene trees differ substantially from each other and tend to be overestimates. To overcome these issues, we developed three new IC measures based on the frequencies of quartets, which naturally apply to both complete and partial trees. Comparison of our new quartet-based measures to previous bipartition-based measures on simulated data shows that: (1) on complete data sets, both quartet-based and bipartition-based measures yield very similar IC scores; (2) IC scores of quartet-based measures on a given data set with and without missing taxa are more similar than the scores of bipartition-based measures; and (3) quartet-based measures are more robust to the absence of phylogenetic signal and errors in phylogenetic inference than bipartition-based measures. Additionally, the analysis of an empirical mammalian phylogenomic data set using our quartet-based measures reveals the presence of substantial levels of incongruence for numerous internal branches. An efficient open-source implementation of these quartet-based measures is freely available in the program QuartetScores (https://github.com/lutteropp/QuartetScores).


Basic Andragogy-Oriented Strategies and Communication in Online Education

International conference KNOWLEDGE-BASED ORGANIZATION ◽

10.2478/kbo-2020-0103 ◽

2020 ◽

Vol 26 (2) ◽

pp. 350-356

Author(s):

Anca Sîrbu

Keyword(s):

Adult Learners ◽

Online Education ◽

Social Status ◽

Rapid Onset ◽

Learning Goals ◽

The Novel ◽

Academic Education ◽

Online Instructors ◽

New Methods ◽

The World

Abstract With the rapid onset of an unprecedented lifestyle due to the new coronavirus COVID-19, the world academic scene was forced to reform and adapt to the novel circumstances. Although online education can no longer be regarded as a groundbreaking endeavour in the 21st century, its current character of exclusivity calls for a deeper understanding of, and a sharper focus on, the “end-consumer” thereof, as well as more cautious procedures to be exercised while teaching. While millennials are no longer thought of as being born with a silver spoon in their mouth but with an iPad or any sort of device in their hand (irrespective of their social status), adults are more hesitant when coerced to alter course unexpectedly and turn to new methods of attaining their learning goals. This is why proper communicative approaches need to be thoroughly considered by online instructors. This article aims at presenting teachers with a set of strategies to employ when the beneficiaries of online academic education are adult learners.


Suspended Sediment Modeling Using a Heuristic Regression Method Hybridized with Kmeans Clustering

Sustainability ◽

10.3390/su13094648 ◽

2021 ◽

Vol 13 (9) ◽

pp. 4648

Author(s):

Rana Muhammad Adnan ◽

Kulwinder Singh Parmar ◽

Salim Heddam ◽

Shamsuddin Shahid ◽

Ozgur Kisi

Keyword(s):

Clustering Algorithm ◽

Suspended Sediments ◽

Ecological Impacts ◽

Accurate Estimation ◽

Data Sets ◽

The Novel ◽

Neuro Fuzzy ◽

Adaptive Regression ◽

Novel Method ◽

Model Training

The accurate estimation of suspended sediments (SSs) carries significance in determining the volume of dam storage, river carrying capacity, pollution susceptibility, soil erosion potential, aquatic ecological impacts, and the design and operation of hydraulic structures. The presented study proposes a new method for accurately estimating daily SSs using antecedent discharge and sediment information. The novel method is developed by hybridizing the multivariate adaptive regression spline (MARS) and the Kmeans clustering algorithm (MARS–KM). The proposed method’s efficacy is established by comparing its performance with the adaptive neuro-fuzzy system (ANFIS), MARS, and M5 tree (M5Tree) models in predicting SSs at two stations situated on the Yangtze River of China, according to the three assessment measurements, RMSE, MAE, and NSE. Two modeling scenarios are employed; data are divided into 50–50% for model training and testing in the first scenario, and the training and test data sets are swapped in the second scenario. In Guangyuan Station, the MARS–KM showed a performance improvement compared to ANFIS, MARS, and M5Tree methods in terms of RMSE by 39%, 30%, and 18% in the first scenario and by 24%, 22%, and 8% in the second scenario, respectively, while the improvement in RMSE of ANFIS, MARS, and M5Tree was 34%, 26%, and 27% in the first scenario and 7%, 16%, and 6% in the second scenario, respectively, at Beibei Station. Additionally, the MARS–KM models provided much more satisfactory estimates using only discharge values as inputs.
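The three assessment measures named in this abstract have standard definitions, sketched below in plain Python. The abstract does not give the formulas, so the code assumes the usual ones: root mean square error, mean absolute error, and Nash-Sutcliffe efficiency (one minus the ratio of squared model error to the variance of the observations around their mean). The sample values are invented for illustration and are not data from the study.

```python
import math
from typing import Sequence


def rmse(observed: Sequence[float], simulated: Sequence[float]) -> float:
    """Root mean square error between observations and model output."""
    n = len(observed)
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(observed, simulated)) / n)


def mae(observed: Sequence[float], simulated: Sequence[float]) -> float:
    """Mean absolute error between observations and model output."""
    n = len(observed)
    return sum(abs(o - s) for o, s in zip(observed, simulated)) / n


def nse(observed: Sequence[float], simulated: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = sum(observed) / len(observed)
    error = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - error / spread


# Made-up sediment values, for illustration only.
obs = [1.0, 2.0, 3.0, 4.0]
sim = [1.0, 2.0, 3.0, 6.0]
print(rmse(obs, sim))           # 1.0
print(mae(obs, sim))            # 0.5
print(round(nse(obs, sim), 3))  # 0.2
```

Lower RMSE and MAE and higher NSE indicate a better model, which is the sense in which the abstract reports percentage improvements in RMSE.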


Chronotope of Russian Works about Robinson

Nauchnyy dialog ◽

10.24224/2227-1295-2021-5-287-302 ◽

2021 ◽

pp. 287-302

Author(s):

T. V. Shvetsova ◽

V. E. Shakhova

Keyword(s):

Russian Language ◽

Literary Analysis ◽

The Novel ◽

The Real ◽

New Methods ◽

Space And Time ◽

Language Interpretation ◽

Similarities And Differences ◽

The Russian Language

The results of the study of the chronotope in Russian-language compositions based on the novel about Robinson’s adventures are presented. The material for the work was A. E. Razin’s novel “The Real Robinson” (1860) and Lev Tolstoy’s story “Robinson” (1862). The issues of the specifics of the representation of the chronotope in the works of Russian writers are considered. The relevance of the study is due to the appeal to the universal of the chronotope, which contains an exhaustive toolkit for the artistic embodiment of images of space and time, as well as the search for new methods of literary analysis of the text. It is shown that in the analyzed texts, a kind of fusion of Russian-language compositions with a foreign-cultural text in the aspect of a chronotope is realized. The similarities and differences in the rethinking of the story of Robinson are shown on the example of the model of textual connexity, and the national specifics of the representation of the image of Robinson are indicated. It is noted that the external and internal chronotopes are retransmitted from work to work and create the basis for the emergence of the author’s intentions. It is proved that chronotopic analysis allows one to form an idea of the peculiarities of the Russian-language interpretation of the story of Robinson.


Mining Environmental Data in the ADMIRE Project Using New Advanced Methods and Tools

Technology Integration Advancements in Distributed Systems and Computing ◽

10.4018/978-1-4666-0906-8.ch018 ◽

2012 ◽

pp. 296-308

Author(s):

Ondrej Habala ◽

Martin Šeleng ◽

Viet Tran ◽

Branislav Šimo ◽

Ladislav Hluchý

Keyword(s):

Data Mining ◽

Environmental Data ◽

Environmental Applications ◽

Data Sets ◽

Distributed Data ◽

New Methods ◽

Prospective Application ◽

Using Data ◽

Computer Power

The project Advanced Data Mining and Integration Research for Europe (ADMIRE) is designing new methods and tools for comfortable mining and integration of large, distributed data sets. One of the prospective application domains for such methods and tools is the environmental applications domain, which often uses various data sets from different vendors, and where data mining is becoming increasingly popular as more computer power becomes available. The authors present a set of experimental environmental scenarios, and the application of ADMIRE technology in these scenarios. The scenarios try to predict meteorological and hydrological phenomena which currently cannot be or are not predicted, by using data mining of distributed data sets from several providers in Slovakia. The scenarios have been designed by environmental experts and, apart from being used as the testing grounds for the ADMIRE technology, their results are of particular interest to the experts who have designed them.


Detecting destabilizing species in the phylogenetic backbone of Potentilla (Rosaceae) using low-copy nuclear markers

AoB Plants ◽

10.1093/aobpla/plaa017 ◽

2020 ◽

Vol 12 (3) ◽

Cited By ~ 1

Author(s):

Nannie L Persson ◽

Ingrid Toresen ◽

Heidi Lie Andersen ◽

Jenny E E Smedmark ◽

Torsten Eriksson

Keyword(s):

Incomplete Lineage Sorting ◽

Gene Tree ◽

Data Sets ◽

Gene Trees ◽

Lineage Sorting ◽

Nuclear Markers ◽

Data Set ◽

Multispecies Coalescent ◽

Single Marker ◽

Named Groups

Abstract The genus Potentilla (Rosaceae) has been subjected to several phylogenetic studies, but resolving its evolutionary history has proven challenging. Previous analyses recovered six, informally named, groups: the Argentea, Ivesioid, Fragarioides, Reptans, Alba and Anserina clades, but the relationships among some of these clades differ between data sets. The Reptans clade, which includes the type species of Potentilla, has been noticed to shift position between plastid and nuclear ribosomal data sets. We studied this incongruence by analysing four low-copy nuclear markers, in addition to chloroplast and nuclear ribosomal data, with a set of Bayesian phylogenetic and Multispecies Coalescent (MSC) analyses. A selective taxon removal strategy demonstrated that the included representatives from the Fragarioides clade, P. dickinsii and P. fragarioides, were the main sources of the instability seen in the trees. The Fragarioides species showed different relationships in each gene tree, and were only supported as a monophyletic group in a single marker when the Reptans clade was excluded from the analysis. The incongruences could not be explained by allopolyploidy, but rather by homoploid hybridization, incomplete lineage sorting or taxon sampling effects. When P. dickinsii and P. fragarioides were removed from the data set, a fully resolved, supported backbone phylogeny of Potentilla was obtained in the MSC analysis. Additionally, indications of autopolyploid origins of the Reptans and Ivesioid clades were discovered in the low-copy gene trees.
