New methods to calculate concordance factors for phylogenomic datasets

2018 ◽  
Author(s):  
Bui Quang Minh ◽  
Matthew W. Hahn ◽  
Robert Lanfear

Abstract: We implement two measures for quantifying genealogical concordance in phylogenomic datasets: the gene concordance factor (gCF) and the novel site concordance factor (sCF). For every branch of a reference tree, gCF is defined as the percentage of “decisive” gene trees containing that branch. This measure is already in wide usage, but here we introduce a package that calculates it while accounting for variable taxon coverage among gene trees. sCF is a new measure defined as the percentage of decisive sites supporting a branch in the reference tree. gCF and sCF complement classical measures of branch support in phylogenetics by providing a full description of underlying disagreement among loci and sites. An easy-to-use implementation and tutorial are freely available in the IQ-TREE software package (http://www.iqtree.org).

2020 ◽  
Vol 37 (9) ◽  
pp. 2727-2733 ◽  
Author(s):  
Bui Quang Minh ◽  
Matthew W Hahn ◽  
Robert Lanfear

Abstract: We implement two measures for quantifying genealogical concordance in phylogenomic data sets: the gene concordance factor (gCF) and the novel site concordance factor (sCF). For every branch of a reference tree, gCF is defined as the percentage of “decisive” gene trees containing that branch. This measure is already in wide usage, but here we introduce a package that calculates it while accounting for variable taxon coverage among gene trees. sCF is a new measure defined as the percentage of decisive sites supporting a branch in the reference tree. gCF and sCF complement classical measures of branch support in phylogenetics by providing a full description of underlying disagreement among loci and sites. An easy-to-use implementation and tutorial are freely available in the IQ-TREE software package (http://www.iqtree.org/doc/Concordance-Factor, last accessed May 13, 2020).
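The gCF definition above can be illustrated with a minimal sketch. It is not the IQ-TREE implementation. It assumes that a gene tree is "decisive" for a branch when it samples at least one taxon from each of the four clades surrounding that branch (a reading of the term that the abstract itself does not spell out), that each gene tree is supplied as its taxon set plus one side of every internal branch, and that gene-tree taxa are a subset of the reference taxa. The names `bipartition` and `gene_concordance_factor` are hypothetical.

```python
from typing import FrozenSet, List, Optional, Sequence, Tuple

Taxa = FrozenSet[str]
# Assumed input model: (taxon set of the gene tree, one side of each internal branch).
GeneTree = Tuple[Taxa, List[Taxa]]


def bipartition(side: Taxa, taxa: Taxa) -> FrozenSet[Taxa]:
    """Orientation-free form of a split, restricted to the taxa actually sampled."""
    side = side & taxa
    return frozenset({side, taxa - side})


def gene_concordance_factor(
    clades: Sequence[Taxa], gene_trees: Sequence[GeneTree]
) -> Optional[float]:
    """Percentage of decisive gene trees containing the reference branch.

    `clades` holds the four clades (A, B, C, D) around the branch AB|CD.
    Returns None when no gene tree is decisive for the branch.
    """
    a, b, _c, _d = clades
    decisive = 0
    concordant = 0
    for taxa, sides in gene_trees:
        # Assumed decisiveness rule: every one of the four clades is sampled.
        if not all(clade & taxa for clade in clades):
            continue
        decisive += 1
        # The reference branch as it appears on this gene tree's reduced taxon set.
        target = bipartition(a | b, taxa)
        if any(bipartition(side, taxa) == target for side in sides):
            concordant += 1
    if decisive == 0:
        return None
    return 100.0 * concordant / decisive
```

Restricting the reference split to each gene tree's own taxon set is what lets the count tolerate variable taxon coverage: a gene tree missing some taxa still counts as concordant if it recovers the branch among the taxa it does contain.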


2020 ◽  
Vol 26 (2) ◽  
pp. 350-356
Author(s):  
Anca Sîrbu

Abstract: With the rapid onset of an unprecedented lifestyle due to the new coronavirus COVID-19, the world academic scene was forced to reform and adapt to the novel circumstances. Although online education can no longer be regarded as a groundbreaking endeavour in the 21st century, its current character of exclusivity calls for a deeper understanding of, and a sharper focus on, its “end-consumer”, as well as for more cautious procedures while teaching. While millennials are no longer thought of as being born with a silver spoon in their mouth but with an iPad or some other device in their hand (irrespective of their social status), adults are more hesitant when coerced to alter course unexpectedly and turn to new methods of attaining their learning goals. This is why proper communicative approaches need to be thoroughly considered by online instructors. This article aims to present teachers with a set of strategies to employ when the beneficiaries of online academic education are adult learners.


Author(s):  
Mark S. Hibbins ◽  
Matthew J.S. Gibson ◽  
Matthew W. Hahn

Abstract: The incongruence of character states with phylogenetic relationships is often interpreted as evidence of convergent evolution. However, trait evolution along discordant gene trees can also generate these incongruences – a phenomenon known as hemiplasy. Classic phylogenetic comparative methods do not account for discordance, resulting in incorrect inferences about the number of times a trait has evolved, and therefore about convergence. Biological sources of discordance include incomplete lineage sorting (ILS) and introgression, but only ILS has received theoretical consideration in the context of hemiplasy. Here, we derive expectations for the probabilities of hemiplasy and homoplasy with ILS and introgression acting simultaneously. We find that introgression makes hemiplasy more likely than ILS alone, suggesting that methods that account for discordance only due to ILS will be conservative. We also present a method for making statistical inferences about the relative probabilities of hemiplasy and homoplasy in empirical datasets. Our method is implemented in the software package HeIST (Hemiplasy Inference Simulation Tool), and estimates the most probable number of transitions among character states given a set of relationships with discordance. HeIST can accommodate ILS and introgression simultaneously, and can be applied to phylogenies where the number of taxa makes finding an analytical solution impractical. We apply this tool to two empirical cases of apparent trait convergence in the presence of high levels of discordance, one of which involves introgression between the convergent lineages. In both cases we find that hemiplasy is likely to contribute to the observed trait incongruences.


2021 ◽  
pp. 287-302
Author(s):  
T. V. Shvetsova ◽  
V. E. Shakhova

The results of a study of the chronotope in Russian-language compositions based on the novel about Robinson’s adventures are presented. The material for the work was A. E. Razin’s novel “The Real Robinson” (1860) and Lev Tolstoy’s story “Robinson” (1862). The specifics of the representation of the chronotope in the works of Russian writers are considered. The relevance of the study lies in its appeal to the chronotope as a universal that contains an exhaustive toolkit for the artistic embodiment of images of space and time, as well as in the search for new methods of literary analysis of the text. It is shown that the analyzed texts realize a kind of fusion of Russian-language compositions with a foreign-cultural text in the aspect of the chronotope. The similarities and differences in the rethinking of the story of Robinson are shown using the model of textual connexity, and the national specifics of the representation of the image of Robinson are indicated. It is noted that the external and internal chronotopes are retransmitted from work to work and create the basis for the emergence of the author’s intentions. It is proved that chronotopic analysis allows one to form an idea of the peculiarities of the Russian-language interpretation of the story of Robinson.


2019 ◽  
Author(s):  
T Jeffrey Cole ◽  
Michael S Brewer

In the era of Next-Generation Sequencing and shotgun proteomics, the sequences of animal toxigenic proteins are being generated at rates exceeding the pace of traditional means for empirical toxicity verification. To facilitate the automation of toxin identification from protein sequences, we trained Recurrent Neural Networks with Gated Recurrent Units on publicly available datasets. The resulting models are available via the novel software package TOXIFY, allowing users to infer the probability of a given protein sequence being a venom protein. TOXIFY is more than 20X faster and uses over an order of magnitude less memory than previously published methods. Additionally, TOXIFY is more accurate, precise, and sensitive at classifying venom proteins. Availability: https://www.github.com/tijeco/toxify


Author(s):  
Lifang Chen ◽  
Dai Cao ◽  
Yuan Liu

Jigsaw puzzle algorithms are important because they can be applied in many areas, such as biology, image editing, archaeology and the reconstruction of incomplete crime scenes. Some problems remain in practical application, however. When the puzzle fragments contain a large number of similar objects, the error rate reaches 30%–50%. When some fragments are missing, most algorithms fail to restore the image accurately. When the number of fragments is large, efficiency is reduced. In automated puzzle solving, the Sum of Squared Distance Scoring (SSD), Mahalanobis Gradient Compatibility (MGC) and other metrics are the main measures used to calculate the similarity between fragments. On the basis of these two measures, we put forward some new methods. (1) MGC is one of the most effective measures, but using MGC to reassemble the puzzle can cause an error image every 30 or 50 times, so we combine the Jaccard and MGC measures to compute the similarity between image fragments, and reassemble the puzzle with a greedy algorithm. This algorithm not only reduces the error rate, but also maintains high accuracy when a large number of fragments show similar objects. (2) To address missing fragments and low efficiency, this paper uses a new method based on the SSD measure and a mark matrix. It is general in the sense that it can handle puzzles of unknown size, with fragments of unknown orientation, and even puzzles with missing fragments. The algorithm does not require any preset conditions and is more practical in real life. Finally, experiments show that the algorithm proposed in this paper improves not only the accuracy but also the efficiency of the operation.
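As an illustration of the SSD measure mentioned above, the following sketch scores how well one fragment fits to the right of another by summing squared differences along the shared boundary, then picks the best-scoring candidate greedily. It is a generic textbook form of SSD, not the authors' method. It assumes grayscale fragments stored as equal-height lists of pixel rows, and it does not include the Jaccard/MGC combination or the mark matrix. The function names are hypothetical.

```python
from typing import List, Sequence

Piece = List[List[float]]  # assumed layout: rows of grayscale pixel values


def ssd_right_left(left: Piece, right: Piece) -> float:
    """SSD between the rightmost column of `left` and the leftmost column of `right`.

    Lower values mean the two fragments are more likely to be neighbours.
    """
    if len(left) != len(right):
        raise ValueError("fragments must have the same height")
    return sum(
        (left_row[-1] - right_row[0]) ** 2
        for left_row, right_row in zip(left, right)
    )


def best_right_neighbour(piece: Piece, candidates: Sequence[Piece]) -> int:
    """Greedy choice: index of the candidate with the smallest boundary SSD."""
    return min(
        range(len(candidates)),
        key=lambda i: ssd_right_left(piece, candidates[i]),
    )
```

A purely greedy choice like this is exactly where similar-looking fragments cause errors, which is the motivation the abstract gives for combining measures.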


2015 ◽  
Author(s):  
Kassian Kobert ◽  
Leonidas Salichos ◽  
Antonis Rokas ◽  
Alexandros Stamatakis

Abstract: We present, implement, and evaluate an approach to calculate the internode certainty and tree certainty on a given reference tree from a collection of partial gene trees. Previously, the calculation of these values was only possible from a collection of gene trees with exactly the same taxon set as the reference tree. An application to sets of partial gene trees requires mathematical corrections in the internode certainty and tree certainty calculations. We implement our methods in RAxML and test them on empirical data sets. These tests imply that the inclusion of partial trees does matter. However, in order to provide meaningful measurements, any data set should also include trees containing the full species set.


2020 ◽  
Vol 8 (11) ◽  
pp. 336-345
Author(s):  
Jasurbek Gulomov ◽  
Rayimjon Aliev ◽  
Murad Nasirov ◽  
Jakhongir Ziyoitdinov ◽  
...  

Nanotechnology is entering every field, and nanoparticles are widely used in medicine and technology. We study the behavior of nanoparticles under the influence of light and their effect on solar cells. The way gold and silver nanoparticles introduced into the optical layer of a solar cell affect its properties has been studied extensively. However, the effect of metal nanoparticles embedded in the n-type region of a silicon-based solar cell has not been sufficiently studied. In addition, this study models the properties of solar cells containing nanoparticles of various shapes. Since the end of the last century, new modeling methods have been introduced into scientific research, and a great deal of modeling software based on numerical methods has been developed. The Sentaurus TCAD software package from Synopsys was used for the modeling to ensure the accuracy and reliability of the research. Using Sentaurus TCAD, we developed a model of a silicon-based solar cell with platinum nanoparticles of simple and more varied shapes embedded in the n-type region. The focus is on determining how the shape of a nanoparticle introduced into a solar cell affects the cell's properties. The effect of nanoparticles on the optical and I-V characteristics of a solar cell is also analyzed in depth.


2018 ◽  
Author(s):  
Miraine Dávila Felipe ◽  
Jean-Baka Domelevo Entfellner ◽  
Frédéric Lemoine ◽  
Jakub Truszkowski ◽  
Olivier Gascuel

Abstract: The transfer distance (TD) was introduced in the classification framework and studied in the context of phylogenetic tree matching. Recently, Lemoine et al. (2018) showed that TD can be a powerful tool to assess the branch support of phylogenies with large data sets, thus providing a relevant alternative to Felsenstein’s bootstrap. This distance allows a reference branch β in a reference tree 𝒯 to be compared to a branch b from another tree T, both on the same set of n taxa. The TD between these branches is the number of taxa that must be transferred from one side of b to the other in order to obtain β. By taking the minimum TD from β to all branches in T we define the transfer index, denoted by ϕ(β, T), measuring the degree of agreement of β with T. Let us consider a reference branch β having p tips on its light side and define the transfer support (TS) as 1 – ϕ(β, T)/(p – 1). The aim of this article is to provide evidence that p – 1 is a meaningful normalization constant in the definition of TS, and measure the statistical significance of TS, assuming that β is compared to a tree T drawn according to a null model. We obtain several results that shed light on these questions in a number of settings. In particular, we study the asymptotic behavior of TS when n tends to ∞, and fully characterize the distribution of ϕ when T is a caterpillar tree.
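The definitions of TD, ϕ(β, T) and TS above translate directly into a short sketch. It assumes that each branch is given as the set of taxa on one of its sides, and that the leaf (single-taxon) branches of T are included in the minimum, which keeps ϕ at or below p – 1. The function names are hypothetical, and the brute-force minimum over all branches is for illustration only and is not meant for large trees.

```python
from typing import FrozenSet, Iterable

Taxa = FrozenSet[str]


def transfer_distance(beta: Taxa, b: Taxa, taxa: Taxa) -> int:
    """Number of taxa to move across branch b so that it becomes branch beta.

    A branch has no orientation, so both sides of b are tried.
    """
    return min(len(beta ^ b), len(beta ^ (taxa - b)))


def transfer_index(beta: Taxa, internal_sides: Iterable[Taxa], taxa: Taxa) -> int:
    """phi(beta, T): minimum TD from beta to any branch of T, leaf branches included."""
    all_sides = list(internal_sides) + [frozenset({t}) for t in taxa]
    return min(transfer_distance(beta, b, taxa) for b in all_sides)


def transfer_support(beta: Taxa, internal_sides: Iterable[Taxa], taxa: Taxa) -> float:
    """TS = 1 - phi(beta, T) / (p - 1), with p the size of the light side of beta."""
    light = beta if len(beta) <= len(taxa - beta) else taxa - beta
    p = len(light)
    if p < 2:
        raise ValueError("TS is only defined for internal branches (p >= 2)")
    return 1.0 - transfer_index(light, internal_sides, taxa) / (p - 1)
```

For example, with eight taxa a to h, a reference branch whose light side is {a, b, c}, and a tree T containing the branch {a, b} but not {a, b, c}, one taxon must be transferred, so ϕ = 1 and TS = 1 – 1/2 = 0.5.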


2017 ◽  
Author(s):  
Damien M. de Vienne ◽  
Fran Supek ◽  
Toni Gabaldon

Abstract: Background. Overtraining occurs when an optimization process is applied for too many steps, leading to a model describing noise in addition to the signal present in the data. This effect may affect typical approaches for species tree reconstruction that use maximum likelihood optimization procedures on a small sample of concatenated genes. In this context, overtraining may result in trees better describing the specific evolutionary history of the sampled genes rather than the sought evolutionary relationships among the species. Results. Using a cross-validation-like approach on real and simulated datasets we showed that overtraining occurs in a significant fraction of cases, leading to species trees that are more distant from a gold-standard reference tree than a previously considered (and rejected) solution in the optimization process. However, we show that the shape of the likelihood curve is informative of the optimal stopping point. As expected, overtraining is aggravated in smaller gene samples and in datasets with increased levels of topological variation among gene trees, but occurs also in controlled, simulated scenarios where a common underlying topology is enforced. Conclusions. Overtraining is frequent in species tree reconstruction and leads to a final tree that is worse in describing the evolutionary relationships of the species under study than an earlier (and rejected) solution encountered during the likelihood optimization process. This result should help develop specific methods for species tree reconstruction in the future, and may improve our understanding of the complexity of tree likelihood landscapes.

