Interrogating Genomic-Scale Data to Resolve Recalcitrant Nodes in the Spider Tree of Life

Author(s):  
Siddharth Kulkarni ◽  
Robert J Kallal ◽  
Hannah Wood ◽  
Dimitar Dimitrov ◽  
Gonzalo Giribet ◽  
...  

Abstract Genome-scale data sets are converging on robust, stable phylogenetic hypotheses for many lineages; however, some nodes have shown disagreement across classes of data. We use spiders (Araneae) as a system to identify the causes of incongruence in phylogenetic signal between three classes of data: exons (as in phylotranscriptomics), noncoding regions (included in ultraconserved elements [UCE] analyses), and a combination of both (as in UCE analyses). Gene orthologs, coded as amino acids and nucleotides (with and without third codon positions), were generated by querying published transcriptomes for UCEs, recovering 1,931 UCE loci (codingUCEs). We expected that congeners represented in the codingUCE and UCE data would form clades in the presence of phylogenetic signal. Noncoding regions derived from UCE sequences were recovered to test the stability of relationships. Phylogenetic relationships resulting from all analyses were largely congruent. All nucleotide data sets from transcriptomes, UCEs, or a combination of both recovered similar topologies, in contrast with results from transcriptomes analyzed as amino acids. Most relationships inferred from low-occupancy data sets, containing several hundred loci, were congruent across Araneae, as opposed to high-occupancy data matrices with fewer loci, which showed more variation. Furthermore, we found that low-occupancy data sets analyzed as nucleotides (as is typical of UCE data sets) can result in more congruent relationships than high-occupancy data sets analyzed as amino acids (as in phylotranscriptomics). Thus, omitting data, through amino acid translation or via retention of only high-occupancy loci, may have a deleterious effect on phylogenetic reconstruction.
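The occupancy trade-off mentioned in this abstract can be illustrated with a minimal sketch. It assumes occupancy means the fraction of sampled taxa that have data for a locus; the function names, the dictionary layout, and the toy loci are hypothetical and are not taken from the study.

```python
def occupancy(loci, taxa):
    """Fraction of taxa with data for each locus (assumed definition of occupancy)."""
    taxa_set = set(taxa)
    return {name: len(present & taxa_set) / len(taxa_set)
            for name, present in loci.items()}


def filter_by_occupancy(loci, taxa, min_occupancy):
    """Keep only loci whose occupancy reaches the threshold."""
    occ = occupancy(loci, taxa)
    return {name: present for name, present in loci.items()
            if occ[name] >= min_occupancy}


# Toy, made-up matrix: each locus maps to the set of taxa that have a sequence.
taxa = ["A", "B", "C", "D"]
loci = {
    "locus1": {"A", "B", "C", "D"},
    "locus2": {"A", "B"},
    "locus3": {"A", "B", "C"},
}

low_threshold = filter_by_occupancy(loci, taxa, 0.5)    # more loci, more missing data
high_threshold = filter_by_occupancy(loci, taxa, 0.75)  # fewer loci, less missing data
```

Raising the threshold shrinks the matrix, which is the sense in which a high-occupancy data set omits data.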

2019 ◽  
Vol 11 (7) ◽  
pp. 1797-1812 ◽  
Author(s):  
Dong Zhang ◽  
Hong Zou ◽  
Cong-Jie Hua ◽  
Wen-Xiang Li ◽  
Shahid Mahboob ◽  
...  

Abstract The phylogeny of Isopoda, a speciose order of crustaceans, remains unresolved, with different data sets (morphological, nuclear, mitochondrial) often producing starkly incongruent phylogenetic hypotheses. We hypothesized that extreme diversity in their life histories might be causing compositional heterogeneity/heterotachy in their mitochondrial genomes, and compromising the phylogenetic reconstruction. We tested the effects of different data sets (mitochondrial, nuclear, nucleotides, amino acids, concatenated genes, individual genes, gene orders), phylogenetic algorithms (assuming data homogeneity, heterogeneity, and heterotachy), and partitioning, and found that almost all of them produced unique topologies. As we also found that mitogenomes of Asellota and two Cymothoida families (Cymothoidae and Corallanidae) possess inverted base (GC) skew patterns in comparison to other isopods, we concluded that inverted skews cause long-branch attraction phylogenetic artifacts between these taxa. These asymmetrical skews are most likely driven by multiple independent inversions of the origin of replication (i.e., nonadaptive mutational pressures). Although the PhyloBayes CAT-GTR algorithm managed to attenuate some of these artifacts (and outperform partitioning), mitochondrial data have limited applicability for reconstructing the phylogeny of Isopoda. Regardless of this, our analyses allowed us to propose solutions to some unresolved phylogenetic debates, and support Asellota as the most likely candidate for the basal isopod branch. As our findings show that architectural rearrangements might produce major compositional biases even on relatively short evolutionary timescales, the implications are that proving the suitability of data via composition skew analyses should be a prerequisite for every study that aims to use mitochondrial data for phylogenetic reconstruction, even among closely related taxa.
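A composition skew check of the kind recommended here can be sketched as follows. It uses the standard GC skew definition, (G - C) / (G + C), computed on one strand; the toy sequence is made up, and the function names are illustrative rather than the authors' pipeline.

```python
def gc_skew(seq: str) -> float:
    """Return (G - C) / (G + C) for one strand; 0.0 if the strand has no G or C."""
    s = seq.upper()
    g, c = s.count("G"), s.count("C")
    return 0.0 if g + c == 0 else (g - c) / (g + c)


_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


toy = "GGGTGACGGA"  # made-up sequence, for illustration only
skew_forward = gc_skew(toy)
skew_reverse = gc_skew(reverse_complement(toy))
```

Reading the opposite strand swaps the G and C counts, so the skew keeps its magnitude and changes sign. That is why a strand switch of the kind an inverted origin of replication would cause shows up as an inverted skew pattern.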


2017 ◽  
Author(s):  
Xiaofan Zhou ◽  
Sarah Lutteropp ◽  
Lucas Czech ◽  
Alexandros Stamatakis ◽  
Moritz von Looz ◽  
...  

Abstract Incongruence, or topological conflict, is prevalent in genome-scale data sets but relatively few measures have been developed to quantify it. Internode Certainty (IC) and related measures were recently introduced to explicitly quantify the level of incongruence of a given internode (or internal branch) among a set of phylogenetic trees and complement regular branch support statistics in assessing the confidence of the inferred phylogenetic relationships. Since most phylogenomic studies contain data partitions (e.g., genes) with missing taxa and IC scores stem from the frequencies of bipartitions (or splits) on a set of trees, the calculation of IC scores requires adjusting the frequencies of bipartitions from these partial gene trees. However, when the proportion of missing data is high, current approaches that adjust bipartition frequencies in partial gene trees tend to overestimate IC scores and alternative adjustment approaches differ substantially from each other in their scores. To overcome these issues, we developed three new measures for calculating internode certainty that are based on the frequencies of quartets, which naturally apply to both comprehensive and partial trees. Our comparison of these new quartet-based measures to previous bipartition-based measures on simulated data shows that: 1) on comprehensive trees, both types of measures yield highly similar IC scores; 2) on partial trees, quartet-based measures generate more accurate IC scores; and 3) quartet-based measures are more robust to the absence of phylogenetic signal and errors in the phylogenetic relationships to be assessed. Additionally, analysis of 15 empirical phylogenomic data sets using our quartet-based measures suggests that numerous relationships remain unresolved despite the availability of genome-scale data. Finally, we provide an efficient open-source implementation of these quartet-based measures in the program QuartetScores, which is freely available at https://github.com/algomaus/QuartetScores.
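For orientation, the earlier bipartition-based IC that this abstract builds on can be sketched in a few lines. This is the entropy-style score computed from the frequencies of the two most prevalent conflicting bipartitions. It is not the quartet-based measure introduced by the paper, it is not code from QuartetScores, and the function name is hypothetical.

```python
import math


def internode_certainty(freq_1: float, freq_2: float) -> float:
    """Bipartition-based IC from the frequencies of the two most prevalent
    conflicting bipartitions: 1 + p1*log2(p1) + p2*log2(p2),
    where p_i = freq_i / (freq_1 + freq_2).
    """
    total = freq_1 + freq_2
    if total <= 0:
        raise ValueError("at least one bipartition must be observed")
    ic = 1.0
    for freq in (freq_1, freq_2):
        p = freq / total
        if p > 0:  # the limit of p*log2(p) as p -> 0 is 0
            ic += p * math.log2(p)
    return ic


no_conflict = internode_certainty(100, 0)   # a single bipartition dominates
max_conflict = internode_certainty(50, 50)  # two bipartitions equally frequent
```

A score near 1 means the gene trees rarely contradict the internode, and a score near 0 means the two alternatives are about equally supported. The abstract's point is that these frequencies are hard to estimate from partial gene trees, which motivates counting quartets instead.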


2013 ◽  
Vol 42 (4) ◽  
pp. 2391-2404 ◽  
Author(s):  
Anton Shifman ◽  
Noga Ninyo ◽  
Uri Gophna ◽  
Sagi Snir

Abstract The evolutionary history of all life forms is usually represented as a vertical tree-like process. In prokaryotes, however, the vertical signal is partly obscured by the massive influence of horizontal gene transfer (HGT). The HGT creates widespread discordance between evolutionary histories of different genes as genomes become mosaics of gene histories. Thus, the Tree of Life (TOL) has been questioned as an appropriate representation of the evolution of prokaryotes. Nevertheless a common hypothesis is that prokaryotic evolution is primarily tree-like, and a routine effort is made to place new isolates in their appropriate location in the TOL. Moreover, it appears desirable to exploit non–tree-like evolutionary processes for the task of microbial classification. In this work, we present a novel technique that builds on the straightforward observation that gene order conservation (‘synteny’) decreases in time as a result of gene mobility. This is particularly true in prokaryotes, mainly due to HGT. Using a ‘synteny index’ (SI) that measures the average synteny between a pair of genomes, we developed the phylogenetic reconstruction tool ‘Phylo SI’. Phylo SI offers several attractive properties such as easy bootstrapping, high sensitivity in cases where phylogenetic signal is weak and computational efficiency. Phylo SI was tested both on simulated data and on two bacterial data sets and compared with two well-established phylogenetic methods. Phylo SI is particularly efficient on short evolutionary distances where synteny footprints remain detectable, whereas the nucleotide substitution signal is too weak for reliable sequence-based phylogenetic reconstruction. The method is publicly available at http://research.haifa.ac.il/ssagi/software/PhyloSI.zip.
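A synteny index of this kind might be computed roughly as below. The abstract does not give the formula, so the details are assumptions: each genome is a circular list of unique gene identifiers, a gene's neighborhood is the k genes on each side, the per-gene score is the number of neighbors shared between the two genomes divided by 2k, and the genome-level SI is the average over shared genes. This is a sketch, not the Phylo SI implementation.

```python
def neighborhood(genome, index, k):
    """The k genes on each side of position `index` in a circular genome."""
    n = len(genome)
    return {genome[(index + off) % n] for off in range(-k, k + 1) if off != 0}


def synteny_index(genome_a, genome_b, k=2):
    """Average fraction of shared neighbors over genes present in both genomes."""
    if len(genome_a) <= 2 * k or len(genome_b) <= 2 * k:
        raise ValueError("each genome must contain more than 2*k genes")
    pos_a = {gene: i for i, gene in enumerate(genome_a)}
    pos_b = {gene: i for i, gene in enumerate(genome_b)}
    shared = set(pos_a) & set(pos_b)
    if not shared:
        return 0.0
    total = 0.0
    for gene in shared:
        near_a = neighborhood(genome_a, pos_a[gene], k)
        near_b = neighborhood(genome_b, pos_b[gene], k)
        total += len(near_a & near_b) / (2 * k)
    return total / len(shared)


# Toy gene orders (made up): the second genome is a shuffled copy of the first.
original = ["a", "b", "c", "d", "e", "f"]
shuffled = ["a", "c", "e", "b", "d", "f"]
si_same = synteny_index(original, original, k=2)
si_shuffled = synteny_index(original, shuffled, k=1)
```

Under these assumptions, 1 - SI could serve as a pairwise distance for a distance-based tree method, although the abstract does not say how Phylo SI turns SI values into a tree.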


2021 ◽  
Vol 72 (2) ◽  
pp. 603-617
Author(s):  
Moulay Zaidan Lahjouji-Seppälä ◽  
Achim Rabus

Abstract Quantitative, corpus-based research on spontaneous spoken Carpathian Rusyn can run into several data-related problems: speakers use ambivalent forms at different rates, resulting in a biased data set, while a stricter data-cleaning process would lead to large-scale data loss. On top of that, polytomous categorical dependent variables are hard to analyze due to methodological limitations. This paper provides several approaches to handling unbalanced and biased data sets containing variation in the conjugational forms of the verbs maty ‘to have’ and (po-)znaty ‘to know’ in Carpathian Rusyn. Using resampling-based methods such as Cross-Validation, Bootstrapping and Random Forests, we provide a strategy for circumventing possible methodological pitfalls and gaining the most information from our precious data, without trying to p-hack the results. By calculating the predictive power of several sociolinguistic factors on linguistic variation, we can make valid statements about the (sociolinguistic) status of Rusyn and the stability of the old dialect continuum of Rusyn varieties.


Author(s):  
S. Ishikawa ◽  
T. Nakashima ◽  
T. Iizumi ◽  
M. C. Hare

Abstract The Global Yield Gap Atlas (GYGA) is an international project that addresses global food production capacity in the form of yield gaps (Yg). The GYGA project is unique in employing its original Climate Zonation Scheme (CZS) composed of three indexed factors, i.e. Growing Degree Days (GDD) related to temperature, Aridity Index (AI) related to available water and Temperature Seasonality (TS) related to annual temperature range, theoretically creating 300 Climate Zones (CZs) across the globe. In the present study, the GYGA CZs were identified for Japan on a municipality basis and analysis of variance (ANOVA) was performed on irrigated rice yield data sets, equating to actual yields (Ya) in the GYGA context, from long-term government statistics. The ANOVA was conducted for the data sets over two decades between 1994 and 2016 by assigning the GDD score of 6 levels and the TS score of 2 levels as fixed factors. Significant interactions with respect to Ya were observed between GDD score and TS score for 13 years out of 21 years, implying the existence of favourable combinations of the GDD score and the TS score for rice cultivation. This implication was also supported by the observations on Yg. The lower values of the coefficient of variation obtained from the CZs characterized by medium GDD scores indicated the stability over time of rice yields in these areas. These findings suggest that the GYGA-CZS may serve as a tool for identifying favourable CZs for growing crops.
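The stability measure referred to in this abstract can be illustrated with a short sketch. The coefficient of variation is taken here as the sample standard deviation divided by the mean; whether the authors used the sample or the population form is not stated, so that choice is an assumption, and the yield values are made-up numbers, not data from the study.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (unitless)."""
    m = mean(values)
    if m == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return stdev(values) / m


# Hypothetical yearly yields (t/ha) for two imaginary climate zones.
steady_zone = [5.0, 5.1, 4.9, 5.0]
variable_zone = [3.0, 6.5, 4.0, 6.5]

cv_steady = coefficient_of_variation(steady_zone)
cv_variable = coefficient_of_variation(variable_zone)
```

Both toy zones have the same mean yield, so the lower coefficient of variation in the first one reflects only its smaller year-to-year spread.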


1986 ◽  
Vol 6 (12) ◽  
pp. 4602-4610
Author(s):  
U Bond ◽  
M J Schlesinger

A chicken genomic library was screened to obtain genomic clones for ubiquitin genes. Two genes that differ in their genomic location and organization were identified. One gene, designated Ub I, contains four copies of the protein-coding sequence arranged in tandem, while the second gene, Ub II, contains three. The origin of the two major mRNAs that are induced after heat shock in chicken embryo fibroblasts was determined by generating DNA probes from the 5'- and 3'-noncoding regions of the two genes. Both mRNAs are transcribed from Ub I, the larger being the unspliced precursor of the smaller. A 674-base-pair intron was located within the 5'-noncoding region of Ub I. The second gene, Ub II, does not appear to code for an RNA species in normal or heat-shocked chicken embryo fibroblasts. The expression of ubiquitin mRNA during heat shock and recovery was examined. Addition of actinomycin D before heat shock completely abolished the response of ubiquitin mRNA to the stress. Analysis of the stability of the mRNA during recovery revealed that the mRNA accumulated during the heat shock is rapidly degraded with a half-life of approximately 1.5 h, suggesting a specialized but transient role for ubiquitin during heat shock.


1991 ◽  
Author(s):  
Barry Deakin

During the development of new stability regulations for the U.K. Department of Transport, doubt was cast over many of the assumptions made when assessing the stability of sailing vessels. In order to investigate the traditional methods, a programme of work was undertaken including wind tunnel tests and full-scale data acquisition. The work resulted in a much improved understanding of the behaviour of sailing vessels and indeed indicated that the conventional methods of stability assessment are invalid, the rules now applied in the U.K. being very different from those in use elsewhere. The paper concentrates on the model test techniques which were developed specifically for this project but which will have implications for other vessel types. The tests were of two kinds: measurement of the wind forces and moments on a sailing vessel; and investigation of the response of sailing vessels to gusts of wind. For the force and moment measurements, models were mounted in a tank of water on a six-component balance and tested in a large boundary layer wind tunnel. Previous tests in wind tunnels have always concentrated on performance, and the heeling moments have not normally been measured correctly. As the measurement of heeling moment at a range of heel angles was of prime importance, a new balance and mounting system was developed which enabled the above-water part of the vessel to be modelled correctly, the underwater part to be unaffected by the wind, and the interface to be correctly represented without interference. Various effects were investigated including rig type, sheeting, heading, heel angle and wind gradient. The gust response tests were conducted with Froude-scaled models floating in a pond set in the wind tunnel floor. A mechanism was installed in the tunnel which enabled gusts of various characteristics to be generated, and the roll response of the models was measured with a gyroscope. These tests provided information on the effects of inertia, damping, rolling and the characteristics of the gust. Sample results are presented to illustrate the uses to which these techniques have been put.


2013 ◽  
Vol 63 (2) ◽  
Author(s):  
Nur Syahidah Yusoff ◽  
Maman Abdurachman Djauhari

The stability of the covariance matrix is a major issue in multivariate analysis. As can be seen in the literature, the most popular and widely used tests are Box's M-test and Jennrich's J-test, introduced by Box in 1949 and Jennrich in 1970, respectively. These tests use the determinant of the sample covariance matrix as a measure of multivariate dispersion. Since the determinant is only a scalar representation of a complex structure, it cannot represent the whole structure. Moreover, the tests are quite cumbersome to compute when the data sets are of high dimension, since they involve not only the computation of the determinant of the covariance matrix but also the inversion of a matrix. This motivates us to propose a new statistical test which is computationally more efficient and which, if used together with the M-test or J-test, gives a better understanding of the stability of the covariance structure. An example will be presented to illustrate its advantage.
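To make the role of the determinant concrete, the uncorrected Box's M statistic can be sketched as follows: M = (N - g) ln|S_pooled| - sum_i (n_i - 1) ln|S_i|, where the S_i are the group sample covariance matrices. This is a plain-Python illustration of the classical statistic only. It omits the small-sample correction and the reference distribution, it is not the new test proposed in the paper, and the toy data are made up.

```python
import math


def covariance(rows):
    """Unbiased sample covariance matrix of a list of observation tuples."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(p)] for i in range(p)]


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n, det = len(a), 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def box_m(groups):
    """Uncorrected Box's M statistic for a list of groups of observations."""
    p = len(groups[0][0])
    covs = [covariance(g) for g in groups]
    dofs = [len(g) - 1 for g in groups]
    total = sum(dofs)
    pooled = [[sum(d * s[i][j] for d, s in zip(dofs, covs)) / total
               for j in range(p)] for i in range(p)]
    return (total * math.log(determinant(pooled))
            - sum(d * math.log(determinant(s)) for d, s in zip(dofs, covs)))


# Toy two-variable samples (made up).
group_1 = [(1, 2), (2, 1), (3, 5), (4, 3)]
group_2 = [(1, 2), (2, 8), (3, 1), (4, 9)]
m_same = box_m([group_1, group_1])
m_diff = box_m([group_1, group_2])
```

Each group's covariance structure enters the statistic only through a single determinant, which is the limitation the abstract points to.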

