Estimating Diversity Through Time using Molecular Phylogenies: Old and Species-Poor Frog Families are the Remnants of a Diverse Past

Mapping Intimacies ◽

10.1101/586420 ◽

2019 ◽

Cited By ~ 2

Author(s):

O. Billaud ◽

D. S. Moen ◽

T. L. Parsons ◽

H. Morlon

Keyword(s):

Probabilistic Approach ◽

R Package ◽

Phylogenetic Comparative Methods ◽

Extinction Rate ◽

Competitive Effects ◽

Phenotypic Divergence ◽

Poor Group ◽

Diversity Dynamics ◽

Diversity Estimates ◽

Better Than

Estimating how the number of species in a given group varied in the deep past is of key interest to evolutionary biologists. However, current phylogenetic approaches for obtaining such estimates have limitations, such as providing unrealistic diversity estimates at the origin of the group. Here we develop a robust probabilistic approach for estimating Diversity-Through-Time (DTT) curves and uncertainty around these estimates from phylogenetic data. We show with simulations that under various realistic scenarios of diversification, this approach performs better than previously proposed approaches. We also characterize the effect of tree size and undersampling on the performance of the approach. We apply our method to understand patterns of species diversity in anurans (frogs and toads). We find that Archaeobatrachia – a species-poor group of old frog clades often found in temperate regions – formerly had much higher diversity and net diversification rate, but the group declined in diversity as younger, nested clades diversified. This diversity decline seems to be linked to a decline in speciation rate rather than an increase in extinction rate. Our approach, implemented in the R package RPANDA, should be useful for evolutionary biologists interested in understanding how past diversity dynamics have shaped present-day diversity. It could also be useful in other contexts, such as for analyzing clade-clade competitive effects or the effect of species richness on phenotypic divergence. [phylogenetic comparative methods; birth-death models; diversity curves; diversification; extinction; anurans]
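The sketch below shows what a diversity-through-time (DTT) curve records, using simulated rather than inferred data. It is not the RPANDA method, which estimates DTT from a reconstructed phylogeny. Instead, it assumes a constant-rate birth-death process with hypothetical rates `lam` (speciation) and `mu` (extinction) and simulates it event by event with the Gillespie algorithm. It keeps the true lineage count, including lineages that later go extinct and so leave no trace in a molecular phylogeny.

```python
import math
import random

# Illustrative sketch only: simulate_dtt and diversity_at are hypothetical
# helpers, not functions from RPANDA.


def simulate_dtt(lam, mu, t_max, n0=1, rng=None):
    """Simulate a constant-rate birth-death process (Gillespie algorithm).

    Returns the diversity-through-time curve as (time, n_lineages) pairs:
    one pair at time 0 and one after each speciation or extinction event
    up to t_max. Assumes lam + mu > 0.
    """
    rng = rng or random.Random()
    t, n = 0.0, n0
    curve = [(t, n)]
    while n > 0:
        # Waiting time to the next event: exponential with total rate n * (lam + mu)
        t += rng.expovariate(n * (lam + mu))
        if t > t_max:
            break
        # The event is a speciation with probability lam / (lam + mu),
        # otherwise an extinction
        n += 1 if rng.random() < lam / (lam + mu) else -1
        curve.append((t, n))
    return curve


def diversity_at(curve, t):
    """Read the step-function DTT curve at time t."""
    n = curve[0][1]
    for time, value in curve:
        if time > t:
            break
        n = value
    return n


# Compare the simulated mean diversity with the constant-rate expectation
rng = random.Random(1)
lam, mu, t_end = 1.0, 0.5, 2.0
finals = [diversity_at(simulate_dtt(lam, mu, t_end, rng=rng), t_end) for _ in range(3000)]
print(sum(finals) / len(finals), math.exp((lam - mu) * t_end))
```

Under constant rates the expected diversity is n0·exp((lam − mu)·t), so the simulated mean should be close to e ≈ 2.72 for these settings. A phylogeny-based DTT method has to recover curves like this one from the reconstructed tree alone.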


Estimating Diversity Through Time Using Molecular Phylogenies: Old and Species-Poor Frog Families are the Remnants of a Diverse Past

Systematic Biology ◽

10.1093/sysbio/syz057 ◽

2019 ◽

Cited By ~ 1

Author(s):

O Billaud ◽

D S Moen ◽

T L Parsons ◽

H Morlon

Keyword(s):

Species Richness ◽

Probabilistic Approach ◽

R Package ◽

Extinction Rate ◽

Competitive Effects ◽

Phenotypic Divergence ◽

Poor Group ◽

Diversity Dynamics ◽

Diversity Estimates ◽

Better Than

Estimating how the number of species in a given group varied in the deep past is of key interest to evolutionary biologists. However, current phylogenetic approaches for obtaining such estimates have limitations, such as providing unrealistic diversity estimates at the origin of the group. Here, we develop a robust probabilistic approach for estimating diversity-through-time curves and uncertainty around these estimates from phylogenetic data. We show with simulations that under various realistic scenarios of diversification, this approach performs better than previously proposed approaches. We also characterize the effect of tree size and undersampling on the performance of the approach. We apply our method to understand patterns of species diversity in anurans (frogs and toads). We find that Archaeobatrachia—a species-poor group of old frog clades often found in temperate regions—formerly had much higher diversity and net diversification rate, but the group declined in diversity as younger, nested clades diversified. This diversity decline seems to be linked to a decline in speciation rate rather than an increase in extinction rate. Our approach, implemented in the R package RPANDA, should be useful for evolutionary biologists interested in understanding how past diversity dynamics have shaped present-day diversity. It could also be useful in other contexts, such as for analyzing clade–clade competitive effects or the effect of species richness on phenotypic divergence.


ACDC: Analysis of Congruent Diversification Classes

10.1101/2022.01.12.476142 ◽

2022 ◽

Author(s):

Sebastian Hoehna ◽

Bjoern Tore Kopperud ◽

Andrew F Magee

Keyword(s):

Congruence Class ◽

R Package ◽

Rate Function ◽

Diversification Rate ◽

Extinction Rate ◽

Diversification Rates ◽

Common Features ◽

Alternative Hypotheses ◽

Rate Functions ◽

Congruence Classes

Diversification rates inferred from phylogenies are not identifiable. Infinitely many combinations of speciation and extinction rate functions have exactly the same likelihood score for a given phylogeny, forming a congruence class. The specific shape and characteristics of such congruence classes have not yet been studied, and it is not known whether the speciation and extinction rate functions within a congruence class share common features. Instead of striving to make the diversification rates identifiable, we can embrace their inherent non-identifiability. We use two different approaches to explore a congruence class: (i) testing specific alternative hypotheses, and (ii) randomly sampling alternative rate functions within the congruence class. Our methods are implemented in the open-source R package ACDC (https://github.com/afmagee/ACDC). ACDC provides a flexible approach to exploring the congruence class and summarizes the rate functions within it. The summaries can highlight common trends, i.e., increasing, flat, or decreasing rates. Although there are infinitely many equally likely diversification rate functions, they can share common features. ACDC can be used to assess whether diversification rate patterns are robust despite non-identifiability. In our example, we clearly identify three phases of diversification rate changes that are common to all models in the congruence class. Thus, congruence classes are not necessarily a problem for studying historical patterns of biodiversity from phylogenies.
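One well-known way to characterize a congruence class is the pulled diversification rate of Louca and Pennell (2020). This is an assumption here, since the ACDC abstract does not state it. With τ the time before present, r_p(τ) = λ(τ) − μ(τ) + (1/λ(τ))·dλ/dτ, and in that framework congruent models share r_p and the product of the sampling fraction and the present-day speciation rate. If the sampling fraction and λ(0) are held fixed, any alternative speciation rate λ* can be paired with μ*(τ) = λ*(τ) − r_p(τ) + (1/λ*(τ))·dλ*/dτ. The sketch below applies this on a uniform time grid with finite differences. The function names are hypothetical and not part of ACDC.

```python
# Hypothetical helpers illustrating the Louca & Pennell (2020) congruence
# relation on a uniform grid of ages tau (time before present); not ACDC code.


def derivative(values, dt):
    """Finite-difference derivative: central inside the grid, one-sided at the ends."""
    n = len(values)
    out = []
    for i in range(n):
        if i == 0:
            out.append((values[1] - values[0]) / dt)
        elif i == n - 1:
            out.append((values[-1] - values[-2]) / dt)
        else:
            out.append((values[i + 1] - values[i - 1]) / (2 * dt))
    return out


def pulled_diversification_rate(lam, mu, dt):
    """r_p(tau) = lam(tau) - mu(tau) + (1 / lam(tau)) * dlam/dtau."""
    dlam = derivative(lam, dt)
    return [l - m + d / l for l, m, d in zip(lam, mu, dlam)]


def congruent_extinction(lam_alt, r_p, dt):
    """Extinction rate that, paired with lam_alt, reproduces the given r_p."""
    dlam = derivative(lam_alt, dt)
    return [l - r + d / l for l, r, d in zip(lam_alt, r_p, dlam)]


# Reference model: constant rates lam = 0.5, mu = 0.2 on ages 0..10
dt = 0.1
ages = [i * dt for i in range(101)]
r_p = pulled_diversification_rate([0.5] * 101, [0.2] * 101, dt)
# Alternative: speciation rising linearly into the past, same lam at tau = 0
lam_alt = [0.5 + 0.02 * a for a in ages]
mu_alt = congruent_extinction(lam_alt, r_p, dt)
```

Under these assumptions, every extinction curve produced this way yields the same likelihood for the tree. A candidate whose implied extinction rate becomes negative anywhere is not a valid member of the class.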


optimalFlow: optimal transport approach to flow cytometry gating and population matching

BMC Bioinformatics ◽

10.1186/s12859-020-03795-w ◽

2020 ◽

Vol 21 (1) ◽

Author(s):

Eustasio del Barrio ◽

Hristo Inouzhe ◽

Jean-Michel Loubes ◽

Carlos Matrán ◽

Agustín Mayo-Íscar

Keyword(s):

Flow Cytometry ◽

Supervised Learning ◽

Optimal Transport ◽

Cell Types ◽

R Package ◽

Supervised Machine Learning ◽

Intrinsic Variability ◽

Flow Cytometry Data ◽

Different Characteristics ◽

Better Than

Background: Data obtained from flow cytometry show pronounced variability for both biological and technical reasons. Biological variability is a well-known phenomenon that arises from measuring different individuals with different characteristics, such as illness, age, or sex. Different measurement settings, changing conditions during experiments, and different types of flow cytometers are among the technical causes of variability. This mixture of sources of variability makes it difficult to use supervised machine learning to identify cell populations. The present work combines several strategies to facilitate supervised gating.

Results: We propose optimalFlowTemplates, based on a similarity distance and Wasserstein barycenters, which clusters cytometries and produces prototype cytometries for the different groups. We show that supervised learning restricted to the new groups performs better than the same techniques applied to the whole collection. We also present optimalFlowClassification, which uses a database of gated cytometries and optimalFlowTemplates to assign cell types to a new cytometry. We show that this procedure can outperform state-of-the-art techniques on the proposed datasets. Our code is freely available as optimalFlow, a Bioconductor R package at https://bioconductor.org/packages/optimalFlow.

Conclusions: Together, optimalFlowTemplates and optimalFlowClassification address the problem of using supervised learning while accounting for biological and technical variability. Our methodology provides a robust automated gating workflow that handles the intrinsic variability of flow cytometry data well. Our main innovation is the methodology itself and the optimal transport techniques that we apply to flow cytometry analysis.
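optimalFlowTemplates relies on Wasserstein barycenters. In one dimension, optimal transport between two equal-size samples simply matches sorted values, so both the distance and the barycenter are easy to compute. The sketch below shows only that 1-D special case, with hypothetical function names. optimalFlow itself works with multivariate cytometry data, and its implementation is not reproduced here.

```python
# Hypothetical 1-D illustration of Wasserstein distances and barycenters;
# not the optimalFlow implementation.


def wasserstein_1d(x, y, p=2):
    """p-Wasserstein distance between two equal-size 1-D samples.

    In one dimension the optimal transport plan pairs the i-th smallest
    value of x with the i-th smallest value of y, so
    W_p = (mean |x_(i) - y_(i)|^p)^(1/p).
    """
    if len(x) != len(y):
        raise ValueError("this sketch assumes equal sample sizes")
    xs, ys = sorted(x), sorted(y)
    return (sum(abs(a - b) ** p for a, b in zip(xs, ys)) / len(xs)) ** (1 / p)


def barycenter_1d(samples, weights=None):
    """W2 barycenter of equal-size 1-D samples.

    Averages the sorted samples position by position, i.e. averages
    their empirical quantile functions.
    """
    k = len(samples)
    weights = weights or [1 / k] * k
    sorted_samples = [sorted(s) for s in samples]
    return [sum(w * s[i] for w, s in zip(weights, sorted_samples))
            for i in range(len(sorted_samples[0]))]


a, b = [0, 2], [2, 4]
center = barycenter_1d([a, b])
print(center, wasserstein_1d(a, b), wasserstein_1d(center, a))
```

For the samples {0, 2} and {2, 4}, the barycenter is {1, 3}. It lies at W2 distance 1 from each sample, half their mutual distance of 2.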


On the Optimal Hyperparameter Behavior in Bayesian Clustering

Journal of Advanced Computational Intelligence and Intelligent Informatics ◽

10.20965/jaciii.2015.p0818 ◽

2015 ◽

Vol 19 (6) ◽

pp. 818-824

Author(s):

Keisuke Yamazaki ◽

Keyword(s):

Prior Distribution ◽

Evaluation Method ◽

Probabilistic Approach ◽

Expectation Maximization Algorithm ◽

Parametric Models ◽

Likelihood Method ◽

Bayesian Clustering ◽

Mixture Of Gaussian ◽

Theoretical Accuracy ◽

Better Than

In a probabilistic approach to cluster analysis, parametric models such as a mixture of Gaussian distributions are often used. Since the parameters are unknown, both the parameters and the cluster labels must be estimated. Recently, the statistical properties of Bayesian clustering have been studied. The theoretical accuracy of its label estimation has been analyzed and found to be better than that of the maximum-likelihood method based on the expectation-maximization (EM) algorithm. However, the effect of the prior distribution on the clustering result remains unknown. The prior distribution itself has a parameter, called the hyperparameter. In the present paper, we theoretically and experimentally investigate the behavior of the optimal hyperparameter, and we propose an evaluation method for the clustering result based on prior optimization.
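For reference, the maximum-likelihood baseline mentioned above can be sketched in one dimension: a Gaussian mixture fitted with the expectation-maximization algorithm, followed by assigning each point its most probable label. This illustration uses hypothetical names and a deterministic quantile-based start. It does not implement the Bayesian clustering or the hyperparameter optimization studied in the paper.

```python
import math

# Hypothetical helpers: a plain maximum-likelihood EM fit (the baseline the
# abstract compares Bayesian clustering against), not the Bayesian method.


def normal_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(data, k, n_iter=100):
    """Fit a 1-D mixture of k Gaussians by EM and label each point.

    Returns (weights, means, variances, labels); labels are the most
    probable component of each point under the final parameters.
    """
    n = len(data)
    ordered = sorted(data)
    # Deterministic start: means at evenly spaced quantiles, shared variance
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    mu_all = sum(data) / n
    variances = [sum((x - mu_all) ** 2 for x in data) / n] * k
    weights = [1.0 / k] * k

    def responsibilities():
        resp = []
        for x in data:
            dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        return resp

    for _ in range(n_iter):
        resp = responsibilities()            # E-step: P(component j | x_i)
        for j in range(k):                   # M-step: weighted re-estimation
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, 1e-6)  # floor keeps a component from collapsing
    labels = [max(range(k), key=lambda j: r[j]) for r in responsibilities()]
    return weights, means, variances, labels


# Two well-separated groups of evenly spaced points
data = [-5 + (i - 50) / 50 for i in range(101)] + [5 + (i - 50) / 50 for i in range(101)]
weights, means, variances, labels = em_gmm_1d(data, k=2)
```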


Variance Adaptive Shrinkage (vash): Flexible Empirical Bayes estimation of variances

10.1101/048660 ◽

2016 ◽

Author(s):

Mengyin Lu ◽

Matthew Stephens

Keyword(s):

Gamma Distribution ◽

Empirical Bayes ◽

Variance Estimation ◽

R Package ◽

Bayes Estimation ◽

Inverse Gamma Distribution ◽

Inverse Gamma ◽

Tissues Expression ◽

Flexible Model ◽

Better Than

Motivation: We consider the problem of estimating variances for a large number of “similar” units when there are relatively few observations on each unit. This problem is important in genomics, for example, where one often wants to estimate variances for thousands of genes (or other genomic units) from just a few measurements on each. A common approach is to use an Empirical Bayes (EB) method that assumes the variances among genes follow an inverse-gamma distribution. Here we describe a more flexible EB method whose main assumption is that the distribution of the variances (or, alternatively, of the precisions) is unimodal.

Results: We show that this more flexible assumption gives performance competitive with existing methods when the variances truly come from an inverse-gamma distribution, and can outperform them when the distribution of the variances is more complex. In analyses of several human gene expression datasets from the Genotype-Tissue Expression (GTEx) consortium, we find that our more flexible model often fits the data appreciably better than a single inverse-gamma distribution. At the same time, we find that for variance estimation the differences between methods are often small, suggesting that simpler methods will often suffice in practice.

Availability: Our methods are implemented in the R package vashr, available from http://github.com/mengyin/vashr.
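The common approach that vash is contrasted with has a closed form. Assume each unit's sample variance s² has d degrees of freedom (d·s²/σ² ~ χ²_d) and the precision 1/σ² has a conjugate Gamma(d0/2, rate d0·s0²/2) prior, which is equivalent to an inverse-gamma prior on σ². Then the posterior mean of the precision is 1/s̃², with s̃² = (d0·s0² + d·s²)/(d0 + d). The sketch below applies this shrinkage using hypothetical, user-supplied prior values `prior_df` and `prior_var`. In practice these are estimated from all units, and vash replaces the single inverse-gamma prior with a more flexible unimodal one.

```python
# Hypothetical helper illustrating conjugate inverse-gamma shrinkage of
# variances; not the vashr implementation.


def moderated_variances(sample_vars, df, prior_df, prior_var):
    """Shrink per-unit sample variances toward prior_var.

    Assumes df * s^2 / sigma^2 ~ chi^2_df for every unit and a conjugate
    prior 1/sigma^2 ~ Gamma(prior_df / 2, rate = prior_df * prior_var / 2).
    Returns s_tilde^2 = (prior_df * prior_var + df * s^2) / (prior_df + df),
    whose reciprocal is the posterior mean of the precision.
    """
    return [(prior_df * prior_var + df * s2) / (prior_df + df) for s2 in sample_vars]


# Three genes, each measured with 3 residual degrees of freedom
sample_vars = [0.1, 0.8, 4.0]
print(moderated_variances(sample_vars, df=3, prior_df=3, prior_var=1.0))
```

Small sample variances are pulled up and large ones pulled down toward `prior_var`. The shrinkage is stronger when `df` is small relative to `prior_df`.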


Statistical Reporting Inconsistencies in Experimental Philosophy

10.31235/osf.io/z65fv ◽

2017 ◽

Cited By ~ 1

Author(s):

Matteo Colombo ◽

Georgi Duev ◽

Michele B. Nuijten ◽

Jan Sprenger

Keyword(s):

Null Hypothesis ◽

Experimental Philosophy ◽

R Package ◽

Experimental Methods ◽

Point Of View ◽

Behavioral Sciences ◽

Null Hypothesis Significance Testing ◽

Psychological Science ◽

Philosophical Questions ◽

Better Than

Experimental philosophy (x-phi) is a young field of research at the intersection of philosophy and psychology. It aims to make progress on philosophical questions by using experimental methods traditionally associated with the psychological and behavioral sciences, such as null hypothesis significance testing (NHST). Recent discussions about a methodological crisis in the behavioral sciences have raised questions about the methodological standards of x-phi. Here, we focus on one aspect of this question, namely the rate of inconsistencies in statistical reporting. Previous research has examined the extent to which published articles in psychology and other behavioral sciences contain statistical inconsistencies in reporting NHST results. In this study, we used the R package statcheck to detect statistical inconsistencies in x-phi, and compared rates of inconsistencies in psychology and philosophy. We found that rates of inconsistencies in x-phi are lower than in the psychological and behavioral sciences. From the point of view of statistical reporting consistency, x-phi seems to do no worse, and perhaps even better, than psychological science.
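The core check that statcheck automates is to recompute a p-value from the reported test statistic and compare it with the reported p-value. statcheck is an R package that handles several test types and more detailed rounding rules. The Python sketch below is a simplified, hypothetical re-implementation for two-tailed z tests only. It flags a mismatch after rounding to the reported precision as an inconsistency. It flags a gross inconsistency when the mismatch also flips the significance decision at α = .05.

```python
import math

# Hypothetical, simplified re-implementation of the statcheck idea for
# two-tailed z tests; statcheck itself is an R package with more rules.


def z_test_p(z):
    """Two-tailed p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def check_z_report(z, reported_p, decimals=2, alpha=0.05):
    """Classify a reported (z, p) pair.

    Returns "consistent" if the recomputed p rounds to the reported p,
    "gross inconsistency" if the two p-values fall on different sides of
    alpha, and "inconsistent" otherwise.
    """
    computed = z_test_p(z)
    if round(computed, decimals) == round(reported_p, decimals):
        return "consistent"
    if (computed < alpha) != (reported_p < alpha):
        return "gross inconsistency"
    return "inconsistent"


for z, p in [(1.96, 0.05), (1.50, 0.03), (2.50, 0.04)]:
    print(z, p, check_z_report(z, p))
```

For example, a report of "z = 1.50, p = .03" is flagged as a gross inconsistency, because the recomputed two-tailed p is about .13.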


normGAM: an R package to remove systematic biases in genome architecture mapping data

BMC Genomics ◽

10.1186/s12864-019-6331-8 ◽

2019 ◽

Vol 20 (S12) ◽

Cited By ~ 2

Author(s):

Tong Liu ◽

Zheng Wang

Keyword(s):

Fragment Length ◽

R Package ◽

Genome Architecture ◽

Systematic Bias ◽

Length Bias ◽

Detection Frequency ◽

Normalization Methods ◽

New Type ◽

Systematic Biases ◽

Better Than

Background: The genome architecture mapping (GAM) technique can capture genome-wide chromatin interactions. However, besides the known systematic biases in raw GAM data, we have found a new type of systematic bias. Effective normalization methods therefore need to be developed and evaluated to remove all systematic biases in raw GAM data.

Results: We have detected a new type of systematic bias in GAM data, the fragment length bias. It is significantly different from the window detection frequency bias described in the paper introducing the GAM method, but similar to the bias of distances between restriction sites in raw Hi-C data. We found that the normalization method used in the GAM paper (a normalized variant of linkage disequilibrium) cannot effectively eliminate the new fragment length bias at 1 Mb resolution (it does slightly better at 30 kb resolution). We developed an R package, normGAM, that eliminates the new fragment length bias together with the three other biases in raw GAM data: those related to window detection frequency, mappability, and GC content. Five normalization methods are implemented in the package: Knight-Ruiz 2-norm (KR2, newly designed by us), normalized linkage disequilibrium (NLD), vanilla coverage (VC), sequential component normalization (SCN), and iterative correction and eigenvector decomposition (ICE).

Conclusions: Based on our evaluations, the five normalization methods can eliminate the four biases in raw GAM data, with VC and KR2 performing better than the others. KR2-normalized GAM data have a higher correlation with KR-normalized Hi-C data from the same cell samples, indicating that KR-related methods are better than the others at keeping GAM and Hi-C experiments consistent. Compared with raw GAM data, normalized GAM data are more consistent with normalized distances from fluorescence in situ hybridization (FISH) experiments. The source code of normGAM can be freely downloaded from http://dna.cs.miami.edu/normGAM/.
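Of the five normalization methods listed, iterative correction (ICE) is the simplest to sketch. It repeatedly rescales a symmetric contact matrix until every row has the same total coverage. The function below is a hypothetical dense-matrix illustration of that idea, not normGAM's code. Real pipelines also filter low-coverage rows and work with large sparse matrices. A single pass of the loop divides each entry by the product of its row and column sums, up to a constant factor, which is close to how vanilla coverage (VC) is usually described.

```python
# Hypothetical helper sketching iterative correction (ICE) on a small,
# dense, symmetric contact matrix; not normGAM's implementation.


def ice_normalize(matrix, max_iter=200, tol=1e-10):
    """Balance a symmetric non-negative matrix so that all rows have equal sums.

    Each pass computes row sums, turns them into bias factors b_i (row sum
    divided by the mean non-zero row sum), and divides m[i][j] by b_i * b_j.
    Rows that sum to zero keep b_i = 1 and stay zero.
    """
    m = [row[:] for row in matrix]
    n = len(m)
    for _ in range(max_iter):
        sums = [sum(row) for row in m]
        nonzero = [s for s in sums if s > 0]
        mean_sum = sum(nonzero) / len(nonzero)
        bias = [s / mean_sum if s > 0 else 1.0 for s in sums]
        m = [[m[i][j] / (bias[i] * bias[j]) for j in range(n)] for i in range(n)]
        # Stop once every bias factor is (numerically) 1
        if max(abs(b - 1.0) for b in bias) < tol:
            break
    return m


raw = [[4.0, 2.0, 1.0],
       [2.0, 3.0, 1.0],
       [1.0, 1.0, 2.0]]
balanced = ice_normalize(raw)
print([sum(row) for row in balanced])
```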


Macrofungal diversity of a temperate oak forest: a test of species richness estimators

Canadian Journal of Botany ◽

10.1139/b99-055 ◽

1999 ◽

Vol 77 (7) ◽

pp. 1014-1027 ◽

Cited By ~ 14

Author(s):

John Paul Schmit ◽

John F Murphy ◽

Gregory M Mueller

Keyword(s):

Species Richness ◽

Species Diversity ◽

Deciduous Forest ◽

Hardwood Forests ◽

Oak Forest ◽

Species Richness Estimation ◽

Indiana Dunes ◽

Richness Estimators ◽

Diversity Estimates ◽

Better Than

Two 0.1-ha plots, each divided into 10 contiguous subplots, were established in a Quercus-dominated deciduous forest in the Indiana Dunes National Lakeshore. Macrofungi were surveyed on these plots at weekly intervals during the fruiting season over 3 years. During this survey, 177 species were recorded, including 30 species inhabiting leaf litter, 36 ectomycorrhizal species, 29 non-mycorrhizal soil-inhabiting species, and 79 wood-inhabiting species. This species richness is comparable to, but slightly higher than, that reported by other plot-based studies in hardwood forests. We compared the ability of seven species-richness estimation techniques to determine the true species richness on these plots. While some estimators performed better than others, the estimates were generally too low when checked against the following year's data and were not consistent from year to year. We found some evidence of spatial autocorrelation among the fungal communities of adjacent subplots. This indicates that the benefit of using contiguous subplots to increase the homogeneity of the sampled area must be balanced against the risk of underestimating an area's species richness because of spatial autocorrelation.

Key words: macrofungi, species diversity, diversity estimates, Indiana Dunes.
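The abstract does not name the seven estimators that were compared. As an illustration only, the sketch below computes two widely used incidence-based estimators suited to subplot data: the bias-corrected Chao2, S_obs + ((m − 1)/m)·Q1(Q1 − 1)/(2(Q2 + 1)), and the first-order jackknife, S_obs + Q1(m − 1)/m. Here Q1 and Q2 are the numbers of species found in exactly one and exactly two of the m samples. These are not necessarily among the estimators used in the study, and the function name is hypothetical.

```python
# Hypothetical helper: two common incidence-based richness estimators,
# not necessarily those compared in the study.


def richness_estimates(incidence):
    """Estimate species richness from presence/absence data.

    `incidence` is a list of samples (e.g. subplots), each a set of species
    names. Q1 and Q2 count species found in exactly one and exactly two samples.
    """
    m = len(incidence)
    counts = {}
    for sample in incidence:
        for species in sample:
            counts[species] = counts.get(species, 0) + 1
    s_obs = len(counts)
    q1 = sum(1 for c in counts.values() if c == 1)
    q2 = sum(1 for c in counts.values() if c == 2)
    # Bias-corrected Chao2 (stays finite when Q2 = 0)
    chao2 = s_obs + ((m - 1) / m) * q1 * (q1 - 1) / (2 * (q2 + 1))
    # First-order jackknife
    jack1 = s_obs + q1 * (m - 1) / m
    return {"S_obs": s_obs, "Chao2": chao2, "Jackknife1": jack1}


subplots = [{"a", "b", "c"}, {"a", "b"}, {"a", "d"}, {"a", "e"}]
print(richness_estimates(subplots))
```

These four subplots give S_obs = 5, Chao2 = 6.125 and Jackknife1 = 7.25. Both estimators push the observed count upward in proportion to how many species were seen only once.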


Denton’s Temporal Disaggregation Methods with the Application of Bangladesh’s Annual Gross Domestic Product for Quarterly Benchmarking and Forecasting

10.31124/advance.16437096 ◽

2021 ◽

Author(s):

Mohammad Rafiqul Islam

Keyword(s):

Gross Domestic Product ◽

R Package ◽

Original Method ◽

Base Metals ◽

Mathematical Methods ◽

The Real ◽

Mean Squared Errors ◽

Temporal Disaggregation ◽

Difference Methods ◽

Better Than

Different methods of temporal disaggregation are discussed in detail, mainly the methods developed by Denton in 1971 and other purely mathematical methods. First, Denton's original method and its solution are described with reference to Denton's original article. The Cholette–Dagum regression-based method (or Denton-Cholette method) is also included to enrich the comparison. Bangladesh's annual export figures are then disaggregated into a quarterly series by Denton's additive and proportional (first and second difference) methods and by the Denton-Cholette additive and proportional (first and second difference) methods, using the R package “tempdisagg”. Quarterly imports of capital goods and other items (iron, steel and other base metals; and capital machinery) in Bangladesh are used as the indicator series for fiscal years FY2009 to FY2019. By comparing the estimated series with the real quarterly export series using root mean squared errors, it is concluded that the Denton-Cholette additive (first difference) method performs better than the Denton-Cholette proportional variants as well as Denton's additive and proportional variants.
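The additive first-difference Denton-Cholette method can be sketched as a small constrained least-squares problem. The goal is to find quarterly values x that keep the quarter-to-quarter changes of the adjustment x − p as small as possible, where p is the indicator series, subject to each year's quarters summing to the annual benchmark. The Python sketch below solves the resulting linear (KKT) system directly. It is an independent illustration with hypothetical names, not the code of the R package tempdisagg used in the study.

```python
# Hypothetical sketch of additive first-difference Denton-Cholette
# benchmarking; not the tempdisagg implementation.


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def denton_cholette_additive(indicator, annual, per_year=4):
    """Minimise sum_t ((x_t - p_t) - (x_{t-1} - p_{t-1}))^2 so that each
    year's quarters sum to its annual benchmark; returns the quarterly x."""
    n, k = len(indicator), len(annual)
    if n != per_year * k:
        raise ValueError("indicator length must equal per_year * len(annual)")
    size = n + k
    a = [[0.0] * size for _ in range(size)]
    # Top-left block: D'D for the (n-1) x n first-difference matrix D
    for t in range(n - 1):
        a[t][t] += 1.0
        a[t + 1][t + 1] += 1.0
        a[t][t + 1] -= 1.0
        a[t + 1][t] -= 1.0
    # Constraint blocks: each year's adjustments u sum to (benchmark - indicator total)
    b = [0.0] * size
    for year in range(k):
        for q in range(per_year):
            t = year * per_year + q
            a[n + year][t] = 1.0
            a[t][n + year] = 1.0
        b[n + year] = annual[year] - sum(indicator[year * per_year:(year + 1) * per_year])
    u = solve_linear(a, b)[:n]  # drop the Lagrange multipliers
    return [p + d for p, d in zip(indicator, u)]


quarterly = denton_cholette_additive([10.0] * 8, [40.0, 48.0])
print([round(v, 3) for v in quarterly])
```

With a flat indicator of 10 per quarter and annual totals of 40 and 48, the adjustment rises steadily across all eight quarters, from about −0.45 to about +2.45, rather than jumping at the year boundary. Each year's total is matched exactly.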


Paleobiogeography, paleoecology, diversity, and speciation patterns in the Eublastoidea (Blastozoa: Echinodermata)

Paleobiology ◽

10.1017/pab.2020.27 ◽

2020 ◽

pp. 1-15

Author(s):

Jennifer E. Bauer

Keyword(s):

Evolutionary History ◽

R Package ◽

Sympatric Speciation ◽

Probabilistic Methods ◽

Global Changes ◽

Environmental Preferences ◽

Biogeographic Patterns ◽

Dispersal Events ◽

Future Work ◽

Diversity Dynamics

Understanding the distribution of taxa in space and time is key to understanding diversity dynamics. The fossil record provides an avenue to assess these patterns over vast timescales and through major global changes. The Eublastoidea were a conservatively plated Paleozoic echinoderm clade that ranged from the middle Silurian to the end-Permian. The geographic distribution of the eublastoids as a whole has been qualitatively assessed but has historically lacked a quantitative analysis. This is the first examination of the Eublastoidea using probabilistic methods, within the R package BioGeoBEARS, to assess macroevolutionary trends. The results provide an updated understanding of eublastoid diversity, with new peaks and troughs in diversity through their evolutionary history. Lithology is examined in an evolutionary framework and shows no clear evolutionary trends, and much work remains to be done on environmental preferences. Biogeographic patterns do not recover precise group origins but do support previous work that identifies Eublastoidea as a Laurentian clade. Sympatric speciation events dominate the clade's history but are likely exaggerated by the highly combined areas. Vicariance events are rare and restricted to the Silurian and Devonian, whereas dispersal events are more common throughout the clade's evolutionary history. Pathways allowing lineage migrations are noted between southern Laurussia and China in the Devonian and Carboniferous, and between southern Laurussia and eastern Gondwana in the Carboniferous. Future work will add more non-Laurentian species to the estimated phylogeny to better estimate these global patterns.
