Inference of Adaptive Shifts for Multivariate Correlated Traits

Mapping Intimacies ◽

10.1101/146191 ◽

2017 ◽

Cited By ~ 2

Author(s):

Paul Bastide ◽

Cécile Ané ◽

Stéphane Robin ◽

Mahendra Mariadassou

Keyword(s):

Phylogenetic Tree ◽

Missing Values ◽

Expectation Maximization Algorithm ◽

Selection Criterion ◽

Principal Component ◽

Likelihood Estimation ◽

R Package ◽

Stabilizing Selection ◽

New World Monkeys ◽

Wide Range

Abstract. To study the evolution of several quantitative traits, the classical phylogenetic comparative framework consists of a multivariate random process running along the branches of a phylogenetic tree. The Ornstein-Uhlenbeck (OU) process is sometimes preferred to the simple Brownian Motion (BM) as it models stabilizing selection toward an optimum. The optimum for each trait is likely to be changing over the long periods of time spanned by large modern phylogenies. Our goal is to automatically detect the position of these shifts on a phylogenetic tree, while accounting for correlations between traits, which might exist because of structural or evolutionary constraints. We show that, in the presence of shifts, phylogenetic Principal Component Analysis (pPCA) fails to decorrelate traits efficiently, so that any method aiming at finding shifts needs to deal with correlation simultaneously. We introduce here a simplification of the full multivariate OU model, named scalar OU (scOU), which allows for noncausal correlations and is still computationally tractable. We extend the equivalence between the OU and a BM on a re-scaled tree to our multivariate framework. We describe an Expectation Maximization algorithm that allows for a maximum likelihood estimation of the shift positions, associated with a new model selection criterion, accounting for the identifiability issues for the shift localization on the tree. The method, freely available as an R package (PhylogeneticEM), is fast and can deal with missing values. We demonstrate its efficiency and accuracy compared to another state-of-the-art method (ℓ1ou) on a wide range of simulated scenarios, and use this new framework to re-analyze recently gathered datasets on New World Monkeys and Anolis lizards.

Maximizing the Reliability of Cross-National Measures of Presidential Power

British Journal of Political Science ◽

10.1017/s0007123414000465 ◽

2014 ◽

Vol 46 (4) ◽

pp. 731-741 ◽

Cited By ~ 39

Author(s):

David Doyle ◽

Robert Elgie

Keyword(s):

Expectation Maximization ◽

Expectation Maximization Algorithm ◽

Principal Component ◽

Likelihood Estimation ◽

Standard Errors ◽

Presidential Power ◽

Single Measure ◽

Future Studies ◽

Time Periods ◽

Cross National

This article aims to maximize the reliability of presidential power scores for a larger number of countries and time periods than currently exists for any single measure, and in a way that is replicable and easy to update. It begins by identifying all of the studies that have estimated the effect of a presidential power variable, clarifying what scholars have attempted to capture when they have operationalized the concept of presidential power. It then identifies all the measures of presidential power that have been proposed over the years, noting the problems associated with each. To generate the new set of presidential power scores, the study draws upon the comparative and local knowledge embedded in existing measures of presidential power. Employing principal component analysis, together with the expectation maximization algorithm and maximum likelihood estimation, a set of presidential power scores is generated for a larger set of countries and country time periods than currently exists, reporting 95 per cent confidence intervals and standard errors for the scores. Finally, the implications of the new set of scores for future studies of presidential power are discussed.

Patternize: An R Package For Quantifying Color Pattern Variation

10.1101/121962 ◽

2017 ◽

Author(s):

Steven M. Van Belleghem ◽

Riccardo Papa ◽

Humberto Ortiz-Zuazaga ◽

Frederik Hendrickx ◽

Chris Jiggins ◽

...

Keyword(s):

Association Studies ◽

Genetic Association Studies ◽

Image Data ◽

Principal Component ◽

R Package ◽

Color Pattern ◽

Color Patterns ◽

Wide Range ◽

Potential Applications ◽

Population Comparisons

The use of image data to quantify, study and compare variation in the colors and patterns of organisms requires the alignment of images to establish homology, followed by color-based segmentation of images. Here we describe an R package for image alignment and segmentation that has applications to quantify color patterns in a wide range of organisms. patternize is an R package that quantifies variation in color patterns obtained from image data. patternize first defines homology between pattern positions across specimens either through manually placed homologous landmarks or automated image registration. Pattern identification is performed by categorizing the distribution of colors using an RGB threshold, k-means clustering or watershed transformation. We demonstrate that patternize can be used for quantification of the color patterns in a variety of organisms by analyzing image data for butterflies, guppies, spiders and salamanders. Image data can be compared between sets of specimens, visualized as heatmaps and analyzed using principal component analysis (PCA). patternize has potential applications for fine scale quantification of color pattern phenotypes in population comparisons, genetic association studies and investigating the basis of color pattern variation across a wide range of organisms.
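
The RGB-threshold step mentioned in the abstract above can be illustrated with a short sketch. This is Python rather than R and does not use the patternize API; the function names, the Euclidean distance in RGB space, the toy image and the cutoff value are all assumptions made for illustration.

```python
# Minimal sketch of RGB-threshold pattern extraction (not the patternize API).
# A pixel is assigned to the pattern when its Euclidean distance to a target
# colour in RGB space is at most `cutoff` (an assumed distance rule).

def rgb_threshold_mask(image, target, cutoff):
    """Return a nested list of booleans, True where a pixel matches `target`."""
    mask = []
    for row in image:
        mask.append([
            sum((c - t) ** 2 for c, t in zip(pixel, target)) ** 0.5 <= cutoff
            for pixel in row
        ])
    return mask


def pattern_fraction(mask):
    """Fraction of pixels assigned to the pattern."""
    total = sum(len(row) for row in mask)
    hits = sum(sum(row) for row in mask)
    return hits / total


# Hypothetical 2x2 image: two reddish pixels, one blue, one dark grey.
image = [
    [(255, 0, 0), (250, 10, 5)],
    [(0, 0, 255), (20, 20, 20)],
]
mask = rgb_threshold_mask(image, target=(255, 0, 0), cutoff=30)
print(mask)                    # [[True, True], [False, False]]
print(pattern_fraction(mask))  # 0.5
```

Once images are aligned, masks of this kind from several specimens could be summed pixel by pixel to obtain the heatmaps the abstract refers to.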

Missing Data - Better "Not to Have Them", but What If You Do? (Part 1)

Marketing ZFP ◽

10.15358/0344-1369-2019-4-21 ◽

2019 ◽

Vol 41 (4) ◽

pp. 21-32

Author(s):

Dirk Temme ◽

Sarah Jensen

Keyword(s):

Missing Data ◽

Statistical Power ◽

Missing Values ◽

Graphical Representation ◽

Marketing Research ◽

Likelihood Estimation ◽

Parameter Estimates ◽

Full Information Maximum Likelihood ◽

Definition Of ◽

Traditional Approaches

Missing values are ubiquitous in empirical marketing research. If missing data are not dealt with properly, this can lead to a loss of statistical power and distorted parameter estimates. While traditional approaches for handling missing data (e.g., listwise deletion) are still widely used, researchers can nowadays choose among various advanced techniques such as multiple imputation analysis or full-information maximum likelihood estimation. Due to the available software, using these modern missing data methods does not pose a major obstacle. Still, their application requires a sound understanding of the prerequisites and limitations of these methods as well as a deeper understanding of the processes that have led to missing values in an empirical study. This article is Part 1 and first introduces Rubin’s classical definition of missing data mechanisms and an alternative, variable-based taxonomy, which provides a graphical representation. Secondly, a selection of visualization tools available in different R packages for the description and exploration of missing data structures is presented.
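
The loss of information caused by listwise deletion, which the abstract above mentions as a traditional approach, can be seen in a small sketch. The records, the field names and the helper functions below are hypothetical; the example only contrasts listwise deletion with using every observed value of a variable, and does not implement multiple imputation or full-information maximum likelihood.

```python
from statistics import mean

# Hypothetical survey records; None marks a missing value.
rows = [
    {"age": 34, "spend": 120.0},
    {"age": None, "spend": 80.0},
    {"age": 51, "spend": None},
    {"age": 29, "spend": 60.0},
]


def listwise_delete(records):
    """Keep only the records in which every variable is observed."""
    return [r for r in records if all(v is not None for v in r.values())]


def available_case_mean(records, key):
    """Mean of one variable over all records where that variable is observed."""
    return mean(r[key] for r in records if r[key] is not None)


complete = listwise_delete(rows)
print(len(rows), len(complete))              # 4 2
print(mean(r["spend"] for r in complete))    # 90.0
print(available_case_mean(rows, "spend"))    # mean of 120, 80 and 60
```

Half of the records are discarded here although only two individual values are missing, and the two estimates of mean spend differ, which is the kind of power loss and distortion the abstract warns about.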

It is better an approximate answer to the right question than the exact answer to the wrong question : the case of the psychometric analysis of the ASQ:SE

10.31234/osf.io/a5tdf ◽

2020 ◽

Author(s):

Luis Anunciacao ◽

Janet Squires ◽

J. Landeira-Fernandez

Keyword(s):

Internal Structure ◽

Statistical Methods ◽

Principal Component ◽

Psychological Theory ◽

Published Data ◽

Multivariate Statistical ◽

Exact Answer ◽

Wide Range ◽

Ages And Stages Questionnaire ◽

The Right

One of the main activities in psychometrics is to analyze the internal structure of a test. Multivariate statistical methods, including Exploratory Factor analysis (EFA) and Principal Component Analysis (PCA) are frequently used to do this, but the growth of Network Analysis (NA) places this method as a promising candidate. The results obtained by these methods are of valuable interest, as they not only produce evidence to explore if the test is measuring its intended construct, but also to deal with the substantive theory that motivated the test development. However, these different statistical methods come up with different answers, providing the basis for different analytical and theoretical strategies when one needs to choose a solution. In this study, we took advantage of a large volume of published data (n = 22,331) obtained by the Ages and Stages Questionnaire Social-Emotional (ASQ:SE), and formed a subset of 500 children to present and discuss alternative psychometric solutions to its internal structure, and also to its subjacent theory. The analyses were based on a polychoric matrix, the number of factors to retain followed several well-known rules of thumb, and a wide range of exploratory methods was fitted to the data, including EFA, PCA, and NA. The statistical outcomes were divergent, varying from 1 to 6 domains, allowing a flexible interpretation of the results. We argue that the use of statistical methods in the absence of a well-grounded psychological theory has limited applications, despite its appeal. All data and codes are available at https://osf.io/z6gwv/.

Maintenance of Quantitative Genetic Variation

10.1093/oso/9780198830870.003.0028 ◽

2018 ◽

Author(s):

Bruce Walsh ◽

Michael Lynch

Keyword(s):

Genetic Variation ◽

Quantitative Genetics ◽

Stabilizing Selection ◽

Quantitative Genetic ◽

Wide Range

One of the major unresolved issues in quantitative genetics is what accounts for the amount of standing genetic variation in traits. A wide range of models, all reviewed in this chapter, have been proposed, but none fit the data, either giving too much variation or too little apparent stabilizing selection.

Adjustment models for multivariate geodetic time series with vector-autoregressive errors

Journal of Applied Geodesy ◽

10.1515/jag-2021-0013 ◽

2021 ◽

Vol 0 (0) ◽

Author(s):

Boris Kargoll ◽

Alexander Dorndorf ◽

Mohammad Omidalizarandi ◽

Jens-André Paffenholz ◽

Hamza Alkhatib

Keyword(s):

Least Squares Method ◽

Functional Model ◽

Expectation Maximization Algorithm ◽

Two Dimensions ◽

Vector Autoregressive ◽

Wide Range ◽

Engineering Geodesy ◽

Cross Correlations ◽

T Distribution ◽

Multivariate T

Abstract. In this contribution, a vector-autoregressive (VAR) process with multivariate t-distributed random deviations is incorporated into the Gauss-Helmert model (GHM), resulting in an innovative adjustment model. This model is versatile since it allows for a wide range of functional models, unknown forms of auto- and cross-correlations, and outlier patterns. Subsequently, a computationally convenient iteratively reweighted least squares method based on an expectation maximization algorithm is derived in order to estimate the parameters of the functional model, the unknown coefficients of the VAR process, the cofactor matrix, and the degree of freedom of the t-distribution. The proposed method is validated in terms of its estimation bias and convergence behavior by means of a Monte Carlo simulation based on a GHM of a circle in two dimensions. The methodology is applied in two different fields of application within engineering geodesy: In the first scenario, the offset and linear drift of a noisy accelerometer are estimated based on a Gauss-Markov model with VAR and multivariate t-distributed errors, as a special case of the proposed GHM. In the second scenario, real laser tracker measurements with outliers are adjusted to estimate the parameters of a sphere employing the proposed GHM with VAR and multivariate t-distributed errors. For both scenarios, the estimated parameters of the fitted VAR model and multivariate t-distribution are analyzed for evidence of auto- or cross-correlations and deviation from a normal distribution regarding the measurement noise.
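
The idea of iteratively reweighted least squares derived from an EM algorithm for t-distributed errors can be illustrated in a much reduced setting. The Python sketch below is not the authors' adjustment model: it assumes a univariate sample with no VAR process and no Gauss-Helmert functional model, and it holds the degree of freedom `nu` fixed instead of estimating it. The function name and the data are hypothetical, and the sample is assumed to contain at least two distinct values.

```python
# EM / IRLS sketch for the location and scale of a univariate Student t model
# with a fixed degree of freedom nu (a simplifying assumption).

def t_location_scale_em(x, nu=4.0, iters=200, tol=1e-10):
    n = len(x)
    mu = sum(x) / n                                # start from the sample mean
    sigma2 = sum((v - mu) ** 2 for v in x) / n     # and the sample variance
    w = [1.0] * n
    for _ in range(iters):
        # E-step: observations with large scaled residuals get small weights.
        w = [(nu + 1.0) / (nu + (v - mu) ** 2 / sigma2) for v in x]
        # M-step: weighted least squares update of location and scale.
        new_mu = sum(wi * v for wi, v in zip(w, x)) / sum(w)
        new_sigma2 = sum(wi * (v - new_mu) ** 2 for wi, v in zip(w, x)) / n
        converged = abs(new_mu - mu) < tol
        mu, sigma2 = new_mu, new_sigma2
        if converged:
            break
    return mu, sigma2, w


# Hypothetical measurements near 10 with one gross outlier.
data = [9.8, 10.1, 10.0, 9.9, 10.2, 25.0]
mu_hat, sigma2_hat, weights = t_location_scale_em(data)
print(sum(data) / len(data))  # 12.5, the ordinary mean is pulled by the outlier
print(mu_hat)                 # robust location estimate, closer to 10
print(weights[-1])            # the outlier receives the smallest weight
```

Each pass alternates a weight computation with a weighted least squares update, which is why such EM schemes are described as iteratively reweighted least squares.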

BloodGen3Module: Blood transcriptional module repertoire analysis and visualization using R

Bioinformatics ◽

10.1093/bioinformatics/btab121 ◽

2021 ◽

Author(s):

Darawan Rinchai ◽

Jessica Roelands ◽

Mohammed Toufiq ◽

Wouter Hendrickx ◽

Matthew C Altman ◽

...

Keyword(s):

Transcript Abundance ◽

R Package ◽

Supplementary Information ◽

Illustrative Case ◽

Bioinformatic Tools ◽

Transcriptional Module ◽

Wide Range ◽

Downstream Analysis ◽

Computing Module ◽

Parallel Workflow

Abstract. Motivation: We previously described the construction and characterization of generic and reusable blood transcriptional module repertoires. More recently we released a third iteration (“BloodGen3” module repertoire) that comprises 382 functionally annotated gene sets (modules) and encompasses 14,168 transcripts. Custom bioinformatic tools are needed to support downstream analysis, visualization and interpretation relying on such fixed module repertoires. Results: We have developed and describe here an R package, BloodGen3Module. The functions of our package permit group comparison analyses to be performed at the module level and the results to be displayed as annotated fingerprint grid plots. A parallel workflow for computing module repertoire changes for individual samples rather than groups of samples is also available; these results are displayed as fingerprint heatmaps. An illustrative case is used to demonstrate the steps involved in generating blood transcriptome repertoire fingerprints of septic patients. Taken together, this resource could facilitate the analysis and interpretation of changes in blood transcript abundance observed across a wide range of pathological and physiological states. Availability: The BloodGen3Module package and documentation are freely available from GitHub: https://github.com/Drinchai/BloodGen3Module Supplementary information: Supplementary data are available at Bioinformatics online.

Phenotypic Characterisation for Growth and Nut Characteristics Revealed the Extent of Genetic Diversity in Wild Macadamia Germplasm

Agriculture ◽

10.3390/agriculture11070680 ◽

2021 ◽

Vol 11 (7) ◽

pp. 680

Author(s):

Thuy T. P. Mai ◽

Craig M. Hardner ◽

Mobashwer M. Alam ◽

Robert J. Henry ◽

Bruce L. Topp

Keyword(s):

Genetic Diversity ◽

Growth Traits ◽

Principal Component ◽

Conservation Strategies ◽

Wild Germplasm ◽

Phenotypic Characterisation ◽

Wide Range ◽

Genetic And Environmental Factors ◽

Positive Correlations ◽

Wild Accessions

Macadamia is a recently domesticated Australian native nut crop, and a large proportion of its wild germplasm is unexploited. Aiming to explore the existing diversity, 247 wild accessions from four species and inter-specific hybrids were phenotyped. A wide range of variation was found in growth and nut traits. Broad-sense heritability of the traits was moderate (0.43–0.64), which suggested that both genetic and environmental factors are important for the variability of the traits. Correlations among the growth traits were significantly positive (0.49–0.76). There were significant positive correlations among the nut traits except for kernel recovery. The association between kernel recovery and shell thickness was highly significant and negative. Principal component analysis of the traits separated representative species groups. Accessions from Macadamia integrifolia Maiden and Betche, M. tetraphylla L.A.S. Johnson, and admixtures were clustered into one group and those of M. ternifolia F. Muell were separated into another group. In both M. integrifolia and M. tetraphylla groups, variation within site was greater than across sites, which suggested that the conservation strategies should concentrate on increased sampling within sites to capture wide genetic diversity. This study provides a background on the utilisation of wild germplasm as a genetic resource to be used in breeding programs and the direction for gene pool conservation.

stochprofML: stochastic profiling using maximum likelihood estimation in R

BMC Bioinformatics ◽

10.1186/s12859-021-03970-7 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Lisa Amrhein ◽

Christiane Fuchs

Keyword(s):

Maximum Likelihood ◽

Cell Fate ◽

Likelihood Estimation ◽

R Package ◽

Cell Populations ◽

Simulation Studies ◽

Likelihood Principle ◽

Maximum Likelihood Principle ◽

Molecular Expression ◽

Mixed Samples

Abstract. Background: Tissues are often heterogeneous in their single-cell molecular expression, and this can govern the regulation of cell fate. For the understanding of development and disease, it is important to quantify heterogeneity in a given tissue. Results: We present the R package stochprofML which uses the maximum likelihood principle to parameterize heterogeneity from the cumulative expression of small random pools of cells. We evaluate the algorithm’s performance in simulation studies and present further application opportunities. Conclusion: Stochastic profiling outweighs the necessary demixing of mixed samples with a saving in experimental cost and effort and less measurement error. It offers possibilities for parameterizing heterogeneity, estimating underlying pool compositions and detecting differences between cell populations between samples.

MUREN: a robust and multi-reference approach of RNA-seq transcript normalization

BMC Bioinformatics ◽

10.1186/s12859-021-04288-0 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Yance Feng ◽

Lei M. Li

Keyword(s):

Biological Significance ◽

Housekeeping Genes ◽

R Package ◽

Data Sets ◽

Statistical Regression ◽

Rna Seq ◽

Least Trimmed Squares ◽

Standard Data ◽

Wide Range ◽

Multiple References

Abstract. Background: Normalization of RNA-seq data aims at identifying biological expression differentiation between samples by removing the effects of unwanted confounding factors. Explicitly or implicitly, the justification of normalization requires a set of housekeeping genes. However, the existence of housekeeping genes common for a very large collection of samples, especially under a wide range of conditions, is questionable. Results: We propose to carry out pairwise normalization with respect to multiple references, selected from representative samples. Then the pairwise intermediates are integrated based on a linear model that adjusts the reference effects. Motivated by the notion of housekeeping genes and their statistical counterparts, we adopt the robust least trimmed squares regression in pairwise normalization. The proposed method (MUREN) is compared with other existing tools on some standard data sets. The goodness of normalization emphasizes preserving possible asymmetric differentiation, whose biological significance is exemplified by a single-cell data set on the cell cycle. MUREN is implemented as an R package. The code under license GPL-3 is available on the GitHub platform: github.com/hippo-yf/MUREN and on the conda platform: anaconda.org/hippo-yf/r-muren. Conclusions: MUREN performs the RNA-seq normalization using a two-step statistical regression induced from a general principle. We propose that the densities of pairwise differentiations are used to evaluate the goodness of normalization. MUREN adjusts the mode of differentiation toward zero while preserving the skewness due to biological asymmetric differentiation. Moreover, by robustly integrating pre-normalized counts with respect to multiple references, MUREN is immune to individual outlier samples.
