Colour vision models: a practical guide, some simulations, and colourvision R package

Mapping Intimacies ◽

10.1101/103754 ◽

2017 ◽

Cited By ~ 2

Author(s):

Felipe M. Gawryszewski

Keyword(s):

Colour Vision ◽

Uv Light ◽

Real Data ◽

Careful Analysis ◽

R Package ◽

Logistic Function ◽

Basic Knowledge ◽

List Type ◽

Colour Measurements ◽

Background Photon

Abstract Human colour vision differs from the vision of other animals. The most obvious differences are the number and type of photoreceptors in the retina. For example, while humans are insensitive to ultraviolet (UV) light, most non-mammal vertebrates and insects have colour vision that extends into the UV. The development of colour vision models allowed appraisals of colour vision independent of the human experience. These models are now widespread in ecology and evolution. Here I present a guide to colour vision modelling, run a series of simulations, and provide an R package – colourvision – to facilitate the use of colour vision models. I present the mathematical steps for calculation of the most commonly used colour vision models: the Chittka (1992) colour hexagon, the Endler & Mielke (2005) model, and the Vorobyev & Osorio (1998) linear and log-linear receptor noise limited (RNL) models. These models are then tested using identical simulated and real data. The data comprise reflectance spectra generated by a logistic function against an achromatic background, achromatic reflectance against an achromatic background, achromatic reflectance against a chromatic background, and real flower reflectance data against a natural background reflectance. When the specific requirements of each model are met, between-model results are, overall, qualitatively and quantitatively similar. However, under many common scenarios of colour measurements, models may generate spurious values and/or considerably different predictions. Models that log-transform data and use relative photoreceptor outputs are prone to generate unrealistic results when the stimulus photon catch is smaller than the background photon catch. Moreover, models may generate unrealistic results when the background is chromatic (e.g. leaf reflectance) and the stimulus is an achromatic low reflectance spectrum. Colour vision models are a valuable tool in several ecology and evolution subfields. Nonetheless, knowledge of model assumptions, careful analysis of model outputs, and basic knowledge of the calculations behind each model are crucial for appropriate model application and for the generation of meaningful and reproducible results. Other aspects of vision not incorporated into these models should be considered when drawing conclusions from model results.
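
The behaviour of log-transformed, background-relative photoreceptor outputs mentioned in this abstract can be illustrated with a minimal sketch. The Gaussian sensitivity curve, the flat illuminant, the parameter values, and all function names below are assumptions made for illustration; this is not the colourvision package API and not any of the published models in full.

```python
import math

def logistic_reflectance(wavelength, midpoint=550.0, steepness=0.05, maximum=0.8):
    """Reflectance rising logistically with wavelength (nm); parameters are illustrative."""
    return maximum / (1.0 + math.exp(-steepness * (wavelength - midpoint)))

def gaussian_sensitivity(wavelength, peak, width=40.0):
    """Hypothetical photoreceptor sensitivity curve (not a real pigment template)."""
    return math.exp(-0.5 * ((wavelength - peak) / width) ** 2)

def photon_catch(reflectance, peak, wavelengths):
    """Sum of reflectance x sensitivity, assuming a flat illuminant."""
    return sum(reflectance(w) * gaussian_sensitivity(w, peak) for w in wavelengths)

def log_relative_output(stimulus, background, peak, wavelengths):
    """Natural log of the stimulus catch relative to the background catch."""
    ratio = photon_catch(stimulus, peak, wavelengths) / photon_catch(background, peak, wavelengths)
    return math.log(ratio)

wavelengths = range(300, 701, 5)

def background(w):
    return 0.2   # achromatic background, 20% reflectance

def dark_grey(w):
    return 0.02  # achromatic stimulus darker than the background

# Stimulus catch below background catch: the log-relative output is negative.
uv_output = log_relative_output(dark_grey, background, 350.0, wavelengths)
# Logistic (long-wavelength) stimulus seen by a long-wavelength receptor: positive output.
lw_output = log_relative_output(logistic_reflectance, background, 600.0, wavelengths)
```

In this sketch a stimulus that reflects less than the background yields a negative receptor output, which is one way that downstream calculations expecting non-negative values could produce the unrealistic results the abstract warns about.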

Detection of differentially methylated CpG sites between tumor samples with uneven tumor purities

Bioinformatics ◽

10.1093/bioinformatics/btz885 ◽

2019 ◽

Vol 36 (7) ◽

pp. 2017-2024

Author(s):

Weiwei Zhang ◽

Ziyi Li ◽

Nana Wei ◽

Hua-Jun Wu ◽

Xiaoqi Zheng

Keyword(s):

Real Data ◽

R Package ◽

Differential Methylation ◽

Least Square ◽

Epigenetic Mechanism ◽

Supplementary Information ◽

Cpg Sites ◽

Tumor Purity ◽

Different Sources ◽

Normal Controls

Abstract Motivation Inference of differentially methylated (DM) CpG sites between two groups of tumor samples with different genotypes or phenotypes is a critical step in uncovering the epigenetic mechanism of tumorigenesis and identifying biomarkers for cancer subtyping. However, as a major confounding factor, uneven distributions of tumor purity between two groups of tumor samples will lead to biased discovery of DM sites if not properly accounted for. Results We here propose InfiniumDM, a generalized least squares model that adjusts for the effect of tumor purity in differential methylation analysis. Our method is applicable to a variety of experimental designs, including those with or without normal controls and those with different sources of normal tissue contamination. We compared our method with conventional methods including minfi, limma and limma corrected by tumor purity using simulated datasets. Our method shows significantly better performance at different levels of differential methylation thresholds, sample sizes, mean purity deviations and so on. We also applied the proposed method to breast cancer samples from the TCGA database to further evaluate its performance. Overall, both simulation and real data analyses demonstrate favorable performance over existing methods serving a similar purpose. Availability and implementation InfiniumDM is a part of the R package InfiniumPurify, which is freely available from GitHub (https://github.com/Xiaoqizheng/InfiniumPurify). Supplementary information Supplementary data are available at Bioinformatics online.
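
The confounding effect of uneven tumor purity described in this abstract can be illustrated with a small regression sketch. It uses ordinary least squares (the special case of generalized least squares with an identity covariance) on made-up toy data. The data, variable names, and helper functions are assumptions for illustration only; this is not the InfiniumDM model or the InfiniumPurify API.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def least_squares(rows, y):
    """Fit y = X b by solving the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

# Toy data: methylation depends on purity only, but the two groups differ in purity.
purity = [0.9, 0.8, 0.85, 0.5, 0.4, 0.45]
group = [0, 0, 0, 1, 1, 1]
methylation = [0.3 + 0.5 * p for p in purity]

# Naive comparison of group means suggests a difference that is purely a purity artefact.
naive_diff = sum(methylation[3:]) / 3 - sum(methylation[:3]) / 3

# Including purity as a covariate removes the spurious group effect.
design = [[1.0, float(g), p] for g, p in zip(group, purity)]
intercept, group_effect, purity_effect = least_squares(design, methylation)
```

Here the naive group difference is about -0.2 even though the groups do not differ once purity is accounted for, and the adjusted group coefficient is essentially zero.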

mixIndependR: a R package for statistical independence testing of loci in database of multi-locus genotypes

BMC Bioinformatics ◽

10.1186/s12859-020-03945-0 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Bing Song ◽

August E. Woerner ◽

John Planz

Keyword(s):

Population Genetics ◽

Linkage Disequilibrium ◽

Genetic Markers ◽

Software Package ◽

Tandem Repeats ◽

Population Data ◽

Real Data ◽

R Package ◽

Nucleotide Polymorphisms ◽

Mutual Independence

Abstract Background Multi-locus genotype data are widely used in population genetics and disease studies. In evaluating the utility of multi-locus data, the independence of markers is commonly considered in many genomic assessments. Generally, pairwise non-random associations are tested by linkage disequilibrium; however, the dependence within a panel might involve triplets, quartets, or larger sets of markers. Therefore, compatible and user-friendly software is necessary for testing and assessing the global linkage disequilibrium among mixed genetic data. Results This study describes a software package for testing the mutual independence of mixed genetic datasets. Mutual independence is defined as no non-random associations among all subsets of the tested panel. The new R package “mixIndependR” calculates basic genetic parameters like allele frequency, genotype frequency, heterozygosity, Hardy–Weinberg equilibrium, and linkage disequilibrium (LD) by mutual independence from population data, regardless of the type of markers, such as simple nucleotide polymorphisms, short tandem repeats, insertions and deletions, and any other genetic markers. A novel method of assessing the dependence of mixed genetic panels is developed in this study and functionally analyzed in the software package. By comparing the observed distribution of two common summary statistics (the number of heterozygous loci [K] and the number of shared alleles [X]) with their expected distributions under the assumption of mutual independence, the overall independence is tested. Conclusion The package “mixIndependR” is compatible with all categories of genetic markers and detects the overall non-random associations. Compared to pairwise disequilibrium, the approach described herein tends to have higher power, especially when the number of markers is large. With this package, more multi-functional or stronger genetic panels can be developed, like mixed panels with different kinds of markers. In population genetics, the package “mixIndependR” makes it possible to discover more about admixture of populations, natural selection, genetic drift, and population demographics, as a more powerful method of detecting LD. Moreover, this new approach can optimize variant selection in disease studies and contribute to panel combination for treatments in multimorbidity. Application of this approach to real data is expected in the future, and this might bring a leap in the field of genetic technology. Availability The R package mixIndependR is available on the Comprehensive R Archive Network (CRAN) at: https://cran.r-project.org/web/packages/mixIndependR/index.html.
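
The comparison of observed and expected distributions of the number of heterozygous loci (K) can be sketched as follows. Under mutual independence, K is a sum of independent per-locus Bernoulli trials, so its expected distribution is a Poisson-binomial distribution. The function names and the toy genotypes are assumptions for illustration, and the way mixIndependR itself computes these quantities may differ.

```python
def expected_k_distribution(heterozygosity):
    """P(K = k) for k = 0..L when each locus l is heterozygous independently with probability h_l."""
    dist = [1.0]
    for h in heterozygosity:
        nxt = [0.0] * (len(dist) + 1)
        for k, p in enumerate(dist):
            nxt[k] += p * (1.0 - h)   # this locus is homozygous
            nxt[k + 1] += p * h       # this locus is heterozygous
        dist = nxt
    return dist

def observed_k_distribution(genotypes):
    """Empirical distribution of heterozygous-locus counts across individuals."""
    n_loci = len(genotypes[0])
    counts = [0] * (n_loci + 1)
    for individual in genotypes:
        k = sum(1 for a, b in individual if a != b)
        counts[k] += 1
    return [c / len(genotypes) for c in counts]

# Toy panel: one SNP-like locus and one STR-like locus, four individuals.
genotypes = [
    [("A", "A"), ("12", "14")],
    [("A", "G"), ("12", "12")],
    [("A", "G"), ("12", "14")],
    [("G", "G"), ("14", "14")],
]

# Observed heterozygosity per locus, used as h_l in the expected distribution.
heterozygosity = [
    sum(1 for ind in genotypes if ind[locus][0] != ind[locus][1]) / len(genotypes)
    for locus in range(len(genotypes[0]))
]
expected = expected_k_distribution(heterozygosity)
observed = observed_k_distribution(genotypes)
```

A goodness-of-fit test between `observed` and `expected` would then indicate whether the panel departs from mutual independence; the choice of test is left open in this sketch.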

FIREcaller: Detecting Frequently Interacting Regions from Hi-C Data

10.1101/619288 ◽

2019 ◽

Cited By ~ 3

Author(s):

Cheynna Crowley ◽

Yuchen Yang ◽

Yunjiang Qiu ◽

Benxia Hu ◽

Armen Abnousi ◽

...

Keyword(s):

Gene Regulation ◽

Spatial Organization ◽

R Package ◽

Specific Gene ◽

List Type ◽

Cell Type ◽

R Software ◽

Computational Tools ◽

Cell Type Specific ◽

User Friendly

Abstract Hi-C experiments have been widely adopted to study chromatin spatial organization, which plays an essential role in genome function. We have recently identified frequently interacting regions (FIREs) and found that they are closely associated with cell-type-specific gene regulation. However, computational tools for detecting FIREs from Hi-C data are still lacking. In this work, we present FIREcaller, a stand-alone, user-friendly R package for detecting FIREs from Hi-C data. FIREcaller takes raw Hi-C contact matrices as input, performs within-sample and cross-sample normalization, and outputs continuous FIRE scores, dichotomous FIREs, and super-FIREs. Applying FIREcaller to Hi-C data from various human tissues, we demonstrate that FIREs and super-FIREs identified, in a tissue-specific manner, are closely related to gene regulation, are enriched for enhancer-promoter (E-P) interactions, tend to overlap with regions exhibiting epigenomic signatures of cis-regulatory roles, and aid the interpretation of GWAS variants. The FIREcaller package is implemented in R and freely available at https://yunliweb.its.unc.edu/FIREcaller.

Highlights
– Frequently Interacting Regions (FIREs) can be used to identify tissue and cell-type-specific cis-regulatory regions.
– An R software tool, FIREcaller, has been developed to identify FIREs and cluster FIREs into super-FIREs.

Efficient estimation of stereo thresholds: what slope should be assumed for the psychometric function?

10.31234/osf.io/rgak8 ◽

2019 ◽

Author(s):

Ignacio Serrano-Pedraza ◽

Kathleen Vancleef ◽

William Herbert ◽

Nicola Goodship ◽

Maeve Woodhouse ◽

...

Keyword(s):

Psychometric Function ◽

Population Distribution ◽

Real Data ◽

Logistic Function ◽

Efficient Estimation ◽

Threshold Estimate ◽

Bayesian Procedure ◽

Detection Thresholds ◽

Bayesian Procedures ◽

Wide Range

Bayesian staircases are widely used in psychophysics to estimate detection thresholds. Simulations have revealed the importance of the parameters selected for the assumed subject’s psychometric function in enabling thresholds to be estimated with small bias and high precision. One important parameter is the slope of the psychometric function, or equivalently its spread. This is often held fixed, rather than estimated for individual subjects, because much larger numbers of trials are required to estimate the spread as well as the threshold. However, if this fixed value is wrong, the threshold estimate can be biased. Here we determine the optimal slope to minimize bias and maximize precision when measuring stereoacuity with Bayesian staircases. We performed 2- and 4AFC disparity detection stereo experiments in order to measure the spread of the disparity psychometric function in human observers assuming a Logistic function. We found a wide range, between 0.03 and 3.5 log10 arcsec, with little change with age. We then ran simulations to examine the optimal spread using the real data. From our simulations and for three different experiments, we recommend selecting assumed spread values between the 60th and 80th percentiles of the population distribution of spreads (these percentiles can be extended to other types of thresholds). For stereo thresholds, we recommend a spread σ=1.7 log10 arcsec for 2AFC (slope 𝛽 = 4.3/log10 arcsec), and σ=1.5 log10 arcsec for 4AFC (𝛽 = 4.9/log10 arcsec). Finally, we compared a Bayesian procedure (ZEST using the optimal σ) with five Bayesian procedures that are versions of ZEST-2D, Psi, and Psi-marginal. In general, our recommended procedure showed the lowest threshold bias and highest precision.

Categorical Functional Data Analysis. The cfda R Package

Mathematics ◽

10.3390/math9233074 ◽

2021 ◽

Vol 9 (23) ◽

pp. 3074

Author(s):

Cristian Preda ◽

Quentin Grimonprez ◽

Vincent Vandewalle

Keyword(s):

Functional Data ◽

Multiple Correspondence Analysis ◽

Real Data ◽

Jump Process ◽

R Package ◽

Finite Basis ◽

Data Set ◽

Stochastic Jump ◽

Finite Set ◽

Infinite Set

Categorical functional data represented by paths of a stochastic jump process with continuous time and a finite set of states are considered. As an extension of the multiple correspondence analysis to an infinite set of variables, optimal encodings of states over time are approximated using an arbitrary finite basis of functions. This allows dimension reduction, optimal representation, and visualisation of data in lower dimensional spaces. The methodology is implemented in the cfda R package and is illustrated using a real data set in the clustering framework.

The asymptotic distribution of the Net Benefit estimator in presence of right-censoring

Statistical Methods in Medical Research ◽

10.1177/09622802211037067 ◽

2021 ◽

pp. 096228022110370

Author(s):

Brice Ozenne ◽

Esben Budtz-Jørgensen ◽

Julien Péron

Keyword(s):

Asymptotic Distribution ◽

Nuisance Parameter ◽

Real Data ◽

R Package ◽

Right Censoring ◽

Drop Out ◽

Net Benefit ◽

Asymptotic Results ◽

Finite Samples ◽

Benefit Risk Assessment

The benefit–risk balance is critical information when evaluating a new treatment. The Net Benefit has been proposed as a metric for the benefit–risk assessment, and applied in oncology to simultaneously consider gains in survival and possible side effects of chemotherapies. With complete data, one can construct a U-statistic estimator for the Net Benefit and obtain its asymptotic distribution using standard results of U-statistic theory. However, real data are often subject to right-censoring, e.g. patient drop-out in clinical trials. It is then possible to estimate the Net Benefit using a modified U-statistic, which involves the survival time. The latter can be seen as a nuisance parameter affecting the asymptotic distribution of the Net Benefit estimator. We present here how existing asymptotic results on U-statistics can be applied to estimate the distribution of the Net Benefit estimator, and assess their validity in finite samples. The methodology generalizes to other statistics obtained using generalized pairwise comparisons, such as the win ratio. It is implemented in the R package BuyseTest (version 2.3.0 and later) available on the Comprehensive R Archive Network.
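
For the complete-data case mentioned in this abstract, the U-statistic estimator can be sketched as an average over all treatment-control pairs. The sketch assumes that larger outcomes are better, that a pair is favourable when the treated outcome exceeds the control outcome by more than a threshold, and that there is no censoring. The function name, the strict inequality, and the toy data are assumptions for illustration and do not reproduce the BuyseTest API.

```python
def net_benefit(treatment, control, threshold=0.0):
    """Proportion of favourable pairs minus proportion of unfavourable pairs.

    A pair (x, y) is favourable if x - y > threshold, unfavourable if
    y - x > threshold, and neutral otherwise. Complete data only.
    """
    wins = 0
    losses = 0
    for x in treatment:
        for y in control:
            if x - y > threshold:
                wins += 1
            elif y - x > threshold:
                losses += 1
    n_pairs = len(treatment) * len(control)
    return (wins - losses) / n_pairs

# Toy survival times (months), fully observed.
treatment = [12, 15, 9]
control = [10, 8, 14]

nb_any_difference = net_benefit(treatment, control)               # any difference counts
nb_two_months = net_benefit(treatment, control, threshold=2.0)    # differences of more than 2 months
```

With right-censored data some pairs cannot be classified directly, which is where the modified U-statistic and the nuisance parameter described in the abstract come in; that case is not covered by this sketch.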

powmic: an R package for power assessment in microbiome case–control studies

Bioinformatics ◽

10.1093/bioinformatics/btaa197 ◽

2020 ◽

Vol 36 (11) ◽

pp. 3563-3565

Author(s):

Li Chen

Keyword(s):

Power Analysis ◽

Real Data ◽

Analytical Form ◽

R Package ◽

Case Control ◽

Supplementary Information ◽

Metagenomic Sequencing ◽

Case Control Studies ◽

Simulation Based ◽

Over Dispersion

Abstract Summary Power analysis is essential to decide the sample size of metagenomic sequencing experiments in a case–control study for identifying differentially abundant (DA) microbes. However, the complexity of microbial data characteristics, such as excessive zeros, over-dispersion, compositionality, intrinsic microbial correlations and variable sequencing depths, makes the power analysis particularly challenging because the analytical form is usually unavailable. Here, we develop a simulation-based power assessment strategy and R package powmic, which considers the complexity of microbial data characteristics. A real data example demonstrates the usage of powmic. Availability and implementation The powmic R package and online tutorial are available at https://github.com/lichen-lab/powmic. Supplementary information Supplementary data are available at Bioinformatics online.

SPIKY: a graphical user interface for monitoring spike train synchrony

Journal of Neurophysiology ◽

10.1152/jn.00848.2014 ◽

2015 ◽

Vol 113 (9) ◽

pp. 3432-3445 ◽

Cited By ~ 36

Author(s):

Thomas Kreuz ◽

Mario Mulansky ◽

Nebojsa Bozanic

Keyword(s):

User Interface ◽

Spike Train ◽

Graphical User Interface ◽

Large Scale ◽

Spatial Scales ◽

Real Data ◽

Basic Knowledge ◽

High Temporal Resolution ◽

Time Resolved ◽

Significance Levels

Techniques for recording large-scale neuronal spiking activity are developing very fast. This leads to an increasing demand for algorithms capable of analyzing large amounts of experimental spike train data. One of the most crucial and demanding tasks is the identification of similarity patterns with a very high temporal resolution and across different spatial scales. To address this task, in recent years three time-resolved measures of spike train synchrony have been proposed, the ISI-distance, the SPIKE-distance, and event synchronization. The Matlab source codes for calculating and visualizing these measures have been made publicly available. However, due to the many different possible representations of the results the use of these codes is rather complicated and their application requires some basic knowledge of Matlab. Thus it became desirable to provide a more user-friendly and interactive interface. Here we address this need and present SPIKY, a graphical user interface that facilitates the application of time-resolved measures of spike train synchrony to both simulated and real data. SPIKY includes implementations of the ISI-distance, the SPIKE-distance, and the SPIKE-synchronization (an improved and simplified extension of event synchronization) that have been optimized with respect to computation speed and memory demand. It also comprises a spike train generator and an event detector that makes it capable of analyzing continuous data. Finally, the SPIKY package includes additional complementary programs aimed at the analysis of large numbers of datasets and the estimation of significance levels.

blockCV: an R package for generating spatially or environmentally separated folds for k-fold cross-validation of species distribution models

10.1101/357798 ◽

2018 ◽

Cited By ~ 3

Author(s):

Roozbeh Valavi ◽

Jane Elith ◽

José J. Lahoz-Monfort ◽

Gurutzeta Guillera-Arroita

Keyword(s):

Species Distribution ◽

Cross Validation ◽

Species Distribution Models ◽

Predictive Performance ◽

R Package ◽

Species Distribution Modelling ◽

List Type ◽

Distribution Models ◽

Distribution Modelling ◽

Evaluation Approaches

Summary When applied to structured data, conventional random cross-validation techniques can lead to underestimation of prediction error, and may result in inappropriate model selection. We present the R package blockCV, a new toolbox for cross-validation of species distribution modelling. The package can generate spatially or environmentally separated folds. It includes tools to measure spatial autocorrelation ranges in candidate covariates, providing the user with insights into the spatial structure in these data. It also offers interactive graphical capabilities for creating spatial blocks and exploring data folds. Package blockCV enables modellers to more easily implement a range of evaluation approaches. It will help the modelling community learn more about the impacts of evaluation approaches on our understanding of predictive performance of species distribution models.
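
The idea of spatially separated folds can be sketched as follows: points are grouped into square grid blocks, and whole blocks (rather than individual points) are assigned to folds, so that nearby points never end up split between training and testing sets. The function name, the square-grid blocking, and the round-robin assignment are assumptions for illustration; they do not reproduce the blockCV API or its block-allocation options.

```python
import math

def spatial_block_folds(points, block_size, k):
    """Assign each (x, y) point to one of k folds via square spatial blocks.

    Every point in the same block receives the same fold. Blocks are
    allocated to folds round-robin in sorted order (a simple, assumed scheme).
    """
    def block_of(point):
        x, y = point
        return (math.floor(x / block_size), math.floor(y / block_size))

    blocks = sorted({block_of(p) for p in points})
    fold_of_block = {block: index % k for index, block in enumerate(blocks)}
    return [fold_of_block[block_of(p)] for p in points]

# Toy occurrence records in projected coordinates (arbitrary units).
points = [(0.5, 0.5), (0.7, 0.2), (5.1, 0.3), (5.4, 0.9), (0.2, 5.5), (5.8, 5.2)]
folds = spatial_block_folds(points, block_size=5.0, k=2)

# Indices of training and testing records for the first fold.
test_index = [i for i, f in enumerate(folds) if f == 0]
train_index = [i for i, f in enumerate(folds) if f != 0]
```

In practice the block size would be informed by the spatial autocorrelation range of the covariates, as the abstract describes; here it is simply fixed by hand.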

Joint species distribution modelling with HMSC-R

10.1101/603217 ◽

2019 ◽

Cited By ~ 3

Author(s):

Gleb Tikhonov ◽

Øystein Opedal ◽

Nerea Abrego ◽

Aleksi Lehikoinen ◽

Otso Ovaskainen

Keyword(s):

Community Ecology ◽

Species Distribution ◽

Phylogenetic Relationships ◽

Real Data ◽

Species Traits ◽

Species Distribution Modelling ◽

List Type ◽

Large Species ◽

Distribution Modelling ◽

Environmental Covariates

Abstract Joint Species Distribution Modelling (JSDM) is becoming an increasingly popular statistical method for analyzing data in community ecology. JSDMs allow the integration of community ecology data with data on environmental covariates, species traits, phylogenetic relationships, and the spatio-temporal context of the study, providing predictive insights into community assembly processes from non-manipulative observational data of species communities. Hierarchical Modelling of Species Communities (HMSC) is a general and flexible framework for fitting JSDMs, yet its full range of functionality has remained restricted to Matlab users only. To make HMSC accessible to the wider community of ecologists, we introduce HMSC-R 3.0, a user-friendly R implementation of the framework described in Ovaskainen et al. (Ecology Letters, 20 (5), 561-576, 2017) and further extended in several later publications. We illustrate the use of the package by providing a series of five vignettes that apply HMSC-R 3.0 to simulated and real data. HMSC-R applications to simulated data involve single-species models, models of small communities, and models of large species communities. They demonstrate the estimation of species responses to environmental covariates and how these depend on species traits, as well as the estimation of residual species associations. They further demonstrate how HMSC-R can be applied to normally distributed data, count data, and presence-absence data. The real data consist of bird counts in a spatio-temporally structured dataset, environmental covariates, species traits and phylogenetic relationships. The vignettes demonstrate how to construct and fit many kinds of models, how to examine MCMC convergence, how to examine the explanatory and predictive powers of the models, how to assess parameter estimates, and how to make predictions. The package, along with the extended vignettes, makes JSDM fitting and post-processing easily accessible to ecologists familiar with R.
