stratifyR: An R Package for optimal stratification and sample allocation for univariate populations

K. G. Reddy; M. G. M. Khan

doi:10.1111/anzs.12301

Optimal Sample Allocation Under Unequal Costs in Cluster-Randomized Trials

Journal of Educational and Behavioral Statistics ◽

10.3102/1076998620912418 ◽

2020 ◽

Vol 45 (4) ◽

pp. 446-474

Author(s):

Zuchao Shen ◽

Benjamin Kelcey

Keyword(s):

Optimal Design ◽

Statistical Power ◽

Randomized Trials ◽

Intraclass Correlation ◽

R Package ◽

Design Parameters ◽

Cluster Randomized Trials ◽

Cost Structures ◽

Cluster Randomized ◽

Sample Allocation

Conventional optimal design frameworks consider a narrow range of sampling cost structures that thereby constrict their capacity to identify the most powerful and efficient designs. We relax several constraints of previous optimal design frameworks by allowing for variable sampling costs in cluster-randomized trials. The proposed framework introduces additional design considerations and has the potential to identify designs with more statistical power, even when some parameters are constrained due to immutable practical concerns. The results also suggest that the gains in efficiency introduced through the expanded framework are fairly robust to misspecifications of the expanded cost structure and concomitant design parameters (e.g., intraclass correlation coefficient). The proposed framework is implemented in the R package odr.
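For orientation, the conventional equal-cost result that this abstract sets out to relax can be sketched in a few lines. This is a minimal illustration of the classical two-level formula only: it is not the expanded framework of the paper and not the API of the odr package. The function and parameter names are hypothetical, and the sketch assumes every cluster and every individual costs the same in both arms.

```python
import math


def optimal_cluster_size(icc, cost_per_cluster, cost_per_individual):
    """Classical optimal number of individuals per cluster.

    Assumes a two-level cluster-randomized trial with equal costs in the
    treatment and control arms (the conventional setting, not the paper's
    unequal-cost extension). Minimizes the treatment-effect variance
    subject to a fixed total budget.
    """
    if not 0 < icc < 1:
        raise ValueError("icc must lie strictly between 0 and 1")
    cost_ratio = cost_per_cluster / cost_per_individual
    return math.sqrt(cost_ratio * (1 - icc) / icc)


def affordable_clusters(budget, cluster_size, cost_per_cluster, cost_per_individual):
    """Number of clusters a budget covers at a given cluster size."""
    return budget / (cost_per_cluster + cost_per_individual * cluster_size)


# Hypothetical inputs: ICC of 0.2, a cluster costs 4x an individual.
n_star = optimal_cluster_size(icc=0.2, cost_per_cluster=400, cost_per_individual=100)
j_star = affordable_clusters(20000, n_star, 400, 100)
```

A higher intraclass correlation or cheaper clusters pushes the optimum toward more clusters with fewer individuals in each, which is the trade-off the unequal-cost framework generalizes.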

Making External Validation Valid for Molecular Classifier Development

JCO Precision Oncology ◽

10.1200/po.21.00103 ◽

2021 ◽

pp. 1250-1258

Author(s):

Yilin Wu ◽

Huei-Chung Huang ◽

Li-Xuan Qin

Keyword(s):

Test Data ◽

Signal To Noise Ratio ◽

External Validation ◽

R Package ◽

Specific Method ◽

Data Sets ◽

Precision Oncology ◽

Data Normalization ◽

Sample Allocation ◽

Molecular Classifier

PURPOSE Accurate assessment of a molecular classifier that guides patient care is of paramount importance in precision oncology. Recent years have seen an increasing use of external validation for such assessment. However, little is known about how it is affected by ubiquitous unwanted variations in test data because of disparate experimental handling and by the use of data normalization for alleviating such variations. METHODS In this paper, we studied these issues using two microarray data sets for the same set of tumor samples and additional data simulated by resampling under various levels of signal-to-noise ratio and different designs for array-to-sample allocation. RESULTS We showed that (1) unwanted variations can lead to biased classifier assessment and (2) data normalization mitigates the bias to varying extents depending on the specific method used. In particular, frozen normalization methods for test data outperform their conventional forms in terms of both reducing the bias in accuracy estimation and increasing robustness to handling effects. We make available our benchmarking tool as an R package on GitHub for performing such evaluation on additional methods for normalization and classification. CONCLUSION Our findings thus highlight the importance of proper test-data normalization for valid assessment by external validation and call for caution on the choice of normalization method for molecular classifier development.

Optimal Design of Cluster- and Multisite-Randomized Studies Using Fallible Outcome Measures

Evaluation Review ◽

10.1177/0193841x19870878 ◽

2019 ◽

Vol 43 (3-4) ◽

pp. 189-225 ◽

Cited By ~ 1

Author(s):

Kyle Cox ◽

Benjamin Kelcey

Keyword(s):

Measurement Error ◽

Statistical Power ◽

Outcome Measurement ◽

Evaluation Studies ◽

Empirical Studies ◽

R Package ◽

Detectable Effect ◽

Sampling Schemes ◽

Sample Allocation ◽

Level Cluster

Background: Evaluation studies frequently draw on fallible outcomes that contain significant measurement error. Ignoring outcome measurement error in the planning stages can undermine the sufficiency and efficiency of an otherwise well-designed study and can further constrain the evidence studies bring to bear on the effectiveness of programs. Objectives: We develop simple formulas to adjust statistical power, minimum detectable effect (MDE), and optimal sample allocation formulas for two-level cluster- and multisite-randomized designs when the outcome is subject to measurement error. Results: The resulting adjusted formulas suggest that outcome measurement error typically amplifies treatment effect uncertainty, reduces power, increases the MDE, and undermines the efficiency of conventional optimal sampling schemes. Therefore, achieving adequate power for a given effect size will typically demand increased sample sizes when considering fallible outcomes, while maintaining design efficiency will require increasing portions of a budget be applied toward sampling a larger number of individuals within clusters. We illustrate evaluation planning with the new formulas while comparing them to conventional formulas using hypothetical examples based on recent empirical studies. To encourage adoption of the new formulas, we implement them in the R package PowerUpR and in the PowerUp software.

Marine halogenated compound analysis: from an R package to the isolation of new griseophenone derivatives

Planta Medica ◽

10.1055/s-0036-1596648 ◽

2016 ◽

Vol 81 (S 01) ◽

pp. S1-S381

Author(s):

C Roullier ◽

Y Guitton ◽

S Prado ◽

O Grovel ◽

YF Pouchus

Keyword(s):

R Package ◽

Halogenated Compound

The Orchard Plot: Cultivating a Forest Plot for Use in Ecology, Evolution and Beyond

10.32942/osf.io/epqa7 ◽

2019 ◽

Author(s):

Shinichi Nakagawa ◽

Malgorzata Lagisz ◽

Rose E O'Dea ◽

Joanna Rutkowska ◽

Yefeng Yang ◽

...

Keyword(s):

Meta Analysis ◽

R Package ◽

Effect Sizes ◽

Forest Plot ◽

Point Estimates ◽

Aggregate Effect ◽

The Individual ◽

Meta Analyses ◽

Heterogeneous Effect ◽

Intuitive Interpretation

‘Classic’ forest plots show the effect sizes from individual studies and the aggregate effect from a meta-analysis. However, in ecology and evolution, meta-analyses routinely contain over 100 effect sizes, making the classic forest plot of limited use. We surveyed 102 meta-analyses in ecology and evolution, finding that only 11% use the classic forest plot. Instead, most used a ‘forest-like plot’, showing point estimates (with 95% confidence intervals; CIs) from a series of subgroups or categories in a meta-regression. We propose a modification of the forest-like plot, which we name the ‘orchard plot’. Orchard plots, in addition to showing overall mean effects and CIs from meta-analyses/regressions, also include 95% prediction intervals (PIs) and the individual effect sizes scaled by their precision. The PI allows the user and reader to see the range in which an effect size from a future study may be expected to fall. The PI, therefore, provides an intuitive interpretation of any heterogeneity in the data. Supplementing the PI, the inclusion of underlying effect sizes also allows the user to see any influential or outlying effect sizes. We showcase the orchard plot with example datasets from ecology and evolution, using the R package orchard, which includes several functions for visualizing meta-analytic data using forest-plot derivatives. We consider the orchard plot a variant of the classic forest plot, cultivated to the needs of meta-analysts in ecology and evolution. Hopefully, the orchard plot will prove fruitful for visualizing large collections of heterogeneous effect sizes regardless of the field of study.
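The difference between a confidence interval and a prediction interval can be made concrete with a small sketch. It assumes a standard random-effects setting in which the prediction interval widens the confidence interval by the between-study variance, and it uses a normal quantile for simplicity; the orchard package may compute its intervals differently (for example with a t quantile). All names and input values below are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def ci_and_pi(mean_effect, se_mean, tau2, level=0.95):
    """Return (confidence interval, prediction interval) for a pooled effect.

    mean_effect: pooled mean effect size
    se_mean:     standard error of the pooled mean
    tau2:        between-study variance (heterogeneity)
    Uses a normal approximation, which is an assumption of this sketch.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    ci_half = z * se_mean
    # The PI adds between-study variance to the uncertainty in the mean.
    pi_half = z * sqrt(tau2 + se_mean ** 2)
    ci = (mean_effect - ci_half, mean_effect + ci_half)
    pi = (mean_effect - pi_half, mean_effect + pi_half)
    return ci, pi


# Hypothetical values: pooled effect 0.3, SE 0.05, heterogeneity 0.04.
ci, pi = ci_and_pi(0.3, 0.05, 0.04)
```

With no heterogeneity (`tau2 = 0`) the two intervals coincide; as heterogeneity grows, the prediction interval widens while the confidence interval stays the same.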

Faculty Opinions recommendation of TIMMA-R: an R package for predicting synergistic multi-targeted drug combinations in cancer cell lines or patient-derived samples.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.725335383.793503769 ◽

2015 ◽

Author(s):

Zdeněk Valenta

Keyword(s):

Cell Lines ◽

Cancer Cell ◽

R Package ◽

Cancer Cell Lines ◽

Drug Combinations ◽

Targeted Drug

Systematics, Biogeography, and Morphological Character Evolution of the Hemiepiphytic Subfamily Monsteroideae (Araceae)

Annals of the Missouri Botanical Garden ◽

10.3417/2018269 ◽

2019 ◽

Vol 104 (1) ◽

pp. 33-48 ◽

Cited By ~ 4

Author(s):

Alejandro Zuluaga ◽

Martin Llano ◽

Ken Cameron

Keyword(s):

DNA Sequences ◽

Pacific Islands ◽

R Package ◽

Morphological Characters ◽

Ancestral State ◽

Seed Shape ◽

Long Distance ◽

The Pacific ◽

And Migration ◽

The Tropics

The subfamily Monsteroideae (Araceae) is the third richest clade in the family, with ca. 369 described species and ca. 700 estimated. It comprises mostly hemiepiphytic or epiphytic plants restricted to the tropics, with three intercontinental disjunctions. Using a dataset representing all 12 genera in Monsteroideae (126 taxa), and five plastid and two nuclear markers, we studied the systematics and historical biogeography of the group. We found high support for the monophyly of the three major clades (Spathiphylleae sister to Heteropsis Kunth and Rhaphidophora Hassk. clades), and for six of the genera within Monsteroideae. However, we found low rates of variation in the DNA sequences used and a lack of molecular markers suitable for species-level phylogenies in the group. We also performed ancestral state reconstruction of some morphological characters traditionally used for genera delimitation. Only seed shape and size, number of seeds, number of locules, and presence of endosperm showed utility in the classification of genera in Monsteroideae. We estimated ancestral ranges using a dispersal-extinction-cladogenesis model as implemented in the R package BioGeoBEARS and found evidence for a Gondwanan origin of the clade. One tropical disjunction (Monstera Adans. sister to Amydrium Schott–Epipremnum Schott) was found to be the product of a previous Boreotropical distribution. Two other disjunctions are more recent and likely due to long-distance dispersal: Spathiphyllum Schott (with Holochlamys Engl. nested within) represents a dispersal from South America to the Pacific Islands in Southeast Asia, and Rhaphidophora represents a dispersal from Asia to Africa. Future studies based on stronger phylogenetic reconstructions and complete morphological datasets are needed to explore the details of speciation and migration within and among areas in Asia.

WeibullR: An R Package for Weibull Analysis for Reliability Engineers

New Developments in Data Science and Data Analytics ◽

10.35566/isdsa2019c3 ◽

2020 ◽

Author(s):

David Silkworth

Keyword(s):

R Package ◽

Weibull Analysis

LRMoE: An R Package for Flexible Actuarial Loss Modelling Using Mixture of Experts Regression Model

SSRN Electronic Journal ◽

10.2139/ssrn.3740215 ◽

2020 ◽

Author(s):

Spark C. Tseung ◽

Andrei Badescu ◽

Tsz Chai Fung ◽

Xiaodong Sheldon Lin

Keyword(s):

Regression Model ◽

R Package ◽

Mixture Of Experts ◽

Loss Modelling

EpiPen: An R Package to Investigate Two-Locus Epistatic Models

Twin Research and Human Genetics ◽

10.1017/thg.2014.25 ◽

2014 ◽

Vol 17 (4) ◽

Cited By ~ 2

Author(s):

Raymond K. Walters ◽

Charles Laurin ◽

Gitta H. Lubke

Keyword(s):

Power Analysis ◽

R Package ◽

Simulation Studies ◽

Nucleotide Polymorphisms ◽

Single Nucleotide ◽

Epistatic Interactions ◽

Model Interpretation ◽

Genome Wide ◽

Using Data ◽

Power Analyses

Epistasis is a growing area of research in genome-wide studies, but the differences between alternative definitions of epistasis remain a source of confusion for many researchers. One problem is that models for epistasis are presented in a number of formats, some of which have difficult-to-interpret parameters. In addition, the relation between the different models is rarely explained. Existing software for testing epistatic interactions between single-nucleotide polymorphisms (SNPs) does not provide the flexibility to compare the available model parameterizations. For that reason we have developed an R package for investigating epistatic and penetrance models, EpiPen, to aid users who wish to easily compare, interpret, and utilize models for two-locus epistatic interactions. EpiPen facilitates research on SNP-SNP interactions by allowing the R user to easily convert between common parametric forms for two-locus interactions, generate data for simulation studies, and perform power analyses for the selected model with a continuous or dichotomous phenotype. The usefulness of the package for model interpretation and power analysis is illustrated using data on rheumatoid arthritis.