stratifyR: An R Package for optimal stratification and sample allocation for univariate populations

2020 ◽  
Vol 62 (3) ◽  
pp. 383-405
Author(s):  
K. G. Reddy ◽  
M. G. M. Khan
2020 ◽  
Vol 45 (4) ◽  
pp. 446-474
Author(s):  
Zuchao Shen ◽  
Benjamin Kelcey

Conventional optimal design frameworks consider only a narrow range of sampling cost structures, which limits their capacity to identify the most powerful and efficient designs. We relax several constraints of previous optimal design frameworks by allowing for variable sampling costs in cluster-randomized trials. The proposed framework introduces additional design considerations and has the potential to identify designs with more statistical power, even when some parameters are constrained due to immutable practical concerns. The results also suggest that the gains in efficiency introduced through the expanded framework are fairly robust to misspecifications of the expanded cost structure and concomitant design parameters (e.g., intraclass correlation coefficient). The proposed framework is implemented in the R package odr.
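
For orientation, the conventional fixed-cost baseline that this abstract describes relaxing can be sketched in a few lines. The sketch assumes a two-level cluster-randomized design with one cost per cluster and one cost per individual that do not vary across treatment arms. It uses the classic result that the optimal number of individuals per cluster is sqrt((cluster cost / individual cost) * (1 - ICC) / ICC). The function and parameter names are hypothetical; this is not the odr package's API, and it does not implement the variable-cost framework itself.

```python
import math


def optimal_cluster_size(cost_per_cluster, cost_per_individual, icc):
    """Classic fixed-cost optimum for individuals sampled per cluster.

    Assumes equal costs across treatment arms (the restriction the
    abstract above describes relaxing). All names are illustrative.
    """
    if not 0 < icc < 1:
        raise ValueError("icc must lie strictly between 0 and 1")
    if cost_per_cluster <= 0 or cost_per_individual <= 0:
        raise ValueError("costs must be positive")
    cost_ratio = cost_per_cluster / cost_per_individual
    return math.sqrt(cost_ratio * (1 - icc) / icc)


def affordable_clusters(budget, cost_per_cluster, cost_per_individual, n):
    """Number of clusters a total budget supports at cluster size n."""
    return budget / (cost_per_cluster + n * cost_per_individual)


if __name__ == "__main__":
    n_star = optimal_cluster_size(160, 10, 0.5)
    print(n_star, affordable_clusters(10_000, 160, 10, n_star))
```

In practice the cluster size would be rounded to an integer, and the resulting power checked separately.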


2021 ◽  
pp. 1250-1258
Author(s):  
Yilin Wu ◽  
Huei-Chung Huang ◽  
Li-Xuan Qin

PURPOSE Accurate assessment of a molecular classifier that guides patient care is of paramount importance in precision oncology. Recent years have seen an increasing use of external validation for such assessment. However, little is known about how it is affected by ubiquitous unwanted variations in test data because of disparate experimental handling and by the use of data normalization for alleviating such variations. METHODS In this paper, we studied these issues using two microarray data sets for the same set of tumor samples and additional data simulated by resampling under various levels of signal-to-noise ratio and different designs for array-to-sample allocation. RESULTS We showed that (1) unwanted variations can lead to biased classifier assessment and (2) data normalization mitigates the bias to varying extents depending on the specific method used. In particular, frozen normalization methods for test data outperform their conventional forms in terms of both reducing the bias in accuracy estimation and increasing robustness to handling effects. We make available our benchmarking tool as an R package on GitHub for performing such evaluation on additional methods for normalization and classification. CONCLUSION Our findings thus highlight the importance of proper test-data normalization for valid assessment by external validation and call for caution on the choice of normalization method for molecular classifier development.


2019 ◽  
Vol 43 (3-4) ◽  
pp. 189-225 ◽  
Author(s):  
Kyle Cox ◽  
Benjamin Kelcey

Background: Evaluation studies frequently draw on fallible outcomes that contain significant measurement error. Ignoring outcome measurement error in the planning stages can undermine the sufficiency and efficiency of an otherwise well-designed study and can further constrain the evidence studies bring to bear on the effectiveness of programs. Objectives: We develop simple formulas to adjust statistical power, minimum detectable effect (MDE), and optimal sample allocation formulas for two-level cluster- and multisite-randomized designs when the outcome is subject to measurement error. Results: The resulting adjusted formulas suggest that outcome measurement error typically amplifies treatment effect uncertainty, reduces power, increases the MDE, and undermines the efficiency of conventional optimal sampling schemes. Therefore, achieving adequate power for a given effect size will typically demand increased sample sizes when considering fallible outcomes, while maintaining design efficiency will require increasing portions of a budget be applied toward sampling a larger number of individuals within clusters. We illustrate evaluation planning with the new formulas while comparing them to conventional formulas using hypothetical examples based on recent empirical studies. To encourage adoption of the new formulas, we implement them in the R package PowerUpR and in the PowerUp software.


Planta Medica ◽  
2016 ◽  
Vol 81 (S 01) ◽  
pp. S1-S381
Author(s):  
C Roullier ◽  
Y Guitton ◽  
S Prado ◽  
O Grovel ◽  
YF Pouchus

2019 ◽  
Author(s):  
Shinichi Nakagawa ◽  
Malgorzata Lagisz ◽  
Rose E O'Dea ◽  
Joanna Rutkowska ◽  
Yefeng Yang ◽  
...  

‘Classic’ forest plots show the effect sizes from individual studies and the aggregate effect from a meta-analysis. However, in ecology and evolution, meta-analyses routinely contain over 100 effect sizes, making the classic forest plot of limited use. We surveyed 102 meta-analyses in ecology and evolution, finding that only 11% use the classic forest plot. Instead, most used a ‘forest-like plot’, showing point estimates (with 95% confidence intervals; CIs) from a series of subgroups or categories in a meta-regression. We propose a modification of the forest-like plot, which we name the ‘orchard plot’. Orchard plots, in addition to showing overall mean effects and CIs from meta-analyses/regressions, also include 95% prediction intervals (PIs) and the individual effect sizes scaled by their precision. The PI allows the user and reader to see the range in which an effect size from a future study may be expected to fall. The PI, therefore, provides an intuitive interpretation of any heterogeneity in the data. Supplementing the PI, the inclusion of underlying effect sizes also allows the user to see any influential or outlying effect sizes. We showcase the orchard plot with example datasets from ecology and evolution, using the R package orchard, which includes several functions for visualizing meta-analytic data using forest-plot derivatives. We consider the orchard plot a variant of the classic forest plot, cultivated to the needs of meta-analysts in ecology and evolution. Hopefully, the orchard plot will prove fruitful for visualizing large collections of heterogeneous effect sizes regardless of the field of study.
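
The difference between a confidence interval and a prediction interval in the abstract above can be illustrated with a minimal sketch. It assumes a random-effects model in which the pooled mean, its standard error, and the between-study variance (tau squared) have already been estimated. The interval is built from a normal critical value for simplicity, whereas a t quantile is commonly used in practice. The function names are hypothetical and do not come from the orchard package.

```python
import math


def confidence_interval(mean, se_mean, crit=1.96):
    """Interval for the pooled mean effect (uncertainty in the mean only)."""
    half_width = crit * se_mean
    return mean - half_width, mean + half_width


def prediction_interval(mean, se_mean, tau2, crit=1.96):
    """Interval for the effect expected in a new study.

    Adds the between-study variance tau2 to the squared standard error,
    so it is never narrower than the confidence interval.
    crit=1.96 is a simplifying normal approximation (an assumption here).
    """
    if tau2 < 0:
        raise ValueError("tau2 must be non-negative")
    half_width = crit * math.sqrt(tau2 + se_mean ** 2)
    return mean - half_width, mean + half_width


if __name__ == "__main__":
    print(confidence_interval(0.5, 0.3))
    print(prediction_interval(0.5, 0.3, 0.16))
```

When tau2 is zero the two intervals coincide; the more heterogeneity, the further the prediction interval extends beyond the confidence interval.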


2019 ◽  
Vol 104 (1) ◽  
pp. 33-48 ◽  
Author(s):  
Alejandro Zuluaga ◽  
Martin Llano ◽  
Ken Cameron

The subfamily Monsteroideae (Araceae) is the third richest clade in the family, with ca. 369 described species and ca. 700 estimated. It comprises mostly hemiepiphytic or epiphytic plants restricted to the tropics, with three intercontinental disjunctions. Using a dataset representing all 12 genera in Monsteroideae (126 taxa), and five plastid and two nuclear markers, we studied the systematics and historical biogeography of the group. We found high support for the monophyly of the three major clades (Spathiphylleae sister to Heteropsis Kunth and Rhaphidophora Hassk. clades), and for six of the genera within Monsteroideae. However, we found low rates of variation in the DNA sequences used and a lack of molecular markers suitable for species-level phylogenies in the group. We also performed ancestral state reconstruction of some morphological characters traditionally used for genera delimitation. Only seed shape and size, number of seeds, number of locules, and presence of endosperm showed utility in the classification of genera in Monsteroideae. We estimated ancestral ranges using a dispersal-extinction-cladogenesis model as implemented in the R package BioGeoBEARS and found evidence for a Gondwanan origin of the clade. One tropical disjunction (Monstera Adans. sister to Amydrium Schott–Epipremnum Schott) was found to be the product of a previous Boreotropical distribution. Two other disjunctions are more recent and likely due to long-distance dispersal: Spathiphyllum Schott (with Holochlamys Engl. nested within) represents a dispersal from South America to the Pacific Islands in Southeast Asia, and Rhaphidophora represents a dispersal from Asia to Africa. Future studies based on stronger phylogenetic reconstructions and complete morphological datasets are needed to explore the details of speciation and migration within and among areas in Asia.


2020 ◽  
Author(s):  
Spark C. Tseung ◽  
Andrei Badescu ◽  
Tsz Chai Fung ◽  
Xiaodong Sheldon Lin

2014 ◽  
Vol 17 (4) ◽  
Author(s):  
Raymond K. Walters ◽  
Charles Laurin ◽  
Gitta H. Lubke

Epistasis is a growing area of research in genome-wide studies, but the differences between alternative definitions of epistasis remain a source of confusion for many researchers. One problem is that models for epistasis are presented in a number of formats, some of which have difficult-to-interpret parameters. In addition, the relation between the different models is rarely explained. Existing software for testing epistatic interactions between single-nucleotide polymorphisms (SNPs) does not provide the flexibility to compare the available model parameterizations. For that reason we have developed an R package for investigating epistatic and penetrance models, EpiPen, to aid users who wish to easily compare, interpret, and utilize models for two-locus epistatic interactions. EpiPen facilitates research on SNP-SNP interactions by allowing the R user to easily convert between common parametric forms for two-locus interactions, generate data for simulation studies, and perform power analyses for the selected model with a continuous or dichotomous phenotype. The usefulness of the package for model interpretation and power analysis is illustrated using data on rheumatoid arthritis.

