Optimal auxiliary-covariate-based two-phase sampling design for semiparametric efficient estimation of a mean or mean difference, with application to clinical trials

2013 ◽  
Vol 33 (6) ◽  
pp. 901-917 ◽  
Author(s):  
Peter B. Gilbert ◽  
Xuesong Yu ◽  
Andrea Rotnitzky
2020 ◽  
Vol 5 (6) ◽  
pp. 7604-7622 ◽  
Author(s):  
Yasir Hassan ◽  
Muhammad Ismail ◽  
Will Murray ◽  
Muhammad Qaiser Shahbaz ◽  
...  

2012 ◽  
Vol 3 (4) ◽  
pp. 721-730 ◽  
Author(s):  
Krishna Pacifici ◽  
Robert M. Dorazio ◽  
Michael J. Conroy

2017 ◽  
Vol 13 (2) ◽  
pp. 5-28 ◽  
Author(s):  
P. Parichha ◽  
K. Basu ◽  
A. Bandyopadhyay ◽  
P. Mukhopadhyay

Abstract The present investigation deals with estimating a population mean in two-phase (double) sampling. Using information on two auxiliary variables, a chain exponential ratio and regression type estimator is proposed, and its properties are studied under two different structures of two-phase sampling. To make the estimator practicable, an unbiased version of the proposed strategy is also developed. The dominance of the suggested estimator over some contemporary estimators of the population mean is established through numerical illustrations on data from a natural population and an artificially generated population. The dominance ranges of the proposed estimation strategies are categorized using defuzzification tools, followed by suitable recommendations.
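For orientation, the textbook double-sampling ratio estimator and an exponential ratio-type estimator with a single auxiliary variable can be sketched as below. This is not the chain estimator proposed in the abstract above, whose form is not given here; the function names and the toy data are assumptions made for illustration only.

```python
import math


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def ratio_estimate(y_phase2, x_phase2, x_phase1):
    """Classical double-sampling ratio estimator: ybar2 * xbar1 / xbar2.

    x_phase1 holds the cheap auxiliary variable from the large first-phase
    sample; y_phase2 and x_phase2 come from the smaller second-phase subsample.
    """
    return mean(y_phase2) * mean(x_phase1) / mean(x_phase2)


def exp_ratio_estimate(y_phase2, x_phase2, x_phase1):
    """Exponential ratio-type estimator: ybar2 * exp((xbar1 - xbar2) / (xbar1 + xbar2))."""
    xbar1, xbar2 = mean(x_phase1), mean(x_phase2)
    return mean(y_phase2) * math.exp((xbar1 - xbar2) / (xbar1 + xbar2))


# Toy data (hypothetical): the subsample under-represents large x values,
# so both estimators adjust the subsample mean of y upward.
y2 = [2.0, 4.0]
x2 = [1.0, 2.0]
x1 = [1.0, 2.0, 3.0, 4.0]
ratio_value = ratio_estimate(y2, x2, x1)
exp_ratio_value = exp_ratio_estimate(y2, x2, x1)
```

Both functions return the plain subsample mean of y when the first-phase and second-phase means of x coincide.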


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Sunwoo Han ◽  
Brian D. Williamson ◽  
Youyi Fong

Abstract Background: While random forests are one of the most successful machine learning methods, it is necessary to optimize their performance for use with datasets resulting from a two-phase sampling design with a small number of cases. This is a common situation in biomedical studies, which often have rare outcomes and covariates whose measurement is resource-intensive. Methods: Using an immunologic marker dataset from a phase III HIV vaccine efficacy trial, we seek to optimize random forest prediction performance using combinations of variable screening, class balancing, weighting, and hyperparameter tuning. Results: Our experiments show that while class balancing helps improve random forest prediction performance when variable screening is not applied, class balancing has a negative impact on performance in the presence of variable screening. The impact of weighting similarly depends on whether variable screening is applied. Hyperparameter tuning is ineffective in situations with small sample sizes. We further show that random forests under-perform generalized linear models for some subsets of markers, that prediction performance on this dataset can be improved by stacking random forests and generalized linear models trained on different subsets of predictors, and that the extent of improvement depends critically on the dissimilarities between candidate learner predictions. Conclusion: In small datasets from two-phase sampling designs, variable screening and inverse sampling probability weighting are important for achieving good prediction performance of random forests. In addition, stacking random forests and simple linear models can offer improvements over random forests.
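The idea behind inverse sampling probability weighting can be illustrated with a minimal sketch. It assumes a hypothetical two-phase design in which every case and 10% of controls have an expensive marker measured, and it estimates a simple mean rather than fitting a random forest. The simulated data, the sampling fractions, and the function name are assumptions, not details of the study.

```python
import random


def ipw_mean(values, sampling_probs):
    """Weighted mean with weights 1 / (phase-two sampling probability)."""
    weights = [1.0 / p for p in sampling_probs]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


random.seed(0)

# Phase one: the outcome is observed for everyone, and cases are rare.
outcome = [1 if random.random() < 0.05 else 0 for _ in range(20000)]

# Hypothetical marker that is higher on average among cases.
marker = [random.gauss(1.0 if y else 0.0, 1.0) for y in outcome]

# Phase two: measure the marker for all cases and a 10% sample of controls.
prob = [1.0 if y else 0.10 for y in outcome]
selected = [i for i in range(len(outcome)) if random.random() < prob[i]]

# The unweighted phase-two mean over-represents cases; weighting corrects this.
naive = sum(marker[i] for i in selected) / len(selected)
weighted = ipw_mean([marker[i] for i in selected], [prob[i] for i in selected])
full_cohort = sum(marker) / len(marker)
```

In this setup the weighted estimate tracks the full-cohort mean, while the unweighted one is pulled toward the case mean because cases are oversampled in phase two.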

