Optimal auxiliary-covariate-based two-phase sampling design for semiparametric efficient estimation of a mean or mean difference, with application to clinical trials

Statistics in Medicine ◽

10.1002/sim.6006 ◽

2013 ◽

Vol 33 (6) ◽

pp. 901-917 ◽

Cited By ~ 5

Author(s):

Peter B. Gilbert ◽

Xuesong Yu ◽

Andrea Rotnitzky

Keyword(s):

Clinical Trials ◽

Sampling Design ◽

Mean Difference ◽

Efficient Estimation ◽

Two Phase ◽

Phase Sampling ◽

Auxiliary Covariate ◽

Two Phase Sampling

Efficient family of estimators of median using two-phase sampling design

Communications in Statistics - Theory and Methods ◽

10.1080/03610926.2014.911912 ◽

2015 ◽

Vol 45 (15) ◽

pp. 4325-4331 ◽

Cited By ~ 1

Author(s):

H.S. Jhajj ◽

Harpreet Kaur ◽

Puneet Jhajj

Keyword(s):

Sampling Design ◽

Two Phase ◽

Phase Sampling ◽

Two Phase Sampling

An Improved Difference Type Estimator for Population Mean Under Two-Phase Sampling Design

Journal of Statistics Applications & Probability ◽

10.18576/jsap/070212 ◽

2018 ◽

Vol 7 (2) ◽

pp. 349-355

Author(s):

Asra Nazir ◽

Rafia Jan ◽

T. R. Jan

Keyword(s):

Sampling Design ◽

Two Phase ◽

Population Mean ◽

Phase Sampling ◽

Difference Type ◽

Two Phase Sampling

A modified efficient difference-type estimator for population mean under two-phase sampling design

Open Journal of Mathematical Sciences ◽

10.30538/oms2020.0110 ◽

2020 ◽

Vol 4 (1) ◽

pp. 195-199

Author(s):

A. E. Anieting ◽

J. K. Mosugu

Keyword(s):

Sampling Design ◽

Two Phase ◽

Population Mean ◽

Phase Sampling ◽

Difference Type ◽

Two Phase Sampling

Efficient estimation combining exponential and ln functions under two phase sampling

AIMS Mathematics ◽

10.3934/math.2020486 ◽

2020 ◽

Vol 5 (6) ◽

pp. 7604-7622 ◽

Cited By ~ 1

Author(s):

Yasir Hassan ◽

Muhammad Ismail ◽

Will Murray ◽

Muhammad Qaiser Shahbaz ◽

...

Keyword(s):

Efficient Estimation ◽

Two Phase ◽

Phase Sampling ◽

Two Phase Sampling

Estimation of Population Mean Using Imputation Methods for Missing Data Under Two-Phase Sampling Design

Journal of Statistical Theory and Practice ◽

10.1007/s42519-018-0016-5 ◽

2018 ◽

Vol 13 (1) ◽

Cited By ~ 1

Author(s):

G. N. Singh ◽

S. Suman

Keyword(s):

Missing Data ◽

Sampling Design ◽

Two Phase ◽

Imputation Methods ◽

Population Mean ◽

Phase Sampling ◽

Two Phase Sampling

A note on a difference-type estimator for population mean under two-phase sampling design

SpringerPlus ◽

10.1186/s40064-016-2368-1 ◽

2016 ◽

Vol 5 (1) ◽

Cited By ~ 3

Author(s):

Mursala Khan ◽

Abdullah Yahia Al-Hossain

Keyword(s):

Sampling Design ◽

Two Phase ◽

Population Mean ◽

Phase Sampling ◽

Difference Type ◽

Two Phase Sampling

A two-phase sampling design for increasing detections of rare species in occupancy surveys

Methods in Ecology and Evolution ◽

10.1111/j.2041-210x.2012.00201.x ◽

2012 ◽

Vol 3 (4) ◽

pp. 721-730 ◽

Cited By ~ 14

Author(s):

Krishna Pacifici ◽

Robert M. Dorazio ◽

Michael J. Conroy

Keyword(s):

Rare Species ◽

Sampling Design ◽

Two Phase ◽

Phase Sampling ◽

Two Phase Sampling

Development of Efficient Estimation Technique for Population Mean in Two Phase Sampling Using Fuzzy Tools

Journal of Applied Mathematics Statistics and Informatics ◽

10.1515/jamsi-2017-0006 ◽

2017 ◽

Vol 13 (2) ◽

pp. 5-28 ◽

Cited By ~ 1

Author(s):

P. Parichha ◽

K. Basu ◽

A. Bandyopadhyay ◽

P. Mukhopadhyay

Keyword(s):

Natural Population ◽

Efficient Estimation ◽

Double Sampling ◽

Estimation Technique ◽

Auxiliary Variables ◽

Two Phase ◽

Data Set ◽

Population Mean ◽

Phase Sampling ◽

Two Phase Sampling

Abstract The present investigation deals with the estimation of the population mean in two-phase (double) sampling. Utilizing information on two auxiliary variables, a chain exponential ratio and regression type estimator is proposed, and its properties are studied under two different structures of two-phase sampling. To make the estimator practicable, an unbiased version of the proposed strategy has also been developed. The dominance of the suggested estimator over some contemporary estimators of the population mean is established through numerical illustrations carried out on data sets from a natural population and an artificially generated population. The dominance ranges of the proposed estimation strategies are categorized using defuzzification tools, followed by suitable recommendations.
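
For orientation, the following is a minimal sketch of the classical double-sampling ratio estimator of a population mean. It is not the chain exponential ratio and regression type estimator proposed in the paper above. The function name, argument names, and toy data are hypothetical and chosen only for illustration. The first phase measures a cheap auxiliary variable x on a large sample. The second phase measures the study variable y on a subsample. The subsample mean of y is then rescaled by the ratio of the two x means.

```python
from statistics import mean


def two_phase_ratio_estimate(x_phase1, x_phase2, y_phase2):
    """Classical double-sampling ratio estimator (illustrative sketch).

    x_phase1: auxiliary values from the large first-phase sample
    x_phase2: auxiliary values from the second-phase subsample
    y_phase2: study-variable values from the same subsample
    """
    if len(x_phase2) != len(y_phase2):
        raise ValueError("second-phase x and y must have the same length")
    xbar1 = mean(x_phase1)  # first-phase mean of the auxiliary variable
    xbar2 = mean(x_phase2)  # second-phase mean of the auxiliary variable
    ybar2 = mean(y_phase2)  # second-phase mean of the study variable
    if xbar2 == 0:
        raise ValueError("second-phase auxiliary mean must be nonzero")
    return ybar2 * xbar1 / xbar2


# Toy data (hypothetical): y is exactly 2 * x, and the subsample
# happens to contain only the three smallest units.
x_first_phase = [1, 2, 3, 4, 5, 6]
x_second_phase = [1, 2, 3]
y_second_phase = [2, 4, 6]

estimate = two_phase_ratio_estimate(x_first_phase, x_second_phase, y_second_phase)
naive = mean(y_second_phase)
```

In this toy case the plain subsample mean of y is pulled down by the unrepresentative subsample, while the ratio adjustment uses the first-phase x values to correct for it.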

Mean Estimation under Imputation based on Two-Phase Sampling Design using an Auxiliary Variable

Pakistan Journal of Statistics and Operation Research ◽

10.18187/pjsor.v12i4.1399 ◽

2016 ◽

Vol 12 (4) ◽

pp. 639 ◽

Cited By ~ 1

Author(s):

Ranjita Pandey ◽

Kalpana Yadav

Keyword(s):

Sampling Design ◽

Auxiliary Variable ◽

Two Phase ◽

Mean Estimation ◽

Phase Sampling ◽

Two Phase Sampling

Improving random forest predictions in small datasets from two-phase sampling designs

BMC Medical Informatics and Decision Making ◽

10.1186/s12911-021-01688-3 ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Sunwoo Han ◽

Brian D. Williamson ◽

Youyi Fong

Keyword(s):

Random Forest ◽

Generalized Linear Models ◽

Random Forests ◽

Sampling Design ◽

Linear Models ◽

Prediction Performance ◽

Two Phase ◽

Variable Screening ◽

Phase Sampling ◽

Two Phase Sampling

Abstract

Background: While random forests are one of the most successful machine learning methods, it is necessary to optimize their performance for use with datasets resulting from a two-phase sampling design with a small number of cases, a common situation in biomedical studies, which often have rare outcomes and covariates whose measurement is resource-intensive.

Methods: Using an immunologic marker dataset from a phase III HIV vaccine efficacy trial, we seek to optimize random forest prediction performance using combinations of variable screening, class balancing, weighting, and hyperparameter tuning.

Results: Our experiments show that while class balancing helps improve random forest prediction performance when variable screening is not applied, class balancing has a negative impact on performance in the presence of variable screening. The impact of the weighting similarly depends on whether variable screening is applied. Hyperparameter tuning is ineffective in situations with small sample sizes. We further show that random forests under-perform generalized linear models for some subsets of markers, and prediction performance on this dataset can be improved by stacking random forests and generalized linear models trained on different subsets of predictors, and that the extent of improvement depends critically on the dissimilarities between candidate learner predictions.

Conclusion: In small datasets from two-phase sampling design, variable screening and inverse sampling probability weighting are important for achieving good prediction performance of random forests. In addition, stacking random forests and simple linear models can offer improvements over random forests.
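
The inverse sampling probability weighting mentioned in the abstract can be illustrated with a small sketch. It shows only the weighting idea applied to a weighted mean. It is not the authors' random forest pipeline. The function names, sampling probabilities, and outcomes below are hypothetical. Each unit selected into the second phase gets weight 1/p, where p is its probability of selection, so that undersampled groups count proportionally more.

```python
def inverse_probability_weights(sampling_probs):
    """Return 1/p for each second-phase sampling probability p."""
    weights = []
    for p in sampling_probs:
        if not 0 < p <= 1:
            raise ValueError("sampling probabilities must lie in (0, 1]")
        weights.append(1.0 / p)
    return weights


def weighted_mean(values, weights):
    """Weighted mean normalised by the sum of the weights."""
    total = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total


# Hypothetical case-control style subsample: all cases (outcome 1) are
# sampled with probability 1, controls (outcome 0) with probability 0.25.
outcomes = [1, 1, 0, 0]
sampling_probs = [1.0, 1.0, 0.25, 0.25]

weights = inverse_probability_weights(sampling_probs)
weighted_rate = weighted_mean(outcomes, weights)
unweighted_rate = sum(outcomes) / len(outcomes)
```

Here the unweighted case rate in the subsample overstates the rate in the full cohort because cases were oversampled, and the weights undo that. Many learners accept such weights as per-observation sample weights during fitting, although whether and how a given implementation does so should be checked in its documentation.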
