Designing and sample size calculation in presence of heterogeneity in biological studies involving high-throughput data.

Author(s):  
Sudhir Srivastava
2014 ◽  
Vol 13s6 ◽  
pp. CIN.S17688 ◽  
Author(s):  
Yan Guo ◽  
Shilin Zhao ◽  
Chung-I Li ◽  
Quanhu Sheng ◽  
Yu Shyr

Sample size and power determination is the first step in the experimental design of a successful study, and such calculations are required in applications for National Institutes of Health (NIH) funding. Sample size and power calculation is well established for traditional biological studies such as mouse models, genome-wide association studies (GWAS), and microarray studies. Recent developments in high-throughput sequencing technology have allowed RNAseq to replace microarray as the technology of choice for high-throughput gene expression profiling. However, sample size and power analysis for RNAseq remains an underdeveloped area. Here, we present RNAseqPS, an advanced online RNAseq power and sample size calculation tool based on the Poisson and negative binomial distributions. RNAseqPS was built using the Shiny package in R. It provides an interactive graphical user interface that allows users to easily conduct sample size and power analysis for RNAseq experimental design. RNAseqPS can be accessed directly at http://cqs.mc.vanderbilt.edu/shiny/RNAseqPS/.
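The negative binomial model the abstract names can be made concrete with a small simulation. The sketch below is not the RNAseqPS implementation; the function name, parameter values, and the log-scale Welch t-test are illustrative assumptions for estimating two-group power under negative binomial counts:

```python
import numpy as np
from scipy import stats

def nb_power(n_per_group, mean_control=100, fold_change=2.0,
             dispersion=0.1, alpha=0.05, n_sim=2000, seed=0):
    """Monte Carlo power estimate for detecting a fold change between two
    groups of negative binomial counts (illustrative sketch, not RNAseqPS)."""
    rng = np.random.default_rng(seed)

    def nb_draws(mean, size):
        # NumPy parameterizes NB by (n, p); convert from (mean, dispersion),
        # where variance = mean + dispersion * mean**2
        n = 1.0 / dispersion
        p = n / (n + mean)
        return rng.negative_binomial(n, p, size)

    rejections = 0
    for _ in range(n_sim):
        x = nb_draws(mean_control, n_per_group)
        y = nb_draws(mean_control * fold_change, n_per_group)
        # Welch t-test on log counts as a simple, distribution-light test
        _, pval = stats.ttest_ind(np.log1p(x), np.log1p(y), equal_var=False)
        rejections += pval < alpha
    return rejections / n_sim
```

Power rises with the number of replicates per group, which is the trade-off such calculators let a study design explore.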


2015 ◽  
Vol 53 ◽  
pp. 355-362 ◽  
Author(s):  
Dae-Soon Son ◽  
DongHyuk Lee ◽  
Kyusang Lee ◽  
Sin-Ho Jung ◽  
Taejin Ahn ◽  
...  

2013 ◽  
Vol 2013 ◽  
pp. 1-11 ◽  
Author(s):  
Dongmei Li ◽  
Timothy D. Dye

Resampling-based multiple testing procedures are widely used in genomic studies to identify differentially expressed genes and to conduct genome-wide association studies. However, the power and stability properties of these popular resampling-based multiple testing procedures have not been extensively evaluated. Our study focuses on investigating the power and stability of seven resampling-based multiple testing procedures frequently used in high-throughput data analysis for small sample size data through simulations and gene oncology examples. The bootstrap single-step minP procedure and the bootstrap step-down minP procedure perform the best among all tested procedures, when sample size is as small as 3 in each group and either familywise error rate or false discovery rate control is desired. When sample size increases to 12 and false discovery rate control is desired, the permutation maxT procedure and the permutation minP procedure perform best. Our results provide guidance for high-throughput data analysis when sample size is small.
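As a generic illustration of the permutation maxT procedure named above (a sketch in the spirit of the Westfall-Young approach, not the code evaluated in the study; the function name and equal-variance t-statistic are assumptions), FWER-adjusted p-values can be computed by comparing each observed statistic to the permutation distribution of the maximum statistic:

```python
import numpy as np
from scipy import stats

def maxT_adjusted_pvalues(X, Y, n_perm=1000, seed=0):
    """X: (features, n1) values for group 1; Y: (features, n2) for group 2.
    Returns FWER-adjusted p-values via the permutation maxT method
    (illustrative sketch of the general procedure)."""
    rng = np.random.default_rng(seed)
    data = np.hstack([X, Y])
    n1 = X.shape[1]

    def tstats(d):
        # absolute two-sample t-statistic per feature (row)
        return np.abs(stats.ttest_ind(d[:, :n1], d[:, n1:], axis=1).statistic)

    t_obs = tstats(data)
    exceed = np.zeros_like(t_obs)
    for _ in range(n_perm):
        perm = rng.permutation(data.shape[1])   # shuffle group labels
        # count permutations whose MAX statistic exceeds each observed one
        exceed += tstats(data[:, perm]).max() >= t_obs
    return (exceed + 1) / (n_perm + 1)
```

Using the maximum over all features is what gives familywise error rate control while adapting to the correlation among features.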


2020 ◽  
Author(s):  
Xinzhou Ge ◽  
Yiling Elaine Chen ◽  
Dongyuan Song ◽  
MeiLu McDermott ◽  
Kyla Woyshner ◽  
...  

High-throughput biological data analysis commonly involves identifying “interesting” features (e.g., genes, genomic regions, and proteins), whose values differ between two conditions, from numerous features measured simultaneously. The most widely-used criterion to ensure the analysis reliability is the false discovery rate (FDR), the expected proportion of uninteresting features among the identified ones. Existing bioinformatics tools primarily control the FDR based on p-values. However, obtaining valid p-values relies on either reasonable assumptions of data distribution or large numbers of replicates under both conditions, two requirements that are often unmet in biological studies. To address this issue, we propose Clipper, a general statistical framework for FDR control without relying on p-values or specific data distributions. Clipper is applicable to identifying both enriched and differential features from high-throughput biological data of diverse types. In comprehensive simulation and real-data benchmarking, Clipper outperforms existing generic FDR control methods and specific bioinformatics tools designed for various tasks, including peak calling from ChIP-seq data, differentially expressed gene identification from RNA-seq data, differentially interacting chromatin region identification from Hi-C data, and peptide identification from mass spectrometry data. Notably, our benchmarking results for peptide identification are based on the first mass spectrometry data standard with a realistic dynamic range. Our results demonstrate Clipper’s flexibility and reliability for FDR control, as well as its broad applications in high-throughput data analysis.
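The p-value-free FDR control described here can be sketched with a simplified contrast-score threshold. The following is an illustration in the spirit of symmetric-null thresholding (a Barber-Candès-style estimate), not the authors' Clipper implementation; the function name and the simulated scores are assumptions:

```python
import numpy as np

def contrast_score_threshold(scores, q=0.05):
    """Pick the smallest threshold t such that the estimated FDR
    (1 + #{scores <= -t}) / max(1, #{scores >= t}) <= q.
    Assumes null contrast scores are symmetric around zero, so the
    negative tail estimates false discoveries in the positive tail."""
    candidates = np.sort(np.abs(scores[scores != 0]))
    for t in candidates:
        fdr_hat = (1 + np.sum(scores <= -t)) / max(1, np.sum(scores >= t))
        if fdr_hat <= q:
            return t
    return np.inf  # no threshold achieves the target FDR
```

Features whose contrast score is at or above the returned threshold are called interesting; no p-values or distributional fits are involved, only the symmetry of the null scores.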


2017 ◽  
Vol 23 (5) ◽  
pp. 644-646 ◽  
Author(s):  
Maria Pia Sormani

The calculation of the sample size needed for a clinical study is the challenge most frequently put to statisticians, and it is one of the most relevant issues in study design. A correctly sized study enrolls the optimal number of patients needed to obtain the result, that is, to detect the minimum treatment effect that is clinically relevant. Minimizing the sample size of a study reduces costs and enhances feasibility, and it also has ethical implications. In this brief report, I will explore the main concepts on which the sample size calculation is based.
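For a continuous outcome compared between two groups, the concepts this report describes reduce to the familiar normal-approximation formula; a minimal sketch (the function name and defaults are illustrative):

```python
import math
from scipy.stats import norm

def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Patients per group to detect a mean difference `delta` with common
    SD `sigma`, two-sided z-approximation:
        n = 2 * (z_{1-alpha/2} + z_{1-beta})^2 * (sigma / delta)^2
    """
    z_a = norm.ppf(1 - alpha / 2)   # critical value for the type I error
    z_b = norm.ppf(power)           # quantile giving the desired power
    return math.ceil(2 * (z_a + z_b) ** 2 * (sigma / delta) ** 2)
```

For example, detecting a difference of half a standard deviation at 80% power and two-sided alpha = 0.05 requires 63 patients per group under this approximation; halving the detectable effect roughly quadruples the sample size.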


1994 ◽  
Vol 13 (8) ◽  
pp. 859-870 ◽  
Author(s):  
Robert P. McMahon ◽  
Michael Proschan ◽  
Nancy L. Geller ◽  
Peter H. Stone ◽  
George Sopko

Author(s):  
Yongjoo Kim ◽  
Jongeun Lee ◽  
A. Shrivastava ◽  
J. W. Yoon ◽  
Doosan Cho ◽  
...  
