Designing and sample size calculation in presence of heterogeneity in biological studies involving high-throughput data.

Author(s):  
Sudhir Srivastava
2014 ◽  
Vol 13s6 ◽  
pp. CIN.S17688 ◽  
Author(s):  
Yan Guo ◽  
Shilin Zhao ◽  
Chung-I Li ◽  
Quanhu Sheng ◽  
Yu Shyr

Sample size and power determination is the first step in the experimental design of a successful study, and such calculations are required in applications for National Institutes of Health (NIH) funding. Sample size and power calculation is well established for traditional biological studies such as mouse models, genome-wide association studies (GWAS), and microarray studies. Recent developments in high-throughput sequencing technology have allowed RNAseq to replace microarray as the technology of choice for high-throughput gene expression profiling. However, sample size and power analysis for RNAseq remains an underdeveloped area. Here, we present RNAseqPS, an advanced online RNAseq power and sample size calculation tool based on the Poisson and negative binomial distributions. RNAseqPS was built using the Shiny package in R. It provides an interactive graphical user interface that allows users to easily conduct sample size and power analysis for RNAseq experimental design. RNAseqPS can be accessed directly at http://cqs.mc.vanderbilt.edu/shiny/RNAseqPS/.
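The negative binomial model the abstract names can be made concrete with a small simulation. The sketch below is not the RNAseqPS implementation; the function name, parameter values, and the log-scale Welch t-test are illustrative assumptions for estimating two-group power under negative binomial counts:

```python
import numpy as np
from scipy import stats

def nb_power(n_per_group, mean_control=100, fold_change=2.0,
             dispersion=0.1, alpha=0.05, n_sim=2000, seed=0):
    """Monte Carlo power estimate for detecting a fold change between two
    groups of negative binomial counts (illustrative sketch, not RNAseqPS)."""
    rng = np.random.default_rng(seed)

    def nb_draws(mean, size):
        # NumPy parameterizes NB by (n, p); convert from (mean, dispersion),
        # where variance = mean + dispersion * mean**2
        n = 1.0 / dispersion
        p = n / (n + mean)
        return rng.negative_binomial(n, p, size)

    rejections = 0
    for _ in range(n_sim):
        x = nb_draws(mean_control, n_per_group)
        y = nb_draws(mean_control * fold_change, n_per_group)
        # Welch t-test on log counts as a simple, distribution-light test
        _, pval = stats.ttest_ind(np.log1p(x), np.log1p(y), equal_var=False)
        rejections += pval < alpha
    return rejections / n_sim
```

Power rises with the number of replicates per group, which is the trade-off such calculators let a study design explore.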


2015 ◽  
Vol 53 ◽  
pp. 355-362 ◽  
Author(s):  
Dae-Soon Son ◽  
DongHyuk Lee ◽  
Kyusang Lee ◽  
Sin-Ho Jung ◽  
Taejin Ahn ◽  
...  

2013 ◽  
Vol 2013 ◽  
pp. 1-11 ◽  
Author(s):  
Dongmei Li ◽  
Timothy D. Dye

Resampling-based multiple testing procedures are widely used in genomic studies to identify differentially expressed genes and to conduct genome-wide association studies. However, the power and stability properties of these popular resampling-based multiple testing procedures have not been extensively evaluated. Our study focuses on investigating the power and stability of seven resampling-based multiple testing procedures frequently used in high-throughput data analysis for small sample size data through simulations and gene oncology examples. The bootstrap single-step minP procedure and the bootstrap step-down minP procedure perform the best among all tested procedures, when sample size is as small as 3 in each group and either familywise error rate or false discovery rate control is desired. When sample size increases to 12 and false discovery rate control is desired, the permutation maxT procedure and the permutation minP procedure perform best. Our results provide guidance for high-throughput data analysis when sample size is small.
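As a generic illustration of the permutation maxT procedure named above (a sketch in the spirit of the Westfall-Young approach, not the code evaluated in the study; the function name and equal-variance t-statistic are assumptions), FWER-adjusted p-values can be computed by comparing each observed statistic to the permutation distribution of the maximum statistic:

```python
import numpy as np
from scipy import stats

def maxT_adjusted_pvalues(X, Y, n_perm=1000, seed=0):
    """X: (features, n1) values for group 1; Y: (features, n2) for group 2.
    Returns FWER-adjusted p-values via the permutation maxT method
    (illustrative sketch of the general procedure)."""
    rng = np.random.default_rng(seed)
    data = np.hstack([X, Y])
    n1 = X.shape[1]

    def tstats(d):
        # absolute two-sample t-statistic per feature (row)
        return np.abs(stats.ttest_ind(d[:, :n1], d[:, n1:], axis=1).statistic)

    t_obs = tstats(data)
    exceed = np.zeros_like(t_obs)
    for _ in range(n_perm):
        perm = rng.permutation(data.shape[1])   # shuffle group labels
        # count permutations whose MAX statistic exceeds each observed one
        exceed += tstats(data[:, perm]).max() >= t_obs
    return (exceed + 1) / (n_perm + 1)
```

Using the maximum over all features is what gives familywise error rate control while adapting to the correlation among features.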


2020 ◽  
Author(s):  
Xinzhou Ge ◽  
Yiling Elaine Chen ◽  
Dongyuan Song ◽  
MeiLu McDermott ◽  
Kyla Woyshner ◽  
...  

High-throughput biological data analysis commonly involves identifying “interesting” features (e.g., genes, genomic regions, and proteins), whose values differ between two conditions, from numerous features measured simultaneously. The most widely-used criterion to ensure the analysis reliability is the false discovery rate (FDR), the expected proportion of uninteresting features among the identified ones. Existing bioinformatics tools primarily control the FDR based on p-values. However, obtaining valid p-values relies on either reasonable assumptions of data distribution or large numbers of replicates under both conditions, two requirements that are often unmet in biological studies. To address this issue, we propose Clipper, a general statistical framework for FDR control without relying on p-values or specific data distributions. Clipper is applicable to identifying both enriched and differential features from high-throughput biological data of diverse types. In comprehensive simulation and real-data benchmarking, Clipper outperforms existing generic FDR control methods and specific bioinformatics tools designed for various tasks, including peak calling from ChIP-seq data, differentially expressed gene identification from RNA-seq data, differentially interacting chromatin region identification from Hi-C data, and peptide identification from mass spectrometry data. Notably, our benchmarking results for peptide identification are based on the first mass spectrometry data standard with a realistic dynamic range. Our results demonstrate Clipper’s flexibility and reliability for FDR control, as well as its broad applications in high-throughput data analysis.
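The p-value-free FDR control described here can be sketched with a simplified contrast-score threshold. The following is an illustration in the spirit of symmetric-null thresholding (a Barber-Candès-style estimate), not the authors' Clipper implementation; the function name and the simulated scores are assumptions:

```python
import numpy as np

def contrast_score_threshold(scores, q=0.05):
    """Pick the smallest threshold t such that the estimated FDR
    (1 + #{scores <= -t}) / max(1, #{scores >= t}) <= q.
    Assumes null contrast scores are symmetric around zero, so the
    negative tail estimates false discoveries in the positive tail."""
    candidates = np.sort(np.abs(scores[scores != 0]))
    for t in candidates:
        fdr_hat = (1 + np.sum(scores <= -t)) / max(1, np.sum(scores >= t))
        if fdr_hat <= q:
            return t
    return np.inf  # no threshold achieves the target FDR
```

Features whose contrast score is at or above the returned threshold are called interesting; no p-values or distributional fits are involved, only the symmetry of the null scores.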


2017 ◽  
Vol 23 (5) ◽  
pp. 644-646 ◽  
Author(s):  
Maria Pia Sormani

The calculation of the sample size needed for a clinical study is the challenge most frequently put to statisticians, and it is one of the most relevant issues in study design. A correctly sized study enrolls the optimal number of patients needed to obtain the result, that is, to detect the minimum treatment effect that is clinically relevant. Minimizing the sample size of a study reduces costs and enhances feasibility, and it also has ethical implications. In this brief report, I will explore the main concepts on which the sample size calculation is based.
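For a continuous outcome compared between two groups, the concepts this report describes reduce to the familiar normal-approximation formula; a minimal sketch (the function name and defaults are illustrative):

```python
import math
from scipy.stats import norm

def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Patients per group to detect a mean difference `delta` with common
    SD `sigma`, two-sided z-approximation:
        n = 2 * (z_{1-alpha/2} + z_{1-beta})^2 * (sigma / delta)^2
    """
    z_a = norm.ppf(1 - alpha / 2)   # critical value for the type I error
    z_b = norm.ppf(power)           # quantile giving the desired power
    return math.ceil(2 * (z_a + z_b) ** 2 * (sigma / delta) ** 2)
```

For example, detecting a difference of half a standard deviation at 80% power and two-sided alpha = 0.05 requires 63 patients per group under this approximation; halving the detectable effect roughly quadruples the sample size.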


1994 ◽  
Vol 13 (8) ◽  
pp. 859-870 ◽  
Author(s):  
Robert P. McMahon ◽  
Michael Proschan ◽  
Nancy L. Geller ◽  
Peter H. Stone ◽  
George Sopko

Author(s):  
Yongjoo Kim ◽  
Jongeun Lee ◽  
A. Shrivastava ◽  
J. W. Yoon ◽  
Doosan Cho ◽  
...  
