Reproducibility of R-fMRI metrics on the impact of different strategies for multiple comparison correction and sample sizes

Xiao Chen; Bin Lu; Chao-Gan Yan

doi:10.1002/hbm.23843

Reproducibility of R-fMRI Metrics on the Impact of Different Strategies for Multiple Comparison Correction and Sample Sizes

10.1101/128645 ◽

2017 ◽

Cited By ~ 2

Author(s):

Xiao Chen ◽

Bin Lu ◽

Chao-Gan Yan

Keyword(s):

Sex Differences ◽

Sample Size ◽

Small Sample ◽

Multiple Comparison ◽

Sample Sizes ◽

Retest Reliability ◽

Test Retest Reliability ◽

The Impact ◽

Correction Strategies ◽

Multiple Comparison Correction

ABSTRACTConcerns regarding reproducibility of resting-state functional magnetic resonance imaging (R-fMRI) findings have been raised. Little is known about how to operationally define R-fMRI reproducibility and to what extent it is affected by multiple comparison correction strategies and sample size. We comprehensively assessed two aspects of reproducibility, test-retest reliability and replicability, on widely used R-fMRI metrics in both between-subject contrasts of sex differences and within-subject comparisons of eyes-open and eyes-closed (EOEC) conditions. We noted permutation test with Threshold-Free Cluster Enhancement (TFCE), a strict multiple comparison correction strategy, reached the best balance between family-wise error rate (under 5%) and test-retest reliability / replicability (e.g., 0.68 for test-retest reliability and 0.25 for replicability of amplitude of low-frequency fluctuations (ALFF) for between-subject sex differences, 0.49 for replicability of ALFF for within-subject EOEC differences). Although R-fMRI indices attained moderate reliabilities, they replicated poorly in distinct datasets (replicability < 0.3 for between-subject sex differences, < 0.5 for within-subject EOEC differences). By randomly drawing different sample sizes from a single site, we found reliability, sensitivity and positive predictive value (PPV) rose as sample size increased. Small sample sizes (e.g., < 80 (40 per group)) not only minimized power (sensitivity < 2%), but also decreased the likelihood that significant results reflect “true” effects (PPV < 0.26) in sex differences. Our findings have implications for how to select multiple comparison correction strategies and highlight the importance of sufficiently large sample sizes in R-fMRI studies to enhance reproducibility.

Download Full-text

scHinter: imputing dropout events for single-cell RNA-seq data with limited sample size

Bioinformatics ◽

10.1093/bioinformatics/btz627 ◽

2019 ◽

Author(s):

Pengchao Ye ◽

Wenbin Ye ◽

Congting Ye ◽

Shuchao Li ◽

Lishan Ye ◽

...

Keyword(s):

Gene Expression ◽

Sample Size ◽

Single Cell ◽

Wide Spectrum ◽

Dropout Rate ◽

Supplementary Information ◽

Sample Sizes ◽

Limited Sample ◽

Limited Sample Size ◽

The Impact

Abstract Motivation Single-cell RNA-sequencing (scRNA-seq) is fast and becoming a powerful technique for studying dynamic gene regulation at unprecedented resolution. However, scRNA-seq data suffer from problems of extremely high dropout rate and cell-to-cell variability, demanding new methods to recover gene expression loss. Despite the availability of various dropout imputation approaches for scRNA-seq, most studies focus on data with a medium or large number of cells, while few studies have explicitly investigated the differential performance across different sample sizes or the applicability of the approach on small or imbalanced data. It is imperative to develop new imputation approaches with higher generalizability for data with various sample sizes. Results We proposed a method called scHinter for imputing dropout events for scRNA-seq with special emphasis on data with limited sample size. scHinter incorporates a voting-based ensemble distance and leverages the synthetic minority oversampling technique for random interpolation. A hierarchical framework is also embedded in scHinter to increase the reliability of the imputation for small samples. We demonstrated the ability of scHinter to recover gene expression measurements across a wide spectrum of scRNA-seq datasets with varied sample sizes. We comprehensively examined the impact of sample size and cluster number on imputation. Comprehensive evaluation of scHinter across diverse scRNA-seq datasets with imbalanced or limited sample size showed that scHinter achieved higher and more robust performance than competing approaches, including MAGIC, scImpute, SAVER and netSmooth. Availability and implementation Freely available for download at https://github.com/BMILAB/scHinter. Supplementary information Supplementary data are available at Bioinformatics online.

Download Full-text

Beyond forest plots: clinical gestalt and its influence on COPD telemonitoring studies and outcomes review

BMJ Open ◽

10.1136/bmjopen-2019-030779 ◽

2019 ◽

Vol 9 (12) ◽

pp. e030779 ◽

Cited By ~ 2

Author(s):

Sheree M Smith ◽

Anne E Holland ◽

Christine F McDonald

Keyword(s):

Decision Making ◽

Length Of Stay ◽

Systematic Reviews ◽

Verbal Communication ◽

Small Sample ◽

Acute Hospital ◽

Cochrane Library ◽

Sample Sizes ◽

Health Services Needs ◽

The Impact

BackgroundChronic obstructive pulmonary disease (COPD) is a progressive chronic condition. Improvements in therapies have resulted in better patient outcomes. The use of technology such as telemonitoring as an additional intervention is aimed at enhancing care and reducing unnecessary acute hospital service use. The influence of verbal communication between health staff and patients to inform decision making regarding use of acute hospital services within telemonitoring studies has not been assessed.MethodA systematic overview of published systematic reviews of COPD and telemonitoring was conducted using ana prioriprotocol to ascertain the impact of verbal communication in telemonitoring studies on health service outcomes such as emergency department attendances, hospitalisation and hospital length of stay. The search of the following electronic databases: Cochrane Library, Medline, Pubmed, CINAHL, Embase, TROVE, Australian Digital Thesis and Proquest International Dissertations and Theses was conducted in 2017 and updated in September 2019.ResultsSix systematic reviews were identified. All reviews involved home monitoring of COPD symptoms and biometric data. Included reviews reported 5–28 studies with sample sizes ranging from 310 to 2891 participants. Many studies reported in the systematic reviews were excluded as they were telephone support, cost effectiveness studies, and/or did not report the outcomes of interest for this overview. Irrespective of group assignment, verbal communication with the health or research team did not alter the emergency attendance or hospitalisation outcome. The length of stay was longer for those who were assigned home telemonitoring in the majority of studies.ConclusionThis overview of telemonitoring for COPD had small sample sizes and a wide variety of included studies. Communication was not consistent in all included studies. Understanding the context of communication with study participants and the decision-making process for referring patients to various health services needs to be reported in future studies of telemonitoring and COPD.

Download Full-text

Estimating sample size in the presence of competing risks – Cause-specific hazard or cumulative incidence approach?

Statistical Methods in Medical Research ◽

10.1177/0962280215623107 ◽

2015 ◽

Vol 27 (1) ◽

pp. 114-125 ◽

Cited By ~ 6

Author(s):

BC Tai ◽

ZJ Chen ◽

D Machin

Keyword(s):

Sample Size ◽

Competing Risks ◽

Cumulative Incidence ◽

Exponential Model ◽

Sample Sizes ◽

Subdistribution Hazard ◽

Competing Events ◽

Event Times ◽

The Impact

In designing randomised clinical trials involving competing risks endpoints, it is important to consider competing events to ensure appropriate determination of sample size. We conduct a simulation study to compare sample sizes obtained from the cause-specific hazard and cumulative incidence (CMI) approaches, by first assuming exponential event times. As the proportional subdistribution hazard assumption does not hold for the CMI exponential (CMIExponential) model, we further investigate the impact of violation of such an assumption by comparing the results obtained from the CMI exponential model with those of a CMI model assuming a Gompertz distribution (CMIGompertz) where the proportional assumption is tenable. The simulation suggests that the CMIExponential approach requires a considerably larger sample size when treatment reduces the hazards of both the main event, A, and the competing risk, B. When treatment has a beneficial effect on A but no effect on B, the sample sizes required by both methods are largely similar, especially for large reduction in the main risk. If treatment has a protective effect on A but adversely affects B, then the sample size required by CMIExponential is notably smaller than cause-specific hazard for small to moderate reduction in the main risk. Further, a smaller sample size is required for CMIGompertz as compared with CMIExponential. The choice between a cause-specific hazard or CMI model in competing risks outcomes has implications on the study design. This should be made on the basis of the clinical question of interest and the validity of the associated model assumption.

Download Full-text

The significance of DOHaD for Small Island Developing States

Journal of Developmental Origins of Health and Disease ◽

10.1017/s2040174418000466 ◽

2018 ◽

Vol 9 (5) ◽

pp. 487-491 ◽

Cited By ~ 5

Author(s):

S. Tu’akoi ◽

M. H. Vickers ◽

K. Tairea ◽

Y. Y. M. Aung ◽

N. Tamarua-Herman ◽

...

Keyword(s):

Early Life ◽

Disease Risk ◽

Communicable Disease ◽

Adult Population ◽

Well Being ◽

Small Sample ◽

Small Island ◽

Sample Sizes ◽

Small Island Developing States ◽

The Impact

AbstractSmall Island Developing States (SIDS) are island nations that experience specific social, economic and environmental vulnerabilities associated with small populations, isolation and limited resources. Globally, SIDS exhibit exceptionally high rates of non-communicable disease (NCD) risk and incidence. Despite this, there is a lack of context-specific research within SIDS focused on life course approaches to NCD prevention, particularly the impact of the early-life environment on later disease risk as defined by the Developmental Origins of Health and Disease (DOHaD) framework. Given that globalization has contributed to significant nutritional transitions in these populations, the DOHaD paradigm is highly relevant. SIDS in the Pacific region have the highest rates of NCD risk and incidence globally. Transitions from traditional foods grown locally to reliance on importation of Western-style processed foods high in fat and sugar are common. The Cook Islands is one Pacific SIDS that reports this transition, alongside rising overweight/obesity rates, currently 91%/72%, in the adult population. However, research on early-life NCD prevention within this context, as in many low- and middle-income countries, is scarce. Although traditional research emphasizes the need for large sample sizes, this is rarely possible in the smaller SIDS. In these vulnerable, high priority countries, consideration should be given to utilizing ‘small’ sample sizes that encompass a high proportion of the total population. This may enable contextually relevant research, crucial to inform NCD prevention strategies that can contribute to improving health and well-being for these at-risk communities.

Download Full-text

The Impact of Improved Microarray Coverage and Larger Sample Sizes on Future Genome-Wide Association Studies

Genetic Epidemiology ◽

10.1002/gepi.21724 ◽

2013 ◽

Vol 37 (4) ◽

pp. 383-392 ◽

Cited By ~ 16

Author(s):

Karla J. Lindquist ◽

Eric Jorgenson ◽

Thomas J. Hoffmann ◽

John S. Witte

Keyword(s):

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Sample Sizes ◽

Genome Wide ◽

The Impact ◽

Larger Sample

Download Full-text

Evaluation of multiple comparison correction procedures in drug assessment studies using LORETA maps

Medical & Biological Engineering & Computing ◽

10.1007/s11517-015-1315-6 ◽

2015 ◽

Vol 53 (10) ◽

pp. 1011-1023 ◽

Cited By ~ 8

Author(s):

Joan Francesc Alonso ◽

Sergio Romero ◽

Miguel Ángel Mañanas ◽

Mónica Rojas ◽

Jordi Riba ◽

...

Keyword(s):

Multiple Comparison ◽

Multiple Comparison Correction

Download Full-text

APPROPRIATE CRITICAL VALUES FOR PAIRWISE COMPARISONS OF MULTIPLE COMPARISON PROCEDURES FOR NORMAL VARIANCES WHEN SAMPLE SIZES ARE UNBALANCED

Bulletin of informatics and cybernetics ◽

10.5109/4102449 ◽

2020 ◽

Vol 52 (2) ◽

pp. 1-15

Author(s):

Tsunehisa Imada

Keyword(s):

Critical Values ◽

Multiple Comparison ◽

Pairwise Comparisons ◽

Sample Sizes ◽

Multiple Comparison Procedures

Download Full-text

Simulation and minimization: Technical advances for factorial experiments designed to optimize clinical interventions

10.21203/rs.2.12285/v2 ◽

2019 ◽

Author(s):

Jocelyn Lara Kuhn ◽

Radley Christopher Sheldrick ◽

Sarabeth Broder-Fingert ◽

Andrea Chu ◽

Lisa Fortuna ◽

...

Keyword(s):

Computer Simulation ◽

Random Element ◽

Optimization Strategy ◽

Randomized Experiments ◽

Factorial Experiments ◽

Sample Sizes ◽

Multiphase Optimization Strategy ◽

Technical Advances ◽

Healthcare Interventions ◽

The Impact

Abstract Background: The Multiphase Optimization Strategy (MOST) is designed to maximize the impact of clinical healthcare interventions, which are typically multicomponent and increasingly complex. MOST often relies on factorial experiments to identify which components of an intervention are most effective, efficient, and scalable. When assigning participants to conditions in factorial experiments, researchers must be careful to select the assignment procedure that will result in balanced sample sizes and equivalence of covariates across conditions while maintaining unpredictability. Methods: In the context of a MOST optimization trial with a 2x2x2x2 factorial design, we used computer simulation to empirically test five subject allocation procedures: simple randomization, stratified randomization with permuted blocks, maximum tolerated imbalance (MTI), minimal sufficient balance (MSB), and minimization. We compared these methods across the 16 study cells with respect to sample size balance, equivalence on key covariates, and unpredictability. Leveraging an existing dataset to compare these procedures, we conducted 250 computerized simulations using bootstrap samples of 304 participants. Results: Simple randomization, the most unpredictable procedure, generated poor sample balance and equivalence of covariates across the 16 study cells. Stratified randomization with permuted blocks performed well on stratified variables but resulted in poor equivalence on other covariates and poor balance. MTI, MSB, and minimization had higher complexity and cost. MTI resulted in balance close to pre-specified thresholds and a higher degree of unpredictability, but poor equivalence of covariates. MSB had 19.7% deterministic allocations, poor sample balance and improved equivalence on only a few covariates. Minimization was most successful in achieving balanced sample sizes and equivalence across a large number of covariates, but resulted in 34% deterministic allocations. Small differences in proportion of correct guesses were found across the procedures. Conclusions: Computer simulation was highly useful for evaluating tradeoffs among randomization procedures. Based on the computer simulation results and priorities within the study context, minimization with a random element was selected for the planned research study. Minimization with a random element, as well as computer simulation to make an informed randomization procedure choice, are utilized infrequently in randomized experiments but represent important technical advances that researchers implementing multi-arm and factorial studies should consider.

Download Full-text

Analytical guidelines to increase the value of citizen science data: using eBird data to estimate species occurrence

10.1101/574392 ◽

2019 ◽

Cited By ~ 8

Author(s):

A Johnston ◽

WM Hochachka ◽

ME Strimas-Mackey ◽

V Ruiz Gutierrez ◽

OJ Robinson ◽

...

Keyword(s):

Data Processing ◽

Citizen Science ◽

Model Performance ◽

Species Distributions ◽

Ecological Knowledge ◽

Sample Sizes ◽

Science Data ◽

Science Projects ◽

Wide Range ◽

The Impact

AbstractCitizen science data are valuable for addressing a wide range of ecological research questions, and there has been a rapid increase in the scope and volume of data available. However, data from large-scale citizen science projects typically present a number of challenges that can inhibit robust ecological inferences. These challenges include: species bias, spatial bias, and variation in effort.To demonstrate addressing key challenges in analysing citizen science data, we use the example of estimating species distributions with data from eBird, a large semi-structured citizen science project. We estimate two widely applied metrics of species distributions: encounter rate and occupancy probability. For each metric, we assess the impact of data processing steps that either degrade or refine the data used in the analyses. We also test whether differences in model performance are maintained at different sample sizes.Model performance improved when data processing and analytical methods addressed the challenges arising from citizen science data. The largest gains in model performance were achieved with: 1) the use of complete checklists (where observers report all the species they detect and identify); and 2) the use of covariates describing variation in effort and detectability for each checklist. Occupancy models were more robust to a lack of complete checklists and effort variables. Improvements in model performance with data refinement were more evident with larger sample sizes.Here, we describe processes to refine semi-structured citizen science data to estimate species distributions. We demonstrate the value of complete checklists, which can inform the design and adaptation of citizen science projects. We also demonstrate the value of information on effort. The methods we have outlined are also likely to improve other forms of inference, and will enable researchers to conduct robust analyses and harness the vast ecological knowledge that exists within citizen science data.

Download Full-text