Identification of a nonseparable model under endogeneity using binary proxies for unobserved heterogeneity

10.3982/qe674 ◽  
2019 ◽  
Vol 10 (2) ◽  
pp. 527-563 ◽  
Author(s):  
Benjamin Williams

In this paper, I study identification of a nonseparable model with endogeneity arising due to unobserved heterogeneity. Identification relies on the availability of binary proxies that can be used to control for the unobserved heterogeneity. I show that the model is identified in the limit as the number of proxies increases. The argument does not require an instrumental variable that is excluded from the outcome equation nor does it require the support of the unobserved heterogeneity to be finite. I then propose a nonparametric estimator that is consistent as the number of proxies increases with the sample size. I also show that, for a fixed number of proxies, nontrivial bounds on objects of interest can be obtained. Finally, I study two real data applications that illustrate computation of the bounds and estimation with a large number of items.

Author(s):  
Alice R. Carter ◽  
Eleanor Sanderson ◽  
Gemma Hammerton ◽  
Rebecca C. Richmond ◽  
George Davey Smith ◽  
...  

Mediation analysis seeks to explain the pathway(s) through which an exposure affects an outcome. Traditional, non-instrumental-variable methods for mediation analysis suffer from a number of methodological difficulties, including bias due to confounding between the exposure, mediator and outcome, and measurement error. Mendelian randomisation (MR) can be used to improve causal inference for mediation analysis. We describe two approaches that can be used for estimating mediation effects with MR: multivariable MR (MVMR) and two-step MR. We outline the approaches and provide code to demonstrate how they can be used in mediation analysis. We review issues that can affect analyses, including confounding, measurement error, weak instrument bias, interactions between exposures and mediators, and analysis of multiple mediators. Description of the methods is supplemented by simulated and real data examples. Although MR relies on large sample sizes and strong assumptions, such as having strong instruments and no horizontally pleiotropic pathways, our simulations demonstrate that these methods are unaffected by confounding of the exposure or mediator with the outcome and by non-differential measurement error of the exposure or mediator. Both MVMR and two-step MR can be implemented with individual-level and summary-data MR. MR mediation methods require different assumptions from non-instrumental-variable mediation methods. Where these assumptions are more plausible, MR can be used to improve causal inference in mediation analysis.
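As a rough illustration of the two-step approach described above, the following simulation (not the authors' code; the data-generating process and the simple Wald-ratio estimators are assumptions) computes an indirect effect as the product of two instrumented coefficients:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Genetic instruments for the exposure (g_x) and the mediator (g_m).
g_x = rng.binomial(2, 0.3, n)
g_m = rng.binomial(2, 0.3, n)

u = rng.normal(size=n)                             # unmeasured confounder
x = 0.5 * g_x + u + rng.normal(size=n)             # exposure
m = 0.4 * x + 0.5 * g_m + u + rng.normal(size=n)   # mediator
y = 0.3 * m + 0.2 * x + u + rng.normal(size=n)     # outcome

def wald_ratio(g, treat, out):
    """IV (Wald ratio) estimate: cov(g, out) / cov(g, treat)."""
    return np.cov(g, out)[0, 1] / np.cov(g, treat)[0, 1]

beta_xm = wald_ratio(g_x, x, m)   # step 1: exposure -> mediator
beta_my = wald_ratio(g_m, m, y)   # step 2: mediator -> outcome
indirect = beta_xm * beta_my      # product-of-coefficients indirect effect
total = wald_ratio(g_x, x, y)     # total effect of exposure on outcome
print(indirect, total)
```

Note that the confounder u biases naive regressions of y on m, but not the two Wald ratios, which is the point the simulations in the paper make.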


2015 ◽  
Vol 2015 ◽  
pp. 1-5 ◽  
Author(s):  
Yuxiang Tan ◽  
Yann Tambouret ◽  
Stefano Monti

The performance evaluation of fusion detection algorithms from high-throughput sequencing data crucially relies on the availability of data with known positive and negative cases of gene rearrangements. The use of simulated data circumvents some shortcomings of real data by generation of an unlimited number of true and false positive events, and the consequent robust estimation of accuracy measures, such as precision and recall. Although a few simulated fusion datasets from RNA Sequencing (RNA-Seq) are available, they are of limited sample size. This makes it difficult to systematically evaluate the performance of RNA-Seq based fusion-detection algorithms. Here, we present SimFuse to address this problem. SimFuse utilizes real sequencing data as the fusions’ background to closely approximate the distribution of reads from a real sequencing library and uses a reference genome as the template from which to simulate fusions’ supporting reads. To assess the supporting read-specific performance, SimFuse generates multiple datasets with various numbers of fusion supporting reads. Compared to an extant simulated dataset, SimFuse gives users control over the supporting read features and the sample size of the simulated library, based on which the performance metrics needed for the validation and comparison of alternative fusion-detection algorithms can be rigorously estimated.
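As a toy illustration of the kind of simulation SimFuse performs (SimFuse itself draws on a reference genome and real RNA-Seq libraries; the sequences and parameters below are invented), one can build a fusion junction and sample reads that span it:

```python
import random

random.seed(0)

def random_seq(n):
    return "".join(random.choice("ACGT") for _ in range(n))

# Hypothetical partner transcripts and the breakpoint joining them.
gene_a, gene_b = random_seq(500), random_seq(500)
fusion = gene_a[:300] + gene_b[200:]   # junction at position 300 of the fusion

def supporting_reads(fusion, junction, read_len, n_reads, min_overlap=10):
    """Sample reads whose alignment spans the fusion junction."""
    reads = []
    for _ in range(n_reads):
        # Start so the read overlaps the junction by >= min_overlap on each side.
        start = random.randint(junction - read_len + min_overlap,
                               junction - min_overlap)
        reads.append(fusion[start:start + read_len])
    return reads

reads = supporting_reads(fusion, 300, 75, 50)
print(len(reads), len(reads[0]))
```

Varying `n_reads` per simulated fusion is what allows the supporting-read-specific performance assessment described above.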


2012 ◽  
Vol 2012 ◽  
pp. 1-15 ◽  
Author(s):  
Adam Lane ◽  
Nancy Flournoy

In adaptive optimal procedures, the design at each stage is an estimate of the optimal design based on all previous data. Asymptotics for regular models with a fixed number of stages are straightforward if one assumes the sample size of each stage goes to infinity with the overall sample size. However, it is not uncommon for a small pilot study of fixed size to be followed by a much larger experiment. We study the large sample behavior of such studies. For simplicity, we assume a nonlinear regression model with normal errors. We show that the distribution of the maximum likelihood estimates converges to a scale mixture family of normal random variables. Then, for a one-parameter exponential mean function, we derive the asymptotic distribution of the maximum likelihood estimate explicitly and present a simulation to compare the characteristics of this asymptotic distribution with some commonly used alternatives.
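A minimal sketch of such a two-stage study, assuming for illustration the mean function E[y|x] = exp(θx) with θ < 0 and a locally optimal single design point at x = −1/θ (these modeling choices are assumptions, not taken from the paper's derivations):

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
theta_true, sigma = -1.0, 0.1

def mle(x, y):
    """Least squares = MLE under normal errors for E[y|x] = exp(theta * x)."""
    obj = lambda t: np.sum((y - np.exp(t * x)) ** 2)
    return minimize_scalar(obj, bounds=(-5.0, -0.1), method="bounded").x

def two_stage(n_pilot=10, n_main=200):
    # Stage 1: small pilot of fixed size at a fixed design point.
    x1 = np.full(n_pilot, 0.5)
    y1 = np.exp(theta_true * x1) + sigma * rng.normal(size=n_pilot)
    t1 = mle(x1, y1)
    # Stage 2: much larger stage at the estimated optimal point x = -1/theta_hat.
    x2 = np.full(n_main, -1.0 / t1)
    y2 = np.exp(theta_true * x2) + sigma * rng.normal(size=n_main)
    return mle(np.concatenate([x1, x2]), np.concatenate([y1, y2]))

estimates = np.array([two_stage() for _ in range(200)])
print(estimates.mean(), estimates.std())
```

Because the second-stage design point depends on the random pilot estimate, the resulting MLE distribution is a mixture over pilot outcomes, which is the phenomenon the paper characterizes asymptotically.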


2019 ◽  
Author(s):  
Lara Nonell ◽  
Juan R González

DNA methylation plays an important role in the development and progression of disease. Beta-values are the standard methylation measures. Different statistical methods have been proposed to assess differences in methylation between conditions, but most of them do not fully account for the distribution of beta-values. The simplex distribution can accommodate beta-valued data, and we hypothesize that it is flexible enough to model methylation data.

To test our hypothesis, we conducted several analyses using four real data sets obtained from microarray and sequencing technologies. Standard data distributions were studied and modelled in comparison to the simplex. In addition, simulations were conducted in different scenarios encompassing several distributional assumptions, regression models and sample sizes. Finally, we compared DNA methylation between females and males in order to benchmark the assessed methodologies under different scenarios.

According to the results obtained from the simulations and real data analyses, DNA methylation data are concordant with the simplex distribution in many situations. Simplex regression models work well in small data sets. However, when the sample size increases, other models such as beta regression or even linear regression can be employed to assess group comparisons and obtain unbiased results. Based on these results, we provide some practical recommendations for analyzing methylation data: (1) use data sets of at least 10 samples per studied condition for microarray data sets, or 30 for NGS data sets; (2) apply a simplex or beta regression model for microarray data; (3) apply a linear model in any other case.
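A minimal sketch of fitting the simplex distribution to beta-values by maximum likelihood, assuming a logit link for the mean and simulated (not real) beta-values for two groups:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)

def simplex_nll(params, y):
    """Negative log-likelihood of the simplex distribution (mean mu, dispersion s2)."""
    mu = 1.0 / (1.0 + np.exp(-params[0]))   # logit link keeps mu in (0, 1)
    s2 = np.exp(params[1])                  # log link keeps dispersion positive
    d = (y - mu) ** 2 / (y * (1 - y) * mu ** 2 * (1 - mu) ** 2)
    ll = -0.5 * np.log(2 * np.pi * s2) - 1.5 * np.log(y * (1 - y)) - d / (2 * s2)
    return -np.sum(ll)

# Simulated beta-values for two hypothetical groups.
y_f = rng.beta(8, 2, 200)   # "females", mean 0.8
y_m = rng.beta(6, 4, 200)   # "males", mean 0.6

fit_f = minimize(simplex_nll, x0=[0.0, 0.0], args=(y_f,), method="Nelder-Mead")
fit_m = minimize(simplex_nll, x0=[0.0, 0.0], args=(y_m,), method="Nelder-Mead")
mu_f = 1.0 / (1.0 + np.exp(-fit_f.x[0]))
mu_m = 1.0 / (1.0 + np.exp(-fit_m.x[0]))
print(mu_f, mu_m)
```

A group comparison could then be based on a likelihood-ratio test of equal means, which is the kind of analysis the benchmarks above evaluate.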


2020 ◽  
pp. 096228022096179
Author(s):  
Wei Liu ◽  
Frank Bretz ◽  
Mario Cortina-Borja

Reference ranges, which are data-based intervals aiming to contain a pre-specified large proportion of the population values, are powerful tools for analysing observations in clinical laboratories. Their main purpose is to classify any future observation from the population that falls outside them as atypical and thus possibly warranting further investigation. As a reference range is constructed from a random sample from the population, the event 'a reference range contains [Formula: see text] of the population' is also random. Hence, all we can hope for is that such an event has a large occurrence probability. In this paper we argue that some intervals, including the P prediction interval, are not suitable as reference ranges, since there is a substantial probability that these intervals contain less than [Formula: see text] of the population, especially when the sample size is large. In contrast, a [Formula: see text] tolerance interval is designed to contain [Formula: see text] of the population with a pre-specified large confidence γ, so it is eminently adequate as a reference range. An example based on real data illustrates the paper's key points.
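This contrast can be checked numerically. The sketch below (a Monte Carlo illustration assuming a normal population; Howe's approximation supplies the tolerance factor) estimates how often each interval actually contains at least 95% of the population:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, p, gamma, reps = 500, 0.95, 0.95, 2000

z = stats.norm.ppf((1 + p) / 2)
# Howe's approximate two-sided normal tolerance factor.
k_tol = z * np.sqrt((n - 1) * (1 + 1 / n) / stats.chi2.ppf(1 - gamma, n - 1))
# Standard two-sided prediction-interval factor.
k_pred = stats.t.ppf((1 + p) / 2, n - 1) * np.sqrt(1 + 1 / n)

def content_confidence(k):
    """Fraction of samples for which xbar +/- k*s contains >= p of N(0,1)."""
    hits = 0
    for _ in range(reps):
        x = rng.normal(size=n)
        m, s = x.mean(), x.std(ddof=1)
        coverage = stats.norm.cdf(m + k * s) - stats.norm.cdf(m - k * s)
        hits += coverage >= p
    return hits / reps

c_pred, c_tol = content_confidence(k_pred), content_confidence(k_tol)
print(c_pred, c_tol)
```

The prediction interval contains the required proportion only about half the time, while the tolerance interval does so with roughly the nominal confidence γ, matching the paper's argument.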


2017 ◽  
Vol 6 (1-2) ◽  
pp. 169
Author(s):  
A. H. Abd Ellah

We consider the problem of constructing a predictive interval for the range of future observations from an exponential distribution. Two cases are considered: (1) fixed sample size (FSS); (2) random sample size (RSS). We derive the predictive function in closed form for both FSS and RSS. Random sample sizes appear in many applications of life testing, and fixed sample size is a special case of random sample size. Illustrative examples are given, and factors of the predictive distribution are provided. A comparison in savings is made with the above method. To show the applications of our results, we present some simulation experiments. Finally, we apply our results to some real data sets in life testing.
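The closed-form predictive function itself is in the paper; as an illustrative stand-in, a plug-in Monte Carlo interval for the future range (an assumed simplification, not the paper's method) can be checked for coverage:

```python
import numpy as np

rng = np.random.default_rng(4)
n, m, level, reps = 100, 5, 0.90, 2000

def plugin_interval(x, m, level, sims=2000):
    """Equal-tailed interval for the future range, simulated from Exp(mean_hat)."""
    scale_hat = x.mean()                     # MLE of the exponential mean
    future = rng.exponential(scale_hat, size=(sims, m))
    ranges = future.max(axis=1) - future.min(axis=1)
    return np.quantile(ranges, [(1 - level) / 2, (1 + level) / 2])

hits = 0
for _ in range(reps):
    x = rng.exponential(1.0, size=n)         # observed FSS sample, true mean 1
    lo, hi = plugin_interval(x, m, level)
    future_range = np.ptp(rng.exponential(1.0, size=m))
    hits += lo <= future_range <= hi
cov = hits / reps
print(cov)
```

The plug-in interval ignores parameter uncertainty and so runs slightly below nominal coverage; the closed-form predictive distribution derived in the paper accounts for it.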


2015 ◽  
Vol 20 (2) ◽  
pp. 122-127 ◽  
Author(s):  
M.S. Panwar ◽  
Bapat Akanshya Sudhir ◽  
Rashmi Bundel ◽  
Sanjeev K. Tomer

This paper derives maximum likelihood estimators (MLEs) for the parameters of the inverse Rayleigh distribution (IRD) when the observed data are masked. MLEs, asymptotic confidence intervals (ACIs) and boot-p confidence intervals (boot-p CIs) for the lifetime parameters are discussed. The simulation illustrations show that, as the sample size increases, the estimated value approaches the true value and the mean square error decreases, while the mean square error increases with the level of masking. The ACIs are always symmetric, and the boot-p CIs approach symmetry as the sample size increases. In the real data set, the mean lifetime due to the local spread of the disease is less than that due to the metastatic spread.

Journal of Institute of Science and Technology, 2015, 20(2): 122-127
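For the complete-data (unmasked) case, the IRD MLE has a closed form and a percentile-bootstrap (boot-p) interval is straightforward; the sketch below is illustrative and does not implement the paper's masked-data likelihood:

```python
import numpy as np

rng = np.random.default_rng(5)

def ir_sample(theta, size, rng):
    """Inverse transform: F(x) = exp(-theta / x^2)  =>  x = sqrt(-theta / ln U)."""
    u = rng.uniform(size=size)
    return np.sqrt(-theta / np.log(u))

def ir_mle(x):
    """Closed-form MLE for complete data: theta_hat = n / sum(1 / x_i^2)."""
    return len(x) / np.sum(1.0 / x ** 2)

theta_true = 2.0
x = ir_sample(theta_true, 200, rng)
theta_hat = ir_mle(x)

# boot-p CI: resample the data, re-estimate, take percentiles.
boot = np.array([ir_mle(rng.choice(x, size=len(x), replace=True))
                 for _ in range(2000)])
lo, hi = np.quantile(boot, [0.025, 0.975])
print(theta_hat, lo, hi)
```

Masking replaces each observed failure cause with a set of possible causes, so the masked-data likelihood sums over components within each set, which is where the closed form above is lost.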


Author(s):  
Stanislav Anatolyev ◽  
Alena Skolkova

In recent decades, econometric tools for handling instrumental-variable regressions characterized by many instruments have been developed. We introduce a command, mivreg, that implements consistent estimation and testing in linear instrumental-variables regressions with many (possibly weak) instruments. mivreg covers both homoskedastic and heteroskedastic environments, estimators that are both nonrobust and robust to error nonnormality and projection matrix limit, and parameter tests and specification tests both with and without correction for existence of moments. We also run a small simulation experiment using mivreg and illustrate how mivreg works with real data.
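mivreg is a Stata command; as a language-neutral illustration of the problem it addresses (the data-generating process below is an assumption), the following simulation shows the many-weak-instrument bias of plain 2SLS that motivates the corrected estimators:

```python
import numpy as np

rng = np.random.default_rng(6)
n, k = 1000, 50                        # many weak instruments
pi = np.full(k, 0.05)                  # each instrument only weakly relevant
beta = 1.0                             # true structural coefficient

def tsls(y, x, Z):
    """2SLS estimate (x'Px)^{-1} x'Py with P = Z (Z'Z)^{-1} Z'."""
    zx, zy = Z.T @ x, Z.T @ y
    w = np.linalg.solve(Z.T @ Z, zx)
    return (zy @ w) / (zx @ w)

ests = []
for _ in range(200):
    Z = rng.normal(size=(n, k))
    u = rng.normal(size=n)                       # confounder
    x = Z @ pi + u + rng.normal(size=n)          # endogenous regressor
    y = beta * x + u + rng.normal(size=n)
    ests.append(tsls(y, x, Z))
mean_est = float(np.mean(ests))
print(mean_est)   # noticeably above beta = 1: many-instrument bias toward OLS
```

Consistent many-instrument estimators of the kind mivreg implements correct this bias, for example by removing the own-observation terms from the projection.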

