Identification of a nonseparable model under endogeneity using binary proxies for unobserved heterogeneity

10.3982/qe674 ◽  
2019 ◽  
Vol 10 (2) ◽  
pp. 527-563 ◽  
Author(s):  
Benjamin Williams

In this paper, I study identification of a nonseparable model with endogeneity arising due to unobserved heterogeneity. Identification relies on the availability of binary proxies that can be used to control for the unobserved heterogeneity. I show that the model is identified in the limit as the number of proxies increases. The argument does not require an instrumental variable that is excluded from the outcome equation nor does it require the support of the unobserved heterogeneity to be finite. I then propose a nonparametric estimator that is consistent as the number of proxies increases with the sample size. I also show that, for a fixed number of proxies, nontrivial bounds on objects of interest can be obtained. Finally, I study two real data applications that illustrate computation of the bounds and estimation with a large number of items.

Author(s):  
Alice R. Carter ◽  
Eleanor Sanderson ◽  
Gemma Hammerton ◽  
Rebecca C. Richmond ◽  
George Davey Smith ◽  
...  

Mediation analysis seeks to explain the pathway(s) through which an exposure affects an outcome. Traditional, non-instrumental-variable methods for mediation analysis suffer from a number of methodological difficulties, including bias due to confounding between the exposure, mediator and outcome, and measurement error. Mendelian randomisation (MR) can be used to improve causal inference for mediation analysis. We describe two approaches that can be used for estimating mediation effects with MR: multivariable MR (MVMR) and two-step MR. We outline the approaches and provide code to demonstrate how they can be used in mediation analysis. We review issues that can affect analyses, including confounding, measurement error, weak instrument bias, interactions between exposures and mediators, and analysis of multiple mediators. Description of the methods is supplemented by simulated and real data examples. Although MR relies on large sample sizes and strong assumptions, such as having strong instruments and no horizontally pleiotropic pathways, our simulations demonstrate that these methods are unaffected by confounding of the exposure or mediator with the outcome and by non-differential measurement error of the exposure or mediator. Both MVMR and two-step MR can be implemented with individual-level and summary-data MR. MR mediation methods require different assumptions from non-instrumental-variable mediation methods. Where these assumptions are more plausible, MR can be used to improve causal inference in mediation analysis.
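As a rough illustration of the two-step approach described above, the following simulation (not the authors' code; the data-generating process and the simple Wald-ratio estimators are assumptions) computes an indirect effect as the product of two instrumented coefficients:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Genetic instruments for the exposure (g_x) and the mediator (g_m).
g_x = rng.binomial(2, 0.3, n)
g_m = rng.binomial(2, 0.3, n)

u = rng.normal(size=n)                             # unmeasured confounder
x = 0.5 * g_x + u + rng.normal(size=n)             # exposure
m = 0.4 * x + 0.5 * g_m + u + rng.normal(size=n)   # mediator
y = 0.3 * m + 0.2 * x + u + rng.normal(size=n)     # outcome

def wald_ratio(g, treat, out):
    """IV (Wald ratio) estimate: cov(g, out) / cov(g, treat)."""
    return np.cov(g, out)[0, 1] / np.cov(g, treat)[0, 1]

beta_xm = wald_ratio(g_x, x, m)   # step 1: exposure -> mediator
beta_my = wald_ratio(g_m, m, y)   # step 2: mediator -> outcome
indirect = beta_xm * beta_my      # product-of-coefficients indirect effect
total = wald_ratio(g_x, x, y)     # total effect of exposure on outcome
print(indirect, total)
```

Note that the confounder u biases naive regressions of y on m, but not the two Wald ratios, which is the point the simulations in the paper make.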


2015 ◽  
Vol 2015 ◽  
pp. 1-5 ◽  
Author(s):  
Yuxiang Tan ◽  
Yann Tambouret ◽  
Stefano Monti

The performance evaluation of fusion detection algorithms from high-throughput sequencing data crucially relies on the availability of data with known positive and negative cases of gene rearrangements. The use of simulated data circumvents some shortcomings of real data by generation of an unlimited number of true and false positive events, and the consequent robust estimation of accuracy measures, such as precision and recall. Although a few simulated fusion datasets from RNA Sequencing (RNA-Seq) are available, they are of limited sample size. This makes it difficult to systematically evaluate the performance of RNA-Seq based fusion-detection algorithms. Here, we present SimFuse to address this problem. SimFuse utilizes real sequencing data as the fusions’ background to closely approximate the distribution of reads from a real sequencing library and uses a reference genome as the template from which to simulate fusions’ supporting reads. To assess the supporting read-specific performance, SimFuse generates multiple datasets with various numbers of fusion supporting reads. Compared to an extant simulated dataset, SimFuse gives users control over the supporting read features and the sample size of the simulated library, based on which the performance metrics needed for the validation and comparison of alternative fusion-detection algorithms can be rigorously estimated.
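As a toy illustration of the kind of simulation SimFuse performs (SimFuse itself draws on a reference genome and real RNA-Seq libraries; the sequences and parameters below are invented), one can build a fusion junction and sample reads that span it:

```python
import random

random.seed(0)

def random_seq(n):
    return "".join(random.choice("ACGT") for _ in range(n))

# Hypothetical partner transcripts and the breakpoint joining them.
gene_a, gene_b = random_seq(500), random_seq(500)
fusion = gene_a[:300] + gene_b[200:]   # junction at position 300 of the fusion

def supporting_reads(fusion, junction, read_len, n_reads, min_overlap=10):
    """Sample reads whose alignment spans the fusion junction."""
    reads = []
    for _ in range(n_reads):
        # Start so the read overlaps the junction by >= min_overlap on each side.
        start = random.randint(junction - read_len + min_overlap,
                               junction - min_overlap)
        reads.append(fusion[start:start + read_len])
    return reads

reads = supporting_reads(fusion, 300, 75, 50)
print(len(reads), len(reads[0]))
```

Varying `n_reads` per simulated fusion is what allows the supporting-read-specific performance assessment described above.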


2012 ◽  
Vol 2012 ◽  
pp. 1-15 ◽  
Author(s):  
Adam Lane ◽  
Nancy Flournoy

In adaptive optimal procedures, the design at each stage is an estimate of the optimal design based on all previous data. Asymptotics for regular models with a fixed number of stages are straightforward if one assumes the sample size of each stage goes to infinity with the overall sample size. However, it is not uncommon for a small pilot study of fixed size to be followed by a much larger experiment. We study the large sample behavior of such studies. For simplicity, we assume a nonlinear regression model with normal errors. We show that the distribution of the maximum likelihood estimates converges to a scale mixture family of normal random variables. Then, for a one-parameter exponential mean function, we derive the asymptotic distribution of the maximum likelihood estimate explicitly and present a simulation to compare the characteristics of this asymptotic distribution with some commonly used alternatives.
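A minimal sketch of such a two-stage study, assuming for illustration the mean function E[y|x] = exp(θx) with θ < 0 and a locally optimal single design point at x = −1/θ (these modeling choices are assumptions, not taken from the paper's derivations):

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
theta_true, sigma = -1.0, 0.1

def mle(x, y):
    """Least squares = MLE under normal errors for E[y|x] = exp(theta * x)."""
    obj = lambda t: np.sum((y - np.exp(t * x)) ** 2)
    return minimize_scalar(obj, bounds=(-5.0, -0.1), method="bounded").x

def two_stage(n_pilot=10, n_main=200):
    # Stage 1: small pilot of fixed size at a fixed design point.
    x1 = np.full(n_pilot, 0.5)
    y1 = np.exp(theta_true * x1) + sigma * rng.normal(size=n_pilot)
    t1 = mle(x1, y1)
    # Stage 2: much larger stage at the estimated optimal point x = -1/theta_hat.
    x2 = np.full(n_main, -1.0 / t1)
    y2 = np.exp(theta_true * x2) + sigma * rng.normal(size=n_main)
    return mle(np.concatenate([x1, x2]), np.concatenate([y1, y2]))

estimates = np.array([two_stage() for _ in range(200)])
print(estimates.mean(), estimates.std())
```

Because the second-stage design point depends on the random pilot estimate, the resulting MLE distribution is a mixture over pilot outcomes, which is the phenomenon the paper characterizes asymptotically.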


2019 ◽  
Author(s):  
Lara Nonell ◽  
Juan R González

DNA methylation plays an important role in the development and progression of disease. Beta-values are the standard methylation measures. Different statistical methods have been proposed to assess differences in methylation between conditions, but most of them do not fully account for the distribution of beta-values. The simplex distribution can accommodate beta-valued data, and we hypothesize that it is flexible enough to model methylation data.

To test our hypothesis, we conducted several analyses using four real data sets obtained from microarray and sequencing technologies. Standard data distributions were studied and modelled in comparison to the simplex. In addition, simulations were conducted in different scenarios encompassing several distributional assumptions, regression models and sample sizes. Finally, we compared DNA methylation between females and males in order to benchmark the assessed methodologies under different scenarios.

According to the results obtained from the simulations and real data analyses, DNA methylation data are concordant with the simplex distribution in many situations. Simplex regression models work well in small data sets. However, when the sample size increases, other models such as beta regression or even linear regression can be employed to assess group comparisons and obtain unbiased results. Based on these results, we provide some practical recommendations for analyzing methylation data: (1) use data sets of at least 10 samples per studied condition for microarray data sets, or 30 for NGS data sets; (2) apply a simplex or beta regression model for microarray data; (3) apply a linear model in any other case.
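A minimal sketch of fitting the simplex distribution to beta-values by maximum likelihood, assuming a logit link for the mean and simulated (not real) beta-values for two groups:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)

def simplex_nll(params, y):
    """Negative log-likelihood of the simplex distribution (mean mu, dispersion s2)."""
    mu = 1.0 / (1.0 + np.exp(-params[0]))   # logit link keeps mu in (0, 1)
    s2 = np.exp(params[1])                  # log link keeps dispersion positive
    d = (y - mu) ** 2 / (y * (1 - y) * mu ** 2 * (1 - mu) ** 2)
    ll = -0.5 * np.log(2 * np.pi * s2) - 1.5 * np.log(y * (1 - y)) - d / (2 * s2)
    return -np.sum(ll)

# Simulated beta-values for two hypothetical groups.
y_f = rng.beta(8, 2, 200)   # "females", mean 0.8
y_m = rng.beta(6, 4, 200)   # "males", mean 0.6

fit_f = minimize(simplex_nll, x0=[0.0, 0.0], args=(y_f,), method="Nelder-Mead")
fit_m = minimize(simplex_nll, x0=[0.0, 0.0], args=(y_m,), method="Nelder-Mead")
mu_f = 1.0 / (1.0 + np.exp(-fit_f.x[0]))
mu_m = 1.0 / (1.0 + np.exp(-fit_m.x[0]))
print(mu_f, mu_m)
```

A group comparison could then be based on a likelihood-ratio test of equal means, which is the kind of analysis the benchmarks above evaluate.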


2020 ◽  
pp. 096228022096179
Author(s):  
Wei Liu ◽  
Frank Bretz ◽  
Mario Cortina-Borja

Reference ranges, which are data-based intervals aiming to contain a pre-specified large proportion of the population values, are powerful tools for analysing observations in clinical laboratories. Their main purpose is to classify any future observation from the population that falls outside them as atypical and thus possibly warranting further investigation. As a reference range is constructed from a random sample from the population, the event 'a reference range contains [Formula: see text] of the population' is also random. Hence, all we can hope for is that such an event has a large occurrence probability. In this paper we argue that some intervals, including the P prediction interval, are not suitable as reference ranges, since there is a substantial probability that these intervals contain less than [Formula: see text] of the population, especially when the sample size is large. In contrast, a [Formula: see text] tolerance interval is designed to contain [Formula: see text] of the population with a pre-specified large confidence γ, so it is eminently adequate as a reference range. An example based on real data illustrates the paper's key points.
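This contrast can be checked numerically. The sketch below (a Monte Carlo illustration assuming a normal population; Howe's approximation supplies the tolerance factor) estimates how often each interval actually contains at least 95% of the population:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, p, gamma, reps = 500, 0.95, 0.95, 2000

z = stats.norm.ppf((1 + p) / 2)
# Howe's approximate two-sided normal tolerance factor.
k_tol = z * np.sqrt((n - 1) * (1 + 1 / n) / stats.chi2.ppf(1 - gamma, n - 1))
# Standard two-sided prediction-interval factor.
k_pred = stats.t.ppf((1 + p) / 2, n - 1) * np.sqrt(1 + 1 / n)

def content_confidence(k):
    """Fraction of samples for which xbar +/- k*s contains >= p of N(0,1)."""
    hits = 0
    for _ in range(reps):
        x = rng.normal(size=n)
        m, s = x.mean(), x.std(ddof=1)
        coverage = stats.norm.cdf(m + k * s) - stats.norm.cdf(m - k * s)
        hits += coverage >= p
    return hits / reps

c_pred, c_tol = content_confidence(k_pred), content_confidence(k_tol)
print(c_pred, c_tol)
```

The prediction interval contains the required proportion only about half the time, while the tolerance interval does so with roughly the nominal confidence γ, matching the paper's argument.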


2017 ◽  
Vol 6 (1-2) ◽  
pp. 169
Author(s):  
A. H. Abd Ellah

We consider the problem of constructing a predictive interval for the range of future observations from an exponential distribution. Two cases are considered: (1) fixed sample size (FSS); (2) random sample size (RSS). We derive the predictive function in closed form for both FSS and RSS. Random sample sizes appear in many applications of life testing, and fixed sample size is a special case of random sample size. Illustrative examples are given, and factors of the predictive distribution are provided. A comparison in savings is made with the above method. To show the applications of our results, we present some simulation experiments. Finally, we apply our results to some real data sets in life testing.
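The closed-form predictive function itself is in the paper; as an illustrative stand-in, a plug-in Monte Carlo interval for the future range (an assumed simplification, not the paper's method) can be checked for coverage:

```python
import numpy as np

rng = np.random.default_rng(4)
n, m, level, reps = 100, 5, 0.90, 2000

def plugin_interval(x, m, level, sims=2000):
    """Equal-tailed interval for the future range, simulated from Exp(mean_hat)."""
    scale_hat = x.mean()                     # MLE of the exponential mean
    future = rng.exponential(scale_hat, size=(sims, m))
    ranges = future.max(axis=1) - future.min(axis=1)
    return np.quantile(ranges, [(1 - level) / 2, (1 + level) / 2])

hits = 0
for _ in range(reps):
    x = rng.exponential(1.0, size=n)         # observed FSS sample, true mean 1
    lo, hi = plugin_interval(x, m, level)
    future_range = np.ptp(rng.exponential(1.0, size=m))
    hits += lo <= future_range <= hi
cov = hits / reps
print(cov)
```

The plug-in interval ignores parameter uncertainty and so runs slightly below nominal coverage; the closed-form predictive distribution derived in the paper accounts for it.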


2015 ◽  
Vol 20 (2) ◽  
pp. 122-127 ◽  
Author(s):  
M.S. Panwar ◽  
Bapat Akanshya Sudhir ◽  
Rashmi Bundel ◽  
Sanjeev K. Tomer

This paper derives maximum likelihood estimators (MLEs) for the parameters of the inverse Rayleigh distribution (IRD) when the observed data are masked. MLEs, asymptotic confidence intervals (ACIs) and boot-p confidence intervals (boot-p CIs) for the lifetime parameters are discussed. The simulation illustrations show that, as the sample size increases, the estimated value approaches the true value and the mean square error decreases, while the mean square error increases with the level of masking. The ACIs are always symmetric, and the boot-p CIs approach symmetry as the sample size increases. In the real data set, the mean lifetime due to the local spread of the disease is less than that due to the metastatic spread.

Journal of Institute of Science and Technology, 2015, 20(2): 122-127
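For the complete-data (unmasked) case, the IRD MLE has a closed form and a percentile-bootstrap (boot-p) interval is straightforward; the sketch below is illustrative and does not implement the paper's masked-data likelihood:

```python
import numpy as np

rng = np.random.default_rng(5)

def ir_sample(theta, size, rng):
    """Inverse transform: F(x) = exp(-theta / x^2)  =>  x = sqrt(-theta / ln U)."""
    u = rng.uniform(size=size)
    return np.sqrt(-theta / np.log(u))

def ir_mle(x):
    """Closed-form MLE for complete data: theta_hat = n / sum(1 / x_i^2)."""
    return len(x) / np.sum(1.0 / x ** 2)

theta_true = 2.0
x = ir_sample(theta_true, 200, rng)
theta_hat = ir_mle(x)

# boot-p CI: resample the data, re-estimate, take percentiles.
boot = np.array([ir_mle(rng.choice(x, size=len(x), replace=True))
                 for _ in range(2000)])
lo, hi = np.quantile(boot, [0.025, 0.975])
print(theta_hat, lo, hi)
```

Masking replaces each observed failure cause with a set of possible causes, so the masked-data likelihood sums over components within each set, which is where the closed form above is lost.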


Author(s):  
Stanislav Anatolyev ◽  
Alena Skolkova

In recent decades, econometric tools for handling instrumental-variable regressions characterized by many instruments have been developed. We introduce a command, mivreg, that implements consistent estimation and testing in linear instrumental-variables regressions with many (possibly weak) instruments. mivreg covers both homoskedastic and heteroskedastic environments, estimators that are both nonrobust and robust to error nonnormality and projection matrix limit, and parameter tests and specification tests both with and without correction for existence of moments. We also run a small simulation experiment using mivreg and illustrate how mivreg works with real data.
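mivreg is a Stata command; as a language-neutral illustration of the problem it addresses (the data-generating process below is an assumption), the following simulation shows the many-weak-instrument bias of plain 2SLS that motivates the corrected estimators:

```python
import numpy as np

rng = np.random.default_rng(6)
n, k = 1000, 50                        # many weak instruments
pi = np.full(k, 0.05)                  # each instrument only weakly relevant
beta = 1.0                             # true structural coefficient

def tsls(y, x, Z):
    """2SLS estimate (x'Px)^{-1} x'Py with P = Z (Z'Z)^{-1} Z'."""
    zx, zy = Z.T @ x, Z.T @ y
    w = np.linalg.solve(Z.T @ Z, zx)
    return (zy @ w) / (zx @ w)

ests = []
for _ in range(200):
    Z = rng.normal(size=(n, k))
    u = rng.normal(size=n)                       # confounder
    x = Z @ pi + u + rng.normal(size=n)          # endogenous regressor
    y = beta * x + u + rng.normal(size=n)
    ests.append(tsls(y, x, Z))
mean_est = float(np.mean(ests))
print(mean_est)   # noticeably above beta = 1: many-instrument bias toward OLS
```

Consistent many-instrument estimators of the kind mivreg implements correct this bias, for example by removing the own-observation terms from the projection.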

