Identification and Estimation of Causal Effects Using a Negative-Control Exposure in Time-Series Studies With Applications to Environmental Epidemiology

American Journal of Epidemiology ◽

10.1093/aje/kwaa172 ◽

2020 ◽

Cited By ~ 1

Author(s):

Yuanyuan Yu ◽

Hongkai Li ◽

Xiaoru Sun ◽

Xinhui Liu ◽

Fan Yang ◽

...

Keyword(s):

Time Series ◽

Causal Effect ◽

Environmental Epidemiology ◽

R Package ◽

Negative Control ◽

Environmental Data ◽

Causal Effects ◽

Unbiased Estimation ◽

Data Sets ◽

Time Series Studies

Abstract The initial aim of environmental epidemiology is to estimate the causal effects of environmental exposures on health outcomes. However, because most environmental data sets lack sufficient covariates, current methods cannot adjust adequately for confounders and inevitably suffer from residual confounding. We propose a negative-control exposure based on time-series studies (NCE-TS) model to effectively eliminate unobserved confounders using an after-outcome exposure as a negative-control exposure. We show that the causal effect is identifiable and can be estimated by the NCE-TS for continuous and categorical outcomes. Simulation studies indicate unbiased estimation by the NCE-TS model. The potential of NCE-TS is illustrated by 2 challenging applications: We found that living in areas with higher levels of surrounding greenness over 6 months was associated with less risk of stroke-specific mortality, based on the Shandong Ecological Health Cohort during January 1, 2010, to December 31, 2018. In addition, we found that the widely established negative association between temperature and cancer risks was actually caused by a number of unobserved confounders, according to the Global Open Database from 2003 to 2012. The proposed NCE-TS model is implemented in an R package (R Foundation for Statistical Computing, Vienna, Austria) called NCETS, freely available on GitHub.


A Negative Control Outcome Regression Accounting for Unobserved Confounding and Lagged Causal Effects

10.21203/rs.3.rs-696980/v1 ◽

2021 ◽

Author(s):

Hongkai Li ◽

Yuanyuan Yu ◽

Lei Hou ◽

Xiaoru Sun ◽

Xinhui Liu ◽

...

Keyword(s):

Causal Effect ◽

R Package ◽

Negative Control ◽

Effect Estimate ◽

Causal Association ◽

Difference In Differences ◽

Post Exposure ◽

All Cause Mortality ◽

Effect Estimation ◽

Negative Controls

Abstract Background: Epidemiologists are increasingly interested in using negative controls to eliminate unobserved confounding. In particular, the difference-in-differences method, which uses pre-exposure outcomes as negative control outcomes, is widely used. However, it yields biased estimates when the pre-exposure outcome has a lagged causal effect on the post-exposure outcome. Methods: Taking advantage of pre-exposure outcomes as negative control outcomes, Negative Control Outcome Regression (NCOR) is proposed to eliminate unobserved confounding. The intercept term of NCOR provides an unbiased causal effect estimate of exposure on the post-exposure outcome, and the slope minus 1 gives the estimated lagged causal effect of the pre-exposure outcome on the post-exposure outcome. We then illustrate the potential of NCOR in a challenging application to estimate the causal association of PM₂.₅ with all-cause mortality rates (AMR) and the lagged causal effect of pre AMR on post AMR. Results: Both theoretical justifications and simulation studies validate that the causal effect of exposure on outcome, along with the lagged causal effect of outcomes, are identifiable and can be estimated by the proposed NCOR model. The application results demonstrate that the previously estimated association between PM₂.₅ and AMR can be attributed to unobserved confounding. Furthermore, the NCOR model reveals that pre AMR has no causal association with post AMR. Conclusion: The proposed NCOR model can obtain unbiased and robust causal effect estimates of exposure on outcome, and of the lagged causal effect of outcomes. The proposed NCOR is implemented as an R package, called NCOR, and is freely available on GitHub.
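
The bias that the abstract attributes to difference-in-differences can be seen in a small worked example. The sketch below is not the NCOR estimator; it assumes a hypothetical linear model (pre = U, post = beta·A + U + gamma·pre, noise omitted) with made-up numbers, and only shows that the difference-in-differences estimate is off by gamma times the confounder gap between groups.

```python
# Toy illustration (NOT the NCOR estimator): difference-in-differences
# under an assumed linear model with a lagged outcome effect.
# All names and numbers here are hypothetical.

def did_estimate(pre_treated, post_treated, pre_control, post_control):
    """Classic difference-in-differences computed on group means."""
    return (post_treated - pre_treated) - (post_control - pre_control)

beta = 2.0                        # assumed true causal effect of exposure
gamma = 0.5                       # assumed lagged effect of pre- on post-outcome
u_treated, u_control = 3.0, 1.0   # unobserved confounder mean in each group

# Assumed structural model (noise omitted, so group means are exact):
#   pre  = U
#   post = beta * A + U + gamma * pre
pre_t, pre_c = u_treated, u_control
post_t = beta * 1 + u_treated + gamma * pre_t
post_c = beta * 0 + u_control + gamma * pre_c

estimate = did_estimate(pre_t, post_t, pre_c, post_c)
bias = estimate - beta            # equals gamma * (u_treated - u_control)
print(estimate, bias)
```

Setting `gamma = 0.0` in this toy model makes the bias vanish, which matches the abstract's point that the problem arises only when a lagged effect is present.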


Variable selection and estimation in causal inference using Bayesian spike and slab priors

Statistical Methods in Medical Research ◽

10.1177/0962280219898497 ◽

2020 ◽

Vol 29 (9) ◽

pp. 2445-2469

Author(s):

Brandon Koch ◽

David M Vock ◽

Julian Wolfson ◽

Laura Boehm Vock

Keyword(s):

Variable Selection ◽

Treatment Effect ◽

Mean Squared Error ◽

Causal Effect ◽

Predictive Ability ◽

Causal Effects ◽

Unbiased Estimation ◽

Model Parameters ◽

Squared Error ◽

Bayesian Formulation

Unbiased estimation of causal effects with observational data requires adjustment for confounding variables that are related to both the outcome and treatment assignment. Standard variable selection techniques aim to maximize predictive ability of the outcome model, but they ignore covariate associations with treatment and may not adjust for important confounders that are only weakly associated with the outcome. We propose a novel method for estimating causal effects that simultaneously considers models for both outcome and treatment, which we call the bilevel spike and slab causal estimator (BSSCE). By using a Bayesian formulation, BSSCE estimates the posterior distribution of all model parameters and provides straightforward and reliable inference. Spike and slab priors, which aim to minimize the mean squared error of the treatment effect estimator, are placed on each covariate coefficient. Theoretical properties of the treatment effect estimator are derived, justifying the prior used in BSSCE. Simulations show that BSSCE can substantially reduce mean squared error relative to numerous methods and performs especially well with large numbers of covariates, including situations where the number of covariates is greater than the sample size. We illustrate BSSCE by estimating the causal effect of vasoactive therapy vs. fluid resuscitation on hypotensive episode length for patients in the Multiparameter Intelligent Monitoring in Intensive Care III critical care database.


Interpretation of the Chemical and Physical Time-Series Retrieved from Sentik Glacier, Ladakh Himalaya, India

Journal of Glaciology ◽

10.3189/s0022143000008509 ◽

1984 ◽

Vol 30 (104) ◽

pp. 66-76 ◽

Cited By ~ 2

Author(s):

Paul A. Mayewski ◽

W. Berry Lyons ◽

N. Ahmad ◽

Gordon Smith ◽

M. Pourchet

Keyword(s):

Time Series ◽

Chemical Species ◽

Data Sets ◽

Reactive Iron ◽

Physical Time ◽

Ladakh Himalaya ◽

Data Density ◽

Mass Circulation ◽

The Himalaya ◽

Analysis Of Time Series

Abstract Spectral analysis of time series of a c. 17 ± 0.3 year core, calibrated for total β activity, recovered from Sentik Glacier (4908 m), Ladakh, Himalaya, yields several recognizable periodicities including subannual, annual, and multi-annual. The time series include both chemical data (chloride, sodium, reactive iron, reactive silicate, reactive phosphate, ammonium, δD, δ(18O) and pH) and physical data (density, debris and ice-band locations, and microparticles in size grades 0.50 to 12.70 μm). Source areas for the chemical species investigated and general air-mass circulation defined from the chemical and physical time series are discussed to demonstrate the potential of such studies in the development of paleometeorological data sets from remote high-alpine glacierized sites such as the Himalaya.


Understanding the nature of association between anxiety phenotypes and anorexia nervosa: a triangulation approach

BMC Psychiatry ◽

10.1186/s12888-020-02883-8 ◽

2020 ◽

Vol 20 (1) ◽

Author(s):

E. Caitlin Lloyd ◽

Hannah M. Sallis ◽

Bas Verplanken ◽

Anne M. Haase ◽

Marcus R. Munafò

Keyword(s):

Anorexia Nervosa ◽

Longitudinal Study ◽

Anxiety Disorder ◽

Anxiety Disorders ◽

Causal Effect ◽

Causal Effects ◽

Observational Research ◽

Genome Wide Association Studies ◽

Genetic Liability ◽

Depressed Affect

Abstract Background Evidence from observational studies suggests an association between anxiety disorders and anorexia nervosa (AN), but causal inference is complicated by the potential for confounding in these studies. We triangulate evidence across a longitudinal study and a Mendelian randomization (MR) study, to evaluate whether there is support for anxiety disorder phenotypes exerting a causal effect on AN risk. Methods Study One assessed longitudinal associations of childhood worry and anxiety disorders with lifetime AN in the Avon Longitudinal Study of Parents and Children cohort. Study Two used two-sample MR to evaluate: causal effects of worry, and genetic liability to anxiety disorders, on AN risk; causal effects of genetic liability to AN on anxiety outcomes; and the causal influence of worry on anxiety disorder development. The independence of effects of worry, relative to depressed affect, on AN and anxiety disorder outcomes, was explored using multivariable MR. Analyses were completed using summary statistics from recent genome-wide association studies. Results Study One did not support an association between worry and subsequent AN, but there was strong evidence for anxiety disorders predicting increased risk of AN. Study Two outcomes supported worry causally increasing AN risk, but did not support a causal effect of anxiety disorders on AN development, or of AN on anxiety disorders/worry. Findings also indicated that worry causally influences anxiety disorder development. Multivariable analysis estimates suggested the influence of worry on both AN and anxiety disorders was independent of depressed affect. Conclusions Overall our results provide mixed evidence regarding the causal role of anxiety exposures in AN aetiology. 
The inconsistency between outcomes of Studies One and Two may be explained by limitations surrounding worry assessment in Study One, confounding of the anxiety disorder and AN association in observational research, and low power in MR analyses probing causal effects of genetic liability to anxiety disorders. The evidence for worry acting as a causal risk factor for anxiety disorders and AN supports targeting worry for prevention of both outcomes. Further research should clarify how a tendency to worry translates into AN risk, and whether anxiety disorder pathology exerts any causal effect on AN.


Analogy-Based Crop Yield Forecasts Based on Temporal Similarity of Leaf Area Index

Remote Sensing ◽

10.3390/rs13163069 ◽

2021 ◽

Vol 13 (16) ◽

pp. 3069

Author(s):

Yadong Liu ◽

Junhwan Kim ◽

David H. Fleisher ◽

Kwang Soo Kim

Keyword(s):

Time Series ◽

Leaf Area Index ◽

Leaf Area ◽

Crop Yield ◽

Satellite Data ◽

Growing Season ◽

Environmental Data ◽

Area Index ◽

Current Season ◽

Wide Range

Seasonal forecasts of crop yield are important components for agricultural policy decisions and farmer planning. A wide range of input data are often needed to forecast crop yield in a region where sophisticated approaches such as machine learning and process-based models are used. This requires considerable effort for data preparation in addition to identifying data sources. Here, we propose a simpler approach called the Analogy Based Crop-yield (ABC) forecast scheme to make timely and accurate prediction of regional crop yield using a minimum set of inputs. In the ABC method, a growing season from a prior long-term period, e.g., 10 years, is first identified as analogous to the current season by the use of a similarity index based on the time series leaf area index (LAI) patterns. Crop yield in the given growing season is then forecasted using the weighted yield average reported in the analogous seasons for the area of interest. The ABC approach was used to predict corn and soybean yields in the Midwestern U.S. at the county level for the period of 2017–2019. The MOD15A2H, which is a satellite data product for LAI, was used to compile inputs. The mean absolute percentage error (MAPE) of crop yield forecasts was <10% for corn and soybean in each growing season when the time series of LAI from the day of year 89 to 209 was used as inputs to the ABC approach. The prediction error for the ABC approach was comparable to results from a deep neural network model that relied on soil and weather data as well as satellite data in a previous study. These results indicate that the ABC approach allowed for crop yield forecast with a lead-time of at least two months before harvest. In particular, the ABC scheme would be useful for regions where crop yield forecasts are limited by availability of reliable environmental data.
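
The analogy step described in this abstract (rank past seasons by LAI similarity, then average their reported yields with weights) can be sketched as follows. The abstract does not give the similarity index or the weighting, so the choices here are assumptions: similarity is taken as 1 / (1 + RMSE) between LAI curves, and the weights are those similarities. The function names `abc_forecast` and `mape`, and all the data, are hypothetical.

```python
import math

def rmse(a, b):
    """Root-mean-square difference between two equal-length LAI series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def abc_forecast(current_lai, past_lai, past_yield, k=2):
    """Forecast yield as a similarity-weighted mean over the k most
    analogous past seasons. Similarity = 1 / (1 + RMSE) is an assumed
    stand-in for the paper's similarity index."""
    scored = sorted(
        ((1.0 / (1.0 + rmse(current_lai, lai)), year)
         for year, lai in past_lai.items()),
        reverse=True,
    )
    top = scored[:k]
    total = sum(sim for sim, _ in top)
    return sum(sim * past_yield[year] for sim, year in top) / total

def mape(observed, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs(o - p) / abs(o)
                       for o, p in zip(observed, predicted)) / len(observed)

# Hypothetical LAI curves and yields for three past seasons.
past_lai = {2014: [1.0, 2.0, 3.0], 2015: [1.0, 2.0, 4.0], 2016: [3.0, 3.0, 3.0]}
past_yield = {2014: 10.0, 2015: 12.0, 2016: 8.0}
current = [1.0, 2.0, 3.0]

forecast = abc_forecast(current, past_lai, past_yield, k=2)
print(round(forecast, 2))
```

With `k=1` the forecast reduces to the yield of the single closest season, which is one way to read "a growing season ... identified as analogous" in the abstract.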


SambaR: An R package for fast, easy and reproducible population‐genetic analyses of biallelic SNP data sets

Molecular Ecology Resources ◽

10.1111/1755-0998.13339 ◽

2021 ◽

Author(s):

Menno J. Jong ◽

Joost F. Jong ◽

A. Rus Hoelzel ◽

Axel Janke

Keyword(s):

Population Genetic ◽

R Package ◽

Data Sets ◽

Genetic Analyses ◽

Snp Data ◽

Population Genetic Analyses


An edge-cloud collaboration architecture for pattern anomaly detection of time series in wireless sensor networks

Complex & Intelligent Systems ◽

10.1007/s40747-021-00442-6 ◽

2021 ◽

Author(s):

Cong Gao ◽

Ping Yang ◽

Yanping Chen ◽

Zhongmin Wang ◽

Yue Wang

Keyword(s):

Time Series ◽

Wireless Sensor Networks ◽

Sensor Networks ◽

Anomaly Detection ◽

Estimation Method ◽

Feature Representation ◽

Sensor Data ◽

Wireless Sensor ◽

Data Sets ◽

Edge Node

Abstract With the large-scale deployment of wireless sensor networks, anomaly detection for sensor data is becoming increasingly important in various fields. As a vital form of sensor data, time series exhibit three main types of anomaly: point anomaly, pattern anomaly, and sequence anomaly. In production environments, the analysis of pattern anomalies is the most rewarding. However, the traditional cloud computing processing model struggles with large amounts of widely distributed data. This paper presents an edge-cloud collaboration architecture for pattern anomaly detection of time series. A task migration algorithm is developed to alleviate the problem of backlogged detection tasks at edge nodes. In addition, the detection tasks related to long-term correlation and short-term correlation in time series are allocated to the cloud and to edge nodes, respectively. A multi-dimensional feature representation scheme is devised to conduct efficient dimension reduction. Two key components of the feature representation, trend identification and feature point extraction, are elaborated. Based on the result of feature representation, pattern anomaly detection is performed with an improved kernel density estimation method. Finally, extensive experiments are conducted with synthetic data sets and real-world data sets.
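
As a rough illustration of density-based anomaly flagging, the sketch below uses a plain Gaussian kernel density estimate over scalar features. It is not the paper's improved estimator, and it skips the feature representation step entirely; the bandwidth, the threshold, and the function names are assumptions chosen for the example.

```python
import math

def gaussian_kde(points, bandwidth):
    """Return a density function built from a plain Gaussian KDE."""
    norm = 1.0 / (len(points) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points
        )

    return density

def flag_low_density(values, reference, bandwidth=1.0, threshold=0.01):
    """Flag values whose estimated density under the reference sample
    falls below an assumed threshold."""
    density = gaussian_kde(reference, bandwidth)
    return [v for v in values if density(v) < threshold]

# Hypothetical reference features and two new observations.
reference = [10.0, 10.5, 9.5, 10.2, 9.8]
flagged = flag_low_density([10.1, 25.0], reference)
print(flagged)
```

In practice the bandwidth and threshold would need tuning on real sensor features; the fixed values above only keep the example deterministic.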


A Hypothesis Test for the Goodness-of-Fit of the Marginal Distribution of a Time Series with Application to Stablecoin Data

Engineering Proceedings ◽

10.3390/engproc2021005010 ◽

2021 ◽

Vol 5 (1) ◽

pp. 10

Author(s):

Mark Levene

Keyword(s):

Time Series ◽

Goodness Of Fit ◽

Marginal Distribution ◽

Hypothesis Test ◽

Data Sets ◽

Test Statistic ◽

Sample Test ◽

Kolmogorov Smirnov ◽

Heavy Tailed ◽

Jensen Shannon Divergence

A bootstrap-based hypothesis test of the goodness-of-fit for the marginal distribution of a time series is presented. Two metrics, the empirical survival Jensen–Shannon divergence (ESJS) and the Kolmogorov–Smirnov two-sample test statistic (KS2), are compared on four data sets—three stablecoin time series and a Bitcoin time series. We demonstrate that, after applying first-order differencing, all the data sets fit heavy-tailed α-stable distributions with 1<α<2 at the 95% confidence level. Moreover, ESJS is more powerful than KS2 on these data sets, since the widths of the derived confidence intervals for KS2 are, proportionately, much larger than those of ESJS.
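
Two ingredients named in this abstract, first-order differencing and the Kolmogorov-Smirnov two-sample statistic, are simple enough to sketch directly. The bootstrap shown is a naive resampling with replacement that ignores serial dependence, which is an assumption and may differ from the paper's procedure; the function names and the percentile interval are illustrative only.

```python
import bisect
import random

def first_difference(series):
    """First-order differencing: x[t+1] - x[t]."""
    return [b - a for a, b in zip(series, series[1:])]

def ks_two_sample(x, y):
    """Kolmogorov-Smirnov two-sample statistic: the largest gap
    between the two empirical distribution functions."""
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    for p in sorted(set(xs) | set(ys)):
        fx = bisect.bisect_right(xs, p) / len(xs)
        fy = bisect.bisect_right(ys, p) / len(ys)
        d = max(d, abs(fx - fy))
    return d

def bootstrap_ks_interval(x, y, n_boot=200, seed=0):
    """Naive percentile interval for the KS statistic (assumed scheme:
    independent resampling with replacement from each sample)."""
    rng = random.Random(seed)
    stats = sorted(
        ks_two_sample([rng.choice(x) for _ in x], [rng.choice(y) for _ in y])
        for _ in range(n_boot)
    )
    return stats[int(0.025 * n_boot)], stats[int(0.975 * n_boot) - 1]

# Hypothetical price series, differenced before comparison.
prices = [1.00, 1.01, 0.99, 1.00, 1.02, 1.00, 0.98, 1.00]
returns = first_difference(prices)
low, high = bootstrap_ks_interval(returns, returns[::-1])
print(low, high)
```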


The Price Effects of Competition from Parallel Imports and Therapeutic Alternatives: Using Dynamic Models to Estimate the Causal Effect on the Extensive and Intensive Margins

Review of Industrial Organization ◽

10.1007/s11151-021-09834-x ◽

2021 ◽

Author(s):

David Granlund

Keyword(s):

Dynamic Models ◽

Causal Effect ◽

The Other ◽

Causal Effects ◽

Parallel Imports ◽

Extensive And Intensive Margins ◽

Price Effects ◽

Short And Long Term ◽

Therapeutic Alternatives

Abstract This paper studies responses to competition with the use of dynamic models that distinguish between short- and long-term price effects. The dynamic models also allow lagged numbers of competitors to become valid and strong instruments for the current numbers, which enables studying the causal effects using flexible specifications. A first parallel trader is found to decrease prices of exchangeable products by 7% in the long term. On the other hand, prices do not respond to the first competitor that sells therapeutic alternatives; but competition from four or more competitors that sell on-patent therapeutic alternatives decreases prices by about 10% in the long term.


MUREN: a robust and multi-reference approach of RNA-seq transcript normalization

BMC Bioinformatics ◽

10.1186/s12859-021-04288-0 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Yance Feng ◽

Lei M. Li

Keyword(s):

Biological Significance ◽

Housekeeping Genes ◽

R Package ◽

Data Sets ◽

Statistical Regression ◽

Rna Seq ◽

Least Trimmed Squares ◽

Standard Data ◽

Wide Range ◽

Multiple References

Abstract Background Normalization of RNA-seq data aims at identifying biological expression differentiation between samples by removing the effects of unwanted confounding factors. Explicitly or implicitly, the justification of normalization requires a set of housekeeping genes. However, the existence of housekeeping genes common to a very large collection of samples, especially under a wide range of conditions, is questionable. Results We propose to carry out pairwise normalization with respect to multiple references, selected from representative samples. Then the pairwise intermediates are integrated based on a linear model that adjusts for the reference effects. Motivated by the notion of housekeeping genes and their statistical counterparts, we adopt robust least trimmed squares regression in pairwise normalization. The proposed method (MUREN) is compared with other existing tools on some standard data sets. The assessment of normalization quality emphasizes preserving possible asymmetric differentiation, whose biological significance is exemplified by a single-cell data set of the cell cycle. MUREN is implemented as an R package. The code, under the GPL-3 license, is available on GitHub at github.com/hippo-yf/MUREN and on conda at anaconda.org/hippo-yf/r-muren. Conclusions MUREN performs RNA-seq normalization using a two-step statistical regression induced from a general principle. We propose that the densities of pairwise differentiations be used to evaluate the goodness of normalization. MUREN adjusts the mode of differentiation toward zero while preserving the skewness due to biological asymmetric differentiation. Moreover, by robustly integrating pre-normalized counts with respect to multiple references, MUREN is immune to individual outlier samples.
