Linear models enable powerful differential activity analysis in massively parallel reporter assays

2017 ◽  
Author(s):  
Leslie Myint ◽  
Dimitrios G. Avramopoulos ◽  
Loyal A. Goff ◽  
Kasper D. Hansen

Abstract Massively parallel reporter assays (MPRAs) have emerged as a popular means for understanding noncoding variation in a variety of conditions. While a large number of experiments have been described in the literature, analysis typically uses ad hoc methods, and little attention has been paid to comparing the performance of methods across datasets. We present the mpralm method and, by analyzing its performance on multiple MPRA datasets, show that it is calibrated and powerful. In the first comprehensive evaluation of statistical methods for this data type across several datasets, we show that it outperforms existing statistical methods. We investigate theoretical and real-data properties of barcode summarization methods and show that the choice of summarization method has a previously unappreciated impact on some datasets. Finally, we use our model to conduct a power analysis for this assay and show that power improves substantially with up to 6 replicates per condition, whereas sequencing depth has a smaller impact; we recommend always using at least 4 replicates. Together, these results inform recommendations for differential analysis, general group comparisons, and power analysis, and will help improve the design and analysis of MPRA experiments. An R package is available from the Bioconductor project at https://bioconductor.org/packages/mpra.
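
The abstract mentions barcode summarization without spelling out the options. A minimal sketch, assuming two common choices: aggregating barcode counts before taking the RNA/DNA log ratio, versus averaging per-barcode log ratios. The pseudocount of 1 and the function names are illustrative assumptions, not the mpra package's API.

```python
import math

def aggregate_log_ratio(dna, rna, pseudocount=1.0):
    """Sum counts over barcodes first, then take log2(RNA / DNA)."""
    return math.log2((sum(rna) + pseudocount) / (sum(dna) + pseudocount))

def average_log_ratio(dna, rna, pseudocount=1.0):
    """Take log2(RNA / DNA) for each barcode, then average over barcodes."""
    ratios = [math.log2((r + pseudocount) / (d + pseudocount)) for d, r in zip(dna, rna)]
    return sum(ratios) / len(ratios)

# Hypothetical counts for one element in one sample: a shallow barcode with a
# high RNA/DNA ratio and a deep barcode with a ratio of about 1.
dna_counts = [10, 100]
rna_counts = [20, 100]

# Aggregation implicitly weights barcodes by depth, so the deep barcode dominates;
# averaging gives every barcode equal weight.
print(aggregate_log_ratio(dna_counts, rna_counts))
print(average_log_ratio(dna_counts, rna_counts))
```

With these hypothetical counts the two estimates differ noticeably, which is the kind of dataset-dependent effect of summarization that the abstract refers to.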

BMC Genomics ◽  
2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Leslie Myint ◽  
Dimitrios G. Avramopoulos ◽  
Loyal A. Goff ◽  
Kasper D. Hansen

2019 ◽  
Author(s):  
Tal Ashuach ◽  
David Sebastian Fischer ◽  
Anat Kreimer ◽  
Nadav Ahituv ◽  
Fabian Theis ◽  
...  

Abstract Massively parallel reporter assays (MPRAs) are a technique that enables testing thousands of regulatory DNA sequences and their variants in a single quantitative experiment. Despite their growing popularity, there is a lack of statistical methods that account for the different sources of uncertainty inherent to these assays and can thus effectively leverage their promise. Developing such methods could enhance our ability to identify regulatory sequences in the genome, understand their function under various settings, and ultimately better understand how the regulatory code and its alterations lead to phenotypic consequences. Here we present MPRAnalyze: a statistical framework dedicated to analyzing MPRA count data. MPRAnalyze addresses the major questions posed in MPRA experiments: estimating the magnitude of the effect of a regulatory sequence in a single-condition setting, and comparing the activity of regulatory sequences across multiple conditions (differential activity). The framework uses a nested construction of generalized linear models to account for uncertainty in both DNA and RNA observations, controls for various sources of unwanted variation, and incorporates negative controls for robust hypothesis testing, thereby providing clear quantitative answers in complex experimental settings. We demonstrate the robustness, accuracy, and applicability of MPRAnalyze on simulated data and published datasets and compare it against existing analysis methodologies. MPRAnalyze is implemented as an R package and is publicly available through Bioconductor [1].
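
To make "comparing activity across conditions" concrete, here is a deliberately simplified sketch, not MPRAnalyze's model. It assumes DNA counts are known exactly and RNA counts follow RNA_b ~ Poisson(rate × DNA_b), so the rate MLE is total RNA over total DNA, and it tests differential activity with a likelihood ratio test. Unlike MPRAnalyze, it ignores DNA uncertainty, overdispersion, unwanted variation, and negative controls; all names are hypothetical.

```python
import math

def poisson_loglik(rna, dna, rate):
    """Log-likelihood of RNA counts under RNA_b ~ Poisson(rate * DNA_b).
    DNA counts are treated as known and must be positive."""
    ll = 0.0
    for r, d in zip(rna, dna):
        mu = rate * d
        ll += (r * math.log(mu) if r > 0 else 0.0) - mu - math.lgamma(r + 1)
    return ll

def rate_mle(rna, dna):
    """Closed-form MLE of the transcription rate: total RNA / total DNA."""
    return sum(rna) / sum(dna)

def differential_activity_lrt(rna_a, dna_a, rna_b, dna_b):
    """Likelihood ratio test of H0: the same rate in conditions A and B (1 df)."""
    rate_a = rate_mle(rna_a, dna_a)
    rate_b = rate_mle(rna_b, dna_b)
    rate_0 = (sum(rna_a) + sum(rna_b)) / (sum(dna_a) + sum(dna_b))
    ll_alt = poisson_loglik(rna_a, dna_a, rate_a) + poisson_loglik(rna_b, dna_b, rate_b)
    ll_null = poisson_loglik(rna_a, dna_a, rate_0) + poisson_loglik(rna_b, dna_b, rate_0)
    stat = max(0.0, 2.0 * (ll_alt - ll_null))
    p_value = math.erfc(math.sqrt(stat / 2.0))  # upper tail of chi-square with 1 df
    return stat, p_value

# Hypothetical barcode counts for one element in two conditions.
stat, p = differential_activity_lrt([40, 80], [10, 20], [10, 20], [10, 20])
print(stat, p)
```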


2020 ◽  
pp. 1-11
Author(s):  
Xiaoying Xu ◽  
Zhijian Zeng

Regional economic evaluation and analysis provide guidance for formulating subsequent economic strategy. Owing to the influence of various factors, some current economic evaluation models are relatively volatile. To meet the needs of regional economic evaluation, this study uses computer technology together with indicators of regional economic development to build an economic development evaluation model for evaluating and analyzing the regional economy. Through comparative analysis, this study selects the entropy weight-TOPSIS model as the comprehensive evaluation model of the regional economy, using the entropy weight method to determine the weight of each index and then the TOPSIS method to conduct the comprehensive evaluation. In addition, this study designs a control experiment to analyze the performance of the model. Moreover, the proposed model is used to evaluate the regional economy over recent years, the results are compared with real data, and the test results are examined with statistical charts and tables. The results show that the proposed model is effective to some extent and can provide analytical tools for follow-up economic strategy research and analysis.
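
The entropy weight-TOPSIS pipeline can be sketched in a few steps. This is a textbook version under stated assumptions, not the authors' implementation: indicator values are strictly positive, the entropy step uses raw column shares (some variants min-max normalize first), TOPSIS uses vector normalization, and the region data are hypothetical.

```python
import math

def entropy_weights(matrix):
    """Entropy weight method. Rows are regions, columns are indicators.
    Assumes at least two rows, strictly positive values, and non-constant columns."""
    n, m = len(matrix), len(matrix[0])
    k = 1.0 / math.log(n)
    divergence = []
    for j in range(m):
        column = [row[j] for row in matrix]
        total = sum(column)
        shares = [x / total for x in column]
        entropy = -k * sum(p * math.log(p) for p in shares)
        divergence.append(1.0 - entropy)  # more dispersion -> lower entropy -> larger weight
    total_div = sum(divergence)
    return [d / total_div for d in divergence]

def topsis(matrix, weights, benefit):
    """TOPSIS closeness scores in [0, 1]; benefit[j] is False for cost-type indicators."""
    m = len(matrix[0])
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(m)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(m)] for row in matrix]
    columns = list(zip(*weighted))
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(columns)]
    anti_ideal = [min(c) if benefit[j] else max(c) for j, c in enumerate(columns)]
    scores = []
    for row in weighted:
        d_plus = math.dist(row, ideal)        # distance to the ideal solution
        d_minus = math.dist(row, anti_ideal)  # distance to the anti-ideal solution
        scores.append(d_minus / (d_plus + d_minus))
    return scores

# Hypothetical indicators for three regions: GDP per capita, growth rate (%),
# unemployment rate (%); unemployment is a cost-type indicator.
regions = [[50.0, 6.0, 4.0], [40.0, 5.0, 6.0], [60.0, 7.0, 3.0]]
w = entropy_weights(regions)
scores = topsis(regions, w, benefit=[True, True, False])
print(w, scores)
```

A region that is best on every indicator coincides with the ideal solution and scores 1; one that is worst on every indicator scores 0.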


Author(s):  
Fiorella Pia Salvatore ◽  
Alessia Spada ◽  
Francesca Fortunato ◽  
Demetris Vrontis ◽  
Mariantonietta Fiore

The purpose of this paper is to investigate the determinants of the costs of cardiovascular disease (CVD) in the regional health service of Italy's Apulia region from 2014 to 2016. Data for patients with acute myocardial infarction (AMI), heart failure (HF), and atrial fibrillation (AF) were collected from the hospital discharge registry. Generalized linear models (GLM) and generalized linear mixed models (GLMM) were used to identify the role of random effects in improving model performance. The study was based on socio-demographic variables and disease-specific variables (diagnosis-related group, hospitalization type, hospital stay, surgery, and economic burden of the hospital discharge form). First, both models indicated an increase in health costs in 2016 and lower spending for women (p < 0.001). The GLMM indicates a significant increase in health expenditure with increasing age (p < 0.001). Day-hospital admission has the lowest cost, surgery increases the cost, and AMI is the most expensive pathology, in contrast to AF (p < 0.001). Second, AIC and BIC take their lowest values for the GLMM, indicating that the random effects are relevant to improving model performance. This study is the first to use real data to estimate the economic burden of CVD from the regional health service's perspective. It is significant for providing a large set of estimates of the economic burden of CVD, giving managers information for health management and planning.
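
The model comparison rests on AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L, where lower values are better. A minimal sketch, assuming both models were fitted to the same observations; the log-likelihoods and parameter counts below are hypothetical placeholders, not results from the study.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln n - 2 ln L (lower is better)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood

# Hypothetical fits on the same n = 1000 discharge records (illustrative numbers only).
fits = {
    "GLM": {"loglik": -1250.0, "k": 8},
    "GLMM": {"loglik": -1200.0, "k": 9},  # one extra parameter: the random-effect variance
}
n_obs = 1000
for name, f in fits.items():
    print(name, aic(f["loglik"], f["k"]), round(bic(f["loglik"], f["k"], n_obs), 1))

best_by_aic = min(fits, key=lambda name: aic(fits[name]["loglik"], fits[name]["k"]))
print("lowest AIC:", best_by_aic)
```

BIC penalizes each extra parameter by ln n rather than 2, so with large samples it favors the random-effects model only when the likelihood gain is correspondingly larger.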


2018 ◽  
Vol 7 (3.15) ◽  
pp. 36 ◽  
Author(s):  
Sarah Nadirah Mohd Johari ◽  
Fairuz Husna Muhamad Farid ◽  
Nur Afifah Enara Binti Nasrudin ◽  
Nur Sarah Liyana Bistamam ◽  
Nur Syamira Syamimi Muhammad Shuhaili

Predicting financial market changes is an important problem in time series analysis that has received increasing attention because of financial crises. The autoregressive integrated moving average (ARIMA) model has been one of the most widely used linear models in time series forecasting, but it cannot easily capture nonlinear patterns. The generalized autoregressive conditional heteroscedasticity (GARCH) model improves on the ARIMA model by modeling volatility as a function of previous forecast errors and past volatility. Support vector machines (SVM) and artificial neural networks (ANN) have been successfully applied to nonlinear regression estimation problems. This study proposes a hybrid methodology that exploits the unique strengths of GARCH + SVM and GARCH + ANN models in forecasting a stock index. Real stock price data from the FTSE Bursa Malaysia KLCI were used to examine the forecasting accuracy of the proposed model. The results show that the proposed hybrid model achieves the best forecasting performance compared with the other models.
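
The GARCH component can be illustrated with the standard GARCH(1,1) variance recursion, sigma2[t+1] = omega + alpha * eps[t]^2 + beta * sigma2[t]. This sketch assumes the residuals already come from some mean model and uses illustrative, not estimated, parameters; the SVM/ANN stage of the hybrid is not shown.

```python
def garch11_variances(residuals, omega, alpha, beta, initial_variance=None):
    """GARCH(1,1) recursion: sigma2[t+1] = omega + alpha * eps[t]**2 + beta * sigma2[t].
    Element i of the result is the conditional variance for residual i; the last
    element is the one-step-ahead forecast."""
    if omega <= 0 or alpha < 0 or beta < 0 or alpha + beta >= 1:
        raise ValueError("need omega > 0, alpha >= 0, beta >= 0 and alpha + beta < 1")
    # Default start: the unconditional variance omega / (1 - alpha - beta).
    start = initial_variance if initial_variance is not None else omega / (1.0 - alpha - beta)
    sigma2 = [start]
    for eps in residuals:
        sigma2.append(omega + alpha * eps ** 2 + beta * sigma2[-1])
    return sigma2

# Hypothetical residuals from a mean model and illustrative (not estimated) parameters.
print(garch11_variances([1.0, 0.0, 2.0], omega=0.1, alpha=0.1, beta=0.8, initial_variance=1.0))
```

In a hybrid setup, conditional variances like these are one plausible input feature for the nonlinear learner; how the study combined them is not specified here.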


2016 ◽  
Vol 2016 ◽  
pp. 1-8 ◽  
Author(s):  
Lorentz Jäntschi ◽  
Donatella Bálint ◽  
Sorana D. Bolboacă

Multiple linear regression analysis is widely used to link an outcome with predictors to better understand the behaviour of the outcome of interest. Usually, under the assumption that the errors follow a normal distribution, the coefficients of the model are estimated by minimizing the sum of squared deviations. A new approach based on maximum likelihood estimation is proposed for finding the coefficients of linear models with two predictors without any restrictive assumptions on the distribution of the errors. The algorithm was developed, implemented, and tested as a proof of concept using fourteen sets of compounds by investigating the link between activity/property (as outcome) and structural feature information incorporated by molecular descriptors (as predictors). The results on real data demonstrated that in all investigated cases the power of the error differs significantly from the conventional value of two when the Gauss-Laplace distribution was used to relax the restrictive assumption of normally distributed errors. Therefore, the Gauss-Laplace distribution of the error could not be rejected, and the hypothesis that the power of the error from the Gauss-Laplace distribution is normally distributed also failed to be rejected.
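
The Gauss-Laplace (generalized normal) density with power p and scale s is f(e) = p / (2 s Γ(1/p)) exp(-(|e|/s)^p); p = 2 gives the normal and p = 1 the Laplace distribution. The sketch below assumes residuals from an already-fitted two-predictor model with zero location. It profiles the log-likelihood over a grid of powers using the closed-form scale MLE s^p = (p/n) Σ|e|^p. This is only an illustration of estimating the error power, not the authors' algorithm, which estimates the coefficients jointly.

```python
import math

def gen_normal_logpdf(e, scale, power):
    """Log-density of the generalized Gauss-Laplace distribution with location 0."""
    return (math.log(power) - math.log(2.0 * scale) - math.lgamma(1.0 / power)
            - (abs(e) / scale) ** power)

def scale_mle(residuals, power):
    """Closed-form MLE of the scale for a fixed power: s^p = (p / n) * sum |e|^p."""
    n = len(residuals)
    return (power / n * sum(abs(e) ** power for e in residuals)) ** (1.0 / power)

def profile_power(residuals, grid):
    """Return (power, scale, loglik) maximizing the profile log-likelihood over a grid.
    Assumes the residuals are not all zero."""
    best = None
    for p in grid:
        s = scale_mle(residuals, p)
        ll = sum(gen_normal_logpdf(e, s, p) for e in residuals)
        if best is None or ll > best[2]:
            best = (p, s, ll)
    return best

# Hypothetical residuals from a fitted two-predictor model.
residuals = [0.8, -1.1, 0.3, -0.2, 1.9, -0.7, 0.05, -1.4, 0.6, 0.1]
grid = [0.5 + 0.1 * i for i in range(31)]  # candidate powers 0.5 ... 3.5
power, scale, loglik = profile_power(residuals, grid)
print(power, scale, loglik)
```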


Psych ◽  
2020 ◽  
Vol 2 (4) ◽  
pp. 198-208
Author(s):  
Clemens Draxler ◽  
Stephan Dahm

This paper treats a so-called pseudo-exact or conditional approach to testing assumptions of a psychometric model known as the Rasch model. Draxler and Zessin derived the power function of such tests. These tests provide an alternative to asymptotic or large-sample theory (i.e., chi-square tests), since they are also valid in small-sample scenarios. This paper suggests an extension and applies it in a research context investigating the effects of response times. In particular, it examines the influence of response times on the unidimensionality assumption of the model. A real-data example illustrates the application, including a power analysis of the test, and points to possible drawbacks.
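
Conditional tests for the Rasch model are possible because the total score is sufficient for ability: given the score, the probability of a response pattern no longer depends on theta. The sketch below checks this numerically, comparing brute-force enumeration at several abilities with the closed form exp(-Σ x_i β_i) / γ_r. The item difficulties are hypothetical, and the code does not implement the Draxler and Zessin power function.

```python
import itertools
import math

def rasch_prob(theta, beta):
    """P(correct) under the Rasch model: exp(theta - beta) / (1 + exp(theta - beta))."""
    return 1.0 / (1.0 + math.exp(-(theta - beta)))

def pattern_prob(x, theta, betas):
    """Probability of a 0/1 response pattern x for a person with ability theta."""
    prob = 1.0
    for xi, beta in zip(x, betas):
        p = rasch_prob(theta, beta)
        prob *= p if xi else 1.0 - p
    return prob

def conditional_prob_by_enumeration(x, theta, betas):
    """P(x | total score r), computed by brute force at a given theta."""
    r = sum(x)
    same_score = [y for y in itertools.product((0, 1), repeat=len(betas)) if sum(y) == r]
    return pattern_prob(x, theta, betas) / sum(pattern_prob(y, theta, betas) for y in same_score)

def conditional_prob_closed_form(x, betas):
    """exp(-sum x_i beta_i) / gamma_r, where gamma_r is the elementary symmetric
    function of order r in eps_i = exp(-beta_i); theta does not appear."""
    eps = [math.exp(-b) for b in betas]
    numerator = math.prod(e for e, xi in zip(eps, x) if xi)
    gamma_r = sum(math.prod(c) for c in itertools.combinations(eps, sum(x)))
    return numerator / gamma_r

# Hypothetical item difficulties and one response pattern with total score 2.
betas = [-1.0, 0.0, 0.5, 1.2]
x = (1, 0, 1, 0)
for theta in (-2.0, 0.0, 3.0):
    print(theta, conditional_prob_by_enumeration(x, theta, betas))
print(conditional_prob_closed_form(x, betas))
```

Because these conditional probabilities involve only the item parameters, test statistics built on them have exact null distributions that do not rely on large-sample approximations.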


PLoS ONE ◽  
2019 ◽  
Vol 14 (6) ◽  
pp. e0218073 ◽  
Author(s):  
Rajiv Movva ◽  
Peyton Greenside ◽  
Georgi K. Marinov ◽  
Surag Nair ◽  
Avanti Shrikumar ◽  
...  

2020 ◽  
Vol 2020 ◽  
pp. 1-10
Author(s):  
Jiafeng Wu ◽  
Huiming Deng ◽  
Qianghua Chen ◽  
Qiang Wu ◽  
Xiaolong Li ◽  
...  

This study aimed to identify potential molecular mechanisms and candidate biomarkers in the left atrial regions for the diagnosis and treatment of valvular atrial fibrillation (VAF). Multiple bioinformatics methods were employed, including linear models for microarray analysis (LIMMA), an SVA algorithm, CIBERSORT immune infiltration analysis, and DNA methylation analysis. In addition, protein-protein interaction (PPI) networks, Gene Ontology (GO) terms, and molecular pathways were constructed for the differentially expressed genes (DEGs) and differentially methylated regions. Overall, compared with the normal rhythm group, 243 differentially expressed mRNAs (29 downregulated and 214 upregulated) and 26 differentially expressed lncRNAs (3 downregulated and 23 upregulated) were detected in the left atrium (LA) of atrial fibrillation (AF) patients, and neutrophils and CD8+ T cells were found to infiltrate the tissue. Additionally, 199 differentially methylated sites (107 downregulated and 92 upregulated) were identified by DNA methylation analysis. After integration, ELOVL2, CCR2, and WEE1 were identified as both differentially methylated and differentially transcribed genes. Among them, WEE1 was also a core gene in the competing endogenous RNA (ceRNA) network WEE1-KRBOX1-AS1-hsa-miR-17-5p in VAF left atrial tissue. By combining differential DNA methylation and transcriptional expression analyses, we found that WEE1 (cg13365543) may be a candidate gene regulated by DNA methylation. Moreover, KRBOX1-AS1 and WEE1 can compete endogenously and may mediate CD8+ T cell infiltration into myocardial tissue and participate in the AF process.
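
LIMMA's key idea is the moderated t-statistic: each gene's residual variance s² is shrunk toward a prior value s0² with prior degrees of freedom d0, s̃² = (d0·s0² + d·s²) / (d0 + d), and the statistic uses s̃ in place of s. A minimal sketch for a two-group comparison, assuming d0 and s0² are supplied directly (limma estimates them from all genes by empirical Bayes); the expression values are hypothetical, and this is not the study's pipeline.

```python
import math
from statistics import mean

def moderated_t(group_a, group_b, d0, s0_sq):
    """Moderated t-statistic for one gene in a two-group comparison (B minus A).
    d0 and s0_sq are the prior degrees of freedom and prior variance; limma
    estimates them from all genes, here they are simply supplied."""
    n_a, n_b = len(group_a), len(group_b)
    d_g = n_a + n_b - 2  # residual df of the two-group linear model
    m_a, m_b = mean(group_a), mean(group_b)
    rss = sum((x - m_a) ** 2 for x in group_a) + sum((x - m_b) ** 2 for x in group_b)
    s_sq = rss / d_g
    # Posterior variance: shrink the gene-wise variance toward the prior value.
    s_tilde_sq = (d0 * s0_sq + d_g * s_sq) / (d0 + d_g)
    log_fc = m_b - m_a
    unscaled_se = math.sqrt(1.0 / n_a + 1.0 / n_b)
    return log_fc / (math.sqrt(s_tilde_sq) * unscaled_se), d0 + d_g

# Hypothetical log2 expression values for one gene (normal rhythm vs AF).
t_mod, df = moderated_t([7.1, 7.4, 6.9], [8.0, 8.3, 7.8], d0=4.0, s0_sq=0.05)
print(t_mod, df)
```

With d0 = 0 the statistic reduces to the ordinary pooled t-test; a positive d0 stabilizes variance estimates when, as is typical for tissue samples, there are only a few replicates per group.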


2020 ◽  
Vol 36 (11) ◽  
pp. 3563-3565
Author(s):  
Li Chen

Abstract Summary: Power analysis is essential for deciding the sample size of metagenomic sequencing experiments in case–control studies that aim to identify differentially abundant (DA) microbes. However, the complexity of microbial data characteristics, such as excessive zeros, over-dispersion, compositionality, intrinsic microbial correlations, and variable sequencing depths, makes power analysis particularly challenging because an analytical form is usually unavailable. Here, we develop a simulation-based power assessment strategy and the R package powmic, which accounts for the complexity of microbial data characteristics. A real-data example demonstrates the usage of powmic. Availability and implementation: The powmic R package and an online tutorial are available at https://github.com/lichen-lab/powmic. Supplementary information: Supplementary data are available at Bioinformatics online.
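
Simulation-based power assessment follows one loop: simulate many case-control datasets under an assumed effect, test each one, and report the rejection rate. The sketch below covers only over-dispersion, using gamma-Poisson (negative binomial) counts for a single taxon with a Wilcoxon rank-sum test. It ignores zero inflation, compositionality, taxon correlations, and varying depths. All parameters are hypothetical, and none of this is the powmic API.

```python
import math
import random

def poisson_draw(lam, rng):
    """Knuth's multiplication method; adequate for the moderate means used here."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def nb_counts(mean, size, n, rng):
    """Overdispersed (negative binomial) counts via a gamma-Poisson mixture."""
    return [poisson_draw(rng.gammavariate(size, mean / size), rng) for _ in range(n)]

def rank_sum_pvalue(x, y):
    """Two-sided Wilcoxon rank-sum test, normal approximation with mid-ranks
    (no tie correction, so slightly conservative when there are many ties)."""
    values = x + y
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average 1-based rank of the tie block
        i = j + 1
    n1, n2 = len(x), len(y)
    w = sum(ranks[:n1])
    mu = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return math.erfc(abs(w - mu) / sd / math.sqrt(2))

def simulated_power(mean_ctrl, mean_case, size, n_per_group, n_sims=200, alpha=0.05, seed=1):
    """Fraction of simulated case-control datasets in which the test rejects at level alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        ctrl = nb_counts(mean_ctrl, size, n_per_group, rng)
        case = nb_counts(mean_case, size, n_per_group, rng)
        if rank_sum_pvalue(ctrl, case) < alpha:
            rejections += 1
    return rejections / n_sims

# Hypothetical single-taxon scenario: mean abundance 5 in controls, 50 in cases.
for n in (5, 10, 20):
    print(n, simulated_power(5, 50, size=5, n_per_group=n))
```

Running the same function with equal means estimates the type I error rate, a useful sanity check before reading power curves.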

