Linear models enable powerful differential activity analysis in massively parallel reporter assays

Mapping Intimacies · 10.1101/196394 · 2017 · Cited by ~3

Author(s): Leslie Myint, Dimitrios G. Avramopoulos, Loyal A. Goff, Kasper D. Hansen

Keyword(s): Statistical Methods, Power Analysis, Linear Models, Comprehensive Evaluation, Real Data, Massively Parallel, Differential Analysis, General Group, Improve Design, Reporter Assays

Massively parallel reporter assays (MPRAs) have emerged as a popular means of understanding noncoding variation in a variety of conditions. While a large number of experiments have been described in the literature, analysis typically relies on ad hoc methods, and little attention has been paid to comparing the performance of methods across datasets. We present the mpralm method, which we show is calibrated and powerful by analyzing its performance on multiple MPRA datasets. In the first comprehensive evaluation of statistical methods on several datasets, we show that it outperforms existing statistical methods for analysis of this data type. We investigate theoretical and real-data properties of barcode summarization methods and show an unappreciated impact of the summarization method for some datasets. Finally, we use our model to conduct a power analysis for this assay and show substantial improvements in power from performing up to 6 replicates per condition, whereas sequencing depth has a smaller impact; we recommend always using at least 4 replicates. Together, these results inform recommendations for differential analysis, general group comparisons, and power analysis, and will help improve the design and analysis of MPRA experiments. An R package is available from the Bioconductor project at https://bioconductor.org/packages/mpra.
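The abstract points to an unappreciated impact of barcode summarization. As a hedged illustration (not the mpralm implementation), the sketch below contrasts two common ways of collapsing per-barcode RNA and DNA counts for one element into a single log2 activity: summing counts before taking the ratio, versus averaging per-barcode log ratios. The pseudocount of 1, the function names, and the counts are assumptions made for illustration.

```python
import math


def aggregate_log_ratio(rna, dna, pseudo=1.0):
    """Sum counts over barcodes first, then take log2(RNA / DNA)."""
    return math.log2((sum(rna) + pseudo) / (sum(dna) + pseudo))


def average_log_ratio(rna, dna, pseudo=1.0):
    """Take log2(RNA / DNA) per barcode, then average across barcodes."""
    ratios = [math.log2((r + pseudo) / (d + pseudo)) for r, d in zip(rna, dna)]
    return sum(ratios) / len(ratios)


# Hypothetical counts for one element measured with four barcodes
rna = [120, 30, 0, 55]
dna = [40, 35, 10, 20]

agg = aggregate_log_ratio(rna, dna)
avg = average_log_ratio(rna, dna)
print(agg, avg)
```

With these made-up counts the aggregate estimate is positive while the average is negative: the single barcode with 0 RNA and 10 DNA counts contributes a large negative log ratio that dominates the mean but barely moves the summed counts.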

Linear models enable powerful differential activity analysis in massively parallel reporter assays

BMC Genomics · 10.1186/s12864-019-5556-x · 2019 · Vol 20 (1) · Cited by ~57

Author(s): Leslie Myint, Dimitrios G. Avramopoulos, Loyal A. Goff, Kasper D. Hansen

Keyword(s): Linear Models, Activity Analysis, Massively Parallel, Reporter Assays

MPRAnalyze: A statistical framework for Massively Parallel Reporter Assays

10.1101/527887 · 2019

Author(s): Tal Ashuach, David Sebastian Fischer, Anat Kreimer, Nadav Ahituv, Fabian Theis, ...

Keyword(s): DNA Sequences, Linear Models, Simulated Data, R Package, Massively Parallel, Published Data, Regulatory Sequence, Regulatory Sequences, Statistical Framework, Reporter Assays

Massively parallel reporter assays (MPRAs) are a technique that enables testing thousands of regulatory DNA sequences and their variants in a single, quantitative experiment. Despite their growing popularity, there is a lack of statistical methods that account for the different sources of uncertainty inherent to these assays and thus effectively leverage their promise. Development of such methods could help enhance our ability to identify regulatory sequences in the genome, understand their function under various settings, and ultimately gain a better understanding of how the regulatory code and its alteration lead to phenotypic consequences. Here we present MPRAnalyze: a statistical framework dedicated to analyzing MPRA count data. MPRAnalyze addresses the major questions posed in the context of MPRA experiments: estimating the magnitude of the effect of a regulatory sequence in a single-condition setting, and comparing the differential activity of regulatory sequences across multiple conditions. The framework uses a nested construction of generalized linear models to account for uncertainty in both DNA and RNA observations, controls for various sources of unwanted variation, and incorporates negative controls for robust hypothesis testing, thereby providing clear quantitative answers in complex experimental settings. We demonstrate the robustness, accuracy and applicability of MPRAnalyze on simulated data and published data sets and compare it against existing analysis methodologies. MPRAnalyze is implemented as an R package and is publicly available through Bioconductor [1].
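The core idea of separating transcriptional activity from DNA (plasmid) abundance can be illustrated with a much simpler model than MPRAnalyze's nested GLMs. In this hedged sketch, the DNA counts are treated as known abundances and the RNA counts are assumed to follow r_i ~ Poisson(alpha * d_i), where alpha is a hypothetical per-element transcription rate. Setting the derivative of the log-likelihood, sum(r_i)/alpha - sum(d_i), to zero gives the closed-form estimate alpha = sum(r) / sum(d). MPRAnalyze's own model is richer (it also models uncertainty in the DNA observations and unwanted variation), so this is not its estimator.

```python
import math


def poisson_loglik(alpha, rna, dna):
    """Log-likelihood of RNA counts under r_i ~ Poisson(alpha * d_i); requires d_i > 0."""
    ll = 0.0
    for r, d in zip(rna, dna):
        mu = alpha * d
        ll += r * math.log(mu) - mu - math.lgamma(r + 1)
    return ll


def transcription_rate_mle(rna, dna):
    """Closed-form MLE of alpha under the simplified Poisson model."""
    return sum(rna) / sum(dna)


# Hypothetical barcode-level counts for one regulatory element
rna = [120, 30, 5, 55]
dna = [40, 35, 10, 20]

alpha_hat = transcription_rate_mle(rna, dna)
print(alpha_hat, poisson_loglik(alpha_hat, rna, dna))
```

Because the estimate pools counts across barcodes, a barcode with high DNA abundance carries more weight than one with low abundance, which is the behavior a count-based model is meant to capture.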

Analysis of regional economic evaluation based on machine learning

Journal of Intelligent & Fuzzy Systems · 10.3233/jifs-189575 · 2020 · pp. 1-11

Author(s): Xiaoying Xu, Zhijian Zeng

Keyword(s): Economic Development, Economic Evaluation, Regional Economy, Comprehensive Evaluation, Evaluation Model, Real Data, Entropy Weight, Regional Economic, Economic Strategy, Table Data

Regional economic evaluation and analysis provide guidance for subsequent economic strategy formulation. Because of the influence of various factors, some current economic evaluation models are relatively volatile. To meet the needs of regional economic evaluation, this study combines computer technology with regional economic development to build an economic development evaluation model for evaluating and analyzing the regional economy. Through comparative analysis, this study selects the entropy weight-TOPSIS model as the comprehensive evaluation model of the regional economy: the entropy weight method determines the weight of each index, and the TOPSIS method then performs the comprehensive evaluation. In addition, this study designs a control experiment to analyze the performance of the model. Moreover, the proposed model is used to evaluate the regional economy over recent years, the results are compared with real data, and the test results are presented in statistical charts and tables. The results show that the model is somewhat effective and can provide analytical tools for follow-up research and analysis of economic strategy.
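The two-stage entropy weight-TOPSIS procedure described above can be sketched as follows. This is a generic textbook version, not the authors' code, and it rests on stated assumptions: every index is a positive "benefit" criterion (larger is better), the entropy step uses column proportions directly rather than a prior min-max normalization, and the 3 x 3 data matrix is hypothetical.

```python
import math


def entropy_weights(X):
    """Entropy weight method. X holds rows (regions) by columns (indices); entries > 0."""
    n, m = len(X), len(X[0])
    k = 1.0 / math.log(n)
    divergences = []
    for j in range(m):
        col = [row[j] for row in X]
        total = sum(col)
        p = [x / total for x in col]
        e = -k * sum(pi * math.log(pi) for pi in p if pi > 0)  # entropy in [0, 1]
        divergences.append(1.0 - e)  # more spread across regions => more weight
    s = sum(divergences)
    return [d / s for d in divergences]


def topsis(X, weights):
    """Relative closeness to the ideal solution, treating every index as a benefit criterion."""
    m = len(X[0])
    norms = [math.sqrt(sum(row[j] ** 2 for row in X)) for j in range(m)]
    V = [[weights[j] * row[j] / norms[j] for j in range(m)] for row in X]
    best = [max(v[j] for v in V) for j in range(m)]
    worst = [min(v[j] for v in V) for j in range(m)]
    scores = []
    for v in V:
        d_best = math.sqrt(sum((v[j] - best[j]) ** 2 for j in range(m)))
        d_worst = math.sqrt(sum((v[j] - worst[j]) ** 2 for j in range(m)))
        scores.append(d_worst / (d_best + d_worst))
    return scores


# Hypothetical indicator values for three regions (rows) and three indices (columns)
X = [[1.0, 2.0, 3.0],
     [2.0, 3.0, 4.0],
     [3.0, 4.0, 5.0]]
w = entropy_weights(X)
scores = topsis(X, w)
print(w, scores)
```

A score of 1 means a region coincides with the ideal solution and 0 means it coincides with the anti-ideal one; cost-type indices would need their best and worst values swapped, which this sketch does not handle.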

Identification of Health Expenditures Determinants: A Model to Manage the Economic Burden of Cardiovascular Disease

International Journal of Environmental Research and Public Health · 10.3390/ijerph18094652 · 2021 · Vol 18 (9) · pp. 4652

Author(s): Fiorella Pia Salvatore, Alessia Spada, Francesca Fortunato, Demetris Vrontis, Mariantonietta Fiore

Keyword(s): Cardiovascular Disease, Hospital Discharge, Random Effects, Economic Burden, Linear Models, Health Management, Model Performance, Real Data, Large Set, Regional Health

The purpose of this paper is to investigate the determinants influencing the costs of cardiovascular disease (CVD) in the regional health service of Italy's Apulia region from 2014 to 2016. Data for patients with acute myocardial infarction (AMI), heart failure (HF), and atrial fibrillation (AF) were collected from the hospital discharge registry. Generalized linear models (GLM) and generalized linear mixed models (GLMM) were used to identify the role of random effects in improving model performance. The study was based on socio-demographic variables and disease-specific variables (diagnosis-related group, hospitalization type, hospital stay, surgery, and economic burden of the hospital discharge form). First, both models indicated an increase in health costs in 2016 and showed lower spending for women (p < 0.001). The GLMM indicates a significant increase in health expenditure with increasing age (p < 0.001). Day hospital has the lowest cost, surgery increases the cost, and AMI is the most expensive pathology, in contrast to AF (p < 0.001). Second, AIC and BIC take their lowest values for the GLMM, indicating the relevance of random effects in improving model performance. This study is the first to use real data to estimate the economic burden of CVD from the regional health service's perspective. It is significant for its ability to provide a large set of estimates of the economic burden of CVD, giving managers information for health management and planning.
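The model comparison above relies on AIC and BIC, where the model with the lower value is preferred. A minimal sketch of the two criteria, using the standard definitions AIC = 2k - 2 log L and BIC = k log(n) - 2 log L. The log-likelihoods, parameter counts, and sample size below are invented for illustration and are not the study's results; counting the random-effect variance as one extra parameter is a common convention assumed here.

```python
import math


def aic(log_lik, k):
    """Akaike information criterion: 2k - 2 log L (lower is better)."""
    return 2 * k - 2 * log_lik


def bic(log_lik, k, n):
    """Bayesian information criterion: k log(n) - 2 log L (lower is better)."""
    return k * math.log(n) - 2 * log_lik


# Hypothetical fits of a GLM and a GLMM to the same n observations
n = 2000
glm_ll, glm_k = -5200.0, 10      # fixed effects only
glmm_ll, glmm_k = -5100.0, 11    # same fixed effects + one random-effect variance

print("GLM ", aic(glm_ll, glm_k), bic(glm_ll, glm_k, n))
print("GLMM", aic(glmm_ll, glmm_k), bic(glmm_ll, glmm_k, n))
```

BIC penalizes each extra parameter by log(n) rather than 2, so it favors the random-effects model only when the likelihood gain outweighs that larger penalty. Both criteria are only comparable between models fitted to the same observations.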

Predicting Stock Market Index Using Hybrid Intelligence Model

International Journal of Engineering & Technology · 10.14419/ijet.v7i3.15.17403 · 2018 · Vol 7 (3.15) · pp. 36 · Cited by ~1

Author(s): Sarah Nadirah Mohd Johari, Fairuz Husna Muhamad Farid, Nur Afifah Enara Binti Nasrudin, Nur Sarah Liyana Bistamam, Nur Syamira Syamimi Muhammad Shuhaili

Keyword(s): Time Series, Stock Prices, Linear Models, Forecast Error, ARIMA Model, Real Data, Stock Index, Support Vector, ANN Model, Stock Market Index

Predicting financial market changes is an important issue in time series analysis and has received increasing attention due to financial crises. The autoregressive integrated moving average (ARIMA) model has been one of the most widely used linear models in time series forecasting, but it cannot easily capture nonlinear patterns. The generalized autoregressive conditional heteroscedasticity (GARCH) model improves on ARIMA by modeling volatility as a function of previous forecast errors and previous volatility. Support vector machines (SVM) and artificial neural networks (ANN) have been successfully applied to nonlinear regression estimation problems. This study proposes a hybrid methodology that exploits the unique strengths of the GARCH + SVM and GARCH + ANN models in forecasting a stock index. Real data sets of FTSE Bursa Malaysia KLCI stock prices were used to examine the forecasting accuracy of the proposed model. The results show that the proposed hybrid model achieves the best forecasting performance compared with the other models.
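The GARCH component described above can be illustrated with the standard GARCH(1,1) conditional-variance recursion, sigma2[t] = omega + alpha * eps[t-1]**2 + beta * sigma2[t-1]. This sketch shows only that recursion, not the paper's hybrid GARCH + SVM or GARCH + ANN pipeline. The parameter values and residuals are hypothetical; in practice the parameters are estimated by maximum likelihood, and the residuals would come from a mean model such as ARIMA.

```python
def garch11_variance(residuals, omega, alpha, beta):
    """GARCH(1,1) conditional variances.

    Returns len(residuals) + 1 values: sigma2[0] is the starting variance and
    sigma2[-1] is the one-step-ahead variance forecast after the last residual.
    """
    if not (omega > 0 and alpha >= 0 and beta >= 0 and alpha + beta < 1):
        raise ValueError("need omega > 0, alpha >= 0, beta >= 0 and alpha + beta < 1")
    sigma2 = [omega / (1 - alpha - beta)]  # start at the unconditional variance
    for eps in residuals:
        # Yesterday's squared shock and yesterday's variance drive today's variance
        sigma2.append(omega + alpha * eps ** 2 + beta * sigma2[-1])
    return sigma2


# Hypothetical residuals from a mean model and hypothetical parameters
s2 = garch11_variance([0.5, -2.0, 0.1], omega=0.1, alpha=0.1, beta=0.8)
print(s2)
```

The large shock of -2.0 raises the next variance above its long-run level, and the variance then decays back at a rate governed by beta, which is the volatility clustering that a purely linear ARIMA model does not represent.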

Multiple Linear Regressions by Maximizing the Likelihood under Assumption of Generalized Gauss-Laplace Distribution of the Error

Computational and Mathematical Methods in Medicine · 10.1155/2016/8578156 · 2016 · Vol 2016 · pp. 1-8 · Cited by ~9

Author(s): Lorentz Jäntschi, Donatella Bálint, Sorana D. Bolboacă

Keyword(s): Normal Distribution, Linear Models, Linear Regression Analysis, Likelihood Estimation, Real Data, Multiple Linear Regression Analysis, Laplace Distribution, New Approach, Feature Information, Linear Regressions

Multiple linear regression analysis is widely used to link an outcome with predictors to better understand the behaviour of the outcome of interest. Usually, under the assumption that the errors follow a normal distribution, the coefficients of the model are estimated by minimizing the sum of squared deviations. A new approach based on maximum likelihood estimation is proposed for finding the coefficients of linear models with two predictors without any restrictive assumptions on the distribution of the errors. The algorithm was developed, implemented, and tested as a proof of concept using fourteen sets of compounds by investigating the link between activity/property (as outcome) and structural feature information incorporated by molecular descriptors (as predictors). The results on real data demonstrated that in all investigated cases the power of the error is significantly different from the conventional value of two when the Gauss-Laplace distribution was used to relax the restrictive assumption of normally distributed errors. Therefore, the Gauss-Laplace distribution of the error could not be rejected, while the hypothesis that the power of the error from the Gauss-Laplace distribution is normally distributed also failed to be rejected.
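The likelihood being maximized can be written down for the two-predictor model y = b0 + b1*x1 + b2*x2 + e, assuming e follows a generalized Gauss-Laplace (generalized normal) density p / (2 s Gamma(1/p)) * exp(-(|e| / s)**p) with scale s and power p. Power p = 2 recovers the normal distribution (with s = sigma * sqrt(2)) and p = 1 the Laplace distribution. This is a hedged sketch of the objective only; the paper's own maximization algorithm is not reproduced, and the negated function could instead be handed to a general-purpose optimizer. The data are hypothetical.

```python
import math


def gauss_laplace_loglik(params, x1, x2, y):
    """Log-likelihood of y = b0 + b1*x1 + b2*x2 + e with generalized Gauss-Laplace errors.

    params = (b0, b1, b2, s, p): regression coefficients, error scale s > 0, error power p > 0.
    """
    b0, b1, b2, s, p = params
    if s <= 0 or p <= 0:
        return -math.inf  # outside the parameter space
    log_norm = math.log(p) - math.log(2 * s) - math.lgamma(1 / p)
    ll = 0.0
    for a, b, t in zip(x1, x2, y):
        e = t - (b0 + b1 * a + b2 * b)  # residual
        ll += log_norm - (abs(e) / s) ** p
    return ll


# Hypothetical data: outcome y with two descriptors x1 and x2
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [0.5, -1.0, 2.0, 0.0]
y = [2.1, 1.0, 6.2, 4.9]

print(gauss_laplace_loglik((0.3, 1.0, 0.8, 1.0, 2.0), x1, x2, y))  # power 2 (normal)
print(gauss_laplace_loglik((0.3, 1.0, 0.8, 1.0, 1.0), x1, x2, y))  # power 1 (Laplace)
```

Treating p as a free parameter is what allows the data to decide whether the conventional squared-error (p = 2) assumption holds.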

Conditional or Pseudo Exact Tests with an Application in the Context of Modeling Response Times

Psych · 10.3390/psych2040017 · 2020 · Vol 2 (4) · pp. 198-208

Author(s): Clemens Draxler, Stephan Dahm

Keyword(s): Power Function, Rasch Model, Power Analysis, Response Times, Real Data, Small Sample, Chi Square, Research Context, Large Sample Theory, The Rasch Model

This paper treats a so-called pseudo-exact or conditional approach to testing assumptions of a psychometric model known as the Rasch model. Draxler and Zessin derived the power function of such tests. These tests provide an alternative to asymptotic or large-sample theory, i.e., chi-square tests, since they are also valid in small-sample scenarios. This paper suggests an extension and applies it in a research context of investigating the effects of response times. In particular, the interest lies in examining the influence of response times on the unidimensionality assumption of the model. A real data example illustrates its application, including a power analysis of the test, and points to possible drawbacks.
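Conditional tests for the Rasch model rest on a standard property: given a person's raw score r, the probability of a response pattern no longer depends on the person parameter theta, and equals prod(eps_i ** x_i) / gamma_r, where eps_i = exp(-beta_i) and gamma_r is the elementary symmetric function of order r. The sketch below computes that conditional probability and checks it against the unconditional Rasch model for two abilities. It illustrates only the conditioning idea, not the test statistic or power function of Draxler and Zessin; the item difficulties are hypothetical.

```python
import itertools
import math


def elementary_symmetric(eps):
    """gamma[r] = sum over all item subsets of size r of the product of their eps_i."""
    gamma = [1.0] + [0.0] * len(eps)
    for e in eps:
        for r in range(len(gamma) - 1, 0, -1):  # update high orders first
            gamma[r] += e * gamma[r - 1]
    return gamma


def conditional_pattern_prob(pattern, betas):
    """P(0/1 pattern | raw score) under the Rasch model; free of theta."""
    eps = [math.exp(-b) for b in betas]
    gamma = elementary_symmetric(eps)
    num = math.prod(e for e, x in zip(eps, pattern) if x == 1)
    return num / gamma[sum(pattern)]


def rasch_pattern_prob(pattern, theta, betas):
    """Unconditional probability of a 0/1 pattern for a person with ability theta."""
    prob = 1.0
    for x, b in zip(pattern, betas):
        p1 = 1.0 / (1.0 + math.exp(-(theta - b)))
        prob *= p1 if x == 1 else 1.0 - p1
    return prob


betas = [-1.0, 0.0, 0.5, 1.2]  # hypothetical item difficulties
pattern = (1, 0, 1, 0)
same_score = [p for p in itertools.product((0, 1), repeat=len(betas)) if sum(p) == sum(pattern)]

for theta in (-1.0, 2.0):
    joint = rasch_pattern_prob(pattern, theta, betas)
    total = sum(rasch_pattern_prob(p, theta, betas) for p in same_score)
    print(theta, joint / total, conditional_pattern_prob(pattern, betas))
```

The two printed ratios agree for every theta, which is why such conditional tests need no estimate of person abilities and do not rely on large-sample approximations.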

Deciphering regulatory DNA sequences and noncoding genetic variants using neural network models of massively parallel reporter assays

PLoS ONE · 10.1371/journal.pone.0218073 · 2019 · Vol 14 (6) · pp. e0218073 · Cited by ~17

Author(s): Rajiv Movva, Peyton Greenside, Georgi K. Marinov, Surag Nair, Avanti Shrikumar, ...

Keyword(s): Neural Network, DNA Sequences, Genetic Variants, Network Models, Massively Parallel, Neural Network Models, Regulatory DNA Sequences, Reporter Assays, Regulatory DNA

Comprehensive Analysis of Differential Immunocyte Infiltration and Potential ceRNA Networks Involved in the Development of Atrial Fibrillation

BioMed Research International · 10.1155/2020/8021208 · 2020 · Vol 2020 · pp. 1-10

Author(s): Jiafeng Wu, Huiming Deng, Qianghua Chen, Qiang Wu, Xiaolong Li, ...

Keyword(s): Atrial Fibrillation, DNA Methylation, Left Atrial, Molecular Mechanisms, Linear Models, Core Gene, Differential Analysis, Methylation Analysis, Protein Protein Interaction, DNA Methylation Analysis

This study aims to identify potential molecular mechanisms and candidate biomarkers in the left atrial regions for the diagnosis and treatment of valvular atrial fibrillation (VAF). Multiple bioinformatics methods were employed, including linear models for microarray analysis (LIMMA), an SVA algorithm, CIBERSORT immune infiltration analysis, and DNA methylation analysis. In addition, the protein-protein interaction (PPI) network, Gene Ontology (GO), and molecular pathways of differentially expressed genes (DEGs) or differentially methylated regions were constructed. In all, compared with the normal rhythm group, 243 differentially expressed mRNAs (29 downregulated and 214 upregulated) and 26 differentially expressed lncRNAs (3 downregulated and 23 upregulated) were detected in the left atrium (LA) of atrial fibrillation (AF) patients, and neutrophil and CD8+ T cell infiltration was observed. Additionally, 199 differentially methylated sites (107 downregulated and 92 upregulated) were identified based on DNA methylation analysis. After integration, ELOVL2, CCR2, and WEE1 were identified as both differentially methylated and differentially transcribed genes. Among them, WEE1 was also a core gene in the competing endogenous RNA (ceRNA) network that included WEE1-KRBOX1-AS1-hsa-miR-17-5p in VAF left atrial tissue. Combining DNA methylation and transcriptional expression differential analyses, we found that WEE1 (cg13365543) may be a candidate gene regulated by DNA methylation. Moreover, KRBOX1-AS1 and WEE1 can act as competing endogenous RNAs, may mediate CD8+ T cell infiltration into myocardial tissue, and may participate in the AF process.
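LIMMA is listed among the methods used. Its central idea for small sample sizes is to shrink each gene's residual variance s^2 toward a prior value s0^2 before forming a t-statistic: s_tilde^2 = (d0 * s0^2 + d * s^2) / (d0 + d), with d0 + d degrees of freedom. The sketch below applies this to a two-group comparison for one gene. In limma the hyperparameters d0 and s0^2 are estimated from all genes by empirical Bayes; here they are supplied by hand as hypothetical values, and the expression values are invented.

```python
import math
import statistics


def moderated_t_two_group(x, y, d0, s0_sq):
    """Moderated t-statistic for mean(x) - mean(y) for a single gene.

    d0 and s0_sq are the prior degrees of freedom and prior variance, passed in
    by hand here (hypothetical values) rather than estimated across genes.
    """
    n1, n2 = len(x), len(y)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    d = n1 + n2 - 2                                  # residual degrees of freedom
    ss = sum((v - mx) ** 2 for v in x) + sum((v - my) ** 2 for v in y)
    s_sq = ss / d                                    # pooled residual variance
    s_tilde_sq = (d0 * s0_sq + d * s_sq) / (d0 + d)  # shrink toward the prior
    se = math.sqrt(s_tilde_sq * (1 / n1 + 1 / n2))
    return (mx - my) / se, d0 + d                    # statistic and its degrees of freedom


# Hypothetical log-expression values for one gene in AF and normal-rhythm samples
af = [5.1, 4.8, 5.5]
normal = [4.0, 4.2, 3.9]
print(moderated_t_two_group(af, normal, d0=0, s0_sq=1.0))  # ordinary pooled t
print(moderated_t_two_group(af, normal, d0=4, s0_sq=1.0))  # shrunk toward s0_sq
```

With d0 = 0 the statistic reduces to the ordinary pooled t-test; a larger d0 borrows more strength from the prior, which stabilizes genes whose sample variance happens to be very small.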

powmic: an R package for power assessment in microbiome case–control studies

Bioinformatics · 10.1093/bioinformatics/btaa197 · 2020 · Vol 36 (11) · pp. 3563-3565

Author(s): Li Chen

Keyword(s): Power Analysis, Real Data, Analytical Form, R Package, Case Control, Supplementary Information, Metagenomic Sequencing, Case Control Studies, Simulation Based, Over Dispersion

Summary: Power analysis is essential for deciding the sample size of metagenomic sequencing experiments in a case–control study for identifying differentially abundant (DA) microbes. However, the complexity of microbial data characteristics, such as excessive zeros, over-dispersion, compositionality, intrinsic microbial correlations and variable sequencing depths, makes power analysis particularly challenging because an analytical form is usually unavailable. Here, we develop a simulation-based power assessment strategy and the R package powmic, which accounts for the complexity of microbial data characteristics. A real data example demonstrates the usage of powmic.

Availability and implementation: The powmic R package and an online tutorial are available at https://github.com/lichen-lab/powmic.

Supplementary information: Supplementary data are available at Bioinformatics online.
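The general idea of simulation-based power assessment, when no analytical power formula is available, can be sketched as: simulate many datasets under an assumed effect, apply the test, and report the fraction of rejections. This is not powmic's procedure. It covers a single over-dispersed taxon only, ignoring zero inflation, compositionality, and correlations, and its choices are assumptions: negative binomial counts drawn as a gamma-Poisson mixture, a Welch t-statistic on log1p counts, and a normal critical value of 1.96 (anti-conservative for very small groups).

```python
import math
import random
import statistics


def rpois(lam, rng):
    """Poisson draw by Knuth's multiplication method (adequate for moderate lam)."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def rnbinom(mean, dispersion, rng):
    """Negative binomial draw as a gamma-Poisson mixture; variance = mean + dispersion * mean**2."""
    lam = rng.gammavariate(1.0 / dispersion, mean * dispersion)  # shape, scale
    return rpois(lam, rng)


def welch_t(a, b):
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return 0.0 if se == 0 else (statistics.fmean(a) - statistics.fmean(b)) / se


def simulated_power(n_per_group, mean, fold_change, dispersion, n_sim=200, z_crit=1.96, seed=1):
    """Fraction of simulated case-control datasets in which the test rejects."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sim):
        ctrl = [math.log1p(rnbinom(mean, dispersion, rng)) for _ in range(n_per_group)]
        case = [math.log1p(rnbinom(mean * fold_change, dispersion, rng)) for _ in range(n_per_group)]
        if abs(welch_t(case, ctrl)) > z_crit:
            rejections += 1
    return rejections / n_sim


# Hypothetical design: 20 samples per group, baseline mean 20 reads, dispersion 0.5
for fc in (1.0, 1.5, 4.0):
    print(fc, simulated_power(20, 20, fc, 0.5))
```

With a fold change of 1 the rejection rate estimates the type I error rate, which is a useful check that the simulation and test are calibrated before reading off power at realistic effect sizes and sample sizes.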
