GenoGAM: Genome-wide generalized additive models for ChIP-seq analysis

Mapping Intimacies ◽

10.1101/047464 ◽

2016 ◽

Author(s):

Georg Stricker ◽

Alexander Engelhardt ◽

Daniel Schulz ◽

Matthias Schmid ◽

Achim Tresch ◽

...

Keyword(s):

Ad Hoc ◽

Type I Error ◽

Generalized Additive Models ◽

Additive Model ◽

Additive Models ◽

Supplementary Information ◽

Type I ◽

Smooth Functions ◽

Base Level ◽

Genome Wide

AbstractMotivationChromatin immunoprecipitation followed by deep sequencing (ChIP-Seq) is a widely used approach to study protein-DNA interactions. Often, the quantities of interest are the differential occupancies relative to controls, between genetic backgrounds, treatments, or combinations thereof. Current methods for differential occupancy of ChIP-seq data rely however on binning or sliding window techniques, for which the choice of the window and bin sizes are subjective.ResultsHere, we present GenoGAM (Genome-wide Generalized Additive Model), which brings the well-established and flexible generalized additive models framework to genomic applications using a data parallelism strategy. We model ChIP-Seq read count frequencies as products of smooth functions along chromosomes. Smoothing parameters are objectively estimated from the data by cross-validation, eliminating ad-hoc binning and windowing needed by current approaches. GenoGAM provides base-level and region-level significance testing for full factorial designs. Application to a ChIP-Seq dataset in yeast showed increased sensitivity over existing differential occupancy methods while controlling for type I error rate. By analyzing a set of DNA methylation data and illustrating an extension to a peak caller, we further demonstrate the potential of GenoGAM as a generic statistical modeling tool for genome-wide assays.AvailabilitySoftware is available from Bioconductor: https://www.bioconductor.org/packages/release/bioc/html/[email protected] informationSupplementary information is available at Bioinformatics online.

Download Full-text

Generalized additive models and inflated type I error rates of smoother significance tests

Computational Statistics & Data Analysis ◽

10.1016/j.csda.2010.05.004 ◽

2011 ◽

Vol 55 (1) ◽

pp. 366-374 ◽

Cited By ~ 22

Author(s):

Robin L. Young ◽

Janice Weinberg ◽

Verónica Vieira ◽

Al Ozonoff ◽

Thomas F. Webster

Keyword(s):

Type I Error ◽

Generalized Additive Models ◽

Error Rates ◽

Additive Models ◽

Type I ◽

Significance Tests ◽

Type I Error Rates ◽

Inflated Type

Download Full-text

Predicting Malaria Transmission Dynamics in Dangassa, Mali: A Novel Approach Using Functional Generalized Additive Models

International Journal of Environmental Research and Public Health ◽

10.3390/ijerph17176339 ◽

2020 ◽

Vol 17 (17) ◽

pp. 6339

Author(s):

François Freddy Ateba ◽

Manuel Febrero-Bande ◽

Issaka Sagara ◽

Nafomon Sogoba ◽

Mahamoudou Touré ◽

...

Keyword(s):

Wind Speed ◽

Malaria Incidence ◽

Generalized Additive Models ◽

Community Health Center ◽

Additive Model ◽

Additive Models ◽

Functional Regression ◽

Novel Approach ◽

Malaria Incidence Rate ◽

Mean Wind Speed

Mali aims to reach the pre-elimination stage of malaria by the next decade. This study used functional regression models to predict the incidence of malaria as a function of past meteorological patterns to better prevent and to act proactively against impending malaria outbreaks. All data were collected over a five-year period (2012–2017) from 1400 persons who sought treatment at Dangassa’s community health center. Rainfall, temperature, humidity, and wind speed variables were collected. Functional Generalized Spectral Additive Model (FGSAM), Functional Generalized Linear Model (FGLM), and Functional Generalized Kernel Additive Model (FGKAM) were used to predict malaria incidence as a function of the pattern of meteorological indicators over a continuum of the 18 weeks preceding the week of interest. Their respective outcomes were compared in terms of predictive abilities. The results showed that (1) the highest malaria incidence rate occurred in the village 10 to 12 weeks after we observed a pattern of air humidity levels >65%, combined with two or more consecutive rain episodes and a mean wind speed <1.8 m/s; (2) among the three models, the FGLM obtained the best results in terms of prediction; and (3) FGSAM was shown to be a good compromise between FGLM and FGKAM in terms of flexibility and simplicity. The models showed that some meteorological conditions may provide a basis for detection of future outbreaks of malaria. The models developed in this paper are useful for implementing preventive strategies using past meteorological and past malaria incidence.

Download Full-text

Hierarchical generalized additive models: an introduction with mgcv

10.7287/peerj.preprints.27320v1 ◽

2018 ◽

Cited By ~ 1

Author(s):

Eric J Pedersen ◽

David L. Miller ◽

Gavin L. Simpson ◽

Noam Ross

Keyword(s):

Hierarchical Model ◽

Generalized Additive Model ◽

Generalized Additive Models ◽

Additive Model ◽

Additive Models ◽

Complex Structures ◽

Ecological Data ◽

Nonlinear Functional ◽

Functional Relationships ◽

Statistical Issues

In this paper, we discuss an extension to two popular approaches to modelling complex structures in ecological data: the generalized additive model (GAM) and the hierarchical model (HGLM). The hierarchical GAM (HGAM), allows modelling of nonlinear functional relationships between covariates and outcomes where the shape of the function itself varies between different grouping levels. We describe the theoretical connection between these models, HGLMs and GAMs, explain how to model different assumptions about the degree of inter-group variability in functional response, and show how HGAMs can be readily fitted using existing GAM software, the mgcv package in R. We also discuss computational and statistical issues with fitting these models, and demonstrate how to fit HGAMs on example data.

Download Full-text

A Longitudinal Analysis of the Impact of Distance Driven on the Probability of Car Accidents

Risks ◽

10.3390/risks8030091 ◽

2020 ◽

Vol 8 (3) ◽

pp. 91

Author(s):

Jean-Philippe Boucher ◽

Roxane Turcotte

Keyword(s):

Poisson Distribution ◽

Fixed Effects ◽

Generalized Additive Models ◽

Additive Models ◽

Panel Count Data ◽

Insurance Company ◽

Smooth Functions ◽

Claim Frequency ◽

The Impact ◽

The Relationship

Using telematics data, we study the relationship between claim frequency and distance driven through different models by observing smooth functions. We used Generalized Additive Models (GAM) for a Poisson distribution, and Generalized Additive Models for Location, Scale, and Shape (GAMLSS) that we generalize for panel count data. To correctly observe the relationship between distance driven and claim frequency, we show that a Poisson distribution with fixed effects should be used because it removes residual heterogeneity that was incorrectly captured by previous models based on GAM and GAMLSS theory. We show that an approximately linear relationship between distance driven and claim frequency can be derived. We argue that this approach can be used to compute the premium surcharge for additional kilometers the insured wants to drive, or as the basis to construct Pay-as-you-drive (PAYD) insurance for self-service vehicles. All models are illustrated using data from a major Canadian insurance company.

Download Full-text

DECENT: differential expression with capture efficiency adjustmeNT for single-cell RNA-seq data

Bioinformatics ◽

10.1093/bioinformatics/btz453 ◽

2019 ◽

Vol 35 (24) ◽

pp. 5155-5162 ◽

Cited By ~ 10

Author(s):

Chengzhong Ye ◽

Terence P Speed ◽

Agus Salim

Keyword(s):

Single Cell ◽

Differential Expression ◽

Type I Error ◽

R Package ◽

Supplementary Information ◽

Type I ◽

Common Phenomenon ◽

Rna Seq ◽

Capture Process ◽

Technological Platforms

Abstract Motivation Dropout is a common phenomenon in single-cell RNA-seq (scRNA-seq) data, and when left unaddressed it affects the validity of the statistical analyses. Despite this, few current methods for differential expression (DE) analysis of scRNA-seq data explicitly model the process that gives rise to the dropout events. We develop DECENT, a method for DE analysis of scRNA-seq data that explicitly and accurately models the molecule capture process in scRNA-seq experiments. Results We show that DECENT demonstrates improved DE performance over existing DE methods that do not explicitly model dropout. This improvement is consistently observed across several public scRNA-seq datasets generated using different technological platforms. The gain in improvement is especially large when the capture process is overdispersed. DECENT maintains type I error well while achieving better sensitivity. Its performance without spike-ins is almost as good as when spike-ins are used to calibrate the capture model. Availability and implementation The method is implemented as a publicly available R package available from https://github.com/cz-ye/DECENT. Supplementary information Supplementary data are available at Bioinformatics online.

Download Full-text

Estimating the impact of vaccination using age–time-dependent incidence rates of hepatitis B

Epidemiology and Infection ◽

10.1017/s0950268807008692 ◽

2007 ◽

Vol 136 (3) ◽

pp. 341-351 ◽

Cited By ~ 9

Author(s):

N. HENS ◽

M. AERTS ◽

Z. SHKEDY ◽

P. KUNG'U KIMANI ◽

M. KOJOUHOROVA ◽

...

Keyword(s):

Hepatitis B ◽

Generalized Additive Models ◽

Additive Model ◽

Penalized Regression ◽

Incidence Rates ◽

Additive Models ◽

Time Dependent ◽

Model Framework ◽

Penalized Regression Splines ◽

The Impact

SUMMARYThe objective of this study was to model the age–time-dependent incidence of hepatitis B while estimating the impact of vaccination. While stochastic models/time-series have been used before to model hepatitis B cases in the absence of knowledge on the number of susceptibles, this paper proposed using a method that fits into the generalized additive model framework. Generalized additive models with penalized regression splines are used to exploit the underlying continuity of both age and time in a flexible non-parametric way. Based on a unique case notification dataset, we have shown that the implemented immunization programme in Bulgaria resulted in a significant decrease in incidence for infants in their first year of life with 82% (79–84%). Moreover, we have shown that conditional on an assumed baseline susceptibility percentage, a smooth force-of-infection profile can be obtained from which two local maxima were observed at ages 9 and 24 years.

Download Full-text

The effect of different sets of critical values on type I error rates in tiled regression for genome-wide association studies

International Journal of Data Mining and Bioinformatics ◽

10.1504/ijdmb.2016.080030 ◽

2016 ◽

Vol 16 (2) ◽

pp. 111

Author(s):

Heejong Sung ◽

Jeremy A. Sabourin ◽

Alexa J.M. Sorant ◽

Alexander F. Wilson

Keyword(s):

Type I Error ◽

Association Studies ◽

Error Rates ◽

Critical Values ◽

Genome Wide Association ◽

Type I ◽

Genome Wide Association Studies ◽

Type I Error Rates ◽

Genome Wide

Download Full-text

Prioritizing hypothesis tests for high throughput data

Bioinformatics ◽

10.1093/bioinformatics/btv608 ◽

2015 ◽

Vol 32 (6) ◽

pp. 850-858 ◽

Cited By ~ 3

Author(s):

Sangjin Kim ◽

Paul Schliekelman

Keyword(s):

High Throughput ◽

Correction Factor ◽

Type I Error ◽

Supplementary Information ◽

Type I ◽

Hypothesis Tests ◽

High Throughput Data ◽

Independent Information ◽

Discovery Probability

Abstract Motivation: The advent of high throughput data has led to a massive increase in the number of hypothesis tests conducted in many types of biological studies and a concomitant increase in stringency of significance thresholds. Filtering methods, which use independent information to eliminate less promising tests and thus reduce multiple testing, have been widely and successfully applied. However, key questions remain about how to best apply them: When is filtering beneficial and when is it detrimental? How good does the independent information need to be in order for filtering to be effective? How should one choose the filter cutoff that separates tests that pass the filter from those that don’t? Result: We quantify the effect of the quality of the filter information, the filter cutoff and other factors on the effectiveness of the filter and show a number of results: If the filter has a high probability (e.g. 70%) of ranking true positive features highly (e.g. top 10%), then filtering can lead to dramatic increase (e.g. 10-fold) in discovery probability when there is high redundancy in information between hypothesis tests. Filtering is less effective when there is low redundancy between hypothesis tests and its benefit decreases rapidly as the quality of the filter information decreases. Furthermore, the outcome is highly dependent on the choice of filter cutoff. Choosing the cutoff without reference to the data will often lead to a large loss in discovery probability. However, naïve optimization of the cutoff using the data will lead to inflated type I error. We introduce a data-based method for choosing the cutoff that maintains control of the family-wise error rate via a correction factor to the significance threshold. Application of this approach offers as much as a several-fold advantage in discovery probability relative to no filtering, while maintaining type I error control. We also introduce a closely related method of P-value weighting that further improves performance. Availability and implementation: R code for calculating the correction factor is available at http://www.stat.uga.edu/people/faculty/paul-schliekelman. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.

Download Full-text

Forecasting for smart energy: an accurate and effificient negative binomial additive model

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v20.i2.pp1000-1006 ◽

2020 ◽

Vol 20 (2) ◽

pp. 1000

Author(s):

Yousef-Awwad Daraghmi ◽

Eman Yaser Daraghmi ◽

Motaz Daadoo ◽

Samer Alsaadi

Keyword(s):

Negative Binomial ◽

Load Forecasting ◽

Additive Model ◽

Additive Models ◽

Electric Load ◽

Short Term ◽

Real World Data ◽

Smooth Functions ◽

Data Set ◽

Electric Load Forecasting

<div>Smart energy requires accurate and effificient short-term electric load forecasting to enable effificient</div><div>energy management and active real-time power control. Forecasting accuracy is inflfluenced by the char</div><div>acteristics of electrical load particularly overdispersion, nonlinearity, autocorrelation and seasonal patterns.</div><div>Although several fundamental forecasting methods have been proposed, accurate and effificient forecasting</div><div>methods that can consider all electric load characteristics are still needed. Therefore, we propose a novel</div><div>model for short-term electric load forecasting. The model adopts the negative binomial additive models</div><div>(NBAM) for handling overdispersion and capturing the nonlinearity of electric load. To address the season</div><div>ality, the daily load pattern is classifified into high, moderate, and low seasons, and the autocorrelation of</div><div>load is modeled separately in each season. We also consider the effificiency of forecasting since the NBAM</div><div>captures the behavior of predictors by smooth functions that are estimated via a scoring algorithm which has</div><div>low computational demand. The proposed NBAM is applied to real-world data set from Jericho city, and its</div><div>accuracy and effificiency outperform those of the other models used in this context.</div>

Download Full-text

Modeling tropospheric ozone and particulate matter in Tunis, Tunisia using generalized additive model

Clean Air Journal ◽

10.17159/caj/2021/31/2.8880 ◽

2021 ◽

Vol 31 (2) ◽

Author(s):

Zouhour Hammouda ◽

Leila Hedhili Zaier ◽

Nadege Blond

Keyword(s):

Particulate Matter ◽

Tropospheric Ozone ◽

Generalized Additive Models ◽

Additive Model ◽

Meteorological Condition ◽

Additive Models ◽

Linear Functions ◽

Meteorological Variables ◽

Time Variables ◽

The Impact

The main purpose of this paper is to analyze the sensitivity of tropospheric ozone and particulate matter concentrations to changes in local scale meteorology with the aid of meteorological variables (wind speed, wind direction, relative humidity, solar radiation and temperature) and intensity of traffic using hourly concentration of NOX, which are measured in three different locations in Tunis, (i.e. Gazela, Mannouba and Bab Aliwa). In order to quantify the impact of meteorological conditions and precursor concentrations on air pollution, a general model was developed where the logarithm of the hourly concentrations of O3 and PM10 were modeled as a sum of non-linear functions using the framework of Generalized Additive Models (GAMs). Partial effects of each predictor are presented. We obtain a good fit with R² = 85% for the response variable O3 at Bab Aliwa station. Results show the aggregate impact of meteorological variables in the models explained 29% of the variance in PM10 and 41% in O3. This indicates that local meteorological condition is an active driver of air quality in Tunis. The time variables (hour of the day, day of the week and month) also have an effect. This is especially true for the time variable “month” that contributes significantly to the description of the study area.

Download Full-text