Explanation and Probabilistic Prediction of Hydrological Signatures with Statistical Boosting Algorithms

2021 ◽  
Vol 13 (3) ◽  
pp. 333
Author(s):  
Hristos Tyralis ◽  
Georgia Papacharalampous ◽  
Andreas Langousis ◽  
Simon Michael Papalexiou

Hydrological signatures, i.e., statistical features of streamflow time series, are used to characterize the hydrology of a region. A relevant problem is the prediction of hydrological signatures in ungauged regions using the attributes obtained from remote sensing measurements at ungauged and gauged regions, together with estimated hydrological signatures from gauged regions. The framework is formulated as a regression problem, where the attributes are the predictor variables and the hydrological signatures are the dependent variables. Here we aim to provide probabilistic predictions of hydrological signatures using statistical boosting in a regression setting. We predict 12 hydrological signatures using 28 attributes in 667 basins in the contiguous US. We provide a formal assessment of the probabilistic predictions using quantile scores. We also exploit the interpretability properties of statistical boosting in the derived models. It is shown that probabilistic predictions at the 2.5% and 97.5% quantile levels using linear models as base learners outperform more flexible boosting models that use both linear models and stumps (i.e., one-level decision trees). In contrast, boosting models that use both linear models and stumps perform better than boosting with linear models alone when used for point predictions. Moreover, climatic indices and topographic characteristics are shown to be the most important attributes for predicting hydrological signatures.
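The quantile scores used for the formal assessment above can be illustrated with a minimal sketch of the pinball (quantile) loss; the observations and predictions below are made up for illustration, not taken from the study.

```python
import numpy as np

def quantile_score(y_true, y_pred, tau):
    """Pinball loss at quantile level tau; lower is better."""
    diff = np.asarray(y_true) - np.asarray(y_pred)
    return np.mean(np.where(diff >= 0, tau * diff, (tau - 1) * diff))

# Illustrative observations and predicted 2.5% / 97.5% quantiles
y = np.array([1.0, 2.0, 3.0])
pred_low = np.array([0.5, 1.5, 2.5])
pred_high = np.array([1.5, 2.5, 3.5])
print(quantile_score(y, pred_low, 0.025))   # 0.0125
print(quantile_score(y, pred_high, 0.975))  # 0.0125
```

Under-prediction is penalized by tau and over-prediction by (1 - tau), so scoring at the 2.5% and 97.5% levels penalizes the two tails asymmetrically, which is what makes the score a proper assessment of predictive quantiles.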

2017 ◽  
Vol 33 (3) ◽  
pp. 233-236 ◽  
Author(s):  
Kevin D. Dames ◽  
Jeremy D. Smith ◽  
Gary D. Heise

Gait data are commonly presented as an average of many trials or as an average across participants. Discrete data points (e.g., maxima or minima) are identified and used as dependent variables in subsequent statistical analyses. However, the approach used for obtaining average data from multiple trials is inconsistent and unclear in the biomechanics literature. This study compared the statistical outcomes of averaging peaks from multiple trials versus identifying a single peak from an average profile. A series of paired-samples t tests was used to determine whether the average dependent variables from these 2 methods differed. Identifying a peak value from the average profile resulted in significantly smaller magnitudes of dependent variables than averaging peaks from multiple trials. The disagreement between the 2 methods was due to temporal differences in trial peak locations. Sine curves generated in MATLAB confirmed this misrepresentation of trial peaks in the average profile when a phase shift was introduced. Based on these results, averaging individual trial peaks represents the actual data better than choosing a peak from an average trial profile.
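The phase-shift effect described above is easy to reproduce in a short sketch (NumPy here rather than MATLAB; the 0.5 rad shift is an arbitrary choice): two trials with identical peak magnitude yield an average profile whose peak is attenuated.

```python
import numpy as np

t = np.linspace(0, 2 * np.pi, 1000)
trial1 = np.sin(t)
trial2 = np.sin(t - 0.5)          # same waveform, phase-shifted by 0.5 rad

mean_of_peaks = (trial1.max() + trial2.max()) / 2   # ~1.0
peak_of_mean = ((trial1 + trial2) / 2).max()        # cos(0.25) ~ 0.969

print(mean_of_peaks, peak_of_mean)
```

The identity (sin t + sin(t - 0.5)) / 2 = cos(0.25) sin(t - 0.25) shows the attenuation exactly: the average-profile peak shrinks by the cosine of half the phase shift, while the average of the trial peaks is unaffected.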


Kursor ◽  
2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Annisa Eka Haryati ◽  
Sugiyarto Sugiyarto ◽  
Rizki Desi Arindra Putri

Multivariate statistics faces problems related to large data dimensions. One method that can be used is principal component analysis (PCA), a technique that reduces the dimensionality of data consisting of several interrelated variables while retaining the variance in the data. PCA can be used to stabilize measurements in statistical analyses such as cluster analysis. Fuzzy clustering is a grouping method based on membership values, using fuzzy sets as the weighting basis for grouping. In this study, the fuzzy clustering methods used are Fuzzy Subtractive Clustering (FSC) and Fuzzy C-Means (FCM) with a combined Minkowski-Chebyshev distance. The purpose of this study was to compare the cluster results obtained from FSC and FCM using the DBI validity index. The results indicate that clustering with FCM performs better than FSC.
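A combined Minkowski-Chebyshev distance of the kind used as the clustering metric can be sketched as a weighted sum of the two distances; the equal weights and the order p=2 below are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def minkowski_chebyshev(x, y, p=2, w1=0.5, w2=0.5):
    """Weighted combination of the Minkowski (order p) and Chebyshev
    distances. The weights w1, w2 and order p are illustrative choices."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    minkowski = np.sum(np.abs(x - y) ** p) ** (1.0 / p)
    chebyshev = np.max(np.abs(x - y))
    return w1 * minkowski + w2 * chebyshev

print(minkowski_chebyshev([0, 0], [3, 4]))  # 0.5 * 5 + 0.5 * 4 = 4.5
```

Such a combined metric can then be dropped in wherever a fuzzy clustering implementation accepts a custom distance between points and cluster centers.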


Energies ◽  
2020 ◽  
Vol 13 (18) ◽  
pp. 4769 ◽  
Author(s):  
Jônatas Belotti ◽  
Hugo Siqueira ◽  
Lilian Araujo ◽  
Sérgio L. Stevan ◽  
Paulo S.G. de Mattos Neto ◽  
...  

Estimating future streamflows is a key step in producing electricity for countries with hydroelectric plants. Accurate predictions are particularly important due to the environmental and economic impacts they entail. To analyze the forecasting capability of models on monthly seasonal streamflow series, we carried out an extensive investigation considering six versions of unorganized machines: extreme learning machines (ELM) with and without a regularization coefficient (RC), and echo state networks (ESN) using the reservoir designs of Jaeger and of Ozturk et al., with and without RC. Additionally, we addressed the ELM as the combiner of a neural-based ensemble, an investigation not previously undertaken in this context. A comparative analysis was performed against two linear approaches (the autoregressive (AR) and autoregressive moving average (ARMA) models), four artificial neural networks (multilayer perceptron, radial basis function, Elman network, and Jordan network), and four ensembles. The tests were conducted at five hydroelectric plants, using horizons of 1, 3, 6, and 12 steps ahead. The results indicated that the unorganized machines and the ELM-based ensembles outperformed the linear models in all simulations and reached the best overall performances.
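The extreme learning machine at the core of the comparison can be sketched as a fixed random hidden layer with a regularized least-squares readout. The toy seasonal series, lag order, and regularization value below are illustrative assumptions, not the study's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def elm_fit(X, y, n_hidden=50, rc=1e-3):
    """ELM: random (untrained) hidden layer, ridge-regularized linear
    readout; rc plays the role of the regularization coefficient."""
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)                      # fixed random features
    beta = np.linalg.solve(H.T @ H + rc * np.eye(n_hidden), H.T @ y)
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# Toy monthly-seasonal series: predict the next value from 3 lags
series = np.sin(np.arange(200) * 2 * np.pi / 12)
X = np.array([series[i:i + 3] for i in range(len(series) - 3)])
y = series[3:]
W, b, beta = elm_fit(X, y)
mse = np.mean((elm_predict(X, W, b, beta) - y) ** 2)
print(mse)  # small in-sample error
```

Only the readout weights beta are trained, which is what makes these "unorganized machines" fast to fit and easy to reuse as the combiner of an ensemble.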


2012 ◽  
Vol 60 (3) ◽  
pp. 481-489 ◽  
Author(s):  
J.M. Łęski ◽  
N. Henzel

Abstract Linear regression analysis has become a fundamental tool in the experimental sciences. We propose a new method for parameter estimation in linear models. The 'Generalized Ordered Linear Regression with Regularization' (GOLRR) uses various loss functions (including ε-insensitive ones), ordered weighted averaging of the residuals, and regularization. The algorithm consists of solving a sequence of weighted quadratic minimization problems, where the weights used for the next iteration depend not only on the values but also on the order of the model residuals obtained in the current iteration. Such a regression problem may be transformed into the iterative reweighted least squares scenario. The conjugate gradient algorithm is used to minimize the proposed criterion function. Finally, numerical examples are given to demonstrate the validity of the proposed method.
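The iterative reweighted least squares scenario the method reduces to can be sketched for the simplest case, a least-absolute-deviations loss with plain residual-magnitude weights; the ordered weighting and regularization of GOLRR are omitted here, and the data are made up.

```python
import numpy as np

def irls(X, y, n_iter=100, delta=1e-6):
    """IRLS for least-absolute-deviations regression: each step solves
    a weighted quadratic problem whose weights come from the previous
    step's residuals (large residuals are down-weighted)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(n_iter):
        w = 1.0 / np.maximum(np.abs(y - X @ beta), delta)
        beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
    return beta

# Line y = 2x + 1 with one gross outlier
x = np.arange(10, dtype=float)
X = np.column_stack([x, np.ones_like(x)])
y = 2 * x + 1
y[5] += 50.0                                   # contaminate one point
print(irls(X, y))  # close to [2, 1]
```

The reweighting makes the quadratic subproblems mimic the robust loss: the outlier's weight shrinks as its residual grows, so the fit converges to the line through the uncontaminated points.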


2020 ◽  
Author(s):  
Marie Samarani ◽  
Gianna Restom ◽  
Joelle Mardini ◽  
Georges Abi Fares ◽  
Souheil Hallit ◽  
...  

Abstract Background: Different charts are used to assess premature growth. The Fenton chart, based on prenatal growth, was used in the intensive care unit of the Notre Dame des Secours University Hospital to assess premature newborns' development. Intergrowth21 is a newer multidisciplinary, multiethnic growth chart better adapted to premature growth. Our objective was to compare the Fenton and Intergrowth21 charts in order to implement Intergrowth21 in our unit. Methods: We analyzed the files of 318 premature babies admitted to the NICU from 2010 to 2017. Anthropometric data (weight, height, and head circumference), converted to percentiles, were recorded on both charts from birth until 1 month of age. Results: Linear regression taking weight at birth as the dependent variable showed that the Fenton chart (R2=0.391) predicted birth weight better than the Intergrowth21 chart (R2=0.257). The same applied to height and head circumference at birth when taken as dependent variables. For weight and height at 2 weeks, the Intergrowth21 chart predicted those variables better than the Fenton chart, with higher R2 values for both weight (0.384 vs 0.311) and height (0.650 vs 0.585). At 4 weeks, the Fenton chart predicted weight (R2=0.655 vs 0.631) and height (R2=0.710 vs 0.643) better than the Intergrowth21 chart. All results were adjusted for the newborns' sociodemographic and clinical factors. Conclusion: Our results are mixed: the Fenton growth charts were superior to Intergrowth21 at birth and at 4 weeks of age, whereas the Intergrowth21 charts predicted weight and height better at 2 weeks of age.
Further studies following a different design, such as a clinical trial or a prospective study, and conducted in multiple centers should be considered to enroll a more representative Lebanese sample of children and be able to extrapolate our results to the national level.
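The chart comparison above rests on comparing coefficients of determination from simple linear fits. A minimal sketch with synthetic (not patient) data shows the criterion: the predictor tracking the outcome more closely earns the higher R2.

```python
import numpy as np

def r_squared(y, x):
    """R^2 of a simple linear regression of y on x."""
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (slope * x + intercept)
    return 1 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)

# Synthetic stand-ins: two "chart percentile" readings of one outcome,
# one noisier than the other
rng = np.random.default_rng(1)
outcome = rng.normal(size=100)
chart_a = outcome + rng.normal(scale=0.5, size=100)   # tracks closely
chart_b = outcome + rng.normal(scale=1.5, size=100)   # noisier
print(r_squared(outcome, chart_a), r_squared(outcome, chart_b))
```

In the study's terms, the chart whose percentiles explain more variance in the measured outcome at a given age is the better predictor at that age.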


2018 ◽  
Author(s):  
Edgar Y. Walker ◽  
Fabian H. Sinz ◽  
Emmanouil Froudarakis ◽  
Paul G. Fahey ◽  
Taliah Muhammad ◽  
...  

Much of our knowledge about sensory processing in the brain is based on quasi-linear models and the stimuli that optimally drive them. However, sensory information processing is nonlinear, even in primary sensory areas, and optimizing sensory input is difficult due to the high-dimensional input space. We developed inception loops, a closed-loop experimental paradigm that combines in vivo recordings with in silico nonlinear response modeling to identify the Most Exciting Images (MEIs) for neurons in mouse V1. When presented back to the brain, MEIs indeed drove their target cells significantly better than the best stimuli identified by linear models. The MEIs exhibited complex spatial features that deviated from the textbook ideal of V1 as a bank of Gabor filters. Inception loops represent a widely applicable new approach to dissect the neural mechanisms of sensation.
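The in silico synthesis step (optimizing the input to maximize a fitted model's response) can be sketched on a toy quadratic "neuron model"; everything below is a stand-in, since the actual MEIs come from a deep network fitted to V1 recordings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a fitted nonlinear response model: a quadratic
# readout of a 16-pixel "image" with a designed eigenvalue spectrum
vals = np.linspace(0.1, 2.0, 16)          # clear gap at the top eigenvalue
Q, _ = np.linalg.qr(rng.normal(size=(16, 16)))
A = Q @ np.diag(vals) @ Q.T

def response(img):
    return img @ A @ img                  # simple nonlinear response model

# Gradient ascent on the input under a unit-norm constraint (a stand-in
# for the contrast constraint used when synthesizing optimal stimuli)
img = rng.normal(size=16)
img /= np.linalg.norm(img)
for _ in range(1000):
    img = img + 0.1 * (2 * A @ img)       # analytic gradient of response
    img /= np.linalg.norm(img)            # project back onto the constraint

# The optimized input aligns with the model's most exciting direction
top = np.linalg.eigh(A)[1][:, -1]
print(response(img))                      # approaches the top eigenvalue, 2.0
```

For this quadratic toy the most exciting image is the top eigenvector of A; for a deep nonlinear model the same ascent is run through automatic differentiation, and the resulting image is then shown back to the neuron in the closed loop.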


2021 ◽  
Author(s):  
Pau Badia-i-Mompel ◽  
Jesús Vélez ◽  
Jana Braunger ◽  
Celina Geiss ◽  
Daniel Dimitrov ◽  
...  

Summary: Many methods allow us to extract biological activities from omics data using information from prior knowledge resources, reducing the dimensionality for increased statistical power and better interpretability. Here, we present decoupleR, a Bioconductor package containing computational methods to extract these activities within a unified framework. decoupleR allows us to flexibly run any method with a given resource, including methods that leverage the mode of regulation and the weights of interactions. Using decoupleR, we evaluated the performance of methods on transcriptomic and phospho-proteomic perturbation experiments. Our findings suggest that simple linear models and the consensus score across methods perform better than other methods at predicting perturbed regulators. Availability and Implementation: decoupleR is open source and available in Bioconductor (https://www.bioconductor.org/packages/release/bioc/html/decoupleR.html). The code to reproduce the results is on GitHub (https://github.com/saezlab/decoupleR_manuscript) and the data on Zenodo (https://zenodo.org/record/5645208).
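A simplified analogue of the simple-linear-model scoring referred to above: a regulator's activity is summarized by the t-value of the slope when observed gene statistics are regressed on the regulator's target weights. The weights and expression values below are made up; consult the decoupleR documentation for the package's actual methods and inputs.

```python
import numpy as np

def ulm_score(values, weights):
    """t-value of the slope from regressing observed gene statistics on
    a regulator's target weights (simplified univariate linear model)."""
    x = np.asarray(weights, float)
    y = np.asarray(values, float)
    x_c, y_c = x - x.mean(), y - y.mean()
    slope = (x_c @ y_c) / (x_c @ x_c)
    resid = y_c - slope * x_c
    se = np.sqrt((resid @ resid) / (len(x) - 2) / (x_c @ x_c))
    return slope / se

# Hypothetical regulator: its targets (weight 1) are up-regulated,
# so the inferred activity score is positive
weights = np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])
expr_up = np.array([2.0, 2.2, 1.9, 0.1, -0.1, 0.0])
print(ulm_score(expr_up, weights))  # positive score
```

A positive score means the regulator's targets move in the direction its weights predict; flipping the expression sign flips the inferred activity.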

