Out-of-Sample Prediction in Multidimensional P-Spline Models

Mathematics ◽  
2021 ◽  
Vol 9 (15) ◽  
pp. 1761
Author(s):  
Alba Carballo ◽  
María Durbán ◽  
Dae-Jin Lee

The prediction of out-of-sample values is an interesting problem in any regression model. In the context of penalized smoothing using a mixed-model reparameterization, a general framework has been proposed for prediction in additive models without interaction terms. The aim of this paper is to generalize that work to models that include interaction terms, i.e., to prediction in a multidimensional setting. Our method fits the data and predicts new observations simultaneously, and it uses constraints to ensure a consistent fit or to impose further restrictions on the predictions. We also develop this methodology for the so-called smooth-ANOVA models, which allow interaction terms that can be decomposed into a sum of several smooth functions. To illustrate the method, we use two real data sets: one to predict the mortality of the U.S. population on a logarithmic scale, and the other to predict the aboveground biomass of Populus trees as a smooth function of height and diameter. We examine the performance of the interaction and smooth-ANOVA models through simulation studies.
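The simultaneous fit-and-predict idea can be illustrated in one dimension. The sketch below is a simplified stand-in, not the authors' mixed-model method: it assumes plain penalized least squares with a fixed, hand-chosen smoothing parameter `lam`, appends the new x values with zero weight, builds one B-spline basis over the extended range, and lets the difference penalty carry the fitted curve into the prediction region. The function names are hypothetical.

```python
import numpy as np
from scipy.interpolate import BSpline


def bspline_basis(x, xl, xr, nseg=20, deg=3):
    """B-spline basis of degree deg on equally spaced knots covering [xl, xr]."""
    dx = (xr - xl) / nseg
    knots = xl + dx * np.arange(-deg, nseg + deg + 1)
    n_basis = nseg + deg
    # A spline with identity coefficients evaluates every basis function at once
    return BSpline(knots, np.eye(n_basis), deg)(x)


def pspline_fit_predict(x, y, x_new, nseg=20, deg=3, pord=2, lam=1.0):
    """Fit a 1-D P-spline to (x, y) and predict at x_new in a single solve."""
    x_all = np.concatenate([x, x_new])
    # New points get weight 0: they do not enter the data term, but the
    # difference penalty extends the coefficient sequence over them.
    w = np.concatenate([np.ones(len(x)), np.zeros(len(x_new))])
    y_all = np.concatenate([y, np.zeros(len(x_new))])  # placeholder responses
    B = bspline_basis(x_all, x_all.min(), x_all.max(), nseg, deg)
    D = np.diff(np.eye(B.shape[1]), n=pord, axis=0)  # pord-th order differences
    lhs = B.T @ (w[:, None] * B) + lam * D.T @ D
    theta = np.linalg.solve(lhs, B.T @ (w * y_all))
    fitted = B @ theta
    return fitted[: len(x)], fitted[len(x):]
```

With a second-order penalty (`pord=2`) the curve is extrapolated linearly beyond the data; for example, fitting `y = 2x + 1` on 50 points in [0, 1] and predicting at 1.2 and 1.5 returns values close to 3.4 and 4.0.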

Author(s):  
Renzhe Xu ◽  
Yudong Chen ◽  
Tenglong Xiao ◽  
Jingli Wang ◽  
Xiong Wang

As an important tool for measuring the overall state of the stock market, the stock index has long been a focus of researchers, especially with regard to its prediction. This paper uses trend types, obtained by clustering price series at multiple time scales, combined with the day-of-the-week effect, to construct a combination of categorical features. Using historical data for six Chinese stock indexes, the CatBoost model is trained and used for prediction. Experimental results show an out-of-sample prediction accuracy of 0.55, and a long–short trading strategy obtains an average annualized return of 34.43%, a substantial improvement over other classical classification algorithms. In rolling back-testing, the model obtains stable returns in every period from 2012 to 2020. Among the six indexes, the SSESC long–short strategy performs best, with an annualized return of 40.85% and a Sharpe ratio of 1.53. These results suggest that CatBoost learns the trend information in multiple-time-scale features built through feature engineering well, and that this information can guide stock index trend prediction.
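The two headline metrics can be computed from per-period returns roughly as follows. This is a minimal sketch under assumptions the abstract does not state: the strategy is fully long when the classifier predicts an up move and fully short otherwise, transaction costs are ignored, a year has 252 trading periods, and the risk-free rate defaults to zero. The function names are hypothetical.

```python
import numpy as np


def long_short_returns(pred_up, index_returns):
    """+1 position when an up move is predicted, -1 otherwise (no costs)."""
    position = np.where(pred_up, 1.0, -1.0)
    return position * np.asarray(index_returns, dtype=float)


def annualized_return(returns, periods_per_year=252):
    """Geometric (compounded) annualized return."""
    returns = np.asarray(returns, dtype=float)
    growth = np.prod(1.0 + returns)
    return growth ** (periods_per_year / len(returns)) - 1.0


def sharpe_ratio(returns, periods_per_year=252, rf_per_period=0.0):
    """Annualized Sharpe ratio of per-period excess returns."""
    excess = np.asarray(returns, dtype=float) - rf_per_period
    return np.sqrt(periods_per_year) * excess.mean() / excess.std(ddof=1)
```

For example, predictions `[True, False, True]` against index returns `[0.01, -0.02, 0.005]` give strategy returns `[0.01, 0.02, 0.005]`, since the short position gains on the down day.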


Author(s):  
Tao Chen ◽  
Ciwei Gao ◽  
Hongxun Hui ◽  
Qiushi Cui ◽  
Huan Long

Lithium-ion battery-based energy storage systems are widely used in applications such as transportation electrification and smart grids. Battery capacity is a key health status indicator on which battery performance relies heavily, and it is easily influenced by the various electrode formulation parameters within a battery. Because of the strongly coupled electrical, chemical, and thermal dynamics, predicting battery capacity and analysing the local effects of parameters of interest within a battery are important but challenging. This article proposes an effective data-driven method for both battery capacity prediction and local effects analysis. The solution is derived using generalized additive models (GAM) with different interaction terms. A comparison study illustrates that the proposed GAM-based solution not only achieves satisfactory battery capacity predictions but also quantifies the local effects of five important battery electrode formulation parameters and their interaction terms. Owing to its data-driven nature and explainability, the proposed method could make battery capacity prediction more efficient and facilitate battery control in many other energy storage system applications.
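A GAM with a pairwise interaction term can be sketched as penalized least squares on B-spline bases. This is an assumption-laden illustration, not the authors' implementation: two hypothetical predictors `x1`, `x2` stand in for electrode formulation parameters, main effects use cubic B-splines with second-order difference penalties, the interaction uses a row-wise tensor product, and the smoothing parameter `lam` is fixed rather than selected automatically (e.g. by GCV or REML, as dedicated GAM software would do). Without sum-to-zero constraints only the total fit is uniquely defined, so reading off individual local effects would need centering first.

```python
import numpy as np
from scipy.interpolate import BSpline
from scipy.linalg import block_diag


def bspline_basis(x, nseg=8, deg=3):
    """Cubic B-spline basis on equally spaced knots spanning the range of x."""
    xl, xr = x.min(), x.max()
    dx = (xr - xl) / nseg
    knots = xl + dx * np.arange(-deg, nseg + deg + 1)
    return BSpline(knots, np.eye(nseg + deg), deg)(x)


def diff_penalty(n, order=2):
    """Penalty matrix D^T D for order-th differences of n coefficients."""
    D = np.diff(np.eye(n), n=order, axis=0)
    return D.T @ D


def fit_gam_interaction(x1, x2, y, lam=1.0, ridge=1e-6):
    """Penalized least squares for y ~ f1(x1) + f2(x2) + f12(x1, x2).

    Returns fitted values and the component curves evaluated at the data.
    The components are not centered, so only their sum is uniquely defined.
    """
    B1, B2 = bspline_basis(x1), bspline_basis(x2)
    n1, n2 = B1.shape[1], B2.shape[1]
    # Row-wise tensor product: row i is kron(B1[i], B2[i])
    T = np.einsum("ij,ik->ijk", B1, B2).reshape(len(y), n1 * n2)
    X = np.hstack([B1, B2, T])
    P1, P2 = diff_penalty(n1), diff_penalty(n2)
    # Interaction penalty: second differences along each margin of the coefficient grid
    P12 = np.kron(P1, np.eye(n2)) + np.kron(np.eye(n1), P2)
    # A tiny ridge term resolves the overlap (e.g. shared constants) between bases
    P = lam * block_diag(P1, P2, P12) + ridge * np.eye(X.shape[1])
    coef = np.linalg.solve(X.T @ X + P, X.T @ y)
    c1, c2, c12 = np.split(coef, [n1, n1 + n2])
    return X @ coef, (B1 @ c1, B2 @ c2, T @ c12)
```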


2018 ◽  
Vol 35 (2) ◽  
pp. 208-217 ◽  
Author(s):  
Maurits Kaptein

Purpose: This paper examines whether estimates of psychological traits obtained using meta-judgmental measures (as commonly present in customer relationship management database systems) or operative measures are more useful for predicting customer behavior.
Design/methodology/approach: Using an online experiment (N = 283), the study collects meta-judgmental and operative measures of customers and then compares their out-of-sample prediction error for responses to persuasive messages.
Findings: The study shows that operative measures, derived directly from measures of customer behavior, are more informative than meta-judgmental measures.
Practical implications: Using interactive media, it is possible to actively elicit operative measures. This study shows that practitioners seeking to customize their marketing communication should focus on obtaining such psychographic observations.
Originality/value: While both meta-judgmental and operative measures are currently used for customization in interactive marketing, this study directly compares their utility for predicting future responses to persuasive messages.


Risks ◽  
2020 ◽  
Vol 8 (3) ◽  
pp. 91
Author(s):  
Jean-Philippe Boucher ◽  
Roxane Turcotte

Using telematics data, we study the relationship between claim frequency and distance driven through several models based on smooth functions. We use Generalized Additive Models (GAM) for a Poisson distribution, and Generalized Additive Models for Location, Scale, and Shape (GAMLSS), which we generalize to panel count data. To correctly observe the relationship between distance driven and claim frequency, we show that a Poisson distribution with fixed effects should be used, because it removes residual heterogeneity that was incorrectly captured by previous models based on GAM and GAMLSS theory. We show that an approximately linear relationship between distance driven and claim frequency can be derived. We argue that this approach can be used to compute the premium surcharge for additional kilometers the insured wants to drive, or as the basis for Pay-as-you-drive (PAYD) insurance for self-service vehicles. All models are illustrated using data from a major Canadian insurance company.
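A Poisson model with individual fixed effects can be estimated without estimating each effect, by conditioning on each insured's total claim count; this is the standard conditional-likelihood approach for panel count data, swapped in here as an illustration rather than taken from the paper. The sketch assumes a single covariate, the log of distance driven, so that a coefficient `beta` near 1 corresponds to claim frequency roughly proportional to distance. The data layout and function name are hypothetical.

```python
import numpy as np


def poisson_fe_beta(y, x, beta=0.0, tol=1e-10, max_iter=50):
    """Conditional MLE of beta in y_it ~ Poisson(exp(alpha_i + beta * x_it)).

    y, x: arrays of shape (n_individuals, n_periods). The individual effects
    alpha_i cancel after conditioning on each row total n_i = sum_t y_it,
    leaving a multinomial likelihood in beta alone.
    """
    y, x = np.asarray(y, float), np.asarray(x, float)
    n_i = y.sum(axis=1)
    for _ in range(max_iter):
        eta = beta * x
        eta -= eta.max(axis=1, keepdims=True)  # numerical stabilization
        p = np.exp(eta)
        p /= p.sum(axis=1, keepdims=True)  # within-individual multinomial probs
        xbar = (p * x).sum(axis=1)
        var = (p * x**2).sum(axis=1) - xbar**2
        score = ((y * x).sum(axis=1) - n_i * xbar).sum()
        info = (n_i * var).sum()
        step = score / info  # Newton-Raphson update on a concave log-likelihood
        beta += step
        if abs(step) < tol:
            break
    return beta
```

Insureds with no claims over the panel contribute nothing to the conditional likelihood, which is one reason this estimator is robust to unobserved, time-constant heterogeneity.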


1974 ◽  
Vol 20 (12) ◽  
pp. 1520-1527 ◽  
Author(s):  
Per Winkel ◽  
Bernard E Statland ◽  
Henning Bokelund

Abstract We evaluated the variation in several serum constituents in a group of healthy young men over two selected time intervals: short-term day-to-day changes and within-hour changes. For day-to-day changes, we used a two-way ANOVA model to compute the main-day effect and the subject-day interaction terms, which were combined to yield the total day-to-day variation. A main-day effect was statistically significant only for acid phosphatase, while all 18 serum constituents except sodium, calcium, and albumin showed a statistically significant subject-day interaction. For within-hour biologic variation, a three-way ANOVA model was used to analyze duplicate serum samples drawn at 1100 h and 1130 h on two different days. Although a significant main effect of hour was found only for total lipids and alkaline phosphatase, pooling the main effect of hour, the subject-hour interaction, and the subject-day-hour interaction terms resulted in a chemically significant variation for potassium, total protein, albumin, iron, total lipids, cholesterol, and bilirubin. In all cases, these biological fluctuations are compared with the expected analytical variation.
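For a subjects × days layout, the day-to-day components can be estimated from a two-way ANOVA table via expected mean squares. This sketch assumes a random-effects model with a single measurement per subject-day (so the subject-day interaction cannot be separated from analytical error) and uses the usual method-of-moments estimates truncated at zero; the exact estimators used in the 1974 study are not given in the abstract, and the function name is hypothetical.

```python
import numpy as np


def day_to_day_variance(x):
    """Variance components from x of shape (n_subjects, n_days), one value per cell.

    Uses E[MS_day] = var_sd + n_subjects * var_day and E[MS_sd] = var_sd.
    """
    x = np.asarray(x, float)
    s, d = x.shape
    grand = x.mean()
    subj = x.mean(axis=1, keepdims=True)
    day = x.mean(axis=0, keepdims=True)
    ms_day = s * ((day - grand) ** 2).sum() / (d - 1)
    resid = x - subj - day + grand  # subject-day interaction (includes error)
    ms_sd = (resid ** 2).sum() / ((s - 1) * (d - 1))
    var_day = max(0.0, (ms_day - ms_sd) / s)  # main-day component
    return {"day": var_day, "subject_day": ms_sd, "total": var_day + ms_sd}
```

The `total` entry corresponds to the combined day-to-day variation described above: the main-day component plus the subject-day interaction.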


Author(s):  
David Easley ◽  
Marcos López de Prado ◽  
Maureen O’Hara ◽  
Zhibai Zhang

Abstract Understanding modern market microstructure phenomena requires large amounts of data and advanced mathematical tools. We demonstrate how machine learning can be applied to microstructural research. We find that microstructure measures continue to provide insights into the price process in current complex markets. Some microstructure features with high explanatory power exhibit low predictive power, while others with less explanatory power have more predictive power. We find that some microstructure-based measures are useful for out-of-sample prediction of various market statistics, leading to questions about market efficiency. We also show how microstructure measures can have important cross-asset effects. Our results are derived using 87 liquid futures contracts across all asset classes.


Author(s):  
Giuseppe Buccheri ◽  
Fulvio Corsi

Abstract Despite their effectiveness, linear models for realized variance neglect measurement errors on integrated variance and exhibit several forms of misspecification due to the inherently nonlinear dynamics of volatility. We propose new extensions of the popular approximate long-memory heterogeneous autoregressive (HAR) model, designed to disentangle these effects and quantify their separate impact on volatility forecasts. By combining the asymptotic theory of the realized variance estimator with the Kalman filter and by introducing time-varying HAR parameters, we build new models that account for (i) measurement errors (HARK), (ii) nonlinear dependencies (SHAR), and (iii) both measurement errors and nonlinearities (SHARK). The proposed models are easily estimated by standard maximum likelihood methods and are shown, on both simulated and real data, to provide better out-of-sample forecasts than standard HAR specifications and other competing approaches.
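For reference, the baseline HAR regression that these models extend regresses next-day realized variance on its daily, weekly, and monthly averages and can be estimated by OLS. The sketch uses the common windows of 1, 5 and 22 days (an assumption; the abstract does not state them) and does not implement the Kalman-filter (HARK) or time-varying-parameter (SHAR, SHARK) extensions. Function names are hypothetical.

```python
import numpy as np


def har_design(rv, windows=(1, 5, 22)):
    """Regressors [1, RV_d, RV_w, RV_m] at day t and target RV at day t + 1."""
    rv = np.asarray(rv, float)
    m = max(windows)
    rows = []
    for t in range(m - 1, len(rv) - 1):
        # Average of the last w observations up to and including day t
        rows.append([1.0] + [rv[t - w + 1 : t + 1].mean() for w in windows])
    X = np.array(rows)
    y = rv[m:]
    return X, y


def fit_har(rv):
    """OLS estimates of (intercept, beta_d, beta_w, beta_m)."""
    X, y = har_design(rv)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def forecast_har(rv, coef, windows=(1, 5, 22)):
    """One-step-ahead forecast from the most recent observations."""
    rv = np.asarray(rv, float)
    x = [1.0] + [rv[-w:].mean() for w in windows]
    return float(np.dot(x, coef))
```

Calling `forecast_har(rv, fit_har(rv))` gives a one-step-ahead forecast; an out-of-sample evaluation would refit on a rolling or expanding window and compare forecasts with realized values.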


2017 ◽  
Vol 11 (2) ◽  
pp. 390-411 ◽  
Author(s):  
Feng Liu ◽  
David Pitt

Abstract In this paper we analyse insurance claim frequency data using the bivariate negative binomial regression (BNBR) model. We use general insurance data on claims from simple third-party liability insurance and from comprehensive insurance. We find that bivariate regression, with its capacity to model correlation between the two observed claim counts, provides both a better fit and better out-of-sample prediction than the more common practice of fitting separate univariate negative binomial regression models to each claim type. Noting the complexity of BNBR models and their potential for a large number of parameters, we explore model shrinkage methods, namely the least absolute shrinkage and selection operator (Lasso) and ridge regression. We find that models estimated with shrinkage methods outperform ordinary likelihood-based models for out-of-sample prediction, and that the Lasso performs better than ridge regression as a shrinkage method.
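Why the Lasso can drop parameters while ridge regression only shrinks them is easiest to see in the textbook special case of least squares with an orthonormal design, where both estimators have closed forms in terms of the unpenalized coefficients `z`. This is a simplified illustration only: the BNBR likelihood in the paper has no such closed form and would require iterative penalized fitting. The objectives are stated in the docstrings; other scalings of the penalty change the threshold accordingly.

```python
import numpy as np


def ridge_orthonormal(z, lam):
    """argmin_b ||y - Xb||^2 + lam * ||b||_2^2 when X^T X = I and z = X^T y."""
    return z / (1.0 + lam)


def lasso_orthonormal(z, lam):
    """argmin_b ||y - Xb||^2 + lam * ||b||_1 when X^T X = I and z = X^T y."""
    return np.sign(z) * np.maximum(np.abs(z) - lam / 2.0, 0.0)


z = np.array([3.0, 0.8, -0.3])  # unpenalized least-squares coefficients
print(ridge_orthonormal(z, 1.0))  # every coefficient halved, none exactly zero
print(lasso_orthonormal(z, 1.0))  # coefficients with |z| <= lam / 2 become exactly zero
```

Ridge scales every coefficient by the same factor, whereas the Lasso subtracts a constant and truncates at zero, which is what lets it remove parameters from a heavily parameterized model.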


1992 ◽  
Vol 24 (1) ◽  
pp. 163-169 ◽  
Author(s):  
Alicia N. Rambaldi ◽  
Hector O. Zapata ◽  
Ralph D. Christy

Abstract A credit scoring function incorporating statistical selection criteria was proposed to evaluate the creditworthiness of agricultural cooperative loans in the Fifth Farm Credit District. The in-sample (1981-1986) and out-of-sample (1988) prediction performance of the selected models was evaluated using rank transformation discriminant analysis, logit, and probit. Results indicate superior out-of-sample performance for the management-oriented approach with respect to the classification of unacceptable loans, and poor performance of the rank transformation in out-of-sample prediction.
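The logit part of such an evaluation can be sketched as follows: fit on the in-sample years, then score loans from the hold-out year and count misclassifications. The code assumes a numeric feature matrix with an intercept column and a binary label (1 = unacceptable loan), fits the logit by Newton-Raphson, and uses a 0.5 cutoff; the probit and rank transformation discriminant analysis from the paper are not shown, and all names are hypothetical.

```python
import numpy as np


def fit_logit(X, y, max_iter=50, tol=1e-10):
    """Maximum-likelihood logit via Newton-Raphson; X must include an intercept column."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)  # score vector
        hess = X.T @ (X * (p * (1 - p))[:, None])  # Fisher information
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def error_rates(beta, X, y, cutoff=0.5):
    """Share of unacceptable loans classified as acceptable, and vice versa."""
    p = 1.0 / (1.0 + np.exp(-np.asarray(X, float) @ beta))
    pred_bad = p >= cutoff
    bad = np.asarray(y).astype(bool)
    missed_bad = np.mean(~pred_bad[bad]) if bad.any() else np.nan
    false_alarm = np.mean(pred_bad[~bad]) if (~bad).any() else np.nan
    return missed_bad, false_alarm
```

For an out-of-sample comparison, `fit_logit` would be run on the 1981-1986 loans and `error_rates` on the 1988 loans, reporting the two error types separately because misclassifying an unacceptable loan is usually the costlier mistake.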

