Out-of-Sample Prediction in Multidimensional P-Spline Models

Alba Carballo; María Durbán; Dae-Jin Lee

doi:10.3390/math9151761

Mathematics ◽

2021 ◽

Vol 9 (15) ◽

pp. 1761

Author(s):

Alba Carballo ◽

María Durbán ◽

Dae-Jin Lee

Keyword(s):

Mixed Model ◽

Real Data ◽

Additive Models ◽

Multidimensional Case ◽

Logarithmic Scale ◽

Smooth Functions ◽

Interaction Terms ◽

Anova Model ◽

Out Of Sample ◽

Out Of Sample Prediction

The prediction of out-of-sample values is an interesting problem in any regression model. In the context of penalized smoothing using a mixed-model reparameterization, a general framework has been proposed for predicting in additive models, but without interaction terms. The aim of this paper is to generalize this work, extending the methodology to models that include interaction terms, i.e., to the case where prediction is carried out in a multidimensional setting. Our method fits the data and predicts new observations at the same time, and uses constraints to ensure a consistent fit or to impose further restrictions on predictions. We also develop this methodology for the so-called smooth-ANOVA models, which allow us to include interaction terms that can be decomposed as a sum of several smooth functions. To illustrate the method, two real data sets were used, one for predicting the mortality of the U.S. population on a logarithmic scale, and the other for predicting the aboveground biomass of Populus trees as a smooth function of height and diameter. We examine the performance of the interaction and smooth-ANOVA models through simulation studies.

Predicting the trend of stock index based on feature engineering and CatBoost model

International Journal of Financial Engineering ◽

10.1142/s2424786321500274 ◽

2021 ◽

pp. 2150027

Author(s):

Renzhe Xu ◽

Yudong Chen ◽

Tenglong Xiao ◽

Jingli Wang ◽

Xiong Wang

Keyword(s):

Time Scale ◽

Stock Index ◽

Feature Engineering ◽

Multiple Time ◽

Multiple Time Scale ◽

Back Testing ◽

Price Series ◽

Out Of Sample ◽

Out Of Sample Prediction ◽

Sharpe Ratio

As an important tool for measuring the current state of the whole stock market, the stock index, and especially its prediction, has always been a focus of researchers. This paper uses trend types, obtained by clustering price series under multiple time scales, combined with the day-of-the-week effect to construct a categorical feature combination. Based on the historical data of six kinds of Chinese stock indexes, the CatBoost model is used for training and prediction. Experimental results show that the out-of-sample prediction accuracy is 0.55, and the long–short trading strategy can obtain an average annualized return of 34.43%, which is a great improvement compared with other classical classification algorithms. Under rolling back-testing, the model obtains stable returns in each period from 2012 to 2020. Among them, the SSESC’s long–short strategy has the best performance, with an annualized return of 40.85% and a Sharpe ratio of 1.53. Therefore, the trend information in multiple time-scale features built through feature engineering can be learned well by the CatBoost model, which has a guiding effect on predicting stock index trends.
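
The annualized return and Sharpe ratio quoted in the abstract above are standard back-testing summaries. The sketch below shows one common way to compute them from a series of daily strategy returns. It is an illustration only, not the authors' code: the 252-trading-day convention, the zero default risk-free rate, and the function names are assumptions made here.

```python
import math
from statistics import mean, stdev

TRADING_DAYS = 252  # assumed number of trading days per year


def annualized_return(daily_returns, periods_per_year=TRADING_DAYS):
    """Geometric annualized return from simple per-period returns."""
    growth = 1.0
    for r in daily_returns:
        growth *= 1.0 + r  # compound each period's return
    years = len(daily_returns) / periods_per_year
    return growth ** (1.0 / years) - 1.0


def sharpe_ratio(daily_returns, risk_free_daily=0.0,
                 periods_per_year=TRADING_DAYS):
    """Annualized Sharpe ratio: mean excess return over its sample
    standard deviation, scaled by the square root of periods per year."""
    excess = [r - risk_free_daily for r in daily_returns]
    return mean(excess) / stdev(excess) * math.sqrt(periods_per_year)


if __name__ == "__main__":
    # Hypothetical daily returns of a long-short strategy.
    sample = [0.01, -0.005, 0.002, 0.007, -0.003]
    print(annualized_return(sample))
    print(sharpe_ratio(sample))
```

Conventions differ between studies (arithmetic versus geometric annualization, choice of risk-free rate), so figures computed this way are only comparable with published ones when the same conventions are used.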

A generalized additive model-based data-driven solution for lithium-ion battery capacity prediction and local effects analysis

Transactions of the Institute of Measurement and Control ◽

10.1177/01423312211057981 ◽

2021 ◽

pp. 014233122110579

Author(s):

Tao Chen ◽

Ciwei Gao ◽

Hongxun Hui ◽

Qiushi Cui ◽

Huan Long

Keyword(s):

Energy Storage ◽

Lithium Ion Battery ◽

Lithium Ion ◽

Additive Models ◽

Data Driven ◽

Local Effects ◽

Interaction Terms ◽

Battery Capacity ◽

Formulation Parameters ◽

Capacity Prediction

Lithium-ion battery-based energy storage systems have been widely utilized in many applications such as transportation electrification and smart grids. As a key health status indicator, battery performance relies heavily on its capacity, which is easily influenced by various electrode formulation parameters within a battery. Due to the strongly coupled electrical, chemical, and thermal dynamics, predicting battery capacity and analysing the local effects of the parameters of interest within a battery is important but challenging. This article proposes a data-driven method to achieve effective battery capacity prediction, as well as local effects analysis. The solution is derived by using generalized additive models (GAM) with different interaction terms. A comparison study illustrates that the proposed GAM-based solution is capable of not only performing satisfactory battery capacity predictions but also quantifying the local effects of five important battery electrode formulation parameters as well as their interaction terms. Due to its data-driven nature and explainability, the proposed method could benefit battery capacity prediction in an efficient manner and facilitate battery control for many other energy storage system applications.

Customizing persuasive messages; the value of operative measures

Journal of Consumer Marketing ◽

10.1108/jcm-11-2016-1996 ◽

2018 ◽

Vol 35 (2) ◽

pp. 208-217 ◽

Cited By ~ 3

Author(s):

Maurits Kaptein

Keyword(s):

Interactive Media ◽

Database Systems ◽

Customer Behavior ◽

Customer Relationship ◽

Persuasive Messages ◽

Psychological Traits ◽

Content Type ◽

Out Of Sample ◽

Out Of Sample Prediction ◽

Operative Measures

Purpose: This paper aims to examine whether estimates of psychological traits obtained using meta-judgmental measures (as commonly present in customer relationship management database systems) or operative measures are most useful in predicting customer behavior. Design/methodology/approach: Using an online experiment (N = 283), the study collects meta-judgmental and operative measures of customers. Subsequently, it compares the out-of-sample prediction error of responses to persuasive messages. Findings: The study shows that operative measures, derived directly from measures of customer behavior, are more informative than meta-judgmental measures. Practical implications: Using interactive media, it is possible to actively elicit operative measures. This study shows that practitioners seeking to customize their marketing communication should focus on obtaining such psychographic observations. Originality/value: While currently both meta-judgmental measures and operative measures are used for customization in interactive marketing, this study directly compares their utility for the prediction of future responses to persuasive messages.

A Longitudinal Analysis of the Impact of Distance Driven on the Probability of Car Accidents

Risks ◽

10.3390/risks8030091 ◽

2020 ◽

Vol 8 (3) ◽

pp. 91

Author(s):

Jean-Philippe Boucher ◽

Roxane Turcotte

Keyword(s):

Poisson Distribution ◽

Fixed Effects ◽

Generalized Additive Models ◽

Additive Models ◽

Panel Count Data ◽

Insurance Company ◽

Smooth Functions ◽

Claim Frequency ◽

The Impact ◽

The Relationship

Using telematics data, we study the relationship between claim frequency and distance driven through different models by observing smooth functions. We used Generalized Additive Models (GAM) for a Poisson distribution, and Generalized Additive Models for Location, Scale, and Shape (GAMLSS) that we generalize for panel count data. To correctly observe the relationship between distance driven and claim frequency, we show that a Poisson distribution with fixed effects should be used because it removes residual heterogeneity that was incorrectly captured by previous models based on GAM and GAMLSS theory. We show that an approximately linear relationship between distance driven and claim frequency can be derived. We argue that this approach can be used to compute the premium surcharge for additional kilometers the insured wants to drive, or as the basis to construct Pay-as-you-drive (PAYD) insurance for self-service vehicles. All models are illustrated using data from a major Canadian insurance company.

Factors Contributing to Intra-Individual Variation of Serum Constituents: 5. Short-Term Day-to-Day and Within-Hour Variation of Serum Constituents in Healthy Subjects

Clinical Chemistry ◽

10.1093/clinchem/20.12.1520 ◽

1974 ◽

Vol 20 (12) ◽

pp. 1520-1527 ◽

Cited By ~ 30

Author(s):

Per Winkel ◽

Bernard E Statland ◽

Henning Bokelund

Keyword(s):

Total Lipids ◽

Short Term ◽

Serum Samples ◽

Interaction Terms ◽

Anova Model ◽

First Case ◽

Main Effect ◽

Relationship Of ◽

The Relationship ◽

Analytical Variation

We evaluated the variations in some serum constituents in a group of healthy young men for two selected time intervals: short-term day-to-day changes and within-hour changes. In the first case, we used a two-way ANOVA model to compute the main-day effect and the subject-day interaction terms, which were combined to yield the total day-to-day variation. A main-day effect was seen to be statistically significant only for acid phosphatase, while all of the 18 serum constituents except for sodium, calcium, and albumin demonstrated a statistically significant subject-day interaction. For the within-hour biologic variation, a three-way ANOVA model was used to analyze results of duplicate serum samples drawn at 1100 h and 1130 h on two different days. Although a significant main effect of hour was found only for total lipids and alkaline phosphatase, pooling the main effect of hour, subject-hour interaction, and subject-day-hour interaction terms resulted in a chemically significant variation for potassium, total protein, albumin, iron, total lipids, cholesterol, and bilirubin. The relationship of these biological fluctuations is compared to the expected analytical variation in all cases.

Microstructure in the Machine Age

Review of Financial Studies ◽

10.1093/rfs/hhaa078 ◽

2020 ◽

Cited By ~ 1

Author(s):

David Easley ◽

Marcos López de Prado ◽

Maureen O’Hara ◽

Zhibai Zhang

Keyword(s):

Machine Learning ◽

Market Efficiency ◽

Predictive Power ◽

Explanatory Power ◽

Futures Contracts ◽

Machine Age ◽

Price Process ◽

Out Of Sample ◽

Asset Classes ◽

Out Of Sample Prediction

Understanding modern market microstructure phenomena requires large amounts of data and advanced mathematical tools. We demonstrate how machine learning can be applied to microstructural research. We find that microstructure measures continue to provide insights into the price process in current complex markets. Some microstructure features with high explanatory power exhibit low predictive power, while others with less explanatory power have more predictive power. We find that some microstructure-based measures are useful for out-of-sample prediction of various market statistics, leading to questions about market efficiency. We also show how microstructure measures can have important cross-asset effects. Our results are derived using 87 liquid futures contracts across all asset classes.

HARK the SHARK: Realized Volatility Modeling with Measurement Errors and Nonlinear Dependencies*

Journal of Financial Econometrics ◽

10.1093/jjfinec/nbz025 ◽

2019 ◽

Cited By ~ 2

Author(s):

Giuseppe Buccheri ◽

Fulvio Corsi

Keyword(s):

Long Memory ◽

Measurement Errors ◽

Asymptotic Theory ◽

Linear Models ◽

Real Data ◽

Realized Variance ◽

Likelihood Methods ◽

Out Of Sample ◽

Maximum Likelihood Methods ◽

Integrated Variance

Despite their effectiveness, linear models for realized variance neglect measurement errors on integrated variance and exhibit several forms of misspecification due to the inherent nonlinear dynamics of volatility. We propose new extensions of the popular approximate long-memory heterogeneous autoregressive (HAR) model apt to disentangle these effects and quantify their separate impact on volatility forecasts. By combining the asymptotic theory of the realized variance estimator with the Kalman filter and by introducing time-varying HAR parameters, we build new models that account for: (i) measurement errors (HARK), (ii) nonlinear dependencies (SHAR) and (iii) both measurement errors and nonlinearities (SHARK). The proposed models are simply estimated through standard maximum likelihood methods and are shown, both on simulated and real data, to provide better out-of-sample forecasts compared to standard HAR specifications and other competing approaches.
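
For orientation, the baseline HAR regression that these extensions start from is usually written as below. This is the standard form found in the literature, with daily, weekly (5-day), and monthly (22-day) averages of realized variance as regressors; it is not taken from the abstract itself, and the symbols $c$, $\beta^{(\cdot)}$, and $\varepsilon_{t+1}$ are notation chosen here. The HARK, SHAR, and SHARK models are not reproduced.

```latex
\begin{aligned}
RV_{t+1}^{(d)} &= c + \beta^{(d)} RV_{t}^{(d)} + \beta^{(w)} RV_{t}^{(w)}
                  + \beta^{(m)} RV_{t}^{(m)} + \varepsilon_{t+1}, \\
RV_{t}^{(w)} &= \frac{1}{5} \sum_{i=0}^{4} RV_{t-i}^{(d)}, \qquad
RV_{t}^{(m)} = \frac{1}{22} \sum_{i=0}^{21} RV_{t-i}^{(d)}.
\end{aligned}
```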

Application of bivariate negative binomial regression model in analysing insurance count data

Annals of Actuarial Science ◽

10.1017/s1748499517000070 ◽

2017 ◽

Vol 11 (2) ◽

pp. 390-411 ◽

Cited By ~ 2

Author(s):

Feng Liu ◽

David Pitt

Keyword(s):

Ridge Regression ◽

Negative Binomial ◽

Negative Binomial Regression ◽

Third Party ◽

Negative Binomial Regression Model ◽

Shrinkage Methods ◽

Binomial Regression ◽

Out Of Sample ◽

Selection Operator ◽

Out Of Sample Prediction

In this paper we analyse insurance claim frequency data using the bivariate negative binomial regression (BNBR) model. We use general insurance data on claims from simple third-party liability insurance and comprehensive insurance. We find that bivariate regression, with its capacity for modelling correlation between the two observed claim counts, provides both a superior fit and superior out-of-sample prediction compared with the more common practice of fitting univariate negative binomial regression models separately to each claim type. Noting the complexity of BNBR models and their potential for a large number of parameters, we explore the use of model shrinkage methodology, namely the least absolute shrinkage and selection operator (Lasso) and ridge regression. We find that models estimated using shrinkage methods outperform the ordinary likelihood-based models when used to make out-of-sample predictions. We find that the Lasso performs better than ridge regression as a method of shrinkage.
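
The comparison described above, a shrinkage estimator against an unpenalized one judged on held-out data, can be sketched in a few lines. The example deliberately swaps in a toy setting: a single-predictor linear model through the origin with a ridge penalty, fitted to simulated data. It is not the BNBR model or the insurance data from the paper, and the penalty value, sample sizes, and function names are all assumptions made for illustration.

```python
import random


def fit_slope(xs, ys, penalty=0.0):
    """Least-squares slope through the origin.

    With penalty > 0 this is the ridge estimate: the same numerator,
    with the penalty added to the denominator, which shrinks the slope
    toward zero.
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + penalty)


def mse(xs, ys, slope):
    """Mean squared prediction error of y = slope * x on (xs, ys)."""
    return sum((y - slope * x) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Simulated data (hypothetical): true slope 0.5 plus Gaussian noise.
rng = random.Random(0)
xs = [rng.gauss(0, 1) for _ in range(60)]
ys = [0.5 * x + rng.gauss(0, 1) for x in xs]

# Hold out the last 20 observations as the out-of-sample set.
train_x, test_x = xs[:40], xs[40:]
train_y, test_y = ys[:40], ys[40:]

ols = fit_slope(train_x, train_y)                  # unpenalized fit
ridge = fit_slope(train_x, train_y, penalty=5.0)   # shrunken fit

oos_ols = mse(test_x, test_y, ols)
oos_ridge = mse(test_x, test_y, ridge)

if __name__ == "__main__":
    print("slopes:", ols, ridge)
    print("out-of-sample MSE:", oos_ols, oos_ridge)
```

Which estimator wins on the held-out set depends on the data and the penalty; in practice the penalty is typically chosen by cross-validation rather than fixed in advance.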

Selecting The “Best” Prediction Model: An Application To Agricultural Cooperatives

Journal of Agricultural and Applied Economics ◽

10.1017/s008130520002608x ◽

1992 ◽

Vol 24 (1) ◽

pp. 163-169 ◽

Cited By ~ 1

Author(s):

Alicia N. Rambaldi ◽

Hector O. Zapata ◽

Ralph D. Christy

Keyword(s):

Credit Scoring ◽

Scoring Function ◽

Poor Performance ◽

Agricultural Cooperatives ◽

Agricultural Cooperative ◽

Out Of Sample ◽

Farm Credit ◽

Rank Transformation ◽

Out Of Sample Prediction ◽

Credit Worthiness

A credit scoring function incorporating statistical selection criteria was proposed to evaluate the credit worthiness of agricultural cooperative loans in the Fifth Farm Credit District. The in-sample (1981-1986) and out-of-sample (1988) prediction performance of the selected models was evaluated using rank transformation discriminant analysis, logit, and probit. Results indicate superior out-of-sample performance for the management-oriented approach relative to classification of unacceptable loans, and poor performance of the rank transformation in out-of-sample prediction.
