Regularization Methods Based on the Lq-Likelihood for Linear Models with Heavy-Tailed Errors

Yoshihiro Hirose

doi:10.3390/e22091036

Entropy ◽

10.3390/e22091036 ◽

2020 ◽

Vol 22 (9) ◽

pp. 1036

Author(s):

Yoshihiro Hirose

Keyword(s):

Linear Model ◽

Power Function ◽

Numerical Experiments ◽

Linear Models ◽

Regularization Methods ◽

Penalized Least Squares ◽

Normal Distributions ◽

Normal Linear Model ◽

Log Likelihood ◽

Heavy Tailed

We propose regularization methods for linear models based on the Lq-likelihood, which is a generalization of the log-likelihood using a power function. Regularization methods are popular for estimation in the normal linear model. However, heavy-tailed errors are also important in statistics and machine learning. We assume that the errors in linear models follow q-normal distributions. A q-normal distribution is heavy-tailed and is defined using a power function rather than the exponential function. We find that the proposed methods for linear models with q-normal errors coincide with the ordinary regularization methods that are applied to the normal linear model. The proposed methods can be computed using existing packages because they are penalized least squares methods. We examine the proposed methods using numerical experiments, showing that the methods perform well, even when the error is heavy-tailed. The numerical experiments also illustrate that our methods work well in model selection and generalization, especially when the error is slightly heavy-tailed.
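
As an illustrative aside that is not taken from the abstract: the Lq-likelihood is commonly built on the q-logarithm L_q(u) = (u^(1-q) - 1)/(1 - q), which tends to log u as q approaches 1. The Python sketch below evaluates that function and, as one simple instance of a penalized least squares method, solves a single-predictor lasso problem by soft-thresholding. The names `lq`, `soft_threshold` and `lasso_1d`, and the toy data in the usage note, are assumptions made for illustration; the paper's own estimators may differ in detail.

```python
import math


def lq(u, q):
    """q-logarithm: (u**(1-q) - 1) / (1 - q) for q != 1, log(u) for q == 1."""
    if u <= 0:
        raise ValueError("u must be positive")
    if q == 1:
        return math.log(u)
    return (u ** (1 - q) - 1) / (1 - q)


def soft_threshold(z, lam):
    """Shrink z toward zero by lam; return 0 when |z| <= lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_1d(x, y, lam):
    """Minimize 0.5 * sum((y_i - x_i * b)**2) + lam * |b| over a scalar b.

    Hypothetical helper: the closed-form solution is
    soft_threshold(sum(x_i * y_i), lam) / sum(x_i**2).
    """
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    if sxx == 0:
        raise ValueError("x must contain a non-zero value")
    return soft_threshold(sxy, lam) / sxx
```

For example, `lasso_1d([1, 2, 3], [2, 4, 6], 0)` recovers the least squares slope, and increasing `lam` shrinks the estimate until it reaches exactly zero, which is how such penalties perform model selection.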

Analysis of a Linear Model for Non-Synchronous Vibrations Near Stall

International Journal of Turbomachinery Propulsion and Power ◽

10.3390/ijtpp6030026 ◽

2021 ◽

Vol 6 (3) ◽

pp. 26

Author(s):

Christoph Brandstetter ◽

Sina Stapelfeldt

Keyword(s):

Linear Model ◽

Linear Models ◽

Structural Vibration ◽

Experimental Investigations ◽

Vibration Modes ◽

Critical Problem ◽

Safety Critical ◽

Unstable Vibration ◽

Aero Engines ◽

Lock In

Non-synchronous vibrations arising near the stall boundary of compressors are a recurring and potentially safety-critical problem in modern aero-engines. Recent numerical and experimental investigations have shown that these vibrations are caused by the lock-in of circumferentially convected aerodynamic disturbances and structural vibration modes, and that it is possible to predict unstable vibration modes using coupled linear models. This paper aims to further investigate non-synchronous vibrations by casting a reduced model for NSV in the frequency domain and analysing stability for a range of parameters. It is shown how, and why, under certain conditions linear models are able to capture a phenomenon, which has traditionally been associated with aerodynamic non-linearities. The formulation clearly highlights the differences between convective non-synchronous vibrations and flutter and identifies the modifications necessary to make quantitative predictions.

A Cascaded Unsupervised Model for PoS Tagging

ACM Transactions on Asian and Low-Resource Language Information Processing ◽

10.1145/3447759 ◽

2021 ◽

Vol 20 (1) ◽

pp. 1-23

Author(s):

Necva Bölücü ◽

Burcu Can

Keyword(s):

Linear Model ◽

Language Processing ◽

Bayesian Model ◽

Linear Models ◽

Syntactic Category ◽

Semantic Parsing ◽

Pos Tagging ◽

Part Of Speech ◽

Sentence Level ◽

Log Linear

Part of speech (PoS) tagging is one of the fundamental syntactic tasks in Natural Language Processing, as it assigns a syntactic category to each word within a given sentence or context (such as noun, verb, adjective, etc.). Those syntactic categories could be used to further analyze the sentence-level syntax (e.g., dependency parsing) and thereby extract the meaning of the sentence (e.g., semantic parsing). Various methods have been proposed for learning PoS tags in an unsupervised setting without using any annotated corpora. Log-linear models are among the most widely used methods for the tagging problem. Initialization of the parameters in a log-linear model is crucial for inference, and different initialization techniques have been used so far. In this work, we present a log-linear model for PoS tagging that uses another fully unsupervised Bayesian model to initialize the parameters of the model in a cascaded framework. We thereby transfer knowledge between two different unsupervised models to improve the PoS tagging results, where a log-linear model benefits from a Bayesian model’s expertise. We present results for Turkish as a morphologically rich language and for English as a comparably morphologically poor language in a fully unsupervised framework. The results show that our framework outperforms other unsupervised models proposed for PoS tagging.

Using Mixture of Normal Distributions to Detect Treatment Effects when the Frequentist Method Fails

RAS Oncology & Therapy ◽

10.51520/2766-2586-9 ◽

2021 ◽

Vol 2 (1) ◽

Author(s):

Anthony Orlando

Keyword(s):

Dose Response ◽

Linear Models ◽

Controlled Trial ◽

Response To Treatment ◽

Double Blind ◽

Underlying Assumption ◽

Normal Distributions ◽

Efficacy Measure ◽

Baseline Weight ◽

Mixture Of Normal Distributions

Background: Results from a clinical trial can either support the efficacy and safety of a new compound or fail to provide such evidence. One reason for a ‘non-positive’ result is the underlying assumption of normality and homogeneity of variances, which is quite often violated when analyzing data from clinical trials, despite randomization. A question of interest is whether we can obtain more informative results by using mixtures of normal distributions or linear models (MLMs) in such cases. Introduction: MLMs can be used when traditional methods fail. MLMs “search” within the variability in data to identify components or subgroups of individuals (also known as latent classes) who have common intercepts and common slopes of change in a variable/endpoint of interest but whose intercepts and slopes are different from other subsets of patients. Thus, MLMs can be used to identify subgroups of patients exhibiting differential response to treatment within each treatment arm. The purpose of our study was to examine the usefulness of MLMs in such circumstances. Methods: Data from 155 subjects in a multicenter, randomized, double-blind, placebo-controlled trial that evaluated the efficacy of Cpn10, administered subcutaneously twice weekly to treat rheumatoid arthritis (RA), were used to evaluate the usefulness of MLMs. The primary efficacy measure, ACR20, was analyzed using a 3-step process: first, an MLM was used to estimate RA duration using a 3-component model. The second step took the results of the first step to inform the logistic model and its analyses. The model was fitted with an intercept, MLM components, treatment arm, RA duration (linear and quadratic), dose response (modeled as an interaction effect), age and baseline weight. LOCF was used to impute missing data. Data were analyzed using MLM and SAS v 9.0. Results: The model was a good fit to the data with a likelihood ratio significant at p=0.026, and a significant increase in the -2log L. We also observed low p-values for those variables that were non-normal. Overall and for the 75 mg dose, Cpn10 was efficacious relative to placebo, p<0.050. We also observed that dose response was significant at p<0.15. Conclusion: The use of MLMs adds value because it can be used to understand the disease experience or the value of treatment when traditional statistical methods cannot. Key words: Mixture of linear models, normality, entropy.

Functional Linear Regression

10.1093/oxfordhb/9780199568444.013.2 ◽

2018 ◽

Author(s):

Hervé Cardot ◽

Pascal Sarda

Keyword(s):

Linear Regression ◽

Least Squares ◽

Linear Models ◽

Estimation Error ◽

Asymptotic Properties ◽

Principal Component Regression ◽

Principal Component ◽

Penalized Least Squares ◽

Open Problems ◽

Functional Linear Regression

This article presents a selected bibliography on functional linear regression (FLR) and highlights the key contributions from both applied and theoretical points of view. It first defines FLR in the case of a scalar response and shows how its modelization can also be extended to the case of a functional response. It then considers two kinds of estimation procedures for this slope parameter: projection-based estimators in which regularization is performed through dimension reduction, such as functional principal component regression, and penalized least squares estimators that take into account a penalized least squares minimization problem. The article proceeds by discussing the main asymptotic properties separating results on mean square prediction error and results on L2 estimation error. It also describes some related models, including generalized functional linear models and FLR on quantiles, and concludes with a complementary bibliography and some open problems.

Stochastic daily precipitation model with a heavy-tailed component

Natural Hazards and Earth System Science ◽

10.5194/nhess-14-2321-2014 ◽

2014 ◽

Vol 14 (9) ◽

pp. 2321-2335 ◽

Cited By ~ 9

Author(s):

N. M. Neykov ◽

P. N. Neytchev ◽

W. Zucchini

Keyword(s):

Linear Models ◽

Daily Precipitation ◽

Daily Rainfall ◽

Binary Logistic Regression ◽

Skewed Distribution ◽

Generalized Pareto ◽

Precipitation Model ◽

Standard Models ◽

Heavy Tailed ◽

Standard Software

Stochastic daily precipitation models are commonly used to generate scenarios of climate variability or change on a daily timescale. The standard models consist of two components describing the occurrence and intensity series, respectively. Binary logistic regression is used to fit the occurrence data, and the intensity series is modeled using a continuous-valued right-skewed distribution, such as gamma, Weibull or lognormal. The precipitation series is then modeled using the joint density, and standard software for generalized linear models can be used to perform the computations. A drawback of these precipitation models is that they do not produce a sufficiently heavy upper tail for the distribution of daily precipitation amounts; they tend to underestimate the frequency of large storms. In this study, we adapted the approach of Furrer and Katz (2008) based on hybrid distributions in order to correct for this shortcoming. In particular, we applied hybrid gamma–generalized Pareto (GP) and hybrid Weibull–GP distributions to develop a stochastic precipitation model for daily rainfall at Ihtiman in western Bulgaria. We report the results of simulations designed to compare the models based on the hybrid distributions and those based on the standard distributions. Some potential difficulties are outlined.

The Normal linear model

Introduction to Hierarchical Bayesian Modeling for Ecological Data - Chapman & Hall/CRC Applied Environmental Statistics ◽

10.1201/b12501-8 ◽

2012 ◽

pp. 125-143

Keyword(s):

Linear Model ◽

Normal Linear Model

Linear power-flow analysis method for AC-DC electric power networks

10.32920/ryerson.14656500 ◽

2021 ◽

Author(s):

Mohammadreza Vatani

Keyword(s):

Power Systems ◽

Linear Model ◽

Power Flow ◽

Linear Models ◽

Power Converters ◽

Power Balance ◽

Balance Equations ◽

Dc Power ◽

Phase Angles ◽

Dc Power Systems

AC-DC power systems have been operating for more than sixty years. Nonlinear bus-wise power balance equations provide an accurate model of AC-DC power systems. However, optimization tools for planning and operation require a linear version, even if approximate, to create tractable algorithms that consider modern elements such as DERs (distributed energy resources). Hitherto, linear models have been available only for AC power systems; coincidentally, these are called DC power flow. To address this drawback, linear bus-wise power balance equations are developed for AC-DC power systems and presented. As a first contribution, while AC and DC lines are represented by susceptance and conductance elements, AC-DC power converters are represented by a proposed linear relationship. As a second contribution, a three-step linear AC-DC power flow method is proposed. The first step solves the whole network considering it as a linear AC network, yielding bus phase angles at all busses. The second step computes attributes of the proposed linear model of all AC-DC power converters. The third step solves the linear model of the AC-DC system including converters, yielding bus phase angles at AC busses and voltage magnitudes at DC busses. The proposed linear power flow model of AC-DC power systems, while an approximation of the nonlinear model, enables the representation of bus-wise power balance of AC-DC systems in complex planning and operational optimization formulations and hence holds the promise of phenomenal progress. The proposed linear AC-DC power system model is tested on numerous IEEE test systems and demonstrated to be fast, reliable, and consistent.
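
For orientation, the classical linear model of an AC network that the abstract refers to as DC power flow can be sketched in a few lines of Python. This is not the author's AC-DC method and it contains no converter model; it only solves B·θ = P for the bus phase angles with one slack bus fixed at zero. The function names, the three-bus network and the per-unit values are hypothetical and chosen purely for illustration.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def dc_power_flow(n_bus, lines, injections, slack=0):
    """Classical DC power flow (illustrative sketch, not the paper's method).

    lines: list of (from_bus, to_bus, susceptance) in per unit.
    injections: net active power injected at each bus (slack entry ignored).
    Returns (theta, flows) with theta[slack] fixed at 0.
    """
    # Build the bus susceptance matrix B.
    B = [[0.0] * n_bus for _ in range(n_bus)]
    for i, j, b in lines:
        B[i][i] += b
        B[j][j] += b
        B[i][j] -= b
        B[j][i] -= b
    # Remove the slack row and column, then solve for the remaining angles.
    keep = [k for k in range(n_bus) if k != slack]
    b_red = [[B[r][c] for c in keep] for r in keep]
    p_red = [injections[k] for k in keep]
    theta = [0.0] * n_bus
    for k, t in zip(keep, solve_linear(b_red, p_red)):
        theta[k] = t
    # Line flow is susceptance times the angle difference.
    flows = {(i, j): b * (theta[i] - theta[j]) for i, j, b in lines}
    return theta, flows
```

A usage example under the same assumptions: `dc_power_flow(3, [(0, 1, 10.0), (0, 2, 10.0), (1, 2, 10.0)], [0.0, -1.0, 0.5])` models a load of 1.0 p.u. at bus 1 and a generator of 0.5 p.u. at bus 2, with the slack bus supplying the remainder.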

Experimental validation of wind energy estimation

Thermal Science ◽

10.2298/tsci191207474z ◽

2020 ◽

Vol 24 (6 Part A) ◽

pp. 3795-3806

Author(s):

Predrag Zivkovic ◽

Mladen Tomic ◽

Vukman Bakic

Keyword(s):

Wind Speed ◽

Linear Model ◽

Linear Models ◽

Stokes Equations ◽

Momentum Conservation ◽

Electricity Production ◽

Navier Stokes ◽

Energy Estimation ◽

Navier Stokes Equations ◽

Non Linear

Wind power assessment in complex terrain is a very demanding task. Modeling wind conditions with standard linear models does not sufficiently reproduce wind conditions in complex terrains, especially on leeward sides of terrain slopes, primarily due to the vorticity. A more complex non-linear model, based on Reynolds averaged Navier-Stokes equations, has been used. Turbulence was modeled by a modified two-equation k-ε model for neutral atmospheric boundary-layer conditions, written in a general curvilinear non-orthogonal co-ordinate system. The full set of mass and momentum conservation equations as well as turbulence model equations are numerically solved using the CFD technique. A comparison of the application of the linear model and the non-linear model is presented. Considerable discrepancies of estimated wind speed have been obtained using linear and non-linear models. Statistics of annual electricity production vary by up to 30% at the model site. Even anemometer measurements directly at a wind turbine's site do not necessarily deliver the results needed for prediction calculations, as extrapolations of wind speed to hub height are tricky. The results of the simulation are compared by means of the turbine type, quality and quantity of the wind data and capacity factor. Finally, the comparison of the estimated results with the measured data at 10, 30, and 50 m is shown.

A linear model for leaf area measurement to screen potential leaf material for herbal drug in Adhatoda vasica L.

Journal of Applied and Natural Science ◽

10.31018/jans.v8i1.763 ◽

2016 ◽

Vol 8 (1) ◽

pp. 140-143

Author(s):

J. V. Thaker ◽

R. P. Kuvad ◽

V. S. Thaker

Keyword(s):

Water Content ◽

Leaf Area ◽

Linear Model ◽

Plant Species ◽

Linear Models ◽

Dry Weight ◽

Nondestructive Method ◽

Area Measurement ◽

Adhatoda Vasica ◽

Linear Correlations

Leaf area is an important parameter in physiology and agronomy studies. Linear models for leaf area measurement are developed for plant species as a nondestructive method. The plant Adhatoda vasica L. (a medicinal plant) was selected, and its leaves were used to develop a linear model for leaf area using Leaf Area Meter (LAM) software. Planimetric parameters (length, length², width and width²) and gravimetric parameters (dry weight and water content) are considered for the development of a linear model for this plant species. Single factor ANOVA and linear correlations were worked out using these parameters and leaf area. The plant showed a significant relationship with the parameters studied. The best correlation, as represented by the regression coefficient (R²), was used and an improved R² was worked out. It is observed that water content increases with leaf area and shows the best correlation with the leaf area. It is therefore concluded that water content can be taken as a parameter for developing a linear model for leaf area.
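
The kind of single-predictor linear model and R² described above can be sketched as follows. This is a generic ordinary least squares fit in Python, not the authors' procedure or the LAM software; the function name `fit_line` is hypothetical, and any predictor (for example water content) and response (leaf area) values passed to it would have to come from real measurements, which the abstract does not list.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r2), where r2 is the coefficient of
    determination 1 - SSE / SST. Hypothetical helper for illustration.
    """
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need two or more paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Sum of squared residuals around the fitted line.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return slope, intercept, 1 - sse / syy
```

Comparing the returned `r2` across candidate predictors (length, width, dry weight, water content) mirrors the selection by best correlation that the abstract describes.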

The Application of Log-Linear Model to Selected Poison Patients

ASM Science Journal ◽

10.32802/asmscj.2020.sm26(1.21) ◽

2020 ◽

pp. 1-7

Author(s):

Fatin N.S.A. ◽

Norlida M.N. ◽

Siti Z.M.J.

Keyword(s):

Linear Model ◽

Linear Models ◽

Demographic Data ◽

Contingency Tables ◽

Categorical Variables ◽

State Variables ◽

Exposure Route ◽

Higher Dimensional ◽

Route Of Exposure ◽

Log Linear

The log-linear model is a technique used to analyze cross-classified categorical data, that is, contingency tables. It is used to obtain parsimonious models that describe the interaction between the categorical variables in contingency tables. Log-linear models are commonly used in evaluating higher dimensional contingency tables that involve more than two categorical variables. This study focuses on analyzing data of poisoned patients from 2012 to 2014 using log-linear models. Two models are analyzed: a model for the demographic data of patients and a model for poisoning information. For the first model, the variables involved are gender, age, race and state. Variables for the second model are circumstance of exposure, type of exposure, location of exposure, route of exposure and type of poison. Both log-linear models are developed to investigate the association between variables in the model. As a result of this study, the best models for demographic data and poisoning information are the models with three-way interactions. For the best model of demographic data, there is an association between gender, age and race; race, gender and state; and age, race and state. Meanwhile, the best model for poisoning information reveals that there is a relationship between circumstance of exposure, route of exposure and type of poison; location of exposure, route of exposure and type of poison; circumstance of exposure, type of exposure and route of exposure; circumstance of exposure, location of exposure and route of exposure; circumstance of exposure, type of exposure and type of poison; and type of exposure, location of exposure and type of poison. Keywords: log-linear; demographic; gender; age; race; state; circumstance of exposure; type of exposure; location of exposure; route of exposure; types of poison
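
To make the idea concrete, the simplest log-linear model, independence in a two-way table (log m_ij = λ + λ_i^A + λ_j^B), can be fitted in closed form. The Python sketch below computes the fitted counts and the likelihood-ratio statistic G²; it is far simpler than the three-way interaction models in the study, and the function name and any counts passed to it are hypothetical rather than taken from the poisoning data.

```python
import math


def independence_fit(table):
    """Fit the independence log-linear model to a two-way table of counts.

    Returns (expected, g2): expected[i][j] = row_i * col_j / N, and
    g2 = 2 * sum(O * ln(O / E)), with zero cells contributing nothing.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    if n == 0:
        raise ValueError("table has no observations")
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    g2 = 0.0
    for obs_row, exp_row in zip(table, expected):
        for o, e in zip(obs_row, exp_row):
            if o > 0:
                g2 += 2.0 * o * math.log(o / e)
    return expected, g2
```

A large G² relative to its degrees of freedom, (rows - 1) × (columns - 1), indicates that the independence model fits poorly and that an interaction term is needed, which is the kind of comparison used to choose among richer log-linear models.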
