Binary classification with covariate selection through ℓ0-penalised empirical risk minimisation

Econometrics Journal ◽

10.1093/ectj/utaa017 ◽

2020 ◽

Cited By ~ 1

Author(s):

Le-Yu Chen ◽

Sokbae Lee

Keyword(s):

Binary Classification ◽

Real Data ◽

Rates Of Convergence ◽

Mixed Integer ◽

Covariate Selection ◽

Empirical Risk ◽

Monte Carlo Experiments ◽

Data Application ◽

Work Trip ◽

Transportation Mode Choice

Summary We consider the problem of binary classification with covariate selection. We construct a classification procedure by minimising the empirical misclassification risk with a penalty on the number of selected covariates. This optimisation problem is equivalent to obtaining an ℓ0-penalised maximum score estimator. We derive probability bounds on the estimated sparsity as well as on the excess misclassification risk. These theoretical results are nonasymptotic and established in a high-dimensional setting. In particular, we show that our method yields a sparse solution whose ℓ0-norm can be arbitrarily close to true sparsity with high probability and obtain the rates of convergence for the excess misclassification risk. We implement the proposed procedure via the method of mixed-integer linear programming. Its numerical performance is illustrated in Monte Carlo experiments and a real data application of the work-trip transportation mode choice.

Download Full-text

Method of determination of the text direction on the image with the use of convolutional neural network

Informatization and communication ◽

10.34219/2078-8320-2020-11-2-96-99 ◽

2020 ◽

pp. 96-99

Author(s):

P.L. Nikolaev

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Deep Neural Network ◽

Binary Classification ◽

Synthetic Data ◽

Real Data ◽

Method Of Determination ◽

Classification Of Images

This article deals with method of binary classification of images with small text on them Classification is based on the fact that the text can have 2 directions – it can be positioned horizontally and read from left to right or it can be turned 180 degrees so the image must be rotated to read the sign. This type of text can be found on the covers of a variety of books, so in case of recognizing the covers, it is necessary first to determine the direction of the text before we will directly recognize it. The article suggests the development of a deep neural network for determination of the text position in the context of book covers recognizing. The results of training and testing of a convolutional neural network on synthetic data as well as the examples of the network functioning on the real data are presented.

Download Full-text

Goodness-of-Fit Tests for Bivariate Time Series of Counts

Econometrics ◽

10.3390/econometrics9010010 ◽

2021 ◽

Vol 9 (1) ◽

pp. 10

Author(s):

Šárka Hudecová ◽

Marie Hušková ◽

Simos G. Meintanis

Keyword(s):

Goodness Of Fit ◽

Probability Generating Function ◽

Parametric Bootstrap ◽

Real Data ◽

Data Sets ◽

Test Statistics ◽

Finite Sample ◽

Generalized Poisson ◽

Goodness Of Fit Tests ◽

Monte Carlo Experiments

This article considers goodness-of-fit tests for bivariate INAR and bivariate Poisson autoregression models. The test statistics are based on an L2-type distance between two estimators of the probability generating function of the observations: one being entirely nonparametric and the second one being semiparametric computed under the corresponding null hypothesis. The asymptotic distribution of the proposed tests statistics both under the null hypotheses as well as under alternatives is derived and consistency is proved. The case of testing bivariate generalized Poisson autoregression and extension of the methods to dimension higher than two are also discussed. The finite-sample performance of a parametric bootstrap version of the tests is illustrated via a series of Monte Carlo experiments. The article concludes with applications on real data sets and discussion.

Download Full-text

Robust Filtering Techniques for RTK Positioning in Harsh Propagation Environments

Sensors ◽

10.3390/s21041250 ◽

2021 ◽

Vol 21 (4) ◽

pp. 1250

Author(s):

Daniel Medina ◽

Haoqing Li ◽

Jordi Vilà-Valls ◽

Pau Closas

Keyword(s):

Intelligent Transportation Systems ◽

Robust Statistics ◽

Real Data ◽

Transportation Systems ◽

Global Navigation Satellite Systems ◽

Mixed Integer ◽

Robust Filtering ◽

Precise Positioning ◽

The Impact ◽

Filtering Techniques

Global navigation satellite systems (GNSSs) play a key role in intelligent transportation systems such as autonomous driving or unmanned systems navigation. In such applications, it is fundamental to ensure a reliable precise positioning solution able to operate in harsh propagation conditions such as urban environments and under multipath and other disturbances. Exploiting carrier phase observations allows for precise positioning solutions at the complexity cost of resolving integer phase ambiguities, a procedure that is particularly affected by non-nominal conditions. This limits the applicability of conventional filtering techniques in challenging scenarios, and new robust solutions must be accounted for. This contribution deals with real-time kinematic (RTK) positioning and the design of robust filtering solutions for the associated mixed integer- and real-valued estimation problem. Families of Kalman filter (KF) approaches based on robust statistics and variational inference are explored, such as the generalized M-based KF or the variational-based KF, aiming to mitigate the impact of outliers or non-nominal measurement behaviors. The performance assessment under harsh propagation conditions is realized using a simulated scenario and real data from a measurement campaign. The proposed robust filtering solutions are shown to offer excellent resilience against outlying observations, with the variational-based KF showcasing the overall best performance in terms of Gaussian efficiency and robustness.

Download Full-text

Multivariate linear time series models

Advances in Applied Probability ◽

10.2307/1427286 ◽

1984 ◽

Vol 16 (3) ◽

pp. 492-561 ◽

Cited By ~ 107

Author(s):

E. J. Hannan ◽

L. Kavalieris

Keyword(s):

Limit Theorems ◽

Analysis Of Algorithms ◽

Linear Time ◽

Asymptotic Properties ◽

Real Data ◽

Rates Of Convergence ◽

Structure Theory ◽

Time Series Models ◽

Transfer Model ◽

Rational Transfer

This paper is in three parts. The first deals with the algebraic and topological structure of spaces of rational transfer function linear systems—ARMAX systems, as they have been called. This structure theory is dominated by the concept of a space of systems of order, or McMillan degree, n, because of the fact that this space, M(n), can be realised as a kind of high-dimensional algebraic surface of dimension n(2s + m) where s and m are the numbers of outputs and inputs. In principle, therefore, the fitting of a rational transfer model to data can be considered as the problem of determining n and then the appropriate element of M(n). However, the fact that M(n) appears to need a large number of coordinate neighbourhoods to cover it complicates the task. The problems associated with this program, as well as theory necessary for the analysis of algorithms to carry out aspects of the program, are also discussed in this first part of the paper, Sections 1 and 2.The second part, Sections 3 and 4, deals with algorithms to carry out the fitting of a model and exhibits these algorithms through simulations and the analysis of real data.The third part of the paper discusses the asymptotic properties of the algorithm. These properties depend on uniform rates of convergence being established for covariances up to some lag increasing indefinitely with the length of record, T. The necessary limit theorems and the analysis of the algorithms are given in Section 5. Many of these results are of interest independent of the algorithms being studied.

Download Full-text

Slash Truncation Positive Normal Distribution and Its Estimation Based on the EM Algorithm

Symmetry ◽

10.3390/sym13112164 ◽

2021 ◽

Vol 13 (11) ◽

pp. 2164

Author(s):

Héctor J. Gómez ◽

Diego I. Gallardo ◽

Karol I. Santoro

Keyword(s):

Expectation Maximization Algorithm ◽

Likelihood Estimation ◽

Real Data ◽

Simulation Studies ◽

R Software ◽

Data Application ◽

The Em Algorithm ◽

Kurtosis Parameter ◽

Computational Implementation ◽

Moments Method

In this paper, we present an extension of the truncated positive normal (TPN) distribution to model positive data with a high kurtosis. The new model is defined as the quotient between two random variables: the TPN distribution (numerator) and the power of a standard uniform distribution (denominator). The resulting model has greater kurtosis than the TPN distribution. We studied some properties of the distribution, such as moments, asymmetry, and kurtosis. Parameter estimation is based on the moments method, and maximum likelihood estimation uses the expectation-maximization algorithm. We performed some simulation studies to assess the recovery parameters and illustrate the model with a real data application related to body weight. The computational implementation of this work was included in the tpn package of the R software.

Download Full-text

Highway Passenger Transport Based Express Parcel Service Network Design: Model and Algorithm

Journal of Advanced Transportation ◽

10.1155/2017/1712325 ◽

2017 ◽

Vol 2017 ◽

pp. 1-17 ◽

Cited By ~ 3

Author(s):

Yuan Jiang ◽

Baofeng Sun ◽

Gendao Li ◽

Zhibin Lin ◽

Changxu Zheng ◽

...

Keyword(s):

Network Design ◽

Network Flow ◽

Programming Model ◽

Real Data ◽

Path Selection ◽

Network Design Problem ◽

Mixed Integer ◽

Passenger Transport ◽

Service Network ◽

Time Space

Highway passenger transport based express parcel service (HPTB-EPS) is an emerging business that uses unutilised room of coach trunk to ship parcels between major cities. While it is reaping more and more express market, the managers are facing difficult decisions to design the service network. This paper investigates the HPTB-EPS network design problem and analyses the time-space characteristics of such network. A mixed-integer programming model is formulated integrating the service decision, frequency, and network flow distribution. To solve the model, a decomposition-based heuristic algorithm is designed by decomposing the problem as three steps: construction of service network, service path selection, and distribution of network flow. Numerical experiment using real data from our partner company demonstrates the effectiveness of our model and algorithm. We found that our solution could reduce the total cost by up to 16.3% compared to the carrier’s solution. The sensitivity analysis demonstrates the robustness and flexibility of the solutions of the model.

Download Full-text

3D Diffraction Imaging and Diffraction Attributes Based on Asymmetric Summation: Real Data Application

10.3997/2214-4609.202010612 ◽

2020 ◽

Author(s):

M. Protasov ◽

V. Tcheverda ◽

V. Shilikov ◽

A. Ledyaev

Keyword(s):

Real Data ◽

Data Application ◽

Diffraction Imaging

Download Full-text

Causal variance decompositions for institutional comparisons in healthcare

Statistical Methods in Medical Research ◽

10.1177/0962280219880571 ◽

2019 ◽

Vol 29 (7) ◽

pp. 1972-1986

Author(s):

Bo Chen ◽

Keith A Lawson ◽

Antonio Finelli ◽

Olli Saarela

Keyword(s):

Random Effect ◽

Real Data ◽

Hospital Performance ◽

Case Mix ◽

Link Functions ◽

Variation In Care ◽

Residual Variation ◽

Data Application ◽

Variance Decompositions ◽

Random Effect Models

There is increasing interest in comparing institutions delivering healthcare in terms of disease-specific quality indicators (QIs) that capture processes or outcomes showing variations in the care provided. Such comparisons can be framed in terms of causal models, where adjusting for patient case-mix is analogous to controlling for confounding, and exposure is being treated in a given hospital, for instance. Our goal here is to help identify good QIs rather than comparing hospitals in terms of an already chosen QI, and so we focus on the presence and magnitude of overall variation in care between the hospitals rather than the pairwise differences between any two hospitals. We consider how the observed variation in care received at patient level can be decomposed into that causally explained by the hospital performance adjusting for the case-mix, the case-mix itself, and residual variation. For this purpose, we derive a three-way variance decomposition, with particular attention to its causal interpretation in terms of potential outcome variables. We propose model-based estimators for the decomposition, accommodating different link functions and either fixed or random effect models. We evaluate their performance in a simulation study and demonstrate their use in a real data application.

Download Full-text

On the Location of Fog Nodes in Fog-Cloud Infrastructures

Sensors ◽

10.3390/s19112445 ◽

2019 ◽

Vol 19 (11) ◽

pp. 2445 ◽

Cited By ~ 8

Author(s):

Rodrigo A. C. da Silva ◽

Nelson L. S. da Fonseca

Keyword(s):

Fog Computing ◽

Real Data ◽

Mixed Integer ◽

End User ◽

Computing Paradigm ◽

Linear Programming Formulation ◽

Cloud Infrastructures ◽

New Applications ◽

Integer Linear Programming Formulation

In the fog computing paradigm, fog nodes are placed on the network edge to meet end-user demands with low latency, providing the possibility of new applications. Although the role of the cloud remains unchanged, a new network infrastructure for fog nodes must be created. The design of such an infrastructure must consider user mobility, which causes variations in workload demand over time in different regions. Properly deciding on the location of fog nodes is important to reduce the costs associated with their deployment and maintenance. To meet these demands, this paper discusses the problem of locating fog nodes and proposes a solution which considers time-varying demands, with two classes of workload in terms of latency. The solution was modeled as a mixed-integer linear programming formulation with multiple criteria. An evaluation with real data showed that an improvement in end-user service can be obtained in conjunction with the minimization of the costs by deploying fewer servers in the infrastructure. Furthermore, results show that costs can be further reduced if a limited blocking of requests is tolerated.

Download Full-text

Branch and Price for Chance-Constrained Bin Packing

INFORMS Journal on Computing ◽

10.1287/ijoc.2019.0894 ◽

2020 ◽

Vol 32 (3) ◽

pp. 547-564

Author(s):

Zheng Zhang ◽

Brian T. Denton ◽

Xiaolan Xie

Keyword(s):

Bin Packing ◽

Real Data ◽

Integer Program ◽

Chance Constraints ◽

Branch And Cut ◽

Mixed Integer ◽

Closed Form Expression ◽

Mixed Integer Program ◽

Branch And Price ◽

Item Size

This article describes two versions of the chance-constrained stochastic bin-packing (CCSBP) problem that consider item-to-bin allocation decisions in the context of chance constraints on the total item size within the bins. The first version is a stochastic CCSBP (SP-CCSBP) problem, which assumes that the distributions of item sizes are known. We present a two-stage stochastic mixed-integer program (SMIP) for this problem and a Dantzig–Wolfe formulation suited to a branch-and-price (B&P) algorithm. We further enhance the formulation using coefficient strengthening and reformulations based on probabilistic packs and covers. The second version is a distributionally robust CCSBP (DR-CCSBP) problem, which assumes that the distributions of item sizes are ambiguous. Based on a closed-form expression for the DR chance constraints, we approximate the DR-CCSBP problem as a mixed-integer program that has significantly fewer integer variables than the SMIP of the SP-CCSBP problem, and our proposed B&P algorithm can directly solve its Dantzig–Wolfe formulation. We also show that the approach for the DR-CCSBP problem, in addition to providing robust solutions, can obtain near-optimal solutions to the SP-CCSBP problem. We implement a series of numerical experiments based on real data in the context of surgery scheduling, and the results demonstrate that our proposed B&P algorithm is computationally more efficient than a standard branch-and-cut algorithm, and it significantly improves upon the performance of a well-known bin-packing heuristic.

Download Full-text