RiskLogitboost Regression for Rare Events in Binary Response: An Econometric Approach

Jessica Pesantez-Narvaez; Montserrat Guillen; Manuela Alcañiz

doi:10.3390/math9050579

Mathematics, 2021, Vol 9 (5), pp. 579

Keyword(s): Prediction Error; Rare Events; Learning Algorithm; Rare Event; Generalized Least Squares; Third Party; Binary Response; Data Set; Rare Class; Econometric Approach

A boosting-based machine learning algorithm is presented to model a binary response with large imbalance, i.e., a rare event. The new method (i) reduces the prediction error of the rare class, and (ii) approximates an econometric model that allows interpretability. RiskLogitboost regression includes a weighting mechanism that oversamples or undersamples observations according to their misclassification likelihood and a generalized least squares bias correction strategy to reduce the prediction error. An illustration using a real French third-party liability motor insurance data set is presented. The results show that RiskLogitboost regression improves the rate of detection of rare events compared to some boosting-based and tree-based algorithms and some existing methods designed to treat imbalanced responses.
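The abstract describes RiskLogitboost as a boosting method that approximates an interpretable econometric model. As a rough illustration of that idea only, the Python sketch below implements standard two-class LogitBoost (Friedman, Hastie and Tibshirani, 2000) with a weighted-least-squares linear base learner. With a linear base learner each boosting round is a Newton (IRLS) step, so the accumulated coefficients approach the logistic regression estimates and stay interpretable. It does not reproduce RiskLogitboost's misclassification-based oversampling/undersampling or its generalized least squares bias correction, which the abstract does not specify; the function name `logitboost_linear` and the simulated data are assumptions.

```python
import numpy as np


def logitboost_linear(X, y, n_rounds=25, eps=1e-10):
    """Two-class LogitBoost with a weighted-least-squares linear base learner.

    X: (n, p) design matrix that already contains an intercept column.
    y: (n,) array of 0/1 responses.
    Returns log-odds coefficients and fitted probabilities.
    """
    n, p = X.shape
    beta_half = np.zeros(p)   # coefficients of F, the half log-odds
    prob = np.full(n, 0.5)    # p(x) = 1 / (1 + exp(-2F)) with F = 0
    for _ in range(n_rounds):
        w = np.clip(prob * (1.0 - prob), eps, None)  # working weights
        z = (y - prob) / w                            # working response
        # Base learner: weighted least squares fit of z on X.
        XtW = X.T * w
        b = np.linalg.solve(XtW @ X, XtW @ z)
        beta_half += 0.5 * b  # LogitBoost update F <- F + f / 2
        prob = 1.0 / (1.0 + np.exp(-2.0 * (X @ beta_half)))
    return 2.0 * beta_half, prob


# Usage on simulated data; the low intercept makes positives uncommon.
rng = np.random.default_rng(0)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
true_beta = np.array([-3.0, 1.0, -0.5])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ true_beta))).astype(float)
beta, prob = logitboost_linear(X, y)
```

After enough rounds the logistic-regression score equations X'(y - p) = 0 hold up to rounding, so `beta` can be read like ordinary logit coefficients (for example, `np.exp(beta[1:])` as odds ratios). A risk-based reweighting scheme would act on the weights `w`, which this sketch leaves at their standard values.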

Visualization of Predictive Modeling for Big Data Using Various Approaches When There Are Rare Events at Differing Levels

Advances in Data Mining and Database Management - Handbook of Research on Big Data Storage and Visualization Techniques, 2018, pp. 604-631. doi:10.4018/978-1-5225-3142-5.ch021

Cited By ~ 1

Author(s): Alan Olinsky; John Thomas Quinn; Phyllis A. Schumacher

Keyword(s): Predictive Modeling; Rare Events; Rare Event; Sampling Technique; Large Data; Occurrence Rate; Data Sets; Target Variable; Data Set; Home Mortgage

Many techniques exist for predictive modeling of a binary target variable in large data sets. When the target variable represents a rare event, occurring in approximately 10% or less of the data set, traditional modeling techniques may fail to identify the rare events. In this chapter, several methods, including oversampling of rare events, undersampling of common events and the Synthetic Minority Over-Sampling Technique (SMOTE), are used to improve the prediction of rare events. Decision tree, logistic regression and rule induction models are then fitted to the resampled data with SAS Enterprise Miner (EM). Using a data set of home mortgage applications in which the rare event occurs at a rate of 0.8%, misclassification percentages for the target variable are obtained by running a multiple comparison node. The proportion of rare events is varied from 0.8% up to 50%, and the results are compared to determine which predictive method works best.
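A minimal sketch of two of the resampling schemes named above, in Python with NumPy: random undersampling of common events to reach a target rare-event proportion (the chapter varies this proportion from 0.8% to 50%), and the basic SMOTE interpolation step. The function names, the Euclidean distance, k = 5 and the simulated data are assumptions for illustration; the chapter itself works in SAS Enterprise Miner, not with this code.

```python
import numpy as np


def undersample_to_rate(X, y, target_rate, rng):
    """Keep every rare case (y == 1) plus a random subset of common cases so
    that rare cases make up about `target_rate` of the returned sample."""
    pos = np.flatnonzero(y == 1)
    neg = np.flatnonzero(y == 0)
    n_neg = int(round(len(pos) * (1 - target_rate) / target_rate))
    n_neg = min(n_neg, len(neg))  # cannot keep more common cases than exist
    keep = np.concatenate([pos, rng.choice(neg, size=n_neg, replace=False)])
    rng.shuffle(keep)
    return X[keep], y[keep]


def smote(X_min, n_new, k, rng):
    """Basic SMOTE: each synthetic case lies on the segment between a random
    minority case and one of its k nearest minority neighbours."""
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)          # a case is not its own neighbour
    nn = np.argsort(d, axis=1)[:, :k]    # indices of the k nearest neighbours
    base = rng.integers(0, len(X_min), size=n_new)
    nbr = nn[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))         # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[nbr] - X_min[base])


# Usage: 10,000 simulated cases with a 0.8% rare-event rate.
rng = np.random.default_rng(1)
n = 10_000
X = rng.normal(size=(n, 3))
y = np.zeros(n, dtype=int)
y[:80] = 1
X_u, y_u = undersample_to_rate(X, y, 0.5, rng)    # 50% rare events
X_syn = smote(X[y == 1], n_new=500, k=5, rng=rng)
```

Undersampling discards information about common cases, while SMOTE adds plausible but synthetic rare cases; comparing models across several target proportions, as the chapter does, shows which trade-off suits a given data set.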

Framework for rare event detection using Artificial Neural Network based context free grammar

Journal of Intelligent & Fuzzy Systems, 2020, Vol 39 (6), pp. 8463-8475. doi:10.3233/jifs-189164

Author(s): Palanivel Srinivasan; Manivannan Doraipandian

Keyword(s): Neural Network; Artificial Neural Network; Event Detection; Performance Metrics; Rare Events; Rare Event; Video Stream; Context Free Grammar; Artificial Neural; Context Free

Rare event detection can be performed with spatial-domain and frequency-domain procedures. The volume of footage from omnipresent surveillance cameras is growing exponentially over time, and monitoring all events manually is an inefficient and time-consuming process. An automated rare event detection mechanism is therefore required to make this process manageable. In this work, a Context-Free Grammar (CFG) is developed for detecting rare events in a video stream, and an Artificial Neural Network (ANN) is trained on it. A set of dedicated algorithms splits the stream into frames, performs edge detection and background subtraction, and converts the processed data into a CFG. The CFG is then converted into nodes and edges to form a graph, which is fed to the input layer of the ANN to classify normal and rare event classes; graphs derived in this way from the input video stream are used to train the ANN. The performance of the resulting Artificial Neural Network Based Context-Free Grammar – Rare Event Detection (ACFG-RED) model is compared with other existing techniques using accuracy, precision, sensitivity, recall, average processing time and average processing power as performance metrics. The ANN-CFG model achieved better values on these metrics than the other techniques, and it offers a better solution for detecting rare events in video streams.
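The count-based metrics listed above all come from the confusion matrix; for a binary classifier, sensitivity and recall are the same quantity, TP / (TP + FN). The hypothetical helper below (not taken from the paper) computes them with the rare event as the positive class, and illustrates why accuracy alone is misleading for rare events: a detector that never flags anything still scores 95% accuracy when 5% of frames are rare events.

```python
def binary_metrics(y_true, y_pred):
    """Confusion-matrix metrics, treating 1 (rare event) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    recall = tp / (tp + fn) if tp + fn else 0.0      # identical to sensitivity
    precision = tp / (tp + fp) if tp + fp else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "sensitivity": recall,
    }


# 100 frames, 5 of them rare events.
y_true = [1] * 5 + [0] * 95
never_flags = binary_metrics(y_true, [0] * 100)
# Flags 4 of the 5 rare events plus 10 false alarms.
detector = binary_metrics(y_true, [1, 1, 1, 1, 0] + [1] * 10 + [0] * 85)
```

Here the detector has lower accuracy (0.89) than the do-nothing baseline (0.95) yet recovers 80% of the rare events, which is why recall and precision are the more informative figures for this task.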

How to Assess Prognostic Models for Survival Data: A Case Study in Oncology

Methods of Information in Medicine, 2003, Vol 42 (05), pp. 564-571. doi:10.1055/s-0038-1634384

Cited By ~ 23

Author(s): M. Schumacher; E. Graf; T. Gerds

Keyword(s): Test Data; Survival Data; Prediction Error; Classification Scheme; Neural Nets; Brier Score; Data Set; Independent Test; Artificial Neural

Objectives: The lack of generally applicable tools for assessing predictions for survival data has to be recognized. Prediction error curves based on the Brier score, which have been suggested as a sensible approach, are illustrated by means of a case study. Methods: The concept of predictions made in terms of conditional survival probabilities given the patient's covariates is introduced. Such predictions are derived from various statistical models for survival data, including artificial neural networks. How the prediction error of a prognostic classification scheme can be followed over time is illustrated with data from two studies on the prognosis of node-positive breast cancer patients, one of which serves as an independent test data set. Results and Conclusions: The Brier score as a function of time is shown to be a valuable tool for assessing the predictive performance of prognostic classification schemes for survival data with censored observations. Comparison with the prediction based on the pooled Kaplan-Meier estimator yields a benchmark value for any classification scheme that incorporates patients' covariate measurements. The overoptimistic assessment of prediction error caused by data-driven modelling, as done for example with artificial neural nets, can be circumvented by assessing performance in an independent test data set.
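As a hedged sketch, assuming the usual inverse-probability-of-censoring-weighted (IPCW) form of the Brier score at a fixed time t* (as in Graf et al., 1999): subjects with an event by t* contribute S(t*|x)^2 / G(T_i-), subjects still at risk contribute (1 - S(t*|x))^2 / G(t*), and subjects censored before t* contribute zero, where G is the Kaplan-Meier estimate of the censoring distribution. The pooled Kaplan-Meier benchmark is the same score with S(t*|x) replaced by the covariate-free Kaplan-Meier estimate. The function names and the simulated exponential data are illustrative assumptions.

```python
import numpy as np


def km_survival(times, events):
    """Kaplan-Meier estimate. Returns the distinct event times and the
    estimated survival probability just after each of them."""
    jump_times = np.unique(times[events == 1])
    surv, s = [], 1.0
    for u in jump_times:
        at_risk = np.sum(times >= u)
        deaths = np.sum((times == u) & (events == 1))
        s *= 1.0 - deaths / at_risk
        surv.append(s)
    return jump_times, np.array(surv)


def step_eval(jump_times, surv, x, left=False):
    """Evaluate the KM step function at x (value 1 before the first jump).
    With left=True, return the left limit S(x-)."""
    x = np.asarray(x, dtype=float)
    if len(jump_times) == 0:
        return np.ones_like(x)
    idx = np.searchsorted(jump_times, x, side="left" if left else "right")
    return np.where(idx == 0, 1.0, surv[np.maximum(idx - 1, 0)])


def brier_ipcw(t_star, times, events, surv_pred):
    """Censoring-weighted Brier score at time t_star. surv_pred[i] is the
    predicted probability that subject i is still event-free at t_star."""
    g_times, g_surv = km_survival(times, 1 - events)   # censoring distribution G
    died = (times <= t_star) & (events == 1)
    alive = times > t_star
    g_ti = step_eval(g_times, g_surv, times, left=True)  # G(T_i-)
    g_t = step_eval(g_times, g_surv, [t_star])[0]       # G(t*)
    contrib = np.zeros(len(times))  # censored before t* contribute 0
    contrib[died] = surv_pred[died] ** 2 / g_ti[died]
    contrib[alive] = (1.0 - surv_pred[alive]) ** 2 / g_t
    return contrib.mean()


# Usage: simulated event and censoring times, pooled KM benchmark at t* = 5.
rng = np.random.default_rng(2)
n = 500
T = rng.exponential(10.0, n)   # latent event times
C = rng.exponential(20.0, n)   # latent censoring times
times = np.minimum(T, C)
events = (T <= C).astype(int)
jt, s = km_survival(times, events)
km_at_5 = step_eval(jt, s, [5.0])[0]   # covariate-free prediction
bs_benchmark = brier_ipcw(5.0, times, events, np.full(n, km_at_5))
```

Replacing the constant `km_at_5` with covariate-based predictions gives a model's score, and its difference from `bs_benchmark` indicates what the covariates add; evaluating it over a grid of t* values traces a prediction error curve.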

Information-Theoretic Generalization Bounds for Meta-Learning and Applications

Entropy, 2021, Vol 23 (1), pp. 126. doi:10.3390/e23010126

Author(s): Sharu Theresa Jose; Osvaldo Simeone

Keyword(s): Learning Algorithm; Broad Class; Performance Measure; Training Data; Learning To Learn; Data Set; Information Theoretic; Meta Learning; Task Training; Test Sets

Meta-learning, or “learning to learn”, refers to techniques that infer an inductive bias from data corresponding to multiple related tasks with the goal of improving the sample efficiency for new, previously unobserved, tasks. A key performance measure for meta-learning is the meta-generalization gap, that is, the difference between the average loss measured on the meta-training data and on a new, randomly selected task. This paper presents novel information-theoretic upper bounds on the meta-generalization gap. Two broad classes of meta-learning algorithms are considered: those that use separate within-task training and test sets, like model agnostic meta-learning (MAML), and those that use joint within-task training and test sets, like Reptile. Extending the existing work for conventional learning, an upper bound on the meta-generalization gap is derived for the former class that depends on the mutual information (MI) between the output of the meta-learning algorithm and its input meta-training data. For the latter, the derived bound includes an additional MI between the output of the per-task learning procedure and the corresponding data set to capture within-task uncertainty. Tighter bounds are then developed for the two classes via novel individual task MI (ITMI) bounds. Applications of the derived bounds are finally discussed, including a broad class of noisy iterative algorithms for meta-learning.
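For orientation, the conventional-learning results that the abstract says are being extended can be stated as follows, assuming the loss ℓ(w, Z) is σ-sub-Gaussian under Z ~ μ for every w. The first is the mutual-information bound of Xu and Raginsky; the second is the individual-sample refinement, which the paper's ITMI bounds appear to parallel at the task level. The meta-learning bounds in the paper have a similar structure, with the meta-learner's output and the meta-training data in place of W and S, but their exact form and constants are not reproduced here.

```latex
% S = (Z_1, \dots, Z_n) drawn i.i.d. from \mu; W = output of the learning algorithm.
L_\mu(w) = \mathbb{E}_{Z \sim \mu}\,[\ell(w, Z)], \qquad
L_S(w) = \frac{1}{n} \sum_{i=1}^{n} \ell(w, Z_i)

% Mutual-information bound on the expected generalization gap
\left| \mathbb{E}\big[ L_\mu(W) - L_S(W) \big] \right|
  \le \sqrt{\frac{2\sigma^2}{n}\, I(W; S)}

% Individual-sample refinement
\left| \mathbb{E}\big[ L_\mu(W) - L_S(W) \big] \right|
  \le \frac{1}{n} \sum_{i=1}^{n} \sqrt{2\sigma^2\, I(W; Z_i)}
```

The individual-sample form can be much tighter, since I(W; Z_i) may stay finite even when I(W; S) is unbounded, for example for deterministic algorithms on continuous data.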

Interspecies Chromosome Mapping in Caprimulgiformes, Piciformes, Suliformes, and Trogoniformes (Aves): Cytogenomic Insight into Microchromosome Organization and Karyotype Evolution in Birds

Cells, 2021, Vol 10 (4), pp. 826. doi:10.3390/cells10040826

Author(s): Rafael Kretschmer; Marcelo Santos de Souza; Ivanete de Oliveira Furo; Michael N. Romanov; Ricardo José Gunski; ...

Keyword(s): Convergent Evolution; Karyotype Evolution; Rare Events; Rare Event; Chromosome Mapping; The Other; Other Hand; Interchromosomal Rearrangement; Insight Into; Phalacrocorax Brasilianus

Interchromosomal rearrangements involving microchromosomes are rare events in birds. To date, they have been found mostly in Psittaciformes, Falconiformes, and Cuculiformes, although only a few orders have been analyzed. Hence, cytogenomic studies focusing on microchromosomes in species belonging to different bird orders are essential to shed more light on avian chromosome and karyotype evolution. We therefore performed comparative chromosome mapping of chicken microchromosomes 10 to 28 using interspecies BAC-based fluorescence in situ hybridization (FISH) in five species, representing four Neoaves orders (Caprimulgiformes, Piciformes, Suliformes, and Trogoniformes). Our results suggest that the ancestral microchromosomal syntenies are conserved in Pteroglossus inscriptus (Piciformes), Ramphastos tucanus tucanus (Piciformes), and Trogon surrucura surrucura (Trogoniformes). On the other hand, chromosome reorganization in Phalacrocorax brasilianus (Suliformes) and Hydropsalis torquata (Caprimulgiformes) included fusions involving both macro- and microchromosomes. Fissions in macrochromosomes were observed in P. brasilianus and H. torquata. Relevant hypothetical Neognathae and Neoaves ancestral karyotypes were reconstructed to trace these rearrangements. We found no interchromosomal rearrangement involving microchromosomes that is shared between the avian orders in which rearrangements were detected. Our findings suggest that convergent evolution involving microchromosomal change is a rare event in birds, and that these rearrangements may be informative for cytotaxonomic inference in the orders where they occurred.

Event detection of different English data sources based on transfer learning

Journal of Intelligent & Fuzzy Systems, 2021, pp. 1-11. doi:10.3233/jifs-189798

Author(s): Yanan Huang; Yuji Miao; Zhenjing Da

Keyword(s): Transfer Learning; Event Detection; Visual Analysis; Learning Algorithm; Data Sources; Data Set; Data Source; Single Data Source; The Difference; Single Data

Methods for multi-modal English event detection from a single data source, and for isomorphic event detection across different English data sources based on transfer learning, still need to be improved. To improve the efficiency of English event detection across data sources, this paper builds on a transfer learning algorithm to propose multi-modal event detection for a single data source and isomorphic event detection for different data sources. By stacking multiple classification models, the paper fuses the features with one another and performs adversarial training based on the discrepancy between two classifiers, further aligning the distributions of data from different sources. To verify the proposed algorithm, a multi-source English event detection data set is collected. Finally, the method is evaluated on this data set and compared with the current mainstream transfer learning methods. Experimental, convergence, visual and parameter analyses demonstrate the effectiveness of the proposed algorithm.
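The abstract's adversarial training "through the difference between the two classifiers" reads like a classifier-discrepancy objective, as in maximum classifier discrepancy (MCD) domain adaptation; this interpretation is an assumption, since the abstract does not name the method. In MCD, two classifiers on a shared feature extractor are trained to maximize their disagreement on target-domain data, and the feature extractor is then trained to minimize it, which pulls the source and target feature distributions together. The NumPy sketch below shows only the discrepancy measure (mean absolute difference of class probabilities), not the training loop; the function names are hypothetical.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def classifier_discrepancy(logits_a, logits_b):
    """Mean absolute difference between the class probabilities of two
    classifiers, averaged over classes and samples (MCD-style discrepancy)."""
    return float(np.mean(np.abs(softmax(logits_a) - softmax(logits_b))))


# Identical outputs give zero discrepancy.
same = np.array([[2.0, -1.0], [0.5, 0.5]])
d_same = classifier_discrepancy(same, same)
# [0.5, 0.5] versus [0.75, 0.25]: each class differs by 0.25.
d_diff = classifier_discrepancy(np.array([[0.0, 0.0]]),
                                np.array([[np.log(3.0), 0.0]]))
```

In a full training loop this quantity would be maximized with respect to the two classifiers and minimized with respect to the shared feature extractor, alternating between the two steps.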

The Covid-19 lockdown in the United Kingdom and subjective well-being: Have the self-employed suffered more due to hours and income reductions?

International Small Business Journal Researching Entrepreneurship, 2021, pp. 026624262098676. doi:10.1177/0266242620986763

Cited By ~ 1

Author(s): Wei Yue; Marc Cowling

Keyword(s): United Kingdom; Rare Events; Well Being; The Self; Subjective Well Being; Data Set; The United Kingdom; Hours Of Work; Period Data; The Uk

It is well documented that the self-employed experience higher levels of happiness than waged employees, even when their incomes are lower. Given the UK government's asymmetric treatment of waged workers and the self-employed, we use a unique Covid-19 period data set, covering the months leading up to the March lockdown and the months just after it, to assess three aspects of the Covid-19 crisis for the self-employed: reductions in hours of work, the associated reductions in income, and the effects of both on subjective well-being. Our findings show that the large and disproportionate reductions in hours and income for the self-employed directly contributed to a deterioration in their subjective well-being relative to waged workers. It appears that their resilience broke when they faced the reality of dealing with rare events, particularly as the UK welfare support response was asymmetric and favoured waged employees.

Research on the third party logistics mode of cross border e-commerce based on machine learning algorithm

2021. doi:10.1145/3482632.3482716

Author(s): Xiaojiao Zeng; Wei Wang

Keyword(s): Machine Learning; Learning Algorithm; Third Party; Machine Learning Algorithm; Third Party Logistics; The Third Party Logistics; The Third; Cross Border; Logistics Mode; The Third Party

Real-time gaze estimation via pupil center tracking

Paladyn Journal of Behavioral Robotics, 2018, Vol 9 (1), pp. 6-18. doi:10.1515/pjbr-2018-0002

Cited By ~ 2

Author(s): Dario Cazzato; Fabio Dominio; Roberto Manduchi; Silvia M. Castro

Keyword(s): Real Time; Learning Algorithm; Natural Environments; Gaze Estimation; Head Pose; Data Set; Gaze Tracking; Illumination Changes; Wide Range; Estimation System

Automatic gaze estimation that does not rely on expensive commercial eye-tracking hardware can enable several applications in human-computer interaction (HCI) and human behavior analysis. It is therefore not surprising that several related techniques and methods have been investigated in recent years. However, very few camera-based systems proposed in the literature are both real-time and robust. In this work, we propose a real-time gaze estimation system that needs no person-dependent calibration, can deal with illumination changes and head pose variations, and works over a wide range of distances from the camera. Our solution is based on a 3-D appearance-based method that processes the images from a built-in laptop camera. Real-time performance is obtained by combining head pose information with geometrical eye features to train a machine learning algorithm. Our method has been validated on a data set of images of users in natural environments and shows promising results. The possibility of a real-time implementation, combined with the good quality of gaze tracking, makes this system suitable for various HCI applications.

ASSESSING OPENSTREETMAP URBAN NETWORK OF ORAN CITY

ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 2019, Vol XLII-3/W8, pp. 249-252. doi:10.5194/isprs-archives-xlii-3-w8-249-2019

Author(s): B. Meguenni; M. A. Hafid

Keyword(s): Road Network; Third Party; Spatial Accuracy; Data Set; Urban Network; Urban Networks; Spatial Quality; Gis Environment; The City

OpenStreetMap (OSM) is a collaborative project, released under the Open Database License, that collects a rich set of vector data provided by volunteers. It is a global collection of mapping data that can be used for a wide variety of purposes, and many third-party online maps are based on it. More and more large organizations are currently choosing OSM for their maps. However, analyses of the spatial quality of OSM data show that particular care must be taken when using it. Several methods exist for assessing the quality of OSM data by comparing it to an authoritative data set, and in this context it is essential to develop an automatic procedure to improve its spatial quality. This work proposes a quantitative method for comparing the quality of OSM data and an authoritative data set on urban networks in the city of Oran (Algeria). The procedure is based on Python modules in a GIS environment and provides measurements of the spatial accuracy and completeness of the OSM road network. The method is applied to assess the quality of the Oran OSM road network data set through a comparison with the official Algerian data set. The results show that the Algerian OSM road network is very complete but has low spatial accuracy.
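The abstract does not give its exact measures, so the pure-Python sketch below assumes two common ones: completeness as the ratio of total OSM road length to total reference road length, and positional accuracy as the mean distance from OSM vertices to the nearest reference segment. Coordinates are assumed to be in a projected system in metres, and all function names are hypothetical rather than the paper's Python modules.

```python
import math


def polyline_length(line):
    """Length of a polyline given as a list of (x, y) points."""
    return sum(math.dist(a, b) for a, b in zip(line, line[1:]))


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg2 = dx * dx + dy * dy
    if seg2 == 0:
        return math.dist(p, a)
    # Projection parameter of p onto the segment, clamped to [0, 1].
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg2))
    return math.dist(p, (ax + t * dx, ay + t * dy))


def distance_to_network(p, network):
    """Distance from p to the nearest segment of any polyline in network."""
    return min(point_segment_distance(p, a, b)
               for line in network for a, b in zip(line, line[1:]))


def compare_networks(osm, reference):
    """Length-ratio completeness and mean vertex offset of osm vs. reference."""
    completeness = (sum(map(polyline_length, osm))
                    / sum(map(polyline_length, reference)))
    offsets = [distance_to_network(v, reference) for line in osm for v in line]
    return completeness, sum(offsets) / len(offsets)


# Toy example in metres: the reference has two 100 m roads, and OSM maps
# one of them with a 2 m lateral shift.
reference = [[(0, 0), (100, 0)], [(100, 0), (100, 100)]]
osm = [[(0, 2), (100, 2)]]
completeness, mean_offset = compare_networks(osm, reference)
```

Here completeness is 0.5 and the mean vertex offset is 1 m. A real assessment would usually also match OSM roads to their reference counterparts before measuring offsets, since the nearest reference segment is not always the same road; this sketch skips that step.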
