Cost-Sensitive Variable Selection for Multi-Class Imbalanced Datasets Using Bayesian Networks

Mathematics ◽  
2021 ◽  
Vol 9 (2) ◽  
pp. 156
Author(s):  
Darío Ramos-López ◽  
Ana D. Maldonado

Multi-class classification on imbalanced datasets is a challenging problem. In these cases, common validation metrics (such as accuracy or recall) are often unsuitable. In many such problems, frequently real-world problems related to health, some classification errors may be tolerated whereas others must be avoided completely. Therefore, a cost-sensitive variable selection procedure for building a Bayesian network classifier is proposed. It employs a flexible validation metric (a cost/loss function) encoding the impact of the different classification errors, so that the model is learned to optimize the a priori specified cost function. The proposed approach was applied to forecasting an air quality index using current levels of air pollutants and climatic variables from a highly imbalanced dataset. For this problem, the method yielded better results in the less frequent class states than models selected with standard validation metrics. The possibility of fine-tuning the objective validation function can improve prediction quality on imbalanced data or when asymmetric misclassification costs have to be considered.
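The core idea of scoring a classifier with an asymmetric cost function rather than accuracy can be sketched in a few lines. This is a generic illustration, not the paper's Bayesian network procedure; the air-quality class labels and cost values below are invented for the example.

```python
def total_cost(y_true, y_pred, cost):
    """Sum the cost of each (true, predicted) pair.

    cost[i][j] is the penalty for predicting class j when the true
    class is i; the diagonal (correct predictions) costs nothing.
    """
    return sum(cost[t][p] for t, p in zip(y_true, y_pred))

# Example: 3 air-quality states (0=good, 1=moderate, 2=hazardous).
# Missing a hazardous episode (row 2) is penalized far more heavily
# than a false alarm, encoding the asymmetric costs described above.
COST = [
    [0, 1, 1],
    [1, 0, 1],
    [10, 5, 0],
]

y_true = [0, 1, 2, 2, 0]
y_pred = [0, 1, 1, 2, 1]  # one hazardous state missed, one false alarm

print(total_cost(y_true, y_pred, COST))  # → 6
```

A model-selection loop would then keep the variable subset whose classifier minimizes this total cost, rather than the one maximizing accuracy.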

Author(s):  
Moritz Lipperheide ◽  
Thomas Bexten ◽  
Manfred Wirsum ◽  
Martin Gassner ◽  
Stefano Bernero

Reliable engine and emission models allow for online monitoring of commercial gas turbine operation and help the plant operator and the original equipment manufacturer (OEM) ensure emission compliance of the aging engine. However, model development and validation require fine-tuning on the particular engines, which may differ within a fleet of a single design type by production, assembly, and aging status. For this purpose, Artificial Neural Networks (ANNs) offer a good and fast alternative to traditional physically based engine modeling, because model creation and adaptation is largely an automated process in commercially available software environments. However, ANN performance depends strongly on the availability of suitable data and on a priori data processing. The present work investigates the impact of specific engine information from the OEM’s design tools on ANN performance. As an alternative to a strictly data-based benchmark approach, engine characteristics were incorporated into ANNs by pre-processing the raw measurements with a simplified engine model. The resulting ‘virtual’ measurements, i.e. hot gas temperatures, then served as inputs to ANN training and application during long-term gas turbine operation. When processed input parameters were used for the ANNs, overall long-term NOx prediction improved by 55% and CO prediction by 16% in terms of RMSE, yielding overall RMSE values comparable to those of the physically based model.


2009 ◽  
Vol 17 (3) ◽  
pp. 275-306 ◽  
Author(s):  
Salvador García ◽  
Francisco Herrera

Learning with imbalanced data is one of the recent challenges in machine learning. Various solutions have been proposed to address this problem, such as modifying the learning methods or applying a preprocessing stage. Within preprocessing focused on balancing the data, two tendencies exist: reducing the set of examples (undersampling) or replicating minority class examples (oversampling). Undersampling with imbalanced datasets can be considered a prototype selection procedure whose purpose is to balance the dataset so as to achieve a high classification rate, avoiding the bias toward majority class examples. Evolutionary algorithms have been used for classical prototype selection with good results, where the fitness function is associated with the classification and reduction rates. In this paper, we propose a set of methods called evolutionary undersampling that take into consideration the nature of the problem and use different fitness functions to obtain a good trade-off between balance of the class distribution and performance. The study includes a taxonomy of the approaches and an overall comparison among our models and state-of-the-art undersampling methods. The results have been contrasted using nonparametric statistical procedures and show that evolutionary undersampling outperforms the nonevolutionary models as the degree of imbalance increases.


2012 ◽  
Vol 82 (3) ◽  
pp. 216-222 ◽  
Author(s):  
Venkatesh Iyengar ◽  
Ibrahim Elmadfa

The food safety security (FSS) concept is perceived as an early warning system for minimizing food safety (FS) breaches, and it functions in conjunction with existing FS measures. Essentially, the function of FS and FSS measures can be visualized in two parts: (i) the FS preventive measures as actions taken at the stem level, and (ii) the FSS interventions as actions taken at the root level, to enhance the impact of the implemented safety steps. In practice, along with FS, FSS also draws its support from (i) legislative directives and regulatory measures for enforcing verifiable, timely, and effective compliance; (ii) measurement systems in place for sustained quality assurance; and (iii) shared responsibility to ensure cohesion among all the stakeholders, namely, policy makers, regulators, food producers, processors and distributors, and consumers. However, the functional framework of FSS differs from that of FS by way of: (i) retooling the vulnerable segments of the preventive features of existing FS measures; (ii) fine-tuning response systems to efficiently preempt the FS breaches; (iii) building a long-term nutrient and toxicant surveillance network based on validated measurement systems functioning in real time; (iv) focusing on crisp, clear, and correct communication that resonates among all the stakeholders; and (v) developing inter-disciplinary human resources to meet ever-increasing FS challenges. Important determinants of FSS include: (i) strengthening international dialogue for refining regulatory reforms and addressing emerging risks; (ii) developing innovative and strategic action points for intervention [in addition to Hazard Analysis and Critical Control Points (HACCP) procedures]; and (iii) introducing additional science-based tools such as metrology-based measurement systems.


GIS Business ◽  
2019 ◽  
Vol 14 (4) ◽  
pp. 85-98
Author(s):  
Idoko Peter

This research examines the impact of a competitive quasi-market on service delivery at Benue State University, Makurdi, Nigeria. Both primary and secondary sources of data and information were used for the study, and a questionnaire was used to extract information from the purposively selected respondents. The population for this study comprises one hundred and seventy-three (173) administrative staff of Benue State University selected at random. The statistical tool employed was classical ordinary least squares (OLS), and the probability values of the estimates were used to test the hypotheses of the study. The results indicate that a positive relationship exists between competitive quasi-marketing (CQM) at Benue State University and transparency in service delivery (TRSP), and the relationship is statistically significant (p<0.05). CQM has a negative effect on observed competence (OBCP), and the relationship is not statistically significant (p>0.05). CQM has a positive effect on innovation (INVO), and the relationship is statistically significant (p<0.05) and in line with a priori expectations. This means that a unit increase in CQM results in a corresponding increase in innovation (INVO) of 22.5%. It was concluded that government monopoly in the provision of certain types of services has greatly affected the quality of service experienced at the institution. It was recommended, among other things, that the stakeholders in the market be transparent so that the system will be productive and serve society effectively.
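For readers unfamiliar with the OLS step described above, a minimal two-variable sketch follows. The variable names echo the abstract's CQM and INVO, but the data points are invented for illustration; the study's actual estimates came from its survey data.

```python
def ols(x, y):
    """Return (intercept, slope) minimizing the sum of squared errors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

cqm = [1.0, 2.0, 3.0, 4.0, 5.0]
invo = [1.2, 1.5, 1.6, 1.9, 2.1]  # hypothetical responses
a, b = ols(cqm, invo)
print(round(b, 3))  # slope ≈ 0.22, the estimated marginal effect of CQM on INVO
```

A full analysis would also compute standard errors and p-values for the estimates, as the study did, to decide statistical significance.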


2020 ◽  
Author(s):  
M Testori ◽  
M Kempf ◽  
RB Hoyle ◽  
Hedwig Eisenbarth

Personality traits have long been recognized to have a strong impact on human decision-making. In this study, a sample of 314 participants took part in an online game to investigate the impact of psychopathic traits on cooperative behavior in an iterated Prisoner's Dilemma game. We found that disinhibition decreased the maintenance of cooperation in successive plays, but had no effect on moving toward cooperation after a previous defection or on the overall level of cooperation over rounds. Furthermore, our results underline the crucial importance of a good model selection procedure, showing how a poor choice of statistical model can provide misleading results.


2017 ◽  
Vol 4 (suppl_1) ◽  
pp. S439-S439
Author(s):  
Eric Ellorin ◽  
Jill Blumenthal ◽  
Sonia Jain ◽  
Xiaoying Sun ◽  
Katya Corado ◽  
...  

Abstract Background “PrEP whore” has been used both as a pejorative by PrEP opponents in the gay community and, reactively, by PrEP advocates as a way to reclaim the label from stigmatization and “slut-shaming.” The actual prevalence and impact of such PrEP-directed stigma on adherence have been insufficiently studied. Methods CCTG 595 was a randomized controlled PrEP demonstration project in 398 HIV-uninfected MSM and transwomen. Intracellular tenofovir-diphosphate (TFV-DP) levels at weeks 12 and 48 were used as a continuous measure of adherence. At study visits, participants were asked to describe how they perceived others’ reactions to their being on PrEP. These perceptions were categorized a priori as “positively framed,” “negatively framed,” or both. We used the Wilcoxon rank-sum test to determine the association between positive and negative framing and TFV-DP levels at weeks 12 and 48. Results By week 4, 29% of participants reported perceiving positive reactions from members of their social groups, 5% negative, and 6% both. Reporting decreased over 48 weeks, but positive reactions were consistently reported more often than negative ones. At week 12, no differences in mean TFV-DP levels were observed between participants reporting positively framed reactions and those reporting none or only negatively framed ones (1338 [IQR, 1036–1609] vs. 1281 [946–1489] fmol/punch, P = 0.17). Likewise, no differences were observed between those with and without negative reactions (1209 [977–1427] vs. 1303 [964–1545], P = 0.58). At week 48, mean TFV-DP levels trended toward being higher among those reporting any reaction, whether positive (1335 [909–1665] vs. 1179 [841–1455], P = 0.09) or negative (1377 [1054–1603] vs. 1192 [838–1486], P = 0.10), than among those reporting no reaction. At week 48, 46% of participants reported experiencing some form of PrEP-directed judgment, 23% reported being called “PrEP whore,” and 21% avoided disclosing PrEP use.
Conclusion Over 48 weeks, nearly half of participants reported some form of judgment or stigmatization as a consequence of PrEP use. However, individuals more frequently perceived positively framed reactions to being on PrEP than negative. Importantly, long-term PrEP adherence does not appear to suffer as a result of negative PrEP framing. Disclosures All authors: No reported disclosures.
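The rank-sum comparison used in the methods can be illustrated with a small hand-rolled statistic. The TFV-DP values below are hypothetical, not the study's data; a real analysis would use an established implementation such as scipy.stats.ranksums, which also provides the p-value.

```python
def rank_sum(a, b):
    """Wilcoxon rank-sum statistic: sum of pooled ranks of sample `a`."""
    pooled = sorted(a + b)
    ranks = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        ranks[pooled[i]] = (i + 1 + j) / 2   # average rank for tied values
        i = j
    return sum(ranks[v] for v in a)

# Hypothetical TFV-DP levels (fmol/punch), not the study's measurements
pos = [1338, 1290, 1410]
neg = [1209, 1180, 1303]
print(rank_sum(pos, neg))  # → 14.0
```

The test then compares this statistic to its null distribution to decide whether the two groups' adherence levels differ.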


Biology ◽  
2021 ◽  
Vol 10 (6) ◽  
pp. 463
Author(s):  
Narjiss Sallahi ◽  
Heesoo Park ◽  
Fedwa El Mellouhi ◽  
Mustapha Rachdi ◽  
Idir Ouassou ◽  
...  

Epidemiological modeling supports the evaluation of various disease management activities. The value of epidemiological models lies in their ability to study various scenarios and to provide governments with a priori knowledge of the consequences of disease incursions and the impact of preventive strategies. A prevalent method of modeling the spread of pandemics is to categorize individuals in the population as belonging to one of several distinct compartments, which represent their health status with regard to the pandemic. In this work, a modified SIR epidemic model is proposed and analyzed; its parameters and initial values are identified from recorded case data from public health sources in order to estimate the number of unreported cases and the effectiveness of public health policies, such as social distancing, in slowing the spread of the epidemic. The analysis highlights the importance of unreported cases for correcting the otherwise underestimated basic reproduction number. In many epidemic outbreaks, the number of reported infections is likely much lower than the actual number of infections, which can be estimated from the model’s parameters derived from reported case data. The analysis is applied to the COVID-19 pandemic for several countries in the Gulf region and Europe.
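A baseline (unmodified) SIR model of the kind the paper extends can be sketched with simple forward-Euler integration. The parameter values and population size below are illustrative only, not the paper's estimates.

```python
def sir(beta, gamma, s0, i0, r0, days, dt=0.1):
    """Integrate dS=-beta*S*I/N, dI=beta*S*I/N-gamma*I, dR=gamma*I."""
    n = s0 + i0 + r0
    s, i, r = float(s0), float(i0), float(r0)
    for _ in range(int(days / dt)):
        new_inf = beta * s * i / n * dt   # new infections this step
        new_rec = gamma * i * dt          # new recoveries this step
        s -= new_inf
        i += new_inf - new_rec
        r += new_rec
    return s, i, r

# Illustrative run: R0 = beta/gamma = 3, population 10,000, 10 seed cases
s, i, r = sir(beta=0.3, gamma=0.1, s0=9990, i0=10, r0=0, days=120)
print(round(r))  # recovered count after 120 days
```

Fitting such a model to reported case counts, as the paper does with its modified variant, amounts to choosing beta, gamma, and the initial compartment sizes so the simulated trajectory matches the data.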


Author(s):  
Robert F Engle ◽  
Martin Klint Hansen ◽  
Ahmet K Karagozoglu ◽  
Asger Lunde

Abstract Motivated by the recent availability of extensive electronic news databases and the advent of new empirical methods, there has been renewed interest in investigating the impact of financial news on market outcomes for individual stocks. We develop the information processing hypothesis of return volatility to investigate the relation between firm-specific news and volatility. We propose a novel dynamic econometric specification and test it using time series regressions employing a machine-learning model selection procedure. Our empirical results are based on a comprehensive dataset comprising more than 3 million news items for a sample of 28 large U.S. companies. Our proposed econometric specification for firm-specific return volatility is a simple mixture model with two components: public information and private processing of public information. The public information processing component is defined by the contemporaneous relation between public information and volatility, while the private processing of public information component is specified as a general autoregressive process corresponding to the sequential price discovery mechanism of investors as additional information, previously not publicly available, is generated and incorporated into prices. Our results show that changes in return volatility are related to public information arrival and that including indicators of public information arrival explains on average 26% (9–65%) of changes in firm-specific return volatility.
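The two-component structure described above (a contemporaneous public-news term plus an autoregressive private-processing term) can be mimicked with a stylized simulation. Everything here, the coefficients, the noise level, and the printed R², is invented for illustration and is not the paper's specification or data.

```python
import random

def simulate_vol(n, a=0.4, phi=0.5, noise=0.05, seed=7):
    """vol_t = a*news_t + phi*vol_{t-1} + e_t  (stylized two-component form)."""
    rng = random.Random(seed)
    news, series, vol = [], [], 0.2
    for _ in range(n):
        x = rng.random()                 # public-information arrival proxy
        vol = a * x + phi * vol + noise * rng.gauss(0, 1)
        news.append(x)
        series.append(vol)
    return news, series

def corr(x, y):
    """Pearson correlation, stdlib-only."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sx = sum((u - mx) ** 2 for u in x) ** 0.5
    sy = sum((v - my) ** 2 for v in y) ** 0.5
    return sxy / (sx * sy)

news, vol = simulate_vol(500)
r2 = corr(news, vol) ** 2   # share of volatility variance co-moving with news
print(round(r2, 2))
```

In this toy setup the R² plays the role of the "share of volatility changes explained by public information arrival" reported in the abstract.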


Author(s):  
Perrine Hoet ◽  
Chantal Jacquerye ◽  
Gladys Deumer ◽  
Dominique Lison ◽  
Vincent Haufroid

Abstract Objectives Trace elements (TEs) from natural and anthropogenic sources are ubiquitous. Essential or not, their relevance for human health and disease is constantly expanding. Biological monitoring is a widely integrated tool in risk assessment in both occupational and environmental settings. However, the determination of appropriate and accurate reference values in the (specific) population is a prerequisite for a correct interpretation of biomonitoring data. This study aimed at determining the reference distribution for TEs (Al, As, Sb, Be, Bi, Cd, Co, Cu, Mn, Hg, Mo, Ni, Pb, Se, Tl, Sn, V, Zn) in the blood and/or plasma of the adult population in Belgium. Methods Blood and plasma samples from 178 males and 202 females, recruited according to an a priori selection procedure, were analyzed by inductively coupled plasma mass spectrometry (ICP-MS). Results Reference values were established with high confidence for AsT, Cd, Cu, HgT, Mn, Mo, Pb, Sn, Se, Tl and Zn. Compared to previously published data on the Belgian population, a decreasing time trend is observed for Zn, Cd and Pb. Globally, the results also indicate that current exposure levels to TEs in the Belgian population are similar to those from other recent national surveys. Conclusions These reference values and limits, obtained through validated analytical and statistical methods, will be useful for future occupational and/or environmental surveys. They will contribute to decision-making concerning both public health policies and exposure assessments on an individual scale.
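Reference values of this kind are conventionally reported as the central 95% of the observed distribution (2.5th to 97.5th percentile). A minimal sketch of that computation follows, with invented blood lead values rather than the study's measurements.

```python
def percentile(data, p):
    """Linearly interpolated p-th percentile of a list of values."""
    xs = sorted(data)
    k = (len(xs) - 1) * p / 100
    lo, hi = int(k), min(int(k) + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (k - lo)

def reference_interval(values):
    """Central 95% interval: (2.5th, 97.5th percentile)."""
    return percentile(values, 2.5), percentile(values, 97.5)

# Hypothetical blood Pb concentrations (µg/L) in 11 adults
pb = [8, 10, 11, 12, 13, 14, 15, 17, 19, 22, 30]
print(reference_interval(pb))  # → (8.5, 28.0)
```

A survey like the one above would compute such intervals per element, per matrix (blood or plasma), on several hundred subjects rather than this toy sample.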


Mathematics ◽  
2021 ◽  
Vol 9 (3) ◽  
pp. 222
Author(s):  
Juan C. Laria ◽  
M. Carmen Aguilera-Morillo ◽  
Enrique Álvarez ◽  
Rosa E. Lillo ◽  
Sara López-Taruella ◽  
...  

Over the last decade, regularized regression methods have offered alternatives for performing multi-marker analysis and feature selection in a whole-genome context. The process of defining a list of genes that will characterize an expression profile remains unclear. It currently relies upon advanced statistics and can adopt an agnostic point of view or include some a priori knowledge, but overfitting remains a problem. This paper introduces a methodology to deal with the variable selection and model estimation problems in the high-dimensional setting, which can be particularly useful in the whole-genome context. Results are validated using simulated data and a real dataset from a triple-negative breast cancer study.
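A lasso-type regularized regression, the canonical example of the methods discussed, can be sketched with plain coordinate descent. This is a generic illustration, not the paper's methodology; genome-scale problems would use optimized implementations such as glmnet or scikit-learn's Lasso.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t (the lasso proximal operator)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso(X, y, lam, iters=100):
    """Coordinate descent for (1/2n)*||y - Xb||^2 + lam*||b||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(iters):
        for j in range(p):
            # correlation of feature j with the partial residual (b_j excluded)
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * b[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            b[j] = soft_threshold(rho, lam) / col_sq[j]
    return b

# Toy data: y depends on feature 0 only; the L1 penalty zeroes out feature 1,
# which is the variable-selection behavior exploited in expression profiling.
X = [[1, 0.1], [2, -0.2], [3, 0.05], [4, 0.1], [5, -0.1]]
y = [2.0, 4.0, 6.0, 8.0, 10.0]
coef = lasso(X, y, lam=0.5)
print([round(c, 2) for c in coef])
```

Genes whose coefficients survive the shrinkage would form the candidate expression signature; cross-validation over `lam` guards against the overfitting the abstract mentions.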

