A Structural SVM Based Approach for Binary Classification under Class Imbalance

Mathematical Problems in Engineering ◽

10.1155/2015/269856 ◽

2015 ◽

Vol 2015 ◽

pp. 1-10 ◽

Cited By ~ 1

Author(s):

Fan Cheng ◽

Kang Yang ◽

Lei Zhang

Keyword(s):

Performance Measures ◽

Binary Classification ◽

Empirical Studies ◽

Class Imbalance ◽

Loss Functions ◽

Misclassification Error ◽

Structural SVM ◽

Learning Techniques ◽

Wide Range ◽

Machine Learning Applications

Class imbalance situations, where one class is rare compared to the other, arise frequently in machine learning applications. It is well known that the usual misclassification error is not suitable in such settings. A wide range of performance measures, such as AM and QM, have been proposed for this problem. However, due to computational difficulties, few learning techniques have been developed to directly optimize the AM or QM metric. To fill this gap, we present a general structural SVM framework for directly optimizing AM and QM. We define loss functions oriented to AM and QM, respectively, and adopt the cutting plane algorithm to solve the outer optimization. For the inner problem of finding the most violated constraint, we propose two efficient algorithms, one for the AM problem and one for the QM problem. Empirical studies on various imbalanced datasets justify the effectiveness of the proposed approach.
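To illustrate why plain misclassification error misleads under class imbalance, here is a minimal Python sketch. The abstract does not define AM, so the code assumes AM is the arithmetic mean of the true positive rate and true negative rate; the function names are hypothetical and not taken from the paper, and QM is left out because its exact form is not given here.

```python
def rates(y_true, y_pred):
    """Return (TPR, TNR) for binary labels, with 1 = rare positive class."""
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present in y_true")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp / pos, tn / neg


def accuracy(y_true, y_pred):
    """Fraction of correct predictions (1 - misclassification error)."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def am_measure(y_true, y_pred):
    """Assumed AM: arithmetic mean of TPR and TNR (an assumption, see lead-in)."""
    tpr, tnr = rates(y_true, y_pred)
    return (tpr + tnr) / 2


# 5 positives among 100 examples; a classifier that always predicts "negative".
y_true = [1] * 5 + [0] * 95
y_pred = [0] * 100
print(accuracy(y_true, y_pred), am_measure(y_true, y_pred))
```

Under this assumed definition, the trivial classifier scores 0.95 on accuracy but only 0.5 on AM, which is the kind of gap that motivates optimizing such measures directly.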


Machine Learning Techniques for Code Smells Detection: A Systematic Mapping Study

International Journal of Software Engineering and Knowledge Engineering ◽

10.1142/s021819401950013x ◽

2019 ◽

Vol 29 (02) ◽

pp. 285-316 ◽

Cited By ~ 7

Author(s):

Frederico Luiz Caram ◽

Bruno Rafael De Oliveira Rodrigues ◽

Amadeu Silveira Campanelli ◽

Fernando Silva Parreiras

Keyword(s):

Machine Learning ◽

Empirical Studies ◽

Machine Learning Techniques ◽

Support Vector ◽

Systematic Mapping Study ◽

Code Smells ◽

Mapping Study ◽

Learning Techniques ◽

Wide Range ◽

High Level

Code smells, or bad smells, are an accepted approach to identifying design flaws in source code. Although they have been explored by researchers, their interpretation by programmers is rather subjective. One way to deal with this subjectivity is to use machine learning techniques. This paper provides the reader with an overview of machine learning techniques and code smells found in the literature, aiming to determine which methods and practices are used when applying machine learning to code smell identification and which machine learning techniques have been used for it. A mapping study was used to identify the techniques used for each smell. We found that Bloaters were the main kind of smell studied, addressed by 35% of the papers. The most commonly used technique was Genetic Algorithms (GA), used by 22.22% of the papers. Regarding the smells addressed by each technique, there was a high level of redundancy, in that the smells are covered by a wide range of algorithms. Nevertheless, Feature Envy stood out, being targeted by 63% of the techniques. When it comes to performance, the best average was provided by Decision Tree, followed by Random Forest, Semi-supervised and Support Vector Machine Classifier techniques. Five of the 25 analyzed smells were not handled by any machine learning technique. Most of them focus on several code smells and in general there is no outperforming technique, except for a few specific smells. We also found a lack of comparable results due to the heterogeneity of the data sources and of the provided results. We recommend the pursuit of further empirical studies to assess the performance of these techniques on a standardized dataset to improve the reliability and replicability of comparisons.


VALIDATION ASSESSMENTS ON RESAMPLING METHOD IN IMBALANCED BINARY CLASSIFICATION FOR LINEAR DISCRIMINANT ANALYSIS

Journal of Information and Communication Technology ◽

10.32890/jict.20.1.2021.6358 ◽

2020 ◽

Vol 20 ◽

Author(s):

Ahmad Hakiim Jamaluddin ◽

Nor Idayu Mahat

Keyword(s):

Discriminant Analysis ◽

Linear Discriminant Analysis ◽

Performance Measures ◽

Cross Validation ◽

Binary Classification ◽

Class Imbalance ◽

True Negative ◽

Linear Discriminant ◽

Resampling Method ◽

Fold Cross Validation

The curse of class imbalance affects the performance of many conventional classification algorithms, including linear discriminant analysis (LDA). Data pre-processing through resampling methods such as random oversampling (ROS) and random undersampling (RUS) is one of the treatments used to alleviate this curse. Previous studies have attempted to address the effect of a resampling method on the performance of LDA. However, some studies contradicted each other because they relied on different performance measures and validation strategies. This manuscript attempts to shed more light on the effect of a resampling method (ROS or RUS) on the performance of LDA, based on true positive rate and true negative rate, through five validation strategies, i.e. leave-one-out cross-validation, k-fold cross-validation, repeated k-fold cross-validation, naive bootstrap, and .632+ bootstrap. One hundred simulated two-group bivariate normally distributed data sets and four real data sets with severe class imbalance ratios were utilised. The analysis of the location and dispersion statistics of the performance measures shed further light on: (i) the effect of a resampling method on the performance of LDA, and (ii) the enhancement in the learning fairness of LDA on objects regardless of sample size, hence reducing the effect of the curse of class imbalance.
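The two resampling methods named in this abstract can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' procedure: the function names, the fixed seed, and the choice to balance classes to exactly equal sizes are assumptions made for the example.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """ROS: duplicate randomly chosen items (with replacement) from each
    smaller class until every class matches the largest class size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out = list(zip(samples, labels))
    for cls, n in counts.items():
        pool = [(s, l) for s, l in zip(samples, labels) if l == cls]
        out.extend(rng.choices(pool, k=target - n))
    return out


def random_undersample(samples, labels, seed=0):
    """RUS: keep a random subset (without replacement) of each class,
    sized to match the smallest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out = []
    for cls in counts:
        pool = [(s, l) for s, l in zip(samples, labels) if l == cls]
        out.extend(rng.sample(pool, target))
    return out


# Toy data: 2 minority items (label 1) and 8 majority items (label 0).
samples = list(range(10))
labels = [1] * 2 + [0] * 8
print(Counter(l for _, l in random_oversample(samples, labels)))
print(Counter(l for _, l in random_undersample(samples, labels)))
```

In a validation study, resampling of this kind would normally be applied to the training portion of each split only, so that the held-out objects keep the original class ratio.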


A Generalization Performance Study Using Deep Learning Networks in Embedded Systems

Sensors ◽

10.3390/s21041031 ◽

2021 ◽

Vol 21 (4) ◽

pp. 1031

Author(s):

Joseba Gorospe ◽

Rubén Mulero ◽

Olatz Arbelaitz ◽

Javier Muguerza ◽

Miguel Ángel Antón

Keyword(s):

Deep Learning ◽

Embedded Systems ◽

Embedded System ◽

General Purpose ◽

Learning Networks ◽

Performance Study ◽

Learning Techniques ◽

Wide Range ◽

Learning Architectures

Deep learning techniques are being increasingly used in the scientific community as a consequence of the high computational capacity of current systems and the increase in the amount of data available as a result of the digitalisation of society in general and the industrial world in particular. In addition, the emergence of the field of edge computing, which focuses on integrating artificial intelligence as close as possible to the client, makes it possible to implement systems that act in real time without the need to transfer all of the data to centralised servers. The combination of these two concepts can lead to systems with the capacity to make correct decisions and act on them immediately and in situ. However, the low capacity of embedded systems greatly hinders this integration, so the ability to integrate deep learning into a wide range of micro-controllers can be a great advantage. This paper contributes an environment based on Mbed OS and TensorFlow Lite that can be embedded in any general purpose embedded system, allowing the introduction of deep learning architectures. The experiments herein show that the proposed system is competitive when compared to other commercial systems.


Prediction of Clinical Risk Factors of Diabetes Using Multiple Machine Learning Techniques Resolving Class Imbalance

2020 23rd International Conference on Computer and Information Technology (ICCIT) ◽

10.1109/iccit51783.2020.9392694 ◽

2020 ◽

Author(s):

Kazi Amit Hasan ◽

Md. Al Mehedi Hasan

Keyword(s):

Machine Learning ◽

Risk Factors ◽

Class Imbalance ◽

Clinical Risk Factors ◽

Machine Learning Techniques ◽

Clinical Risk ◽

Learning Techniques


A developmentally descriptive method for quantifying shape in gastropod shells

Journal of The Royal Society Interface ◽

10.1098/rsif.2019.0721 ◽

2020 ◽

Vol 17 (163) ◽

pp. 20190721

Author(s):

J. Larsson ◽

A. M. Westram ◽

S. Bengmark ◽

T. Lundh ◽

R. K. Butlin

Keyword(s):

Empirical Studies ◽

Three Dimensional ◽

Shell Shape ◽

Large Set ◽

Littorina Saxatilis ◽

Shape Variation ◽

Gastropod Shell ◽

Geometric Morphometric ◽

Wide Range ◽

Descriptive Method

The growth of snail shells can be described by simple mathematical rules. Variation in a few parameters can explain much of the diversity of shell shapes seen in nature. However, empirical studies of gastropod shell shape variation typically use geometric morphometric approaches, which do not capture this growth pattern. We have developed a way to infer a set of developmentally descriptive shape parameters based on three-dimensional logarithmic helicospiral growth and using landmarks from two-dimensional shell images as input. We demonstrate the utility of this approach, and compare it to the geometric morphometric approach, using a large set of Littorina saxatilis shells in which locally adapted populations differ in shape. Our method can be modified easily to make it applicable to a wide range of shell forms, which would allow for investigations of the similarities and differences between and within many different species of gastropods.


Personality trait differences across types of entrepreneurs: a systematic literature review

Review of Managerial Science ◽

10.1007/s11846-021-00466-9 ◽

2021 ◽

Author(s):

Florentine U. Salmony ◽

Dominik K. Kanbach

Keyword(s):

Literature Review ◽

Personality Traits ◽

Systematic Literature Review ◽

Empirical Studies ◽

Academic Research ◽

Future Research ◽

Rural Farmers ◽

Start Up ◽

Wide Range ◽

Research Integration

The personality traits that define entrepreneurs have been of significant interest to academic research for several decades. However, previous studies have used vastly different definitions of the term “entrepreneur”, meaning their subjects have ranged from rural farmers to tech-industry start-up founders. Consequently, most research has investigated disparate sub-types of entrepreneurs, which may not allow for inferences to be made regarding the general entrepreneurial population. Despite this, studies have frequently extrapolated results from narrow sub-types to entrepreneurs in general. This variation in entrepreneur samples reduces the comparability of empirical studies and calls into question the reviews that pool results without systematic differentiation between sub-types. The present study offers a novel account by differentiating between the definitions of “entrepreneur” used in studies on entrepreneurs’ personality traits. We conduct a systematic literature review across 95 studies from 1985 to 2020. We uncover three main themes across the previous studies. First, previous research applied a wide range of definitions of the term “entrepreneur”. Second, we identify several inconsistent findings across studies, which may at least partially be due to the use of heterogeneous entrepreneur samples. Third, the few studies that distinguished between various types of entrepreneurs revealed differences between them. Our systematic differentiation between entrepreneur sub-types and our research integration offer a novel perspective that has, to date, been widely neglected in academic research. Future research should use clearly defined entrepreneurial samples and conduct more systematic investigations into the differences between entrepreneur sub-types.


Quality loss functions and performance measures for a mixed bivariate response

Journal of Manufacturing Systems ◽

10.1016/s0278-6125(02)80095-4 ◽

2002 ◽

Vol 21 (6) ◽

pp. 476

Keyword(s):

Performance Measures ◽

Loss Functions ◽

Quality Loss ◽

And Performance


Development of a Hybrid Artificial Neural Network and Genetic Algorithm Model for Regime Identification of Slurry Transport in Pipelines

Chemical Product and Process Modeling ◽

10.2202/1934-2659.1343 ◽

2009 ◽

Vol 4 (1) ◽

Cited By ~ 2

Author(s):

Sandip K Lahiri ◽

Kartik Chandra Ghanta

Keyword(s):

Neural Network ◽

Genetic Algorithm ◽

Artificial Neural Network ◽

Operating Conditions ◽

Misclassification Error ◽

Wide Range ◽

Artificial Neural ◽

Artificial Neural Network (ANN) ◽

Pipeline Design ◽

Hybrid Artificial Neural Network

Four distinct regimes (namely sliding bed, saltation, heterogeneous suspension and homogeneous suspension) are found in slurry flow in pipelines, depending on the average flow velocity. In the literature, few correlations have been proposed for identifying these regimes in slurry pipelines. Regime identification is important for slurry pipeline design because it is a prerequisite for applying the appropriate pressure drop correlation in each regime. However, available correlations fail to predict the regime over a wide range of conditions. Based on a databank of around 800 measurements collected from the open literature, a method has been proposed to identify the regime using artificial neural network (ANN) modeling. The method incorporates a hybrid artificial neural network and genetic algorithm technique (ANN-GA) for efficient tuning of ANN meta-parameters. Statistical analysis showed that the proposed method has an average misclassification error of 0.03%. A comparison with selected correlations in the literature showed that the developed ANN-GA method noticeably improved prediction of the regime over a wide range of operating conditions, physical properties, and pipe diameters.


Deep Learning for Laryngopharyngeal Reflux Diagnosis

Applied Sciences ◽

10.3390/app11114753 ◽

2021 ◽

Vol 11 (11) ◽

pp. 4753

Author(s):

Gen Ye ◽

Chen Du ◽

Tong Lin ◽

Yan Yan ◽

Jack Jiang

Keyword(s):

Deep Learning ◽

Speech Processing ◽

Data Augmentation ◽

Laryngopharyngeal Reflux ◽

pH Monitoring ◽

Binary Classification ◽

Classification Problem ◽

Learning Approaches ◽

Learning Techniques ◽

AUC Value

(1) Background: Deep learning has become ubiquitous due to its impressive performance in domains as varied as computer vision, natural language and speech processing, and game-playing. In this work, we investigated the performance of recent deep learning approaches on the laryngopharyngeal reflux (LPR) diagnosis task. (2) Methods: Our dataset is composed of 114 subjects with 37 pH-positive cases and 77 control cases. In contrast to prior work based on either reflux finding score (RFS) or pH monitoring, we directly take laryngoscope images as inputs to neural networks, as laryngoscopy is the most common and simple diagnostic method. The diagnosis task is formulated as a binary classification problem. We first tested a powerful backbone network that incorporates residual modules, attention mechanism and data augmentation. Furthermore, recent methods in transfer learning and few-shot learning were investigated. (3) Results: On our dataset, the best test classification accuracy is 73.4%, while the best AUC value is 76.2%. (4) Conclusions: This study demonstrates that deep learning techniques can be applied to classify LPR images automatically. Although the number of pH-positive images used for training is limited, a deep network is still capable of learning discriminant features with the advantage of technique.


Assessment of needs for care among patients with schizophrenic disorders 15 and 17 years after first onset of psychosis

Epidemiologia e psichiatria sociale Monograph Supplement ◽

10.1017/s1827433100000800 ◽

1997 ◽

Vol 6 (S1) ◽

pp. 21-28 ◽

Cited By ~ 1

Author(s):

Durk Wiersma ◽

Fokko J. Nienhuis ◽

Cees J. Slooff ◽

Robert Giel ◽

Aant De Jong

Keyword(s):

Mental Disorders ◽

Empirical Studies ◽

Signs And Symptoms ◽

Expected Improvement ◽

Chronic Patients ◽

Cross Sectional ◽

Wide Range ◽

Social Disablement ◽

First Onset

Severe and long term mental disorders, like schizophrenia, show in general a wide range of psychiatric signs and symptoms, psychological and physiological impairments and social disablement (Shepherd, 1994; Wing, 1982) reflecting a variety of mental health needs. Many studies provide only a cross-sectional view of the clinical and social problems of the patient population, for example at intake or admission to a mental hospital. Longitudinal studies following patients after discharge for some period of months or years show in general the expected improvement of functioning (e.g. Nienhuis et al., 1994), but as far as only chronic patients are concerned such a positive change is much less noted. The concept of chronicity of mental disorders would presume that after some time needs are fairly predictable and stable and do not change much over time. Our investigation on the long-term course of schizophrenia (Wiersma et al., 1996; 1997) enables us to study over a period of two years, from 15 to 17 years since first onset of psychosis, the stability or variability of needs in schizophrenic disorder. We are not aware of empirical studies on changes in needs among patients with long-term disorders.
