Capped ℓ1-norm regularized least squares classification with label noise

2021 ◽  
pp. 1-13
Author(s):  
Zhi Yang ◽  
Haitao Gan ◽  
Xuan Li ◽  
Cong Wu

Since label noise can hurt the performance of supervised learning (SL), training a good classifier in the presence of label noise is an emerging and meaningful topic in the machine learning field. Although many related methods have been proposed and achieve promising performance, they have the following drawbacks: (1) removing mislabeled instances wastes data and can even degrade performance; and (2) the negative effect of extremely mislabeled instances cannot be completely eliminated. To address these problems, we propose a novel method based on the capped ℓ1 norm and a graph-based regularizer to deal with label noise. In the proposed algorithm, we use the capped ℓ1 norm instead of the ℓ1 norm. This norm inherits the advantage of the ℓ1 norm, which is robust to label noise to some extent. Moreover, the capped ℓ1 norm can adaptively find extremely mislabeled instances and eliminate their negative influence. Additionally, the proposed algorithm makes full use of the mislabeled instances under the graph-based framework, avoiding waste of collected instance information. The solution of our algorithm is obtained through an iterative optimization approach. We report experimental results on several UCI datasets that include both binary and multi-class problems. The results verify the effectiveness of the proposed algorithm in comparison with existing state-of-the-art classification methods.
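The capped ℓ1 mechanism described above can be sketched as an iteratively reweighted least squares (IRLS) loop in which any sample whose residual exceeds the cap receives zero weight, and therefore no influence on the fit. The function name, the ridge warm start, and all parameter values below are illustrative assumptions, not the authors' implementation (in particular, the graph-based regularizer is omitted):

```python
import numpy as np

def capped_l1_classifier(X, y, cap=1.0, lam=0.1, n_iter=20):
    """Sketch: minimize sum_i min(|x_i . w - y_i|, cap) + lam * ||w||^2
    via IRLS.  Samples whose residual exceeds `cap` (i.e., extremely
    mislabeled instances) get weight 0 and are effectively ignored."""
    n, d = X.shape
    # Warm start with a plain ridge solution so residuals are nonzero.
    w = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
    for _ in range(n_iter):
        r = np.abs(X @ w - y)
        # Inside the cap: the usual l1 IRLS weight 1/(2|r|); outside: 0.
        s = np.where(r < cap, 1.0 / (2.0 * np.maximum(r, 1e-8)), 0.0)
        Xs = X * s[:, None]                       # row-weighted design
        w = np.linalg.solve(X.T @ Xs + lam * np.eye(d), Xs.T @ y)
    return w
```

Predictions are then taken as the sign of `X @ w` for binary ±1 labels.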

Algorithms ◽  
2019 ◽  
Vol 12 (5) ◽  
pp. 99 ◽  
Author(s):  
Kleopatra Pirpinia ◽  
Peter A. N. Bosman ◽  
Jan-Jakob Sonke ◽  
Marcel van Herk ◽  
Tanja Alderliesten

Current state-of-the-art medical deformable image registration (DIR) methods optimize a weighted sum of key objectives of interest. Having a pre-determined weight combination that leads to high-quality results for any instance of a specific DIR problem (i.e., a class solution) would facilitate clinical application of DIR. However, such a combination can vary widely for each instance and is currently often determined manually. A multi-objective optimization approach for DIR removes the need for manual tuning, providing a set of high-quality trade-off solutions. Here, we investigate machine learning of a multi-objective class solution, i.e., not a single weight combination but a set thereof, that, when used on any instance of a specific DIR problem, approximates such a set of trade-off solutions. To this end, we employed a multi-objective evolutionary algorithm to learn sets of weight combinations for three breast DIR problems of increasing difficulty: 10 prone-prone cases, 4 prone-supine cases with limited deformations, and 6 prone-supine cases with larger deformations and image artefacts. Clinically acceptable results were obtained for the first two problems. Therefore, for DIR problems with limited deformations, a multi-objective class solution can be machine learned and used to straightforwardly compute multiple high-quality DIR outcomes, potentially leading to more efficient use of DIR in clinical practice.


2018 ◽  
Vol 35 (14) ◽  
pp. 2458-2465 ◽  
Author(s):  
Johanna Schwarz ◽  
Dominik Heider

Abstract Motivation Clinical decision support systems have been applied in numerous fields, ranging from cancer survival toward drug resistance prediction. Nevertheless, clinical decision support systems typically have a caveat: many of them are perceived as black boxes by non-experts and, unfortunately, the obtained scores cannot usually be interpreted as class probability estimates. In probability-focused medical applications, it is not sufficient to perform well with regard to discrimination and, consequently, various calibration methods have been developed to enable probabilistic interpretation. The aims of this study were (i) to develop a tool for fast and comparative analysis of different calibration methods, (ii) to demonstrate their limitations for use on clinical data and (iii) to introduce our novel method GUESS. Results We compared the performances of two different state-of-the-art calibration methods, namely histogram binning and Bayesian Binning in Quantiles, as well as our novel method GUESS on both simulated and real-world datasets. GUESS demonstrated calibration performance comparable to the state-of-the-art methods and always retained accurate class discrimination. GUESS showed superior calibration performance in small datasets and therefore may be an optimal calibration method for typical clinical datasets. Moreover, we provide a framework (CalibratR) for R, which can be used to identify the most suitable calibration method for novel datasets in a timely and efficient manner. Using calibrated probability estimates instead of original classifier scores will contribute to the acceptance and dissemination of machine learning based classification models in cost-sensitive applications, such as clinical research. Availability and implementation GUESS as part of CalibratR can be downloaded at CRAN.
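Histogram binning, one of the baselines compared above, is simple enough to sketch: the score range is cut into equal-width bins, and each bin's calibrated probability is the empirical positive rate of the training scores that fall into it. This is a generic illustration of the baseline, not CalibratR's or GUESS's code; the function name, bin count, and empty-bin fallback are arbitrary choices:

```python
import numpy as np

def histogram_binning(scores, labels, n_bins=10):
    """Fit histogram binning: each equal-width bin over [0, 1] maps to
    the empirical positive rate of the training scores it contains."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.clip(np.digitize(scores, edges) - 1, 0, n_bins - 1)
    bin_prob = np.array([
        # Empty bins fall back to the bin midpoint (arbitrary choice).
        labels[idx == b].mean() if np.any(idx == b)
        else (edges[b] + edges[b + 1]) / 2.0
        for b in range(n_bins)
    ])

    def calibrate(new_scores):
        j = np.clip(np.digitize(new_scores, edges) - 1, 0, n_bins - 1)
        return bin_prob[j]

    return calibrate
```

The returned closure replaces raw classifier scores with per-bin probability estimates at prediction time.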


Author(s):  
Alex Sumarsono ◽  
Farnaz Ganjeizadeh ◽  
Ryan Tomasi

Hyperspectral imagery (HSI) contains hundreds of narrow contiguous bands of spectral signals. These signals, which form spectral signatures, provide a wealth of information that can be used to characterize material substances. In recent years, machine learning has been used extensively to classify HSI data. While many excellent HSI classifiers have been proposed and deployed, the focus has been more on the design of the algorithms. This paper presents a novel data preprocessing method (LRSP) to improve classification accuracy by applying stochastic perturbations to the low-rank constituent of the dataset. The proposed architecture is composed of a low-rank and sparse decomposition, a degradation function and a constrained least squares filter. Experimental results confirm that popular state-of-the-art HSI classifiers can produce better classification results when supplied with LRSP-altered datasets rather than the original HSI datasets.
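The abstract does not give LRSP's equations, but its low-rank-plus-sparse idea can be illustrated with a truncated-SVD split and a small stochastic perturbation of the low-rank part. The degradation function and constrained least squares filter of the actual method are omitted here, and every name and parameter value is a placeholder, not the authors' pipeline:

```python
import numpy as np

def lrsp_style_preprocess(X, rank=5, noise_scale=0.01, seed=0):
    """Hypothetical sketch: split X (pixels x bands) into a low-rank
    part L (truncated SVD) and a residual S, perturb L stochastically,
    and recombine.  Classifiers would then train on the output."""
    rng = np.random.default_rng(seed)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    L = (U[:, :rank] * s[:rank]) @ Vt[:rank]        # low-rank constituent
    S = X - L                                        # sparse/residual part
    # Stochastic perturbation, scaled to the low-rank part's spread.
    L_pert = L + noise_scale * L.std() * rng.standard_normal(L.shape)
    return L_pert + S
```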


2021 ◽  
Vol 11 (6) ◽  
pp. 7824-7835
Author(s):  
H. Alalawi ◽  
M. Alsuwat ◽  
H. Alhakami

The importance of classification algorithms has increased in recent years. Classification is a branch of supervised learning with the goal of predicting class labels categorical of new cases. Additionally, with Coronavirus (COVID-19) propagation since 2019, the world still faces a great challenge in defeating COVID-19 even with modern methods and technologies. This paper gives an overview of classification algorithms to provide the readers with an understanding of the concept of the state-of-the-art classification algorithms and their applications used in the COVID-19 diagnosis and detection. It also describes some of the research published on classification algorithms, the existing gaps in the research, and future research directions. This article encourages both academics and machine learning learners to further strengthen the basis of classification methods.


Text classification and clustering approaches are essential in big data environments. Many classification algorithms have been proposed for supervised learning applications, and in the era of big data, a large volume of training data is available for many machine learning tasks. However, some of these data may be mislabeled or not labeled properly. Incorrect labels introduce label noise, which in turn degrades the learning performance of a classifier. A general way to address label noise is to apply noise filtering techniques that identify and remove noise before learning, and a range of such filtering approaches have been developed to improve classifier performance. This paper proposes a noise filtering approach for text data during the training phase. Many supervised learning algorithms produce high error rates due to noise in the training dataset; our work eliminates such noise and provides an accurate classification system.
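As a concrete (hypothetical) instance of such pre-training noise filtering, an edited-nearest-neighbour filter drops any sample whose label disagrees with the majority of its k nearest neighbours; in a text setting, X would hold TF-IDF or embedding vectors. This sketch illustrates the general filter-then-learn idea, not this paper's specific method:

```python
import numpy as np

def knn_noise_filter(X, y, k=5):
    """Edited nearest-neighbour filter: drop a sample when the
    majority of its k nearest neighbours carries a different label.
    O(n^2) distance computation; fine for a sketch."""
    n = len(X)
    keep = np.ones(n, dtype=bool)
    for i in range(n):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                       # exclude the sample itself
        nn = np.argsort(d)[:k]
        if np.mean(y[nn] == y[i]) < 0.5:    # neighbours outvote the label
            keep[i] = False
    return X[keep], y[keep]
```

The cleaned `(X, y)` pair is then handed to any downstream supervised learner.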


Informatics ◽  
2021 ◽  
Vol 8 (3) ◽  
pp. 59
Author(s):  
Alexander Chowdhury ◽  
Jacob Rosenthal ◽  
Jonathan Waring ◽  
Renato Umeton

Machine learning has become an increasingly ubiquitous technology, as big data continues to inform and influence everyday life and decision-making. Currently, in medicine and healthcare, as well as in most other industries, the two most prevalent machine learning paradigms are supervised learning and transfer learning. Both practices rely on large-scale, manually annotated datasets to train increasingly complex models. However, the requirement that data be manually labeled leaves an excess of unused, unlabeled data in both public and private data repositories. Self-supervised learning (SSL) is a growing area of machine learning that can take advantage of unlabeled data. Contrary to other machine learning paradigms, SSL algorithms create artificial supervisory signals from unlabeled data and pretrain models on these signals. The aim of this review is two-fold: first, we provide a formal definition of SSL, divide SSL algorithms into their four unique subsets, and review the state of the art published in each of those subsets between 2014 and 2020. Second, this work surveys recent SSL algorithms published in healthcare, in order to provide medical experts with a clearer picture of how they can integrate SSL into their research, with the objective of leveraging unlabeled data.


2018 ◽  
Author(s):  
Rikifumi Ota ◽  
Takahiro Ide ◽  
Tatsuo Michiue

Abstract Cell segmentation is crucial in the study of morphogenesis in developing embryos, but it is limited in its accuracy. In this study, we provide a novel method for cell segmentation using machine learning, termed Cell Segmenter using Machine Learning (CSML). CSML performed better than state-of-the-art methods, such as RACE and watershed, in the segmentation of ectodermal cells in the Xenopus embryo. CSML required only one whole-embryo image for training a Fully Convolutional Network classifier, and it took 20 seconds per image to return a segmented image. To validate its accuracy, we compared it to other methods in assessing several indicators of cell shape. We also examined its generality by measuring its performance in segmenting independent images. Our data demonstrate the superiority of CSML, and we expect this application to significantly improve efficiency in cell shape studies.


2021 ◽  
Vol 18 (6) ◽  
pp. 7727-7742
Author(s):  
Haitao Gan ◽  
Zhi Yang ◽  
Ji Wang ◽  
Bing Li ◽  
...  

In the past few years, Safe Semi-Supervised Learning (S3L) has received considerable attention in the machine learning field. Researchers have proposed many S3L methods for the safe exploitation of risky unlabeled samples, which can otherwise cause performance degradation in Semi-Supervised Learning (SSL). Nevertheless, some shortcomings remain: (1) the risk degrees of unlabeled samples are defined in advance by analyzing prediction differences between Supervised Learning (SL) and SSL; (2) the negative impact of labeled samples on learning performance is not investigated. Therefore, it is essential to design a novel method that adaptively estimates the importance and risk of both unlabeled and labeled samples. For this purpose, we present an ℓ1-norm based S3L method that can simultaneously achieve safe exploitation of the labeled and unlabeled samples. To solve the proposed optimization problem, we utilize an effective iterative approach. In each iteration, one can adaptively estimate the weights of both labeled and unlabeled samples; the weights reflect the importance or risk of these samples. Hence, the negative effects of the labeled and unlabeled samples are expected to be reduced. Experimental results on different datasets verify that the proposed S3L method obtains performance comparable to existing SL, SSL and S3L methods and achieves the expected goal.


2021 ◽  
Vol 11 (19) ◽  
pp. 9023
Author(s):  
Najam-ur Rehman ◽  
Muhammad Sultan Zia ◽  
Talha Meraj ◽  
Hafiz Tayyab Rauf ◽  
Robertas Damaševičius ◽  
...  

Chest diseases can be dangerous and deadly. They include many chest infections such as pneumonia, asthma, edema, and, lately, COVID-19. COVID-19 shares many symptoms with pneumonia, such as breathing difficulty and chest burden, which makes differentiating COVID-19 from other chest diseases a challenging task. Several related studies proposed computer-aided systems for single-class COVID-19 detection, which may be misleading due to the similar symptoms of other chest diseases. This paper proposes a framework for the detection of 15 types of chest diseases, including COVID-19, via the chest X-ray modality. Two-way classification is performed in the proposed framework. First, a deep learning-based convolutional neural network (CNN) architecture with a soft-max classifier is proposed. Second, transfer learning is applied: the fully connected layer of the proposed CNN extracts deep features, which are fed to classical Machine Learning (ML) classification methods. The proposed framework improves the accuracy of COVID-19 detection and increases the predictability rates for other chest diseases. The experimental results show that the proposed framework, when compared to other state-of-the-art models for diagnosing COVID-19 and other chest diseases, is more robust, and the results are promising.
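The two-stage idea above (a network's penultimate layer as feature extractor, a classical classifier on top) can be sketched generically. Here a fixed random projection with ReLU stands in for a pretrained CNN's fully connected layer, and a nearest-centroid rule stands in for the classical ML stage; every name, shape, and value is illustrative, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(7)
W_fc = rng.standard_normal((64, 16))   # stand-in for a pretrained FC layer

def deep_features(x):
    """Map flattened (n, 64) inputs to (n, 16) 'deep' features
    via a fixed projection plus ReLU (toy feature extractor)."""
    return np.maximum(x @ W_fc, 0.0)

def fit_nearest_centroid(feats, labels):
    """Classical-ML stage: one centroid per class in feature space."""
    classes = np.unique(labels)
    centroids = np.stack([feats[labels == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict_nearest_centroid(feats, classes, centroids):
    d = np.linalg.norm(feats[:, None, :] - centroids[None, :, :], axis=2)
    return classes[d.argmin(axis=1)]
```

In the real framework, `deep_features` would be the CNN's fully connected layer and the centroid rule would be replaced by the chosen classical ML classifier.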


2021 ◽  
Author(s):  
Yipkei Kwok ◽  
David L. Sullivan

Recent machine learning-based caching algorithms have shown promise. Among them, Learning-From-OPT (LFO) is the state-of-the-art supervised learning caching algorithm. LFO has a parameter named Window Size, which defines how often the algorithm generates a new machine-learning model. While using a small window size allows the algorithm to be more adaptive to changes in request behaviors, experimenting with LFO revealed that its performance suffers dramatically with small window sizes. This paper proposes LFO2, an improved LFO algorithm, which achieves high object hit ratios (OHR) with small window sizes. The results show a 9% OHR increase with LFO2. As the next step, the machine-learning parameters will be investigated for tuning opportunities to further enhance performance.

