Semantic and Generalized Entropy Loss Functions for Semi-Supervised Deep Learning

Entropy ◽

10.3390/e22030334 ◽

2020 ◽

Vol 22 (3) ◽

pp. 334

Author(s):

Krzysztof Gajowniczek ◽

Yitao Liang ◽

Tal Friedman ◽

Tomasz Ząbkowski ◽

Guy Van den Broeck

Keyword(s):

Neural Network ◽

Supervised Learning ◽

Loss Function ◽

Practical Importance ◽

Loss Functions ◽

Generalized Entropy ◽

Entropy Loss ◽

Additional Information ◽

Label Information ◽

Benchmark Datasets

The increasing size of modern datasets, combined with the difficulty of obtaining real label information (e.g., class), has made semi-supervised learning a problem of considerable practical importance in modern data analysis. Semi-supervised learning can be viewed either as supervised learning with additional information on the distribution of the examples or as an extension of unsupervised learning guided by some constraints. In this article we present a methodology that bridges artificial neural network output vectors and logical constraints. To do this, we present a semantic loss function and a generalized entropy loss function (Rényi entropy) that capture how close the neural network is to satisfying the constraints on its output. Our methods are intended to be generally applicable and compatible with any feedforward neural network. The semantic loss and generalized entropy loss are therefore simply regularization terms that can be plugged directly into an existing loss function. We evaluate our methodology on an artificially simulated dataset and two commonly used benchmark datasets, MNIST and Fashion-MNIST, to assess the relation between the analyzed loss functions and the influence of the various input and tuning parameters on classification accuracy. The experimental evaluation shows that both losses effectively guide the learner to achieve (near-) state-of-the-art results on semi-supervised multiclass classification.
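
As a rough illustration of the two regularizers named in this abstract, the sketch below computes a Rényi entropy of a network's output distribution and a semantic loss for an "exactly one class is true" constraint. The exactly-one constraint, the function names, and the choice of alpha = 2 are assumptions made for illustration; the abstract does not state which constraint or entropy order the paper uses.

```python
import math

def renyi_entropy(probs, alpha=2.0):
    """Renyi entropy of order alpha (alpha > 0, alpha != 1) of a discrete distribution."""
    if alpha <= 0 or alpha == 1.0:
        raise ValueError("alpha must be positive and different from 1")
    return math.log(sum(p ** alpha for p in probs)) / (1.0 - alpha)

def semantic_loss_exactly_one(probs):
    """Negative log-probability that exactly one output is on,
    treating each output as an independent Bernoulli variable (assumed constraint)."""
    total = 0.0
    for i, p_i in enumerate(probs):
        term = p_i
        for j, p_j in enumerate(probs):
            if j != i:
                term *= 1.0 - p_j
        total += term
    return -math.log(total)

# Hypothetical network outputs on two unlabeled examples.
confident = [0.98, 0.01, 0.01]
uncertain = [1 / 3, 1 / 3, 1 / 3]
print(renyi_entropy(confident), renyi_entropy(uncertain))
print(semantic_loss_exactly_one(confident), semantic_loss_exactly_one(uncertain))
```

Both quantities are smaller for the confident output than for the uniform one, which is why either can act as a regularization term added to a supervised loss on unlabeled data.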

Generalized Entropy Loss Function in Neural Network: Variable’s Importance and Sensitivity Analysis

Proceedings of the 21st EANN (Engineering Applications of Neural Networks) 2020 Conference - Proceedings of the International Neural Networks Society ◽

10.1007/978-3-030-48791-1_42 ◽

2020 ◽

pp. 535-545

Author(s):

Krzysztof Gajowniczek ◽

Tomasz Ząbkowski

Keyword(s):

Neural Network ◽

Sensitivity Analysis ◽

Loss Function ◽

Generalized Entropy ◽

Entropy Loss

Self-Supervised Contextual Data Augmentation for Natural Language Processing

Symmetry ◽

10.3390/sym11111393 ◽

2019 ◽

Vol 11 (11) ◽

pp. 1393

Author(s):

Dongju Park ◽

Chang Wook Ahn

Keyword(s):

Supervised Learning ◽

Language Processing ◽

Recurrent Neural Networks ◽

Question Answering ◽

Data Augmentation ◽

Language Model ◽

Contextual Data ◽

External Data ◽

Label Information ◽

Benchmark Datasets

In this paper, we propose a novel data augmentation method with respect to the target context of the data via self-supervised learning. Instead of looking for the exact synonyms of masked words, the proposed method finds words that can replace the original words considering the context. For self-supervised learning, we can employ the masked language model (MLM), which masks a specific word within a sentence and predicts the original word. The MLM learns the context of a sentence through asymmetrical inputs and outputs. However, rather than using the existing MLM as is, we propose a label-masked language model (LMLM) that includes label information in the mask tokens used in the MLM, so that the MLM can be used effectively on data with label information. The augmentation method performs self-supervised learning using LMLM and then implements data augmentation through the trained model. We demonstrate through several experiments that our proposed method improves the classification accuracy of recurrent neural network and convolutional neural network-based classifiers on text classification benchmark datasets, including the Stanford Sentiment Treebank-5 (SST5), the Stanford Sentiment Treebank-2 (SST2), the subjectivity (Subj), the Multi-Perspective Question Answering (MPQA), the Movie Reviews (MR), and the Text Retrieval Conference (TREC) datasets. In addition, since the proposed method does not use external data, it eliminates the time spent collecting external data or pre-training on it.

Deep Learning with Dynamically Weighted Loss Function for Sensor-Based Prognostics and Health Management

Sensors ◽

10.3390/s20030723 ◽

2020 ◽

Vol 20 (3) ◽

pp. 723 ◽

Cited By ~ 6

Author(s):

Divish Rengasamy ◽

Mina Jafari ◽

Benjamin Rothwell ◽

Xin Chen ◽

Grazziela P. Figueredo

Keyword(s):

Neural Network ◽

Deep Learning ◽

Loss Function ◽

Short Term Memory ◽

Health Management ◽

Remaining Useful Life ◽

Loss Functions ◽

Pressure System ◽

Failure Data ◽

Prognostic And Health Management

Deep learning has been applied to prognostics and health management in the automotive and aerospace sectors with promising results. The literature in this area reveals that most contributions regarding deep learning focus largely on the model’s architecture. However, contributions that improve other aspects of deep learning, such as custom loss functions for prognostics and health management, are scarce. There is therefore an opportunity to improve the effectiveness of deep learning for system prognostics and diagnostics without modifying the models’ architecture. To address this gap, two dynamically weighted loss functions, a newly proposed weighting mechanism and a focal loss function, are investigated for prognostics and diagnostics tasks. A dynamically weighted loss function is expected to modify the learning process by augmenting the loss function with a weight value corresponding to the learning error of each data instance. The objective is to force deep learning models to focus on those instances where larger learning errors occur in order to improve their performance. The two loss functions are evaluated using four popular deep learning architectures, namely a deep feedforward neural network, a one-dimensional convolutional neural network, a bidirectional gated recurrent unit and a bidirectional long short-term memory network, on the commercial modular aero-propulsion system simulation data from NASA and the air pressure system failure data for Scania trucks. Experimental results show that dynamically weighted loss functions achieve significant improvements in remaining useful life prediction and fault detection rate over non-weighted loss functions.
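
The idea of weighting each instance by its learning error can be sketched with the standard focal loss, one of the two functions the abstract names. This is a minimal sketch for a single classification instance; the function names and the value gamma = 2 are illustrative assumptions, and the paper's newly proposed weighting mechanism is not shown because the abstract does not define it.

```python
import math

def cross_entropy(p_true):
    """Unweighted loss; p_true is the predicted probability of the correct class."""
    return -math.log(p_true)

def focal_loss(p_true, gamma=2.0):
    """Cross-entropy scaled by (1 - p_true) ** gamma, so that well-classified
    instances (p_true near 1) contribute less than badly classified ones."""
    return ((1.0 - p_true) ** gamma) * cross_entropy(p_true)

# Compare the two losses on an easy, a moderate and a hard instance.
for p in (0.9, 0.6, 0.1):
    print(p, cross_entropy(p), focal_loss(p))
```

With gamma = 0 the weight is 1 and the focal loss reduces to plain cross-entropy; larger gamma shifts more of the total loss onto instances with large errors.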

Graph Self Supervised Learning: the BT, the HSIC, and the VICReg

10.31219/osf.io/tvmdu ◽

2021 ◽

Author(s):

Sayan Nag

Keyword(s):

Neural Networks ◽

Supervised Learning ◽

Loss Function ◽

Data Augmentation ◽

Learning Strategy ◽

Loss Functions ◽

Augmentation Strategies ◽

Batch Sizes ◽

Graph Neural Networks ◽

The Impact

Self-supervised learning and pre-training strategies have developed over the last few years, especially for Convolutional Neural Networks (CNNs). Recently, such methods have also been applied to Graph Neural Networks (GNNs). In this paper, we use a graph-based self-supervised learning strategy with different loss functions (Barlow Twins, HSIC, VICReg) that have previously shown promising results when applied with CNNs. We also propose a hybrid loss function, called VICRegHSIC, that combines the advantages of VICReg and HSIC. The performance of these methods is compared on two datasets, MUTAG and PROTEINS. Moreover, the impact of different batch sizes, projector dimensions and data augmentation strategies is explored. The results are preliminary, and we will continue to explore other datasets.
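
To make one of the named losses concrete, here is a minimal sketch of the Barlow Twins objective on two batches of embeddings: it pushes the cross-correlation matrix of the two views towards the identity. The plain-list representation, the function names and the value of `lam` are assumptions for illustration; the VICRegHSIC hybrid is not sketched because the abstract does not specify it.

```python
import math

def standardize(columns):
    """Scale each embedding dimension to zero mean and unit variance across the batch."""
    out = []
    for col in columns:
        mean = sum(col) / len(col)
        std = math.sqrt(sum((x - mean) ** 2 for x in col) / len(col))
        out.append([(x - mean) / (std + 1e-12) for x in col])
    return out

def barlow_twins_loss(z_a, z_b, lam=5e-3):
    """z_a, z_b: lists of rows (batch x dim) with embeddings of two augmented views."""
    n = len(z_a)
    cols_a = standardize([list(c) for c in zip(*z_a)])
    cols_b = standardize([list(c) for c in zip(*z_b)])
    on_diag, off_diag = 0.0, 0.0
    for i, ca in enumerate(cols_a):
        for j, cb in enumerate(cols_b):
            c_ij = sum(x * y for x, y in zip(ca, cb)) / n  # cross-correlation entry
            if i == j:
                on_diag += (1.0 - c_ij) ** 2   # invariance term
            else:
                off_diag += c_ij ** 2          # redundancy-reduction term
    return on_diag + lam * off_diag

view = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
swapped = [[row[1], row[0]] for row in view]
print(barlow_twins_loss(view, view), barlow_twins_loss(view, swapped))
```

Identical views with decorrelated dimensions give a loss close to zero, while views whose dimensions are swapped are penalized on both terms.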

Building robust neural networks using different loss functions

Analysis and data processing systems ◽

10.17212/2782-2001-2021-2-67-82 ◽

2021 ◽

pp. 67-82

Author(s):

Maria Sivak ◽

Vladimir Timofeev

Keyword(s):

Neural Network ◽

Neural Networks ◽

Loss Function ◽

Back Propagation ◽

Loss Functions ◽

Back Propagation Algorithm ◽

Error Back Propagation ◽

Propagation Algorithm ◽

Error Back Propagation Algorithm ◽

Parameter Values

The paper considers the problem of building robust neural networks using different robust loss functions. Applying such neural networks is reasonable when working with noisy data, and it can serve as an alternative to data preprocessing and to making the neural network architecture more complex. In order to work adequately, the error back-propagation algorithm requires a loss function to be continuously or twice differentiable. According to this requirement, five robust loss functions were chosen (Andrews, Welsch, Huber, Ramsey and Fair). Using these functions in the error back-propagation algorithm instead of the quadratic one yields an entirely new class of neural networks. To investigate the properties of the built networks, a number of computational experiments were carried out. Different outlier fractions and various numbers of epochs were considered. The first stage included tuning the obtained neural networks, which led to choosing the values of internal loss function parameters that gave the highest accuracy of a neural network. To determine the ranges of parameter values, a preliminary study was carried out. The results of the first stage made it possible to give recommendations on choosing the best parameter values for each of the loss functions under study. The second stage dealt with comparing the investigated robust networks with each other and with the classical one. The analysis of the results shows that using the robust technique leads to a significant increase in neural network accuracy and in learning rate.
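
For orientation, three of the five named losses can be written in their common textbook forms as functions of a residual r. This is a sketch under assumptions: the parameter names `k` and `c`, their default value of 1.0, and the exact parameterization are illustrative and may differ from the paper's; Andrews and Ramsey are omitted for brevity.

```python
import math

def quadratic(r):
    """Classical squared-error loss, shown for comparison."""
    return 0.5 * r * r

def huber(r, k=1.0):
    """Quadratic near zero, linear for |r| > k."""
    a = abs(r)
    return 0.5 * r * r if a <= k else k * a - 0.5 * k * k

def welsch(r, c=1.0):
    """Bounded loss that saturates at c**2 / 2 for large residuals."""
    return 0.5 * c * c * (1.0 - math.exp(-((r / c) ** 2)))

def fair(r, c=1.0):
    """Smooth loss that grows roughly linearly for large residuals."""
    a = abs(r) / c
    return c * c * (a - math.log(1.0 + a))

# An outlier-sized residual is penalized far less by the robust losses.
for r in (0.5, 3.0, 10.0):
    print(r, quadratic(r), huber(r), welsch(r), fair(r))
```

The slower growth for large residuals is what limits the influence of outliers on the gradient during back-propagation.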

PAC-Bayes Unleashed: Generalisation Bounds with Unbounded Losses

Entropy ◽

10.3390/e23101330 ◽

2021 ◽

Vol 23 (10) ◽

pp. 1330

Author(s):

Maxime Haddouche ◽

Benjamin Guedj ◽

Omar Rivasplata ◽

John Shawe-Taylor

Keyword(s):

Linear Regression ◽

Supervised Learning ◽

Loss Function ◽

Loss Functions ◽

Learning Problems ◽

Regression Problem ◽

Learning Framework ◽

Actual Computation ◽

Linear Regression Problem

We present new PAC-Bayesian generalisation bounds for learning problems with unbounded loss functions. This extends the relevance and applicability of the PAC-Bayes learning framework, where most of the existing literature focuses on supervised learning problems with a bounded loss function (typically assumed to take values in the interval [0;1]). In order to relax this classical assumption, we propose to allow the range of the loss to depend on each predictor. This relaxation is captured by our new notion of HYPothesis-dependent rangE (HYPE). Based on this, we derive a novel PAC-Bayesian generalisation bound for unbounded loss functions, and we instantiate it on a linear regression problem. To make our theory usable by the largest audience possible, we include discussions on actual computation, practicality and limitations of our assumptions.

MPCE: A Maximum Probability Based Cross Entropy Loss Function for Neural Network Classification

IEEE Access ◽

10.1109/access.2019.2946264 ◽

2019 ◽

Vol 7 ◽

pp. 146331-146341 ◽

Cited By ~ 4

Author(s):

Yangfan Zhou ◽

Xin Wang ◽

Mingchuan Zhang ◽

Junlong Zhu ◽

Ruijuan Zheng ◽

...

Keyword(s):

Neural Network ◽

Loss Function ◽

Cross Entropy ◽

Maximum Probability ◽

Entropy Loss ◽

Neural Network Classification

Comparing Class-Aware and Pairwise Loss Functions for Deep Metric Learning in Wildlife Re-Identification

Sensors ◽

10.3390/s21186109 ◽

2021 ◽

Vol 21 (18) ◽

pp. 6109

Author(s):

Nkosikhona Dlamini ◽

Terence L. van Zyl

Keyword(s):

Neural Network ◽

Neural Networks ◽

Loss Function ◽

Network Architecture ◽

Loss Functions ◽

Similarity Learning ◽

Neural Network Architecture ◽

Shot Classification ◽

Public Datasets ◽

The Impact

Similarity learning using deep convolutional neural networks has been applied extensively in solving computer vision problems. This attraction is supported by its success in one-shot and zero-shot classification applications. The advances in similarity learning are essential for smaller datasets or datasets in which few labeled examples exist per class, such as wildlife re-identification. Improving the performance of similarity learning models comes with developing new sampling techniques and designing loss functions better suited to training similarity in neural networks. However, the impact of these advances is tested on larger datasets, with limited attention given to smaller imbalanced datasets such as those found in unique wildlife re-identification. To this end, we test the advances in loss functions for similarity learning on several animal re-identification tasks. We add two new public datasets, Nyala and Lions, to the challenge of animal re-identification. Our results are state of the art on all public datasets tested except Pandas. The achieved Top-1 Recall is 94.8% on the Zebra dataset, 72.3% on the Nyala dataset, 79.7% on the Chimps dataset and 88.9% on the Tiger dataset. For the Lions dataset, we set a new benchmark at 94.8%. We find that the best performing loss function across all datasets is generally the triplet loss; however, there is only a marginal improvement compared to the performance achieved by Proxy-NCA models. We demonstrate that no single neural network architecture combined with a loss function is best suited for all datasets, although VGG-11 may be the most robust first choice. Our results highlight the need for broader experimentation and exploration of loss functions and neural network architecture for the more challenging task, over classical benchmarks, of wildlife re-identification.
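
The triplet loss that this abstract reports as generally best can be sketched as follows. This is a minimal version on plain Python lists using Euclidean distance; the margin value and function names are illustrative assumptions, and some formulations use squared distances instead.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Zero once the positive is closer to the anchor than the negative by at least `margin`."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

# Hypothetical 2-D embeddings: same individual (positive) vs. a different one (negative).
anchor = [0.0, 0.0]
near = [0.0, 1.0]
far = [3.0, 4.0]
print(triplet_loss(anchor, near, far))   # well-separated triplet
print(triplet_loss(anchor, far, near))   # violated triplet
```

A triplet only contributes to training while the margin is violated, which is why the sampling techniques mentioned in the abstract matter for which triplets are presented.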

Fully Convolved Neural Network-Based Retinal Vessel Segmentation with Entropy Loss Function

Proceedings of International Conference on Artificial Intelligence, Smart Grid and Smart City Applications ◽

10.1007/978-3-030-24051-6_21 ◽

2020 ◽

pp. 217-225

Author(s):

V. Sathananthavathi ◽

G. Indumathi ◽

A. Swetha Ranjani

Keyword(s):

Neural Network ◽

Loss Function ◽

Retinal Vessel ◽

Vessel Segmentation ◽

Entropy Loss ◽

Retinal Vessel Segmentation

Relative Distribution Entropy Loss Function in CNN Image Retrieval

Entropy ◽

10.3390/e22030321 ◽

2020 ◽

Vol 22 (3) ◽

pp. 321

Author(s):

Pingping Liu ◽

Lida Shi ◽

Zhuang Miao ◽

Baixin Jin ◽

Qiuzhan Zhou

Keyword(s):

Image Retrieval ◽

Loss Function ◽

Euclidean Distance ◽

Metric Learning ◽

Loss Functions ◽

Relative Distribution ◽

Weighted Distance ◽

Entropy Loss ◽

Image Descriptors ◽

Entropy Weighted

Convolutional neural networks (CNNs) are the mainstream solution in the field of image retrieval. Deep metric learning has been introduced into the field, with a focus on the construction of pair-based loss functions. However, most pair-based loss functions in metric learning consider only a common vector similarity (such as Euclidean distance) between the final image descriptors, while neglecting other distribution characteristics of these descriptors. In this work, we propose relative distribution entropy (RDE) to describe the internal distribution attributes of image descriptors. We combine relative distribution entropy with the Euclidean distance to obtain the relative distribution entropy weighted distance (RDE-distance). Moreover, the RDE-distance is fused with the contrastive loss and triplet loss to build the relative distribution entropy loss functions. The experimental results demonstrate that our method attains state-of-the-art performance on most image retrieval benchmarks.
