scholarly journals Estimation of Pedestrian Pose Orientation Using Soft Target Training Based on Teacher–Student Framework

Sensors ◽  
2019 ◽  
Vol 19 (5) ◽  
pp. 1147 ◽  
Author(s):  
DuYeong Heo ◽  
Jae Nam ◽  
Byoung Ko

Semi-supervised learning is known to achieve better generalisation than a model learned solely from labelled data. Therefore, we propose a new method for estimating a pedestrian pose orientation using a soft-target method, which is a type of semi-supervised learning method. Because a convolutional neural network (CNN) based pose orientation estimation requires large numbers of parameters and operations, we apply the teacher–student algorithm to generate a compressed student model with high accuracy and compactness resembling that of the teacher model by combining a deep network with a random forest. After the teacher model is generated using hard target data, the softened outputs (soft-target data) of the teacher model are used for training the student model. Moreover, the orientation of the pedestrian has specific shape patterns, and a wavelet transform is applied to the input image as a pre-processing step owing to its good spatial frequency localisation property and the ability to preserve both the spatial information and gradient information of an image. For a benchmark dataset considering real driving situations based on a single camera, we used the TUD and KITTI datasets. We applied the proposed algorithm to various driving images in the datasets, and the results indicate that its classification performance with regard to the pose orientation is better than that of other state-of-the-art methods based on a CNN. In addition, the computational speed of the proposed student model is faster than that of other deep CNNs owing to the shorter model structure with a smaller number of parameters.

2019 ◽  
Vol 2019 ◽  
pp. 1-10
Author(s):  
Diehao Kong ◽  
Xuefeng Yan

Autoencoders are used for fault diagnosis in chemical engineering. To improve their performance, experts have paid close attention to regularized strategies and the creation of new and effective cost functions. However, existing methods are modified on the basis of only one model. This study provides a new perspective for strengthening the fault diagnosis model, which attempts to gain useful information from a model (teacher model) and applies it to a new model (student model). It pretrains the teacher model by fitting ground truth labels and then uses a sample-wise strategy to transfer knowledge from the teacher model. Finally, the knowledge and the ground truth labels are used to train the student model that is identical to the teacher model in terms of structure. The current student model is then used as the teacher of next student model. After step-by-step teacher-student reconfiguration and training, the optimal model is selected for fault diagnosis. Besides, knowledge distillation is applied in training procedures. The proposed method is applied to several benchmarked problems to prove its effectiveness.


Author(s):  
Chao Ye ◽  
Huaidong Zhang ◽  
Xuemiao Xu ◽  
Weiwei Cai ◽  
Jing Qin ◽  
...  

Deep neural networks have been shown to be very powerful tools for object detection in various scenes. Their remarkable performance, however, heavily depends on the availability of a large number of high quality labeled data, which are time-consuming and costly to acquire for scenes with densely packed objects. We present a novel semi-supervised approach to addressing this problem, which is designed based on a common teacher-student model, integrated with a novel intersection-over-union (IoU) aware consistency loss and a new proposal consistency loss. The IoU-aware consistency loss evaluates the IoU over the prediction pairs of the teacher model and the student model, which enforces the prediction of the student model to approach closely to that of the teacher model. The IoU-aware consistency loss also reweights the importance of different prediction pairs to suppress the low-confident pairs. The proposal consistency loss ensures proposal consistency between the two models, making it possible to involve the region proposal network in the training process with unlabeled data. We also construct a new dataset, namely RebarDSC, containing 2,125 rebar images annotated with 350,348 bounding boxes in total (164.9 annotations per image average), to evaluate the proposed method. Extensive experiments are conducted over both the RebarDSC dataset and the famous large public dataset SKU-110K. Experimental results corroborate that the proposed method is able to improve the object detection performance in densely packed scenes, consistently outperforming state-of-the-art approaches. Dataset is available in https://github.com/Armin1337/RebarDSC.


Entropy ◽  
2021 ◽  
Vol 23 (8) ◽  
pp. 991
Author(s):  
Yuta Nakahara ◽  
Toshiyasu Matsushima

In information theory, lossless compression of general data is based on an explicit assumption of a stochastic generative model on target data. However, in lossless image compression, researchers have mainly focused on the coding procedure that outputs the coded sequence from the input image, and the assumption of the stochastic generative model is implicit. In these studies, there is a difficulty in discussing the difference between the expected code length and the entropy of the stochastic generative model. We solve this difficulty for a class of images, in which they have non-stationarity among segments. In this paper, we propose a novel stochastic generative model of images by redefining the implicit stochastic generative model in a previous coding procedure. Our model is based on the quadtree so that it effectively represents the variable block size segmentation of images. Then, we construct the Bayes code optimal for the proposed stochastic generative model. It requires the summation of all possible quadtrees weighted by their posterior. In general, its computational cost increases exponentially for the image size. However, we introduce an efficient algorithm to calculate it in the polynomial order of the image size without loss of optimality. As a result, the derived algorithm has a better average coding rate than that of JBIG.


Author(s):  
Hannah Garcia Doherty ◽  
Roberto Arnaiz Burgueño ◽  
Roeland P. Trommel ◽  
Vasileios Papanastasiou ◽  
Ronny I. A. Harmanny

Abstract Identification of human individuals within a group of 39 persons using micro-Doppler (μ-D) features has been investigated. Deep convolutional neural networks with two different training procedures have been used to perform classification. Visualization of the inner network layers revealed the sections of the input image most relevant when determining the class label of the target. A convolutional block attention module is added to provide a weighted feature vector in the channel and feature dimension, highlighting the relevant μ-D feature-filled areas in the image and improving classification performance.


2013 ◽  
Vol 427-429 ◽  
pp. 2309-2312
Author(s):  
Hai Bin Mei ◽  
Ming Hua Zhang

Alert classifiers built with the supervised classification technique require large amounts of labeled training alerts. Preparing for such training data is very difficult and expensive. Thus accuracy and feasibility of current classifiers are greatly restricted. This paper employs semi-supervised learning to build alert classification model to reduce the number of needed labeled training alerts. Alert context properties are also introduced to improve the classification performance. Experiments have demonstrated the accuracy and feasibility of our approach.


2012 ◽  
Vol 2012 ◽  
pp. 1-20 ◽  
Author(s):  
Gulshan Kumar ◽  
Krishan Kumar

In supervised learning-based classification, ensembles have been successfully employed to different application domains. In the literature, many researchers have proposed different ensembles by considering different combination methods, training datasets, base classifiers, and many other factors. Artificial-intelligence-(AI-) based techniques play prominent role in development of ensemble for intrusion detection (ID) and have many benefits over other techniques. However, there is no comprehensive review of ensembles in general and AI-based ensembles for ID to examine and understand their current research status to solve the ID problem. Here, an updated review of ensembles and their taxonomies has been presented in general. The paper also presents the updated review of various AI-based ensembles for ID (in particular) during last decade. The related studies of AI-based ensembles are compared by set of evaluation metrics driven from (1) architecture & approach followed; (2) different methods utilized in different phases of ensemble learning; (3) other measures used to evaluate classification performance of the ensembles. The paper also provides the future directions of the research in this area. The paper will help the better understanding of different directions in which research of ensembles has been done in general and specifically: field of intrusion detection systems (IDSs).


Algorithms ◽  
2020 ◽  
Vol 13 (6) ◽  
pp. 133 ◽  
Author(s):  
Gábor Kertész

Image based instance recognition is a difficult problem, in some cases even for the human eye. While latest developments in computer vision—mostly driven by deep learning—have shown that high performance models for classification or categorization can be engineered, the problem of discriminating similar objects with a low number of samples remain challenging. Advances from multi-class classification are applied for object matching problems, as the feature extraction techniques are the same; nature-inspired multi-layered convolutional nets learn the representations, and the output of such a model maps them to a multidimensional encoding space. A metric based loss brings same instance embeddings close to each other. While these solutions achieve high classification performance, low efficiency is caused by memory cost of high parameter number, which is in a relationship with input image size. Upon shrinking the input, the model requires less trainable parameters, while performance decreases. This drawback is tackled by using compressed feature extraction, e.g., projections. In this paper, a multi-directional image projection transformation with fixed vector lengths (MDIPFL) is applied for one-shot recognition tasks, trained on Siamese and Triplet architectures. Results show, that MDIPFL based approach achieves decent performance, despite of the significantly lower number of parameters.


Sign in / Sign up

Export Citation Format

Share Document