KDAS-ReID: Architecture Search for Person Re-Identification via Distilled Knowledge with Dynamic Temperature

Zhou Lei; Kangkang Yang; Kai Jiang; Shengbo Chen

doi:10.3390/a14050137

Algorithms ◽

10.3390/a14050137 ◽

2021 ◽

Vol 14 (5) ◽

pp. 137

Author(s):

Zhou Lei ◽

Kangkang Yang ◽

Kai Jiang ◽

Shengbo Chen

Keyword(s):

State Of The Art ◽

Identification Algorithm ◽

Student Model ◽

Deep Convolutional Neural Networks ◽

Fast Speed ◽

Training Stage ◽

Knowledge Distillation ◽

And Training ◽

Better Than ◽

Teacher Model

Person re-identification (Re-ID) based on deep convolutional neural networks (CNNs) achieves remarkable success with its fast speed. However, prevailing Re-ID models are usually built upon backbones that were manually designed for classification. To automatically design an effective Re-ID architecture, we propose a pedestrian re-identification algorithm based on knowledge distillation, called KDAS-ReID. When the knowledge of the teacher model is transferred to the student model, the importance of that knowledge gradually decreases as the performance of the student model improves. Therefore, instead of applying the distillation loss function directly, we use dynamic temperatures during the search stage and the training stage. Specifically, we start searching and training at a high temperature and gradually reduce the temperature to 1 so that the student model can better learn from the teacher model through soft targets. Extensive experiments demonstrate that KDAS-ReID performs better not only than other state-of-the-art Re-ID models on three benchmarks, but also than the teacher model based on the ResNet-50 backbone.
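The dynamic-temperature idea in this abstract can be illustrated with a minimal sketch. It assumes a linear annealing schedule and a starting temperature of 4.0; neither is given in the abstract, and the helper names (`temperature_at`, `distillation_loss`) are hypothetical. The loss is the standard temperature-scaled KL divergence between softened teacher and student distributions, not necessarily the exact loss used in KDAS-ReID.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature gives softer targets."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def temperature_at(step, total_steps, t_start=4.0, t_end=1.0):
    """Linearly anneal from t_start to t_end (assumed schedule, not from the paper)."""
    if total_steps <= 1:
        return t_end
    frac = min(max(step / (total_steps - 1), 0.0), 1.0)
    return t_start + frac * (t_end - t_start)

def distillation_loss(student_logits, teacher_logits, temperature):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
    return temperature ** 2 * kl

# Illustrative logits only; they do not come from any experiment.
teacher = [5.0, 2.0, 0.5]
student = [3.0, 2.5, 1.0]
for step in range(5):
    t = temperature_at(step, 5)
    print(step, round(t, 2), round(distillation_loss(student, teacher, t), 4))
```

Early steps use a high temperature, so the teacher's full ranking over classes shapes the target; by the final step the temperature is 1 and the targets are the teacher's ordinary softmax outputs.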

Relieving the Incompatibility of Network Representation and Classification for Long-Tailed Data Distribution

Computational Intelligence and Neuroscience ◽

10.1155/2021/6702625 ◽

2021 ◽

Vol 2021 ◽

pp. 1-10

Author(s):

Hao Hu ◽

Mengya Gao ◽

Mingsheng Wu

Keyword(s):

Large Scale ◽

Deep Neural Networks ◽

State Of The Art ◽

Data Distribution ◽

Distribution Problem ◽

Imbalanced Dataset ◽

Network Representation ◽

Knowledge Distillation ◽

Rare Classes ◽

And Training

In real-world scenarios, data often have a long-tailed distribution, and training deep neural networks on such an imbalanced dataset has become a great challenge. The main problem caused by a long-tailed data distribution is that common classes dominate the training results, so the model achieves very low accuracy on the rare classes. Recent work focuses on improving the network representation ability to overcome the long-tailed problem, but it usually ignores adapting the network classifier to the long-tailed case, which causes an “incompatibility” problem between the network representation and the network classifier. In this paper, we use knowledge distillation to solve the long-tailed data distribution problem and optimize the network representation and classifier simultaneously. We propose multiexpert knowledge distillation with class-balanced sampling to jointly learn a high-quality network representation and classifier. A channel activation-based knowledge distillation method is also proposed to improve the performance further. State-of-the-art performance on several large-scale long-tailed classification datasets shows the superior generalization of our method.
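As a rough illustration of class-balanced sampling, the sketch below picks a class uniformly at random and then a sample uniformly within that class, so rare classes are drawn about as often as common ones. The function name `class_balanced_sample` and the toy label counts are hypothetical; the abstract does not specify how the authors implement their sampler.

```python
import random
from collections import Counter, defaultdict

def class_balanced_sample(labels, num_samples, rng):
    """Pick a class uniformly, then pick a sample uniformly within that class."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    classes = sorted(by_class)  # fixed order keeps runs reproducible for a given seed
    return [rng.choice(by_class[rng.choice(classes)]) for _ in range(num_samples)]

# Toy long-tailed dataset: 990 common samples and 10 rare ones (made-up numbers).
labels = ["common"] * 990 + ["rare"] * 10
rng = random.Random(0)
picked = class_balanced_sample(labels, 2000, rng)
counts = Counter(labels[i] for i in picked)
print(counts)
```

Although only 1% of the data is rare, roughly half of the drawn indices belong to the rare class, which is the rebalancing effect that class-balanced sampling is meant to provide.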

Novel Model Based on Stacked Autoencoders with Sample-Wise Strategy for Fault Diagnosis

Mathematical Problems in Engineering ◽

10.1155/2019/8985657 ◽

2019 ◽

Vol 2019 ◽

pp. 1-10

Author(s):

Diehao Kong ◽

Xuefeng Yan

Keyword(s):

Fault Diagnosis ◽

Chemical Engineering ◽

Ground Truth ◽

Student Model ◽

Teacher Student ◽

Stacked Autoencoders ◽

Knowledge Distillation ◽

New Perspective ◽

Current Student ◽

Teacher Model

Autoencoders are used for fault diagnosis in chemical engineering. To improve their performance, experts have paid close attention to regularization strategies and the creation of new and effective cost functions. However, existing methods are modified on the basis of only one model. This study provides a new perspective for strengthening the fault diagnosis model: it gains useful information from one model (the teacher model) and applies it to a new model (the student model). It pretrains the teacher model by fitting ground truth labels and then uses a sample-wise strategy to transfer knowledge from the teacher model. Finally, the knowledge and the ground truth labels are used to train the student model, which is identical to the teacher model in structure. The current student model is then used as the teacher of the next student model. After step-by-step teacher-student reconfiguration and training, the optimal model is selected for fault diagnosis. In addition, knowledge distillation is applied in the training procedures. The proposed method is applied to several benchmark problems to demonstrate its effectiveness.

Policy Search by Target Distribution Learning for Continuous Control

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.6156 ◽

2020 ◽

Vol 34 (04) ◽

pp. 6770-6777

Author(s):

Chuheng Zhang ◽

Yuanqi Li ◽

Jian Li

Keyword(s):

State Of The Art ◽

Gradient Methods ◽

Continuous Control ◽

Policy Network ◽

Current Policy ◽

Training Process ◽

Target Distribution ◽

Policy Gradient ◽

And Training ◽

Better Than

It is known that existing policy gradient methods (such as vanilla policy gradient, PPO, A2C) may suffer from overly large gradients when the current policy is close to deterministic, leading to an unstable training process. We show that such instability can happen even in a very simple environment. To address this issue, we propose a new method, called target distribution learning (TDL), for policy improvement in reinforcement learning. TDL alternates between proposing a target distribution and training the policy network to approach the target distribution. TDL is more effective in constraining the KL divergence between updated policies, and hence leads to more stable policy improvements over iterations. Our experiments show that TDL algorithms perform comparably to (or better than) state-of-the-art algorithms for most continuous control tasks in the MuJoCo environment while being more stable in training.
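The instability described in this abstract can be seen in a small sketch, assuming a one-dimensional Gaussian policy (an assumption made here for illustration; the abstract does not fix the policy class). The gradient of the log-probability with respect to the mean is (a − μ)/σ², so for an action one standard deviation from the mean it equals 1/σ and grows without bound as the policy becomes nearly deterministic. The function name is hypothetical, and this is not an implementation of TDL itself.

```python
def gaussian_score_wrt_mean(action, mean, std):
    """d/d(mean) of log N(action | mean, std^2) = (action - mean) / std^2."""
    return (action - mean) / (std ** 2)

mean = 0.0
for std in (1.0, 0.1, 0.01):
    action = mean + std  # an action exactly one standard deviation above the mean
    print(std, gaussian_score_wrt_mean(action, mean, std))
```

Shrinking the standard deviation by a factor of 100 inflates this gradient term by the same factor, which is one way a near-deterministic policy can produce overly large updates.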

Knowledge distillation in deep learning and its applications

PeerJ Computer Science ◽

10.7717/peerj-cs.474 ◽

2021 ◽

Vol 7 ◽

pp. e474

Author(s):

Abdolmaged Alkhulaifi ◽

Fahad Alsahli ◽

Irfan Ahmad

Keyword(s):

Deep Learning ◽

Mobile Phones ◽

Learning Models ◽

Student Model ◽

Embedded Devices ◽

Research Directions ◽

Resource Limited ◽

Knowledge Distillation ◽

Teacher Model

Deep learning based models are relatively large, and it is hard to deploy such models on resource-limited devices such as mobile phones and embedded devices. One possible solution is knowledge distillation whereby a smaller model (student model) is trained by utilizing the information from a larger model (teacher model). In this paper, we present an outlook of knowledge distillation techniques applied to deep learning models. To compare the performances of different techniques, we propose a new metric called distillation metric which compares different knowledge distillation solutions based on models' sizes and accuracy scores. Based on the survey, some interesting conclusions are drawn and presented in this paper including the current challenges and possible research directions.

Comparing Kullback-Leibler Divergence and Mean Squared Error Loss in Knowledge Distillation

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2021/362 ◽

2021 ◽

Author(s):

Taehyeon Kim ◽

Jaehoon Oh ◽

Nak Yil Kim ◽

Sangwook Cho ◽

Se-Young Yun

Keyword(s):

Mean Squared Error ◽

Probability Distributions ◽

Student Model ◽

KL Divergence ◽

Squared Error ◽

Leibler Divergence ◽

Temperature Scaling ◽

Knowledge Distillation ◽

The Mean ◽

Teacher Model

Knowledge distillation (KD), which transfers knowledge from a cumbersome teacher model to a lightweight student model, has been investigated to design efficient neural architectures. Generally, the objective function of KD is the Kullback-Leibler (KL) divergence loss between the softened probability distributions of the teacher model and the student model, with the temperature scaling hyperparameter τ. Despite its widespread use, few studies have discussed how such softening influences generalization. Here, we theoretically show that the KL divergence loss focuses on logit matching as τ increases and on label matching as τ goes to 0, and we empirically show that logit matching is positively correlated with performance improvement in general. From this observation, we consider an intuitive KD loss function, the mean squared error (MSE) between the logit vectors, so that the student model can directly learn the logits of the teacher model. The MSE loss outperforms the KL divergence loss, which is explained by the difference in penultimate-layer representations between the two losses. Furthermore, we show that sequential distillation can improve performance and that KD, particularly when using the KL divergence loss with small τ, mitigates label noise. The code to reproduce the experiments is publicly available online at https://github.com/jhoon-oh/kd_data/.
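The relationship between the two losses can be checked numerically with a minimal sketch. It uses the standard τ²-scaled KL loss and a second-order expansion of the softmax: as τ grows, τ²·KL approaches Σᵢ(dᵢ − d̄)²/(2K), where d is the teacher-minus-student logit difference and K is the number of classes. The function names and example logits are hypothetical, and the limit expression is a textbook approximation stated here as an assumption rather than a formula quoted from the paper.

```python
import math

def softmax(logits, tau):
    """Softmax of logits divided by the temperature tau."""
    scaled = [z / tau for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_kl_loss(student_logits, teacher_logits, tau):
    """tau^2 * KL(softened teacher || softened student)."""
    p = softmax(teacher_logits, tau)
    q = softmax(student_logits, tau)
    return tau ** 2 * sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def kd_mse_loss(student_logits, teacher_logits):
    """Mean squared error between the raw logit vectors."""
    diffs = [t - s for s, t in zip(student_logits, teacher_logits)]
    return sum(d * d for d in diffs) / len(diffs)

def centered_half_mse(student_logits, teacher_logits):
    """Large-tau limit of kd_kl_loss: sum((d - mean(d))^2) / (2K)."""
    diffs = [t - s for s, t in zip(student_logits, teacher_logits)]
    mean_diff = sum(diffs) / len(diffs)
    return sum((d - mean_diff) ** 2 for d in diffs) / (2 * len(diffs))

teacher = [5.0, 2.0, 0.5]
student = [3.0, 2.5, 1.0]
for tau in (1.0, 10.0, 1000.0):
    print(tau, kd_kl_loss(student, teacher, tau))
print("limit", centered_half_mse(student, teacher))
print("mse", kd_mse_loss(student, teacher))
```

At large τ the KL loss depends only on the mean-centered logit differences, which is the sense in which it performs logit matching; the plain MSE loss also penalizes a constant offset between the two logit vectors.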

Androgen Receptor Binding Category Prediction with Deep Neural Networks and Structure-, Ligand-, and Statistically Based Features

Molecules ◽

10.3390/molecules26051285 ◽

2021 ◽

Vol 26 (5) ◽

pp. 1285

Author(s):

Alfonso T. García-Sosa

Keyword(s):

Neural Networks ◽

Androgen Receptor ◽

Logistic Model ◽

Deep Neural Networks ◽

State Of The Art ◽

Protein Structures ◽

Training Set ◽

Multivariate Logistic Model ◽

And Training ◽

Better Than

Substances that can modify the androgen receptor pathway in humans and animals are entering the environment and food chain with the proven ability to disrupt hormonal systems, leading to toxicity and adverse effects on reproduction, brain development, and prostate cancer, among others. State-of-the-art databases with experimental data on the effects of chemicals in humans, chimps, and rats have been used to build machine-learning classifiers and regressors and to evaluate these on independent sets. Different featurizations, algorithms, and protein structures lead to different results, with deep neural networks (DNNs) on user-defined physicochemically relevant features developed for this work outperforming graph convolutional, random forest, and large featurizations. The results show that these user-provided structure-, ligand-, and statistically based features and specific DNNs provided the best results as determined by AUC (0.87), MCC (0.47), and other metrics and by the interpretability and chemical meaning of the descriptors/features. In addition, the same features in the DNN method performed better than in a multivariate logistic model: validation MCC = 0.468 and training MCC = 0.868 for the present work compared to evaluation set MCC = 0.2036 and training set MCC = 0.5364 for the multivariate logistic regression on the full, unbalanced set. Techniques of this type may improve AR and toxicity description and prediction, improving assessment and design of compounds. Source code and data are available on GitHub.
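For readers unfamiliar with the MCC values quoted in this abstract, the sketch below computes the Matthews correlation coefficient from a binary confusion matrix using its standard definition. The confusion-matrix counts are made up for illustration and do not come from the paper; returning 0.0 when the denominator is zero is a common convention adopted here as an assumption.

```python
import math

def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (tp*tn - fp*fn) / sqrt((tp+fp)(tp+fn)(tn+fp)(tn+fn))."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: report 0.0 when any marginal total is zero.
    return numerator / denominator if denominator else 0.0

# Hypothetical counts on an imbalanced evaluation set.
print(matthews_corrcoef(tp=30, tn=900, fp=40, fn=30))
```

MCC ranges from -1 to 1 and stays informative on unbalanced sets, where accuracy alone can look high even if the minority class is poorly predicted.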

Deep Unsupervised Hashing for Large-Scale Cross-Modal Retrieval Using Knowledge Distillation Model

Computational Intelligence and Neuroscience ◽

10.1155/2021/5107034 ◽

2021 ◽

Vol 2021 ◽

pp. 1-11

Author(s):

Mingyong Li ◽

Qiqi Li ◽

Lirong Tang ◽

Shuang Peng ◽

Yan Ma ◽

...

Keyword(s):

Large Scale ◽

Data Retrieval ◽

Multimedia Data ◽

Search Performance ◽

Similarity Matrix ◽

Student Model ◽

Deep Hashing ◽

Knowledge Distillation ◽

Semantic Alignment ◽

Teacher Model

Cross-modal hashing encodes heterogeneous multimedia data into compact binary codes to achieve fast and flexible retrieval across different modalities. Due to its low storage cost and high retrieval efficiency, it has received widespread attention. Supervised deep hashing significantly improves search performance and usually yields more accurate results, but requires a lot of manual annotation of the data. In contrast, it is difficult for unsupervised deep hashing to achieve satisfactory performance due to the lack of reliable supervisory information. To solve this problem, inspired by knowledge distillation, we propose a novel unsupervised knowledge distillation cross-modal hashing method based on semantic alignment (SAKDH), which reconstructs the similarity matrix using the hidden correlation information of the pretrained unsupervised teacher model; the reconstructed similarity matrix is then used to guide the supervised student model. First, the teacher model adopts an unsupervised semantic alignment hashing method, which constructs a modal fusion similarity matrix. Second, under the supervision of the teacher model's distillation information, the student model generates more discriminative hash codes. Experimental results on two widely used benchmark datasets (MIRFLICKR-25K and NUS-WIDE) show that, compared to several representative unsupervised cross-modal hashing methods, the proposed method achieves a significant improvement in mean average precision (MAP). This demonstrates its effectiveness in large-scale cross-modal data retrieval.

Online Knowledge Distillation with Diverse Peers

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.5746 ◽

2020 ◽

Vol 34 (04) ◽

pp. 3430-3437

Author(s):

Defang Chen ◽

Jian-Ping Mei ◽

Can Wang ◽

Yan Feng ◽

Chun Chen

Keyword(s):

Knowledge Transfer ◽

State Of The Art ◽

High Capacity ◽

Group Leader ◽

Student Model ◽

Aggregation Functions ◽

Knowledge Distillation ◽

Group Members ◽

Student Models ◽

Soft Targets

Distillation is an effective knowledge-transfer technique that uses predicted distributions of a powerful teacher model as soft targets to train a less-parameterized student model. A pre-trained high capacity teacher, however, is not always available. Recently proposed online variants use the aggregated intermediate predictions of multiple student models as targets to train each student model. Although group-derived targets give a good recipe for teacher-free distillation, group members are homogenized quickly with simple aggregation functions, leading to early saturated solutions. In this work, we propose Online Knowledge Distillation with Diverse peers (OKDDip), which performs two-level distillation during training with multiple auxiliary peers and one group leader. In the first-level distillation, each auxiliary peer holds an individual set of aggregation weights generated with an attention-based mechanism to derive its own targets from predictions of other auxiliary peers. Learning from distinct target distributions helps to boost peer diversity for effectiveness of group-based distillation. The second-level distillation is performed to transfer the knowledge in the ensemble of auxiliary peers further to the group leader, i.e., the model used for inference. Experimental results show that the proposed framework consistently gives better performance than state-of-the-art approaches without sacrificing training or inference complexity, demonstrating the effectiveness of the proposed two-level distillation framework.

Androgen Receptor Binding Category Prediction with Deep Neural Networks and Structure-, Ligand-, and Statistically-Based Features

10.20944/preprints202102.0318.v3 ◽

2021 ◽

Author(s):

Alfonso T. García-Sosa

Keyword(s):

Neural Networks ◽

Androgen Receptor ◽

Logistic Model ◽

Deep Neural Networks ◽

State Of The Art ◽

Protein Structures ◽

Training Set ◽

Multivariate Logistic Model ◽

And Training ◽

Better Than

Substances that can modify the androgen receptor pathway in humans and animals are entering the environment and food chain with the proven ability to disrupt hormonal systems, leading to toxicity and adverse effects on reproduction, brain development, and prostate cancer, among others. State-of-the-art databases with experimental data on the effects of chemicals in humans, chimps, and rats have been used to build machine learning classifiers and regressors and to evaluate these on independent sets. Different featurizations, algorithms, and protein structures lead to different results, with deep neural networks (DNNs) on user-defined physicochemically-relevant features developed for this work outperforming graph convolutional, random forest, and large featurizations. The results show that these user-provided structure-, ligand-, and statistically-based features and specific DNNs provided the best results as determined by AUC (0.87), MCC (0.47), and other metrics and by the interpretability and chemical meaning of the descriptors/features. In addition, the same features in the DNN method performed better than in a multivariate logistic model: validation MCC = 0.468 and training MCC = 0.868 for the present work compared to evaluation set MCC = 0.2036 and training set MCC = 0.5364 for the multivariate logistic regression on the full, unbalanced set. Techniques of this type may improve AR and toxicity description and prediction, improving assessment and design of compounds. Source code and data are available at https://github.com/AlfonsoTGarcia-Sosa/ML

Dynamic Residual Dense Network for Image Denoising

Sensors ◽

10.3390/s19173809 ◽

2019 ◽

Vol 19 (17) ◽

pp. 3809 ◽

Cited By ~ 8

Author(s):

Yuda Song ◽

Yunfang Zhu ◽

Xin Du

Keyword(s):

Real World ◽

State Of The Art ◽

Computational Cost ◽

Dynamic Network ◽

Input Image ◽

Dense Network ◽

Deep Convolutional Neural Networks ◽

Great Performance ◽

Image Noise Reduction ◽

Better Than

Deep convolutional neural networks have achieved great performance on various image restoration tasks. Specifically, the residual dense network (RDN) has achieved great results on image noise reduction by cascading multiple residual dense blocks (RDBs) to make full use of hierarchical features. However, the RDN performs well only when denoising at a single noise level, and its computational cost increases significantly with the number of RDBs while the denoising quality improves only slightly. To overcome this, we propose the dynamic residual dense network (DRDN), a dynamic network that can selectively skip some RDBs based on the amount of noise in the input image. Moreover, the DRDN allows the denoising strength to be modified manually to get the best outputs, which can make the network more effective for real-world denoising. Our proposed DRDN performs better than the RDN and reduces the computational cost by 40–50%. Furthermore, we surpass the state-of-the-art CBDNet by 1.34 dB on the real-world noise benchmark.
