“In-Network Ensemble”: Deep Ensemble Learning with Diversified Knowledge Distillation

2021 ◽  
Vol 12 (5) ◽  
pp. 1-19
Author(s):  
Xingjian Li ◽  
Haoyi Xiong ◽  
Zeyu Chen ◽  
Jun Huan ◽  
Cheng-Zhong Xu ◽  
...  

Ensemble learning is a widely used technique to train deep convolutional neural networks (CNNs) for improved robustness and accuracy. While existing algorithms usually first train multiple diversified networks and then assemble them into an aggregated classifier, we propose a novel learning paradigm, namely, “In-Network Ensemble” (INE), that incorporates the diversity of multiple models by training a SINGLE deep neural network. Specifically, INE segments the outputs of the CNN into multiple independent classifiers, each of which is further fine-tuned for better accuracy through a so-called diversified knowledge distillation process. We then aggregate the fine-tuned independent classifiers using an Averaging-and-Softmax operator to obtain the final ensemble classifier. Note that, in the supervised learning setting, INE trains the CNN from random initialization, while, in the transfer learning setting, it can also start from a pre-trained model to incorporate knowledge learned from additional datasets. Extensive experiments have been conducted on eight large-scale real-world datasets, including CIFAR, ImageNet, and Stanford Cars, among others, with common deep network architectures such as VGG, ResNet, and Wide ResNet. We evaluated the method under two tasks: supervised learning and transfer learning. The results show that INE outperforms state-of-the-art deep ensemble learning algorithms with improved accuracy.
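The Averaging-and-Softmax aggregation described above can be sketched as follows. This is a minimal numpy illustration; the head count, array shapes, and the choice to average head logits before the softmax are assumptions, not the paper's exact implementation:

```python
import numpy as np

def softmax(z):
    # numerically stable softmax over the last axis
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def ine_aggregate(logits, num_heads):
    """Averaging-and-Softmax sketch: split the single network's output
    into `num_heads` independent classifier heads, average their logits,
    then apply softmax to obtain the ensemble prediction."""
    heads = np.split(logits, num_heads, axis=-1)  # num_heads arrays of (n, C)
    avg = np.mean(heads, axis=0)                  # average over the heads
    return softmax(avg)

# hypothetical example: batch of 2, 3 heads x 4 classes = 12 network outputs
logits = np.random.randn(2, 12)
probs = ine_aggregate(logits, num_heads=3)
```

Averaging in logit space (rather than probability space) is one common variant; the abstract does not specify which the authors use.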

Symmetry ◽  
2019 ◽  
Vol 11 (2) ◽  
pp. 133 ◽  
Author(s):  
Yang Li ◽  
Ying Lv ◽  
Suge Wang ◽  
Jiye Liang ◽  
Juanzi Li ◽  
...  

A large-scale, high-quality training dataset is an important prerequisite for learning an effective classifier for text sentiment classification. However, manually constructing such a training dataset with sentiment labels is labor-intensive and time-consuming. Therefore, based on the idea of effectively utilizing unlabeled samples, this paper proposes a unified framework for text sentiment classification that covers the whole semi-supervised learning process, from seed selection and iterative modification of the training text set to the co-training strategy of the classifier. To provide a principled basis for selecting seed texts and modifying the training text set, three measures are defined: the cluster similarity degree of an unlabeled text, the cluster uncertainty degree of a pseudo-label text to a learner, and the reliability degree of a pseudo-label text to a learner. With these measures, a seed selection method based on Random Swap clustering, a hybrid modification method of the training text set based on active learning and self-learning, and an alternating co-training strategy for an ensemble classifier of Maximum Entropy and Support Vector Machine models are proposed and combined into our framework. Experimental results on three Chinese datasets (COAE2014, COAE2015, and a hotel review dataset) and five English datasets (Books, DVD, Electronics, Kitchen, and MR) verify the effectiveness of the proposed framework.
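The reliability-based admission of pseudo-label texts can be illustrated with a simple confidence-threshold filter. This is a hedged stand-in using the top predicted class probability as the reliability score, not the paper's exact measure:

```python
import numpy as np

def select_reliable(probs, threshold=0.9):
    """Keep only pseudo-labeled samples whose predicted class probability
    reaches `threshold` -- a stand-in for a reliability degree of a
    pseudo-label text to a learner. Returns kept indices and labels."""
    conf = probs.max(axis=1)          # confidence of the top class
    labels = probs.argmax(axis=1)     # pseudo-label = most likely class
    mask = conf >= threshold
    return np.flatnonzero(mask), labels[mask]

# three unlabeled texts scored by a learner over two sentiment classes
probs = np.array([[0.95, 0.05],
                  [0.55, 0.45],
                  [0.10, 0.90]])
idx, labels = select_reliable(probs, threshold=0.9)
```

In a co-training loop, each learner would add only the reliable pseudo-labels produced by its peer to its own training set.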


2011 ◽  
Vol 09 (04) ◽  
pp. 521-540 ◽  
Author(s):  
REIJI TERAMOTO ◽  
TSUYOSHI KATO

In the drug discovery process, the metabolic fate of drugs is crucially important for preventing drug–drug interactions. Therefore, P450 isozyme selectivity prediction is an important task for screening drugs with appropriate metabolism profiles. Recently, large-scale activity data for five P450 isozymes (CYP1A2, CYP2C9, CYP3A4, CYP2D6, and CYP2C19) have been obtained using quantitative high-throughput screening with a bioluminescence assay. Although some isozymes share similar selectivities, conventional supervised learning algorithms learn a prediction model for each P450 isozyme independently; they are unable to exploit the activity data of the other P450 isozymes to improve the predictive performance for each isozyme's selectivity. To address this issue, we apply transfer learning, which uses the activity data of the other isozymes to learn a prediction model from multiple P450 isozymes. Using the large-scale selectivity dataset of the five P450 isozymes, we evaluate the model's predictive performance. Experimental results show that, overall, our algorithm outperforms conventional supervised learning algorithms such as the support vector machine (SVM), the weighted k-nearest neighbor classifier, bagging, AdaBoost, and latent semantic indexing (LSI). Moreover, our results show that the predictive performance of our algorithm is improved by exploiting the activity data of multiple P450 isozymes in the learning process. Our algorithm can thus be an effective tool for P450 selectivity prediction of new chemical entities using multiple P450 isozyme activity data.
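The idea of borrowing activity data from related isozymes can be sketched as simple instance transfer: pool down-weighted auxiliary-isozyme data with the target isozyme's data before fitting. The weighting scheme and the nearest-centroid classifier below are illustrative assumptions, not the authors' algorithm:

```python
import numpy as np

def weighted_centroids(X_tgt, y_tgt, X_src, y_src, src_weight=0.5):
    """Instance-transfer sketch: pool target-isozyme samples (weight 1.0)
    with down-weighted auxiliary-isozyme samples, then compute one
    weighted centroid per activity class."""
    X = np.vstack([X_tgt, X_src])
    y = np.concatenate([y_tgt, y_src])
    w = np.concatenate([np.ones(len(y_tgt)),
                        np.full(len(y_src), src_weight)])
    cents = {}
    for c in np.unique(y):
        m = y == c
        cents[c] = np.average(X[m], axis=0, weights=w[m])
    return cents

# toy descriptors: target isozyme data plus auxiliary-isozyme data
X_tgt = np.array([[0.0, 0.0], [1.0, 1.0]]); y_tgt = np.array([0, 1])
X_src = np.array([[0.2, 0.0], [0.8, 1.0]]); y_src = np.array([0, 1])
cents = weighted_centroids(X_tgt, y_tgt, X_src, y_src, src_weight=0.5)
```

A new compound would then be assigned the class of its nearest centroid; richer transfer schemes share model parameters rather than instances.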


2021 ◽  
Author(s):  
Jie Bao ◽  
Ying Hou ◽  
Rui Zhi ◽  
Ximing Wang ◽  
Haibin Shi ◽  
...  

Abstract Purpose To develop a generalizable model, namely PRISK, for the prediction of Gleason grade and prognostic outcome in prostate cancer (PCa) from multiple clinical factors and multiparametric (mp) MRI using stacked-ensemble learning. Methods PRISK is designed primarily to assess PCa Gleason grade among benign (pG0), 3 + 3 (pG1), 3 + 4 (pG2), 4 + 3 (pG3), and ≥ 4 + 4 (pG4), and secondarily to predict biochemical recurrence (BCR) after radical prostatectomy (RP). PRISK was developed with stacked-ensemble learning of large-scale clinical identifications and mpMRI data in 671 training cases and was validated in 232 internal and 539 external cases. Results The stacked-ensemble learning of mpMRI yielded a Radiomics-score and 5 transfer learning signatures from 5 deep transfer learning embedders. PRISK, built with 10 clinical and imaging embedded predictors, achieved an area under the receiver operating characteristic (ROC) curve of 0.783, 0.798, and 0.762 for classifying Gleason grade in the training, internal validation, and external validation data, respectively. Notably, the combined use of prostate-specific antigen (PSA), PI-RADS, and the Radiomics-score had an excellent negative predictive value (94.1%) for clinically insignificant disease (pG0-1) and a high positive predictive value (79.8%) for high-risk PCa (pG4). PSA ≥ 20 ng/ml (odds ratio [OR], 1.58; 95% confidence interval [CI], 1.20–2.08; p = 0.001) and PRISK ≥ G3 (OR, 1.45; 95% CI, 1.12–1.88; p = 0.005) were independent predictors of BCR, with a C-index of 0.76 (95% CI, 0.73–0.79) for predicting BCR by Cox analysis. Conclusions PRISK offers a noninvasive alternative for stratifying PCa Gleason grade, a step toward PCa risk stratification.
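The stacking step, in which base-learner outputs feed a meta-learner, can be sketched as follows. The fixed linear meta-model, the binary task, and all names are hypothetical, purely to illustrate the stacked-ensemble pattern rather than PRISK itself:

```python
import numpy as np

def stack_predict(base_probs, meta_weights):
    """Stacked-ensemble sketch: concatenate base-learner class
    probabilities into a meta-feature vector, score it with a linear
    meta-learner, and squash to a risk score with a sigmoid."""
    feats = np.concatenate(base_probs, axis=1)  # (n, n_base * n_classes)
    score = feats @ meta_weights                # linear meta-model
    return 1.0 / (1.0 + np.exp(-score))         # risk in (0, 1)

# two hypothetical base learners, binary task, 3 samples
p1 = np.array([[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]])
p2 = np.array([[0.8, 0.2], [0.5, 0.5], [0.1, 0.9]])
w = np.array([-1.0, 1.0, -1.0, 1.0])            # favors the positive class
risk = stack_predict([p1, p2], w)
```

In practice, the meta-weights are learned from out-of-fold base-learner predictions to avoid leaking training labels into the meta-level.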


Author(s):  
Jin Zhou ◽  
Qing Zhang ◽  
Jian-Hao Fan ◽  
Wei Sun ◽  
Wei-Shi Zheng

Abstract Recent image aesthetic assessment methods have achieved remarkable progress due to the emergence of deep convolutional neural networks (CNNs). However, these methods focus primarily on predicting the generally perceived preference for an image, which often limits their practical utility, since each user may have completely different preferences for the same image. To address this problem, this paper presents a novel approach for predicting personalized image aesthetics that fit an individual user’s personal taste. We achieve this in a coarse-to-fine manner, by joint regression and learning from pairwise rankings. Specifically, we first collect a small set of personal images from a user and invite him/her to rank the preference of some randomly sampled image pairs. We then search for the K-nearest neighbors of the personal images within a large-scale dataset labeled with average human aesthetic scores, and use these images and their associated scores to train a generic aesthetic assessment model via CNN-based regression. Next, we fine-tune the generic model to accommodate the personal preference by training over the rankings with a pairwise hinge loss. Experiments demonstrate that our method can effectively learn personalized image aesthetic preferences, clearly outperforming state-of-the-art methods. Moreover, we show that the learned personalized image aesthetics benefit a wide variety of applications.
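The pairwise hinge loss used for fine-tuning over the user's rankings has a standard form, sketched below; the margin value is an assumption, as the abstract does not specify it:

```python
import numpy as np

def pairwise_hinge(score_pref, score_other, margin=1.0):
    """Pairwise ranking hinge loss: zero when the preferred image is
    scored at least `margin` above the other image, otherwise a linear
    penalty on the shortfall."""
    return np.maximum(0.0, margin - (score_pref - score_other))

# model scores for two ranked pairs: (preferred, other)
losses = pairwise_hinge(np.array([2.0, 0.3]), np.array([0.5, 0.2]))
```

During fine-tuning, the gradient of this loss nudges the CNN's score for the preferred image up and the other image down whenever the margin is violated.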


2021 ◽  
Vol 21 (3) ◽  
pp. 1-17
Author(s):  
Wu Chen ◽  
Yong Yu ◽  
Keke Gai ◽  
Jiamou Liu ◽  
Kim-Kwang Raymond Choo

In existing ensemble learning algorithms (e.g., random forest), each base learner’s model needs the entire dataset for sampling and training. However, this may not be practical in many real-world applications, and it incurs additional computational costs. To achieve better efficiency, we propose a decentralized framework: Multi-Agent Ensemble. The framework leverages edge computing to facilitate ensemble learning techniques, balancing access restrictions (each learner sees only a small sub-dataset) against accuracy enhancement. Specifically, network edge nodes (learners) are used to model classifications and predictions in our framework. Data is distributed to multiple base learners, which exchange data via an interaction mechanism to achieve improved prediction. The proposed approach relies on decentralized training rather than conventional centralized learning. Findings from experimental evaluations on 20 real-world datasets suggest that Multi-Agent Ensemble outperforms other ensemble approaches in terms of accuracy even though the base learners require fewer samples (i.e., a significant reduction in computation costs).
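The final aggregation of predictions from the edge learners can be illustrated with a plurality vote. This is a generic ensemble sketch, since the abstract does not detail the interaction mechanism between learners:

```python
from collections import Counter

def majority_vote(predictions):
    """Aggregate per-learner predictions (one list of labels per edge
    learner, aligned by sample index) into an ensemble prediction by
    plurality vote."""
    n_samples = len(predictions[0])
    return [Counter(p[i] for p in predictions).most_common(1)[0][0]
            for i in range(n_samples)]

# three edge learners, each trained on its own small sub-dataset,
# classifying the same three samples
preds = [[1, 0, 1], [1, 1, 0], [0, 0, 1]]
ensemble = majority_vote(preds)
```

Each learner votes from a model fitted only on its local sub-dataset, so no node ever needs the entire dataset.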


Electronics ◽  
2021 ◽  
Vol 10 (15) ◽  
pp. 1807
Author(s):  
Sascha Grollmisch ◽  
Estefanía Cano

Including unlabeled data in the training process of neural networks using Semi-Supervised Learning (SSL) has shown impressive results in the image domain, where state-of-the-art results were obtained with only a fraction of the labeled data. A commonality among recent SSL methods is that they rely strongly on the augmentation of unannotated data, a strategy that remains largely unexplored for audio data. In this work, SSL using the state-of-the-art FixMatch approach is evaluated on three audio classification tasks, covering music, industrial sounds, and acoustic scenes. The performance of FixMatch is compared to Convolutional Neural Networks (CNNs) trained from scratch, Transfer Learning, and SSL using the Mean Teacher approach. Additionally, a simple yet effective approach for selecting suitable augmentation methods for FixMatch is introduced. FixMatch with the proposed modifications consistently outperformed Mean Teacher and the CNNs trained from scratch. For the industrial sounds and music datasets, the CNN baseline performance obtained with the full dataset was reached with less than 5% of the initial training data, demonstrating the potential of recent SSL methods for audio data. Transfer Learning outperformed FixMatch only on the most challenging dataset, acoustic scene classification, showing that there is still room for improvement.
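The core of FixMatch's unlabeled objective, pseudo-labeling from a weakly-augmented view, masking out low-confidence predictions, and computing cross-entropy against the strongly-augmented view, can be sketched as follows (the threshold and shapes are illustrative, and the sketch omits the supervised term):

```python
import numpy as np

def fixmatch_unlabeled_loss(weak_probs, strong_logits, tau=0.95):
    """FixMatch unlabeled-loss sketch: take pseudo-labels from the
    weakly-augmented view, keep only predictions with confidence >= tau,
    and average cross-entropy on the strongly-augmented view over the
    retained samples."""
    pseudo = weak_probs.argmax(axis=1)           # pseudo-label per sample
    mask = weak_probs.max(axis=1) >= tau         # confidence gate
    # stable log-softmax of the strong-view logits
    z = strong_logits - strong_logits.max(axis=1, keepdims=True)
    logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    ce = -logp[np.arange(len(pseudo)), pseudo]   # per-sample cross-entropy
    return (ce * mask).sum() / max(mask.sum(), 1)

# sample 0 is confident (kept), sample 1 is not (masked out)
weak = np.array([[0.97, 0.03], [0.60, 0.40]])
strong = np.array([[5.0, 0.0], [0.0, 5.0]])
loss = fixmatch_unlabeled_loss(weak, strong)
```

For audio, the weak and strong views would come from spectrogram augmentations; selecting which augmentations to use is exactly the problem the paper's selection approach addresses.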


Technologies ◽  
2020 ◽  
Vol 9 (1) ◽  
pp. 2
Author(s):  
Ashish Jaiswal ◽  
Ashwin Ramesh Babu ◽  
Mohammad Zaki Zadeh ◽  
Debapriya Banerjee ◽  
Fillia Makedon

Self-supervised learning has gained popularity because of its ability to avoid the cost of annotating large-scale datasets. It is capable of adopting self-defined pseudo-labels as supervision and using the learned representations for several downstream tasks. In particular, contrastive learning has recently become a dominant component of self-supervised learning for computer vision, natural language processing (NLP), and other domains. It aims to embed augmented versions of the same sample close to each other while pushing away the embeddings of different samples. This paper provides an extensive review of self-supervised methods that follow the contrastive approach. The work explains commonly used pretext tasks in a contrastive learning setup, followed by the different architectures that have been proposed so far. Next, we present a performance comparison of different methods on multiple downstream tasks such as image classification, object detection, and action recognition. Finally, we conclude with the limitations of the current methods and the need for further techniques and future directions to make meaningful progress.
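The contrastive objective described above is commonly instantiated as an InfoNCE-style loss: each embedding should match its own augmented view against all other samples in the batch as negatives. A minimal numpy sketch, where the temperature and batch layout are assumptions:

```python
import numpy as np

def info_nce(z1, z2, temperature=0.1):
    """InfoNCE-style contrastive loss sketch: row i of z1 and row i of
    z2 are two augmented views of the same sample (positives); all
    other rows act as negatives. Embeddings are L2-normalized first."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    sim = z1 @ z2.T / temperature                     # (n, n) similarities
    sim = sim - sim.max(axis=1, keepdims=True)        # numerical stability
    logp = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(logp))                    # positives on diagonal

# aligned views give low loss; shuffling the pairing gives high loss
z = np.eye(3)
loss_aligned = info_nce(z, z)
loss_shifted = info_nce(z, np.roll(z, 1, axis=0))
```

Minimizing this loss pulls the two views of each sample together while pushing apart the embeddings of different samples, which is precisely the behavior the survey's contrastive framing describes.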

