Knowledge Distillation for Semantic Segmentation Using Channel and Spatial Correlations and Adaptive Cross Entropy

Sensors ◽  
2020 ◽  
Vol 20 (16) ◽  
pp. 4616
Author(s):  
Sangyong Park ◽  
Yong Seok Heo

In this paper, we propose an efficient knowledge distillation method to train light networks using heavy networks for semantic segmentation. Most semantic segmentation networks that exhibit good accuracy are based on computationally expensive networks. These networks are not suitable for mobile applications using vision sensors, because computational resources are limited in these environments. From this perspective, knowledge distillation, which transfers knowledge from heavy networks acting as teachers to light networks acting as students, is a suitable methodology. Although previous knowledge distillation approaches have been proven to improve the performance of student networks, most methods have some limitations. First, they tend to use only the spatial correlation of feature maps and ignore the relational information of their channels. Second, they can transfer false knowledge when the results of the teacher networks are not perfect. To address these two problems, we propose two loss functions: a channel and spatial correlation (CSC) loss function and an adaptive cross entropy (ACE) loss function. The former computes the full relationship of both the channel and spatial information in the feature map, and the latter adaptively exploits one-hot encodings using the ground truth labels and the probability maps predicted by the teacher network. To evaluate our method, we conduct experiments on two scene parsing datasets: Cityscapes and CamVid. Our method achieves significantly better performance than previous methods.
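The abstract does not spell out the ACE formulation, but one plausible reading of "adaptively exploits one-hot encodings using the ground truth labels and the probability maps predicted by the teacher" is a per-pixel target that switches between the teacher's soft probabilities (where the teacher is correct) and the one-hot ground truth (where it is not). A minimal numpy sketch under that assumption, with all names hypothetical:

```python
import numpy as np

def adaptive_cross_entropy(student_probs, teacher_probs, gt_labels, eps=1e-12):
    """Per-pixel cross entropy whose target switches between the teacher's
    probability map (where the teacher's argmax matches the ground truth)
    and the one-hot ground truth (where it does not).
    student_probs, teacher_probs: (H, W, C) softmax outputs.
    gt_labels: (H, W) integer class indices."""
    h, w, c = student_probs.shape
    one_hot = np.eye(c)[gt_labels]                               # (H, W, C)
    teacher_correct = (teacher_probs.argmax(-1) == gt_labels)[..., None]
    target = np.where(teacher_correct, teacher_probs, one_hot)
    return -np.mean(np.sum(target * np.log(student_probs + eps), axis=-1))
```

This captures only the gating idea; the published loss may weight or blend the two targets differently.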

Sensors ◽  
2020 ◽  
Vol 20 (6) ◽  
pp. 1737 ◽  
Author(s):  
Tae-young Ko ◽  
Seung-ho Lee

This paper proposes a novel method of semantic segmentation, consisting of a modified dilated residual network, an atrous pyramid pooling module, and backpropagation, that is applicable to augmented reality (AR). In the proposed method, the modified dilated residual network extracts a feature map from the original images and maintains spatial information. The atrous pyramid pooling module places convolutions in parallel and layers feature maps in a pyramid shape to extract objects occupying small areas in the image; these are converted into one channel using a 1 × 1 convolution. Backpropagation compares the semantic segmentation obtained through convolution from the final feature map with the ground truth provided by a database. Losses can be reduced by applying backpropagation to the modified dilated residual network to change the weighting. The proposed method was compared with other methods on the Cityscapes and PASCAL VOC 2012 databases. The proposed method achieved accuracies of 82.8 and 89.8 mean intersection over union (mIoU) and frame rates of 61 and 64.3 frames per second (fps) for the Cityscapes and PASCAL VOC 2012 databases, respectively. These results prove the applicability of the proposed method to natural AR applications running in real time, because the frame rate exceeds 60 fps.
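Two mechanics from this abstract can be sketched without the paper's details: the effective extent of a dilated (atrous) kernel, which grows as k + (k−1)(d−1) with dilation rate d, and the 1 × 1 convolution that fuses parallel branch outputs into one channel as a per-pixel weighted sum. A minimal sketch with hypothetical names:

```python
import numpy as np

def effective_kernel_size(k, d):
    """Effective spatial extent of a k x k convolution with dilation rate d:
    the d-1 gaps inserted between taps widen coverage without adding weights."""
    return k + (k - 1) * (d - 1)

def fuse_1x1(feature_maps, weights):
    """Fuse parallel branch outputs with a 1 x 1 convolution, i.e. a
    per-pixel weighted sum over the stacked channel dimension.
    feature_maps: list of (H, W) maps; weights: (len(feature_maps),)."""
    stacked = np.stack(feature_maps, axis=-1)    # (H, W, C)
    return stacked @ weights                     # (H, W)
```

For example, a 3 × 3 kernel with rate 6 covers a 13 × 13 region, which is how parallel atrous branches see small and large objects at once.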


Author(s):  
Mingmin Zhen ◽  
Jinglu Wang ◽  
Lei Zhou ◽  
Tian Fang ◽  
Long Quan

Semantic segmentation is pixel-wise classification which retains critical spatial information. "Feature map reuse" has been commonly adopted in CNN-based approaches to take advantage of feature maps in the early layers for the later spatial reconstruction. Along this direction, we go a step further by proposing a fully dense neural network with an encoder-decoder structure that we abbreviate as FDNet. For each stage in the decoder module, feature maps of all the previous blocks are adaptively aggregated to feed forward as input. On the one hand, it reconstructs the spatial boundaries accurately; on the other hand, it learns more efficiently thanks to improved gradient backpropagation. In addition, we propose a boundary-aware loss function that focuses more attention on the pixels near boundaries, which improves the labeling of "hard examples". We demonstrate that FDNet achieves the best performance over previous works on two benchmark datasets, PASCAL VOC 2012 and NYUDv2, when training on other datasets is not considered.
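The boundary-aware loss is described only as focusing attention on pixels near the boundary. A common way to realize that, sketched here as an assumption rather than the paper's formulation, is to up-weight the per-pixel loss wherever a pixel's label differs from a 4-neighbour:

```python
import numpy as np

def boundary_weights(labels, alpha=2.0):
    """Weight map that up-weights pixels whose label differs from any
    4-neighbour (an approximate object boundary). labels: (H, W) ints.
    Boundary pixels get weight 1 + alpha; interior pixels get weight 1."""
    boundary = np.zeros_like(labels, dtype=bool)
    boundary[:-1, :] |= labels[:-1, :] != labels[1:, :]   # down neighbour
    boundary[1:, :]  |= labels[1:, :]  != labels[:-1, :]  # up neighbour
    boundary[:, :-1] |= labels[:, :-1] != labels[:, 1:]   # right neighbour
    boundary[:, 1:]  |= labels[:, 1:]  != labels[:, :-1]  # left neighbour
    return 1.0 + alpha * boundary
```

The resulting map would multiply a standard per-pixel cross entropy before averaging.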


Author(s):  
Zhenzhen Yang ◽  
Pengfei Xu ◽  
Yongpeng Yang ◽  
Bing-Kun Bao

The U-Net has become the most popular structure in medical image segmentation in recent years. Although its performance for medical image segmentation is outstanding, a large number of experiments demonstrate that the classical U-Net network architecture seems to be insufficient when the size of segmentation targets changes and imbalance arises between target and background in different forms of segmentation. To improve the U-Net network architecture, we develop a new architecture named densely connected U-Net (DenseUNet) in this article. The proposed DenseUNet network adopts a dense block to improve the feature extraction capability and employs a multi-feature fuse block that fuses feature maps of different levels to increase the accuracy of feature extraction. In addition, in view of the advantages of the cross entropy and the dice loss functions, a new loss function for the DenseUNet network is proposed to deal with the imbalance between target and background. Finally, we test the proposed DenseUNet network and compare it with the multi-resolutional U-Net (MultiResUNet) and the classic U-Net networks on three different datasets. The experimental results show that the DenseUNet network performs significantly better than the MultiResUNet and the classic U-Net networks.


2020 ◽  
Vol 10 (3) ◽  
pp. 661-666 ◽  
Author(s):  
Shaoguo Cui ◽  
Moyu Chen ◽  
Chang Liu

Breast cancer is one of the leading causes of death among women worldwide. The clinical medical system urgently needs an accurate and automatic breast segmentation method in order to detect breast ultrasound lesions. Recently, some studies have shown that deep learning methods based on fully convolutional networks demonstrate competitive performance in breast ultrasound segmentation. However, some features are missed in the Unet during down-sampling, which results in low segmentation accuracy. Furthermore, there is a semantic gap between the feature maps of the decoder and encoder in Unet, so the simple fusion of high- and low-level features is not conducive to the semantic classification of pixels. In addition, the poor quality of breast ultrasound also affects the accuracy of image segmentation. To solve these problems, we propose a new end-to-end network model called Dense skip Unet (DsUnet), which consists of the Unet backbone, short skip connections and deep supervision. The proposed method can effectively avoid the loss of feature information caused by down-sampling and implement the fusion of multi-level semantic information. We used a new loss function to optimize the DsUnet, composed of a binary cross-entropy and a dice coefficient. We employed the True Positive Fraction (TPF), False Positives per image (FPs) and F-measure as performance metrics for evaluating various methods. In this paper, we adopted the UDIAT 212 dataset, and the experimental results validate that our new approach achieved better performance than other existing methods in detecting and segmenting ultrasound breast lesions. When we used the DsUnet model and the new loss function (binary cross-entropy + dice coefficient), the best performance indexes were achieved, i.e., 0.87 in TPF, 0.13 in FPs/image and 0.86 in F-measure.
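The combined "binary cross-entropy + dice coefficient" loss used here (and by the DenseUNet abstract above) has a standard form: the pixel-wise BCE term handles per-pixel classification while the soft dice term counteracts foreground/background imbalance. A minimal numpy sketch, with the smoothing constant as an assumption:

```python
import numpy as np

def bce_dice_loss(pred, target, smooth=1.0, eps=1e-12):
    """Sum of pixel-wise binary cross-entropy and soft dice loss.
    pred: predicted foreground probabilities in [0, 1]; target: binary mask."""
    bce = -np.mean(target * np.log(pred + eps)
                   + (1 - target) * np.log(1 - pred + eps))
    inter = np.sum(pred * target)
    dice = (2 * inter + smooth) / (np.sum(pred) + np.sum(target) + smooth)
    return bce + (1 - dice)   # dice term is 0 for a perfect prediction
```

Because the dice term is computed over the whole mask rather than per pixel, a small lesion contributes to it as strongly as the large background, which is the imbalance argument both abstracts make.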


2021 ◽  
Author(s):  
Chao Lu ◽  
Fansheng Chen ◽  
Xiaofeng Su ◽  
Dan Zeng

Abstract Infrared technology is widely used in precision guidance and mine detection since it can capture the heat radiated outward from the target object. We use infrared (IR) thermography to obtain infrared images of buried objects. Compared to visible images, infrared images present poor resolution, low contrast, and fuzzy visual effects, which make it difficult to segment the target object, especially in complex backgrounds. In this condition, traditional segmentation methods cannot perform well on infrared images since they are easily disturbed by noise and non-target objects in the images. With the advance of deep convolutional neural networks (CNNs), deep learning-based methods have made significant improvements in the semantic segmentation task. However, few of them address infrared image semantic segmentation, which is a more challenging scenario compared to visible images. Moreover, the lack of an infrared image dataset is also a problem for current methods based on deep learning. We propose a multi-scale attentional feature fusion (MS-AFF) module for infrared image semantic segmentation to solve this problem. Specifically, we integrate a series of feature maps from different levels by an atrous spatial pyramid structure. In this way, the model can obtain rich representation ability on infrared images. Besides, a global spatial information attention module is employed to let the model focus on the target region and reduce background disturbance in infrared images. In addition, we propose an infrared segmentation dataset based on the infrared thermal imaging system. Extensive experiments conducted on the infrared image segmentation dataset show the superiority of our method.
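The global spatial attention module is not detailed in the abstract; a common pattern it may follow is to pool the feature tensor across channels into a single spatial map, squash it to (0, 1), and reweight every channel by it. A minimal sketch under that assumption:

```python
import numpy as np

def spatial_attention(features):
    """Spatial attention sketch: average across channels, gate each pixel
    with a sigmoid, and reweight every channel by the resulting map.
    features: (C, H, W) tensor."""
    pooled = features.mean(axis=0)             # (H, W) channel-average map
    attn = 1.0 / (1.0 + np.exp(-pooled))       # sigmoid gate per pixel
    return features * attn[None, :, :]         # broadcast over channels
```

In a trained module the gate would come from a small learned convolution over the pooled map rather than the raw average, but the reweighting mechanics are the same.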


2021 ◽  
Vol 13 (16) ◽  
pp. 3187
Author(s):  
Xinchun Wei ◽  
Xing Li ◽  
Wei Liu ◽  
Lianpeng Zhang ◽  
Dayu Cheng ◽  
...  

Deep learning techniques have greatly improved the efficiency and accuracy of building extraction using remote sensing images. However, high-quality building outline extraction results that can be applied to the field of surveying and mapping remain a significant challenge. In practice, most building extraction tasks are executed manually. Therefore, an automated procedure that extracts building outlines with precise positions is required. In this study, we directly used the U2-net semantic segmentation model to extract the building outline. The extraction results showed that the U2-net model can provide the building outline with better accuracy and a more precise position than other models, based on comparisons with semantic segmentation models (Segnet, U-Net, and FCN) and edge detection models (RCF, HED, and DexiNed) applied to two datasets (Nanjing and Wuhan University (WHU)). We also modified the binary cross-entropy loss function in the U2-net model into a multiclass cross-entropy loss function to directly generate the binary map with the building outline and background. We achieved a further refined outline of the building, thus showing that with the modified U2-net model it is not necessary to use non-maximum suppression as a post-processing step, as in the other edge detection models, to refine the edge map. Moreover, the modified model is less affected by the sample imbalance problem. Finally, we created an image-to-image program to further validate the modified U2-net semantic segmentation model for building outline extraction.


Author(s):  
Y. Xu ◽  
Z. Sun ◽  
R. Boerner ◽  
T. Koch ◽  
L. Hoegner ◽  
...  

In this work, we report a novel way of generating a ground truth dataset for analyzing point clouds from different sensors and for the validation of algorithms. Instead of directly labeling a large number of 3D points, which requires time-consuming manual work, a multi-resolution 3D voxel grid for the testing site is generated. Then, with the help of a set of basic labeled points from the reference dataset, we can generate a 3D labeled space of the entire testing site at different resolutions. Specifically, an octree-based voxel structure is applied to voxelize the annotated reference point cloud, by which all the points are organized into 3D grids of multiple resolutions. When automatically annotating new testing point clouds, a voting-based approach is applied to the labeled points within multi-resolution voxels, in order to assign a semantic label to the 3D space represented by each voxel. Lastly, robust line- and plane-based fast registration methods are developed for aligning point clouds obtained via various sensors. Benefiting from the labeled 3D spatial information, we can easily create new annotated 3D point clouds for different sensors of the same scene directly by looking up the labels of the 3D spaces in which the points are located, which is convenient for the validation and evaluation of algorithms related to point cloud interpretation and semantic segmentation.
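The core voting step can be sketched independently of the octree machinery: quantize each labeled reference point to a voxel key and take the majority label per occupied voxel; a new point is then annotated by looking up its voxel. A minimal single-resolution sketch (the paper uses multi-resolution voxels; names are hypothetical):

```python
import numpy as np
from collections import Counter

def vote_voxel_labels(points, labels, voxel_size):
    """Majority-vote a semantic label for each occupied voxel from the
    labelled reference points that fall inside it.
    points: (N, 3) coordinates; labels: length-N integer labels.
    Returns a dict mapping integer voxel keys to the winning label."""
    keys = np.floor(points / voxel_size).astype(int)
    votes = {}
    for key, lab in zip(map(tuple, keys), labels):
        votes.setdefault(key, Counter())[lab] += 1
    return {key: c.most_common(1)[0][0] for key, c in votes.items()}
```

A testing point at position p is then labeled via `table.get(tuple(np.floor(p / voxel_size).astype(int)))`, falling back to a coarser voxel level when the fine voxel is empty.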


2021 ◽  
Vol 2083 (4) ◽  
pp. 042083
Author(s):  
Shuhan Liu

Abstract Semantic segmentation is a traditional task that requires large pixel-level ground truth label datasets, which are time-consuming and expensive to produce. Recent developments in weakly-supervised settings have shown that reasonable performance can be obtained using only image-level labels. Classification is often used as a proxy task to train deep neural networks, from which attention maps are extracted. The classification task needs less supervision information but only locates the most discriminative part of the object. For this purpose, we propose a new end-to-end counter-wipe network. Compared with the baseline network, we propose applying a graph neural network to obtain the initial class activation map (CAM). We also propose training with a joint loss function to prevent network weight sharing from causing the network to fall into a saddle point. Our experiments on the Pascal VOC2012 dataset show that 64.9% segmentation performance is obtained, an improvement of 2.1% over our baseline.
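The CAM extraction this pipeline builds on has a standard form: weight the final convolutional feature maps by the classifier weights of one class and sum over channels, giving a coarse localization map from image-level supervision alone. A minimal numpy sketch of that standard step (not the paper's graph-based variant):

```python
import numpy as np

def class_activation_map(features, fc_weights, class_idx):
    """Class activation map: weight the final convolutional feature maps
    by one class's classifier weights and sum over channels.
    features: (C, H, W); fc_weights: (num_classes, C)."""
    cam = np.tensordot(fc_weights[class_idx], features, axes=1)  # (H, W)
    cam -= cam.min()
    return cam / (cam.max() + 1e-12)   # normalise to [0, 1]
```

The abstract's point is that such maps highlight only the most discriminative region, which is what the counter-wipe (erasing) mechanism is meant to expand.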


Sensors ◽  
2019 ◽  
Vol 19 (17) ◽  
pp. 3787
Author(s):  
Łukasz Chechliński ◽  
Barbara Siemiątkowska ◽  
Michał Majewski

Automated weeding is an important research area in agrorobotics. Weeds can be removed mechanically or with the precise usage of herbicides. Deep Learning techniques have achieved state-of-the-art results in many computer vision tasks; however, their deployment on low-cost mobile computers is still challenging. The described system contains several novelties compared both with its previous version and with related work. It is part of a project on an automatic weeding machine developed by the Warsaw University of Technology and MCMS Warka Ltd. The obtained models reach satisfying accuracy (detecting 47–67% of weed area, misclassifying as weed 0.1–0.9% of crop area) at over 10 FPS on the Raspberry Pi 3B+ computer. It was tested on four different plant species at different growth stages and lighting conditions. The system performing semantic segmentation is based on Convolutional Neural Networks. Its custom architecture combines U-Net, MobileNets, DenseNet and ResNet concepts. The amount of manual ground truth labeling needed was significantly decreased by knowledge distillation: the final model learns to mimic an ensemble of complex models on a large database of unlabeled data. A further decrease of inference time was obtained by two custom modifications: the use of separable convolutions in the DenseNet block and a reduction in the number of channels in each layer. In the authors' opinion, the described novelties can be easily transferred to other agrorobotics tasks.
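The inference-time saving from separable convolutions comes from a simple parameter count: a standard k × k convolution costs k²·C_in·C_out weights, while its depthwise-separable replacement costs k²·C_in (depthwise) plus C_in·C_out (1 × 1 pointwise). A quick sketch of the arithmetic:

```python
def conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution."""
    return k * k * c_in * c_out

def separable_conv_params(k, c_in, c_out):
    """Weights in a depthwise-separable replacement: one k x k depthwise
    filter per input channel plus a 1 x 1 pointwise convolution."""
    return k * k * c_in + c_in * c_out
```

For a 3 × 3 layer with 64 input and 64 output channels this is 36,864 versus 4,672 weights, roughly an 8x reduction, which is why MobileNet-style blocks suit a Raspberry Pi-class computer.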


2020 ◽  
Vol 13 (1) ◽  
pp. 119
Author(s):  
Song Ouyang ◽  
Yansheng Li

Although the deep semantic segmentation network (DSSN) has been widely used in remote sensing (RS) image semantic segmentation, it still does not fully exploit the spatial relationship cues between objects when extracting deep visual features through convolutional filters and pooling layers. In fact, the spatial distribution of objects from different classes has a strong correlation characteristic. For example, buildings tend to be close to roads. In view of the strong appearance extraction ability of DSSN and the powerful topological relationship modeling capability of the graph convolutional neural network (GCN), a DSSN-GCN framework, which combines the advantages of DSSN and GCN, is proposed in this paper for RS image semantic segmentation. To improve the appearance extraction ability, this paper proposes a new DSSN called the attention residual U-shaped network (AttResUNet), which leverages residual blocks to encode feature maps and an attention module to refine the features. For the GCN, a graph is built where the nodes are denoted by superpixels and the edge weights are calculated by considering both the spectral information and the spatial information of the nodes. The AttResUNet is trained to extract the high-level features that initialize the graph nodes. The GCN then combines node features and spatial relationships between nodes to conduct classification. It is worth noting that the use of spatial relationship knowledge boosts the performance and robustness of the classification module. In addition, benefiting from modeling the GCN at the superpixel level, the boundaries of objects are restored to a certain extent and there is less pixel-level noise in the final classification result. Extensive experiments on two publicly open datasets show that the DSSN-GCN model outperforms the competitive baseline (i.e., the DSSN model) and that DSSN-GCN with AttResUNet achieves the best performance, which demonstrates the advantage of our method.
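The graph weight "calculated by considering the spectral information and spatial information of the nodes" is typically a product of Gaussian affinities over feature distance and centroid distance; the exact form here is an assumption, and the bandwidths are hypothetical parameters:

```python
import numpy as np

def superpixel_edge_weight(feat_i, feat_j, pos_i, pos_j,
                           sigma_f=1.0, sigma_p=1.0):
    """Gaussian affinity between two superpixel nodes, combining spectral
    (mean feature) distance and spatial (centroid) distance."""
    df = np.sum((np.asarray(feat_i) - np.asarray(feat_j)) ** 2)
    dp = np.sum((np.asarray(pos_i) - np.asarray(pos_j)) ** 2)
    return np.exp(-df / (2 * sigma_f ** 2)) * np.exp(-dp / (2 * sigma_p ** 2))
```

Identical, co-located nodes get weight 1, and the weight decays as superpixels become spectrally or spatially farther apart, which encodes the "buildings tend to be close to roads" kind of prior the GCN propagates.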

