Knowledge Distillation for Semantic Segmentation Using Channel and Spatial Correlations and Adaptive Cross Entropy

Sensors ◽  
2020 ◽  
Vol 20 (16) ◽  
pp. 4616
Author(s):  
Sangyong Park ◽  
Yong Seok Heo

In this paper, we propose an efficient knowledge distillation method to train light networks using heavy networks for semantic segmentation. Most semantic segmentation networks that exhibit good accuracy are based on computationally expensive networks. These networks are not suitable for mobile applications using vision sensors, because computational resources are limited in these environments. From this perspective, knowledge distillation, which transfers knowledge from heavy networks acting as teachers to light networks acting as students, is a suitable methodology. Although previous knowledge distillation approaches have been proven to improve the performance of student networks, most methods have some limitations. First, they tend to use only the spatial correlation of feature maps and ignore the relational information of their channels. Second, they can transfer false knowledge when the results of the teacher networks are not perfect. To address these two problems, we propose two loss functions: a channel and spatial correlation (CSC) loss function and an adaptive cross entropy (ACE) loss function. The former computes the full relationship of both the channel and spatial information in the feature map, and the latter adaptively exploits one-hot encodings using the ground truth labels and the probability maps predicted by the teacher network. To evaluate our method, we conduct experiments on two scene parsing datasets: Cityscapes and CamVid. Our method achieves significantly better performance than previous methods.
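The abstract does not spell out the ACE formulation, but one plausible reading of "adaptively exploits one-hot encodings using the ground truth labels and the probability maps predicted by the teacher" is a per-pixel target that switches between the teacher's soft probabilities (where the teacher is correct) and the one-hot ground truth (where it is not). A minimal numpy sketch under that assumption, with all names hypothetical:

```python
import numpy as np

def adaptive_cross_entropy(student_probs, teacher_probs, gt_labels, eps=1e-12):
    """Per-pixel cross entropy whose target switches between the teacher's
    probability map (where the teacher's argmax matches the ground truth)
    and the one-hot ground truth (where it does not).
    student_probs, teacher_probs: (H, W, C) softmax outputs.
    gt_labels: (H, W) integer class indices."""
    h, w, c = student_probs.shape
    one_hot = np.eye(c)[gt_labels]                               # (H, W, C)
    teacher_correct = (teacher_probs.argmax(-1) == gt_labels)[..., None]
    target = np.where(teacher_correct, teacher_probs, one_hot)
    return -np.mean(np.sum(target * np.log(student_probs + eps), axis=-1))
```

This captures only the gating idea; the published loss may weight or blend the two targets differently.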

Sensors ◽  
2020 ◽  
Vol 20 (6) ◽  
pp. 1737 ◽  
Author(s):  
Tae-young Ko ◽  
Seung-ho Lee

This paper proposes a novel method of semantic segmentation, consisting of a modified dilated residual network, an atrous pyramid pooling module, and backpropagation, that is applicable to augmented reality (AR). In the proposed method, the modified dilated residual network extracts a feature map from the original images and maintains spatial information. The atrous pyramid pooling module places convolutions in parallel and layers feature maps in a pyramid shape to extract objects occupying small areas in the image; these are converted into one channel using a 1 × 1 convolution. Backpropagation compares the semantic segmentation obtained through convolution from the final feature map with the ground truth provided by a database. Losses can be reduced by applying backpropagation to the modified dilated residual network to change the weighting. The proposed method was compared with other methods on the Cityscapes and PASCAL VOC 2012 databases. The proposed method achieved accuracies of 82.8 and 89.8 mean intersection over union (mIoU) and frame rates of 61 and 64.3 frames per second (fps) for the Cityscapes and PASCAL VOC 2012 databases, respectively. These results prove the applicability of the proposed method to natural AR applications running in real time, because the frame rate exceeds 60 fps.
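Two mechanics from this abstract can be sketched without the paper's details: the effective extent of a dilated (atrous) kernel, which grows as k + (k−1)(d−1) with dilation rate d, and the 1 × 1 convolution that fuses parallel branch outputs into one channel as a per-pixel weighted sum. A minimal sketch with hypothetical names:

```python
import numpy as np

def effective_kernel_size(k, d):
    """Effective spatial extent of a k x k convolution with dilation rate d:
    the d-1 gaps inserted between taps widen coverage without adding weights."""
    return k + (k - 1) * (d - 1)

def fuse_1x1(feature_maps, weights):
    """Fuse parallel branch outputs with a 1 x 1 convolution, i.e. a
    per-pixel weighted sum over the stacked channel dimension.
    feature_maps: list of (H, W) maps; weights: (len(feature_maps),)."""
    stacked = np.stack(feature_maps, axis=-1)    # (H, W, C)
    return stacked @ weights                     # (H, W)
```

For example, a 3 × 3 kernel with rate 6 covers a 13 × 13 region, which is how parallel atrous branches see small and large objects at once.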


Author(s):  
Mingmin Zhen ◽  
Jinglu Wang ◽  
Lei Zhou ◽  
Tian Fang ◽  
Long Quan

Semantic segmentation is pixel-wise classification which retains critical spatial information. "Feature map reuse" has been commonly adopted in CNN-based approaches to take advantage of feature maps in the early layers for the later spatial reconstruction. Along this direction, we go a step further by proposing a fully dense neural network with an encoder-decoder structure that we abbreviate as FDNet. For each stage in the decoder module, feature maps of all the previous blocks are adaptively aggregated to feed forward as input. On the one hand, it reconstructs the spatial boundaries accurately; on the other hand, it learns more efficiently thanks to improved gradient backpropagation. In addition, we propose a boundary-aware loss function that focuses more attention on the pixels near boundaries, which improves the labeling of "hard examples". We demonstrate that FDNet achieves the best performance over previous works on two benchmark datasets, PASCAL VOC 2012 and NYUDv2, when training on other datasets is not considered.
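The boundary-aware loss is described only as focusing attention on pixels near the boundary. A common way to realize that, sketched here as an assumption rather than the paper's formulation, is to up-weight the per-pixel loss wherever a pixel's label differs from a 4-neighbour:

```python
import numpy as np

def boundary_weights(labels, alpha=2.0):
    """Weight map that up-weights pixels whose label differs from any
    4-neighbour (an approximate object boundary). labels: (H, W) ints.
    Boundary pixels get weight 1 + alpha; interior pixels get weight 1."""
    boundary = np.zeros_like(labels, dtype=bool)
    boundary[:-1, :] |= labels[:-1, :] != labels[1:, :]   # down neighbour
    boundary[1:, :]  |= labels[1:, :]  != labels[:-1, :]  # up neighbour
    boundary[:, :-1] |= labels[:, :-1] != labels[:, 1:]   # right neighbour
    boundary[:, 1:]  |= labels[:, 1:]  != labels[:, :-1]  # left neighbour
    return 1.0 + alpha * boundary
```

The resulting map would multiply a standard per-pixel cross entropy before averaging.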


Author(s):  
Zhenzhen Yang ◽  
Pengfei Xu ◽  
Yongpeng Yang ◽  
Bing-Kun Bao

The U-Net has become the most popular structure in medical image segmentation in recent years. Although its performance for medical image segmentation is outstanding, a large number of experiments demonstrate that the classical U-Net network architecture seems to be insufficient when the size of segmentation targets changes and imbalance arises between target and background in different forms of segmentation. To improve the U-Net network architecture, we develop a new architecture named densely connected U-Net (DenseUNet) in this article. The proposed DenseUNet network adopts a dense block to improve the feature extraction capability and employs a multi-feature fuse block that fuses feature maps of different levels to increase the accuracy of feature extraction. In addition, in view of the advantages of the cross entropy and the dice loss functions, a new loss function for the DenseUNet network is proposed to deal with the imbalance between target and background. Finally, we test the proposed DenseUNet network and compare it with the multi-resolutional U-Net (MultiResUNet) and the classic U-Net networks on three different datasets. The experimental results show that the DenseUNet network performs significantly better than the MultiResUNet and the classic U-Net networks.


2020 ◽  
Vol 10 (3) ◽  
pp. 661-666 ◽  
Author(s):  
Shaoguo Cui ◽  
Moyu Chen ◽  
Chang Liu

Breast cancer is one of the leading causes of death among women worldwide. The clinical medical system urgently needs an accurate and automatic breast segmentation method in order to detect breast ultrasound lesions. Recently, some studies have shown that deep learning methods based on fully convolutional networks demonstrate competitive performance in breast ultrasound segmentation. However, some features are missed in the Unet during down-sampling, which results in low segmentation accuracy. Furthermore, there is a semantic gap between the feature maps of the decoder and encoder in Unet, so the simple fusion of high- and low-level features is not conducive to the semantic classification of pixels. In addition, the poor quality of breast ultrasound also affects the accuracy of image segmentation. To solve these problems, we propose a new end-to-end network model called Dense skip Unet (DsUnet), which consists of the Unet backbone, short skip connections and deep supervision. The proposed method can effectively avoid the loss of feature information caused by down-sampling and implement the fusion of multi-level semantic information. We used a new loss function to optimize the DsUnet, composed of a binary cross-entropy and a dice coefficient. We employed the True Positive Fraction (TPF), False Positives per image (FPs) and F-measure as performance metrics for evaluating various methods. In this paper, we adopted the UDIAT 212 dataset, and the experimental results validate that our new approach achieved better performance than other existing methods in detecting and segmenting ultrasound breast lesions. When we used the DsUnet model and the new loss function (binary cross-entropy + dice coefficient), the best performance indexes were achieved, i.e., 0.87 in TPF, 0.13 in FPs/image and 0.86 in F-measure.
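The combined "binary cross-entropy + dice coefficient" loss used here (and by the DenseUNet abstract above) has a standard form: the pixel-wise BCE term handles per-pixel classification while the soft dice term counteracts foreground/background imbalance. A minimal numpy sketch, with the smoothing constant as an assumption:

```python
import numpy as np

def bce_dice_loss(pred, target, smooth=1.0, eps=1e-12):
    """Sum of pixel-wise binary cross-entropy and soft dice loss.
    pred: predicted foreground probabilities in [0, 1]; target: binary mask."""
    bce = -np.mean(target * np.log(pred + eps)
                   + (1 - target) * np.log(1 - pred + eps))
    inter = np.sum(pred * target)
    dice = (2 * inter + smooth) / (np.sum(pred) + np.sum(target) + smooth)
    return bce + (1 - dice)   # dice term is 0 for a perfect prediction
```

Because the dice term is computed over the whole mask rather than per pixel, a small lesion contributes to it as strongly as the large background, which is the imbalance argument both abstracts make.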


2021 ◽  
Author(s):  
Chao Lu ◽  
Fansheng Chen ◽  
Xiaofeng Su ◽  
Dan Zeng

Abstract Infrared technology is widely used in precision guidance and mine detection since it can capture the heat radiated outward from the target object. We use infrared (IR) thermography to obtain infrared images of buried objects. Compared to visible images, infrared images present poor resolution, low contrast, and fuzzy visual effects, which make it difficult to segment the target object, especially in complex backgrounds. In this condition, traditional segmentation methods cannot perform well on infrared images since they are easily disturbed by noise and non-target objects in the images. With the advance of deep convolutional neural networks (CNNs), deep learning-based methods have made significant improvements in the semantic segmentation task. However, few of them address infrared image semantic segmentation, which is a more challenging scenario compared to visible images. Moreover, the lack of an infrared image dataset is also a problem for current methods based on deep learning. We propose a multi-scale attentional feature fusion (MS-AFF) module for infrared image semantic segmentation to solve this problem. Specifically, we integrate a series of feature maps from different levels by an atrous spatial pyramid structure. In this way, the model can obtain rich representation ability on infrared images. Besides, a global spatial information attention module is employed to let the model focus on the target region and reduce background disturbance in infrared images. In addition, we propose an infrared segmentation dataset based on the infrared thermal imaging system. Extensive experiments conducted on the infrared image segmentation dataset show the superiority of our method.
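The global spatial attention module is not detailed in the abstract; a common pattern it may follow is to pool the feature tensor across channels into a single spatial map, squash it to (0, 1), and reweight every channel by it. A minimal sketch under that assumption:

```python
import numpy as np

def spatial_attention(features):
    """Spatial attention sketch: average across channels, gate each pixel
    with a sigmoid, and reweight every channel by the resulting map.
    features: (C, H, W) tensor."""
    pooled = features.mean(axis=0)             # (H, W) channel-average map
    attn = 1.0 / (1.0 + np.exp(-pooled))       # sigmoid gate per pixel
    return features * attn[None, :, :]         # broadcast over channels
```

In a trained module the gate would come from a small learned convolution over the pooled map rather than the raw average, but the reweighting mechanics are the same.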


2021 ◽  
Vol 13 (16) ◽  
pp. 3187
Author(s):  
Xinchun Wei ◽  
Xing Li ◽  
Wei Liu ◽  
Lianpeng Zhang ◽  
Dayu Cheng ◽  
...  

Deep learning techniques have greatly improved the efficiency and accuracy of building extraction using remote sensing images. However, high-quality building outline extraction results that can be applied to the field of surveying and mapping remain a significant challenge. In practice, most building extraction tasks are executed manually. Therefore, an automated procedure that extracts building outlines with precise positions is required. In this study, we directly used the U2-net semantic segmentation model to extract the building outline. The extraction results showed that the U2-net model can provide the building outline with better accuracy and a more precise position than other models, based on comparisons with semantic segmentation models (Segnet, U-Net, and FCN) and edge detection models (RCF, HED, and DexiNed) applied to two datasets (Nanjing and Wuhan University (WHU)). We also modified the binary cross-entropy loss function in the U2-net model into a multiclass cross-entropy loss function to directly generate the binary map with the building outline and background. We achieved a further refined outline of the building, thus showing that with the modified U2-net model it is not necessary to use non-maximum suppression as a post-processing step, as in the other edge detection models, to refine the edge map. Moreover, the modified model is less affected by the sample imbalance problem. Finally, we created an image-to-image program to further validate the modified U2-net semantic segmentation model for building outline extraction.


Author(s):  
Y. Xu ◽  
Z. Sun ◽  
R. Boerner ◽  
T. Koch ◽  
L. Hoegner ◽  
...  

In this work, we report a novel way of generating a ground truth dataset for analyzing point clouds from different sensors and for the validation of algorithms. Instead of directly labeling a large number of 3D points, which requires time-consuming manual work, a multi-resolution 3D voxel grid for the testing site is generated. Then, with the help of a set of basic labeled points from the reference dataset, we can generate a 3D labeled space of the entire testing site at different resolutions. Specifically, an octree-based voxel structure is applied to voxelize the annotated reference point cloud, by which all the points are organized into 3D grids of multiple resolutions. When automatically annotating new testing point clouds, a voting-based approach is applied to the labeled points within multi-resolution voxels, in order to assign a semantic label to the 3D space represented by each voxel. Lastly, robust line- and plane-based fast registration methods are developed for aligning point clouds obtained via various sensors. Benefiting from the labeled 3D spatial information, we can easily create new annotated 3D point clouds for different sensors of the same scene directly by looking up the labels of the 3D spaces in which the points are located, which is convenient for the validation and evaluation of algorithms related to point cloud interpretation and semantic segmentation.
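The core voting step can be sketched independently of the octree machinery: quantize each labeled reference point to a voxel key and take the majority label per occupied voxel; a new point is then annotated by looking up its voxel. A minimal single-resolution sketch (the paper uses multi-resolution voxels; names are hypothetical):

```python
import numpy as np
from collections import Counter

def vote_voxel_labels(points, labels, voxel_size):
    """Majority-vote a semantic label for each occupied voxel from the
    labelled reference points that fall inside it.
    points: (N, 3) coordinates; labels: length-N integer labels.
    Returns a dict mapping integer voxel keys to the winning label."""
    keys = np.floor(points / voxel_size).astype(int)
    votes = {}
    for key, lab in zip(map(tuple, keys), labels):
        votes.setdefault(key, Counter())[lab] += 1
    return {key: c.most_common(1)[0][0] for key, c in votes.items()}
```

A testing point at position p is then labeled via `table.get(tuple(np.floor(p / voxel_size).astype(int)))`, falling back to a coarser voxel level when the fine voxel is empty.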


2021 ◽  
Vol 2083 (4) ◽  
pp. 042083
Author(s):  
Shuhan Liu

Abstract Semantic segmentation is a traditional task that requires large pixel-level ground truth label datasets, which are time-consuming and expensive to produce. Recent developments in weakly-supervised settings have shown that reasonable performance can be obtained using only image-level labels. Classification is often used as a proxy task to train deep neural networks, from which attention maps are extracted. The classification task needs less supervision information but only locates the most discriminative part of the object. For this purpose, we propose a new end-to-end counter-wipe network. Compared with the baseline network, we propose applying a graph neural network to obtain the initial class activation map (CAM). We also propose training with a joint loss function to prevent network weight sharing from causing the network to fall into a saddle point. Our experiments on the Pascal VOC2012 dataset show that 64.9% segmentation performance is obtained, an improvement of 2.1% over our baseline.
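The CAM extraction this pipeline builds on has a standard form: weight the final convolutional feature maps by the classifier weights of one class and sum over channels, giving a coarse localization map from image-level supervision alone. A minimal numpy sketch of that standard step (not the paper's graph-based variant):

```python
import numpy as np

def class_activation_map(features, fc_weights, class_idx):
    """Class activation map: weight the final convolutional feature maps
    by one class's classifier weights and sum over channels.
    features: (C, H, W); fc_weights: (num_classes, C)."""
    cam = np.tensordot(fc_weights[class_idx], features, axes=1)  # (H, W)
    cam -= cam.min()
    return cam / (cam.max() + 1e-12)   # normalise to [0, 1]
```

The abstract's point is that such maps highlight only the most discriminative region, which is what the counter-wipe (erasing) mechanism is meant to expand.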


Sensors ◽  
2019 ◽  
Vol 19 (17) ◽  
pp. 3787
Author(s):  
Łukasz Chechliński ◽  
Barbara Siemiątkowska ◽  
Michał Majewski

Automated weeding is an important research area in agrorobotics. Weeds can be removed mechanically or with the precise usage of herbicides. Deep Learning techniques have achieved state-of-the-art results in many computer vision tasks; however, their deployment on low-cost mobile computers is still challenging. The described system contains several novelties compared both with its previous version and with related work. It is part of a project on an automatic weeding machine developed by the Warsaw University of Technology and MCMS Warka Ltd. The obtained models reach satisfying accuracy (detecting 47–67% of weed area, misclassifying as weed 0.1–0.9% of crop area) at over 10 FPS on the Raspberry Pi 3B+ computer. It was tested on four different plant species at different growth stages and lighting conditions. The system performing semantic segmentation is based on Convolutional Neural Networks. Its custom architecture combines U-Net, MobileNets, DenseNet and ResNet concepts. The amount of manual ground truth labeling needed was significantly decreased by knowledge distillation: the final model learns to mimic an ensemble of complex models on a large database of unlabeled data. A further decrease of inference time was obtained by two custom modifications: the use of separable convolutions in the DenseNet block and a reduction in the number of channels in each layer. In the authors' opinion, the described novelties can be easily transferred to other agrorobotics tasks.
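The inference-time saving from separable convolutions comes from a simple parameter count: a standard k × k convolution costs k²·C_in·C_out weights, while its depthwise-separable replacement costs k²·C_in (depthwise) plus C_in·C_out (1 × 1 pointwise). A quick sketch of the arithmetic:

```python
def conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution."""
    return k * k * c_in * c_out

def separable_conv_params(k, c_in, c_out):
    """Weights in a depthwise-separable replacement: one k x k depthwise
    filter per input channel plus a 1 x 1 pointwise convolution."""
    return k * k * c_in + c_in * c_out
```

For a 3 × 3 layer with 64 input and 64 output channels this is 36,864 versus 4,672 weights, roughly an 8x reduction, which is why MobileNet-style blocks suit a Raspberry Pi-class computer.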


2020 ◽  
Vol 13 (1) ◽  
pp. 119
Author(s):  
Song Ouyang ◽  
Yansheng Li

Although the deep semantic segmentation network (DSSN) has been widely used in remote sensing (RS) image semantic segmentation, it still does not fully exploit the spatial relationship cues between objects when extracting deep visual features through convolutional filters and pooling layers. In fact, the spatial distribution of objects from different classes has a strong correlation characteristic. For example, buildings tend to be close to roads. In view of the strong appearance extraction ability of DSSN and the powerful topological relationship modeling capability of the graph convolutional neural network (GCN), a DSSN-GCN framework, which combines the advantages of DSSN and GCN, is proposed in this paper for RS image semantic segmentation. To improve the appearance extraction ability, this paper proposes a new DSSN called the attention residual U-shaped network (AttResUNet), which leverages residual blocks to encode feature maps and an attention module to refine the features. For the GCN, a graph is built where the nodes are denoted by superpixels and the edge weights are calculated by considering both the spectral information and the spatial information of the nodes. The AttResUNet is trained to extract the high-level features that initialize the graph nodes. The GCN then combines node features and spatial relationships between nodes to conduct classification. It is worth noting that the use of spatial relationship knowledge boosts the performance and robustness of the classification module. In addition, benefiting from modeling the GCN at the superpixel level, the boundaries of objects are restored to a certain extent and there is less pixel-level noise in the final classification result. Extensive experiments on two publicly open datasets show that the DSSN-GCN model outperforms the competitive baseline (i.e., the DSSN model) and that DSSN-GCN with AttResUNet achieves the best performance, which demonstrates the advantage of our method.
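The graph weight "calculated by considering the spectral information and spatial information of the nodes" is typically a product of Gaussian affinities over feature distance and centroid distance; the exact form here is an assumption, and the bandwidths are hypothetical parameters:

```python
import numpy as np

def superpixel_edge_weight(feat_i, feat_j, pos_i, pos_j,
                           sigma_f=1.0, sigma_p=1.0):
    """Gaussian affinity between two superpixel nodes, combining spectral
    (mean feature) distance and spatial (centroid) distance."""
    df = np.sum((np.asarray(feat_i) - np.asarray(feat_j)) ** 2)
    dp = np.sum((np.asarray(pos_i) - np.asarray(pos_j)) ** 2)
    return np.exp(-df / (2 * sigma_f ** 2)) * np.exp(-dp / (2 * sigma_p ** 2))
```

Identical, co-located nodes get weight 1, and the weight decays as superpixels become spectrally or spatially farther apart, which encodes the "buildings tend to be close to roads" kind of prior the GCN propagates.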

