scholarly journals Pedestrian Detection at Night in Infrared Images Using an Attention-Guided Encoder-Decoder Convolutional Neural Network

2020 ◽  
Vol 10 (3) ◽  
pp. 809 ◽  
Author(s):  
Yunfan Chen ◽  
Hyunchul Shin

Pedestrian-related accidents are much more likely to occur during nighttime when visible (VI) cameras are much less effective. Unlike VI cameras, infrared (IR) cameras can work in total darkness. However, IR images have several drawbacks, such as low-resolution, noise, and thermal energy characteristics that can differ depending on the weather. To overcome these drawbacks, we propose an IR camera system to identify pedestrians at night that uses a novel attention-guided encoder-decoder convolutional neural network (AED-CNN). In AED-CNN, encoder-decoder modules are introduced to generate multi-scale features, in which new skip connection blocks are incorporated into the decoder to combine the feature maps from the encoder and decoder module. This new architecture increases context information which is helpful for extracting discriminative features from low-resolution and noisy IR images. Furthermore, we propose an attention module to re-weight the multi-scale features generated by the encoder-decoder module. The attention mechanism effectively highlights pedestrians while eliminating background interference, which helps to detect pedestrians under various weather conditions. Empirical experiments on two challenging datasets fully demonstrate that our method shows superior performance. Our approach significantly improves the precision of the state-of-the-art method by 5.1% and 23.78% on the Keimyung University (KMU) and Computer Vision Center (CVC)-09 pedestrian dataset, respectively.

2021 ◽  
Vol 13 (23) ◽  
pp. 4743
Author(s):  
Wei Yuan ◽  
Wenbo Xu

The segmentation of remote sensing images by deep learning technology is the main method for remote sensing image interpretation. However, the segmentation model based on a convolutional neural network cannot capture the global features very well. A transformer, whose self-attention mechanism can supply each pixel with a global feature, makes up for the deficiency of the convolutional neural network. Therefore, a multi-scale adaptive segmentation network model (MSST-Net) based on a Swin Transformer is proposed in this paper. Firstly, a Swin Transformer is used as the backbone to encode the input image. Then, the feature maps of different levels are decoded separately. Thirdly, the convolution is used for fusion, so that the network can automatically learn the weight of the decoding results of each level. Finally, we adjust the channels to obtain the final prediction map by using the convolution with a kernel of 1 × 1. By comparing this with other segmentation network models on a WHU building data set, the evaluation metrics, mIoU, F1-score and accuracy are all improved. The network model proposed in this paper is a multi-scale adaptive network model that pays more attention to the global features for remote sensing segmentation.


Author(s):  
K. Chen ◽  
M. Weinmann ◽  
X. Sun ◽  
M. Yan ◽  
S. Hinz ◽  
...  

<p><strong>Abstract.</strong> In this paper, we address the semantic segmentation of aerial imagery based on the use of multi-modal data given in the form of true orthophotos and the corresponding Digital Surface Models (DSMs). We present the Deeply-supervised Shuffling Convolutional Neural Network (DSCNN) representing a multi-scale extension of the Shuffling Convolutional Neural Network (SCNN) with deep supervision. Thereby, we take the advantage of the SCNN involving the shuffling operator to effectively upsample feature maps and then fuse multiscale features derived from the intermediate layers of the SCNN, which results in the Multi-scale Shuffling Convolutional Neural Network (MSCNN). Based on the MSCNN, we derive the DSCNN by introducing additional losses into the intermediate layers of the MSCNN. In addition, we investigate the impact of using different sets of hand-crafted radiometric and geometric features derived from the true orthophotos and the DSMs on the semantic segmentation task. For performance evaluation, we use a commonly used benchmark dataset. The achieved results reveal that both multi-scale fusion and deep supervision contribute to an improvement in performance. Furthermore, the use of a diversity of hand-crafted radiometric and geometric features as input for the DSCNN does not provide the best numerical results, but smoother and improved detections for several objects.</p>


2021 ◽  
Vol 303 ◽  
pp. 01058
Author(s):  
Meng-Di Deng ◽  
Rui-Sheng Jia ◽  
Hong-Mei Sun ◽  
Xing-Li Zhang

The resolution of seismic section images can directly affect the subsequent interpretation of seismic data. In order to improve the spatial resolution of low-resolution seismic section images, a super-resolution reconstruction method based on multi-scale convolution is proposed. This method designs a multi-scale convolutional neural network to learn high-low resolution image feature pairs, and realizes mapping learning from low-resolution seismic section images to high-resolution seismic section images. This multi-scale convolutional neural network model consists of four convolutional layers and a sub-pixel convolutional layer. Convolution operations are used to learn abundant seismic section image features, and sub-pixel convolution layer is used to reconstruct high-resolution seismic section image. The experimental results show that the proposed method is superior to the comparison method in peak signal-to-noise ratio (PSNR) and structural similarity (SSIM). In the total training time and reconstruction time, our method is about 22% less than the FSRCNN method and about 18% less than the ESPCN method.


2021 ◽  
Author(s):  
Yu Hao ◽  
Biao Zhang ◽  
Xiaohua Wan ◽  
Rui Yan ◽  
Zhiyong Liu ◽  
...  

Motivation: Cryo-electron tomography (Cryo-ET) with sub-tomogram averaging (STA) is indispensable when studying macromolecule structures and functions in their native environments. However, current tomographic reconstructions suffer the low signal-to-noise (SNR) ratio and the missing wedge artifacts. Hence, automatic and accurate macromolecule localization and classification become the bottleneck problem for structural determination by STA. Here, we propose a 3D multi-scale dense convolutional neural network (MSDNet) for voxel-wise annotations of tomograms. Weighted focal loss is adopted as a loss function to solve the class imbalance. The proposed network combines 3D hybrid dilated convolutions (HDC) and dense connectivity to ensure an accurate performance with relatively few trainable parameters. 3D HDC expands the receptive field without losing resolution or learning extra parameters. Dense connectivity facilitates the re-use of feature maps to generate fewer intermediate feature maps and trainable parameters. Then, we design a 3D MSDNet based approach for fully automatic macromolecule localization and classification, called VP-Detector (Voxel-wise Particle Detector). VP-Detector is efficient because classification performs on the pre-calculated coordinates instead of a sliding window. Results: We evaluated the VP-Detector on simulated tomograms. Compared to the state-of-the-art methods, our method achieved a competitive performance on localization with the highest F1-score. We also demonstrated that the weighted focal loss improves the classification of hard classes. We trained the network on a part of training sets to prove the availability of training on relatively small datasets. Moreover, the experiment shows that VP-Detector has a fast particle detection speed, which costs less than 14 minutes on a test tomogram.


2020 ◽  
Vol 10 (5) ◽  
pp. 1023-1032
Author(s):  
Lin Qi ◽  
Haoran Zhang ◽  
Xuehao Cao ◽  
Xuyang Lyu ◽  
Lisheng Xu ◽  
...  

Accurate segmentation of the blood pool of left ventricle (LV) and myocardium (or left ventricular epicardium, MYO) from cardiac magnetic resonance (MR) can help doctors to quantify LV ejection fraction and myocardial deformation. To reduce doctor’s burden of manual segmentation, in this study, we propose an automated and concurrent segmentation method of the LV and MYO. First, we employ a convolutional neural network (CNN) architecture to extract the region of interest (ROI) from short-axis cardiac cine MR images as a preprocessing step. Next, we present a multi-scale feature fusion (MSFF) CNN with a new weighted Dice index (WDI) loss function to get the concurrent segmentation of the LV and MYO. We use MSFF modules with three scales to extract different features, and then concatenate feature maps by the short and long skip connections in the encoder and decoder path to capture more complete context information and geometry structure for better segmentation. Finally, we compare the proposed method with Fully Convolutional Networks (FCN) and U-Net on the combined cardiac datasets from MICCAI 2009 and ACDC 2017. Experimental results demonstrate that the proposed method could perform effectively on LV and MYOs segmentation in the combined datasets, indicating its potential for clinical application.


Sign in / Sign up

Export Citation Format

Share Document