Pedestrian Detection at Night in Infrared Images Using an Attention-Guided Encoder-Decoder Convolutional Neural Network

Yunfan Chen; Hyunchul Shin

doi:10.3390/app10030809

Applied Sciences ◽

10.3390/app10030809 ◽

2020 ◽

Vol 10 (3) ◽

pp. 809 ◽

Cited By ~ 4

Author(s):

Yunfan Chen ◽

Hyunchul Shin

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Pedestrian Detection ◽

Weather Conditions ◽

Superior Performance ◽

Low Resolution ◽

Feature Maps ◽

Camera System ◽

Ir Camera ◽

Multi Scale

Pedestrian-related accidents are much more likely to occur at night, when visible (VI) cameras are much less effective. Unlike VI cameras, infrared (IR) cameras can work in total darkness. However, IR images have several drawbacks, such as low resolution, noise, and thermal energy characteristics that can differ depending on the weather. To overcome these drawbacks, we propose an IR camera system for identifying pedestrians at night that uses a novel attention-guided encoder-decoder convolutional neural network (AED-CNN). In AED-CNN, encoder-decoder modules are introduced to generate multi-scale features, and new skip connection blocks are incorporated into the decoder to combine the feature maps from the encoder and decoder modules. This new architecture increases context information, which is helpful for extracting discriminative features from low-resolution and noisy IR images. Furthermore, we propose an attention module to re-weight the multi-scale features generated by the encoder-decoder module. The attention mechanism effectively highlights pedestrians while eliminating background interference, which helps to detect pedestrians under various weather conditions. Experiments on two challenging datasets demonstrate that our method achieves superior performance. Our approach significantly improves the precision of the state-of-the-art method by 5.1% and 23.78% on the Keimyung University (KMU) and Computer Vision Center (CVC)-09 pedestrian datasets, respectively.
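
As a rough illustration of what re-weighting multi-scale features can mean, the sketch below assigns one weight per scale with a softmax over each map's mean activation and then scales the maps. It is not the AED-CNN attention module, whose design the abstract does not describe: the names `softmax` and `reweight_scales`, the pooling-based scoring, and the toy inputs are all assumptions made for this example.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def reweight_scales(feature_maps):
    """Scale each 2D feature map by a softmax weight of its mean activation.

    feature_maps: list of 2D lists, one per scale (sizes may differ).
    Returns (weights, reweighted_maps). Assumed scoring rule, for illustration.
    """
    means = [
        sum(sum(row) for row in fm) / (len(fm) * len(fm[0]))
        for fm in feature_maps
    ]
    weights = softmax(means)
    reweighted = [
        [[w * v for v in row] for row in fm]
        for fm, w in zip(feature_maps, weights)
    ]
    return weights, reweighted

# Hypothetical toy input: a strongly activated 2x2 map and a silent 1x1 map.
maps = [
    [[1.0, 1.0], [1.0, 1.0]],
    [[0.0]],
]
weights, out = reweight_scales(maps)
```

In a trained network the scores would come from learned layers rather than a fixed mean; the softmax step only shows how scales can compete for emphasis.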

MSST-Net: A Multi-Scale Adaptive Network for Building Extraction from Remote Sensing Images Based on Swin Transformer

Remote Sensing ◽

10.3390/rs13234743 ◽

2021 ◽

Vol 13 (23) ◽

pp. 4743

Author(s):

Wei Yuan ◽

Wenbo Xu

Keyword(s):

Neural Network ◽

Remote Sensing ◽

Convolutional Neural Network ◽

Network Model ◽

Remote Sensing Images ◽

Feature Maps ◽

Global Features ◽

Adaptive Network ◽

Data Set ◽

Multi Scale

The segmentation of remote sensing images by deep learning technology is the main method for remote sensing image interpretation. However, segmentation models based on a convolutional neural network cannot capture global features very well. A transformer, whose self-attention mechanism can supply each pixel with a global feature, makes up for this deficiency of the convolutional neural network. Therefore, a multi-scale adaptive segmentation network model (MSST-Net) based on a Swin Transformer is proposed in this paper. First, a Swin Transformer is used as the backbone to encode the input image. Second, the feature maps of different levels are decoded separately. Third, convolution is used for fusion, so that the network can automatically learn the weight of the decoding results of each level. Finally, we adjust the channels to obtain the final prediction map by using a convolution with a 1 × 1 kernel. Compared with other segmentation network models on the WHU building data set, the proposed model improves all of the evaluation metrics (mIoU, F1-score, and accuracy). The network model proposed in this paper is a multi-scale adaptive network model that pays more attention to the global features for remote sensing segmentation.
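
A 1 × 1 convolution is a per-pixel linear combination across channels, which is why it can act as a learned weighting of per-level decoding results. The sketch below shows only that arithmetic for a single output channel. The function name `conv1x1`, the fixed weights, and the toy maps are assumptions for illustration and are not taken from MSST-Net, where the weights would be learned.

```python
def conv1x1(channels, weights, bias=0.0):
    """Apply a single-output 1x1 convolution.

    channels: list of C feature maps, each an H x W list of lists.
    weights:  list of C floats, one per input channel.
    Returns one H x W map with
    out[y][x] = bias + sum_c weights[c] * channels[c][y][x].
    """
    if len(channels) != len(weights):
        raise ValueError("need one weight per input channel")
    height, width = len(channels[0]), len(channels[0][0])
    return [
        [
            bias + sum(w * ch[y][x] for w, ch in zip(weights, channels))
            for x in range(width)
        ]
        for y in range(height)
    ]

# Hypothetical decoding results from three levels, already at the same size.
level_outputs = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
    [[1.0, 1.0], [1.0, 1.0]],
]
fused = conv1x1(level_outputs, weights=[0.5, 0.25, 0.25])
```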

SEMANTIC SEGMENTATION OF AERIAL IMAGERY VIA MULTI-SCALE SHUFFLING CONVOLUTIONAL NEURAL NETWORKS WITH DEEP SUPERVISION

ISPRS Annals of Photogrammetry Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-annals-iv-1-29-2018 ◽

2018 ◽

Vol IV-1 ◽

pp. 29-36 ◽

Cited By ~ 4

Author(s):

K. Chen ◽

M. Weinmann ◽

X. Sun ◽

M. Yan ◽

S. Hinz ◽

...

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Semantic Segmentation ◽

Aerial Imagery ◽

Geometric Features ◽

Feature Maps ◽

Multi Scale ◽

Intermediate Layers ◽

Segmentation Task ◽

The Impact

In this paper, we address the semantic segmentation of aerial imagery based on the use of multi-modal data given in the form of true orthophotos and the corresponding Digital Surface Models (DSMs). We present the Deeply-supervised Shuffling Convolutional Neural Network (DSCNN), a multi-scale extension of the Shuffling Convolutional Neural Network (SCNN) with deep supervision. We take advantage of the SCNN, which uses the shuffling operator to effectively upsample feature maps, and then fuse multi-scale features derived from the intermediate layers of the SCNN, which results in the Multi-scale Shuffling Convolutional Neural Network (MSCNN). Based on the MSCNN, we derive the DSCNN by introducing additional losses into the intermediate layers of the MSCNN. In addition, we investigate the impact of using different sets of hand-crafted radiometric and geometric features derived from the true orthophotos and the DSMs on the semantic segmentation task. For performance evaluation, we use a commonly used benchmark dataset. The achieved results reveal that both multi-scale fusion and deep supervision contribute to an improvement in performance. Furthermore, the use of a diversity of hand-crafted radiometric and geometric features as input for the DSCNN does not provide the best numerical results, but yields smoother and improved detections for several objects.
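
The sketch below shows one way a shuffling operator can upsample feature maps, assuming it follows the periodic shuffling (pixel shuffle) rearrangement commonly used for sub-pixel upsampling; the abstract does not spell out the operator. The function name `pixel_shuffle` and the toy channels are illustrative only.

```python
def pixel_shuffle(channels, r):
    """Rearrange r*r feature maps of size H x W into one (H*r) x (W*r) map.

    channels: list of r*r maps, each an H x W list of lists.
    Output pixel (y, x) is read from channel (y % r) * r + (x % r)
    at position (y // r, x // r).
    """
    if len(channels) != r * r:
        raise ValueError("expected r*r input channels")
    height, width = len(channels[0]), len(channels[0][0])
    return [
        [
            channels[(y % r) * r + (x % r)][y // r][x // r]
            for x in range(width * r)
        ]
        for y in range(height * r)
    ]

# Hypothetical input: four 1x2 maps upsampled by a factor of 2 into one 2x4 map.
chans = [[[1, 2]], [[3, 4]], [[5, 6]], [[7, 8]]]
up = pixel_shuffle(chans, 2)
```

No values are interpolated: the operator only moves existing activations from the channel axis into the spatial grid, so the upsampling itself adds no parameters.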

Super-resolution reconstruction of seismic section image via multi-scale convolution neural network

E3S Web of Conferences ◽

10.1051/e3sconf/202130301058 ◽

2021 ◽

Vol 303 ◽

pp. 01058

Author(s):

Meng-Di Deng ◽

Rui-Sheng Jia ◽

Hong-Mei Sun ◽

Xing-Li Zhang

Keyword(s):

Neural Network ◽

High Resolution ◽

Convolutional Neural Network ◽

Super Resolution ◽

Image Features ◽

Image Feature ◽

Reconstruction Method ◽

Low Resolution ◽

Seismic Section ◽

Multi Scale

The resolution of seismic section images can directly affect the subsequent interpretation of seismic data. In order to improve the spatial resolution of low-resolution seismic section images, a super-resolution reconstruction method based on multi-scale convolution is proposed. This method designs a multi-scale convolutional neural network to learn pairs of high-resolution and low-resolution image features, and thereby learns a mapping from low-resolution seismic section images to high-resolution seismic section images. The multi-scale convolutional neural network model consists of four convolutional layers and a sub-pixel convolutional layer. Convolution operations are used to learn abundant seismic section image features, and the sub-pixel convolutional layer is used to reconstruct the high-resolution seismic section image. The experimental results show that the proposed method is superior to the comparison methods in peak signal-to-noise ratio (PSNR) and structural similarity (SSIM). The total training and reconstruction time of our method is about 22% less than that of the FSRCNN method and about 18% less than that of the ESPCN method.
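
For reference, PSNR is computed from the mean squared error between a reference image and its reconstruction as 10 · log10(MAX² / MSE). The sketch below implements that standard definition. The function name `psnr`, the 8-bit peak value of 255, and the toy images are assumptions, not details from the paper.

```python
import math

def psnr(reference, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized 2D images."""
    squared_error = 0.0
    count = 0
    for ref_row, rec_row in zip(reference, reconstructed):
        for ref, rec in zip(ref_row, rec_row):
            squared_error += (ref - rec) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

# Hypothetical 2x2 images that differ in one pixel by the full 8-bit range.
reference = [[0, 0], [0, 0]]
reconstructed = [[0, 0], [0, 255]]
score = psnr(reference, reconstructed)
```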

VP-Detector: A 3D convolutional neural network for automated macromolecule localization and classification in cryo-electron tomograms

10.1101/2021.05.25.443703 ◽

2021 ◽

Author(s):

Yu Hao ◽

Biao Zhang ◽

Xiaohua Wan ◽

Rui Yan ◽

Zhiyong Liu ◽

...

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Electron Tomography ◽

Class Imbalance ◽

Feature Maps ◽

Particle Detection ◽

Multi Scale ◽

Accurate Performance ◽

Fully Automatic

Motivation: Cryo-electron tomography (Cryo-ET) with sub-tomogram averaging (STA) is indispensable when studying macromolecule structures and functions in their native environments. However, current tomographic reconstructions suffer from a low signal-to-noise ratio (SNR) and missing wedge artifacts. Hence, automatic and accurate macromolecule localization and classification have become the bottleneck for structural determination by STA. Here, we propose a 3D multi-scale dense convolutional neural network (MSDNet) for voxel-wise annotation of tomograms. Weighted focal loss is adopted as the loss function to address class imbalance. The proposed network combines 3D hybrid dilated convolutions (HDC) and dense connectivity to ensure accurate performance with relatively few trainable parameters. 3D HDC expands the receptive field without losing resolution or learning extra parameters. Dense connectivity facilitates the re-use of feature maps, so that fewer intermediate feature maps and trainable parameters are needed. Then, we design a 3D MSDNet-based approach for fully automatic macromolecule localization and classification, called VP-Detector (Voxel-wise Particle Detector). VP-Detector is efficient because classification is performed on the pre-calculated coordinates instead of a sliding window. Results: We evaluated VP-Detector on simulated tomograms. Compared to the state-of-the-art methods, our method achieved competitive localization performance with the highest F1-score. We also demonstrated that the weighted focal loss improves the classification of hard classes. We trained the network on part of the training sets to show that training on relatively small datasets is feasible. Moreover, the experiments show that VP-Detector has a fast particle detection speed, taking less than 14 minutes on a test tomogram.
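
To make the loss concrete, the sketch below evaluates a class-weighted focal loss for a single voxel, using the common form -w_t · (1 - p_t)^γ · log(p_t). The exact weighting scheme and the value of γ used by VP-Detector are not given in the abstract, so the function name `weighted_focal_loss`, the default γ = 2, and the example weights are assumptions.

```python
import math

def weighted_focal_loss(probs, target, class_weights, gamma=2.0):
    """Focal loss for one sample.

    probs:         predicted class probabilities (sum to 1).
    target:        index of the true class.
    class_weights: per-class weights, larger for rare classes.
    gamma:         focusing parameter; gamma=0 gives weighted cross-entropy.
    """
    p_t = max(probs[target], 1e-12)  # guard against log(0)
    return -class_weights[target] * (1.0 - p_t) ** gamma * math.log(p_t)

# Hypothetical two-class example with a rare class 1 weighted 4x.
class_weights = [1.0, 4.0]
easy = weighted_focal_loss([0.9, 0.1], target=0, class_weights=class_weights)
hard = weighted_focal_loss([0.9, 0.1], target=1, class_weights=class_weights)
```

The factor (1 - p_t)^γ shrinks the contribution of confidently correct voxels, so the misclassified rare-class voxel dominates the total.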

Pedestrian detection via multi-scale feature fusion convolutional neural network

2017 Chinese Automation Congress (CAC) ◽

10.1109/cac.2017.8242979 ◽

2017 ◽

Cited By ~ 2

Author(s):

Aixin Guo ◽

Baoqun Yin ◽

Jing Zhang ◽

Jinfa Yao

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Feature Fusion ◽

Pedestrian Detection ◽

Scale Feature ◽

Multi Scale

Multi-scale Pedestrian Detection in Thermal Imaging Using Deep Convolutional Neural Network and Adaptive NMS

The Journal of Korean Institute of Information Technology ◽

10.14801/jkiit.2018.16.9.85 ◽

2018 ◽

Vol 16 (9) ◽

pp. 85-94

Author(s):

Tan Dat Trinh ◽

Jin Young Kim

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Thermal Imaging ◽

Pedestrian Detection ◽

Deep Convolutional Neural Network ◽

Multi Scale

Multi-Scale Feature Fusion Convolutional Neural Network for Concurrent Segmentation of Left Ventricle and Myocardium in Cardiac MR Images

Journal of Medical Imaging and Health Informatics ◽

10.1166/jmihi.2020.3005 ◽

2020 ◽

Vol 10 (5) ◽

pp. 1023-1032

Author(s):

Lin Qi ◽

Haoran Zhang ◽

Xuehao Cao ◽

Xuyang Lyu ◽

Lisheng Xu ◽

...

Keyword(s):

Neural Network ◽

Left Ventricle ◽

Convolutional Neural Network ◽

Feature Fusion ◽

Left Ventricular ◽

Mr Images ◽

Feature Maps ◽

Scale Feature ◽

Multi Scale ◽

Cine Mr

Accurate segmentation of the blood pool of the left ventricle (LV) and the myocardium (or left ventricular epicardium, MYO) from cardiac magnetic resonance (MR) images can help doctors quantify LV ejection fraction and myocardial deformation. To reduce doctors' burden of manual segmentation, in this study we propose an automated method for concurrent segmentation of the LV and MYO. First, we employ a convolutional neural network (CNN) architecture to extract the region of interest (ROI) from short-axis cardiac cine MR images as a preprocessing step. Next, we present a multi-scale feature fusion (MSFF) CNN with a new weighted Dice index (WDI) loss function to obtain the concurrent segmentation of the LV and MYO. We use MSFF modules with three scales to extract different features, and then concatenate feature maps through the short and long skip connections in the encoder and decoder paths to capture more complete context information and geometric structure for better segmentation. Finally, we compare the proposed method with Fully Convolutional Networks (FCN) and U-Net on the combined cardiac datasets from MICCAI 2009 and ACDC 2017. Experimental results demonstrate that the proposed method performs effectively on LV and MYO segmentation in the combined datasets, indicating its potential for clinical application.
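
The abstract does not define the WDI loss, so the sketch below shows only a generic class-weighted soft Dice loss as one plausible reading: a weighted average of per-class Dice indices, subtracted from 1. The function names, the smoothing term `eps`, and the class weights are assumptions and should not be taken as the paper's formulation.

```python
def dice_index(pred, target, eps=1e-7):
    """Soft Dice index between two flat lists of values in [0, 1]."""
    intersection = sum(p * t for p, t in zip(pred, target))
    return (2.0 * intersection + eps) / (sum(pred) + sum(target) + eps)

def weighted_dice_loss(preds, targets, class_weights):
    """1 minus the weighted average of per-class Dice indices.

    preds, targets: one flat mask per class (for example LV and MYO).
    class_weights:  one non-negative weight per class.
    """
    total_weight = sum(class_weights)
    weighted = sum(
        w * dice_index(p, t)
        for p, t, w in zip(preds, targets, class_weights)
    )
    return 1.0 - weighted / total_weight

# Hypothetical flattened masks for two classes.
targets = [[1, 0, 1, 0], [0, 1, 0, 0]]
perfect = weighted_dice_loss(targets, targets, class_weights=[1.0, 2.0])
wrong = weighted_dice_loss([[0, 1, 0, 1], [1, 0, 0, 0]], targets, class_weights=[1.0, 2.0])
```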

A novel multi-scale convolutional neural network for motor imagery classification

Biomedical Signal Processing and Control ◽

10.1016/j.bspc.2021.102747 ◽

2021 ◽

Vol 68 ◽

pp. 102747

Author(s):

Mouad Riyad ◽

Mohammed Khalil ◽

Abdellah Adib

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Motor Imagery ◽

Multi Scale

Lightweight convolutional neural network-based pedestrian detection and re-identification in multiple scenarios

Machine Vision and Applications ◽

10.1007/s00138-021-01169-7 ◽

2021 ◽

Vol 32 (2) ◽

Author(s):

Xiao Ke ◽

Xinru Lin ◽

Liyun Qin

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Pedestrian Detection ◽

Multiple Scenarios

Pixel-level Diabetic Retinopathy Lesion Detection Using Multi-scale Convolutional Neural Network

2021 IEEE 3rd Global Conference on Life Sciences and Technologies (LifeTech) ◽

10.1109/lifetech52111.2021.9391891 ◽

2021 ◽

Author(s):

Qi Li ◽

Chenglei Peng ◽

Yazhen Ma ◽

Sidan Du ◽

Bin Guo ◽

...

Keyword(s):

Neural Network ◽

Diabetic Retinopathy ◽

Convolutional Neural Network ◽

Lesion Detection ◽

Multi Scale
