Jointly human semantic parsing and attribute recognition with feature pyramid structure in EfficientNets

Feature Pyramid Attention Model and Multi-Label Focal Loss for Pedestrian Attribute Recognition

IEEE Access ◽

10.1109/access.2020.3010435 ◽

2020 ◽

Vol 8 ◽

pp. 164570-164579

Author(s):

Ye Li ◽

Fangyan Shi ◽

Shaoqi Hou ◽

Jipeng Li ◽

Chao Li ◽

...

Keyword(s):

Attention Model ◽

Feature Pyramid ◽

Attribute Recognition

Depth Estimation for Monocular Image Based on Convolutional Neural Networks

International Journal of Circuits, Systems and Signal Processing ◽

10.46300/9106.2021.15.59 ◽

2021 ◽

Vol 15 ◽

pp. 533-540

Author(s):

Binglin Niu ◽

Mengxia Tang ◽

Xuelin Chen

Keyword(s):

Three Dimensional ◽

Depth Estimation ◽

Dimensional Structure ◽

Depth Information ◽

Pyramid Structure ◽

Level Information ◽

Monocular Image ◽

Feature Pyramid ◽

Sampling Structure ◽

Autonomous Movement

Perceiving the three-dimensional structure of the surrounding environment and analyzing it for autonomous movement is indispensable for robots operating in real scenes. Recovering depth information and three-dimensional spatial structure from monocular images is a fundamental task in computer vision. The task is ambiguous, because many different scenes can produce the same image. This paper proposes a supervised end-to-end network that performs depth estimation without relying on any subsequent processing, such as probabilistic graphical models or other refinement steps. The network uses an encoder-decoder structure with a feature pyramid to predict dense depth maps. The encoder adopts a ResNeXt-50 network to extract the main features from the original image. The feature pyramid structure merges high-level and low-level information so that feature information is not lost. The decoder chains transposed convolutional and convolutional layers into an up-sampling structure that expands the resolution of the output. Applied to the indoor dataset NYU Depth v2, the structure obtains better prediction results than other methods: it achieves the best results on 5 indicators and the second-best result on 1 indicator.

Chinese Character Boxes: Single Shot Detector Network for Chinese Character Detection

Applied Sciences ◽

10.3390/app9020315 ◽

2019 ◽

Vol 9 (2) ◽

pp. 315 ◽

Cited By ~ 3

Author(s):

Junhwan Ryu ◽

Sungho Kim

Keyword(s):

Deep Learning ◽

Character Recognition ◽

Chinese Character ◽

Layer Structure ◽

Single Step ◽

Single Shot ◽

Data Set ◽

Pyramid Structure ◽

Feature Pyramid ◽

Translation Systems

This paper proposes a deep learning-based Chinese character detection network, which is important for character recognition and translation. Detecting the correct character area is an important part of recognition and translation. Previous studies have focused on projection methods applied after image pre-processing, recognition methods based on segmentation, and methods using hand-crafted features. Unfortunately, their results are vulnerable to noise. Recently, recognition and translation systems based on deep learning have treated everything from detection to translation as a single step, but they fail to consider the inaccurate localization problem that arises in detectors. This paper proposes a Chinese character boxes (CCB) network, called CCB-SSD, that detects the character area more accurately using the single-shot multibox detector (SSD) as the baseline. The proposed CCB-SSD network has a single prediction layer structure in which unnecessary layers are removed from the feature-pyramid structure. An augmentation method for training is introduced, and the problem caused by the use of default boxes is solved with the proposed non-maximum suppression (NMS). On the character data-set and caoshu data-set used in this paper, the experiments showed a 96.1% detection rate and 0.89 false positives per character (FPPC), the false positive index proposed in this paper. This is better than the conventional SSD, which reached 69.4% and 6.57 FPPC.

MGFPN: Enhancing multi-scale feature for object detection

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-202372 ◽

2021 ◽

pp. 1-11

Author(s):

Weiming He ◽

You Wu ◽

Jing Xiao ◽

Yang Cao

Keyword(s):

Object Detection ◽

Spatial Information ◽

Mixed Group ◽

Scale Feature ◽

Multi Scale ◽

Pyramid Structure ◽

Feature Pyramid ◽

New Feature ◽

Design Defects ◽

Feature Utilization

Feature pyramids are commonly applied to solve the scale variation problem in object detection. One of the most representative feature pyramid designs is the Feature Pyramid Network (FPN), which is simple and efficient. However, the full power of multi-scale features might not be completely exploited in FPN due to its design defects. In this paper, we first analyze the structural problems of FPN that prevent the multi-scale features from being fully exploited, then propose a new feature pyramid structure named Mixed Group FPN (MGFPN) to mitigate these design defects. Concretely, MGFPN strengthens feature utilization with two modules named Mixed Group Convolution (MGConv) and Contextual Attention (CA). MGConv reduces the spatial information loss of FPN in the feature generation stage, and CA narrows the semantic gaps between features of different receptive fields before lateral summation. By replacing FPN with MGFPN in FCOS, our method improves the performance of detectors with many major backbones by 0.7 to 1.2 Average Precision (AP) on the MS-COCO benchmark without adding too many parameters, and it is easy to extend to other FPN-based models. The proposed MGFPN can serve as a simple and strong alternative for many other FPN-based models.
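
For readers unfamiliar with the lateral summation mentioned in this abstract, the following is a minimal sketch of a plain FPN-style top-down pathway, not of MGFPN itself (MGConv and CA are not modeled). It assumes single-channel feature maps stored as nested lists, each level half the size of the previous one, and nearest-neighbor upsampling; the function names are hypothetical.

```python
def upsample_nearest_2x(fmap):
    """Double height and width of a 2D map by repeating each value."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def add_maps(a, b):
    """Element-wise sum of two 2D maps of identical shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def top_down_merge(laterals):
    """Merge lateral maps ordered fine -> coarse.

    Assumption: each map is exactly half the size of the one before it.
    The coarsest map is passed through unchanged; every finer map is
    summed with the upsampled result from the level above it.
    """
    merged = [laterals[-1]]
    for lateral in reversed(laterals[:-1]):
        merged.append(add_maps(lateral, upsample_nearest_2x(merged[-1])))
    merged.reverse()
    return merged


if __name__ == "__main__":
    fine = [[1] * 4 for _ in range(4)]     # 4x4 map
    middle = [[10] * 2 for _ in range(2)]  # 2x2 map
    coarse = [[100]]                       # 1x1 map
    for level in top_down_merge([fine, middle, coarse]):
        print(level)
```

In a real detector the lateral maps would come from 1 × 1 convolutions over multi-channel backbone features; the sketch only shows how coarse information is propagated into finer levels.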

Object Detection in Autonomous Driving Scenarios Based on an Improved Faster-RCNN

Applied Sciences ◽

10.3390/app112411630 ◽

2021 ◽

Vol 11 (24) ◽

pp. 11630

Author(s):

Yan Zhou ◽

Sijie Wen ◽

Dongli Wang ◽

Jinzhen Mu ◽

Irampaye Richard

Keyword(s):

Object Detection ◽

Autonomous Driving ◽

Detection Algorithm ◽

Data Sets ◽

False Detection ◽

Automatic Driving ◽

Occluded Objects ◽

Pyramid Structure ◽

Feature Pyramid ◽

Bounding Boxes

Object detection is one of the key algorithms in automatic driving systems. To address false detection and missed detection of both small and occluded objects in automatic driving scenarios, an improved Faster-RCNN object detection algorithm is proposed. First, deformable convolution and a spatial attention mechanism are used to improve the ResNet-50 backbone network and enhance the feature extraction of small objects; then, an improved feature pyramid structure is introduced to reduce the loss of features in the fusion process. Three cascade detectors are introduced to solve the problem of IOU (Intersection-Over-Union) threshold mismatch, and side-aware boundary localization is applied for bounding-box regression. Finally, Soft-NMS (Soft Non-maximum Suppression) is used to remove redundant bounding boxes to obtain the best results. The experimental results show that the improved Faster-RCNN can better detect small objects and occluded objects, and its accuracy is 7.7% and 4.1% higher, respectively, than that of the baseline in the eight categories selected from the COCO2017 and BDD100k data sets.
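
The Soft-NMS step named in this abstract can be illustrated with a short sketch. It assumes the commonly used Gaussian variant, in which the score of each remaining box is decayed according to its overlap with the box just selected, instead of the box being discarded outright. The abstract does not say which variant or parameters the authors use, so `sigma`, `score_threshold`, and the function names are assumptions.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def soft_nms(boxes, scores, sigma=0.5, score_threshold=0.001):
    """Gaussian Soft-NMS; returns (box, score) pairs in selection order."""
    candidates = list(zip(boxes, scores))
    kept = []
    while candidates:
        # Select the highest-scoring remaining box.
        best_idx = max(range(len(candidates)), key=lambda i: candidates[i][1])
        best_box, best_score = candidates.pop(best_idx)
        kept.append((best_box, best_score))
        # Decay the scores of the others by their overlap with it.
        rescored = []
        for box, score in candidates:
            score *= math.exp(-(iou(best_box, box) ** 2) / sigma)
            if score >= score_threshold:
                rescored.append((box, score))
        candidates = rescored
    return kept


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7]
    for box, score in soft_nms(boxes, scores):
        print(box, round(score, 4))
```

In the demo, the second box overlaps the first heavily, so its score is reduced and it is ranked below the non-overlapping third box rather than being removed.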

ShipYOLO: An Enhanced Model for Ship Detection

Journal of Advanced Transportation ◽

10.1155/2021/1060182 ◽

2021 ◽

Vol 2021 ◽

pp. 1-11

Author(s):

Xu Han ◽

Lining Zhao ◽

Yue Ning ◽

Jingfeng Hu

Keyword(s):

Feature Extraction ◽

Target Space ◽

Small Scale ◽

Model Parameters ◽

Detection Accuracy ◽

Ship Detection ◽

Backbone Network ◽

Input Size ◽

Pyramid Structure ◽

Feature Pyramid

Ship detection for assisted intelligent ship navigation places stringent requirements on the model’s detection speed and accuracy. In response to this problem, this study uses an improved YOLO-V4 detection model (ShipYOLO) to detect ships. Compared to YOLO-V4, the model has three main improvements. Firstly, the backbone network (CSPDarknet) of YOLO-V4 is optimized. During training, parallel 3 × 3 convolution, 1 × 1 convolution, and identity branches replace the original feature extraction component (ResUnit), so that more features are extracted. During inference, the branch parameters are combined to form a new backbone network named RCSPDarknet, which improves the inference speed of the model while improving the accuracy. Secondly, to solve the problem of missed detection of small-scale ships, we designed a new amplified receptive field module named DSPP with dilated convolution and Max-Pooling, which improves the model’s acquisition of small-scale ship spatial information and its robustness to spatial displacement of ship targets. Finally, we use the attention mechanism and ResNet’s shortcut idea to improve the feature pyramid structure (PAFPN) of YOLO-V4 and obtain a new feature pyramid structure named AtFPN. The structure effectively improves the model’s feature extraction for ships of different scales and reduces the number of model parameters, further improving the model’s inference speed and detection accuracy. In addition, we have created a single-category ship dataset with a total of 2238 images. The experimental results show that ShipYOLO is faster and more accurate across different input sizes. With an input size of 320 × 320 on a PC equipped with an NVIDIA 1080Ti GPU, the FPS and mAP@5:5:95 (mAP90) of ShipYOLO are increased by 23.7% and 13.6% (10.6%), respectively, compared to YOLO-V4.
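
The idea of training with parallel 3 × 3, 1 × 1, and identity branches and combining their parameters for inference can be sketched as follows. The abstract does not describe how the branches are combined, so this sketch assumes the common approach of summing the kernels into a single 3 × 3 kernel, and it deliberately ignores normalization layers and multiple channels. All function names are hypothetical.

```python
def conv2d_same(image, kernel):
    """Single-channel 2D cross-correlation with zero padding.

    Assumption: the kernel is square with an odd side length.
    """
    k = len(kernel)
    pad = k // 2
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(k):
                for dx in range(k):
                    yy, xx = y + dy - pad, x + dx - pad
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[dy][dx] * image[yy][xx]
            out[y][x] = acc
    return out


def three_branch_output(image, k3, k1):
    """Training-time form: 3x3 branch + 1x1 branch + identity branch."""
    conv3 = conv2d_same(image, k3)
    return [
        [conv3[y][x] + k1 * image[y][x] + image[y][x]
         for x in range(len(image[0]))]
        for y in range(len(image))
    ]


def merge_branches(k3, k1):
    """Inference-time form: fold all three branches into one 3x3 kernel.

    The 1x1 weight and the identity both act only on the center pixel,
    so they are added to the center tap of the 3x3 kernel.
    """
    merged = [row[:] for row in k3]
    merged[1][1] += k1 + 1.0
    return merged


if __name__ == "__main__":
    image = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
    k3 = [[1.0, 0.0, -1.0], [2.0, 0.0, -2.0], [1.0, 0.0, -1.0]]
    k1 = 3.0
    print(three_branch_output(image, k3, k1))
    print(conv2d_same(image, merge_branches(k3, k1)))
```

Both printed results should match, which is why a single merged convolution can replace the three branches at inference time under these assumptions.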

An anchor-free detector and R-CNN integrated neural network architecture for environmental perception of urban roads

Proceedings of the Institution of Mechanical Engineers Part D Journal of Automobile Engineering ◽

10.1177/09544070211004466 ◽

2021 ◽

pp. 095440702110044

Author(s):

Chaojun Lin ◽

Ying Shi ◽

Jian Zhang ◽

Changjun Xie ◽

Wei Chen ◽

...

Keyword(s):

Neural Network ◽

Autonomous Vehicles ◽

Network Architecture ◽

Pedestrian Detection ◽

Semantic Segmentation ◽

Environmental Perception ◽

Pyramid Structure ◽

Research Goal ◽

Assignment Strategy ◽

Feature Pyramid

Environmental perception of urban roads is a critical research goal in intelligent transportation technology and autonomous vehicles, and pedestrian location is key to many relevant algorithms. Because anchor-free detectors are faster and region-based convolutional neural networks have a higher accuracy in object detection and classification, we propose an integrated convolutional networking architecture combining an anchor-free detector with a region-based convolutional neural network in the environmental perception task. The proposed network achieves higher precision and increases inference speed by up to 30%. To acquire more accurate region boundaries than a coarse bounding box method, a semantic segmentation sub-network is adopted to predict an instance segmentation mask for each object, and more accurate segmentation results are obtained by using the Dice loss. Moreover, we present an assignment strategy using a modified feature pyramid structure and show that it improves mean average precision of pedestrian detection by 2% on average. Finally, we verify that the pretrained neural network is beneficial for small datasets. Overall, the results show that our model achieves higher precision than the approaches used for comparison.
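
The Dice loss mentioned in this abstract is often written as one minus the Dice coefficient between the predicted mask and the ground-truth mask. The sketch below assumes this standard soft formulation over flattened masks; the smoothing term `eps` and the function name are assumptions, and the abstract does not specify the exact variant used.

```python
def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss for one mask.

    pred:   flat list of predicted foreground probabilities in [0, 1]
    target: flat list of ground-truth labels (0 or 1)
    eps:    small constant that avoids division by zero on empty masks
    """
    intersection = sum(p * t for p, t in zip(pred, target))
    denom = sum(pred) + sum(target)
    return 1.0 - (2.0 * intersection + eps) / (denom + eps)


if __name__ == "__main__":
    print(dice_loss([1.0, 0.0, 1.0, 0.0], [1, 0, 1, 0]))  # perfect overlap
    print(dice_loss([1.0, 0.0], [0, 1]))                  # no overlap
```

Because the loss depends on the overlap ratio rather than on a per-pixel count, it is less dominated by the background class than a plain per-pixel loss when objects are small.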

Road crack detection network under noise based on feature pyramid structure with feature enhancement (road crack detection under noise)

IET Image Processing ◽

10.1049/ipr2.12388 ◽

2021 ◽

Author(s):

Mingsi Sun ◽

Hongwei Zhao ◽

Jiao Li

Keyword(s):

Crack Detection ◽

Feature Enhancement ◽

Pyramid Structure ◽

Feature Pyramid

Improved YOLO v5 with balanced feature pyramid and attention module for traffic sign detection

MATEC Web of Conferences ◽

10.1051/matecconf/202235503023 ◽

2022 ◽

Vol 355 ◽

pp. 03023

Author(s):

Linfeng Jiang ◽

Hui Liu ◽

Hong Zhu ◽

Guangjian Zhang

Keyword(s):

Feature Fusion ◽

Detection Methods ◽

Traffic Sign ◽

Global Context ◽

Automatic Driving ◽

Computational Overhead ◽

Pyramid Structure ◽

Sign Detection ◽

Feature Pyramid ◽

Traffic Sign Detection

With the development of automatic driving technology, traffic sign detection has become a very important task. However, it is a challenging task because of the complex traffic sign scenes and the small size of the targets. In recent years, a number of convolutional neural network (CNN) based object detection methods have brought great progress to traffic sign detection. Nevertheless, given the still high false detection rate, as well as the high time and computational overhead, the results are not satisfactory. Therefore, we employ the lightweight network model YOLO v5 (You Only Look Once) as the foundation of our work. In this paper, we propose an improved YOLO v5 method that uses a balanced feature pyramid structure and a global context block to enhance feature fusion and feature extraction. To verify the proposed method, we conducted extensive comparative experiments on the challenging dataset Tsinghua-Tencent-100K (TT100K). The experimental results demonstrate that mAP@0.5 and mAP@0.5:0.95 are improved by 1.9% and 2.1%, respectively.

Learning Compact Lexicons for CCG Semantic Parsing

10.3115/v1/d14-1134 ◽

2014 ◽

Cited By ~ 4

Author(s):

Yoav Artzi ◽

Dipanjan Das ◽

Slav Petrov

Keyword(s):

Semantic Parsing
