Jointly human semantic parsing and attribute recognition with feature pyramid structure in EfficientNets

2021 ◽  
Author(s):  
Mahnaz Moghaddam ◽  
Mostafa Charmi ◽  
Hossein Hassanpoor
IEEE Access ◽  
2020 ◽  
Vol 8 ◽  
pp. 164570-164579
Author(s):  
Ye Li ◽  
Fangyan Shi ◽  
Shaoqi Hou ◽  
Jipeng Li ◽  
Chao Li ◽  
...  

Author(s):  
Binglin Niu ◽  
Mengxia Tang ◽  
Xuelin Chen

Perceiving the three-dimensional structure of the surrounding environment and analyzing it for autonomous movement is indispensable for robots operating in real scenes. Recovering depth information and three-dimensional spatial structure from monocular images is a basic task of computer vision, and an ambiguous one: for a given image, many different scenes could have produced it. This paper proposes a supervised end-to-end network that performs depth estimation without relying on any subsequent processing, such as probabilistic graphical models or other refinement steps. The network uses an encoder-decoder structure with a feature pyramid to predict dense depth maps. The encoder adopts a ResNeXt-50 network to extract the main features from the original image. The feature pyramid structure merges high-level and low-level information so that feature information is not lost. The decoder connects transposed convolutional layers and convolutional layers into an up-sampling structure that expands the resolution of the output. Applied to the indoor dataset NYU Depth v2, the method obtains better prediction results than other methods: it achieves the best results on 5 indicators and the second-best result on 1 indicator.
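To make the decoder's up-sampling step concrete, here is a minimal sketch of how a transposed convolution expands spatial resolution. The formula is the standard output-size relation for a transposed convolution with dilation 1. The kernel, stride, and padding values are illustrative assumptions, not settings reported in the abstract.

```python
def transposed_conv_output_size(size_in, kernel, stride, padding=0, output_padding=0):
    """Spatial output size of a transposed convolution (dilation 1)."""
    return (size_in - 1) * stride - 2 * padding + kernel + output_padding


# Hypothetical decoder: three up-sampling stages, each doubling the resolution.
size = 8
sizes = [size]
for _ in range(3):
    size = transposed_conv_output_size(size, kernel=4, stride=2, padding=1)
    sizes.append(size)

print(sizes)
```

With kernel 4, stride 2, and padding 1, each stage exactly doubles the side length, which is one common way to build an up-sampling path.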


2019 ◽  
Vol 9 (2) ◽  
pp. 315 ◽  
Author(s):  
Junhwan Ryu ◽  
Sungho Kim

This paper proposes a deep learning-based Chinese character detection network. Detecting the correct character area is an important part of character recognition and translation. Previous studies have focused on projection methods applied after image pre-processing, segmentation-based recognition methods, and methods using hand-crafted features. Unfortunately, their results are vulnerable to noise. Recent deep learning-based recognition or translation systems treat everything from detection to translation as a single step, but they fail to consider the inaccurate localization problem that arises in detectors. This paper proposes a Chinese character boxes (CCB) network, called CCB-SSD, that detects the character area more accurately using the single-shot multibox detector (SSD) as the baseline. The proposed CCB-SSD network has a single prediction layer structure in which unnecessary layers are removed from the feature-pyramid structure. An augmentation method for training is introduced, and the problem caused by the use of default boxes is solved with the proposed non-maximum suppression (NMS). Experiments on the character data-set and the caoshu data-set used in this paper yielded a 96.1% detection rate and 0.89 false positives per character (FPPC), the false-positive index proposed in this paper. This is better than the conventional SSD, which achieved 69.4% and 6.57 FPPC.
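For readers unfamiliar with NMS, the sketch below shows the standard greedy variant that detectors such as SSD conventionally use. It is not the modified NMS proposed in the paper, whose details the abstract does not give. The box format `(x1, y1, x2, y2)`, the threshold, and the sample boxes are assumptions for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Hypothetical detections: the first two boxes overlap heavily.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = greedy_nms(boxes, scores)
print(kept)
```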


2021 ◽  
pp. 1-11
Author(s):  
Weiming He ◽  
You Wu ◽  
Jing Xiao ◽  
Yang Cao

Feature pyramids are commonly applied to solve the scale variation problem in object detection. One of the most representative feature pyramid designs is the Feature Pyramid Network (FPN), which is simple and efficient. However, the full power of multi-scale features might not be completely exploited in FPN due to its design defects. In this paper, we first analyze the structural problems of FPN that prevent multi-scale features from being fully exploited, then propose a new feature pyramid structure named Mixed Group FPN (MGFPN) to mitigate these design defects. Concretely, MGFPN strengthens feature utilization with two modules named Mixed Group Convolution (MGConv) and Contextual Attention (CA). MGConv reduces the spatial information loss of FPN in the feature generation stage, and CA narrows the semantic gaps between features of different receptive fields before lateral summation. By replacing FPN with MGFPN in FCOS, our method improves detector performance on many major backbones by 0.7 to 1.2 Average Precision (AP) on the MS-COCO benchmark without adding too many parameters, and it is easy to extend to other FPN-based models. The proposed MGFPN can serve as a simple and strong alternative for many other FPN-based models.
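The lateral summation mentioned above can be illustrated with a minimal sketch of the plain FPN top-down pathway. This is ordinary FPN, not MGFPN. It makes several simplifying assumptions: single-channel feature maps stored as nested lists, nearest-neighbor 2x up-sampling, and no 1x1 lateral or 3x3 output convolutions.

```python
def upsample_nearest_2x(fmap):
    """Nearest-neighbor 2x up-sampling of a 2D map stored as a list of rows."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def add_maps(a, b):
    """Element-wise sum of two maps with identical shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def top_down_merge(laterals):
    """laterals: maps ordered fine to coarse, each half the size of the previous."""
    merged = [laterals[-1]]
    for lateral in reversed(laterals[:-1]):
        # Lateral summation: fine-level map plus the up-sampled coarser result.
        merged.insert(0, add_maps(lateral, upsample_nearest_2x(merged[0])))
    return merged


# Hypothetical three-level pyramid with constant values for easy inspection.
c3 = [[1] * 4 for _ in range(4)]
c4 = [[10] * 2 for _ in range(2)]
c5 = [[100]]
p3, p4, p5 = top_down_merge([c3, c4, c5])
print(p4)
```

Each output level carries the sum of its own features and everything coarser, which is how high-level semantics reach the high-resolution maps.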


2021 ◽  
Vol 11 (24) ◽  
pp. 11630
Author(s):  
Yan Zhou ◽  
Sijie Wen ◽  
Dongli Wang ◽  
Jinzhen Mu ◽  
Irampaye Richard

Object detection is one of the key algorithms in automatic driving systems. To address false detections and missed detections of small and occluded objects in automatic driving scenarios, an improved Faster-RCNN object detection algorithm is proposed. First, deformable convolution and a spatial attention mechanism are used to improve the ResNet-50 backbone network and enhance the feature extraction of small objects; then, an improved feature pyramid structure is introduced to reduce the loss of features in the fusion process. Three cascade detectors are introduced to solve the problem of IoU (Intersection-over-Union) threshold mismatch, and side-aware boundary localization is applied for bounding-box regression. Finally, Soft-NMS (Soft Non-Maximum Suppression) is used to filter the bounding boxes and obtain the best results. The experimental results show that the improved Faster-RCNN detects small and occluded objects better, and its accuracy is 7.7% and 4.1% higher, respectively, than that of the baseline in the eight categories selected from the COCO2017 and BDD100k data sets.
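As an illustration of the Soft-NMS step, the sketch below implements the commonly used Gaussian-decay variant: instead of discarding overlapping boxes, it lowers their scores according to their IoU with the selected box. The `sigma` and `score_threshold` values and the sample boxes are assumptions. The abstract does not state which variant or settings the authors used.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def soft_nms(boxes, scores, sigma=0.5, score_threshold=0.001):
    """Gaussian Soft-NMS. Returns (index, decayed_score) pairs in selection order."""
    scores = list(scores)  # work on a copy so the caller's list is untouched
    remaining = list(range(len(boxes)))
    kept = []
    while remaining:
        best = max(remaining, key=lambda i: scores[i])
        remaining.remove(best)
        kept.append((best, scores[best]))
        for i in remaining:
            # Decay the score smoothly with overlap instead of removing the box.
            scores[i] *= math.exp(-iou(boxes[best], boxes[i]) ** 2 / sigma)
        remaining = [i for i in remaining if scores[i] >= score_threshold]
    return kept


# Hypothetical detections: box 1 overlaps box 0, box 2 is isolated.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
result = soft_nms(boxes, scores)
print(result)
```

The overlapping box survives with a reduced score, which is the property that helps with occluded objects standing close together.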


2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Xu Han ◽  
Lining Zhao ◽  
Yue Ning ◽  
Jingfeng Hu

The application of ship detection to assisted intelligent ship navigation places stringent requirements on the model's detection speed and accuracy. In response, this study uses an improved YOLO-V4 detection model (ShipYOLO) to detect ships. Compared to YOLO-V4, the model has three main improvements. Firstly, the backbone network (CSPDarknet) of YOLO-V4 is optimized. During training, parallel 3 × 3 convolution, 1 × 1 convolution, and identity branches replace the original feature extraction component (ResUnit), so that more features are extracted. During inference, the branch parameters are combined to form a new backbone network named RCSPDarknet, which improves the inference speed of the model while improving the accuracy. Secondly, to solve the problem of missed detection of small-scale ships, we designed a new amplified receptive field module named DSPP with dilated convolution and max-pooling, which improves the model's acquisition of small-scale ship spatial information and its robustness to spatial displacement of ship targets. Finally, we use the attention mechanism and ResNet's shortcut idea to improve the feature pyramid structure (PAFPN) of YOLO-V4 and obtain a new feature pyramid structure named AtFPN. The structure effectively improves the model's feature extraction for ships of different scales and reduces the number of model parameters, further improving the model's inference speed and detection accuracy. In addition, we have created a single-category ship dataset with a total of 2238 images. The experimental results show that ShipYOLO is faster and more accurate across different input sizes. With an input size of 320 × 320 on a PC equipped with an NVIDIA 1080Ti GPU, the FPS and mAP@0.5:0.95 (mAP90) of ShipYOLO are 23.7% and 13.6% (10.6%) higher, respectively, than those of YOLO-V4.
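The idea of combining branch parameters at inference time can be sketched as follows. Because convolution is linear, parallel 3 × 3, 1 × 1, and identity branches can be folded into a single 3 × 3 kernel, an approach often called structural re-parameterization. This toy version assumes a single channel, zero padding, and no batch normalization, so it shows the principle and is not the RCSPDarknet procedure itself.

```python
def conv2d_same(image, kernel):
    """Single-channel 2D cross-correlation with zero padding (odd k x k kernel)."""
    k = len(kernel)
    pad = k // 2
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(k):
                for dx in range(k):
                    yy, xx = y + dy - pad, x + dx - pad
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += image[yy][xx] * kernel[dy][dx]
            out[y][x] = acc
    return out


def merge_branches(k3, k1):
    """Fold a 3x3 kernel, a 1x1 kernel, and an identity branch into one 3x3 kernel."""
    merged = [row[:] for row in k3]
    # The 1x1 weight and the identity both act only on the centre tap.
    merged[1][1] += k1[0][0] + 1.0
    return merged


# Hypothetical input and weights.
image = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
k3 = [[0.0, 1.0, 0.0], [1.0, -4.0, 1.0], [0.0, 1.0, 0.0]]
k1 = [[0.5]]

# Training-time form: three parallel branches summed.
a = conv2d_same(image, k3)
b = conv2d_same(image, k1)
multi_branch = [[a[y][x] + b[y][x] + image[y][x] for x in range(3)] for y in range(3)]

# Inference-time form: one merged 3x3 convolution.
single_branch = conv2d_same(image, merge_branches(k3, k1))
print(single_branch)
```

Both forms give the same output, but the merged form needs one convolution instead of three branches, which is where the inference speed-up comes from.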


Author(s):  
Chaojun Lin ◽  
Ying Shi ◽  
Jian Zhang ◽  
Changjun Xie ◽  
Wei Chen ◽  
...  

Environmental perception of urban roads is a critical research goal in intelligent transportation technology and autonomous vehicles, and pedestrian localization is key to many relevant algorithms. Because anchor-free detectors are faster and region-based convolutional neural networks are more accurate in object detection and classification, we propose an integrated convolutional network architecture that combines an anchor-free detector with a region-based convolutional neural network for the environmental perception task. The proposed network achieves higher precision and increases inference speed by up to 30%. To obtain more accurate region boundaries than a coarse bounding box, a semantic segmentation sub-network is adopted to predict an instance segmentation mask for each object, and more accurate segmentation results are obtained by using the Dice loss. Moreover, we present an assignment strategy using a modified feature pyramid structure and show that it improves the mean average precision of pedestrian detection by 2% on average. Finally, we verify that the pretrained neural network is beneficial for small datasets. Overall, the results show that our model achieves higher precision than the approaches used for comparison.
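The Dice loss mentioned above measures overlap between a predicted mask and the ground truth. The sketch below uses one common formulation with a smoothing term. The flattened-list mask representation and the `smooth` value are assumptions, and the paper may use a different variant.

```python
def dice_loss(pred, target, smooth=1.0):
    """Dice loss for flattened masks: pred holds probabilities, target holds 0/1 labels."""
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # The smoothing term avoids division by zero when both masks are empty.
    return 1.0 - (2.0 * intersection + smooth) / (total + smooth)


# Hypothetical four-pixel masks.
perfect = dice_loss([1.0, 0.0, 1.0, 0.0], [1, 0, 1, 0])
disjoint = dice_loss([1.0, 1.0, 0.0, 0.0], [0, 0, 1, 1])
print(perfect, disjoint)
```

Because the loss depends on relative overlap and not on per-pixel counts, it is less dominated by the background than a plain per-pixel loss when the object occupies few pixels.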


2022 ◽  
Vol 355 ◽  
pp. 03023
Author(s):  
Linfeng Jiang ◽  
Hui Liu ◽  
Hong Zhu ◽  
Guangjian Zhang

With the development of automatic driving technology, traffic sign detection has become a very important task. However, it is challenging because of the complexity of traffic sign scenes and the small size of the targets. In recent years, a number of convolutional neural network (CNN) based object detection methods have brought great progress to traffic sign detection. However, the false detection rate remains high, as do the time and computational overheads, so the results are still unsatisfactory. Therefore, we use the lightweight network model YOLO v5 (You Only Look Once) as the foundation of our work. In this paper, we propose an improved YOLO v5 method that uses a balanced feature pyramid structure and a global context block to enhance feature fusion and feature extraction. To verify the proposed method, we conducted extensive comparative experiments on the challenging dataset Tsinghua-Tencent-100K (TT100K). The experimental results demonstrate that mAP@0.5 and mAP@0.5:0.95 are improved by 1.9% and 2.1%, respectively.
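For context on the two reported metrics: mAP@0.5 is the mean average precision at a single IoU threshold of 0.5, while mAP@0.5:0.95 follows the COCO convention of averaging over ten IoU thresholds from 0.5 to 0.95 in steps of 0.05. A minimal sketch of that averaging is shown below. The per-threshold AP values are made up for illustration and are not results from the paper.

```python
def coco_iou_thresholds():
    """The ten IoU thresholds 0.50, 0.55, ..., 0.95."""
    return [round(0.5 + 0.05 * i, 2) for i in range(10)]


def mean_ap_over_thresholds(ap_by_threshold):
    """Average per-threshold AP values into a single mAP@0.5:0.95 figure."""
    thresholds = coco_iou_thresholds()
    return sum(ap_by_threshold[t] for t in thresholds) / len(thresholds)


# Hypothetical AP values that fall as the IoU requirement becomes stricter.
thresholds = coco_iou_thresholds()
ap_by_threshold = {t: 0.9 - 0.08 * i for i, t in enumerate(thresholds)}

map_50 = ap_by_threshold[0.5]
map_50_95 = mean_ap_over_thresholds(ap_by_threshold)
print(map_50, map_50_95)
```

The averaged figure is always at or below mAP@0.5 when AP decreases with stricter thresholds, which is why the two numbers are reported separately.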


2014 ◽  
Author(s):  
Yoav Artzi ◽  
Dipanjan Das ◽  
Slav Petrov