AFI-Net: Attention-Guided Feature Integration Network for RGBD Saliency Detection

2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Liming Li ◽  
Shuguang Zhao ◽  
Rui Sun ◽  
Xiaodong Chai ◽  
Shubin Zheng ◽  
...  

This article proposes an innovative RGBD saliency model, the attention-guided feature integration network (AFI-Net), which extracts and fuses features and performs saliency inference. Specifically, the model first extracts multimodal, multilevel deep features. Then, a series of attention modules is applied to the multilevel RGB and depth features, yielding enhanced deep features. Next, the enhanced multimodal deep features are hierarchically fused. Lastly, the RGB and depth boundary features, that is, low-level spatial details, are added to the integrated features to perform saliency inference. The key components of AFI-Net are the attention-guided feature enhancement and the boundary-aware saliency inference: the attention modules indicate salient objects coarsely, while the boundary information equips the deep features with finer spatial details. As a result, salient objects are both well characterized and clearly highlighted. Comprehensive experiments on five challenging public RGBD datasets demonstrate the superiority and effectiveness of the proposed AFI-Net.
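As a rough illustration of the attention-guided enhancement and fusion step described in this abstract, the PyTorch sketch below applies a spatial attention module to RGB and depth features at one level and fuses the results. The module names, layer sizes, and choice of a simple sigmoid spatial attention are illustrative assumptions, not the authors' exact architecture.

```python
# Minimal sketch of attention-guided feature enhancement and fusion for
# one backbone level; the attention form and channel sizes are assumptions.
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Produces a 1-channel map that coarsely indicates salient regions."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, 1, kernel_size=7, padding=3)

    def forward(self, feat):
        attn = torch.sigmoid(self.conv(feat))   # (B, 1, H, W) in [0, 1]
        return feat * attn                      # re-weight features spatially

class FuseLevel(nn.Module):
    """Fuses attention-enhanced RGB and depth features at one level."""
    def __init__(self, channels):
        super().__init__()
        self.rgb_attn = SpatialAttention(channels)
        self.depth_attn = SpatialAttention(channels)
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)

    def forward(self, rgb_feat, depth_feat):
        rgb_e = self.rgb_attn(rgb_feat)
        depth_e = self.depth_attn(depth_feat)
        return self.fuse(torch.cat([rgb_e, depth_e], dim=1))

# Usage on dummy mid-level features from RGB and depth backbones:
fuse = FuseLevel(channels=256)
rgb_feat = torch.randn(2, 256, 28, 28)
depth_feat = torch.randn(2, 256, 28, 28)
fused = fuse(rgb_feat, depth_feat)              # (2, 256, 28, 28)
```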

2017 ◽  
Vol 226 ◽  
pp. 212-220 ◽  
Author(s):  
Hongyang Li ◽  
Jiang Chen ◽  
Huchuan Lu ◽  
Zhizhen Chi

2019 ◽  
Vol 88 ◽  
pp. 139-152 ◽  
Author(s):  
Weiying Xie ◽  
Yanzi Shi ◽  
Yunsong Li ◽  
Xiuping Jia ◽  
Jie Lei

2018 ◽  
Vol 2018 ◽  
pp. 1-11 ◽  
Author(s):  
Hai Wang ◽  
Lei Dai ◽  
Yingfeng Cai ◽  
Long Chen ◽  
Yong Zhang

Traditional salient object detection models fall into several classes based on low-level features and contrast between pixels. In this paper, we propose a model based on a multilevel deep pyramid (MLDP), which fuses multiple features at different levels. First, the MLDP feeds the original image into a VGG16 model to extract high-level features and form an initial saliency map. Next, the MLDP extracts further high-level features to form a saliency map based on a deep pyramid. Then, the MLDP extracts low-level features to obtain a saliency map fused with superpixels. After that, the MLDP applies background noise filtering to this superpixel-fused saliency map in order to remove the interference of background noise and form a foreground-based saliency map. Lastly, the MLDP combines the superpixel-fused saliency map with the foreground-based saliency map to produce the final saliency map. Because the MLDP is not limited to low-level features but fuses multiple feature types, it achieves good results when extracting salient targets. As shown in our experiment section, the MLDP outperforms seven state-of-the-art models across three public saliency datasets, demonstrating its superiority and wide applicability in the extraction of salient targets.
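The superpixel fusion and background-noise filtering steps can be sketched as below; the SLIC segmentation, threshold value, and combination weights are illustrative assumptions rather than the MLDP's exact procedure.

```python
# Rough sketch: average a deep saliency map over superpixels, suppress
# low-saliency background, and combine the two maps. Threshold and
# weights are illustrative only.
import numpy as np
from skimage.segmentation import slic

def superpixel_fuse(image, deep_saliency, n_segments=300):
    """Average pixel-wise deep saliency within each SLIC superpixel."""
    labels = slic(image, n_segments=n_segments, start_label=0)
    fused = np.zeros_like(deep_saliency)
    for sp in np.unique(labels):
        mask = labels == sp
        fused[mask] = deep_saliency[mask].mean()
    return fused

def filter_background(saliency, thresh=0.3):
    """Zero out regions below a saliency threshold (background noise)."""
    out = saliency.copy()
    out[out < thresh] = 0.0
    return out

# Dummy inputs: an RGB image and a deep saliency map in [0, 1].
image = np.random.rand(128, 128, 3)
deep_sal = np.random.rand(128, 128)
sp_sal = superpixel_fuse(image, deep_sal)
foreground_sal = filter_background(sp_sal)
final_sal = 0.5 * sp_sal + 0.5 * foreground_sal   # simple combination
```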


Author(s):  
Monika Singh ◽  
Anand Singh Jalal ◽  
Ruchira Manke ◽  
Aamir Khan

Saliency detection has long been a challenging and interesting research area. Existing methodologies focus on either the foreground or the background regions of an image by computing low-level features. However, low-level features alone do not produce satisfactory results. In this paper, low-level features, extracted using superpixels, are combined with high-level priors. The background features are taken as the low-level prior, owing to the similarity between background areas and the image boundary, which are interconnected and separated by minimal distance. High-level priors, such as location, color, and semantic priors, are incorporated with the low-level prior to spotlight the salient area in the image. The experimental results illustrate that the proposed approach outperforms state-of-the-art methods.
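A minimal sketch of how a low-level background prior can be combined with high-level location and color priors follows; the specific formulas and weights are assumptions for demonstration, not the paper's exact definitions.

```python
# Illustrative combination of a low-level background prior with simple
# high-level priors (location and color); formulas are assumptions.
import numpy as np

def location_prior(h, w, sigma=0.3):
    """Gaussian prior favoring the image center."""
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    d2 = ((ys - cy) / h) ** 2 + ((xs - cx) / w) ** 2
    return np.exp(-d2 / (2 * sigma ** 2))

def color_prior(image):
    """Crude proxy: warm colors (red/yellow) tend to attract attention."""
    r, g, b = image[..., 0], image[..., 1], image[..., 2]
    return np.clip((r + g) / 2.0 - b, 0.0, 1.0)

def combine(background_prior, image):
    """background_prior: per-pixel likelihood of being background, in [0, 1]."""
    h, w = background_prior.shape
    sal = (1.0 - background_prior) * location_prior(h, w) \
          * (0.5 + 0.5 * color_prior(image))
    return sal / (sal.max() + 1e-8)

# Dummy low-level background prior and RGB image.
bg = np.random.rand(64, 64)
img = np.random.rand(64, 64, 3)
saliency = combine(bg, img)
```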


2019 ◽  
Vol 352 ◽  
pp. 75-92 ◽  
Author(s):  
Jianning Chi ◽  
Chengdong Wu ◽  
Xiaosheng Yu ◽  
Hao Chu ◽  
Peng Ji

Entropy ◽  
2019 ◽  
Vol 21 (2) ◽  
pp. 165 ◽  
Author(s):  
Xiantao Jiang ◽  
Tian Song ◽  
Daqi Zhu ◽  
Takafumi Katayama ◽  
Lu Wang

Perceptual video coding (PVC) can provide a lower bitrate at the same visual quality compared with traditional H.265/High Efficiency Video Coding (HEVC). In this work, a novel H.265/HEVC-compliant PVC framework is proposed based on a video saliency model. First, an effective and efficient spatiotemporal saliency model is used to generate a video saliency map. Second, a perceptual coding scheme is developed based on the saliency map, with a saliency-based quantization control algorithm proposed to reduce the bitrate. Finally, simulation results demonstrate the superiority of the proposed perceptual coding scheme in both objective and subjective tests, achieving up to a 9.46% bitrate reduction with negligible subjective and objective quality loss. The proposed method delivers high quality well suited to high-definition video applications.
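The saliency-based quantization control idea can be sketched as per-CTU QP offsets derived from mean saliency, so salient regions receive finer quantization. The CTU size follows HEVC conventions, but the mapping from saliency to QP offset below is an assumption for illustration, not the paper's algorithm.

```python
# Sketch: derive a QP per 64x64 CTU from a frame-level saliency map,
# lowering QP (finer quantization) where saliency is high.
import numpy as np

def ctu_qp_map(saliency, base_qp=32, ctu=64, max_offset=4):
    """Return a QP per CTU: low QP where saliency is high."""
    h, w = saliency.shape
    rows, cols = h // ctu, w // ctu
    qp = np.full((rows, cols), base_qp, dtype=int)
    for r in range(rows):
        for c in range(cols):
            block = saliency[r * ctu:(r + 1) * ctu, c * ctu:(c + 1) * ctu]
            # Map mean saliency in [0,1] to an offset in [-max_offset, +max_offset].
            offset = int(round(max_offset * (1.0 - 2.0 * block.mean())))
            qp[r, c] = base_qp + offset
    return qp

# Dummy saliency map for a 1280x640 frame.
sal = np.random.rand(640, 1280)
print(ctu_qp_map(sal).shape)   # (10, 20) CTU grid
```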


2019 ◽  
Vol 11 (13) ◽  
pp. 1617 ◽  
Author(s):  
Jicheng Wang ◽  
Li Shen ◽  
Wenfan Qiao ◽  
Yanshuai Dai ◽  
Zhilin Li

The classification of very-high-resolution (VHR) remote sensing images is essential in many applications. However, high intraclass and low interclass variations in such images pose serious challenges. Fully convolutional network (FCN) models, which benefit from a powerful feature-learning ability, have shown impressive performance and great potential. Nevertheless, the original FCN method yields classification results of only coarse resolution. Deep feature fusion is often employed to improve the resolution of outputs, but existing fusion strategies neither properly exploit low-level features nor account for the importance of features at different scales. This paper proposes a novel, end-to-end, fully convolutional network that integrates a multiconnection ResNet model and a class-specific attention model into a unified framework to overcome these problems. The former fuses multilevel deep features without introducing redundant information from low-level features. The latter learns the contribution of each geo-object's features at each scale. Extensive experiments on two open datasets indicate that the proposed method achieves class-specific, scale-adaptive classification results and outperforms other state-of-the-art methods. The results were submitted to the International Society for Photogrammetry and Remote Sensing (ISPRS) online contest for comparison with more than 50 other methods; the proposed method (ID: SWJ_2) ranks first in overall accuracy, even though the additional digital surface model (DSM) data offered by ISPRS were not used and no postprocessing was applied.
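A minimal sketch of class-specific, scale-adaptive fusion follows: per-class attention weights are predicted over score maps from several scales and normalized across scales. The scale count, channel sizes, and module layout are illustrative assumptions, not the authors' exact network.

```python
# Sketch of class-specific attention over multiscale score maps.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ClassScaleAttention(nn.Module):
    def __init__(self, num_classes, num_scales, in_channels):
        super().__init__()
        # Predict one weight per class per scale at every pixel.
        self.attn = nn.Conv2d(in_channels, num_classes * num_scales, 1)
        self.num_classes, self.num_scales = num_classes, num_scales

    def forward(self, feat, score_maps):
        # score_maps: list of (B, num_classes, H, W) predictions per scale,
        # already upsampled to a common resolution.
        B, _, H, W = score_maps[0].shape
        w = self.attn(feat).view(B, self.num_classes, self.num_scales, H, W)
        w = F.softmax(w, dim=2)                   # normalize over scales
        scores = torch.stack(score_maps, dim=2)   # (B, C, S, H, W)
        return (w * scores).sum(dim=2)            # class-specific fusion

# Dummy features and 3-scale score maps for 6 classes.
attn = ClassScaleAttention(num_classes=6, num_scales=3, in_channels=64)
feat = torch.randn(1, 64, 32, 32)
maps = [torch.randn(1, 6, 32, 32) for _ in range(3)]
fused = attn(feat, maps)                          # (1, 6, 32, 32)
```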


2014 ◽  
Vol 602-605 ◽  
pp. 2238-2241
Author(s):  
Jian Kun Chen ◽  
Zhi Wei Kang

In this paper, we present a new visual saliency model based on the wavelet transform and simple priors. First, we create multiscale feature maps in the wavelet domain to represent different features, from edges to textures. Then, we compute the local saliency at each location together with its global saliency, and combine the two to generate a new saliency map. Finally, the final saliency map is generated by combining this new saliency map with two simple priors (a color prior and a location prior). Experimental evaluation shows that the proposed model achieves state-of-the-art results and outperforms other models on a publicly available benchmark dataset.
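A rough sketch of the wavelet-based feature idea: detail coefficients at several decomposition levels serve as multiscale feature maps, modulated here by a simple location prior. The wavelet choice, number of levels, and combination rule are assumptions for illustration, not the paper's exact model.

```python
# Sketch: multiscale detail energy from a 2-D wavelet decomposition,
# combined with a Gaussian center (location) prior.
import numpy as np
import pywt

def wavelet_saliency(gray, levels=3, wavelet="db4"):
    coeffs = pywt.wavedec2(gray, wavelet, level=levels)
    h, w = gray.shape
    sal = np.zeros((h, w))
    # coeffs[1:] holds (cH, cV, cD) detail tuples, coarse to fine.
    for cH, cV, cD in coeffs[1:]:
        energy = np.sqrt(cH ** 2 + cV ** 2 + cD ** 2)
        # Nearest-neighbor upsample of the per-scale energy to image size.
        ys = (np.arange(h) * energy.shape[0] // h).clip(0, energy.shape[0] - 1)
        xs = (np.arange(w) * energy.shape[1] // w).clip(0, energy.shape[1] - 1)
        sal += energy[np.ix_(ys, xs)]
    # Location prior: Gaussian centered on the image.
    yy, xx = np.mgrid[0:h, 0:w]
    prior = np.exp(-(((yy - h / 2) / h) ** 2 + ((xx - w / 2) / w) ** 2) / 0.18)
    sal *= prior
    return sal / (sal.max() + 1e-8)

gray = np.random.rand(128, 128)
print(wavelet_saliency(gray).shape)   # (128, 128)
```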


Sensor Review ◽  
2016 ◽  
Vol 36 (2) ◽  
pp. 148-157 ◽  
Author(s):  
Tao Liu ◽  
Zhixiang Fang ◽  
Qingzhou Mao ◽  
Qingquan Li ◽  
Xing Zhang

Purpose
The spatial feature is important for scene saliency detection, yet existing visual saliency detection methods fail to incorporate the spatial aspects of 3D scenes. This paper aims to propose a cube-based method that improves saliency detection by integrating visual and spatial features in 3D scenes.
Design/methodology/approach
In the presented approach, a multiscale cube pyramid is used to organize the 3D image scene and mesh model. Each 3D cube in this pyramid represents a space unit, analogous to a pixel in the multiscale image pyramid of image saliency models. In each 3D cube, color, intensity, and orientation features are extracted from the image, and a quantitative concave–convex descriptor is extracted from the 3D space. A Gaussian filter is then applied to this pyramid of cubes, with an extended center-surround difference introduced to compute the cube-based 3D scene saliency.
Findings
Precision-recall rates and receiver operating characteristic curves are used to evaluate the method against other state-of-the-art methods. The results show that the proposed method outperforms traditional image-based methods, especially for 3D scenes.
Originality/value
This paper presents a method that improves on the image-based visual saliency model.
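The extended center-surround difference on a cube grid can be sketched as comparing each cube's feature with the mean over its spatial neighborhood; the feature choice and neighborhood radius below are illustrative assumptions.

```python
# Sketch of a center-surround difference over a 3-D grid of cube features.
import numpy as np

def center_surround_3d(cube_feats, radius=1):
    """cube_feats: (X, Y, Z) per-cube feature (e.g., mean color or concavity)."""
    X, Y, Z = cube_feats.shape
    sal = np.zeros_like(cube_feats)
    for x in range(X):
        for y in range(Y):
            for z in range(Z):
                x0, x1 = max(0, x - radius), min(X, x + radius + 1)
                y0, y1 = max(0, y - radius), min(Y, y + radius + 1)
                z0, z1 = max(0, z - radius), min(Z, z + radius + 1)
                surround = cube_feats[x0:x1, y0:y1, z0:z1]
                # Difference between the center cube and its surround mean.
                sal[x, y, z] = abs(cube_feats[x, y, z] - surround.mean())
    return sal

feats = np.random.rand(8, 8, 4)        # dummy cube pyramid level
print(center_surround_3d(feats).max())
```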

