Eight-Channel Multispectral Image Database for Saliency Prediction

Sensors ◽  
2021 ◽  
Vol 21 (3) ◽  
pp. 970
Author(s):  
Miguel Ángel Martínez-Domingo ◽  
Juan Luis Nieves ◽  
Eva M. Valero

Saliency prediction is an important and challenging task within the computer vision community. Many models exist that try to predict the salient regions of a scene from its RGB image values. New models continue to be developed, and spectral imaging techniques may potentially overcome the limitations found when using RGB images. However, the experimental study of such models based on spectral images is difficult because of the lack of available data to work with. This article presents the first eight-channel multispectral image database of outdoor urban scenes together with their gaze data recorded using an eye tracker over several observers performing different visualization tasks. In addition, the information from this database is used to study whether the complexity of the images has an impact on the saliency maps retrieved from the observers. Results show that more complex images do not correlate with higher differences in the saliency maps obtained.

Author(s):  
Lai Jiang ◽  
Zhe Wang ◽  
Mai Xu ◽  
Zulin Wang

The transformed domain features of images show effectiveness in distinguishing salient and non-salient regions. In this paper, we propose a novel deep complex neural network, named SalDCNN, to predict image saliency by learning features in both pixel and transformed domains. Before proposing SalDCNN, we analyze the saliency cues encoded in the discrete Fourier transform (DFT) domain. Consequently, we have the following findings: 1) the phase spectrum encodes most saliency cues; 2) a certain pattern of the amplitude spectrum is important for saliency prediction; 3) the transformed domain spectrum is robust to noise and down-sampling for saliency prediction. According to these findings, we develop the structure of SalDCNN, including two main stages: the complex dense encoder and the three-stream multi-domain decoder. Given the new SalDCNN structure, the saliency maps can be predicted under the supervision of ground-truth fixation maps in both pixel and transformed domains. Finally, the experimental results show that our SalDCNN method outperforms 8 other state-of-the-art methods for image saliency prediction on 3 databases.
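The first finding above, that the phase spectrum carries most saliency cues, can be illustrated with a classical phase-only reconstruction. The sketch below is not SalDCNN and is not taken from the paper. It is a minimal NumPy example, assuming a single-channel image, that discards the DFT amplitude and keeps only the phase. The function name `phase_only_saliency` and the toy image are hypothetical.

```python
import numpy as np


def phase_only_saliency(image: np.ndarray) -> np.ndarray:
    """Rebuild an image from its DFT phase alone and return a map scaled to [0, 1].

    Illustrative only: a classical phase-only reconstruction, not the
    network described in the abstract.
    """
    spectrum = np.fft.fft2(image.astype(np.float64))
    phase = np.angle(spectrum)
    # Discard the amplitude: every frequency is given unit magnitude.
    reconstruction = np.fft.ifft2(np.exp(1j * phase))
    saliency = np.abs(reconstruction) ** 2
    peak = saliency.max()
    return saliency / peak if peak > 0 else saliency


# Toy input (assumption): flat background with one small bright square.
img = np.zeros((32, 32))
img[10:13, 20:23] = 1.0

sal = phase_only_saliency(img)
row, col = np.unravel_index(np.argmax(sal), sal.shape)
print("strongest response at row", row, "col", col)
```

For this toy input the strongest response is expected to fall on the bright square, even though all amplitude information has been removed. Published phase-based methods typically also smooth the resulting map, which is omitted here.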


Author(s):  
Zijun Deng ◽  
Xiaowei Hu ◽  
Lei Zhu ◽  
Xuemiao Xu ◽  
Jing Qin ◽  
...  

Saliency detection is a fundamental yet challenging task in computer vision, aiming at highlighting the most visually distinctive objects in an image. We propose a novel recurrent residual refinement network (R^3Net) equipped with residual refinement blocks (RRBs) to more accurately detect salient regions of an input image. Our RRBs learn the residual between the intermediate saliency prediction and the ground truth by alternately leveraging the low-level integrated features and the high-level integrated features of a fully convolutional network (FCN). While the low-level integrated features are capable of capturing more saliency details, the high-level integrated features can reduce non-salient regions in the intermediate prediction. Furthermore, the RRBs can obtain complementary saliency information of the intermediate prediction, and add the residual into the intermediate prediction to refine the saliency maps. We evaluate the proposed R^3Net on five widely used saliency detection benchmarks by comparing it with 16 state-of-the-art saliency detectors. Experimental results show that our network outperforms the competing methods on all the benchmark datasets.


2020 ◽  
Author(s):  
Guoliang Liu

Full resolution depth is required in many real-world engineering applications. However, existing depth sensors, e.g., LiDARs, only offer sparse depth sample points with limited resolution and noise. We here propose a deep learning based full resolution depth recovery method from monocular images and corresponding sparse depth measurements of the target environment. The novelty of our idea is that the structure similarity information between the RGB image and depth image is used to refine the dense depth estimation result. This important similar structure information can be found using a correlation layer in the regression neural network. We show that the proposed method can achieve higher estimation accuracy compared to the state-of-the-art methods. The experiments conducted on the NYU Depth V2 dataset support the novelty of our idea.


Sensors ◽  
2020 ◽  
Vol 20 (8) ◽  
pp. 2170 ◽  
Author(s):  
Yuya Moroto ◽  
Keisuke Maeda ◽  
Takahiro Ogawa ◽  
Miki Haseyama

This paper presents a few-shot personalized saliency prediction method based on adaptive image selection that considers objects and visual attention. Since general methods for predicting personalized saliency maps (PSMs) need a large number of training images, an approach that works with a small number of training images is needed. Finding persons whose visual attention is similar to that of a target person is an effective way to tackle this problem, but it requires all persons to gaze at many common images, which is difficult and unrealistic given the burden on them. To address this, this paper introduces a novel adaptive image selection (AIS) scheme that focuses on the relationship between human visual attention and objects in images. AIS focuses on both the diversity of objects in images and the variance of PSMs for the objects. Specifically, AIS selects images so that the selected images contain various kinds of objects, maintaining their diversity. Moreover, AIS guarantees a high variance of PSMs across persons, since this variance represents the regions that many persons commonly gaze at or do not gaze at. By selecting images with high diversity and variance, the proposed method enables similar users to be selected from a small number of images. This is the technical contribution of this paper. Experimental results show the effectiveness of our personalized saliency prediction including the new image selection scheme.


Author(s):  
Claudio Ferrari ◽  
Stefano Berretti ◽  
Alberto del Bimbo

3D face reconstruction from a single 2D image is a fundamental computer vision problem of extraordinary difficulty that dates back to the 1980s. Briefly, it is the task of recovering the three-dimensional geometry of a human face from a single RGB image. While the problem of automatically estimating the 3D structure of a generic scene from RGB images can be regarded as a general task, the particular morphology and non-rigid nature of human faces make it a challenging problem for which dedicated approaches are still currently studied. This chapter aims at providing an overview of the problem, its evolutions, the current state of the art, and future trends.


Sensors ◽  
2021 ◽  
Vol 21 (2) ◽  
pp. 666
Author(s):  
Wenju Wang ◽  
Jiangwei Wang

Current research on the reconstruction of hyperspectral images from RGB images using deep learning mainly focuses on learning complex mappings through deeper and wider convolutional neural networks (CNNs). However, the reconstruction accuracy of the hyperspectral image is not high and, among other issues, the model for generating these images takes up too much storage space. In this study, we propose the double ghost convolution attention mechanism network (DGCAMN) framework for reconstruction from a single RGB image to improve the accuracy of spectral reconstruction and reduce the storage occupied by the model. The proposed DGCAMN consists of a double ghost residual attention block (DGRAB) module and an optimal nonlocal block (ONB). The DGRAB module uses GhostNet and PReLU activation functions to reduce the calculation parameters of the data and reduce the storage size of the generative model. At the same time, the proposed double output feature Convolutional Block Attention Module (DOFCBAM) is used to capture the texture details on the feature map to maximize the content of the reconstructed hyperspectral image. In the proposed ONB, the Argmax activation function is used to obtain the region with the most abundant feature information and maximize the most useful feature parameters. This helps to improve the accuracy of spectral reconstruction. These contributions enable the DGCAMN framework to achieve the highest spectral accuracy with minimal storage consumption. The proposed method has been applied to the NTIRE 2020 dataset. Experimental results show that the proposed DGCAMN method achieves higher spectral reconstruction accuracy than advanced deep learning methods and greatly reduces storage consumption.


2020 ◽  
Vol 12 (7) ◽  
pp. 1142
Author(s):  
Jeonghoon Kwak ◽  
Yunsick Sung

To provide a realistic environment for remote sensing applications, point clouds are used to realize a three-dimensional (3D) digital world for the user. Motion recognition of objects, e.g., humans, is required to provide realistic experiences in the 3D digital world. To recognize a user’s motions, 3D landmarks are provided by analyzing a 3D point cloud collected through a light detection and ranging (LiDAR) system or a red green blue (RGB) image collected visually. However, manual supervision is required to extract 3D landmarks as to whether they originate from the RGB image or the 3D point cloud. Thus, there is a need for a method for extracting 3D landmarks without manual supervision. Herein, an RGB image and a 3D point cloud are used to extract 3D landmarks. The 3D point cloud is utilized as the relative distance between a LiDAR and a user. Because it cannot contain all information about the user’s entire body due to disparities, it cannot generate a dense depth image that provides the boundary of the user’s body. Therefore, up-sampling is performed to increase the density of the depth image generated based on the 3D point cloud; the density depends on the 3D point cloud. This paper proposes a system for extracting 3D landmarks using 3D point clouds and RGB images without manual supervision. A depth image provides the boundary of a user’s motion and is generated using the 3D point cloud and RGB image collected by a LiDAR and an RGB camera, respectively. To extract 3D landmarks automatically, an encoder–decoder model is trained with the generated depth images, and the RGB images and 3D landmarks are extracted from these images with the trained encoder model. The method of extracting 3D landmarks using RGB depth (RGBD) images was verified experimentally, and 3D landmarks were extracted to evaluate the user’s motions with RGBD images. In this manner, landmarks could be extracted according to the user’s motions, rather than by extracting them using the RGB images. The depth images generated by the proposed method were 1.832 times denser than the up-sampling-based depth images generated with bilateral filtering.


Author(s):  
T. Yamakawa ◽  
K. Fukano ◽  
R. Onodera ◽  
H. Masuda

Mobile mapping systems (MMS) can capture dense point-clouds of urban scenes. For visualizing realistic scenes using point-clouds, RGB colors have to be added to point-clouds. To generate colored point-clouds in a post-process, each point is projected onto camera images and an RGB color is copied to the point at the projected position. However, incorrect colors are often added to point-clouds because of the misalignment of laser scanners, the calibration errors of cameras and laser scanners, or the failure of GPS acquisition. In this paper, we propose a new method to correct RGB colors of point-clouds captured by an MMS. In our method, RGB colors of a point-cloud are corrected by comparing intensity images and RGB images. However, since an MMS outputs sparse and anisotropic point-clouds, regular images cannot be obtained from intensities of points. Therefore, we convert a point-cloud into a mesh model and project triangle faces onto image space, on which regular lattices are defined. Then we extract edge features from intensity images and RGB images, and detect their correspondences. In our experiments, our method worked very well for correcting RGB colors of point-clouds captured by an MMS.
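The baseline colorization step described above (project each point onto a camera image and copy the RGB value found there) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' correction method. It assumes points are already expressed in the camera frame and an ideal pinhole camera without lens distortion. The names `project_point` and `color_points` and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Color = Tuple[int, int, int]
Point = Tuple[float, float, float]


def project_point(point_cam: Point, fx: float, fy: float,
                  cx: float, cy: float) -> Optional[Tuple[int, int]]:
    """Project a camera-frame 3D point to integer pixel coordinates (col, row).

    Assumes an ideal pinhole model; returns None for points behind the camera.
    """
    x, y, z = point_cam
    if z <= 0:
        return None
    u = fx * x / z + cx
    v = fy * y / z + cy
    return int(round(u)), int(round(v))


def color_points(points_cam: Sequence[Point],
                 image: Sequence[Sequence[Color]],
                 fx: float, fy: float, cx: float, cy: float
                 ) -> List[Optional[Color]]:
    """Copy the RGB value at each point's projected pixel, or None if it misses."""
    height, width = len(image), len(image[0])
    colors: List[Optional[Color]] = []
    for point in points_cam:
        pixel = project_point(point, fx, fy, cx, cy)
        if pixel is None:
            colors.append(None)
            continue
        col, row = pixel
        inside = 0 <= row < height and 0 <= col < width
        colors.append(image[row][col] if inside else None)
    return colors


# Toy 4x4 image (assumption) whose pixel at (row, col) is (row*10, col*10, 0).
image = [[(r * 10, c * 10, 0) for c in range(4)] for r in range(4)]
points = [(0.0, 0.0, 1.0), (0.5, -0.5, 1.0), (0.0, 0.0, -1.0), (5.0, 0.0, 1.0)]
colors = color_points(points, image, fx=2.0, fy=2.0, cx=2.0, cy=2.0)
print(colors)
```

Because the copied color depends directly on the projected pixel, any misalignment or calibration error shifts the projection and assigns the wrong color, which is the failure the abstract sets out to correct.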


2021 ◽  
Author(s):  
Yusuf Saber

In this work, three novel approaches to detecting visual attention in images are presented. The idea behind detecting areas within images or video that naturally attract a viewer’s attention is based on the concept of generating pre-attentive saliency maps. Saliency, in and of itself, relates to some measure of “conspicuity” in the visual field and is believed to be an important precursor for many tasks in computer vision. One of the proposed methods in this thesis detects salient regions, while the other two detect salient edges. The classical approach to saliency detection proposed by Itti is extended by introducing wavelets as a lossless resizing tool while maintaining the aspect of biological inspiration. In addition to this, the spectral residual method and the frequency tuned method are modified using wavelets to allow for salient edge detection. Tests show that the proposed methods yield results that are not only comparable to leading, cutting-edge methods, but also exceed them in terms of correct and complete object detection as well as noise reduction.

