Asymmetric Adaptive Fusion in a Two-Stream Network for RGB-D Human Detection

Wenli Zhang; Xiang Guo; Jiaqi Wang; Ning Wang; Kaizhen Chen

doi:10.3390/s21030916

Sensors ◽

10.3390/s21030916 ◽

2021 ◽

Vol 21 (3) ◽

pp. 916

Author(s):

Wenli Zhang ◽

Xiang Guo ◽

Jiaqi Wang ◽

Ning Wang ◽

Kaizhen Chen

Keyword(s):

State Of The Art ◽

Contextual Information ◽

Human Detection ◽

Stream Network ◽

Adaptive Fusion ◽

Indoor Scenes ◽

Stable Performance ◽

Feature Pyramid ◽

Low Illumination ◽

Depth Feature

In recent years, human detection in indoor scenes has been widely applied in smart buildings and smart security, but several challenges remain difficult to address, such as frequent occlusion, low illumination and multiple poses. This paper proposes an asymmetric adaptive fusion two-stream network (AAFTS-net) for RGB-D human detection. The network fully extracts person-specific depth features and RGB features while reducing the complexity typical of two-stream networks. A depth feature pyramid is constructed by combining contextual information, so that multiscale depth features can be combined to improve adaptability to targets of different sizes. An adaptive channel weighting (ACW) module weights the RGB-D feature channels to achieve efficient feature selection and information complementation. This paper also introduces a novel RGB-D dataset for human detection, called RGBD-human, on which we verify the performance of the proposed algorithm. The experimental results show that AAFTS-net outperforms existing state-of-the-art methods and maintains stable performance under frequent occlusion, low illumination and multiple poses.
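The abstract does not say how the ACW module computes its channel weights. One common way to weight concatenated RGB and depth channels is squeeze-and-excitation-style gating: pool each channel to a scalar, pass the descriptors through a small learned function, squash with a sigmoid, and rescale each channel. The sketch below assumes that design; the function `adaptive_channel_weighting` and its single linear gate (`gate_weights`, `gate_bias`) are hypothetical and are not the AAFTS-net implementation.

```python
import math
from typing import List

Matrix = List[List[float]]  # one H x W feature map


def global_average_pool(channel: Matrix) -> float:
    """Squeeze step: collapse one H x W map to its mean activation."""
    total = sum(sum(row) for row in channel)
    count = sum(len(row) for row in channel)
    return total / count


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def adaptive_channel_weighting(rgb: List[Matrix], depth: List[Matrix],
                               gate_weights: List[List[float]],
                               gate_bias: List[float]) -> List[Matrix]:
    """Hypothetical SE-style weighting of concatenated RGB-D channels.

    gate_weights is a C x C matrix and gate_bias a length-C vector, where
    C = len(rgb) + len(depth); in a real network both would be learned.
    """
    channels = rgb + depth  # channel-wise concatenation of the two streams
    descriptors = [global_average_pool(c) for c in channels]
    # Each gate output sees the descriptors of both modalities.
    weights = [
        sigmoid(sum(w * d for w, d in zip(row, descriptors)) + b)
        for row, b in zip(gate_weights, gate_bias)
    ]
    # Excitation step: rescale every channel by its weight in (0, 1).
    return [[[v * wt for v in row] for row in ch]
            for ch, wt in zip(channels, weights)]
```

Because each gate output is computed from descriptors of both streams, a depth channel can be damped when RGB evidence dominates, and vice versa, which is one plausible way to obtain the feature selection and complementation the abstract describes.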

Hierarchical Multimodal Adaptive Fusion (HMAF) Network for Prediction of RGB-D Saliency

Computational Intelligence and Neuroscience ◽

10.1155/2020/8841681 ◽

2020 ◽

Vol 2020 ◽

pp. 1-9

Author(s):

Ying Lv ◽

Wujie Zhou

Keyword(s):

State Of The Art ◽

Depth Map ◽

Visual Saliency ◽

Stream Network ◽

Multimodal Features ◽

Saliency Prediction ◽

Adaptive Fusion ◽

Hierarchical Features ◽

Different Levels ◽

Rgb Image

Visual saliency prediction for RGB-D images is more challenging than for their RGB counterparts, and very few studies have addressed RGB-D saliency prediction. This study presents a hierarchical multimodal adaptive fusion (HMAF) network for end-to-end prediction of RGB-D saliency. In the proposed method, hierarchical (multilevel) multimodal features are first extracted from an RGB image and its depth map using a VGG-16-based two-stream network. Next, three two-input attention modules predict the most significant hierarchical features of the RGB image and depth map. A three-input attention module then adaptively fuses the resulting saliency features from the different levels (hierarchical fusion saliency features) to achieve high-accuracy RGB-D visual saliency prediction. Comparisons with other state-of-the-art techniques on two challenging RGB-D datasets show that the proposed method consistently outperforms competing approaches by a considerable margin.
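The abstract does not define the internals of its attention modules. As an illustration only, a three-input fusion can be sketched as per-pixel softmax attention across levels: each input receives a score at every pixel, the scores are normalized to sum to one, and the fused value is the weighted sum. In this hypothetical sketch the score is simply each map's own value divided by a temperature `tau`; HMAF would learn its scores instead.

```python
import math
from typing import List

Map = List[List[float]]


def softmax_fuse(maps: List[Map], tau: float = 1.0) -> Map:
    """Fuse same-sized saliency maps with per-pixel softmax attention.

    Hypothetical stand-in: the attention score of each input at a pixel is
    its own value divided by tau (HMAF learns its scores instead).
    """
    height, width = len(maps[0]), len(maps[0][0])
    fused = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            scores = [m[i][j] / tau for m in maps]
            top = max(scores)  # subtract the max for numerical stability
            exps = [math.exp(s - top) for s in scores]
            total = sum(exps)
            # Weighted sum of the inputs, weights summing to one at this pixel.
            fused[i][j] = sum(e / total * m[i][j] for e, m in zip(exps, maps))
    return fused
```

A small `tau` pushes the fusion toward a per-pixel maximum over the levels, while a large one approaches their plain average.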

Image Splicing Location Based on Illumination Maps and Cluster Region Proposal Network

Applied Sciences ◽

10.3390/app11188437 ◽

2021 ◽

Vol 11 (18) ◽

pp. 8437

Author(s):

Ye Zhu ◽

Xiaoqian Shen ◽

Shikun Liu ◽

Xiaoli Zhang ◽

Gang Yan

Keyword(s):

State Of The Art ◽

Location Information ◽

Context Information ◽

Post Processing ◽

Image Splicing ◽

Stream Network ◽

Cluster Region ◽

Multiple Feature ◽

Common Operation ◽

Feature Pyramid

Splicing is the most common operation in image forgery, where the tampered background regions are imported from different images. Illumination maps are an inherent attribute of images and provide significant clues when searching for splicing locations. This paper proposes an end-to-end dual-stream network for splicing localization, in which the illumination stream, which includes Grey-Edge (GE) and Inverse-Intensity Chromaticity (IIC), extracts inconsistency features, and the image stream extracts global, unnatural tampering features. The dual-stream features are fused through a Multiple Feature Pyramid Network (MFPN), which contains richer context information. Finally, a Cluster Region Proposal Network (C-RPN) with spatial attention and an adaptive cluster anchor is proposed to generate potential tampered regions while better retaining location information. Extensive experiments on the NIST16 and CASIA standard datasets show that the proposed algorithm is superior to several state-of-the-art algorithms: it localizes tampered regions accurately at the pixel level and is highly robust to post-processing operations such as noise, blur and JPEG recompression.
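Grey-Edge, one of the illumination cues named above, rests on the assumption that the average edge difference in a scene is achromatic, so each channel's illuminant component can be estimated from the Minkowski p-norm of that channel's derivatives. The sketch below is that textbook estimator on plain nested lists, simplified to forward differences with no Gaussian pre-smoothing; the choice p = 6 is an assumption, and the code is not the paper's illumination stream. The intuition for splicing detection is that estimates taken from different image patches should roughly agree unless some region was captured under different lighting.

```python
import math
from typing import List, Tuple

Channel = List[List[float]]


def gradient_magnitudes(channel: Channel) -> List[float]:
    """Forward-difference gradient magnitude at every pixel that has a right and lower neighbour."""
    mags = []
    for i in range(len(channel) - 1):
        for j in range(len(channel[0]) - 1):
            dx = channel[i][j + 1] - channel[i][j]
            dy = channel[i + 1][j] - channel[i][j]
            mags.append(math.hypot(dx, dy))
    return mags


def grey_edge(r: Channel, g: Channel, b: Channel,
              p: float = 6.0) -> Tuple[float, float, float]:
    """First-order Grey-Edge illuminant estimate (no Gaussian pre-smoothing).

    Each channel's estimate is the Minkowski p-norm of its gradient
    magnitudes; the RGB estimate is then normalized to unit length.
    """
    est = []
    for ch in (r, g, b):
        mags = gradient_magnitudes(ch)
        est.append(sum(m ** p for m in mags) ** (1.0 / p))
    norm = math.sqrt(sum(e * e for e in est)) or 1.0  # guard flat patches
    return tuple(e / norm for e in est)
```

Scaling a channel by a constant scales its gradients, and hence its estimate, by the same constant, which is why a colour cast shows up directly in the returned vector.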

Contextualized Filtering for Shared Cyber Threat Information

Sensors ◽

10.3390/s21144890 ◽

2021 ◽

Vol 21 (14) ◽

pp. 4890

Author(s):

Athanasios Dimitriadis ◽

Christos Prassas ◽

Jose Luis Flores ◽

Boonserm Kulvatunyou ◽

Nenad Ivezic ◽

...

Keyword(s):

Information Sharing ◽

Business Processes ◽

State Of The Art ◽

Contextual Information ◽

Coarse Grained ◽

Business Information ◽

Cyber Threat ◽

Domain Expertise ◽

Multi Level ◽

Filtering Approach

Cyber threat information sharing is an essential process for achieving collaborative security, but it poses several challenges. One crucial challenge is the sheer volume of shared threat information, which creates a need for better filtering of such information. The state of the art in filtering relies primarily on keyword- and domain-based searching, but these approaches require substantial human involvement and domain expertise that is rarely available. Recent research revealed the need to harvest business information to fill this gap in filtering, although it yielded only coarse-grained filtering based on such information. This paper presents a novel contextualized filtering approach that exploits standardized, multi-level contextual information of business processes. The contextual information describes the conditions under which a given piece of threat information is actionable from an organization's perspective. Filtering can therefore be automated by measuring the equivalence between the context of the shared threat information and the context of the consuming organization. The paper contributes directly to the filtering challenge and indirectly to automated, customized threat information sharing. Moreover, it proposes the architecture of a cyber threat information sharing ecosystem that operates according to the proposed filtering approach, and it defines the characteristics that are advantageous for filtering approaches. Implementing the proposed approach can support compliance with Special Publication 800-150 of the National Institute of Standards and Technology.
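The abstract does not give the equivalence measure it uses. To make the idea of matching a threat's context against an organization's context concrete, the sketch below represents each context as attribute sets keyed by level, scores each level with Jaccard similarity, averages over levels, and keeps items above a threshold. The level names, the Jaccard measure and the 0.5 threshold are all assumptions for illustration, not the paper's method.

```python
from typing import Dict, List, Set

Context = Dict[str, Set[str]]  # hypothetical: context level -> attribute values


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def context_match(threat: Context, org: Context) -> float:
    """Average per-level similarity over every level either context defines."""
    levels = set(threat) | set(org)
    if not levels:
        return 0.0
    return sum(jaccard(threat.get(lvl, set()), org.get(lvl, set()))
               for lvl in levels) / len(levels)


def filter_threats(threats: List[Context], org: Context,
                   threshold: float = 0.5) -> List[int]:
    """Return indices of shared threat items whose context matches the organization's."""
    return [i for i, t in enumerate(threats) if context_match(t, org) >= threshold]
```

For example, an organization with context `{"process": {"order-to-cash"}, "asset": {"erp", "email"}}` would keep a threat tagged `{"process": {"order-to-cash"}, "asset": {"erp"}}` (score 0.75) and drop one tagged only with unrelated payroll assets (score 0.0).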

The Impact of Stationarity, Regularity, and Context on the Predictability of Individual Human Mobility

ACM Transactions on Spatial Algorithms and Systems ◽

10.1145/3459625 ◽

2021 ◽

Vol 7 (4) ◽

pp. 1-24

Author(s):

Douglas Do Couto Teixeira ◽

Aline Carneiro Viana ◽

Jussara M. Almeida ◽

Mário S. Alvim

Keyword(s):

State Of The Art ◽

Human Mobility ◽

Contextual Information ◽

Prediction Method ◽

Mobility Prediction ◽

Mobility Patterns ◽

Distinct Cell ◽

Inherent Nature ◽

The One ◽

The Impact

Predicting mobility-related behavior is an important yet challenging task. On the one hand, factors such as a person's routine or preference for a few favorite locations may help predict their mobility. On the other hand, several contextual factors, such as variations in individual preferences, weather, traffic, or even a person's social contacts, can affect mobility patterns and make them significantly harder to model. A fundamental approach to studying mobility-related behavior is to assess how predictable such behavior is, deriving theoretical limits on the accuracy that a prediction model can achieve on a specific dataset. This approach focuses on the inherent nature and fundamental patterns of human behavior captured in that dataset, filtering out factors that depend on the specifics of the prediction method adopted. However, the current state-of-the-art method for estimating predictability in human mobility suffers from two major limitations: low interpretability and difficulty incorporating external factors that are known to help mobility prediction (i.e., contextual information). In this article, we revisit this state-of-the-art method with the aim of tackling these limitations. Specifically, we conduct a thorough analysis of how this widely used method works by looking into two different metrics that are easier to understand and, at the same time, capture the effects of the original technique reasonably well. We evaluate these metrics on two mobility prediction tasks of different difficulty, namely next-cell and next-distinct-cell prediction. Additionally, we propose alternative strategies to incorporate different types of contextual information into the existing technique. Our evaluation of these strategies offers quantitative measures of the impact of adding context to the predictability estimate, revealing the challenges of doing so in practical scenarios.
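The abstract does not restate the baseline it revisits. The standard way of deriving such limits in this literature is to compute a Lempel-Ziv estimate S of the entropy rate of a user's location sequence and then convert it into an upper bound Π on prediction accuracy by solving Fano's equality S = H(Π) + (1 − Π) log2(N − 1), where N is the number of distinct locations and H is the binary entropy. The sketch below implements that textbook pipeline in plain Python under stated assumptions (sequence length at least 2, N at least 2, and Λ_i counted as n − i + 1 when no unseen substring exists before the end); it is not the authors' revised metrics.

```python
import math
from typing import Sequence


def _occurs(history: Sequence, pattern: Sequence) -> bool:
    """True if pattern appears as a contiguous run inside history."""
    k = len(pattern)
    return any(list(history[s:s + k]) == list(pattern)
               for s in range(len(history) - k + 1))


def lz_entropy(seq: Sequence) -> float:
    """Lempel-Ziv estimate of the entropy rate in bits per symbol (len(seq) >= 2)."""
    n = len(seq)
    total = 0
    for i in range(n):
        k = 1
        # Lambda_i: length of the shortest substring starting at i not seen before i.
        while i + k <= n and _occurs(seq[:i], seq[i:i + k]):
            k += 1
        total += k
    return n * math.log2(n) / total


def max_predictability(entropy: float, n_locations: int, tol: float = 1e-10) -> float:
    """Solve S = H(P) + (1 - P) log2(N - 1) for P by bisection (N >= 2)."""
    def fano(p: float) -> float:
        h = 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)
        return h + (1 - p) * math.log2(n_locations - 1)

    lo, hi = 1.0 / n_locations, 1.0
    if entropy >= fano(lo):  # at or above the uniform-random case
        return lo
    if entropy <= 0.0:  # fully regular sequence
        return 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if fano(mid) > entropy:  # fano() decreases on [1/N, 1]
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

On short sequences the estimator is biased: `lz_entropy("aaaaaaaa")` returns 1.0 bit per symbol even though the sequence is constant, while `lz_entropy("abcdefgh")` returns 3.0. The opacity of what such a number actually measures is part of the interpretability problem the article raises.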

Dual-Stream Guided-Learning via a Priori Optimization for Person Re-identification

ACM Transactions on Multimedia Computing Communications and Applications ◽

10.1145/3447715 ◽

2021 ◽

Vol 17 (4) ◽

pp. 1-22

Author(s):

Junyi Wu ◽

Yan Huang ◽

Qiang Wu ◽

Zhipeng Gao ◽

Jianqiang Zhao ◽

...

Keyword(s):

Learning Strategy ◽

State Of The Art ◽

A Priori ◽

Background Information ◽

Stream Network ◽

Related Information ◽

Guided Learning ◽

Segmentation Algorithms ◽

Art Methods ◽

Background Clutter

The task of person re-identification (re-ID) is to find the same pedestrian across non-overlapping camera views. The performance of person re-ID is often degraded by background clutter. However, existing segmentation algorithms cannot produce foreground masks precise enough to remove the background information cleanly. In addition, if the background is removed completely, some discriminative ID-related cues (e.g., a backpack or a companion) may be lost. In this article, we design a dual-stream network consisting of a Provider Stream (P-Stream) and a Receiver Stream (R-Stream). The R-Stream performs an a priori optimization operation on foreground information. The P-Stream acts as a pusher, guiding the R-Stream to concentrate on foreground information and on useful ID-related cues in the background. The proposed dual-stream network makes full use of the a priori optimization and the guided-learning strategy to learn informative foreground information along with useful ID-related information in the background. Our method achieves Rank-1 accuracy of 95.4% on Market-1501, 89.0% on DukeMTMC-reID, 78.9% on CUHK03 (labeled), and 75.4% on CUHK03 (detected), outperforming state-of-the-art methods.
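Rank-1 accuracy, the metric quoted above, is the first point of the cumulative matching characteristic (CMC) curve: the fraction of query images whose single nearest gallery image has the correct identity. A Rank-1 of 95.4% on Market-1501 therefore means the top-ranked match is correct for 95.4% of queries. The minimal sketch below computes it from a query-by-gallery distance matrix; it deliberately omits the same-camera filtering that benchmark protocols typically apply, so it is a simplification rather than an official evaluation script.

```python
from typing import List, Sequence


def rank1_accuracy(dist: List[List[float]], query_ids: Sequence[int],
                   gallery_ids: Sequence[int]) -> float:
    """Fraction of queries whose closest gallery item shares their identity.

    dist[q][g] is the distance between query q and gallery item g.
    Simplified: no same-camera filtering of gallery entries.
    """
    hits = 0
    for q, row in enumerate(dist):
        best = min(range(len(row)), key=row.__getitem__)  # nearest gallery item
        hits += gallery_ids[best] == query_ids[q]
    return hits / len(dist)
```

With two queries of identities 1 and 2, gallery identities `[1, 3, 2]` and distances `[[0.1, 0.9, 0.5], [0.8, 0.2, 0.3]]`, the first query is matched correctly and the second is not, giving a Rank-1 of 0.5.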

Correlation Tracking via Self-Adaptive Fusion of Multiple Features

Information ◽

10.3390/info9100241 ◽

2018 ◽

Vol 9 (10) ◽

pp. 241 ◽

Cited By ~ 1

Author(s):

Zhi Chen ◽

Peizhong Liu ◽

Yongzhao Du ◽

Yanmin Luo ◽

Wancheng Zhang

Keyword(s):

State Of The Art ◽

Correlation Filter ◽

Multiple Features ◽

Multi Scale ◽

Tracking Algorithms ◽

Model Update ◽

Adaptive Fusion ◽

Tracking Model ◽

Update Strategy ◽

Self Adaptive

Correlation filter (CF) based tracking algorithms have shown excellent performance compared with most state-of-the-art algorithms on the object tracking benchmark (OTB). Nonetheless, most CF-based tracking algorithms consider only a limited, single-channel feature, and the tracking model is always updated frame by frame. This introduces erroneous information when target objects undergo complex scenario changes, such as background clutter, occlusion and out-of-view motion. Long-term accumulation of erroneous model updates causes tracking drift. To address these problems, this paper proposes a robust multi-scale correlation filter tracking algorithm via self-adaptive fusion of multiple features. First, we fuse powerful features, including the histogram of oriented gradients (HOG), color names (CN) and the histogram of local intensities (HI), in the response layer. The weights are assigned according to the proportion of the response scores generated by each feature, which achieves self-adaptive fusion of multiple features for a better feature representation. Meanwhile, an efficient model update strategy is proposed that uses a pre-defined response threshold as the discriminative condition for updating the tracking model. In addition, we introduce an accurate multi-scale estimation method integrated with the model update strategy, which further improves adaptability to scale variation. Both qualitative and quantitative evaluations on challenging video sequences demonstrate that the proposed tracker outperforms state-of-the-art CF-based methods.
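The two mechanisms described above, response-proportional feature weighting and threshold-gated model updates, can be sketched as follows. This assumes that "proportion of response scores" means each feature's peak response divided by the sum of all peaks, and that peaks are positive; the feature keys and the threshold value are illustrative, and the functions are hypothetical rather than the authors' code.

```python
from typing import Dict, List, Tuple

Response = List[List[float]]  # one correlation response map


def fuse_responses(responses: Dict[str, Response]) -> Tuple[Response, Dict[str, float]]:
    """Weight each feature's response map by its share of the summed peak scores."""
    peaks = {name: max(max(row) for row in r) for name, r in responses.items()}
    total = sum(peaks.values())  # assumed positive
    weights = {name: p / total for name, p in peaks.items()}
    first = next(iter(responses.values()))
    fused = [[sum(weights[n] * responses[n][i][j] for n in responses)
              for j in range(len(first[0]))]
             for i in range(len(first))]
    return fused, weights


def should_update(fused: Response, threshold: float) -> bool:
    """Update the tracking model only when the fused peak clears the threshold."""
    return max(max(row) for row in fused) >= threshold
```

A feature whose response is sharply peaked on the target therefore dominates the fused map, and a frame with a weak fused peak (for example, during occlusion) leaves the model untouched, which is how such a scheme limits the accumulation of erroneous updates.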

One for All: Neural Joint Modeling of Entities and Events

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33016851 ◽

2019 ◽

Vol 33 ◽

pp. 6851-6858 ◽

Cited By ~ 4

Author(s):

Trung Minh Nguyen ◽

Thien Huu Nguyen

Keyword(s):

Deep Learning ◽

Recent Work ◽

State Of The Art ◽

Contextual Information ◽

Joint Modeling ◽

Event Extraction ◽

Event Trigger ◽

The Individual ◽

Novel Model ◽

Argument Roles

Previous work on event extraction has mainly focused on predicting event triggers and argument roles, treating entity mentions as provided by human annotators. This is unrealistic, because entity mentions are usually predicted by existing toolkits whose errors may propagate to event trigger and argument role recognition. A few recent works have addressed this problem by jointly predicting entity mentions, event triggers and arguments. However, such work is limited to discrete, hand-engineered features for representing contextual information for the individual tasks and their interactions. In this work, we propose a novel model that jointly predicts entity mentions, event triggers and arguments based on shared hidden representations from deep learning. The experiments demonstrate the benefits of the proposed method, which achieves state-of-the-art performance for event extraction.

Preface to the April 2018 Issue including selected works from CIbSE 2017 and LACLO 2016

CLEI electronic journal ◽

10.19153/cleiej.21.1.0 ◽

2018 ◽

Vol 21 (1) ◽

Author(s):

Héctor Cancela ◽

Isabel Brito ◽

Luca Cernuzzi ◽

Marcela Genero ◽

Jesús García Molina ◽

...

Keyword(s):

Costa Rica ◽

Software Engineering ◽

Latin American ◽

Buenos Aires ◽

Review Paper ◽

State Of The Art ◽

Contextual Information ◽

Learning Objects ◽

The State ◽

San Jose

This issue of the CLEIej consists of three main parts: i) a review paper on the state of the art of how contextual information extracted from a user task can help improve searches for content relevant to that task; ii) extended and revised versions of selected papers (the second and third best papers from each track) presented at the XX Ibero-American Conference on Software Engineering (CIbSE 2017), which took place in Buenos Aires, Argentina, in May 2017; and iii) extended and revised versions of selected papers from LACLO 2016, the XI Latin American Conference on Learning Objects and Technology, which took place in San José, Costa Rica, in October 2016.

Mosaic Super-resolution via Sequential Feature Pyramid Networks

10.36227/techrxiv.11402130 ◽

2019 ◽

Author(s):

Mehrdad Shoeiby ◽

Mohammad Ali Armin ◽

Sadegh Aliakbarian ◽

Saeed Anwar ◽

Lars Petersson

Keyword(s):

State Of The Art ◽

Super Resolution ◽

Autonomous Driving ◽

Single Shot ◽

Current State ◽

Wide Range ◽

Feature Pyramid ◽

Novel Method ◽

Convolutional Lstm ◽

Mosaic Images

Advances in the design of multi-spectral cameras have led to great interest in a wide range of applications, from astronomy to autonomous driving. However, such cameras inherently suffer from a trade-off between spatial and spectral resolution. In this paper, we propose to address this limitation by introducing a novel method to carry out super-resolution on raw mosaic images, multi-spectral or RGB Bayer, captured by modern real-time single-shot mosaic sensors. To this end, we design a deep super-resolution architecture that benefits from a sequential feature pyramid along the depth of the network. This is achieved by using a convolutional LSTM (ConvLSTM) to learn the inter-dependencies between features at different receptive fields. Additionally, by investigating the effect of different attention mechanisms in our framework, we show that a ConvLSTM-inspired module provides superior attention in our context. Our extensive experiments and analyses show that our approach yields high super-resolution quality, outperforming current state-of-the-art mosaic super-resolution methods on both Bayer and multi-spectral images. Additionally, to the best of our knowledge, our method is the first specialized method to super-resolve mosaic images, whether multi-spectral or Bayer.
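The spatial-spectral trade-off mentioned above follows from how a single-shot mosaic sensor samples light: each photosite sits behind one filter, so if the filter pattern repeats every `period` pixels, each band is recorded at only 1/period of the resolution along each axis. The hypothetical helper below (not from the paper) makes this concrete by pulling each filter position's samples out of a raw frame; super-resolution on mosaic data aims to recover full-resolution bands from exactly this kind of sparse sampling.

```python
from typing import List

Image = List[List[float]]


def split_mosaic(raw: Image, period: int) -> List[Image]:
    """Split a periodic mosaic into one low-resolution image per filter position.

    A Bayer sensor has period 2 (4 filter positions, two of them green);
    a 4x4 multi-spectral mosaic has period 4 (16 bands).
    """
    bands = []
    for di in range(period):
        for dj in range(period):
            # Keep every period-th row and column, starting at offset (di, dj).
            bands.append([row[dj::period] for row in raw[di::period]])
    return bands
```

For a 4x4 raw frame, period 2 yields four 2x2 images, while period 4 yields sixteen single-pixel bands: more spectral bands always cost spatial samples per band.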

Non-Local Context Encoder: Robust Biomedical Image Segmentation against Adversarial Attacks

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33018417 ◽

2019 ◽

Vol 33 ◽

pp. 8417-8424 ◽

Cited By ~ 6

Author(s):

Xiang He ◽

Sibei Yang ◽

Guanbin Li ◽

Haofeng Li ◽

Huiyou Chang ◽

...

Keyword(s):

Image Segmentation ◽

State Of The Art ◽

Contextual Information ◽

Local Context ◽

Deep Convolutional Neural Networks ◽

Biomedical Image ◽

Feature Representations ◽

Segmentation Methods ◽

Spatial Dependencies ◽

Non Local

Recent progress in biomedical image segmentation based on deep convolutional neural networks (CNNs) has drawn much attention. However, the vulnerability of these models to adversarial samples cannot be overlooked. This paper is the first to show that all the CNN-based state-of-the-art biomedical image segmentation models are sensitive to adversarial perturbations, which limits their deployment in safety-critical biomedical fields. We find that global spatial dependencies and global contextual information in a biomedical image can be exploited to defend against adversarial attacks. To this end, a non-local context encoder (NLCE) is proposed to model short- and long-range spatial dependencies and to encode global contexts that strengthen feature activations through channel-wise attention. The NLCE modules enhance the robustness and accuracy of the non-local context encoding network (NLCEN), which learns robust, enhanced pyramid feature representations with NLCE modules and then integrates the information across different levels. Experiments on lung and skin lesion segmentation datasets demonstrate that NLCEN outperforms other state-of-the-art biomedical image segmentation methods under adversarial attacks. In addition, NLCE modules can be applied to improve the robustness of other CNN-based biomedical image segmentation methods.
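The long-range spatial dependencies that NLCE models are typically captured with a non-local operation, in which every position aggregates features from all other positions, weighted by pairwise similarity. The sketch below shows the generic operation with a residual connection, y_i = Σ_j softmax_j(x_i · x_j) x_j and z_i = x_i + y_i, on a flat list of feature vectors. It simplifies the usual design by using identity embeddings and no output projection, and it omits NLCE's channel-wise attention, so it is an illustration of the building block rather than the NLCE module itself.

```python
import math
from typing import List

Vector = List[float]  # feature vector at one spatial position


def non_local(features: List[Vector]) -> List[Vector]:
    """Simplified non-local operation with a residual connection.

    Every position attends to every position with weights
    softmax_j(x_i . x_j); embeddings are identity maps for brevity.
    """
    out = []
    for xi in features:
        scores = [sum(a * b for a, b in zip(xi, xj)) for xj in features]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        agg = [sum(e / total * xj[c] for e, xj in zip(exps, features))
               for c in range(len(xi))]
        out.append([a + b for a, b in zip(xi, agg)])  # residual: z_i = x_i + y_i
    return out
```

Because each output mixes in information from the whole image rather than a local window, a perturbation confined to a few pixels has to fight the aggregated global context, which matches the abstract's motivation for exploiting global dependencies as a defence.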
