Crowd Counting Guided by Attention Network

Information ◽  
2020 ◽  
Vol 11 (12) ◽  
pp. 567
Author(s):  
Pei Nie ◽  
Cien Fan ◽  
Lian Zou ◽  
Liqiong Chen ◽  
Xiaopeng Li

Crowd counting is not simply a matter of counting the number of people; it also requires recovering their spatial distribution in a picture. It remains a challenging task under crowded scenes, occlusion, and scale variation. This paper proposes a global and local attention network (GLANet) for efficient crowd counting, which applies an attention mechanism to enhance features. Firstly, the feature extractor module (FEM) uses a pretrained VGG-16 to produce a base feature map. Secondly, the global and local attention module (GLAM) captures local and global attention information to enhance the features. Thirdly, the feature fusing module (FFM) applies a series of convolutions to fuse the various features and generate density maps. Finally, we conduct experiments on mainstream datasets and compare against the performance of state-of-the-art methods.
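To make the three-stage pipeline concrete, here is a minimal PyTorch sketch of one possible FEM/GLAM/FFM arrangement; the specific attention forms, channel counts, and layer cut-offs are our assumptions rather than the paper's implementation.

```python
# Hypothetical sketch of the GLANet pipeline described above: a VGG-16
# front end, a global/local attention stage, and a fusion head that
# regresses a density map. Layer sizes are assumptions, not the paper's.
import torch
import torch.nn as nn
from torchvision.models import vgg16

class GLANetSketch(nn.Module):
    def __init__(self):
        super().__init__()
        # FEM: conv1_1 .. conv4_3 of a pretrained VGG-16 (weights download)
        self.fem = vgg16(weights="DEFAULT").features[:23]
        # GLAM (assumed form): channel-wise global attention plus
        # pixel-wise local attention over the feature map
        self.global_att = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(512, 512, 1), nn.Sigmoid())
        self.local_att = nn.Sequential(
            nn.Conv2d(512, 1, 1), nn.Sigmoid())
        # FFM: convolutions that fuse features and emit a 1-channel density map
        self.ffm = nn.Sequential(
            nn.Conv2d(512, 128, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(128, 1, 1))

    def forward(self, x):
        f = self.fem(x)
        f = f * self.global_att(f) + f * self.local_att(f)  # enhanced features
        return self.ffm(f)  # predicted density map; its sum estimates the count

density = GLANetSketch()(torch.randn(1, 3, 224, 224))
print(density.shape, density.sum().item())
```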

2019 ◽  
Vol 9 (14) ◽  
pp. 2917 ◽  
Author(s):  
Yan Chen ◽  
Chengming Zhang ◽  
Shouyi Wang ◽  
Jianping Li ◽  
Feng Li ◽  
...  

Satellite remote sensing has become a mainstream approach for extracting the spatial distribution of crops. Obtaining fine edges while extracting crop spatial distribution information from high-resolution remote sensing images with a convolutional neural network (CNN) remains a challenge. Based on the characteristics of crop areas in Gaofen 2 (GF-2) images, this paper proposes an improved CNN to extract fine crop areas. The CNN comprises a feature extractor and a classifier. The feature extractor employs a spectral feature extraction unit to generate spectral features and five coding-decoding-pair units to generate features at five levels. A linear model fuses the features of the different levels, and the fusion result is up-sampled to obtain a feature map matching the spatial structure of the input image. This feature map is used by the classifier for pixel-by-pixel classification. In this study, the SegNet and RefineNet models and 21 GF-2 images of Feicheng County, Shandong Province, China, were chosen for comparison experiments. Our approach achieved an accuracy of 93.26%, higher than the existing SegNet (78.12%) and RefineNet (86.54%) models, demonstrating the superiority of the proposed method in extracting crop spatial distribution information from GF-2 remote sensing images.
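A hedged sketch of this extractor-plus-classifier layout is given below: a 1x1-convolution spectral unit, five encode/decode pairs producing features at five levels, a linear (1x1 convolution) fusion of the up-sampled levels, and a pixel-wise classifier. All channel counts, the four input bands, and the two-class output are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CropNetSketch(nn.Module):
    def __init__(self, bands=4, classes=2, ch=32, levels=5):
        super().__init__()
        self.spectral = nn.Conv2d(bands, ch, kernel_size=1)  # per-pixel spectral features
        self.encoders = nn.ModuleList(
            [nn.Sequential(nn.Conv2d(ch, ch, 3, stride=2, padding=1), nn.ReLU())
             for _ in range(levels)])
        self.decoders = nn.ModuleList(
            [nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU())
             for _ in range(levels)])
        # linear model fusing the five level features (a learned weighted sum)
        self.fuse = nn.Conv2d(ch * levels, ch, kernel_size=1)
        self.classifier = nn.Conv2d(ch, classes, kernel_size=1)

    def forward(self, x):
        h, w = x.shape[-2:]
        f = self.spectral(x)
        levels = []
        for enc, dec in zip(self.encoders, self.decoders):
            f = enc(f)
            # decode, then up-sample each level back to the input size
            levels.append(F.interpolate(dec(f), size=(h, w), mode="bilinear",
                                        align_corners=False))
        fused = self.fuse(torch.cat(levels, dim=1))
        return self.classifier(fused)  # per-pixel class logits

logits = CropNetSketch()(torch.randn(1, 4, 256, 256))
print(logits.shape)  # torch.Size([1, 2, 256, 256])
```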


Author(s):  
Chengfeng Xu ◽  
Pengpeng Zhao ◽  
Yanchi Liu ◽  
Victor S. Sheng ◽  
Jiajie Xu ◽  
...  

Session-based recommendation, which aims to predict the user's immediate next action based on anonymous sessions, is a key task in many online services (e.g., e-commerce, media streaming). Recently, the Self-Attention Network (SAN) has achieved significant success in various sequence modeling tasks without using either recurrent or convolutional networks. However, SAN neglects the local dependencies that exist over adjacent items, which limits its capacity for learning contextualized representations of items in sequences. In this paper, we propose a graph contextualized self-attention model (GC-SAN) for session-based recommendation, which utilizes both a graph neural network and the self-attention mechanism. In GC-SAN, we dynamically construct a graph structure for session sequences and capture rich local dependencies via a graph neural network (GNN). Each session then learns long-range dependencies by applying the self-attention mechanism. Finally, each session is represented as a linear combination of its global preference and current interest. Extensive experiments on two real-world datasets show that GC-SAN consistently outperforms state-of-the-art methods.
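As a rough illustration of this pipeline, the sketch below embeds a toy session, runs one simplified graph-propagation step over a positional transition graph, applies self-attention, and mixes global preference with current interest. The single-step GNN, the positional (rather than item-level) graph, and all dimensions are our assumptions.

```python
import torch
import torch.nn as nn

n_items, d = 1000, 64
emb = nn.Embedding(n_items, d)
gnn_lin = nn.Linear(d, d)                     # one propagation step (simplified)
attn = nn.MultiheadAttention(d, num_heads=4, batch_first=True)
mix = nn.Linear(2 * d, d)

session = torch.tensor([[3, 7, 7, 42, 9]])    # one anonymous session
x = emb(session)                              # (1, L, d)

# simplified: adjacency over consecutive positions in the session
# (the paper builds an item-level graph)
L = session.size(1)
adj = torch.zeros(L, L)
adj[torch.arange(L - 1), torch.arange(1, L)] = 1.0
adj = adj / adj.sum(-1, keepdim=True).clamp(min=1)

local = x + torch.relu(gnn_lin(adj @ x))      # local (graph) context + residual
contextual, _ = attn(local, local, local)     # long-range dependencies
global_pref = contextual[:, -1]               # global preference
current = x[:, -1]                            # current interest (last item)
session_repr = mix(torch.cat([global_pref, current], dim=-1))
scores = session_repr @ emb.weight.T          # next-item scores
print(scores.topk(5).indices)
```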


2020 ◽  
Vol 29 (15) ◽  
pp. 2050250
Author(s):  
Xiongfei Liu ◽  
Bengao Li ◽  
Xin Chen ◽  
Haiyan Zhang ◽  
Shu Zhan

This paper proposes a novel method for person image generation with an arbitrary target pose. Given a person image and an arbitrary target pose, our proposed model can synthesize images of the same person in different poses. A Generative Adversarial Network (GAN) forms the major part of the proposed model. Unlike traditional GANs, we add an attention mechanism to the generator in order to generate realistic-looking images, and we use a content reconstruction loss with a pretrained VGG16 network to keep the content consistent between generated and target images. Furthermore, we evaluate our model on the DeepFashion and Market-1501 datasets. The experimental results show that the proposed network performs favorably against state-of-the-art methods.
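The content-reconstruction term can be made concrete: features of the generated and target images are extracted by a frozen pretrained VGG16 and compared. The layer choice (relu3_3) and the L1 distance below are our assumptions.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg16

vgg_feats = vgg16(weights="DEFAULT").features[:16].eval()  # up to relu3_3
for p in vgg_feats.parameters():
    p.requires_grad_(False)                    # frozen perceptual extractor

def content_loss(generated, target):
    # both tensors: (B, 3, H, W), already normalized for VGG
    return nn.functional.l1_loss(vgg_feats(generated), vgg_feats(target))

g = torch.rand(2, 3, 256, 128)                 # e.g. DeepFashion-style crops
t = torch.rand(2, 3, 256, 128)
print(content_loss(g, t).item())
```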


2022 ◽  
Vol 16 (2) ◽  
pp. 1-20
Author(s):  
Zhenyu Zhang ◽  
Lei Zhang ◽  
Dingqi Yang ◽  
Liu Yang

Recommender algorithms combining knowledge graphs and graph convolutional networks have recently become popular. Specifically, attributes describing the items to be recommended are often used as additional information. These attributes, along with the items, are highly interconnected, intrinsically forming a Knowledge Graph (KG). Such algorithms use KGs as an auxiliary data source to alleviate the negative impact of data sparsity. However, these graph convolutional network based algorithms do not distinguish the importance of different neighbors of entities in the KG, and, according to Pareto’s principle, the important neighbors account for only a small proportion. These traditional algorithms therefore cannot fully mine the useful information in the KG. To fully release the power of KGs for building recommender systems, we propose in this article KRAN, a Knowledge Refining Attention Network, which can subtly capture the characteristics of the KG and thus boost recommendation performance. We first introduce a traditional attention mechanism into KG processing, making knowledge extraction more targeted, and then propose a refining mechanism that improves on the traditional attention mechanism to extract knowledge from the KG more effectively. More precisely, KRAN uses our proposed knowledge-refining attention mechanism to aggregate and obtain the representations of the entities (both attributes and items) in the KG. The mechanism first measures the relevance between an entity and its neighbors in the KG by attention coefficients, and then refines the attention coefficients using a “richer-get-richer” principle, focusing on highly relevant neighbors while eliminating less relevant ones for noise reduction. In addition, for the item cold-start problem, we propose KRAN-CD, a variant of KRAN that further incorporates pre-trained KG embeddings to handle cold-start items. Experiments show that KRAN and KRAN-CD consistently outperform state-of-the-art baselines across different settings.
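As a hedged sketch of the refining step, the snippet below computes ordinary softmax attention over an entity's neighbors and then sharpens the coefficients with a power transform so that large coefficients grow and small ones shrink. The power transform is our illustrative stand-in for the paper's refining mechanism.

```python
import torch
import torch.nn.functional as F

def refined_attention(entity, neighbors, gamma=2.0):
    # entity: (d,), neighbors: (n, d)
    scores = neighbors @ entity                # relevance scores
    alpha = F.softmax(scores, dim=0)           # standard attention coefficients
    refined = alpha.pow(gamma)                 # richer get richer: sharpen
    refined = refined / refined.sum()          # renormalize
    return refined @ neighbors                 # aggregated entity representation

e = torch.randn(16)
nbrs = torch.randn(8, 16)
print(refined_attention(e, nbrs).shape)        # torch.Size([16])
```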


Author(s):  
Yue Yuan ◽  
Xiaofei Zhou ◽  
Shirui Pan ◽  
Qiannan Zhu ◽  
Zeliang Song ◽  
...  

Joint extraction of entities and relations is an important task in natural language processing (NLP), which aims to capture all relational triplets from plain text. It is a significant challenge because some of the triplets extracted from one sentence may have overlapping entities. Most existing methods perform entity recognition followed by relation detection between all possible entity pairs, which usually suffers from numerous redundant operations. In this paper, we propose a relation-specific attention network (RSAN) to handle this issue. RSAN utilizes a relation-aware attention mechanism to construct a specific sentence representation for each relation, and then performs sequence labeling to extract the corresponding head and tail entities. Experiments on two public datasets show that our model can effectively extract overlapping triplets and achieve state-of-the-art performance.
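A minimal sketch of the relation-specific attention idea follows: each relation embedding attends over token features to build a relation-aware sentence representation, which then feeds a per-token tagger for that relation's head and tail entities. The tag set, dimensions, and the concatenation scheme are our assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

n_relations, d, n_tags = 5, 64, 5                 # e.g. B-H, I-H, B-T, I-T, O
rel_emb = nn.Embedding(n_relations, d)
tagger = nn.Linear(2 * d, n_tags)

tokens = torch.randn(1, 12, d)                    # encoded sentence (B, L, d)

for r in range(n_relations):
    q = rel_emb.weight[r]                         # relation query
    alpha = F.softmax(tokens @ q, dim=1)          # (B, L) attention over tokens
    rel_ctx = (alpha.unsqueeze(-1) * tokens).sum(1, keepdim=True)  # (B, 1, d)
    # concatenate the relation-aware context to every token, then tag
    feats = torch.cat([tokens, rel_ctx.expand_as(tokens)], dim=-1)
    tags = tagger(feats).argmax(-1)               # per-token labels for relation r
    print(r, tags.tolist())
```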


2020 ◽  
Vol 8 ◽  
pp. 172-182
Author(s):  
Ji Zhang ◽  
Chengyao Chen ◽  
Pengfei Liu ◽  
Chao He ◽  
Cane Wing-Ki Leung

Target-dependent sentiment analysis (TDSA) aims to classify the sentiment of a text towards a given target. The major challenge of this task lies in modeling the semantic relatedness between a target and its context sentence. This paper proposes a novel Target-Guided Structured Attention Network (TG-SAN), which captures target-related contexts for TDSA in a fine-to-coarse manner. Given a target and its context sentence, the proposed TG-SAN first identifies multiple semantic segments from the sentence using a target-guided structured attention mechanism. It then fuses the extracted segments based on their relatedness with the target for sentiment classification. We present comprehensive comparative experiments on three benchmarks with three major findings. First, TG-SAN outperforms the state-of-the-art by up to 1.61% and 3.58% in terms of accuracy and Macro-F1, respectively. Second, it shows a strong advantage in determining the sentiment of a target when the context sentence contains multiple semantic segments. Lastly, visualization results show that the attention scores produced by TG-SAN are highly interpretable.
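One plausible fine-to-coarse reading is sketched below: tokens are softly assigned to a few segments via a structured attention matrix, and the segments are then weighted by their relatedness to the target. The segment count, pooling scheme, and classifier are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

d, n_segments, n_classes = 64, 4, 3
seg_proj = nn.Linear(d, n_segments)      # token -> segment assignment scores
clf = nn.Linear(d, n_classes)

context = torch.randn(1, 20, d)          # encoded context sentence (B, L, d)
target = torch.randn(1, d)               # encoded target phrase

assign = F.softmax(seg_proj(context), dim=1)            # (1, L, S) soft segments
segments = assign.transpose(1, 2) @ context             # (1, S, d) segment reps
relate = F.softmax(segments @ target.unsqueeze(-1), 1)  # segment-target relatedness
sent = (relate * segments).sum(1)                       # fused representation
print(F.log_softmax(clf(sent), dim=-1))                 # sentiment scores
```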


2021 ◽  
Vol 11 (17) ◽  
pp. 7960
Author(s):  
Chang-Hwan Son

This study proposes a new attention-enhanced YOLO model that incorporates a leaf spot attention mechanism, based on regions-of-interest (ROI) feature extraction, into the YOLO framework for leaf disease detection. It is inspired by a previous study, which revealed that leaf spot attention based on ROI-aware feature extraction can significantly improve leaf disease recognition accuracy and outperform state-of-the-art deep learning models, and extends that leaf spot attention model to leaf disease detection. The primary idea is that the spot areas indicating leaf diseases appear only on leaves, whereas the background contains no useful information regarding leaf diseases. To increase the discriminative power of the feature extractor required in the object detection framework, it is essential to extract informative and discriminative features from the spot and leaf areas. To realize this, a new ROI-aware feature extractor, namely a spot feature extractor, was designed. To divide the leaf image into spot, leaf, and background areas, the leaf segmentation module was first pretrained, and spot feature encoding was then applied to encode spot information. Next, the ROI-aware feature extractor was connected to an ROI-aware feature fusion layer to model the leaf spot attention mechanism and to join it with the YOLO detection subnetwork. The experimental results confirm that the proposed ROI-aware feature extractor can improve leaf disease detection by boosting the discriminative power of the spot features. In addition, the proposed attention-enhanced YOLO model outperforms conventional state-of-the-art object detection models.
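The following is an illustrative sketch, under our own simplifying assumptions, of how segmentation masks could gate backbone features into spot and leaf streams before an ROI-aware fusion layer; it is not the paper's released architecture, and every layer size here is hypothetical.

```python
import torch
import torch.nn as nn

backbone = nn.Conv2d(3, 64, 3, padding=1)           # stand-in feature extractor
seg_head = nn.Conv2d(64, 3, 1)                      # 3 masks: spot/leaf/background
fuse = nn.Conv2d(64 * 2, 64, 1)                     # ROI-aware feature fusion layer

x = torch.randn(1, 3, 128, 128)
f = torch.relu(backbone(x))
masks = torch.softmax(seg_head(f), dim=1)           # (1, 3, H, W) soft areas
spot_feat = f * masks[:, 0:1]                       # spot-gated features
leaf_feat = f * masks[:, 1:2]                       # leaf-gated features
fused = fuse(torch.cat([spot_feat, leaf_feat], 1))  # background is discarded
print(fused.shape)                                  # feeds the detection subnetwork
```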


Author(s):  
Qi Zhang ◽  
Jiawen Wang ◽  
Haoran Huang ◽  
Xuanjing Huang ◽  
Yeyun Gong

In microblogging services, authors can use hashtags to mark keywords or topics. Many live social media applications (e.g., microblog retrieval, classification) can gain great benefits from these manually labeled tags. However, only a small portion of microblogs contain hashtags entered by users. Moreover, many microblog posts contain not only textual content but also images. These visual resources provide valuable information that may not be included in the textual content and can therefore help to recommend hashtags more accurately. Motivated by the successful use of the attention mechanism, we propose a co-attention network incorporating textual and visual information to recommend hashtags for multimodal tweets. Experimental results on data collected from Twitter demonstrate that the proposed method achieves better performance than state-of-the-art methods that use textual information only.
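As a rough, hypothetical sketch of co-attention over the two modalities, the snippet below lets text tokens attend to image regions and vice versa, then scores hashtags from the pooled pair; the single attention step, mean pooling, and dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

d, n_tags = 64, 500
txt_attn = nn.MultiheadAttention(d, 4, batch_first=True)
img_attn = nn.MultiheadAttention(d, 4, batch_first=True)
scorer = nn.Linear(2 * d, n_tags)

text = torch.randn(1, 30, d)            # encoded tweet tokens
image = torch.randn(1, 49, d)           # e.g. 7x7 CNN region features

# each modality queries the other
t2i, _ = txt_attn(text, image, image)   # text attends to image regions
i2t, _ = img_attn(image, text, text)    # image attends to text tokens
joint = torch.cat([t2i.mean(1), i2t.mean(1)], dim=-1)
print(scorer(joint).topk(5).indices)    # top-5 recommended hashtags
```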


Author(s):  
Hongzhe Liu ◽  
Zhifang Deng ◽  
Cheng Xu

Gesture recognition aims at understanding dynamic gestures of the human body and is one of the most important modes of human–computer interaction. To extract more effective spatiotemporal features from gesture videos for more accurate gesture classification, a novel feature extractor network, spatiotemporal attention 3D DenseNet, is proposed in this study. We extend DenseNet with 3D kernels, introduce a Refined Temporal Transition Layer based on the Temporal Transition Layer, and explore attention mechanisms in 3D ConvNets. We embed the Refined Temporal Transition Layer and the attention mechanism in DenseNet3D and name the proposed network "spatiotemporal attention 3D DenseNet." Our experiments show that the Refined Temporal Transition Layer performs better than the Temporal Transition Layer and that the proposed spatiotemporal attention 3D DenseNet outperforms the current state-of-the-art methods in each modality on the ChaLearn LAP Large-Scale Isolated Gesture dataset. The code and pretrained model are released at https://github.com/dzf19927/STA3D.
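The sketch below shows one plausible reading, under our own assumptions, of a spatiotemporal attention block for 3D ConvNet features: channel reweighting from global pooling plus a voxel-wise mask from a 3D convolution. For the actual design, see the released code linked above.

```python
import torch
import torch.nn as nn

class STAttention3D(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.channel = nn.Sequential(
            nn.AdaptiveAvgPool3d(1), nn.Conv3d(ch, ch, 1), nn.Sigmoid())
        self.spatiotemporal = nn.Sequential(
            nn.Conv3d(ch, 1, kernel_size=3, padding=1), nn.Sigmoid())

    def forward(self, x):
        x = x * self.channel(x)            # reweight channels
        return x * self.spatiotemporal(x)  # highlight informative voxels

clip_feat = torch.randn(1, 32, 16, 28, 28)  # (B, C, T, H, W) clip features
print(STAttention3D(32)(clip_feat).shape)
```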


2021 ◽  
Author(s):  
Hye-jin Shim ◽  
Ju-ho Kim ◽  
Jee-weon Jung ◽  
Ha-Jin Yu

The attention mechanism has been widely adopted in acoustic scene classification. However, we find that while attention exclusively emphasizes certain information, it tends to discard other information excessively, even though it improves performance. We propose a mechanism referred to as the attentive max feature map, which combines two effective techniques, attention and the max feature map, to further elaborate the attention mechanism and mitigate the abovementioned phenomenon. Furthermore, we explore various joint learning methods that utilize additional labels originally generated for subtask B (3 classes) on top of the existing labels for subtask A (10 classes) of the DCASE2020 challenge. We expect that using the two kinds of labels simultaneously is helpful because the labels of the two subtasks differ in their degree of abstraction. Applying the two proposed techniques, our system achieves state-of-the-art performance among single systems on subtask A. In addition, because the model's complexity is comparable to subtask B's requirement, it shows the possibility of developing a system that fulfills the requirements of both subtasks: generalization across multiple devices and low complexity.
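The max feature map is a concrete, well-known operation (an element-wise max over channel pairs); the sketch below combines it with a simple attention gate as one plausible reading of the attentive max feature map, the combination order being our assumption.

```python
import torch
import torch.nn as nn

class AttentiveMFM(nn.Module):
    def __init__(self, ch):
        super().__init__()
        # attention gate over the input feature maps (assumed 1x1-conv form)
        self.att = nn.Sequential(nn.Conv2d(ch, ch, 1), nn.Sigmoid())

    def forward(self, x):
        x = x * self.att(x)        # attention softly emphasizes features
        a, b = x.chunk(2, dim=1)   # split channels into two halves
        return torch.max(a, b)     # MFM: keep the stronger of each channel pair

spec = torch.randn(1, 64, 40, 100)   # e.g. a batch of spectrogram feature maps
print(AttentiveMFM(64)(spec).shape)  # half the channels remain: (1, 32, 40, 100)
```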

