Crowd Counting Guided by Attention Network

Information ◽  
2020 ◽  
Vol 11 (12) ◽  
pp. 567
Author(s):  
Pei Nie ◽  
Cien Fan ◽  
Lian Zou ◽  
Liqiong Chen ◽  
Xiaopeng Li

Crowd counting is not simply a matter of counting the number of people; it also requires recovering their spatial distribution in a picture. It remains a challenging task under crowded scenes, occlusion, and scale variation. This paper proposes a global and local attention network (GLANet) for efficient crowd counting, which applies an attention mechanism to enhance features. Firstly, the feature extractor module (FEM) uses a pretrained VGG-16 to produce a base feature map. Secondly, the global and local attention module (GLAM) captures local and global attention information to enhance the features. Thirdly, the feature fusing module (FFM) applies a series of convolutions to fuse the various features and generate density maps. Finally, we conduct experiments on mainstream datasets and compare against the performance of state-of-the-art methods.
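To make the three-stage pipeline concrete, here is a minimal PyTorch sketch of one possible FEM/GLAM/FFM arrangement; the specific attention forms, channel counts, and layer cut-offs are our assumptions rather than the paper's implementation.

```python
# Hypothetical sketch of the GLANet pipeline described above: a VGG-16
# front end, a global/local attention stage, and a fusion head that
# regresses a density map. Layer sizes are assumptions, not the paper's.
import torch
import torch.nn as nn
from torchvision.models import vgg16

class GLANetSketch(nn.Module):
    def __init__(self):
        super().__init__()
        # FEM: conv1_1 .. conv4_3 of a pretrained VGG-16 (weights download)
        self.fem = vgg16(weights="DEFAULT").features[:23]
        # GLAM (assumed form): channel-wise global attention plus
        # pixel-wise local attention over the feature map
        self.global_att = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(512, 512, 1), nn.Sigmoid())
        self.local_att = nn.Sequential(
            nn.Conv2d(512, 1, 1), nn.Sigmoid())
        # FFM: convolutions that fuse features and emit a 1-channel density map
        self.ffm = nn.Sequential(
            nn.Conv2d(512, 128, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(128, 1, 1))

    def forward(self, x):
        f = self.fem(x)
        f = f * self.global_att(f) + f * self.local_att(f)  # enhanced features
        return self.ffm(f)  # predicted density map; its sum estimates the count

density = GLANetSketch()(torch.randn(1, 3, 224, 224))
print(density.shape, density.sum().item())
```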

2019 ◽  
Vol 9 (14) ◽  
pp. 2917 ◽  
Author(s):  
Yan Chen ◽  
Chengming Zhang ◽  
Shouyi Wang ◽  
Jianping Li ◽  
Feng Li ◽  
...  

Satellite remote sensing has become a mainstream approach for extracting the spatial distribution of crops. Obtaining fine edges while extracting crop spatial distribution information from high-resolution remote sensing images with a convolutional neural network (CNN) remains a challenge. Based on the characteristics of crop areas in Gaofen 2 (GF-2) images, this paper proposes an improved CNN to extract fine crop areas. The CNN comprises a feature extractor and a classifier. The feature extractor employs a spectral feature extraction unit to generate spectral features and five coding-decoding-pair units to generate features at five levels. A linear model fuses the features of the different levels, and the fusion result is up-sampled to obtain a feature map matching the spatial structure of the input image. This feature map is used by the classifier for pixel-by-pixel classification. In this study, the SegNet and RefineNet models and 21 GF-2 images of Feicheng County, Shandong Province, China, were chosen for comparison experiments. Our approach achieved an accuracy of 93.26%, higher than the existing SegNet (78.12%) and RefineNet (86.54%) models, demonstrating the superiority of the proposed method in extracting crop spatial distribution information from GF-2 remote sensing images.
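A hedged sketch of this extractor-plus-classifier layout is given below: a 1x1-convolution spectral unit, five encode/decode pairs producing features at five levels, a linear (1x1 convolution) fusion of the up-sampled levels, and a pixel-wise classifier. All channel counts, the four input bands, and the two-class output are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CropNetSketch(nn.Module):
    def __init__(self, bands=4, classes=2, ch=32, levels=5):
        super().__init__()
        self.spectral = nn.Conv2d(bands, ch, kernel_size=1)  # per-pixel spectral features
        self.encoders = nn.ModuleList(
            [nn.Sequential(nn.Conv2d(ch, ch, 3, stride=2, padding=1), nn.ReLU())
             for _ in range(levels)])
        self.decoders = nn.ModuleList(
            [nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU())
             for _ in range(levels)])
        # linear model fusing the five level features (a learned weighted sum)
        self.fuse = nn.Conv2d(ch * levels, ch, kernel_size=1)
        self.classifier = nn.Conv2d(ch, classes, kernel_size=1)

    def forward(self, x):
        h, w = x.shape[-2:]
        f = self.spectral(x)
        levels = []
        for enc, dec in zip(self.encoders, self.decoders):
            f = enc(f)
            # decode, then up-sample each level back to the input size
            levels.append(F.interpolate(dec(f), size=(h, w), mode="bilinear",
                                        align_corners=False))
        fused = self.fuse(torch.cat(levels, dim=1))
        return self.classifier(fused)  # per-pixel class logits

logits = CropNetSketch()(torch.randn(1, 4, 256, 256))
print(logits.shape)  # torch.Size([1, 2, 256, 256])
```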


Author(s):  
Chengfeng Xu ◽  
Pengpeng Zhao ◽  
Yanchi Liu ◽  
Victor S. Sheng ◽  
Jiajie Xu ◽  
...  

Session-based recommendation, which aims to predict the user's immediate next action based on anonymous sessions, is a key task in many online services (e.g., e-commerce, media streaming). Recently, the Self-Attention Network (SAN) has achieved significant success in various sequence modeling tasks without using either recurrent or convolutional networks. However, SAN neglects the local dependencies that exist over adjacent items, which limits its capacity for learning contextualized representations of items in sequences. In this paper, we propose a graph contextualized self-attention model (GC-SAN) for session-based recommendation, which utilizes both a graph neural network and the self-attention mechanism. In GC-SAN, we dynamically construct a graph structure for session sequences and capture rich local dependencies via a graph neural network (GNN). Each session then learns long-range dependencies by applying the self-attention mechanism. Finally, each session is represented as a linear combination of its global preference and current interest. Extensive experiments on two real-world datasets show that GC-SAN consistently outperforms state-of-the-art methods.
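As a rough illustration of this pipeline, the sketch below embeds a toy session, runs one simplified graph-propagation step over a positional transition graph, applies self-attention, and mixes global preference with current interest. The single-step GNN, the positional (rather than item-level) graph, and all dimensions are our assumptions.

```python
import torch
import torch.nn as nn

n_items, d = 1000, 64
emb = nn.Embedding(n_items, d)
gnn_lin = nn.Linear(d, d)                     # one propagation step (simplified)
attn = nn.MultiheadAttention(d, num_heads=4, batch_first=True)
mix = nn.Linear(2 * d, d)

session = torch.tensor([[3, 7, 7, 42, 9]])    # one anonymous session
x = emb(session)                              # (1, L, d)

# simplified: adjacency over consecutive positions in the session
# (the paper builds an item-level graph)
L = session.size(1)
adj = torch.zeros(L, L)
adj[torch.arange(L - 1), torch.arange(1, L)] = 1.0
adj = adj / adj.sum(-1, keepdim=True).clamp(min=1)

local = x + torch.relu(gnn_lin(adj @ x))      # local (graph) context + residual
contextual, _ = attn(local, local, local)     # long-range dependencies
global_pref = contextual[:, -1]               # global preference
current = x[:, -1]                            # current interest (last item)
session_repr = mix(torch.cat([global_pref, current], dim=-1))
scores = session_repr @ emb.weight.T          # next-item scores
print(scores.topk(5).indices)
```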


2020 ◽  
Vol 29 (15) ◽  
pp. 2050250
Author(s):  
Xiongfei Liu ◽  
Bengao Li ◽  
Xin Chen ◽  
Haiyan Zhang ◽  
Shu Zhan

This paper proposes a novel method for person image generation with an arbitrary target pose. Given a person image and an arbitrary target pose, our proposed model can synthesize images of the same person in different poses. A Generative Adversarial Network (GAN) forms the major part of the proposed model. Unlike traditional GANs, we add an attention mechanism to the generator in order to generate realistic-looking images, and we use a content reconstruction loss with a pretrained VGG16 network to keep the content consistent between generated and target images. Furthermore, we evaluate our model on the DeepFashion and Market-1501 datasets. The experimental results show that the proposed network performs favorably against state-of-the-art methods.
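The content-reconstruction term can be made concrete: features of the generated and target images are extracted by a frozen pretrained VGG16 and compared. The layer choice (relu3_3) and the L1 distance below are our assumptions.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg16

vgg_feats = vgg16(weights="DEFAULT").features[:16].eval()  # up to relu3_3
for p in vgg_feats.parameters():
    p.requires_grad_(False)                    # frozen perceptual extractor

def content_loss(generated, target):
    # both tensors: (B, 3, H, W), already normalized for VGG
    return nn.functional.l1_loss(vgg_feats(generated), vgg_feats(target))

g = torch.rand(2, 3, 256, 128)                 # e.g. DeepFashion-style crops
t = torch.rand(2, 3, 256, 128)
print(content_loss(g, t).item())
```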


2022 ◽  
Vol 16 (2) ◽  
pp. 1-20
Author(s):  
Zhenyu Zhang ◽  
Lei Zhang ◽  
Dingqi Yang ◽  
Liu Yang

Recommender algorithms combining knowledge graphs and graph convolutional networks have recently become popular. Specifically, attributes describing the items to be recommended are often used as additional information. These attributes, along with the items, are highly interconnected, intrinsically forming a Knowledge Graph (KG). Such algorithms use KGs as an auxiliary data source to alleviate the negative impact of data sparsity. However, these graph convolutional network based algorithms do not distinguish the importance of different neighbors of entities in the KG, and, according to Pareto’s principle, the important neighbors account for only a small proportion. These traditional algorithms therefore cannot fully mine the useful information in the KG. To fully release the power of KGs for building recommender systems, we propose in this article KRAN, a Knowledge Refining Attention Network, which can subtly capture the characteristics of the KG and thus boost recommendation performance. We first introduce a traditional attention mechanism into KG processing, making knowledge extraction more targeted, and then propose a refining mechanism that improves on the traditional attention mechanism to extract knowledge from the KG more effectively. More precisely, KRAN uses our proposed knowledge-refining attention mechanism to aggregate and obtain the representations of the entities (both attributes and items) in the KG. The mechanism first measures the relevance between an entity and its neighbors in the KG by attention coefficients, and then refines the attention coefficients using a “richer-get-richer” principle, focusing on highly relevant neighbors while eliminating less relevant ones for noise reduction. In addition, for the item cold-start problem, we propose KRAN-CD, a variant of KRAN that further incorporates pre-trained KG embeddings to handle cold-start items. Experiments show that KRAN and KRAN-CD consistently outperform state-of-the-art baselines across different settings.
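As a hedged sketch of the refining step, the snippet below computes ordinary softmax attention over an entity's neighbors and then sharpens the coefficients with a power transform so that large coefficients grow and small ones shrink. The power transform is our illustrative stand-in for the paper's refining mechanism.

```python
import torch
import torch.nn.functional as F

def refined_attention(entity, neighbors, gamma=2.0):
    # entity: (d,), neighbors: (n, d)
    scores = neighbors @ entity                # relevance scores
    alpha = F.softmax(scores, dim=0)           # standard attention coefficients
    refined = alpha.pow(gamma)                 # richer get richer: sharpen
    refined = refined / refined.sum()          # renormalize
    return refined @ neighbors                 # aggregated entity representation

e = torch.randn(16)
nbrs = torch.randn(8, 16)
print(refined_attention(e, nbrs).shape)        # torch.Size([16])
```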


Author(s):  
Yue Yuan ◽  
Xiaofei Zhou ◽  
Shirui Pan ◽  
Qiannan Zhu ◽  
Zeliang Song ◽  
...  

Joint extraction of entities and relations is an important task in natural language processing (NLP), which aims to capture all relational triplets from plain text. It is a significant challenge because some of the triplets extracted from one sentence may have overlapping entities. Most existing methods perform entity recognition followed by relation detection between all possible entity pairs, which usually suffers from numerous redundant operations. In this paper, we propose a relation-specific attention network (RSAN) to handle this issue. RSAN utilizes a relation-aware attention mechanism to construct a specific sentence representation for each relation, and then performs sequence labeling to extract the corresponding head and tail entities. Experiments on two public datasets show that our model can effectively extract overlapping triplets and achieve state-of-the-art performance.
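A minimal sketch of the relation-specific attention idea follows: each relation embedding attends over token features to build a relation-aware sentence representation, which then feeds a per-token tagger for that relation's head and tail entities. The tag set, dimensions, and the concatenation scheme are our assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

n_relations, d, n_tags = 5, 64, 5                 # e.g. B-H, I-H, B-T, I-T, O
rel_emb = nn.Embedding(n_relations, d)
tagger = nn.Linear(2 * d, n_tags)

tokens = torch.randn(1, 12, d)                    # encoded sentence (B, L, d)

for r in range(n_relations):
    q = rel_emb.weight[r]                         # relation query
    alpha = F.softmax(tokens @ q, dim=1)          # (B, L) attention over tokens
    rel_ctx = (alpha.unsqueeze(-1) * tokens).sum(1, keepdim=True)  # (B, 1, d)
    # concatenate the relation-aware context to every token, then tag
    feats = torch.cat([tokens, rel_ctx.expand_as(tokens)], dim=-1)
    tags = tagger(feats).argmax(-1)               # per-token labels for relation r
    print(r, tags.tolist())
```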


2020 ◽  
Vol 8 ◽  
pp. 172-182
Author(s):  
Ji Zhang ◽  
Chengyao Chen ◽  
Pengfei Liu ◽  
Chao He ◽  
Cane Wing-Ki Leung

Target-dependent sentiment analysis (TDSA) aims to classify the sentiment of a text towards a given target. The major challenge of this task lies in modeling the semantic relatedness between a target and its context sentence. This paper proposes a novel Target-Guided Structured Attention Network (TG-SAN), which captures target-related contexts for TDSA in a fine-to-coarse manner. Given a target and its context sentence, the proposed TG-SAN first identifies multiple semantic segments from the sentence using a target-guided structured attention mechanism. It then fuses the extracted segments based on their relatedness with the target for sentiment classification. We present comprehensive comparative experiments on three benchmarks with three major findings. First, TG-SAN outperforms the state-of-the-art by up to 1.61% and 3.58% in terms of accuracy and Macro-F1, respectively. Second, it shows a strong advantage in determining the sentiment of a target when the context sentence contains multiple semantic segments. Lastly, visualization results show that the attention scores produced by TG-SAN are highly interpretable.
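One plausible fine-to-coarse reading is sketched below: tokens are softly assigned to a few segments via a structured attention matrix, and the segments are then weighted by their relatedness to the target. The segment count, pooling scheme, and classifier are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

d, n_segments, n_classes = 64, 4, 3
seg_proj = nn.Linear(d, n_segments)      # token -> segment assignment scores
clf = nn.Linear(d, n_classes)

context = torch.randn(1, 20, d)          # encoded context sentence (B, L, d)
target = torch.randn(1, d)               # encoded target phrase

assign = F.softmax(seg_proj(context), dim=1)            # (1, L, S) soft segments
segments = assign.transpose(1, 2) @ context             # (1, S, d) segment reps
relate = F.softmax(segments @ target.unsqueeze(-1), 1)  # segment-target relatedness
sent = (relate * segments).sum(1)                       # fused representation
print(F.log_softmax(clf(sent), dim=-1))                 # sentiment scores
```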


2021 ◽  
Vol 11 (17) ◽  
pp. 7960
Author(s):  
Chang-Hwan Son

This study proposes a new attention-enhanced YOLO model that incorporates a leaf spot attention mechanism, based on regions-of-interest (ROI) feature extraction, into the YOLO framework for leaf disease detection. It is inspired by a previous study, which revealed that leaf spot attention based on ROI-aware feature extraction can significantly improve leaf disease recognition accuracy and outperform state-of-the-art deep learning models, and extends that leaf spot attention model to leaf disease detection. The primary idea is that the spot areas indicating leaf diseases appear only on leaves, whereas the background contains no useful information regarding leaf diseases. To increase the discriminative power of the feature extractor required in the object detection framework, it is essential to extract informative and discriminative features from the spot and leaf areas. To realize this, a new ROI-aware feature extractor, namely a spot feature extractor, was designed. To divide the leaf image into spot, leaf, and background areas, the leaf segmentation module was first pretrained, and spot feature encoding was then applied to encode spot information. Next, the ROI-aware feature extractor was connected to an ROI-aware feature fusion layer to model the leaf spot attention mechanism and to join it with the YOLO detection subnetwork. The experimental results confirm that the proposed ROI-aware feature extractor can improve leaf disease detection by boosting the discriminative power of the spot features. In addition, the proposed attention-enhanced YOLO model outperforms conventional state-of-the-art object detection models.
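The following is an illustrative sketch, under our own simplifying assumptions, of how segmentation masks could gate backbone features into spot and leaf streams before an ROI-aware fusion layer; it is not the paper's released architecture, and every layer size here is hypothetical.

```python
import torch
import torch.nn as nn

backbone = nn.Conv2d(3, 64, 3, padding=1)           # stand-in feature extractor
seg_head = nn.Conv2d(64, 3, 1)                      # 3 masks: spot/leaf/background
fuse = nn.Conv2d(64 * 2, 64, 1)                     # ROI-aware feature fusion layer

x = torch.randn(1, 3, 128, 128)
f = torch.relu(backbone(x))
masks = torch.softmax(seg_head(f), dim=1)           # (1, 3, H, W) soft areas
spot_feat = f * masks[:, 0:1]                       # spot-gated features
leaf_feat = f * masks[:, 1:2]                       # leaf-gated features
fused = fuse(torch.cat([spot_feat, leaf_feat], 1))  # background is discarded
print(fused.shape)                                  # feeds the detection subnetwork
```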


Author(s):  
Qi Zhang ◽  
Jiawen Wang ◽  
Haoran Huang ◽  
Xuanjing Huang ◽  
Yeyun Gong

In microblogging services, authors can use hashtags to mark keywords or topics. Many live social media applications (e.g., microblog retrieval, classification) can gain great benefits from these manually labeled tags. However, only a small portion of microblogs contain hashtags entered by users. Moreover, many microblog posts contain not only textual content but also images. These visual resources provide valuable information that may not be included in the textual content and can therefore help to recommend hashtags more accurately. Motivated by the successful use of the attention mechanism, we propose a co-attention network incorporating textual and visual information to recommend hashtags for multimodal tweets. Experimental results on data collected from Twitter demonstrate that the proposed method achieves better performance than state-of-the-art methods that use textual information only.
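As a rough, hypothetical sketch of co-attention over the two modalities, the snippet below lets text tokens attend to image regions and vice versa, then scores hashtags from the pooled pair; the single attention step, mean pooling, and dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

d, n_tags = 64, 500
txt_attn = nn.MultiheadAttention(d, 4, batch_first=True)
img_attn = nn.MultiheadAttention(d, 4, batch_first=True)
scorer = nn.Linear(2 * d, n_tags)

text = torch.randn(1, 30, d)            # encoded tweet tokens
image = torch.randn(1, 49, d)           # e.g. 7x7 CNN region features

# each modality queries the other
t2i, _ = txt_attn(text, image, image)   # text attends to image regions
i2t, _ = img_attn(image, text, text)    # image attends to text tokens
joint = torch.cat([t2i.mean(1), i2t.mean(1)], dim=-1)
print(scorer(joint).topk(5).indices)    # top-5 recommended hashtags
```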


Author(s):  
Hongzhe Liu ◽  
Zhifang Deng ◽  
Cheng Xu

Gesture recognition aims at understanding dynamic gestures of the human body and is one of the most important modes of human–computer interaction. To extract more effective spatiotemporal features from gesture videos for more accurate gesture classification, a novel feature extractor network, spatiotemporal attention 3D DenseNet, is proposed in this study. We extend DenseNet with 3D kernels, introduce a Refined Temporal Transition Layer based on the Temporal Transition Layer, and explore attention mechanisms in 3D ConvNets. We embed the Refined Temporal Transition Layer and the attention mechanism in DenseNet3D and name the proposed network "spatiotemporal attention 3D DenseNet." Our experiments show that the Refined Temporal Transition Layer performs better than the Temporal Transition Layer and that the proposed spatiotemporal attention 3D DenseNet outperforms the current state-of-the-art methods in each modality on the ChaLearn LAP Large-Scale Isolated Gesture dataset. The code and pretrained model are released at https://github.com/dzf19927/STA3D.
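The sketch below shows one plausible reading, under our own assumptions, of a spatiotemporal attention block for 3D ConvNet features: channel reweighting from global pooling plus a voxel-wise mask from a 3D convolution. For the actual design, see the released code linked above.

```python
import torch
import torch.nn as nn

class STAttention3D(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.channel = nn.Sequential(
            nn.AdaptiveAvgPool3d(1), nn.Conv3d(ch, ch, 1), nn.Sigmoid())
        self.spatiotemporal = nn.Sequential(
            nn.Conv3d(ch, 1, kernel_size=3, padding=1), nn.Sigmoid())

    def forward(self, x):
        x = x * self.channel(x)            # reweight channels
        return x * self.spatiotemporal(x)  # highlight informative voxels

clip_feat = torch.randn(1, 32, 16, 28, 28)  # (B, C, T, H, W) clip features
print(STAttention3D(32)(clip_feat).shape)
```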


2021 ◽  
Author(s):  
Hye-jin Shim ◽  
Ju-ho Kim ◽  
Jee-weon Jung ◽  
Ha-Jin Yu

The attention mechanism has been widely adopted in acoustic scene classification. However, we find that while attention exclusively emphasizes certain information, it tends to discard other information excessively, even though it improves performance. We propose a mechanism referred to as the attentive max feature map, which combines two effective techniques, attention and the max feature map, to further elaborate the attention mechanism and mitigate the abovementioned phenomenon. Furthermore, we explore various joint learning methods that utilize additional labels originally generated for subtask B (3 classes) on top of the existing labels for subtask A (10 classes) of the DCASE2020 challenge. We expect that using the two kinds of labels simultaneously is helpful because the labels of the two subtasks differ in their degree of abstraction. Applying the two proposed techniques, our system achieves state-of-the-art performance among single systems on subtask A. In addition, because the model's complexity is comparable to subtask B's requirement, it shows the possibility of developing a system that fulfills the requirements of both subtasks: generalization across multiple devices and low complexity.
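The max feature map is a concrete, well-known operation (an element-wise max over channel pairs); the sketch below combines it with a simple attention gate as one plausible reading of the attentive max feature map, the combination order being our assumption.

```python
import torch
import torch.nn as nn

class AttentiveMFM(nn.Module):
    def __init__(self, ch):
        super().__init__()
        # attention gate over the input feature maps (assumed 1x1-conv form)
        self.att = nn.Sequential(nn.Conv2d(ch, ch, 1), nn.Sigmoid())

    def forward(self, x):
        x = x * self.att(x)        # attention softly emphasizes features
        a, b = x.chunk(2, dim=1)   # split channels into two halves
        return torch.max(a, b)     # MFM: keep the stronger of each channel pair

spec = torch.randn(1, 64, 40, 100)   # e.g. a batch of spectrogram feature maps
print(AttentiveMFM(64)(spec).shape)  # half the channels remain: (1, 32, 40, 100)
```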

