Relation-Based Deep Attention Network with Hybrid Memory for One-Shot Person Re-Identification

Runxuan Si; Jing Zhao; Yuhua Tang; Shaowu Yang

doi:10.3390/s21155113

Relation-Based Deep Attention Network with Hybrid Memory for One-Shot Person Re-Identification

Sensors ◽

10.3390/s21155113 ◽

2021 ◽

Vol 21 (15) ◽

pp. 5113

Author(s):

Runxuan Si ◽

Jing Zhao ◽

Yuhua Tang ◽

Shaowu Yang

Keyword(s):

Network Architecture ◽

State Of The Art ◽

Unlabeled Data ◽

Global Information ◽

Training Process ◽

Unified Framework ◽

Attention Network ◽

Hybrid Memory ◽

Model Training ◽

The One

One-shot person Re-identification, in which each identity has a single labeled sample among numerous unlabeled ones, was proposed to tackle the shortage of labeled data. Without sufficient labeled data, it is very challenging to match the performance of the supervised task, in which sufficient labeled samples are available. In this paper, we propose a relation-based attention network with hybrid memory, which makes full use of global information to attend to identity features during model training. Importantly, our specially designed network architecture effectively reduces the interference of environmental noise. Moreover, we propose a hybrid memory to train the one-shot data and unlabeled data in a unified framework, which notably contributes to the performance of person Re-identification. In particular, our designed one-shot feature update mode effectively alleviates the problem of overfitting, which is caused by the lack of supervised information during the training process. Compared with state-of-the-art unsupervised and one-shot algorithms for person Re-identification, our method achieves considerable improvements of 6.7%, 4.6%, and 11.5% on the Market-1501, DukeMTMC-reID, and MSMT17 datasets, respectively, and becomes the new state-of-the-art method for one-shot person Re-identification.

A Dense Long Short-Term Memory Model for Enhancing the Imagery-Based Brain-Computer Interface

Computational Intelligence and Neuroscience ◽

10.1155/2021/6614677 ◽

2021 ◽

Vol 2021 ◽

pp. 1-10

Author(s):

Xiaofei Zhang ◽

Tao Wang ◽

Qi Xiong ◽

Yina Guo

Keyword(s):

Short Term Memory ◽

Recognition Algorithm ◽

Training Process ◽

Short Term ◽

Data Set ◽

Static State ◽

Term Memory ◽

Long Short Term Memory ◽

Model Training ◽

The One

Imagery-based brain-computer interfaces (BCIs) aim to decode different neural activities into control signals by identifying and classifying various natural commands from electroencephalogram (EEG) patterns, and then to control the corresponding equipment. However, several traditional BCI recognition algorithms have the “one person, one model” issue, where the convergence of the recognition model’s training process is complicated. In this study, a new BCI model with a Dense long short-term memory (Dense-LSTM) algorithm is proposed, which combines the event-related desynchronization (ERD) and the event-related synchronization (ERS) of the imagery-based BCI; model training and testing were conducted with its own data set. Furthermore, a new experimental platform was built to decode the neural activity of different subjects in a static state. Experimental evaluation of the proposed recognition algorithm shows an accuracy of 91.56%, which resolves the “one person, one model” issue along with the difficulty of convergence in the training process.

SEMI-SUPERVISED SEQUENCE CLASSIFICATION WITH HMMs

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001405004034 ◽

2005 ◽

Vol 19 (02) ◽

pp. 165-182 ◽

Cited By ~ 7

Author(s):

SHI ZHONG

Keyword(s):

Supervised Learning ◽

Learning Strategies ◽

Test Data ◽

Unlabeled Data ◽

Training Data ◽

Model Complexity ◽

Model Parameters ◽

Training Process ◽

Transductive Learning ◽

Model Training

Using unlabeled data to help supervised learning has become an increasingly attractive methodology and proven to be effective in many applications. This paper applies semi-supervised classification algorithms, based on hidden Markov models, to classify sequences. For model-based classification, semi-supervised learning amounts to using both labeled and unlabeled data to train model parameters. We examine three different strategies of using labeled and unlabeled data in the model training process. These strategies differ in how and when labeled and unlabeled data contribute to the model training process. We also compare regular semi-supervised learning, where there are separate unlabeled training data and unlabeled test data, with transductive learning where we do not differentiate between unlabeled training data and unlabeled test data. Our experimental results on synthetic and real EEG time-series show that substantially improved classification accuracy can be achieved by these semi-supervised learning strategies. The effect of model complexity on semi-supervised learning is also studied in our experiments.

DR-Net: An Improved Network for Building Extraction from High Resolution Remote Sensing Image

Remote Sensing ◽

10.3390/rs13020294 ◽

2021 ◽

Vol 13 (2) ◽

pp. 294

Author(s):

Meng Chen ◽

Jianjun Wu ◽

Leizhen Liu ◽

Wenhui Zhao ◽

Feng Tian ◽

...

Keyword(s):

Neural Network ◽

Remote Sensing ◽

State Of The Art ◽

Complex Structure ◽

Building Extraction ◽

Training Process ◽

Residual Network ◽

Data Set ◽

Deep Layers ◽

The One

At present, convolutional neural networks (CNNs) are widely used for building extraction from remote sensing imagery (RSI), but some bottlenecks remain. On the one hand, previous networks with complex structures have so many parameters that they occupy a lot of memory and consume much time during the training process. On the other hand, low-level features extracted by shallow layers and abstract features extracted by deep layers of the network cannot be fully fused, which leads to inaccurate building extraction from RSI. To alleviate these disadvantages, this paper proposes a dense residual neural network (DR-Net). DR-Net uses a DeepLabv3+ encoder/decoder backbone in combination with densely connected convolutional neural network (DCNN) and residual network (ResNet) structures. Compared with DeepLabv3+ (about 41 million parameters) and BRRNet (about 17 million parameters), DR-Net contains about 9 million parameters, a substantial reduction. Experimental results on both the WHU Building Dataset and the Massachusetts Building Dataset show that DR-Net performs better in building extraction than the other two state-of-the-art methods. On the WHU Building Dataset, Intersection over Union (IoU) increased by 2.4% and F1 score increased by 1.4%; on the Massachusetts Building Dataset, IoU increased by 3.8% and F1 score increased by 2.9%.

A Human-AI Loop Approach for Joint Keyword Discovery and Expectation Estimation in Micropost Event Detection

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i03.5626 ◽

2020 ◽

Vol 34 (03) ◽

pp. 2451-2458

Author(s):

Akansha Bhardwaj ◽

Jie Yang ◽

Philippe Cudré-Mauroux

Keyword(s):

Machine Learning ◽

Real World ◽

Event Detection ◽

State Of The Art ◽

Regularization Parameter ◽

Learning Models ◽

Training Process ◽

Model Training ◽

Real World Datasets ◽

Machine Learning Models

Microblogging platforms such as Twitter are increasingly being used in event detection. Existing approaches mainly use machine learning models and rely on event-related keywords to collect the data for model training. These approaches make strong assumptions on the distribution of the relevant microposts containing the keyword – referred to as the expectation of the distribution – and use it as a posterior regularization parameter during model training. Such approaches are, however, limited as they fail to reliably estimate the informativeness of a keyword and its expectation for model training. This paper introduces a Human-AI loop approach to jointly discover informative keywords for model training while estimating their expectation. Our approach iteratively leverages the crowd to estimate both keyword-specific expectation and the disagreement between the crowd and the model in order to discover new keywords that are most beneficial for model training. These keywords and their expectation not only improve the resulting performance but also make the model training process more transparent. We empirically demonstrate the merits of our approach, both in terms of accuracy and interpretability, on multiple real-world datasets and show that our approach improves the state of the art by 24.3%.

Neural Comb Filtering using Sliding Window Attention Network for Speech Enhancement

10.36227/techrxiv.15051972.v1 ◽

2021 ◽

Author(s):

Venkatesh Parvathala ◽

Sri Rama Murty Kodukula ◽

Siva Ganesh Andhavarapu

Keyword(s):

Speech Enhancement ◽

Network Architecture ◽

Deep Neural Network ◽

State Of The Art ◽

Sliding Window ◽

Comb Filter ◽

Attention Network ◽

Network Parameters ◽

Comparable Performance ◽

Speech Spectrum

In this paper, we demonstrate the significance of restoring harmonics of the fundamental frequency (pitch) in deep neural network (DNN) based speech enhancement. We propose a sliding-window attention network to regress the spectral magnitude mask (SMM) from the noisy speech signal. Even though the network parameters can be estimated by minimizing the mask loss, doing so does not restore the pitch harmonics, especially at higher frequencies. We therefore propose to restore the pitch harmonics in the spectral domain by minimizing the cepstral loss around the pitch peak. The network parameters are estimated using a combination of the mask loss and the cepstral loss. The proposed network architecture functions like an adaptive comb filter on voiced segments and emphasizes the pitch harmonics in the speech spectrum. The proposed approach achieves performance comparable to state-of-the-art methods with much lower computational complexity.

Age Estimation Using Expectation of Label Distribution Learning

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/99 ◽

2018 ◽

Cited By ~ 18

Author(s):

Bin-Bin Gao ◽

Hong-Yu Zhou ◽

Jianxin Wu ◽

Xin Geng

Keyword(s):

Age Estimation ◽

Network Architecture ◽

State Of The Art ◽

Age Distribution ◽

Model Parameters ◽

Unified Framework ◽

Disk Storage ◽

Storage Overhead ◽

Evaluation Metric ◽

Label Distribution

Age estimation performance has been greatly improved by using convolutional neural networks. However, existing methods have an inconsistency between the training objectives and the evaluation metric, so they may be suboptimal. In addition, these methods always adopt image classification or face recognition models with a large number of parameters, which incur expensive computation cost and storage overhead. To alleviate these issues, we design a lightweight network architecture and propose a unified framework that can jointly learn the age distribution and regress age. The effectiveness of our approach has been demonstrated on apparent and real age estimation tasks. Our method achieves new state-of-the-art results using a single model with 36× fewer parameters and a 2.6× reduction in inference time. Moreover, our method can achieve results comparable to the state of the art even when the model parameters are further reduced to 0.9M (3.8 MB disk storage). Our analysis also indicates that ranking methods implicitly learn label distributions.

Neural Comb Filtering using Sliding Window Attention Network for Speech Enhancement

10.36227/techrxiv.15051972 ◽

2021 ◽

Author(s):

Venkatesh Parvathala ◽

Sri Rama Murty Kodukula ◽

Siva Ganesh Andhavarapu

Keyword(s):

Speech Enhancement ◽

Network Architecture ◽

Deep Neural Network ◽

State Of The Art ◽

Sliding Window ◽

Comb Filter ◽

Attention Network ◽

Network Parameters ◽

Comparable Performance ◽

Speech Spectrum

HDRANet: Hybrid Dilated Residual Attention Network for SAR Image Despeckling

Remote Sensing ◽

10.3390/rs11242921 ◽

2019 ◽

Vol 11 (24) ◽

pp. 2921 ◽

Cited By ~ 1

Author(s):

Jingyu Li ◽

Ying Li ◽

Yayuan Xiao ◽

Yunpeng Bai

Keyword(s):

Network Architecture ◽

State Of The Art ◽

Speckle Noise ◽

Superior Performance ◽

Synthetic Aperture ◽

Sar Images ◽

Attention Network ◽

Quantitative Metrics ◽

Proposed Model ◽

And Performance

To remove speckle noise from original synthetic aperture radar (SAR) images effectively and efficiently, this paper proposes a hybrid dilated residual attention network (HDRANet) with residual learning for SAR despeckling. First, HDRANet employs hybrid dilated convolution (HDC) in a lightweight network architecture to enlarge the receptive field and aggregate global information. Then, a simple yet effective attention module, the convolutional block attention module (CBAM), is integrated into the proposed model to constitute a residual HDC attention block through a skip connection, which further enhances the representation power and performance of the model. Extensive experimental results on synthetic and real SAR images demonstrate the superior performance of HDRANet over state-of-the-art methods in terms of quantitative metrics and visual quality.

DASOT: A Unified Framework Integrating Data Association and Single Object Tracking for Online Multi-Object Tracking

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i07.6694 ◽

2020 ◽

Vol 34 (07) ◽

pp. 10672-10679

Author(s):

Qi Chu ◽

Wanli Ouyang ◽

Bin Liu ◽

Feng Zhu ◽

Nenghai Yu

Keyword(s):

Object Tracking ◽

Network Architecture ◽

State Of The Art ◽

Data Association ◽

Computational Cost ◽

Single Object ◽

Unified Framework ◽

Convolutional Network ◽

Comparable Performance ◽

Feature Sharing

In this paper, we propose an online multi-object tracking (MOT) approach that integrates data association and single object tracking (SOT) with a unified convolutional network (ConvNet), named DASOTNet. The intuition behind integrating data association and SOT is that they can complement each other. Following the Siamese network architecture, DASOTNet consists of the shared feature ConvNet, the data association branch, and the SOT branch. Data association is treated as a special re-identification task and solved by learning discriminative features for different targets in the data association branch. To handle the problem that the computational cost of SOT grows intolerably as the number of tracked objects increases, we propose an efficient two-stage tracking method in the SOT branch, which utilizes the merits of correlation features and can simultaneously track all the existing targets within one forward propagation. With feature sharing and the interaction between them, the data association branch and the SOT branch learn to better complement each other. Using a multi-task objective, the whole network can be trained end-to-end. Compared with state-of-the-art online MOT methods, our method is much faster while maintaining comparable performance.

Do You Want to See the Tree? Ignore the Forest

Experimental Psychology (formerly Zeitschrift für Experimentelle Psychologie) ◽

10.1027/1618-3169/a000240 ◽

2014 ◽

Vol 61 (3) ◽

pp. 205-214 ◽

Cited By ~ 12

Author(s):

Nicolas Poirel ◽

Claire Sara Krakowski ◽

Sabrina Sayah ◽

Arlette Pineau ◽

Olivier Houdé ◽

...

Keyword(s):

Visual Recognition ◽

Local Level ◽

Local Information ◽

Visual Environment ◽

Global Information ◽

Focus Attention ◽

Global Level ◽

Hierarchical Stimuli ◽

Priming Paradigm ◽

The One

The visual environment consists of global structures (e.g., a forest) made up of local parts (e.g., trees). When compound stimuli are presented (e.g., large global letters composed of arrangements of small local letters), the global unattended information slows responses to local targets. Using a negative priming paradigm, we investigated whether inhibition is required to process hierarchical stimuli when information at the local level is in conflict with the one at the global level. The results show that when local and global information is in conflict, global information must be inhibited to process local information, but that the reverse is not true. This finding has potential direct implications for brain models of visual recognition, by suggesting that when local information is conflicting with global information, inhibitory control reduces feedback activity from global information (e.g., inhibits the forest) which allows the visual system to process local information (e.g., to focus attention on a particular tree).
