Using High-Level Semantic Features in Video Retrieval

The Use and Utility of High-Level Semantic Features in Video Retrieval

Lecture Notes in Computer Science - Image and Video Retrieval ◽

10.1007/11526346_17 ◽

2005 ◽

pp. 134-144 ◽

Cited By ~ 15

Author(s):

Michael G. Christel ◽

Alexander G. Hauptmann

Keyword(s):

Video Retrieval ◽

Semantic Features ◽

High Level

An Efficient Module for Instance Segmentation Based on Multi-Level Features and Attention Mechanisms

Applied Sciences ◽

10.3390/app11030968 ◽

2021 ◽

Vol 11 (3) ◽

pp. 968

Author(s):

Yingchun Sun ◽

Wang Gao ◽

Shuguo Pan ◽

Tao Zhao ◽

Yahui Peng

Keyword(s):

Feature Extraction ◽

Spatial Structure ◽

Semantic Feature ◽

Semantic Features ◽

Segmentation Method ◽

Spatial Dimensions ◽

Feature Pyramid ◽

Multi Level ◽

High Level ◽

Instance Segmentation

Recently, multi-level feature networks have been extensively used in instance segmentation. However, because not all features are beneficial to instance segmentation tasks, the performance of networks cannot be adequately improved by synthesizing multi-level convolutional features indiscriminately. To solve this problem, an attention-based feature pyramid module (AFPM) is proposed, which integrates the attention mechanism into a multi-level feature pyramid network to efficiently and pertinently extract the high-level semantic features and low-level spatial structure features for instance segmentation. First, we adopt a convolutional block attention module (CBAM) in feature extraction and sequentially generate attention maps that focus on instance-related features along the channel and spatial dimensions. Second, we build inter-dimensional dependencies through a convolutional triplet attention module (CTAM) in lateral attention connections, which is used to propagate helpful semantic feature maps and filter out redundant features irrelevant to instance objects. Finally, we construct branches for feature enhancement to strengthen detailed information and boost the entire feature hierarchy of the network. The experimental results on the Cityscapes dataset show that the proposed module outperforms other competitive methods under different evaluation metrics and effectively improves the performance of the instance segmentation method.
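
The channel-then-spatial attention described in this abstract can be illustrated with a minimal sketch. This is not the authors' AFPM or a faithful CBAM implementation: the function names, the tiny weight matrices `w1` and `w2`, and the replacement of CBAM's spatial convolution with a simple weighted sum (`w_avg`, `w_max`) are all assumptions made to keep the example dependency-free.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def channel_attention(fmap, w1, w2):
    """One weight in (0, 1) per channel.

    fmap: list of C channels, each an H x W nested list.
    w1 (hidden x C) and w2 (C x hidden) form a shared two-layer MLP (hypothetical weights).
    """
    def mlp(vec):
        hidden = [max(0.0, sum(w * v for w, v in zip(row, vec))) for row in w1]  # ReLU
        return [sum(w * h for w, h in zip(row, hidden)) for row in w2]

    avg = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in fmap]  # global average pool
    mx = [max(map(max, ch)) for ch in fmap]                            # global max pool
    return [sigmoid(a + m) for a, m in zip(mlp(avg), mlp(mx))]

def spatial_attention(fmap, w_avg=1.0, w_max=1.0):
    """One weight in (0, 1) per pixel from channel-wise average and max.

    Assumption: a weighted sum stands in for the convolution used in CBAM.
    """
    channels, height, width = len(fmap), len(fmap[0]), len(fmap[0][0])
    att = []
    for i in range(height):
        row = []
        for j in range(width):
            vals = [fmap[c][i][j] for c in range(channels)]
            row.append(sigmoid(w_avg * sum(vals) / channels + w_max * max(vals)))
        att.append(row)
    return att

def cbam_like(fmap, w1, w2):
    """Apply channel attention first, then spatial attention, as in the sequential scheme."""
    ca = channel_attention(fmap, w1, w2)
    scaled = [[[ca[c] * v for v in row] for row in ch] for c, ch in enumerate(fmap)]
    sa = spatial_attention(scaled)
    return [[[sa[i][j] * v for j, v in enumerate(row)] for i, row in enumerate(ch)]
            for ch in scaled]

# Toy input: 2 channels of size 2 x 2, with made-up MLP weights.
features = [[[1.0, 2.0], [3.0, 4.0]], [[0.5, 0.5], [0.5, 0.5]]]
refined = cbam_like(features, w1=[[0.5, 0.5]], w2=[[1.0], [-1.0]])
```

Because both attention maps lie strictly between 0 and 1, the refined map keeps the input's shape and only rescales its values; in a trained network the weights would be learned rather than fixed by hand.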

Bimodal fusion of low-level visual features and high-level semantic features for near-duplicate video clip detection

Signal Processing Image Communication ◽

10.1016/j.image.2011.04.001 ◽

2011 ◽

Vol 26 (10) ◽

pp. 612-627 ◽

Cited By ~ 2

Author(s):

Hyun-seok Min ◽

Jae Young Choi ◽

Wesley De Neve ◽

Yong Man Ro

Keyword(s):

Video Clip ◽

Visual Features ◽

Semantic Features ◽

Low Level ◽

High Level ◽

Duplicate Video

A feature fusion deep-projection convolution neural network for vehicle detection in aerial images

PLoS ONE ◽

10.1371/journal.pone.0250782 ◽

2021 ◽

Vol 16 (5) ◽

pp. e0250782

Author(s):

Bin Wang ◽

Bin Xu

Keyword(s):

Neural Network ◽

Feature Fusion ◽

Rapid Development ◽

Vehicle Detection ◽

Convolution Neural Network ◽

Aerial Images ◽

Semantic Features ◽

General Object ◽

High Level ◽

The Impact

With the rapid development of Unmanned Aerial Vehicles, vehicle detection in aerial images plays an important role in different applications. Compared with general object detection problems, vehicle detection in aerial images is still a challenging research topic since it is plagued by various unique factors, e.g., different camera angles, small vehicle sizes, and complex backgrounds. In this paper, a Feature Fusion Deep-Projection Convolution Neural Network is proposed to enhance the ability to detect small vehicles in aerial images. The backbone of the proposed framework utilizes a novel residual block, named the stepwise res-block, to explore high-level semantic features while conserving low-level detail features. A specially designed feature fusion module is adopted in the proposed framework to further balance the features obtained from different levels of the backbone. A deep-projection deconvolution module is used to minimize the impact of the information contamination introduced by down-sampling/up-sampling processes. The proposed framework has been evaluated on the UCAS-AOD, VEDAI, and DOTA datasets. According to the evaluation results, the proposed framework outperforms other state-of-the-art vehicle detection algorithms for aerial images.

Interpretable Aspect-Aware Capsule Network for Peer Review Based Citation Count Prediction

ACM Transactions on Information Systems ◽

10.1145/3466640 ◽

2022 ◽

Vol 40 (1) ◽

pp. 1-29

Author(s):

Siqing Li ◽

Yaliang Li ◽

Wayne Xin Zhao ◽

Bolin Ding ◽

Ji-Rong Wen

Keyword(s):

Peer Review ◽

Prediction Models ◽

Citation Count ◽

Specific Aspect ◽

Semantic Features ◽

Predictive Capacity ◽

Topic Distribution ◽

Real World Datasets ◽

Data Signal ◽

High Level

Citation count prediction is an important task for estimating the future impact of research papers. Most existing works utilize information extracted from the paper itself. In this article, we focus on how to utilize another kind of useful data signal (i.e., peer review text) to improve both the performance and interpretability of the prediction models. Specifically, we propose a novel aspect-aware capsule network for citation count prediction based on review text. It contains two major capsule layers, namely the feature capsule layer and the aspect capsule layer, each with its own routing approach. Feature capsules encode the local semantics from review sentences as the input of the aspect capsule layer, whereas aspect capsules aim to capture high-level semantic features that serve as the final representations for prediction. Besides the predictive capacity, we also enhance the model interpretability with two strategies. First, we use the topic distribution of the review text to guide the learning of aspect capsules so that each aspect capsule can represent a specific aspect in the review. Then, we use the learned aspect capsules to generate readable text for explaining the predicted citation count. Extensive experiments on two real-world datasets have demonstrated the effectiveness of the proposed model in both performance and interpretability.

Temporal-Based Video Event Detection and Retrieval

International Journal of Organizational and Collective Intelligence ◽

10.4018/ijoci.2012100103 ◽

2012 ◽

Vol 3 (4) ◽

pp. 39-51

Author(s):

Min Chen

Keyword(s):

Event Detection ◽

Video Retrieval ◽

Class Imbalance ◽

Video Data ◽

Temporal Information ◽

Temporal Association ◽

Association Mining ◽

Video Content ◽

Video Content Analysis ◽

High Level

The fast proliferation of video data archives has increased the need for automatic video content analysis and semantic video retrieval. Since temporal information is critical in conveying video content, in this chapter, an effective temporal-based event detection framework is proposed to support high-level video indexing and retrieval. The core is a temporal association mining process that systematically captures characteristic temporal patterns to help identify and define interesting events. This framework effectively tackles the challenges caused by loose video structure and class imbalance. One of the unique characteristics of this framework is that it offers strong generality and extensibility, with the capability of exploring representative event patterns with little human intervention. The temporal information and event detection results can then be input into our proposed distributed video retrieval system to support high-level semantic querying, selective video browsing, and event-based video retrieval.
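
To make the idea of mining temporal patterns that precede an event more concrete, here is a minimal sketch. It is not the framework from the chapter: the function name `mine_temporal_patterns`, the support and confidence thresholds, and the shot labels are all hypothetical, and the sketch does nothing about class imbalance.

```python
from collections import Counter

def mine_temporal_patterns(shots, event, length=2, min_support=2, min_confidence=0.5):
    """Find label sequences of `length` consecutive shots that tend to precede `event`.

    shots: list of per-shot labels in temporal order (hypothetical annotation).
    Returns {pattern: (support, confidence)} for patterns passing both thresholds.
    """
    seen = Counter()      # occurrences of each pattern that have a following shot
    followed = Counter()  # occurrences immediately followed by the target event
    for i in range(len(shots) - length):
        pattern = tuple(shots[i:i + length])
        seen[pattern] += 1
        if shots[i + length] == event:
            followed[pattern] += 1

    rules = {}
    for pattern, hits in followed.items():
        confidence = hits / seen[pattern]
        if hits >= min_support and confidence >= min_confidence:
            rules[pattern] = (hits, confidence)
    return rules

# Made-up shot labels for illustration only.
shots = ["midfield", "corner", "closeup", "goal",
         "midfield", "corner", "closeup", "goal",
         "midfield", "corner", "midfield", "closeup"]
rules = mine_temporal_patterns(shots, "goal")
print(rules)  # {('corner', 'closeup'): (2, 1.0)}
```

Here the pair ("corner", "closeup") is followed by "goal" both times it occurs, so it is kept as a candidate temporal rule, while every other pair is never followed by the event.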

Temporal-Based Video Event Detection and Retrieval

Machine Learning Techniques for Adaptive Multimedia Retrieval ◽

10.4018/978-1-61692-859-9.ch010 ◽

2011 ◽

pp. 214-227

Author(s):

Min Chen

Keyword(s):

Event Detection ◽

Video Retrieval ◽

Class Imbalance ◽

Video Data ◽

Temporal Information ◽

Temporal Association ◽

Association Mining ◽

Video Content ◽

Data Archives ◽

High Level

The fast proliferation of video data archives has increased the need for automatic video content analysis and semantic video retrieval. Since temporal information is critical in conveying video content, in this chapter, an effective temporal-based event detection framework is proposed to support high-level video indexing and retrieval. The core is a temporal association mining process that systematically captures characteristic temporal patterns to help identify and define interesting events. This framework effectively tackles the challenges caused by loose video structure and class imbalance. One of the unique characteristics of this framework is that it offers strong generality and extensibility, with the capability of exploring representative event patterns with little human intervention. The temporal information and event detection results can then be input into our proposed distributed video retrieval system to support high-level semantic querying, selective video browsing, and event-based video retrieval.

Object Detection Based on Region Decomposition and Assembly

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33018094 ◽

2019 ◽

Vol 33 ◽

pp. 8094-8101 ◽

Cited By ~ 4

Author(s):

Seung-Hwan Bae

Keyword(s):

Neural Networks ◽

Object Detection ◽

Performance Improvement ◽

Semantic Relations ◽

Detection Accuracy ◽

Semantic Features ◽

Multi Scale ◽

Object Proposals ◽

Object Region ◽

High Level

Region-based object detection infers object regions for one or more categories in an image. Due to recent advances in deep learning and region proposal methods, object detectors based on convolutional neural networks (CNNs) have been flourishing and have provided promising detection results. However, the detection accuracy is often degraded because of the low discriminability of object CNN features caused by occlusions and inaccurate region proposals. In this paper, we therefore propose a region decomposition and assembly detector (R-DAD) for more accurate object detection. In the proposed R-DAD, we first decompose an object region into multiple small regions. To jointly capture the entire appearance and part details of the object, we extract CNN features within the whole object region and the decomposed regions. We then learn the semantic relations between the object and its parts by combining the multi-region features stage by stage with region assembly blocks, and use the combined high-level semantic features for object classification and localization. In addition, for more accurate region proposals, we propose a multi-scale proposal layer that can generate object proposals of various scales. We integrate the R-DAD into several feature extractors, and demonstrate a distinct performance improvement on PASCAL07/12 and MSCOCO18 compared to recent convolutional detectors.
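
The decomposition step can be pictured with a small sketch. The abstract only says that an object region is split into multiple small regions; the particular split into left, right, upper, and lower halves, the `(x1, y1, x2, y2)` box format, and the function name `decompose_region` are assumptions made for illustration.

```python
def decompose_region(box):
    """Split a box (x1, y1, x2, y2) into four overlapping half regions.

    Uses image coordinates, where y grows downward, so the "upper" half
    spans y1 to the vertical midpoint.
    """
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    return {
        "left": (x1, y1, cx, y2),
        "right": (cx, y1, x2, y2),
        "upper": (x1, y1, x2, cy),
        "lower": (x1, cy, x2, y2),
    }

parts = decompose_region((0, 0, 10, 20))
print(parts["left"])   # (0, 0, 5.0, 20)
print(parts["lower"])  # (0, 10.0, 10, 20)
```

In a detector, features would then be pooled from the whole box and from each part before being combined; that assembly stage is not sketched here.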

MODELING SEMANTIC CONCEPTS AND USER PREFERENCES IN CONTENT-BASED VIDEO RETRIEVAL

International Journal of Semantic Computing ◽

10.1142/s1793351x07000159 ◽

2007 ◽

Vol 01 (03) ◽

pp. 377-402 ◽

Cited By ~ 7

Author(s):

SHU-CHING CHEN ◽

NA ZHAO ◽

MEI-LING SHYU

Keyword(s):

Video Retrieval ◽

User Preferences ◽

Retrieval Performance ◽

Video Database ◽

Database Modeling ◽

User Perceptions ◽

Audio Features ◽

Semantic Concepts ◽

The Individual ◽

High Level

In this paper, a user-centered framework is proposed for video database modeling and retrieval to provide appealing multimedia experiences for content-based video queries. By incorporating the Hierarchical Markov Model Mediator (HMMM) mechanism, the source videos, segmented video shots, visual/audio features, semantic events, and high-level user perceptions are seamlessly integrated in a video database. With the hierarchical and stochastic design for video databases and semantic concept modeling, the proposed framework supports the retrieval of not only single events but also temporal sequences with multiple events. Additionally, an innovative method is proposed to capture the individual user's preferences by considering both the low-level features and the semantic concepts. The retrieval and ranking of video events and the temporal patterns can be updated dynamically online to satisfy an individual user's interests and information requirements. Moreover, users' feedback is efficiently accumulated for the offline system training process so that the overall retrieval performance can be enhanced periodically and continuously. For the evaluation of the proposed approach, a soccer video retrieval system is developed, presented, and tested to demonstrate the overall retrieval performance improvement achieved by modeling and capturing the user preferences.

An image caption model incorporating high-level semantic features

Eleventh International Conference on Digital Image Processing (ICDIP 2019) ◽

10.1117/12.2540579 ◽

2019 ◽

Author(s):

Zhiwang Luo ◽

Jiwei Hu ◽

Quan Liu ◽

Jiamei Deng

Keyword(s):

Semantic Features ◽

High Level ◽

Image Caption
