Efficient Video Classification Using Fewer Frames

Watching a Small Portion could be as Good as Watching All: Towards Efficient Video Classification

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/98 ◽

2018 ◽

Cited By ~ 10

Author(s):

Hehe Fan ◽

Zhongwen Xu ◽

Linchao Zhu ◽

Chenggang Yan ◽

Jianjun Ge ◽

...

Keyword(s):

Large Scale ◽

Sampling Rate ◽

Computational Cost ◽

Confidence Score ◽

Video Content ◽

Video Classification ◽

Video Frames ◽

Similar Accuracy ◽

Efficient Video

We aim to significantly reduce the computational cost for classification of temporally untrimmed videos while retaining similar accuracy. Existing video classification methods sample frames with a predefined frequency over entire video. Differently, we propose an end-to-end deep reinforcement approach which enables an agent to classify videos by watching a very small portion of frames like what we do. We make two main contributions. First, information is not equally distributed in video frames along time. An agent needs to watch more carefully when a clip is informative and skip the frames if they are redundant or irrelevant. The proposed approach enables the agent to adapt sampling rate to video content and skip most of the frames without the loss of information. Second, in order to have a confident decision, the number of frames that should be watched by an agent varies greatly from one video to another. We incorporate an adaptive stop network to measure confidence score and generate timely trigger to stop the agent watching videos, which improves efficiency without loss of accuracy. Our approach reduces the computational cost significantly for the large-scale YouTube-8M dataset, while the accuracy remains the same.

Download Full-text

Diverse Temporal Aggregation and Depthwise Spatiotemporal Factorization for Efficient Video Classification

IEEE Access ◽

10.1109/access.2021.3132916 ◽

2021 ◽

pp. 1-1

Author(s):

Youngwan Lee ◽

Hyung-Il Kim ◽

Kimin Yun ◽

Jinyoung Moon

Keyword(s):

Temporal Aggregation ◽

Video Classification ◽

Efficient Video

Download Full-text

An efficient video classification system based on HMM in compressed domain

Fourth International Conference on Information, Communications and Signal Processing, 2003 and the Fourth Pacific Rim Conference on Multimedia. Proceedings of the 2003 Joint ◽

10.1109/icics.2003.1292726 ◽

2004 ◽

Cited By ~ 1

Author(s):

Yi Haoran ◽

D. Rajan ◽

Chia Liang-Tien

Keyword(s):

Classification System ◽

Compressed Domain ◽

Video Classification ◽

Efficient Video

Download Full-text

FASTER Recurrent Networks for Efficient Video Classification

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i07.7012 ◽

2020 ◽

Vol 34 (07) ◽

pp. 13098-13105 ◽

Cited By ~ 2

Author(s):

Linchao Zhu ◽

Du Tran ◽

Laura Sevilla-Lara ◽

Yi Yang ◽

Matt Feiszli ◽

...

Keyword(s):

Video Sequence ◽

State Of The Art ◽

Temporal Structure ◽

Computational Cost ◽

Recurrent Network ◽

Recurrent Networks ◽

Video Classification ◽

Spatio Temporal ◽

Temporal Redundancy ◽

Efficient Video

Typical video classification methods often divide a video into short clips, do inference on each clip independently, then aggregate the clip-level predictions to generate the video-level results. However, processing visually similar clips independently ignores the temporal structure of the video sequence, and increases the computational cost at inference time. In this paper, we propose a novel framework named FASTER, i.e., Feature Aggregation for Spatio-TEmporal Redundancy. FASTER aims to leverage the redundancy between neighboring clips and reduce the computational cost by learning to aggregate the predictions from models of different complexities. The FASTER framework can integrate high quality representations from expensive models to capture subtle motion information and lightweight representations from cheap models to cover scene changes in the video. A new recurrent network (i.e., FAST-GRU) is designed to aggregate the mixture of different representations. Compared with existing approaches, FASTER can reduce the FLOPs by over 10× while maintaining the state-of-the-art accuracy across popular datasets, such as Kinetics, UCF-101 and HMDB-51.

Download Full-text

Space-Time Separate Modeling for Efficient Video Classification

Journal of Physics Conference Series ◽

10.1088/1742-6596/2024/1/012063 ◽

2021 ◽

Vol 2024 (1) ◽

pp. 012063

Author(s):

Pei Cao ◽

Shuo Wang ◽

Jinmeng Wu ◽

Yanbin Hao

Keyword(s):

Space Time ◽

Video Classification ◽

Efficient Video

Download Full-text

Selection of the efficient video data processing strategy based on the analysis of statistical digital images characteristics

Scientific journal of the Ternopil national technical university ◽

10.33108/visnyk_tntu2018.03.107 ◽

2018 ◽

Vol 91 (3) ◽

pp. 107-114

Author(s):

Mykhailo Palamar ◽

Myroslava Yavorska ◽

Mykhailo Strembitskyi ◽

Volodymyr Strembitskyi

Keyword(s):

Data Processing ◽

Digital Images ◽

Video Data ◽

Processing Strategy ◽

Efficient Video ◽

Selection Of

Download Full-text

A Robust and Efficient Video Moving Object Detection and Tracking Algorithm

ACTA AUTOMATICA SINICA ◽

10.3724/sp.j.1004.2009.01055 ◽

2009 ◽

Vol 35 (8) ◽

pp. 1055-1062

Author(s):

Shao-Hua LIU ◽

Mao-Jun ZHANG ◽

Zhi-Hui XIONG ◽

Wang CHEN

Keyword(s):

Object Detection ◽

Moving Object Detection ◽

Moving Object ◽

Tracking Algorithm ◽

Object Detection And Tracking ◽

Detection And Tracking ◽

Efficient Video

Download Full-text

Efficient Video Compression Algorithm in Secure Image Communication

SSRN Electronic Journal ◽

10.2139/ssrn.3435450 ◽

2019 ◽

Author(s):

Bernatin T ◽

Godwin premi M.S. ◽

Narmadha R ◽

Sahaya Anselin Nisha A

Keyword(s):

Video Compression ◽

Compression Algorithm ◽

Image Communication ◽

Efficient Video

Download Full-text

A lazy feature selection method for multi-label classification

Intelligent Data Analysis ◽

10.3233/ida-194878 ◽

2021 ◽

Vol 25 (1) ◽

pp. 21-34

Author(s):

Rafael B. Pereira ◽

Alexandre Plastino ◽

Bianca Zadrozny ◽

Luiz H.C. Merschmann

Keyword(s):

Feature Selection ◽

Text Categorization ◽

Feature Selection Method ◽

Selection Method ◽

Video Classification ◽

Classification Problems ◽

Class Label ◽

New Feature ◽

Feature Selection Techniques ◽

Biomolecular Analysis

In many important application domains, such as text categorization, biomolecular analysis, scene or video classification and medical diagnosis, instances are naturally associated with more than one class label, giving rise to multi-label classification problems. This has led, in recent years, to a substantial amount of research in multi-label classification. More specifically, feature selection methods have been developed to allow the identification of relevant and informative features for multi-label classification. This work presents a new feature selection method based on the lazy feature selection paradigm and specific for the multi-label context. Experimental results show that the proposed technique is competitive when compared to multi-label feature selection techniques currently used in the literature, and is clearly more scalable, in a scenario where there is an increasing amount of data.

Download Full-text

Learning Dual-Pooling Graph Neural Networks for Few-shot Video Classification

IEEE Transactions on Multimedia ◽

10.1109/tmm.2020.3039329 ◽

2020 ◽

pp. 1-1

Author(s):

Yufan Hu ◽

Junyu Gao ◽

Changsheng Xu

Keyword(s):

Neural Networks ◽

Video Classification ◽

Graph Neural Networks

Download Full-text