Topic-based Video Analysis

Ratnabali Pal; Arif Ahmed Sekh; Debi Prosad Dogra; Samarjit Kar; Partha Pratim Roy; Dilip K. Prasad

doi:10.1145/3459089

Topic-based Video Analysis

ACM Computing Surveys ◽

10.1145/3459089 ◽

2021 ◽

Vol 54 (6) ◽

pp. 1-34

Author(s):

Ratnabali Pal ◽

Arif Ahmed Sekh ◽

Debi Prosad Dogra ◽

Samarjit Kar ◽

Partha Pratim Roy ◽

...

Keyword(s):

Computer Vision ◽

Video Analysis ◽

Visual Surveillance ◽

Video Data ◽

Camera Motion ◽

Topic Modelling ◽

Surveillance Video ◽

Analysis Computer ◽

Manual Processing ◽

Spatio Temporal

Manual processing of the large volume of video data captured through closed-circuit television is challenging for several reasons. First, manual analysis is highly time-consuming. Moreover, because surveillance videos are recorded in dynamic conditions, such as in the presence of camera motion, varying illumination, or occlusion, conventional supervised learning may not always work. Thus, computer vision-based automatic surveillance scene analysis is carried out in unsupervised ways. Topic modelling is one of the emerging fields used in unsupervised information processing. It is used in text analysis, computer vision applications, and other areas involving spatio-temporal data. In this article, we discuss the scope, variations, and applications of topic modelling, focusing particularly on surveillance video analysis. We provide a methodological survey of existing topic models, their features, underlying representations, characterization, and applications from the perspective of visual surveillance. Important research papers related to topic modelling in visual surveillance are summarized and critically analyzed in this article.

Automated detection of grade-crossing-trespassing near misses based on computer vision analysis of surveillance video data

Safety Science ◽

10.1016/j.ssci.2017.11.023 ◽

2018 ◽

Vol 110 ◽

pp. 276-285 ◽

Cited By ~ 4

Author(s):

Zhipeng Zhang ◽

Chintan Trivedi ◽

Xiang Liu

Keyword(s):

Computer Vision ◽

Automated Detection ◽

Video Data ◽

Surveillance Video ◽

Near Misses ◽

Grade Crossing

Kinematic Analysis of Lower Limb Joint Asymmetry during Gait in People with Multiple Sclerosis

Symmetry ◽

10.3390/sym13040598 ◽

2021 ◽

Vol 13 (4) ◽

pp. 598

Author(s):

Massimiliano Pau ◽

Bruno Leban ◽

Michela Deidda ◽

Federica Putzolu ◽

Micaela Porta ◽

...

Keyword(s):

Multiple Sclerosis ◽

Lower Limb ◽

Sagittal Plane ◽

Cross Sectional Study ◽

Camera Motion ◽

Cross Sectional ◽

Temporal Parameters ◽

Wide Range ◽

Edss Score ◽

Spatio Temporal

The majority of people with Multiple Sclerosis (pwMS) report lower limb motor dysfunctions, which may relevantly affect postural control, gait and a wide range of activities of daily living. While it is quite common to observe a different impact of the disease on the two limbs (i.e., one of them is more affected), the effects of such asymmetry on gait performance are less clear. The present retrospective cross-sectional study aimed to characterize the magnitude of interlimb asymmetry in pwMS, particularly as regards the joint kinematics, using parameters derived from angle-angle diagrams. To this end, we analyzed gait patterns of 101 pwMS (55 women, 46 men, mean age 46.3, average Expanded Disability Status Scale (EDSS) score 3.5, range 1–6.5) and 81 age- and sex-matched unaffected individuals who underwent 3D computerized gait analysis carried out using an eight-camera motion capture system. Spatio-temporal parameters and kinematics in the sagittal plane at hip, knee and ankle joints were considered for the analysis. The angular trends of left and right sides were processed to build synchronized angle–angle diagrams (cyclograms) for each joint, and symmetry was assessed by computing several geometrical features such as area, orientation and Trend Symmetry. Based on cyclogram orientation and Trend Symmetry, the results show that pwMS exhibit significantly greater asymmetry in all three joints with respect to unaffected individuals. In particular, orientation values were as follows: 5.1 of pwMS vs. 1.6 of unaffected individuals at hip joint, 7.0 vs. 1.5 at knee and 6.4 vs. 3.0 at ankle (p < 0.001 in all cases), while for Trend Symmetry we obtained at hip 1.7 of pwMS vs. 0.3 of unaffected individuals, 4.2 vs. 0.5 at knee and 8.5 vs. 1.5 at ankle (p < 0.001 in all cases). Moreover, the same parameters were sensitive enough to discriminate individuals of different disability levels. With few exceptions, all the calculated symmetry parameters were found to be significantly correlated with the main spatio-temporal parameters of gait and the EDSS score. In particular, large correlations were detected between Trend Symmetry and gait speed (with rho values in the range of –0.58 to –0.63 depending on the considered joint, p < 0.001) and between Trend Symmetry and EDSS score (rho = 0.62 to 0.69, p < 0.001). Such results suggest not only that MS is associated with significantly marked interlimb asymmetry during gait but also that such asymmetry worsens as the disease progresses and that it has a relevant impact on gait performance.
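The abstract above describes building left-versus-right angle–angle diagrams (cyclograms) and measuring geometric features such as area and orientation. The following is a minimal sketch of how such features could be computed, not the authors' implementation. The function names are hypothetical, the area uses the standard shoelace formula on the closed curve, and the orientation is assumed here to be the deviation of the curve's principal axis from the 45° identity line; the paper's exact definitions may differ.

```python
import math


def cyclogram_area(left, right):
    """Area enclosed by the closed curve of (left, right) angle pairs (shoelace formula)."""
    if len(left) != len(right) or len(left) < 3:
        raise ValueError("need two equally long series with at least 3 samples")
    n = len(left)
    twice_area = 0.0
    for i in range(n):
        j = (i + 1) % n  # wrap around to close the curve
        twice_area += left[i] * right[j] - left[j] * right[i]
    return abs(twice_area) / 2.0


def cyclogram_orientation_deg(left, right):
    """Absolute deviation (degrees) of the principal axis from the 45-degree line.

    Assumption: orientation is taken from the second-order moments of the
    point cloud; a perfectly symmetric gait lies on the identity line.
    """
    n = len(left)
    mean_l = sum(left) / n
    mean_r = sum(right) / n
    s_ll = sum((x - mean_l) ** 2 for x in left)
    s_rr = sum((y - mean_r) ** 2 for y in right)
    s_lr = sum((x - mean_l) * (y - mean_r) for x, y in zip(left, right))
    theta = 0.5 * math.atan2(2.0 * s_lr, s_ll - s_rr)
    return abs(math.degrees(theta) - 45.0)


# Illustrative data only: identical left and right angle series (perfect symmetry).
symmetric = [10.0, 20.0, 30.0, 20.0]
area_symmetric = cyclogram_area(symmetric, symmetric)
orientation_symmetric = cyclogram_orientation_deg(symmetric, symmetric)
```

With identical left and right series, every point falls on the identity line, so both the enclosed area and the orientation deviation vanish; larger values of either would indicate greater interlimb asymmetry under these assumed definitions.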

Natural Language Description of Videos for Smart Surveillance

Applied Sciences ◽

10.3390/app11093730 ◽

2021 ◽

Vol 11 (9) ◽

pp. 3730

Author(s):

Aniqa Dilawari ◽

Muhammad Usman Ghani Khan ◽

Yasser D. Al-Otaibi ◽

Zahoor-ur Rehman ◽

Atta-ur Rahman ◽

...

Keyword(s):

Natural Language ◽

Feature Recognition ◽

Scene Recognition ◽

Video Data ◽

Surveillance Video ◽

Video Footage ◽

Parallel Pipeline ◽

September 11 Attacks ◽

Description Framework ◽

High Level

After the September 11 attacks, security and surveillance measures changed across the globe. Surveillance cameras are now installed almost everywhere to capture video footage. Though quite handy, these cameras produce a massive volume of video. The major challenge faced by security agencies is the effort of analyzing the surveillance video data collected and generated daily. Problems related to these videos are twofold: (1) understanding the contents of video streams, and (2) conversion of the video contents to condensed formats, such as textual interpretations and summaries, to save storage space. In this paper, we propose a video description framework for a surveillance dataset. This framework is based on the multitask learning of high-level features (HLFs) using a convolutional neural network (CNN) and natural language generation (NLG) through bidirectional recurrent networks. For each specific task, a parallel pipeline is derived from the base visual geometry group (VGG)-16 model. Tasks include scene recognition, action recognition, object recognition and human face specific feature recognition. Experimental results on the TRECViD, UET Video Surveillance (UETVS) and AGRIINTRUSION datasets show that the model outperforms state-of-the-art methods by a METEOR (Metric for Evaluation of Translation with Explicit ORdering) score of 33.9%, 34.3%, and 31.2%, respectively. Our results show that our framework has distinct advantages over traditional rule-based models for the recognition and generation of natural language descriptions.

Differentiating Laparoscopic Skills of Trainees with Computer Vision Based Metrics

Proceedings of the Human Factors and Ergonomics Society Annual Meeting ◽

10.1177/1071181321651263 ◽

2021 ◽

Vol 65 (1) ◽

pp. 304-308

Author(s):

Shiyu Deng ◽

Chaitanya Kulkarni ◽

Tianzi Wang ◽

Jacob Hartman-Kenzler ◽

Laura E. Barnes ◽

...

Keyword(s):

Computer Vision ◽

Transfer Task ◽

High Sensitivity ◽

Video Data ◽

Machine Learning Algorithms ◽

Fixation Rate ◽

Skill Levels ◽

Stationary Target ◽

Practice Trials ◽

Context Dependent

Context-dependent gaze metrics, derived from eye movements explicitly associated with how a task is being performed, are particularly useful for formative assessment that includes feedback on specific behavioral adjustments for skill acquisition. In laparoscopic surgery, context-dependent gaze metrics are underinvestigated and commonly derived by either qualitatively inspecting the videos frame by frame or mapping the fixations onto a static surgical task field. This study collected eye-tracking and video data from 13 trainees practicing the peg transfer task. Machine learning algorithms in computer vision were employed to derive metrics of tool speed, fixation rate on (moving or stationary) target objects, and fixation rate on tool-object combination. Preliminary results from a clustering analysis on the measurements from 499 practice trials indicated that the metrics were able to differentiate three skill levels amongst the trainees, suggesting high sensitivity and potential of context-dependent gaze metrics for surgical assessment.

IDENTIFYING OVERLAPPED OBJECTS FOR VIDEO INDEXING AND MODELING IN MULTIMEDIA DATABASE SYSTEMS

International Journal of Artificial Intelligence Tools ◽

10.1142/s0218213001000738 ◽

2001 ◽

Vol 10 (04) ◽

pp. 715-734 ◽

Cited By ~ 37

Author(s):

SHU-CHING CHEN ◽

MEI-LING SHYU ◽

CHENGCUI ZHANG ◽

R. L. KASHYAP

Keyword(s):

Spatial Information ◽

Video Segmentation ◽

Database Systems ◽

Video Indexing ◽

Video Data ◽

Multimedia Database ◽

Current Frame ◽

Data Indexing ◽

Spatio Temporal ◽

Input Strings

The identification of the overlapped objects is a great challenge in object tracking and video data indexing. For this purpose, a backtrack-chain-updation split algorithm is proposed to assist an unsupervised video segmentation method called the "simultaneous partition and class parameter estimation" (SPCPE) algorithm to identify the overlapped objects in the video sequence. The backtrack-chain-updation split algorithm can identify the split segment (object) and use the information in the current frame to update the previous frames in a backtrack-chain manner. The split algorithm provides more accurate temporal and spatial information of the semantic objects so that the semantic objects can be indexed and modeled by multimedia input strings and the multimedia augmented transition network (MATN) model. The MATN model is based on the ATN model that has been used in artificial intelligence (AI) areas for natural language understanding systems, and its inputs are modeled by the multimedia input strings. In this paper, we will show that the SPCPE algorithm together with the backtrack-chain-updation split algorithm can significantly enhance the efficiency of spatio-temporal video indexing by improving the accuracy of multimedia database queries related to semantic objects.

Low-Rank Representation with Contextual Regularization for Moving Object Detection in Big Surveillance Video Data

2017 IEEE Third International Conference on Multimedia Big Data (BigMM) ◽

10.1109/bigmm.2017.37 ◽

2017 ◽

Cited By ~ 1

Author(s):

Bo-Hao Chen ◽

Ling-Feng Shi ◽

Xiao Ke

Keyword(s):

Object Detection ◽

Moving Object Detection ◽

Video Data ◽

Moving Object ◽

Low Rank ◽

Surveillance Video ◽

Low Rank Representation

The Use of Closed-Circuit Television and Video in Suicide Prevention: Narrative Review and Future Directions

JMIR Mental Health ◽

10.2196/27663 ◽

2021 ◽

Vol 8 (5) ◽

pp. e27663

Author(s):

Sandersan Onie ◽

Xun Li ◽

Morgan Liang ◽

Arcot Sowmya ◽

Mark Erik Larsen

Keyword(s):

Computer Vision ◽

Early Intervention ◽

Suicide Attempt ◽

Detection System ◽

Automated Detection ◽

Video Data ◽

Narrative Review ◽

Closed Circuit ◽

Closed Circuit Television ◽

Area Of Interest

Background: Suicide is a recognized public health issue, with approximately 800,000 people dying by suicide each year. Among the different technologies used in suicide research, closed-circuit television (CCTV) and video have been used for a wide array of applications, including assessing crisis behaviors at metro stations, and using computer vision to identify a suicide attempt in progress. However, there has been no review of suicide research and interventions using CCTV and video. Objective: The objective of this study was to review the literature to understand how CCTV and video data have been used in understanding and preventing suicide. Furthermore, to more fully capture progress in the field, we report on an ongoing study to respond to an identified gap in the narrative review, by using a computer vision–based system to identify behaviors prior to a suicide attempt. Methods: We conducted a search using the keywords “suicide,” “cctv,” and “video” on PubMed, Inspec, and Web of Science. We included any studies which used CCTV or video footage to understand or prevent suicide. If a study fell into our area of interest, we included it regardless of the quality as our goal was to understand the scope of how CCTV and video had been used rather than quantify any specific effect size, but we noted the shortcomings in their design and analyses when discussing the studies. Results: The review found that CCTV and video have primarily been used in 3 ways: (1) to identify risk factors for suicide (eg, inferring depression from facial expressions), (2) understanding suicide after an attempt (eg, forensic applications), and (3) as part of an intervention (eg, using computer vision and automated systems to identify if a suicide attempt is in progress). Furthermore, work in progress demonstrates how we can identify behaviors prior to an attempt at a hotspot, an important gap identified by papers in the literature. Conclusions: Thus far, CCTV and video have been used in a wide array of applications, most notably in designing automated detection systems, with the field heading toward an automated detection system for early intervention. Despite many challenges, we show promising progress in developing an automated detection system for preattempt behaviors, which may allow for early intervention.

Real Time Video Data Mining for Surveillance Video Streams

Advances in Knowledge Discovery and Data Mining - Lecture Notes in Computer Science ◽

10.1007/3-540-36175-8_22 ◽

2003 ◽

pp. 222-233 ◽

Cited By ~ 8

Author(s):

JungHwan Oh ◽

JeongKyu Lee ◽

Sanjaykumar Kote

Keyword(s):

Data Mining ◽

Real Time ◽

Video Data ◽

Surveillance Video ◽

Video Streams

A Novel Approach to Spatio-Temporal Video Analysis and Retrieval

Computer Vision/Computer Graphics Collaboration Techniques - Lecture Notes in Computer Science ◽

10.1007/978-3-642-01811-4_10 ◽

2009 ◽

pp. 106-115

Author(s):

Sameer Singh ◽

Wei Ren ◽

Maneesha Singh

Keyword(s):

Video Analysis ◽

Novel Approach ◽

Spatio Temporal

Performance Evaluation of Different Cost Functions in Motion Vector Estimation

International Journal of Service Science Management Engineering and Technology ◽

10.4018/ijssmet.2014010103 ◽

2014 ◽

Vol 5 (1) ◽

pp. 45-65 ◽

Cited By ~ 12

Author(s):

Suvojit Acharjee ◽

Sayan Chakraborty ◽

Wahiba Ben Abdessalem Karaa ◽

Ahmad Taher Azar ◽

Nilanjan Dey

Keyword(s):

Video Compression ◽

Motion Vector ◽

Video Data ◽

Cost Functions ◽

Absolute Difference ◽

Surveillance Video ◽

Mean Absolute Difference ◽

Vector Estimation ◽

The Difference ◽

Temporal Redundancy

Video is an important medium for information sharing in the present era. The tremendous growth of video use can be seen in traditional multimedia applications as well as in many other applications such as medical video and surveillance video. Raw video data is usually large, which demands video compression. In many video compression schemes, motion vector estimation is a very important step for removing temporal redundancy. A frame is first divided into small blocks, and then the motion vector for each block is computed. The difference between two blocks is evaluated by different cost functions (e.g., mean absolute difference (MAD) and mean square error (MSE)). In this paper, the performance of different cost functions was evaluated, and the most suitable cost function for motion vector estimation was identified.
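The block-based procedure described in the abstract above can be sketched as an exhaustive block-matching search. This is a minimal illustration under stated assumptions, not the paper's method: the function names, the full-search strategy, the search range, and the toy 8×8 frames are all hypothetical, and frames are plain nested lists of grayscale values.

```python
def mad(block_a, block_b):
    """Mean absolute difference between two equally sized blocks."""
    count = len(block_a) * len(block_a[0])
    total = sum(abs(x - y) for ra, rb in zip(block_a, block_b) for x, y in zip(ra, rb))
    return total / count


def mse(block_a, block_b):
    """Mean square error between two equally sized blocks."""
    count = len(block_a) * len(block_a[0])
    total = sum((x - y) ** 2 for ra, rb in zip(block_a, block_b) for x, y in zip(ra, rb))
    return total / count


def get_block(frame, top, left, size):
    """Extract a size x size block whose upper-left corner is (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]


def motion_vector(prev_frame, curr_frame, top, left, size, search, cost=mad):
    """Full search: find the displacement (dy, dx) into prev_frame that minimizes cost.

    Returns ((dy, dx), best_cost) for the block of curr_frame at (top, left).
    """
    height, width = len(prev_frame), len(prev_frame[0])
    target = get_block(curr_frame, top, left, size)
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            t, l = top + dy, left + dx
            # Skip candidate blocks that fall outside the reference frame.
            if t < 0 or l < 0 or t + size > height or l + size > width:
                continue
            c = cost(target, get_block(prev_frame, t, l, size))
            if best is None or c < best[0]:
                best = (c, dy, dx)
    return (best[1], best[2]), best[0]


def make_frame(top, left):
    """Toy 8x8 frame: zeros everywhere except a 2x2 patch at (top, left)."""
    frame = [[0] * 8 for _ in range(8)]
    patch = [[10, 20], [30, 40]]
    for i in range(2):
        for j in range(2):
            frame[top + i][left + j] = patch[i][j]
    return frame


prev_frame = make_frame(2, 2)   # patch at row 2, column 2
curr_frame = make_frame(3, 4)   # same patch moved down 1, right 2
vector_mad, cost_mad = motion_vector(prev_frame, curr_frame, 3, 4, 2, 2, cost=mad)
vector_mse, cost_mse = motion_vector(prev_frame, curr_frame, 3, 4, 2, 2, cost=mse)
```

In this toy case both cost functions recover the same displacement, pointing from the block's position in the current frame back to its position in the previous frame. On real video the two costs can rank candidate blocks differently, since MSE penalizes large pixel differences more heavily than MAD.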