Topic-based Video Analysis

2021 ◽  
Vol 54 (6) ◽  
pp. 1-34
Author(s):  
Ratnabali Pal ◽  
Arif Ahmed Sekh ◽  
Debi Prosad Dogra ◽  
Samarjit Kar ◽  
Partha Pratim Roy ◽  
...  

Manually processing the large volume of video data captured through closed-circuit television is challenging for several reasons. First, manual analysis is highly time-consuming. Moreover, because surveillance videos are recorded in dynamic conditions, such as in the presence of camera motion, varying illumination, or occlusion, conventional supervised learning may not always work. Thus, computer vision-based automatic surveillance scene analysis is carried out in unsupervised ways. Topic modelling is one of the emerging fields used in unsupervised information processing. It is used in text analysis, computer vision applications, and other areas involving spatio-temporal data. In this article, we discuss the scope, variations, and applications of topic modelling, focusing particularly on surveillance video analysis. We provide a methodological survey of existing topic models, their features, underlying representations, characterization, and applications from the perspective of visual surveillance. Important research papers related to topic modelling in visual surveillance are summarized and critically analyzed.

Symmetry ◽  
2021 ◽  
Vol 13 (4) ◽  
pp. 598
Author(s):  
Massimiliano Pau ◽  
Bruno Leban ◽  
Michela Deidda ◽  
Federica Putzolu ◽  
Micaela Porta ◽  
...  

The majority of people with Multiple Sclerosis (pwMS) report lower limb motor dysfunctions, which may substantially affect postural control, gait and a wide range of activities of daily living. While it is quite common to observe a different impact of the disease on the two limbs (i.e., one of them is more affected), the effects of such asymmetry on gait performance are less clear. The present retrospective cross-sectional study aimed to characterize the magnitude of interlimb asymmetry in pwMS, particularly as regards the joint kinematics, using parameters derived from angle–angle diagrams. To this end, we analyzed gait patterns of 101 pwMS (55 women, 46 men, mean age 46.3, average Expanded Disability Status Scale (EDSS) score 3.5, range 1–6.5) and 81 age- and sex-matched unaffected individuals who underwent 3D computerized gait analysis carried out using an eight-camera motion capture system. Spatio-temporal parameters and kinematics in the sagittal plane at hip, knee and ankle joints were considered for the analysis. The angular trends of left and right sides were processed to build synchronized angle–angle diagrams (cyclograms) for each joint, and symmetry was assessed by computing several geometrical features such as area, orientation and Trend Symmetry. Based on cyclogram orientation and Trend Symmetry, the results show that pwMS exhibit significantly greater asymmetry in all three joints with respect to unaffected individuals. In particular, orientation values were as follows: 5.1 for pwMS vs. 1.6 for unaffected individuals at the hip joint, 7.0 vs. 1.5 at the knee and 6.4 vs. 3.0 at the ankle (p < 0.001 in all cases), while for Trend Symmetry we obtained 1.7 for pwMS vs. 0.3 for unaffected individuals at the hip, 4.2 vs. 0.5 at the knee and 8.5 vs. 1.5 at the ankle (p < 0.001 in all cases). Moreover, the same parameters were sensitive enough to discriminate individuals of different disability levels.
With few exceptions, all the calculated symmetry parameters were found to be significantly correlated with the main spatio-temporal parameters of gait and the EDSS score. In particular, large correlations were detected between Trend Symmetry and gait speed (with rho values in the range of –0.58 to –0.63 depending on the considered joint, p < 0.001) and between Trend Symmetry and EDSS score (rho = 0.62 to 0.69, p < 0.001). Such results suggest not only that MS is associated with markedly greater interlimb asymmetry during gait but also that such asymmetry worsens as the disease progresses and that it has a relevant impact on gait performance.
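
The abstract above does not give the exact formulas for the cyclogram features, so the following is only a minimal sketch of two of them under stated assumptions: area is taken as the area enclosed by the closed left-vs-right angle curve (shoelace formula), and orientation as the absolute angle between the principal axis of the point cloud and the 45-degree identity line. The function names and these definitions are illustrative assumptions, not the authors' implementation, and Trend Symmetry is deliberately not reproduced here.

```python
import math


def cyclogram_area(left, right):
    """Area enclosed by the closed left-vs-right angle curve (shoelace formula).

    `left` and `right` are equally long sequences of joint angles sampled
    at the same instants of the gait cycle (assumed, hypothetical inputs).
    """
    n = len(left)
    twice_area = 0.0
    for i in range(n):
        j = (i + 1) % n  # wrap around so the curve is closed
        twice_area += left[i] * right[j] - left[j] * right[i]
    return abs(twice_area) / 2.0


def cyclogram_orientation_deg(left, right):
    """Absolute angle (degrees) between the principal axis and the identity line."""
    n = len(left)
    mean_l = sum(left) / n
    mean_r = sum(right) / n
    # Second-order central moments of the point cloud.
    s_ll = sum((x - mean_l) ** 2 for x in left)
    s_rr = sum((y - mean_r) ** 2 for y in right)
    s_lr = sum((x - mean_l) * (y - mean_r) for x, y in zip(left, right))
    # Direction of the axis of greatest variance.
    principal = 0.5 * math.atan2(2.0 * s_lr, s_ll - s_rr)
    return abs(math.degrees(principal) - 45.0)


if __name__ == "__main__":
    # Made-up angle traces for illustration only; not data from the study.
    left_knee = [0, 10, 20, 30, 20, 10]
    right_knee = [0, 8, 15, 22, 18, 6]
    print(cyclogram_area(left_knee, right_knee))
    print(cyclogram_orientation_deg(left_knee, right_knee))
```

Under these assumed definitions, identical left and right traces lie entirely on the identity line, so both the area and the orientation deviation are zero; larger values indicate greater interlimb asymmetry.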


2021 ◽  
Vol 11 (9) ◽  
pp. 3730
Author(s):  
Aniqa Dilawari ◽  
Muhammad Usman Ghani Khan ◽  
Yasser D. Al-Otaibi ◽  
Zahoor-ur Rehman ◽  
Atta-ur Rahman ◽  
...  

Since the September 11 attacks, security and surveillance measures have changed across the globe. Surveillance cameras are now installed almost everywhere to record video footage. Though quite handy, these cameras produce videos of massive size and volume. The major challenge faced by security agencies is the effort of analyzing the surveillance video data collected and generated daily. Problems related to these videos are twofold: (1) understanding the contents of video streams, and (2) conversion of the video contents to condensed formats, such as textual interpretations and summaries, to save storage space. In this paper, we propose a video description framework on a surveillance dataset. This framework is based on the multitask learning of high-level features (HLFs) using a convolutional neural network (CNN) and natural language generation (NLG) through bidirectional recurrent networks. For each specific task, a parallel pipeline is derived from the base visual geometry group (VGG)-16 model. Tasks include scene recognition, action recognition, object recognition and human face specific feature recognition. Experimental results on the TRECViD, UET Video Surveillance (UETVS) and AGRIINTRUSION datasets show that the model outperforms state-of-the-art methods by a METEOR (Metric for Evaluation of Translation with Explicit ORdering) score of 33.9%, 34.3%, and 31.2%, respectively. Our results show that our framework has distinct advantages over traditional rule-based models for the recognition and generation of natural language descriptions.


Author(s):  
Shiyu Deng ◽  
Chaitanya Kulkarni ◽  
Tianzi Wang ◽  
Jacob Hartman-Kenzler ◽  
Laura E. Barnes ◽  
...  

Context-dependent gaze metrics, derived from eye movements explicitly associated with how a task is being performed, are particularly useful for formative assessment that includes feedback on specific behavioral adjustments for skill acquisition. In laparoscopic surgery, context-dependent gaze metrics are under-investigated and commonly derived by either qualitatively inspecting the videos frame by frame or mapping the fixations onto a static surgical task field. This study collected eye-tracking and video data from 13 trainees practicing the peg transfer task. Machine learning algorithms in computer vision were employed to derive metrics of tool speed, fixation rate on (moving or stationary) target objects, and fixation rate on tool-object combinations. Preliminary results from a clustering analysis on the measurements from 499 practice trials indicated that the metrics were able to differentiate three skill levels amongst the trainees, suggesting high sensitivity and potential of context-dependent gaze metrics for surgical assessment.


2001 ◽  
Vol 10 (04) ◽  
pp. 715-734 ◽  
Author(s):  
SHU-CHING CHEN ◽  
MEI-LING SHYU ◽  
CHENGCUI ZHANG ◽  
R. L. KASHYAP

Identifying overlapped objects is a great challenge in object tracking and video data indexing. For this purpose, a backtrack-chain-updation split algorithm is proposed to assist an unsupervised video segmentation method called the "simultaneous partition and class parameter estimation" (SPCPE) algorithm in identifying the overlapped objects in a video sequence. The backtrack-chain-updation split algorithm can identify the split segment (object) and use the information in the current frame to update the previous frames in a backtrack-chain manner. The split algorithm provides more accurate temporal and spatial information about the semantic objects so that they can be indexed and modeled by multimedia input strings and the multimedia augmented transition network (MATN) model. The MATN model is based on the ATN model that has been used in artificial intelligence (AI) for natural language understanding systems, and its inputs are modeled by the multimedia input strings. In this paper, we show that the SPCPE algorithm together with the backtrack-chain-updation split algorithm can significantly enhance the efficiency of spatio-temporal video indexing by improving the accuracy of multimedia database queries related to semantic objects.


10.2196/27663 ◽  
2021 ◽  
Vol 8 (5) ◽  
pp. e27663
Author(s):  
Sandersan Onie ◽  
Xun Li ◽  
Morgan Liang ◽  
Arcot Sowmya ◽  
Mark Erik Larsen

Background: Suicide is a recognized public health issue, with approximately 800,000 people dying by suicide each year. Among the different technologies used in suicide research, closed-circuit television (CCTV) and video have been used for a wide array of applications, including assessing crisis behaviors at metro stations, and using computer vision to identify a suicide attempt in progress. However, there has been no review of suicide research and interventions using CCTV and video.
Objective: The objective of this study was to review the literature to understand how CCTV and video data have been used in understanding and preventing suicide. Furthermore, to more fully capture progress in the field, we report on an ongoing study to respond to an identified gap in the narrative review, by using a computer vision–based system to identify behaviors prior to a suicide attempt.
Methods: We conducted a search using the keywords “suicide,” “cctv,” and “video” on PubMed, Inspec, and Web of Science. We included any studies which used CCTV or video footage to understand or prevent suicide. If a study fell into our area of interest, we included it regardless of the quality as our goal was to understand the scope of how CCTV and video had been used rather than quantify any specific effect size, but we noted the shortcomings in their design and analyses when discussing the studies.
Results: The review found that CCTV and video have primarily been used in 3 ways: (1) to identify risk factors for suicide (eg, inferring depression from facial expressions), (2) to understand suicide after an attempt (eg, forensic applications), and (3) as part of an intervention (eg, using computer vision and automated systems to identify if a suicide attempt is in progress). Furthermore, work in progress demonstrates how we can identify behaviors prior to an attempt at a hotspot, an important gap identified by papers in the literature.
Conclusions: Thus far, CCTV and video have been used in a wide array of applications, most notably in designing automated detection systems, with the field heading toward an automated detection system for early intervention. Despite many challenges, we show promising progress in developing an automated detection system for preattempt behaviors, which may allow for early intervention.


Author(s):  
Suvojit Acharjee ◽  
Sayan Chakraborty ◽  
Wahiba Ben Abdessalem Karaa ◽  
Ahmad Taher Azar ◽  
Nilanjan Dey

Video is an important medium for information sharing in the present era. The tremendous growth of video use can be seen in traditional multimedia applications as well as in many other applications such as medical video and surveillance video. Raw video data is usually large, which creates a demand for video compression. In many video compression schemes, motion vector estimation is a very important step for removing temporal redundancy. A frame is first divided into small blocks, and then a motion vector is computed for each block. The difference between two blocks is evaluated by different cost functions (e.g., mean absolute difference (MAD), mean square error (MSE), etc.). In this paper, the performance of different cost functions is evaluated, and the most suitable cost function for motion vector estimation is identified.
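
The block-matching idea described above can be sketched as follows. This is a minimal illustration, assuming grayscale frames stored as lists of rows and an exhaustive search over a small window; the function names, the search strategy, and the toy frames are assumptions made for illustration, not the procedure evaluated in the paper.

```python
def mad(block_a, block_b):
    """Mean absolute difference between two equally sized blocks."""
    count = len(block_a) * len(block_a[0])
    total = sum(abs(a - b)
                for row_a, row_b in zip(block_a, block_b)
                for a, b in zip(row_a, row_b))
    return total / count


def mse(block_a, block_b):
    """Mean square error between two equally sized blocks."""
    count = len(block_a) * len(block_a[0])
    total = sum((a - b) ** 2
                for row_a, row_b in zip(block_a, block_b)
                for a, b in zip(row_a, row_b))
    return total / count


def get_block(frame, top, left, size):
    """Extract a size x size block whose upper-left corner is (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]


def best_motion_vector(ref, cur, top, left, size, search, cost):
    """Exhaustively search `ref` for the block that best matches a block of `cur`.

    Returns ((dy, dx), cost_value), where (dy, dx) is the displacement from
    the block position in `cur` to the best-matching position in `ref`.
    """
    target = get_block(cur, top, left, size)
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            t, l = top + dy, left + dx
            # Skip candidate blocks that fall outside the reference frame.
            if t < 0 or l < 0 or t + size > len(ref) or l + size > len(ref[0]):
                continue
            value = cost(target, get_block(ref, t, l, size))
            if best is None or value < best[1]:
                best = ((dy, dx), value)
    return best


if __name__ == "__main__":
    # Toy 6x6 frames: a 2x2 patch moves from (1, 1) in ref to (2, 3) in cur.
    ref = [[0] * 6 for _ in range(6)]
    cur = [[0] * 6 for _ in range(6)]
    for r, row in enumerate([[9, 8], [7, 6]]):
        for c, value in enumerate(row):
            ref[1 + r][1 + c] = value
            cur[2 + r][3 + c] = value
    print(best_motion_vector(ref, cur, 2, 3, 2, 2, mad))
    print(best_motion_vector(ref, cur, 2, 3, 2, 2, mse))
```

Passing the cost function as a parameter makes it straightforward to compare MAD and MSE on the same blocks. MSE penalizes large pixel differences more heavily than MAD, so the two can rank candidate blocks differently on noisy frames.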

