Video Scene Detection Using Compact Bag of Visual Word Models

Advances in Multimedia ◽

10.1155/2018/2564963 ◽

2018 ◽

Vol 2018 ◽

pp. 1-9 ◽

Cited By ~ 1

Author(s):

Muhammad Haroon ◽

Junaid Baber ◽

Ihsan Ullah ◽

Sher Muhammad Daudpota ◽

Maheen Bakhtyar ◽

...

Keyword(s):

Video Segmentation ◽

Sliding Window ◽

Visual Word ◽

Dimensional Vector ◽

Scene Detection ◽

Feature Vectors ◽

Proposed Model ◽

Video Scene ◽

Segmentation Accuracy ◽

Key Frames

Video segmentation into shots is the first step for video indexing and searching. Video shots are mostly very short in duration and do not give meaningful insight into the visual contents. However, grouping shots based on similar visual contents gives a better understanding of the video scene; grouping of similar shots is known as scene boundary detection or video segmentation into scenes. In this paper, we propose a model for video segmentation into visual scenes using the bag of visual words (BoVW) model. Initially, the video is divided into shots, which are then represented by a set of key frames. Key frames are further represented by BoVW feature vectors, which are quite short and compact compared to classical BoVW model implementations. Two variations of the BoVW model are used: (1) the classical BoVW model and (2) the Vector of Locally Aggregated Descriptors (VLAD), which is an extension of the classical BoVW model. The similarity of shots is computed from the distances between their key-frame feature vectors within a sliding window of length L = 4, rather than by comparing each shot with a very long list of shots, as has been practiced previously. Experiments on cinematic and drama videos show the effectiveness of our proposed framework. In the proposed model, the BoVW representation is a 25000-dimensional vector, whereas VLAD is only a 2048-dimensional vector. BoVW achieves 0.90 segmentation accuracy, whereas VLAD achieves 0.83.
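The sliding-window comparison described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the minimum key-frame distance used as shot distance, the linking rule, and the `threshold` value are all assumptions made for illustration. Only the window length L = 4 comes from the abstract.

```python
import math


def shot_distance(shot_a, shot_b):
    """Smallest Euclidean distance between any two key-frame vectors.

    Each shot is a list of key-frame feature vectors (assumed representation).
    """
    return min(math.dist(u, v) for u in shot_a for v in shot_b)


def group_shots(shots, window=4, threshold=0.5):
    """Label each shot with a scene index using a look-ahead window.

    Shot i is compared only with the next `window` shots. If a later shot j
    is closer than `threshold` (hypothetical value), every shot from i to j
    is placed in the same scene.
    """
    labels = []
    scene = 0
    scene_end = 0  # last shot index known to belong to the current scene
    for i in range(len(shots)):
        if i > scene_end:
            # No earlier shot linked past this point: a new scene starts here.
            scene += 1
            scene_end = i
        labels.append(scene)
        last = min(i + window, len(shots) - 1)
        for j in range(i + 1, last + 1):
            if shot_distance(shots[i], shots[j]) < threshold:
                scene_end = max(scene_end, j)
    return labels


# Toy 2-D "feature vectors", one key frame per shot.
toy_shots = [[(0, 0)], [(0.1, 0)], [(5, 5)], [(0, 0.1)], [(9, 9)], [(9, 9.1)]]
print(group_shots(toy_shots))
```

Because each shot is compared with at most `window` later shots, the number of distance computations grows linearly with the number of shots instead of quadratically.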


Tracklet Pair Proposal and Context Reasoning for Video Scene Graph Generation

Sensors ◽

10.3390/s21093164 ◽

2021 ◽

Vol 21 (9) ◽

pp. 3164

Author(s):

Gayoung Jung ◽

Jonghun Lee ◽

Incheol Kim

Keyword(s):

Neural Network ◽

High Performance ◽

Sliding Window ◽

Temporal Context ◽

Scene Graph ◽

Context Reasoning ◽

Proposed Model ◽

Video Scene ◽

Spatio Temporal ◽

Graph Generation

Video scene graph generation (ViDSGG), the creation of video scene graphs that help in deeper and better visual scene understanding, is a challenging task. Segment-based and sliding-window-based methods have been proposed to perform this task. However, they all have certain limitations. This study proposes a novel deep neural network model called VSGG-Net for video scene graph generation. The model uses a sliding window scheme to detect object tracklets of various lengths throughout the entire video. In particular, the proposed model presents a new tracklet pair proposal method that evaluates the relatedness of object tracklet pairs using a pretrained neural network and statistical information. To effectively utilize the spatio-temporal context, low-level visual context reasoning is performed using a spatio-temporal context graph and a graph neural network, in addition to high-level semantic context reasoning. To improve the detection performance for sparse relationships, the proposed model applies a class weighting technique that assigns a higher weight to sparse relationships. This study demonstrates the positive effect and high performance of the proposed model through experiments using the benchmark datasets VidOR and VidVRD.


A graph-based approach for video scene detection

2008 IEEE 16th Signal Processing, Communication and Applications Conference ◽

10.1109/siu.2008.4632545 ◽

2008 ◽

Cited By ~ 5

Author(s):

Ufuk Sakarya ◽

Ziya Telatar

Keyword(s):

Scene Detection ◽

Video Scene


Video scene detection using dominant sets

2008 15th IEEE International Conference on Image Processing ◽

10.1109/icip.2008.4711694 ◽

2008 ◽

Cited By ~ 5

Author(s):

Ufuk Sakarya ◽

Ziya Telatar

Keyword(s):

Scene Detection ◽

Video Scene ◽

Dominant Sets


Video Scene Detection of Burst Swimming by Fry of Farmed-raised Bluefin Tuna

2018 4th International Conference on Frontiers of Signal Processing (ICFSP) ◽

10.1109/icfsp.2018.8552079 ◽

2018 ◽

Author(s):

Koji Abe ◽

Masaru Tanaka ◽

Hitoshi Habe ◽

Yoshiaki Taniguchi ◽

Nobukazu Iguchi

Keyword(s):

Bluefin Tuna ◽

Scene Detection ◽

Video Scene ◽

Burst Swimming


Semantic segmentation of gonio-photographs via adaptive ROI localisation and uncertainty estimation

BMJ Open Ophthalmology ◽

10.1136/bmjophth-2021-000898 ◽

2021 ◽

Vol 6 (1) ◽

pp. e000898

Author(s):

Andrea Peroni ◽

Anna Paviotti ◽

Mauro Campigotto ◽

Luis Abegão Pinto ◽

Carlo Alberto Cutolo ◽

...

Keyword(s):

Region Of Interest ◽

Ground Truth ◽

Semantic Segmentation ◽

Uncertainty Estimation ◽

Depth Of Field ◽

Clinical Settings ◽

Proposed Model ◽

Validation Experiment ◽

Segmentation Accuracy ◽

Ground Truth Image

Objective: To develop and test a deep learning (DL) model for semantic segmentation of anatomical layers of the anterior chamber angle (ACA) in digital gonio-photographs. Methods and analysis: We used a pilot dataset of 274 ACA sector images, annotated by expert ophthalmologists to delineate five anatomical layers: iris root, ciliary body band, scleral spur, trabecular meshwork and cornea. Narrow depth-of-field and peripheral vignetting prevented clinicians from annotating part of each image with sufficient confidence, introducing a degree of subjectivity and feature correlation in the ground truth. To overcome these limitations, we present a DL model, designed and trained to perform two tasks simultaneously: (1) maximise the segmentation accuracy within the annotated region of each frame and (2) identify a region of interest (ROI) based on local image informativeness. Moreover, our calibrated model makes its results interpretable by returning pixel-wise classification uncertainty through Monte Carlo dropout. Results: The model was trained and validated in a 5-fold cross-validation experiment on ~90% of available data, achieving ~91% average segmentation accuracy within the annotated part of each ground truth image of the hold-out test set. An appropriate ROI was successfully identified in all test frames. The uncertainty estimation module correctly located inaccuracies and errors in the segmentation outputs. Conclusion: The proposed model improves on the only previously published work on gonio-photograph segmentation and may be a valid support for the automatic processing of these images to evaluate local tissue morphology. Uncertainty estimation is expected to facilitate acceptance of this system in clinical settings.


Traffic Flow Anomaly Detection Based on Robust Ridge Regression with Particle Swarm Optimization Algorithm

Mathematical Problems in Engineering ◽

10.1155/2020/3673085 ◽

2020 ◽

Vol 2020 ◽

pp. 1-10

Author(s):

Mingzhu Tang ◽

Xiangwan Fu ◽

Huawei Wu ◽

Qi Huang ◽

Qi Zhao

Keyword(s):

Anomaly Detection ◽

Traffic Flow ◽

Ridge Regression ◽

Cross Validation ◽

Sliding Window ◽

Pso Algorithm ◽

Swarm Optimization ◽

Feature Sets ◽

Proposed Model ◽

Fold Cross Validation

Traffic flow anomaly detection helps to improve the efficiency and reliability of detecting fault behavior and the overall effectiveness of traffic operation. The data detected by traffic flow sensors contain a lot of noise due to equipment failure, environmental interference, and other factors. For traffic flow data with heavy noise, a traffic flow anomaly detection method based on robust ridge regression with the particle swarm optimization (PSO) algorithm is proposed. Feature sets containing historical characteristics with a strong linear correlation and statistical characteristics computed with the optimal sliding window are constructed. The feature sets are then provided as inputs to the PSO-Huber-Ridge model, which outputs the traffic flow. The Huber loss function is recommended to reduce noise interference in the traffic flow. The L2 regularization term of the ridge regression is employed to reduce overfitting during model training. A fitness function is constructed that balances the relative size of the k-fold cross-validation root mean square error and the k-fold cross-validation mean absolute error through the control parameter η, in order to improve the efficiency of the optimization algorithm and the generalization ability of the proposed model. The hyperparameters of the robust ridge regression forecast model are optimized by the PSO algorithm. The traffic flow data set is used to train and validate the proposed model. Compared with other optimization methods, the proposed model has the lowest RMSE, MAE, and MAPE. Finally, the traffic flow forecasted by the proposed model is used to perform anomaly detection. Abnormal errors between the forecasted value and the actual value are detected using an abnormal traffic flow threshold based on the sliding window. The experimental results verify the validity of the proposed anomaly detection model.
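To make the combination of Huber loss and an L2 (ridge) penalty mentioned in this abstract concrete, here is a minimal sketch of such an objective for a linear model. The function names, the default `delta` and `alpha` values, and the absence of an intercept are assumptions for illustration; the paper's exact formulation and its PSO hyperparameter search are not reproduced here.

```python
def huber(residual, delta=1.0):
    """Standard Huber loss: quadratic near zero, linear for large residuals."""
    size = abs(residual)
    if size <= delta:
        return 0.5 * residual ** 2
    return delta * (size - 0.5 * delta)


def huber_ridge_objective(weights, X, y, delta=1.0, alpha=0.1):
    """Sum of Huber losses plus an L2 penalty on the weights.

    `delta` and `alpha` are the hyperparameters that a search procedure
    (PSO in the abstract) would tune; the values here are placeholders.
    """
    data_term = 0.0
    for row, target in zip(X, y):
        prediction = sum(w * x for w, x in zip(weights, row))
        data_term += huber(target - prediction, delta)
    penalty = alpha * sum(w * w for w in weights)
    return data_term + penalty


# One small and one large residual: the large one is penalised linearly.
print(huber_ridge_objective([1.0], [[1.0], [2.0]], [1.5, 5.0]))
```

The linear tail of the Huber loss is what limits the influence of noisy sensor readings compared with a squared-error loss.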


Video scene detection using graph-based representations

Signal Processing Image Communication ◽

10.1016/j.image.2010.10.001 ◽

2010 ◽

Vol 25 (10) ◽

pp. 774-783 ◽

Cited By ~ 10

Author(s):

Ufuk Sakarya ◽

Ziya Telatar

Keyword(s):

Scene Detection ◽

Video Scene


Video Summarization Based on Mutual Information and Entropy Sliding Window Method

Entropy ◽

10.3390/e22111285 ◽

2020 ◽

Vol 22 (11) ◽

pp. 1285

Author(s):

WenLin Li ◽

DeYu Qi ◽

ChangJian Zhang ◽

Jing Guo ◽

JiaJun Yao

Keyword(s):

Mutual Information ◽

Video Summarization ◽

Sliding Window ◽

Information Value ◽

Second Step ◽

Step Method ◽

Speeded Up Robust Features ◽

The Third ◽

Window Method ◽

Key Frames

This paper proposes a video summarization algorithm called the Mutual Information and Entropy based adaptive Sliding Window (MIESW) method, which is designed specifically for the static summary of gesture videos. Considering that gesture videos usually have uncertain transition postures and unclear movement boundaries or inexplicable frames, we propose a three-step method where the first step involves browsing a video, the second step applies the MIESW method to select candidate key frames, and the third step removes most redundant key frames. In detail, the first step is to convert the video into a sequence of frames and adjust the size of the frames. In the second step, a key frame extraction algorithm named MIESW is executed. The inter-frame mutual information value is used as a metric to adaptively adjust the size of the sliding window to group similar content of the video. Then, based on the entropy value of the frame and the average mutual information value of the frame group, the threshold method is applied to optimize the grouping, and the key frames are extracted. In the third step, speeded up robust features (SURF) analysis is performed to eliminate redundant frames among these candidate key frames. The calculation of Precision, Recall, and F-measure is optimized from the perspective of practicality and feasibility. Experiments demonstrate that key frames extracted using our method provide high-quality video summaries and largely cover the main content of the gesture video.
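The inter-frame mutual information used as a similarity metric in this abstract can be computed from the joint histogram of pixel values. The sketch below shows only that standard quantity, assuming frames are given as flat sequences of discrete gray levels; the function name is hypothetical, and the adaptive window and thresholding steps of MIESW are not reproduced.

```python
import math
from collections import Counter


def mutual_information(frame_a, frame_b):
    """Mutual information in bits between two equally sized frames.

    Frames are flat sequences of discrete gray levels (assumed format).
    """
    if len(frame_a) != len(frame_b):
        raise ValueError("frames must have the same number of pixels")
    n = len(frame_a)
    joint = Counter(zip(frame_a, frame_b))  # co-occurrence counts
    count_a = Counter(frame_a)              # marginal counts of frame_a
    count_b = Counter(frame_b)              # marginal counts of frame_b
    mi = 0.0
    for (a, b), c in joint.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (count_a[a] * count_b[b]))
    return mi


# Identical frames share all their information; unrelated ones share none.
print(mutual_information([0, 0, 1, 1], [0, 0, 1, 1]))
print(mutual_information([0, 0, 1, 1], [0, 1, 0, 1]))
```

A high value between consecutive frames suggests similar content, which is the kind of signal a sliding window could use to decide whether to keep growing a group of frames.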


A Novel Fuzzy Linear Regression Sliding Window GARCH Model for Time-Series Forecasting

Applied Sciences ◽

10.3390/app10061949 ◽

2020 ◽

Vol 10 (6) ◽

pp. 1949

Author(s):

Amiratul L. Mohamad Hanapi ◽

Mahmod Othman ◽

Rajalingam Sokkalingam ◽

Nazirah Ramli ◽

Abdullah Husin ◽

...

Keyword(s):

Time Series ◽

Linear Regression ◽

Garch Model ◽

Sliding Window ◽

Time Series Forecasting ◽

Likelihood Method ◽

Fuzzy Linear Regression ◽

Specific Distribution ◽

Proposed Model ◽

The Garch Model

Generalized autoregressive conditional heteroskedasticity (GARCH) is one of the most popular models for time-series forecasting. The GARCH model uses a maximum likelihood method for parameter estimation. For the likelihood method to work, there should be a known and specific distribution. However, due to uncertainties in time-series data, a specific distribution is indeterminable. The GARCH model is also unable to capture the influence of each variance in the observation because the calculation of the long-run average variance only considers the series in its entirety; hence, the information on the different effects of the variances in each observation is disregarded. Therefore, in this study, a novel forecasting model dubbed the fuzzy linear regression sliding window GARCH (FLR-FSWGARCH) model was proposed; fuzzy linear regression was combined with GARCH to estimate parameters, and a fuzzy sliding window variance was developed to estimate the weight of a forecast. The proposed model promotes consistency and symmetry in the parameter estimation and forecasting, which in turn increases the accuracy of forecasts. Two datasets were used for evaluation purposes, and the proposed model produced forecasts that were close to the actual data and outperformed existing models. The proposed model was significantly fitted and reliable for time-series forecasting.
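For readers unfamiliar with the baseline this abstract builds on, the following sketch shows the standard GARCH(1,1) conditional variance recursion and its long-run average variance. It illustrates the classical model only, not the proposed FLR-FSWGARCH; the function name and the parameter values in the example are assumptions.

```python
def garch11_variances(returns, omega, alpha, beta):
    """Conditional variances of a standard GARCH(1,1) model.

    sigma2[t] = omega + alpha * returns[t-1]**2 + beta * sigma2[t-1]

    The recursion starts from the long-run variance omega / (1 - alpha - beta).
    The returned list has len(returns) + 1 entries; the last entry is the
    one-step-ahead variance forecast.
    """
    if alpha + beta >= 1:
        raise ValueError("alpha + beta must be below 1 for a finite long-run variance")
    long_run = omega / (1 - alpha - beta)
    variances = [long_run]
    for r in returns:
        variances.append(omega + alpha * r * r + beta * variances[-1])
    return variances


# Placeholder parameters; in practice they are estimated from data.
print(garch11_variances([2.0, 0.0], omega=0.1, alpha=0.1, beta=0.8))
```

The single long-run variance computed from the whole series is the quantity the abstract criticises for ignoring the differing effect of each observation.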


Real time video scene detection and classification

Information Processing & Management ◽

10.1016/s0306-4573(98)00067-3 ◽

1999 ◽

Vol 35 (3) ◽

pp. 381-400 ◽

Cited By ~ 26

Author(s):

John M. Gauch ◽

Susan Gauch ◽

Sylvain Bouix ◽

Xiaolan Zhu

Keyword(s):

Real Time ◽

Scene Detection ◽

Video Scene
