Object Modelling and Tracking in Videos via Multidimensional Features

2011 ◽  
Vol 2011 ◽  
pp. 1-15 ◽  
Author(s):  
Zhuhan Jiang

We propose to model a tracked object in a video sequence by locating a list of object features that are ranked according to their ability to differentiate the object from the image background. Bayesian inference is utilised to derive the probabilistic location of the object in the current frame, with the prior approximated from the previous frame and the posterior obtained via the current pixel distribution of the object. Consideration has also been given to a number of relevant aspects of object tracking, including multidimensional features and the mixture of colours, textures, and object motion. Experiments with the proposed method on video sequences have shown its effectiveness in capturing the target against a moving background and under nonrigid object motion.
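As a minimal illustration of the Bayesian update described above (the 1D grid, prior width, and likelihood shape are our own toy choices, not the paper's):

```python
import numpy as np

def bayesian_update(prior, likelihood):
    """Posterior over candidate locations: prior times likelihood, renormalized."""
    posterior = prior * likelihood
    return posterior / posterior.sum()

# 1D grid of candidate object positions (toy example)
xs = np.arange(10)

# prior: approximated from the previous frame (peaked near the old location, x = 4)
prior = np.exp(-0.5 * ((xs - 4) / 1.5) ** 2)
prior /= prior.sum()

# likelihood: how well each candidate matches the object's current pixel distribution
likelihood = np.exp(-0.5 * ((xs - 6) / 2.0) ** 2)

posterior = bayesian_update(prior, likelihood)
best = int(np.argmax(posterior))  # MAP estimate, pulled between prior and likelihood
```

The MAP location lands between the prior peak and the likelihood peak, which is exactly the behaviour the abstract describes: the previous frame constrains the search while the current pixel distribution refines it.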

Author(s):  
Lipeng Gu ◽  
Shaoyuan Sun ◽  
Xunhua Liu ◽  
Xiang Li

Abstract Compared with 2D multi-object tracking algorithms, 3D multi-object tracking algorithms have greater research significance and broader application prospects in the field of unmanned vehicle research. Aiming at the problem of 3D multi-object detection and tracking, this paper improves the multi-object tracker CenterTrack, which focuses on the 2D multi-object tracking task while ignoring object 3D information, mainly in two respects, detection and tracking; the improved network is called CenterTrack3D. In terms of detection, CenterTrack3D uses an attention mechanism to optimize the way the previous-frame image and the heatmap of previous-frame tracklets are added to the current-frame image as input, and the second convolutional layer of the output head is replaced by a dynamic convolution layer, which further improves the ability to detect occluded objects. In terms of tracking, a cascaded data association algorithm based on a 3D Kalman filter is proposed to make full use of the 3D information of objects in the image and increase the robustness of the 3D multi-object tracker. The experimental results show that, compared with the original CenterTrack and existing 3D multi-object tracking methods, CenterTrack3D achieves 88.75% MOTA for cars and 59.40% MOTA for pedestrians and is very competitive on the KITTI tracking benchmark test set.
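The 3D Kalman filter at the heart of such a cascaded association step can be sketched as a constant-velocity filter over 3D position (the state layout, noise levels, and class name below are generic textbook choices, not CenterTrack3D's actual parameters):

```python
import numpy as np

class Kalman3D:
    """Constant-velocity Kalman filter over 3D position (illustrative sketch)."""

    def __init__(self, xyz):
        self.x = np.zeros(6)                                # state: [x, y, z, vx, vy, vz]
        self.x[:3] = xyz
        self.P = np.eye(6)                                  # state covariance
        self.F = np.eye(6)
        self.F[:3, 3:] = np.eye(3)                          # position += velocity per step
        self.H = np.hstack([np.eye(3), np.zeros((3, 3))])   # only position is observed
        self.Q = np.eye(6) * 1e-2                           # process noise
        self.R = np.eye(3) * 1e-1                           # measurement noise

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:3]

    def update(self, z):
        y = z - self.H @ self.x                             # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)            # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(6) - K @ self.H) @ self.P
        return self.x[:3]

# object detected moving along the x axis at one unit per frame
kf = Kalman3D([0.0, 0.0, 0.0])
for t in range(1, 5):
    kf.predict()
    est = kf.update(np.array([t * 1.0, 0.0, 0.0]))
```

After a few frames the filter has both locked on to the position and inferred the velocity, which is what makes the predicted box useful for associating detections across frames.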


Author(s):  
B. A. Zalesky

The algorithm ACT (Adaptive Color Tracker), which tracks objects with a moving video camera, is presented. One feature of the algorithm is the adaptation of the feature set of the tracked object to the background of the current frame. At each step, the algorithm extracts from the object features those that are most specific to the object and at the same time least specific to the current frame background, since the remaining object features not only do not contribute to separating the tracked object from the background but also impede its correct detection. The features of the object and background are formed from the color representations of scenes and can be computed in two ways. The first way uses the 3D color vectors of the object and background images clustered by a fast version of the well-known k-means algorithm. The second way consists in a simpler and faster partitioning of the RGB color space into 3D parallelepipeds, with the color of each pixel subsequently replaced by the average of all colors belonging to the same parallelepiped as the pixel's color. Another specificity of the algorithm is its simplicity, which allows it to run on small mobile computers such as the Jetson TX1 or TX2. The algorithm was tested on video sequences captured by various camcorders, as well as on the well-known TV77 data set containing 77 different tagged video sequences. The tests have shown the efficiency of the algorithm: on the test images, its accuracy and speed exceed those of the trackers implemented in the computer vision library OpenCV 4.1.
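The second, faster color-reduction variant can be sketched as follows (the 4x4x4 bin count and function name are our illustrative choices; the abstract does not specify the partition granularity):

```python
import numpy as np

def quantize_colors(image, bins=4):
    """Split the RGB cube into bins^3 equal parallelepipeds and replace each
    pixel's color with the mean color of the pixels in its parallelepiped."""
    h, w, _ = image.shape
    pixels = image.reshape(-1, 3).astype(np.float64)
    idx = (pixels // (256 // bins)).astype(int)            # 3D bin index per pixel
    flat = idx[:, 0] * bins * bins + idx[:, 1] * bins + idx[:, 2]
    out = np.empty_like(pixels)
    for b in np.unique(flat):
        mask = flat == b
        out[mask] = pixels[mask].mean(axis=0)              # replace with bin average
    return out.reshape(h, w, 3)

# tiny 2x2 image: two dark-gray pixels share a bin, the others sit alone
img = np.array([[[10, 10, 10], [20, 20, 20]],
                [[200, 200, 200], [250, 0, 0]]], dtype=np.uint8)
q = quantize_colors(img)
```

Because the bin boundaries are fixed, this needs no clustering iterations, which is why it is the cheaper alternative to k-means on small embedded boards.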


Author(s):  
R. Bahmanyar ◽  
S. M. Azimi ◽  
P. Reinartz

Abstract. Geo-referenced real-time vehicle and person tracking in aerial imagery has a variety of applications, such as traffic and large-scale event monitoring, disaster management, and input to predictive traffic and crowd models. However, object tracking in aerial imagery is still an unsolved and challenging problem due to the tiny size of the objects, the different scales, and the limited temporal resolution of geo-referenced datasets. In this work, we propose a new approach based on Convolutional Neural Networks (CNNs) to track multiple vehicles and people in aerial image sequences. As the large number of objects in aerial images can exponentially increase the processing demands in multiple-object tracking scenarios, the proposed approach utilizes a stack of micro CNNs, where each micro CNN is responsible for a single-object tracking task. We call our approach Stack of Micro Single-Object-Tracking CNNs (SMSOT-CNN). More precisely, using a two-stream CNN, we extract a set of features from two consecutive frames for each object, given the location of the object in the previous frame. Then, we feed each micro CNN the extracted features of its object to predict the object's location in the current frame. We train and validate the proposed approach on the vehicle and person sets of the KIT AIS dataset of object tracking in aerial image sequences. Results indicate accurate and time-efficient tracking of multiple vehicles and people by the proposed approach.
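The "one lightweight tracker per object" dispatch pattern can be sketched with plain template matching standing in for each micro CNN (the SSD matcher, window sizes, and all names below are our simplifications, not the paper's network):

```python
import numpy as np

def track_single(prev_frame, cur_frame, loc, size=3, search=2):
    """One 'micro tracker': given the object's previous location, search a small
    window in the current frame for the best-matching patch (SSD similarity)."""
    y, x = loc
    tmpl = prev_frame[y:y + size, x:x + size]
    best, best_score = loc, -np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            yy, xx = y + dy, x + dx
            if yy < 0 or xx < 0 or yy + size > cur_frame.shape[0] \
                    or xx + size > cur_frame.shape[1]:
                continue
            cand = cur_frame[yy:yy + size, xx:xx + size]
            score = -np.sum((cand - tmpl) ** 2)   # negative SSD: higher is better
            if score > best_score:
                best_score, best = score, (yy, xx)
    return best

def track_all(prev_frame, cur_frame, locations):
    # one independent tracker call per object, mirroring the stacked design
    return [track_single(prev_frame, cur_frame, loc) for loc in locations]

# toy frames: a 3x3 bright patch moves from (2, 2) to (3, 3)
prev = np.zeros((10, 10)); prev[2:5, 2:5] = 1.0
cur = np.zeros((10, 10)); cur[3:6, 3:6] = 1.0
new_locs = track_all(prev, cur, [(2, 2)])
```

The point of the sketch is the structure, not the matcher: each object gets its own small, independent tracking computation conditioned on its previous location, so the per-object cost stays constant as the object count grows.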


Video analysis plays a vital role in commercial applications, sports, and military systems, and various methods are presented in the literature. The mean shift algorithm is presented in this paper for basketball tracking because it is more efficient than other histogram-based methods. Tracking is the important block in the detection and recognition of the basketball. Different object tracking algorithms are investigated. Tracking performance is evaluated on two video sequences, and the method achieves 91.3% precision on video sequence 1 and 93.6% on sequence 2.
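A minimal sketch of the mean shift iteration on a back-projection weight map (window radius, grid size, and convergence threshold are illustrative choices, not taken from the paper):

```python
import numpy as np

def mean_shift(weights, start, radius=3, max_iter=20):
    """Shift a square window to the weighted centroid of the back-projection
    map under it, repeating until the shift is below half a pixel."""
    cy, cx = start
    ys, xs = np.mgrid[0:weights.shape[0], 0:weights.shape[1]]
    for _ in range(max_iter):
        mask = (np.abs(ys - cy) <= radius) & (np.abs(xs - cx) <= radius)
        w = weights * mask
        total = w.sum()
        if total == 0:
            break                                     # window fell off the target
        ny, nx = (ys * w).sum() / total, (xs * w).sum() / total
        if abs(ny - cy) < 0.5 and abs(nx - cx) < 0.5:
            cy, cx = ny, nx
            break                                     # converged
        cy, cx = ny, nx
    return int(round(cy)), int(round(cx))

# toy weight map: the ball's color back-projection peaks in a 3x3 block at (10, 13)
weights = np.zeros((20, 20)); weights[9:12, 12:15] = 1.0
loc = mean_shift(weights, start=(7, 11))
```

Each iteration only touches the pixels under the window, which is why mean shift is cheap enough for per-frame ball tracking.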


Author(s):  
M O Elantcev ◽  
I O Arkhipov ◽  
R M Gafarov

The work deals with a method of eliminating the perspective distortion of an image acquired from an unmanned aerial vehicle (UAV) camera in order to transform it to match the parameters of a satellite image. The normalization is performed in one of two ways. The first variant consists in calculating an image transformation matrix from the camera position and orientation. The second variant is based on matching the current frame with the previous one. The matching yields shift, rotation, and scale parameters that are used to obtain an initial set of pairs of corresponding keypoints. From this set, four pairs are selected to calculate the perspective transformation matrix. This matrix is in turn used to obtain a new set of pairs of corresponding keypoints. The process is repeated as long as the number of pairs in the new set exceeds the number in the current one. The accumulated transformation matrix is then multiplied by the transformation matrix obtained during the normalization of the previous frame. The final part presents results showing that the proposed method can improve the accuracy of the visual navigation system at low computational cost.
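The step from four pairs of corresponding keypoints to a perspective (homography) matrix is the classical direct linear transform (DLT); a minimal sketch, with our own function names and a synthetic point set:

```python
import numpy as np

def homography_from_4pts(src, dst):
    """Build the 8x9 DLT system from 4 correspondences and take its null vector
    (smallest singular vector) as the 3x3 perspective matrix."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    _, _, Vt = np.linalg.svd(np.array(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]                      # normalize so H[2, 2] = 1

def apply_h(H, pt):
    """Apply a homography to a 2D point in homogeneous coordinates."""
    p = H @ np.array([pt[0], pt[1], 1.0])
    return p[:2] / p[2]

# synthetic check: unit square mapped by a pure scale-and-shift
src = [(0, 0), (1, 0), (1, 1), (0, 1)]
dst = [(2, 1), (4, 1), (4, 3), (2, 3)]
H = homography_from_4pts(src, dst)
center = apply_h(H, (0.5, 0.5))             # should land at the center of dst
```

In the described pipeline this matrix is then used to re-match keypoints, and the loop keeps the matrix that yields the largest consistent correspondence set.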


2009 ◽  
Vol 09 (04) ◽  
pp. 609-627 ◽  
Author(s):  
J. WANG ◽  
N. V. PATEL ◽  
W. I. GROSKY ◽  
F. FOTOUHI

In this paper, we address the problem of camera and object motion detection in the compressed domain. The estimation of camera motion and the segmentation of moving objects have been widely studied in a variety of contexts for video analysis, owing to their capability of providing essential clues for interpreting the high-level semantics of video sequences. A novel compressed-domain motion estimation and segmentation scheme is presented and applied in this paper. MPEG-2 compressed-domain information, namely Motion Vectors (MV) and Discrete Cosine Transform (DCT) coefficients, is filtered and manipulated to obtain a dense and reliable Motion Vector Field (MVF) over consecutive frames. An iterative segmentation scheme based on a generalized affine transformation model is exploited to perform global camera motion detection. The foreground spatiotemporal objects are then separated from the background by applying a temporal consistency check to the output of the iterative segmentation; this check coalesces the resulting foreground blocks and weeds out unqualified blocks. Illustrative examples are provided to demonstrate the efficacy of the proposed approach.
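Fitting the global camera motion with an affine model to the block motion vectors can be sketched as a least-squares problem (the data is synthetic and the paper's iterative outlier rejection is omitted; blocks whose vectors disagree with the fit would be the foreground candidates):

```python
import numpy as np

def fit_affine(points, vectors):
    """Least-squares fit of a 6-parameter affine motion model
    [u, v] = A @ [x, y] + t to block motion vectors."""
    X = np.array([[x, y, 1, 0, 0, 0] for x, y in points] +
                 [[0, 0, 0, x, y, 1] for x, y in points], dtype=float)
    mv = np.array(vectors, dtype=float)
    b = np.concatenate([mv[:, 0], mv[:, 1]])
    params, *_ = np.linalg.lstsq(X, b, rcond=None)
    return params  # [a11, a12, tx, a21, a22, ty]

# synthetic camera pan: every macroblock moves by (2, -1)
pts = [(0, 0), (16, 0), (0, 16), (16, 16), (8, 8)]
mvs = [(2, -1)] * len(pts)
p = fit_affine(pts, mvs)
```

For a pure pan the linear part of the model collapses to zero and the translation terms recover the camera motion; rotation or zoom would instead show up in the `a` coefficients.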

