Smart IDReader: Document Recognition in Video Stream

Author(s):  
Konstantin Bulatov ◽  
Vladimir V. Arlazarov ◽  
Timofey Chernov ◽  
Oleg Slavin ◽  
Dmitry Nikolaev
2021 ◽  
Vol 45 (1) ◽  
pp. 77-89
Author(s):  
O. Petrova ◽  
K. Bulatov ◽  
V.V. Arlazarov ◽  
V.L. Arlazarov

The scope of uses of automated document recognition has extended and as a result, recognition techniques that do not require specialized equipment have become more relevant. Among such techniques, document recognition using mobile devices is of interest. However, it is not always possible to ensure controlled capturing conditions and, consequentially, high quality of input images. Unlike specialized scanners, mobile cameras allow using a video stream as an input, thus obtaining several images of the recognized object, captured with various characteristics. In this case, a problem of combining the information from multiple input frames arises. In this paper, we propose a weighing model for the process of combining the per-frame recognition results, two approaches to the weighted combination of the text recognition results, and two weighing criteria. The effectiveness of the proposed approaches is tested using datasets of identity documents captured with a mobile device camera in different conditions, including perspective distortion of the document image and low lighting conditions. The experimental results show that the weighting combination can improve the text recognition result quality in the video stream, and the per-character weighting method with input image focus estimation as a base criterion allows one to achieve the best results on the datasets analyzed.


2019 ◽  
Vol 43 (5) ◽  
pp. 818-824 ◽  
Author(s):  
V.V. Arlazarov ◽  
K. Bulatov ◽  
T. Chernov ◽  
V.L. Arlazarov

A lot of research has been devoted to identity documents analysis and recognition on mobile devices. However, no publicly available datasets designed for this particular problem currently exist. There are a few datasets which are useful for associated subtasks but in order to facilitate a more comprehensive scientific and technical approach to identity document recognition more specialized datasets are required. In this paper we present a Mobile Identity Document Video dataset (MIDV-500) consisting of 500 video clips for 50 different identity document types with ground truth which allows to perform research in a wide scope of document analysis problems. The paper presents characteristics of the dataset and evaluation results for existing methods of face detection, text line recognition, and document fields data extraction. Since an important feature of identity documents is their sensitiveness as they contain personal data, all source document images used in MIDV-500 are either in public domain or distributed under public copyright licenses. The main goal of this paper is to present a dataset. However, in addition and as a baseline, we present evaluation results for existing methods for face detection, text line recognition, and document data extraction, using the presented dataset.


2021 ◽  
Vol 45 (1) ◽  
pp. 101-109
Author(s):  
M.A. Aliev ◽  
I.A. Kunina ◽  
A.V. Kazbekov ◽  
V.L. Arlazarov

During the process of document recognition in a video stream using a mobile device camera, the image quality of the document varies greatly from frame to frame. Sometimes recognition system is required not only to recognize all the specified attributes of the document, but also to select final document image of the best quality. This is necessary, for example, for archiving or providing various services; in some countries it can be required by law. In this case, recognition system needs to assess the quality of frames in the video stream and choose the “best” frame. In this paper we considered the solution to such a problem where the “best” frame means the presence of all specified attributes in a readable form in the document image. The method was set up on a private dataset, and then tested on documents from the open MIDV-2019 dataset. A practically applicable result was obtained for use in recognition systems.


2020 ◽  
Vol 39 (6) ◽  
pp. 8463-8475
Author(s):  
Palanivel Srinivasan ◽  
Manivannan Doraipandian

Rare event detections are performed using spatial domain and frequency domain-based procedures. Omnipresent surveillance camera footages are increasing exponentially due course the time. Monitoring all the events manually is an insignificant and more time-consuming process. Therefore, an automated rare event detection contrivance is required to make this process manageable. In this work, a Context-Free Grammar (CFG) is developed for detecting rare events from a video stream and Artificial Neural Network (ANN) is used to train CFG. A set of dedicated algorithms are used to perform frame split process, edge detection, background subtraction and convert the processed data into CFG. The developed CFG is converted into nodes and edges to form a graph. The graph is given to the input layer of an ANN to classify normal and rare event classes. Graph derived from CFG using input video stream is used to train ANN Further the performance of developed Artificial Neural Network Based Context-Free Grammar – Rare Event Detection (ACFG-RED) is compared with other existing techniques and performance metrics such as accuracy, precision, sensitivity, recall, average processing time and average processing power are used for performance estimation and analyzed. Better performance metrics values have been observed for the ANN-CFG model compared with other techniques. The developed model will provide a better solution in detecting rare events using video streams.


2019 ◽  
Vol 4 (91) ◽  
pp. 21-29 ◽  
Author(s):  
Yaroslav Trofimenko ◽  
Lyudmila Vinogradova ◽  
Evgeniy Ershov

2017 ◽  
Vol 2 (1) ◽  
pp. 80-87
Author(s):  
Puyda V. ◽  
◽  
Stoian. A.

Detecting objects in a video stream is a typical problem in modern computer vision systems that are used in multiple areas. Object detection can be done on both static images and on frames of a video stream. Essentially, object detection means finding color and intensity non-uniformities which can be treated as physical objects. Beside that, the operations of finding coordinates, size and other characteristics of these non-uniformities that can be used to solve other computer vision related problems like object identification can be executed. In this paper, we study three algorithms which can be used to detect objects of different nature and are based on different approaches: detection of color non-uniformities, frame difference and feature detection. As the input data, we use a video stream which is obtained from a video camera or from an mp4 video file. Simulations and testing of the algoritms were done on a universal computer based on an open-source hardware, built on the Broadcom BCM2711, quad-core Cortex-A72 (ARM v8) 64-bit SoC processor with frequency 1,5GHz. The software was created in Visual Studio 2019 using OpenCV 4 on Windows 10 and on a universal computer operated under Linux (Raspbian Buster OS) for an open-source hardware. In the paper, the methods under consideration are compared. The results of the paper can be used in research and development of modern computer vision systems used for different purposes. Keywords: object detection, feature points, keypoints, ORB detector, computer vision, motion detection, HSV model color


2013 ◽  
Vol 18 (2-3) ◽  
pp. 49-60 ◽  
Author(s):  
Damian Dudzńiski ◽  
Tomasz Kryjak ◽  
Zbigniew Mikrut

Abstract In this paper a human action recognition algorithm, which uses background generation with shadow elimination, silhouette description based on simple geometrical features and a finite state machine for recognizing particular actions is described. The performed tests indicate that this approach obtains a 81 % correct recognition rate allowing real-time image processing of a 360 X 288 video stream.


2020 ◽  
Vol 10 (19) ◽  
pp. 6885
Author(s):  
Sahar Ujan ◽  
Neda Navidi ◽  
Rene Jr Landry

Radio Frequency Interference (RFI) detection and characterization play a critical role in ensuring the security of all wireless communication networks. Advances in Machine Learning (ML) have led to the deployment of many robust techniques dealing with various types of RFI. To sidestep an unavoidable complicated feature extraction step in ML, we propose an efficient Deep Learning (DL)-based methodology using transfer learning to determine both the type of received signals and their modulation type. To this end, the scalogram of the received signals is used as the input of the pretrained convolutional neural networks (CNN), followed by a fully-connected classifier. This study considers a digital video stream as the signal of interest (SoI), transmitted in a real-time satellite-to-ground communication using DVB-S2 standards. To create the RFI dataset, the SoI is combined with three well-known jammers namely, continuous-wave interference (CWI), multi- continuous-wave interference (MCWI), and chirp interference (CI). This study investigated four well-known pretrained CNN architectures, namely, AlexNet, VGG-16, GoogleNet, and ResNet-18, for the feature extraction to recognize the visual RFI patterns directly from pixel images with minimal preprocessing. Moreover, the robustness of the proposed classifiers is evaluated by the data generated at different signal to noise ratios (SNR).


2020 ◽  
Vol 6 (1) ◽  
Author(s):  
Matthias Ivantsits ◽  
Lennart Tautz ◽  
Simon Sündermann ◽  
Isaac Wamala ◽  
Jörg Kempfert ◽  
...  

AbstractMinimally invasive surgery is increasingly utilized for mitral valve repair and replacement. The intervention is performed with an endoscopic field of view on the arrested heart. Extracting the necessary information from the live endoscopic video stream is challenging due to the moving camera position, the high variability of defects, and occlusion of structures by instruments. During such minimally invasive interventions there is no time to segment regions of interest manually. We propose a real-time-capable deep-learning-based approach to detect and segment the relevant anatomical structures and instruments. For the universal deployment of the proposed solution, we evaluate them on pixel accuracy as well as distance measurements of the detected contours. The U-Net, Google’s DeepLab v3, and the Obelisk-Net models are cross-validated, with DeepLab showing superior results in pixel accuracy and distance measurements.


Sign in / Sign up

Export Citation Format

Share Document