Smart IDReader: Document Recognition in Video Stream

Experimental modeling the flow of character recognition results in video stream for document recognition

Eleventh International Conference on Machine Vision (ICMV 2018) ◽

10.1117/12.2522970 ◽

2019 ◽

Author(s):

Elena Andreeva ◽

Vladimir V. Arlazarov ◽

Oleg Slavin ◽

Igor Janiszewski

Keyword(s):

Character Recognition ◽

Video Stream ◽

Experimental Modeling ◽

Document Recognition

Weighted combination of per-frame recognition results for text recognition in a video stream

Computer Optics ◽

10.18287/2412-6179-co-795 ◽

2021 ◽

Vol 45 (1) ◽

pp. 77-89

Author(s):

O. Petrova ◽

K. Bulatov ◽

V.V. Arlazarov ◽

V.L. Arlazarov

Keyword(s):

Video Stream ◽

Input Image ◽

Document Image ◽

Text Recognition ◽

Weighting Method ◽

Document Recognition ◽

Perspective Distortion ◽

Character Weighting ◽

Specialized Equipment ◽

Weighted Combination

The range of uses for automated document recognition has expanded, and as a result, recognition techniques that do not require specialized equipment have become more relevant. Among such techniques, document recognition using mobile devices is of particular interest. However, it is not always possible to ensure controlled capture conditions and, consequently, high quality of input images. Unlike specialized scanners, mobile cameras allow a video stream to be used as input, yielding several images of the recognized object captured with varying characteristics. This raises the problem of combining information from multiple input frames. In this paper, we propose a weighting model for the process of combining the per-frame recognition results, two approaches to the weighted combination of the text recognition results, and two weighting criteria. The effectiveness of the proposed approaches is tested using datasets of identity documents captured with a mobile device camera in different conditions, including perspective distortion of the document image and low lighting conditions. The experimental results show that weighted combination can improve the quality of text recognition results in a video stream, and that the per-character weighting method with input image focus estimation as the base criterion achieves the best results on the datasets analyzed.
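
As a rough illustration of per-character weighted combination, the sketch below lets each frame vote for the character at every position, with the vote scaled by a per-frame weight such as a focus estimate. It is a minimal sketch and not the paper's method: the function name `combine_per_character` is hypothetical, the weights are assumed to be given, and the per-frame strings are assumed to be already aligned and of equal length.

```python
from collections import defaultdict


def combine_per_character(frame_results):
    """Combine per-frame strings by weighted voting at each character position.

    frame_results: list of (text, weight) pairs. Assumption of this sketch:
    all texts are pre-aligned and have the same length; the weight stands in
    for a per-frame quality criterion such as a focus estimate.
    """
    if not frame_results:
        return ""
    length = len(frame_results[0][0])
    if any(len(text) != length for text, _ in frame_results):
        raise ValueError("this sketch assumes pre-aligned, equal-length strings")

    combined = []
    for position in range(length):
        votes = defaultdict(float)
        for text, weight in frame_results:
            # Each frame votes for its character with the frame's weight.
            votes[text[position]] += weight
        # Keep the character with the largest accumulated weight.
        combined.append(max(votes, key=votes.get))
    return "".join(combined)


if __name__ == "__main__":
    frames = [("8OB", 0.1), ("8OB", 0.1), ("BOB", 0.9)]
    print(combine_per_character(frames))
```

In the usage example, an unweighted majority vote would pick "8" for the first character, while the single sharp frame outweighs the two blurry ones under weighting.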

MIDV-500: a dataset for identity document analysis and recognition on mobile devices in video stream

Computer Optics ◽

10.18287/2412-6179-2019-43-5-818-824 ◽

2019 ◽

Vol 43 (5) ◽

pp. 818-824 ◽

Cited By ~ 7

Author(s):

V.V. Arlazarov ◽

K. Bulatov ◽

T. Chernov ◽

V.L. Arlazarov

Keyword(s):

Mobile Devices ◽

Face Detection ◽

Data Extraction ◽

Personal Data ◽

Ground Truth ◽

Document Analysis ◽

Video Stream ◽

Text Line ◽

Document Recognition ◽

Identity Document

A lot of research has been devoted to identity document analysis and recognition on mobile devices. However, no publicly available datasets designed for this particular problem currently exist. There are a few datasets that are useful for associated subtasks, but more specialized datasets are required to facilitate a more comprehensive scientific and technical approach to identity document recognition. In this paper we present a Mobile Identity Document Video dataset (MIDV-500) consisting of 500 video clips for 50 different identity document types with ground truth, which allows research on a wide range of document analysis problems. The paper presents characteristics of the dataset and evaluation results for existing methods of face detection, text line recognition, and document field data extraction. Since an important feature of identity documents is their sensitivity, as they contain personal data, all source document images used in MIDV-500 are either in the public domain or distributed under public copyright licenses. The main goal of this paper is to present the dataset; the evaluation results for existing methods serve as a baseline.

Algorithm for choosing the best frame in a video stream in the task of identity document recognition

Computer Optics ◽

10.18287/2412-6179-co-811 ◽

2021 ◽

Vol 45 (1) ◽

pp. 101-109

Author(s):

M.A. Aliev ◽

I.A. Kunina ◽

A.V. Kazbekov ◽

V.L. Arlazarov

Keyword(s):

Image Quality ◽

Recognition System ◽

Video Stream ◽

Document Image ◽

Document Recognition ◽

Identity Document ◽

Readable Form ◽

Recognition Systems ◽

Set Up

During document recognition in a video stream using a mobile device camera, the image quality of the document varies greatly from frame to frame. Sometimes a recognition system is required not only to recognize all the specified attributes of the document, but also to select the final document image of the best quality. This is necessary, for example, for archiving or providing various services; in some countries it can be required by law. In this case, the recognition system needs to assess the quality of frames in the video stream and choose the “best” frame. In this paper we consider the solution to such a problem where the “best” frame means the presence of all specified attributes in a readable form in the document image. The method was set up on a private dataset and then tested on documents from the open MIDV-2019 dataset. A practically applicable result was obtained for use in recognition systems.
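
One way to read the "best frame" criterion is sketched below: a frame qualifies only if every required attribute is readable, and among qualifying frames the one whose weakest attribute is strongest wins. This is a minimal sketch under stated assumptions, not the algorithm from the paper: the per-field readability scores, the `threshold` value, and the function name `choose_best_frame` are all hypothetical.

```python
def choose_best_frame(frames, required_fields, threshold=0.5):
    """Return the index of the best frame, or None if no frame qualifies.

    frames: list of dicts mapping field name -> readability score in [0, 1]
            (how these scores are produced is outside this sketch).
    required_fields: names of the attributes that must all be readable.
    threshold: assumed minimum score for a field to count as readable.
    """
    if not required_fields:
        raise ValueError("at least one required field is needed")

    best_index, best_score = None, -1.0
    for index, field_scores in enumerate(frames):
        # A field that is absent from the frame counts as unreadable.
        scores = [field_scores.get(name, 0.0) for name in required_fields]
        worst = min(scores)
        # Qualify only if every field is readable; rank by the weakest field.
        if worst >= threshold and worst > best_score:
            best_index, best_score = index, worst
    return best_index


if __name__ == "__main__":
    frames = [
        {"name": 0.9, "number": 0.2},   # number is blurred
        {"name": 0.7, "number": 0.8},   # everything readable
        {"name": 0.95},                 # number missing entirely
    ]
    print(choose_best_frame(frames, ["name", "number"]))
```

Ranking by the weakest field rather than the average reflects the requirement that all attributes be readable; a frame with one excellent and one illegible field should not win.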

Framework for rare event detection using Artificial Neural Network based context free grammar

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-189164 ◽

2020 ◽

Vol 39 (6) ◽

pp. 8463-8475

Author(s):

Palanivel Srinivasan ◽

Manivannan Doraipandian

Keyword(s):

Neural Network ◽

Artificial Neural Network ◽

Event Detection ◽

Performance Metrics ◽

Rare Events ◽

Rare Event ◽

Video Stream ◽

Context Free Grammar ◽

Artificial Neural ◽

Context Free

Rare event detection is performed using spatial-domain and frequency-domain procedures. The volume of footage from omnipresent surveillance cameras is increasing exponentially over time. Monitoring all events manually is impractical and time-consuming. Therefore, an automated rare event detection mechanism is required to make this process manageable. In this work, a Context-Free Grammar (CFG) is developed for detecting rare events from a video stream, and an Artificial Neural Network (ANN) is used to train the CFG. A set of dedicated algorithms is used to perform frame splitting, edge detection, and background subtraction, and to convert the processed data into a CFG. The developed CFG is converted into nodes and edges to form a graph. The graph is given to the input layer of an ANN to classify normal and rare event classes. The graph derived from the CFG using the input video stream is used to train the ANN. Further, the performance of the developed Artificial Neural Network Based Context-Free Grammar – Rare Event Detection (ACFG-RED) is compared with other existing techniques, using performance metrics such as accuracy, precision, sensitivity, recall, average processing time, and average processing power. Better performance metric values have been observed for the ANN-CFG model compared with other techniques. The developed model will provide a better solution for detecting rare events in video streams.

ALGORITHMS FOR IMAGE PRE-PROCESSING IN THE FACE IDENTIFICATION SYSTEM IN THE VIDEO STREAM

Cherepovets State University Bulletin ◽

10.23859/1994-0637-2019-4-91-2 ◽

2019 ◽

Vol 4 (91) ◽

pp. 21-29 ◽

Cited By ~ 1

Author(s):

Yaroslav Trofimenko ◽

Lyudmila Vinogradova ◽

Evgeniy Ershov

Keyword(s):

Video Stream ◽

Face Identification ◽

Identification System ◽

The Face

ON METHODS OF OBJECT DETECTION IN VIDEO STREAMS

Computer systems and network ◽

10.23939/csn2020.01.080 ◽

2017 ◽

Vol 2 (1) ◽

pp. 80-87

Author(s):

Puyda V. ◽

Stoian A.

Keyword(s):

Computer Vision ◽

Object Detection ◽

Open Source ◽

Feature Detection ◽

Video Stream ◽

Object Identification ◽

Vision Systems ◽

Modern Computer ◽

Computer Vision Systems ◽

Open Source Hardware

Detecting objects in a video stream is a typical problem in modern computer vision systems that are used in multiple areas. Object detection can be done both on static images and on frames of a video stream. Essentially, object detection means finding color and intensity non-uniformities that can be treated as physical objects. In addition, the coordinates, size, and other characteristics of these non-uniformities can be computed and used to solve other computer vision problems such as object identification. In this paper, we study three algorithms that can be used to detect objects of different nature and are based on different approaches: detection of color non-uniformities, frame difference, and feature detection. As the input data, we use a video stream obtained from a video camera or from an mp4 video file. Simulations and testing of the algorithms were done on a general-purpose computer based on open-source hardware, built on the Broadcom BCM2711, a quad-core Cortex-A72 (ARM v8) 64-bit SoC processor with a frequency of 1.5 GHz. The software was created in Visual Studio 2019 using OpenCV 4 on Windows 10, and on the open-source hardware computer running Linux (Raspbian Buster OS). In the paper, the methods under consideration are compared. The results of the paper can be used in research and development of modern computer vision systems used for different purposes.

Keywords: object detection, feature points, keypoints, ORB detector, computer vision, motion detection, HSV color model
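
The frame-difference approach can be illustrated with a small sketch that needs no imaging library: pixels whose intensity changes by more than a threshold between two frames are marked as moving, and the coordinates and size of the changed region follow from its bounding box. This is a minimal sketch, not the paper's OpenCV implementation; frames are assumed to be equally sized grayscale images stored as nested lists, and the function names and the `threshold` value are hypothetical.

```python
def frame_difference_mask(previous, current, threshold=25):
    """Mark pixels whose grayscale value changed by more than `threshold`.

    previous, current: equally sized 2D lists of integer intensities (0-255).
    Returns a 2D list of booleans (True where motion is assumed).
    """
    return [
        [abs(cur - prev) > threshold for prev, cur in zip(prev_row, cur_row)]
        for prev_row, cur_row in zip(previous, current)
    ]


def bounding_box(mask):
    """Return (x_min, y_min, x_max, y_max) of the True pixels, or None."""
    points = [
        (x, y)
        for y, row in enumerate(mask)
        for x, moved in enumerate(row)
        if moved
    ]
    if not points:
        return None
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


if __name__ == "__main__":
    previous = [[0] * 4 for _ in range(4)]
    current = [row[:] for row in previous]
    # A bright object appears in the middle of the frame.
    current[1][1] = current[1][2] = current[2][2] = 200
    print(bounding_box(frame_difference_mask(previous, current)))
```

A single bounding box treats all changed pixels as one object; separating several moving objects would require grouping the mask into connected regions first.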

Human action recognition using simple geometric features and a finite state machine

Image Processing & Communications ◽

10.2478/v10248-012-0079-y ◽

2013 ◽

Vol 18 (2-3) ◽

pp. 49-60 ◽

Cited By ~ 2

Author(s):

Damian Dudziński ◽

Tomasz Kryjak ◽

Zbigniew Mikrut

Keyword(s):

Action Recognition ◽

Finite State Machine ◽

Recognition Rate ◽

Human Action Recognition ◽

Human Action ◽

Video Stream ◽

State Machine ◽

Recognition Algorithm ◽

Finite State ◽

Correct Recognition Rate

This paper describes a human action recognition algorithm that uses background generation with shadow elimination, silhouette description based on simple geometric features, and a finite state machine for recognizing particular actions. The performed tests indicate that this approach achieves an 81% correct recognition rate while allowing real-time image processing of a 360 × 288 video stream.
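
To show how a simple geometric feature can drive a finite state machine, the sketch below classifies each silhouette bounding box by its height-to-width ratio and feeds the resulting posture labels into a small transition table. Everything specific here is an assumption made for illustration: the abstract names neither the features, the states, nor the actions, so the posture thresholds, the state names, and the "fall" action are hypothetical.

```python
def posture_from_box(width, height):
    """Classify a silhouette bounding box by aspect ratio (assumed thresholds).

    width and height are assumed to be positive pixel counts.
    """
    ratio = height / width
    if ratio >= 2.0:
        return "standing"
    if ratio >= 1.0:
        return "crouching"
    return "lying"


# Hypothetical transition table: (state, observed posture) -> next state.
# Pairs that are not listed leave the state unchanged.
TRANSITIONS = {
    ("idle", "standing"): "upright",
    ("upright", "crouching"): "going_down",
    ("going_down", "standing"): "upright",
    ("going_down", "lying"): "fallen",
}


def recognize_fall(boxes):
    """Return True if the box sequence drives the machine to 'fallen'."""
    state = "idle"
    for width, height in boxes:
        posture = posture_from_box(width, height)
        state = TRANSITIONS.get((state, posture), state)
        if state == "fallen":
            return True
    return False


if __name__ == "__main__":
    sequence = [(40, 120), (60, 90), (120, 40)]  # (width, height) per frame
    print(recognize_fall(sequence))
```

Because the machine only advances on listed transitions, a person who is lying down from the first frame is not reported as a fall; the action is defined by the order of postures rather than by any single frame.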

An Efficient Radio Frequency Interference (RFI) Recognition and Characterization Using End-to-End Transfer Learning

Applied Sciences ◽

10.3390/app10196885 ◽

2020 ◽

Vol 10 (19) ◽

pp. 6885

Author(s):

Sahar Ujan ◽

Neda Navidi ◽

Rene Jr Landry

Keyword(s):

Feature Extraction ◽

Radio Frequency ◽

Transfer Learning ◽

Communication Networks ◽

Continuous Wave ◽

Critical Role ◽

Video Stream ◽

Radio Frequency Interference ◽

Wave Interference ◽

Continuous Wave Interference

Radio Frequency Interference (RFI) detection and characterization play a critical role in ensuring the security of all wireless communication networks. Advances in Machine Learning (ML) have led to the deployment of many robust techniques dealing with various types of RFI. To sidestep the otherwise unavoidable and complicated feature extraction step in ML, we propose an efficient Deep Learning (DL)-based methodology using transfer learning to determine both the type of received signals and their modulation type. To this end, the scalogram of the received signals is used as the input to pretrained convolutional neural networks (CNNs), followed by a fully connected classifier. This study considers a digital video stream as the signal of interest (SoI), transmitted in real-time satellite-to-ground communication using the DVB-S2 standard. To create the RFI dataset, the SoI is combined with three well-known jammers, namely continuous-wave interference (CWI), multi-continuous-wave interference (MCWI), and chirp interference (CI). This study investigates four well-known pretrained CNN architectures, namely AlexNet, VGG-16, GoogleNet, and ResNet-18, for feature extraction to recognize the visual RFI patterns directly from pixel images with minimal preprocessing. Moreover, the robustness of the proposed classifiers is evaluated on data generated at different signal-to-noise ratios (SNR).

DL-based segmentation of endoscopic scenes for mitral valve repair

Current Directions in Biomedical Engineering ◽

10.1515/cdbme-2020-0017 ◽

2020 ◽

Vol 6 (1) ◽

Author(s):

Matthias Ivantsits ◽

Lennart Tautz ◽

Simon Sündermann ◽

Isaac Wamala ◽

Jörg Kempfert ◽

...

Keyword(s):

Mitral Valve ◽

Minimally Invasive ◽

Mitral Valve Repair ◽

Video Stream ◽

Field Of View ◽

Valve Repair ◽

Distance Measurements ◽

Anatomical Structures ◽

Endoscopic Video ◽

Camera Position

Minimally invasive surgery is increasingly utilized for mitral valve repair and replacement. The intervention is performed with an endoscopic field of view on the arrested heart. Extracting the necessary information from the live endoscopic video stream is challenging due to the moving camera position, the high variability of defects, and occlusion of structures by instruments. During such minimally invasive interventions there is no time to segment regions of interest manually. We propose a real-time-capable deep-learning-based approach to detect and segment the relevant anatomical structures and instruments. For the universal deployment of the proposed solution, we evaluate the models on pixel accuracy as well as distance measurements of the detected contours. The U-Net, Google’s DeepLab v3, and the Obelisk-Net models are cross-validated, with DeepLab showing superior results in pixel accuracy and distance measurements.
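
The two kinds of evaluation mentioned above can be made concrete with a short sketch. Pixel accuracy is the fraction of pixels whose predicted label matches the ground truth. For the contour distance, the abstract does not specify the exact measure, so the sketch assumes a one-directional mean closest-point distance; the function names are hypothetical.

```python
import math


def pixel_accuracy(predicted, truth):
    """Fraction of pixels with matching labels in two equally sized 2D masks."""
    total = 0
    correct = 0
    for predicted_row, truth_row in zip(predicted, truth):
        for predicted_label, truth_label in zip(predicted_row, truth_row):
            total += 1
            if predicted_label == truth_label:
                correct += 1
    return correct / total if total else 0.0


def mean_contour_distance(contour_a, contour_b):
    """Mean distance from each point of contour_a to its nearest point of contour_b.

    Contours are non-empty lists of (x, y) points. This one-directional
    measure is an assumption of the sketch, not the paper's stated metric.
    """
    nearest = [
        min(math.dist(point_a, point_b) for point_b in contour_b)
        for point_a in contour_a
    ]
    return sum(nearest) / len(nearest)


if __name__ == "__main__":
    predicted = [[0, 1], [1, 1]]
    truth = [[0, 1], [0, 1]]
    print(pixel_accuracy(predicted, truth))
    print(mean_contour_distance([(0, 0)], [(3, 4), (6, 8)]))
```

Pixel accuracy can look high when the background dominates the image, which is one reason a contour-based distance is a useful complement for thin structures.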
