Experimental modeling of the flow of character recognition results in a video stream for document recognition

Author(s):  
Elena Andreeva ◽  
Vladimir V. Arlazarov ◽  
Oleg Slavin ◽  
Igor Janiszewski
2021 ◽  
Vol 45 (1) ◽  
pp. 77-89
Author(s):  
O. Petrova ◽  
K. Bulatov ◽  
V.V. Arlazarov ◽  
V.L. Arlazarov

The scope of application of automated document recognition has expanded and, as a result, recognition techniques that do not require specialized equipment have become more relevant. Among such techniques, document recognition using mobile devices is of particular interest. However, it is not always possible to ensure controlled capture conditions and, consequently, high quality of input images. Unlike specialized scanners, mobile cameras allow using a video stream as input, thus obtaining several images of the recognized object captured with varying characteristics. In this case, the problem of combining information from multiple input frames arises. In this paper, we propose a weighting model for the process of combining per-frame recognition results, two approaches to the weighted combination of text recognition results, and two weighting criteria. The effectiveness of the proposed approaches is tested using datasets of identity documents captured with a mobile device camera under different conditions, including perspective distortion of the document image and low lighting. The experimental results show that weighted combination can improve the quality of the text recognition result in a video stream, and that the per-character weighting method with input image focus estimation as the base criterion achieves the best results on the datasets analyzed.
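The per-frame combination idea can be illustrated with a minimal sketch: each frame yields, for each character position, a confidence distribution over candidate characters, and each frame carries a weight (e.g. a focus estimate). The function and data below are illustrative assumptions, not the authors' exact formulation.

```python
def combine_frames(frames, weights):
    """Per-character weighted combination of per-frame recognition results.

    frames: list of per-frame results; each result is a list of
            per-character dicts mapping candidate character -> confidence.
    weights: one weight per frame (e.g. an image focus estimate).
    """
    n_chars = len(frames[0])
    combined = []
    for pos in range(n_chars):
        # Accumulate weighted confidence for each candidate character.
        scores = {}
        for frame, w in zip(frames, weights):
            for ch, conf in frame[pos].items():
                scores[ch] = scores.get(ch, 0.0) + w * conf
        combined.append(max(scores, key=scores.get))
    return "".join(combined)

# A sharp frame (weight 0.8) and a blurry frame (weight 0.2).
frame1 = [{"A": 0.9, "4": 0.1}, {"B": 0.6, "8": 0.4}]
frame2 = [{"A": 0.7, "4": 0.3}, {"8": 0.7, "B": 0.3}]
print(combine_frames([frame1, frame2], [0.8, 0.2]))  # -> AB
```

Weighting by an image quality criterion lets a sharp frame outvote several blurry ones at the character level.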


2019 ◽  
Vol 43 (5) ◽  
pp. 818-824 ◽  
Author(s):  
V.V. Arlazarov ◽  
K. Bulatov ◽  
T. Chernov ◽  
V.L. Arlazarov

A lot of research has been devoted to identity document analysis and recognition on mobile devices. However, no publicly available datasets designed for this particular problem currently exist. There are a few datasets useful for associated subtasks, but in order to facilitate a more comprehensive scientific and technical approach to identity document recognition, more specialized datasets are required. In this paper we present the Mobile Identity Document Video dataset (MIDV-500), consisting of 500 video clips of 50 different identity document types with ground truth, which enables research into a wide scope of document analysis problems. The paper presents the characteristics of the dataset and evaluation results for existing methods of face detection, text line recognition, and document field data extraction. Since an important feature of identity documents is their sensitivity, as they contain personal data, all source document images used in MIDV-500 are either in the public domain or distributed under public copyright licenses. The main goal of this paper is to present the dataset; in addition, as a baseline, we report evaluation results of existing methods for face detection, text line recognition, and document data extraction on it.
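A baseline field-extraction evaluation of the kind described can be sketched as exact-match accuracy of recognized field values against ground truth. Field names and values here are illustrative, not taken from the MIDV-500 annotation files.

```python
def field_accuracy(recognized, ground_truth):
    """Fraction of ground-truth fields whose recognized value matches exactly."""
    correct = sum(1 for name, value in ground_truth.items()
                  if recognized.get(name) == value)
    return correct / len(ground_truth)

# Hypothetical ground truth and recognition output for one document.
gt = {"name": "JANE DOE", "number": "AB1234567", "date": "01.02.2003"}
rec = {"name": "JANE DOE", "number": "AB1234S67", "date": "01.02.2003"}
print(field_accuracy(rec, gt))  # 2 of 3 fields match exactly
```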


2021 ◽  
Author(s):  
Umadevi T P ◽  
Murugan A

In handwritten multilanguage recognition, the preprocessing phase improves image quality for better identification by the system. The main goals of preprocessing are noise suppression and line cancellation. After preprocessing, various feature extraction techniques are used to obtain feature properties for the identification process. Smoothing plays an important role in character recognition. The segmentation process in the word distribution strategy can be divided into global and local text levels. The writer does not use a header line when writing the text, which creates problems for skew correction, classification, and recognition. The datasets used are HWSC and TST1. A TensorFlow-based method is used to estimate the confusion matrix for the enhancement of text recognition. The accuracy of the proposed method is 98%.
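The smoothing step mentioned above can be illustrated with a simple majority filter on a binarized image, which removes isolated noise pixels before feature extraction. This is a generic sketch, not the paper's specific preprocessing method.

```python
def smooth_binary(image):
    """3x3 majority smoothing for a binary image (list of lists of 0/1):
    each pixel becomes the majority value of its neighbourhood."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            votes = []
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        votes.append(image[ny][nx])
            # Strict majority of the neighbourhood (including the pixel itself).
            out[y][x] = 1 if sum(votes) * 2 > len(votes) else 0
    return out

noisy = [[0, 0, 0],
         [0, 1, 0],
         [0, 0, 0]]
print(smooth_binary(noisy))  # the isolated foreground pixel is removed
```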


Author(s):  
V.V. Arlazarov ◽  
O.A. Slavin ◽  
A.V. Uskov ◽  
I.M. Janiszewski ◽  
...  

2021 ◽  
Vol 45 (1) ◽  
pp. 101-109
Author(s):  
M.A. Aliev ◽  
I.A. Kunina ◽  
A.V. Kazbekov ◽  
V.L. Arlazarov

During document recognition in a video stream using a mobile device camera, the image quality of the document varies greatly from frame to frame. Sometimes the recognition system is required not only to recognize all the specified attributes of the document, but also to select the final document image of the best quality. This is necessary, for example, for archiving or for providing various services; in some countries it may be required by law. In this case, the recognition system needs to assess the quality of the frames in the video stream and choose the "best" frame. In this paper we consider a solution to this problem, where the "best" frame means one in which all specified attributes are present in readable form in the document image. The method was tuned on a private dataset and then tested on documents from the open MIDV-2019 dataset. A practically applicable result was obtained for use in recognition systems.
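The selection criterion described above can be sketched as follows: discard frames where any required attribute is unreadable, then pick the remaining frame with the highest aggregate quality score. The data layout and scoring function are illustrative assumptions, not the authors' implementation.

```python
def best_frame(frames, score_field):
    """Pick the index of the 'best' frame: every required field must be
    readable, and the summed per-field quality score must be highest.

    frames: list of dicts mapping field name -> field info dict.
    score_field: callable returning a quality score for a field info dict.
    Returns the winning frame index, or None if no frame qualifies.
    """
    best_idx, best_score = None, float("-inf")
    for i, fields in enumerate(frames):
        # Reject frames with any unreadable attribute.
        if not all(f["readable"] for f in fields.values()):
            continue
        total = sum(score_field(f) for f in fields.values())
        if total > best_score:
            best_idx, best_score = i, total
    return best_idx

frames = [
    {"name": {"readable": True, "score": 0.9},
     "number": {"readable": False, "score": 0.9}},  # rejected: unreadable field
    {"name": {"readable": True, "score": 0.8},
     "number": {"readable": True, "score": 0.6}},
]
print(best_frame(frames, lambda f: f["score"]))  # -> 1
```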


Author(s):  
Konstantin Bulatov ◽  
Vladimir V. Arlazarov ◽  
Timofey Chernov ◽  
Oleg Slavin ◽  
Dmitry Nikolaev

2020 ◽  
Vol 39 (6) ◽  
pp. 8463-8475
Author(s):  
Palanivel Srinivasan ◽  
Manivannan Doraipandian

Rare event detection is performed using spatial-domain and frequency-domain procedures. The volume of footage from ubiquitous surveillance cameras is growing exponentially over time, and monitoring all events manually is an impractical and time-consuming process. Therefore, an automated rare event detection mechanism is required to make this process manageable. In this work, a Context-Free Grammar (CFG) is developed for detecting rare events in a video stream, and an Artificial Neural Network (ANN) is trained on the CFG. A set of dedicated algorithms performs frame splitting, edge detection, and background subtraction, and converts the processed data into the CFG. The developed CFG is converted into nodes and edges to form a graph, which is given to the input layer of the ANN to classify normal and rare event classes. The graph derived from the CFG on the input video stream is used to train the ANN. The performance of the developed Artificial Neural Network Based Context-Free Grammar Rare Event Detection (ACFG-RED) method is compared with other existing techniques using performance metrics such as accuracy, precision, sensitivity, recall, average processing time, and average processing power. Better metric values were observed for the ANN-CFG model compared with the other techniques. The developed model provides a better solution for detecting rare events in video streams.
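The background subtraction step in the pipeline above can be illustrated with simple frame differencing against a reference background frame. This is a generic sketch with plain nested lists, not the paper's dedicated algorithm.

```python
def background_subtract(frames, threshold=30):
    """Frame differencing against the first frame as background.

    frames: list of 2D grayscale frames (lists of lists of ints, 0-255).
    Returns one binary foreground mask per subsequent frame: a pixel is
    foreground (1) if it differs from the background by more than threshold.
    """
    background = frames[0]
    masks = []
    for frame in frames[1:]:
        mask = [[1 if abs(p - b) > threshold else 0
                 for p, b in zip(row, brow)]
                for row, brow in zip(frame, background)]
        masks.append(mask)
    return masks

# An empty scene, then a frame where one pixel brightens sharply.
frames = [[[0, 0], [0, 0]],
          [[100, 0], [0, 0]]]
print(background_subtract(frames))  # -> [[[1, 0], [0, 0]]]
```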


1997 ◽  
Vol 9 (1-3) ◽  
pp. 58-77
Author(s):  
Vitaly Kliatskine ◽  
Eugene Shchepin ◽  
Gunnar Thorvaldsen ◽  
Konstantin Zingerman ◽  
Valery Lazarev

In principle, printed source material should be made machine-readable with systems for Optical Character Recognition, rather than being typed once more. Off-the-shelf commercial OCR programs tend, however, to be inadequate for lists with a complex layout. The tax assessment lists that assess most nineteenth-century farms in Norway constitute one example among a series of valuable sources which can only be interpreted successfully with specially designed OCR software. This paper considers the problems involved in the recognition of material with a complex table structure, outlining a new algorithmic model based on 'linked hierarchies'. Within the scope of this model, a variety of tables and layouts can be described and recognized. The 'linked hierarchies' model has been implemented in the 'CRIPT' OCR software system, which successfully reads tables with a complex structure from several different historical sources.
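A toy illustration of describing a table layout as a hierarchy of regions (table, then rows, then cells): enumerating root-to-leaf paths yields the cell types the recognizer must locate. The node structure and labels are hypothetical; this is not the CRIPT implementation, and the linking of alternative hierarchies is omitted.

```python
# A layout hierarchy node is (label, list of child nodes).
table = ("table", [
    ("row", [("farm_name", []), ("assessment", [])]),
    ("row", [("farm_name", []), ("assessment", [])]),
])

def leaf_paths(node, prefix=()):
    """Enumerate root-to-leaf label paths of a layout hierarchy."""
    label, children = node
    path = prefix + (label,)
    if not children:
        return [path]
    paths = []
    for child in children:
        paths.extend(leaf_paths(child, path))
    return paths

for p in leaf_paths(table):
    print("/".join(p))  # e.g. table/row/farm_name
```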

