Experimental modeling of the flow of character recognition results in a video stream for document recognition

Author(s):  
Elena Andreeva ◽  
Vladimir V. Arlazarov ◽  
Oleg Slavin ◽  
Igor Janiszewski
2021 ◽  
Vol 45 (1) ◽  
pp. 77-89
Author(s):  
O. Petrova ◽  
K. Bulatov ◽  
V.V. Arlazarov ◽  
V.L. Arlazarov

The scope of application of automated document recognition has expanded and, as a result, recognition techniques that do not require specialized equipment have become more relevant. Among such techniques, document recognition using mobile devices is of particular interest. However, it is not always possible to ensure controlled capture conditions and, consequently, high quality of input images. Unlike specialized scanners, mobile cameras allow using a video stream as input, thus obtaining several images of the recognized object captured with varying characteristics. In this case, the problem of combining information from multiple input frames arises. In this paper, we propose a weighting model for the process of combining per-frame recognition results, two approaches to the weighted combination of text recognition results, and two weighting criteria. The effectiveness of the proposed approaches is tested using datasets of identity documents captured with a mobile device camera under different conditions, including perspective distortion of the document image and low lighting. The experimental results show that weighted combination can improve the quality of the text recognition result in a video stream, and that the per-character weighting method with input image focus estimation as the base criterion achieves the best results on the datasets analyzed.
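The per-frame combination idea can be illustrated with a minimal sketch: each frame yields, for each character position, a confidence distribution over candidate characters, and each frame carries a weight (e.g. a focus estimate). The function and data below are illustrative assumptions, not the authors' exact formulation.

```python
def combine_frames(frames, weights):
    """Per-character weighted combination of per-frame recognition results.

    frames: list of per-frame results; each result is a list of
            per-character dicts mapping candidate character -> confidence.
    weights: one weight per frame (e.g. an image focus estimate).
    """
    n_chars = len(frames[0])
    combined = []
    for pos in range(n_chars):
        # Accumulate weighted confidence for each candidate character.
        scores = {}
        for frame, w in zip(frames, weights):
            for ch, conf in frame[pos].items():
                scores[ch] = scores.get(ch, 0.0) + w * conf
        combined.append(max(scores, key=scores.get))
    return "".join(combined)

# A sharp frame (weight 0.8) and a blurry frame (weight 0.2).
frame1 = [{"A": 0.9, "4": 0.1}, {"B": 0.6, "8": 0.4}]
frame2 = [{"A": 0.7, "4": 0.3}, {"8": 0.7, "B": 0.3}]
print(combine_frames([frame1, frame2], [0.8, 0.2]))  # -> AB
```

Weighting by an image quality criterion lets a sharp frame outvote several blurry ones at the character level.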


2019 ◽  
Vol 43 (5) ◽  
pp. 818-824 ◽  
Author(s):  
V.V. Arlazarov ◽  
K. Bulatov ◽  
T. Chernov ◽  
V.L. Arlazarov

A lot of research has been devoted to identity document analysis and recognition on mobile devices. However, no publicly available datasets designed for this particular problem currently exist. There are a few datasets useful for associated subtasks, but in order to facilitate a more comprehensive scientific and technical approach to identity document recognition, more specialized datasets are required. In this paper we present the Mobile Identity Document Video dataset (MIDV-500), consisting of 500 video clips of 50 different identity document types with ground truth, which enables research into a wide scope of document analysis problems. The paper presents the characteristics of the dataset and evaluation results for existing methods of face detection, text line recognition, and document field data extraction. Since an important feature of identity documents is their sensitivity, as they contain personal data, all source document images used in MIDV-500 are either in the public domain or distributed under public copyright licenses. The main goal of this paper is to present the dataset; in addition, as a baseline, we report evaluation results of existing methods for face detection, text line recognition, and document data extraction on it.
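A baseline field-extraction evaluation of the kind described can be sketched as exact-match accuracy of recognized field values against ground truth. Field names and values here are illustrative, not taken from the MIDV-500 annotation files.

```python
def field_accuracy(recognized, ground_truth):
    """Fraction of ground-truth fields whose recognized value matches exactly."""
    correct = sum(1 for name, value in ground_truth.items()
                  if recognized.get(name) == value)
    return correct / len(ground_truth)

# Hypothetical ground truth and recognition output for one document.
gt = {"name": "JANE DOE", "number": "AB1234567", "date": "01.02.2003"}
rec = {"name": "JANE DOE", "number": "AB1234S67", "date": "01.02.2003"}
print(field_accuracy(rec, gt))  # 2 of 3 fields match exactly
```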


2021 ◽  
Author(s):  
Umadevi T P ◽  
Murugan A

In handwritten multilanguage recognition, the preprocessing phase improves image quality for better identification by the system. The main goals of preprocessing are noise suppression and line cancellation. After preprocessing, various feature extraction techniques are used to obtain feature properties for the identification process. Smoothing plays an important role in character recognition. The segmentation process in the word distribution strategy can be divided into global and local text levels. The writer does not use a header line when writing the text, which creates problems for skew correction, classification, and recognition. The datasets used are HWSC and TST1. A TensorFlow-based method is used to estimate the confusion matrix for the enhancement of text recognition. The accuracy of the proposed method is 98%.
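The smoothing step mentioned above can be illustrated with a simple majority filter on a binarized image, which removes isolated noise pixels before feature extraction. This is a generic sketch, not the paper's specific preprocessing method.

```python
def smooth_binary(image):
    """3x3 majority smoothing for a binary image (list of lists of 0/1):
    each pixel becomes the majority value of its neighbourhood."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            votes = []
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        votes.append(image[ny][nx])
            # Strict majority of the neighbourhood (including the pixel itself).
            out[y][x] = 1 if sum(votes) * 2 > len(votes) else 0
    return out

noisy = [[0, 0, 0],
         [0, 1, 0],
         [0, 0, 0]]
print(smooth_binary(noisy))  # the isolated foreground pixel is removed
```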


Author(s):  
V.V. Arlazarov ◽  
O.A. Slavin ◽  
A.V. Uskov ◽  
I.M. Janiszewski ◽  
...  

2021 ◽  
Vol 45 (1) ◽  
pp. 101-109
Author(s):  
M.A. Aliev ◽  
I.A. Kunina ◽  
A.V. Kazbekov ◽  
V.L. Arlazarov

During document recognition in a video stream using a mobile device camera, the image quality of the document varies greatly from frame to frame. Sometimes the recognition system is required not only to recognize all the specified attributes of the document, but also to select the final document image of the best quality. This is necessary, for example, for archiving or for providing various services; in some countries it may be required by law. In this case, the recognition system needs to assess the quality of the frames in the video stream and choose the "best" frame. In this paper we consider a solution to this problem, where the "best" frame means one in which all specified attributes are present in readable form in the document image. The method was tuned on a private dataset and then tested on documents from the open MIDV-2019 dataset. A practically applicable result was obtained for use in recognition systems.
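The selection criterion described above can be sketched as follows: discard frames where any required attribute is unreadable, then pick the remaining frame with the highest aggregate quality score. The data layout and scoring function are illustrative assumptions, not the authors' implementation.

```python
def best_frame(frames, score_field):
    """Pick the index of the 'best' frame: every required field must be
    readable, and the summed per-field quality score must be highest.

    frames: list of dicts mapping field name -> field info dict.
    score_field: callable returning a quality score for a field info dict.
    Returns the winning frame index, or None if no frame qualifies.
    """
    best_idx, best_score = None, float("-inf")
    for i, fields in enumerate(frames):
        # Reject frames with any unreadable attribute.
        if not all(f["readable"] for f in fields.values()):
            continue
        total = sum(score_field(f) for f in fields.values())
        if total > best_score:
            best_idx, best_score = i, total
    return best_idx

frames = [
    {"name": {"readable": True, "score": 0.9},
     "number": {"readable": False, "score": 0.9}},  # rejected: unreadable field
    {"name": {"readable": True, "score": 0.8},
     "number": {"readable": True, "score": 0.6}},
]
print(best_frame(frames, lambda f: f["score"]))  # -> 1
```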


Author(s):  
Konstantin Bulatov ◽  
Vladimir V. Arlazarov ◽  
Timofey Chernov ◽  
Oleg Slavin ◽  
Dmitry Nikolaev

2020 ◽  
Vol 39 (6) ◽  
pp. 8463-8475
Author(s):  
Palanivel Srinivasan ◽  
Manivannan Doraipandian

Rare event detection is performed using spatial-domain and frequency-domain procedures. The volume of footage from ubiquitous surveillance cameras is growing exponentially over time, and monitoring all events manually is an impractical and time-consuming process. Therefore, an automated rare event detection mechanism is required to make this process manageable. In this work, a Context-Free Grammar (CFG) is developed for detecting rare events in a video stream, and an Artificial Neural Network (ANN) is trained on the CFG. A set of dedicated algorithms performs frame splitting, edge detection, and background subtraction, and converts the processed data into the CFG. The developed CFG is converted into nodes and edges to form a graph, which is given to the input layer of the ANN to classify normal and rare event classes. The graph derived from the CFG on the input video stream is used to train the ANN. The performance of the developed Artificial Neural Network Based Context-Free Grammar Rare Event Detection (ACFG-RED) method is compared with other existing techniques using performance metrics such as accuracy, precision, sensitivity, recall, average processing time, and average processing power. Better metric values were observed for the ANN-CFG model compared with the other techniques. The developed model provides a better solution for detecting rare events in video streams.
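The background subtraction step in the pipeline above can be illustrated with simple frame differencing against a reference background frame. This is a generic sketch with plain nested lists, not the paper's dedicated algorithm.

```python
def background_subtract(frames, threshold=30):
    """Frame differencing against the first frame as background.

    frames: list of 2D grayscale frames (lists of lists of ints, 0-255).
    Returns one binary foreground mask per subsequent frame: a pixel is
    foreground (1) if it differs from the background by more than threshold.
    """
    background = frames[0]
    masks = []
    for frame in frames[1:]:
        mask = [[1 if abs(p - b) > threshold else 0
                 for p, b in zip(row, brow)]
                for row, brow in zip(frame, background)]
        masks.append(mask)
    return masks

# An empty scene, then a frame where one pixel brightens sharply.
frames = [[[0, 0], [0, 0]],
          [[100, 0], [0, 0]]]
print(background_subtract(frames))  # -> [[[1, 0], [0, 0]]]
```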


1997 ◽  
Vol 9 (1-3) ◽  
pp. 58-77
Author(s):  
Vitaly Kliatskine ◽  
Eugene Shchepin ◽  
Gunnar Thorvaldsen ◽  
Konstantin Zingerman ◽  
Valery Lazarev

In principle, printed source material should be made machine-readable with systems for Optical Character Recognition, rather than being typed once more. Off-the-shelf commercial OCR programs tend, however, to be inadequate for lists with a complex layout. The tax assessment lists that assess most nineteenth-century farms in Norway constitute one example among a series of valuable sources which can only be interpreted successfully with specially designed OCR software. This paper considers the problems involved in the recognition of material with a complex table structure, outlining a new algorithmic model based on 'linked hierarchies'. Within the scope of this model, a variety of tables and layouts can be described and recognized. The 'linked hierarchies' model has been implemented in the 'CRIPT' OCR software system, which successfully reads tables with a complex structure from several different historical sources.
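A toy illustration of describing a table layout as a hierarchy of regions (table, then rows, then cells): enumerating root-to-leaf paths yields the cell types the recognizer must locate. The node structure and labels are hypothetical; this is not the CRIPT implementation, and the linking of alternative hierarchies is omitted.

```python
# A layout hierarchy node is (label, list of child nodes).
table = ("table", [
    ("row", [("farm_name", []), ("assessment", [])]),
    ("row", [("farm_name", []), ("assessment", [])]),
])

def leaf_paths(node, prefix=()):
    """Enumerate root-to-leaf label paths of a layout hierarchy."""
    label, children = node
    path = prefix + (label,)
    if not children:
        return [path]
    paths = []
    for child in children:
        paths.extend(leaf_paths(child, path))
    return paths

for p in leaf_paths(table):
    print("/".join(p))  # e.g. table/row/farm_name
```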

