Performance of Document Image OCR Systems for Recognizing Video Texts on Embedded Platform

Author(s):  
Tanushyam Chattopadhyay ◽  
Priyanka Sinha ◽  
Provat Biswas
2019 ◽  
Vol 2 (3) ◽  
pp. 206-215
Author(s):  
Alesya Ishchenko ◽  
Alexandr Nesteryuk ◽  
Marina Polyakova

2020 ◽  
Vol 2020 (9) ◽  
pp. 323-1-323-8
Author(s):  
Litao Hu ◽  
Zhenhua Hu ◽  
Peter Bauer ◽  
Todd J. Harris ◽  
Jan P. Allebach

Image quality assessment has been a very active research area in image processing, and numerous methods have been proposed. However, most existing methods focus on digital images that only or mainly contain pictures or photos taken by digital cameras. Traditional approaches evaluate an input image as a whole and try to estimate a quality score for it, in order to give viewers an idea of how “good” the image looks. In this paper, we mainly focus on evaluating the quality of symbolic content such as text, barcodes, QR codes, lines, and handwriting in target images. A quality score for this kind of information can be based on whether or not it is readable by a human or recognizable by a decoder. Moreover, we mainly study the viewing quality of scanned documents of printed images. For this purpose, we propose a novel image quality assessment algorithm that is able to determine the readability of a scanned document or of regions in a scanned document. Experimental results on a set of test images demonstrate the effectiveness of our method.


2020 ◽  
Vol 64 (3) ◽  
pp. 30401-1-30401-14 ◽  
Author(s):  
Chih-Hsien Hsia ◽  
Ting-Yu Lin ◽  
Jen-Shiun Chiang

Abstract In recent years, the preservation of handwritten historical documents and scripts as digitized images has received growing attention. However, depending on the thickness of the paper used for printing or writing, the content of the back page can seep through to the front page. To solve this, a cost-efficient document image system is proposed. In this system, the authors use the Adaptive Directional Lifting-Based Discrete Wavelet Transform to transform image data from the spatial domain to the frequency domain and process the high and low frequencies separately. For low frequencies, the authors use a local threshold to remove most background information. For high frequencies, they use a modified Least Mean Square training algorithm to produce a unique weighted mask and perform convolution on the original frequency components. Afterward, the Inverse Adaptive Directional Lifting-Based Discrete Wavelet Transform is performed to reconstruct the four subband images into a resulting image of the original size. Finally, a global binarization method, Otsu’s method, is applied to transform the grayscale image into a binary image as the output. The results show that the difference in operation time between a personal computer (PC) and a Raspberry Pi is small. Therefore, the proposed cost-efficient document image system running on the Raspberry Pi embedded platform has the same performance and obtains the same results as on a PC.
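
The final step named in this abstract, global binarization with Otsu's method, can be sketched as follows. This is a minimal pure-Python illustration of the textbook technique (choose the threshold that maximizes between-class variance), not the authors' implementation; the function names `otsu_threshold` and `binarize` and the 8-bit grayscale assumption are ours, and the wavelet and Least Mean Square stages are not reproduced.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that maximizes between-class variance.

    Pixels <= t form the dark class, pixels > t the bright class.
    `pixels` is a flat iterable of integers in [0, levels).
    """
    hist = [0] * levels
    total = 0
    for p in pixels:
        hist[p] += 1
        total += 1
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the dark class so far
    sum0 = 0  # sum of intensities in the dark class so far
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue  # dark class still empty
        if w1 == 0:
            break     # bright class empty from here on
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Between-class variance up to a constant factor 1 / total**2.
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


def binarize(image, threshold):
    """Map a 2D list of gray levels to 0 (<= threshold) or 255 (> threshold)."""
    return [[255 if p > threshold else 0 for p in row] for row in image]


if __name__ == "__main__":
    # Hypothetical 3x3 image: dark ink values near 25, paper near 210.
    image = [[20, 30, 200], [25, 210, 220], [15, 205, 35]]
    t = otsu_threshold([p for row in image for p in row])
    print(t, binarize(image, t))
```

Because the threshold is computed once from the global histogram, the cost is a single pass over the pixels plus a loop over 256 levels, which is consistent with the abstract's point that this step is cheap enough for a Raspberry Pi.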


2017 ◽  
Vol 6 (1) ◽  
pp. 10
Author(s):  
SHREEKANTH T. ◽  
SOWRABHU D ◽  
◽  

2021 ◽  
Vol 2021 (1) ◽  
Author(s):  
Wei Xiong ◽  
Lei Zhou ◽  
Ling Yue ◽  
Lirong Li ◽  
Song Wang

Abstract Binarization plays an important role in document analysis and recognition (DAR) systems. In this paper, we present our winning algorithm in the ICFHR 2018 competition on handwritten document image binarization (H-DIBCO 2018), which is based on background estimation and energy minimization. First, we adopt mathematical morphological operations to estimate and compensate the document background. This step uses a disk-shaped structuring element, whose radius is computed by the minimum entropy-based stroke width transform (SWT). Second, we perform Laplacian energy-based segmentation on the compensated document images. Finally, we implement post-processing to preserve text stroke connectivity and eliminate isolated noise. Experimental results indicate that the proposed method outperforms other state-of-the-art techniques on several publicly available benchmark datasets.
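
The first step of this abstract, morphological background estimation with a disk-shaped structuring element, can be sketched as below. The sketch assumes dark text on a lighter background and uses a grayscale closing (dilation followed by erosion) as the estimator. The fixed `radius` argument stands in for the SWT-derived radius, and the subtraction-based `compensate` rule is our own assumption, since the abstract does not give the compensation formula. All function names are hypothetical.

```python
def disk_offsets(radius):
    """Offsets (dy, dx) of a disk-shaped structuring element."""
    r2 = radius * radius
    return [(dy, dx)
            for dy in range(-radius, radius + 1)
            for dx in range(-radius, radius + 1)
            if dy * dy + dx * dx <= r2]


def _morph(img, offsets, pick):
    """Apply `pick` (max = dilation, min = erosion) over each neighborhood.

    Neighbors that fall outside the image are ignored.
    """
    h, w = len(img), len(img[0])
    return [[pick(img[y + dy][x + dx]
                  for dy, dx in offsets
                  if 0 <= y + dy < h and 0 <= x + dx < w)
             for x in range(w)]
            for y in range(h)]


def estimate_background(img, radius):
    """Grayscale closing: removes dark strokes narrower than the disk."""
    offsets = disk_offsets(radius)
    return _morph(_morph(img, offsets, max), offsets, min)


def compensate(img, background):
    """Assumed rule: flatten the background to white (255), keep stroke contrast."""
    return [[max(0, min(255, 255 - (b - p)))
             for p, b in zip(img_row, bg_row)]
            for img_row, bg_row in zip(img, background)]


if __name__ == "__main__":
    # Hypothetical 7x7 page: gray background (200) with a 1-pixel dark stroke (30).
    page = [[30 if x == 3 else 200 for x in range(7)] for _ in range(7)]
    bg = estimate_background(page, radius=2)
    flat = compensate(page, bg)
    print(bg[3], flat[3])
```

The radius matters: the disk has to be wider than the strokes so that the closing erases them, which is presumably why the paper derives it from a stroke width estimate rather than fixing it.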


Sensors ◽  
2021 ◽  
Vol 21 (12) ◽  
pp. 3961
Author(s):  
Daniela De Venuto ◽  
Giovanni Mezzina

In this paper, we propose a breakthrough single-trial P300 detector that maximizes the information transfer rate (ITR) of the brain–computer interface (BCI) while keeping high recognition accuracy. The architecture, designed to improve the portability of the algorithm, demonstrated full implementability on a dedicated embedded platform. The proposed P300 detector combines a novel pre-processing stage based on EEG signal symbolization with an autoencoded convolutional neural network (CNN). The proposed system acquires data from only six EEG channels and processes them with a low-complexity pre-processing stage including baseline correction, winsorizing and symbolization. The symbolized EEG signals are then sent to an autoencoder model to emphasize those temporal features that can be meaningful for the following CNN stage. The latter consists of a seven-layer CNN, including a 1D convolutional layer and three dense ones. Two datasets have been analyzed to assess the algorithm performance: one from a P300 speller application in BCI competition III data and one from self-collected data during a fluid prototype car driving experiment. Experimental results on the P300 speller dataset showed that the proposed method achieves an average ITR (on two subjects) of 16.83 bits/min, outperforming the state-of-the-art for this parameter by +5.75 bits/min. Jointly with the speed increase, the recognition performance returned disruptive results in terms of the harmonic mean of precision and recall (F1-Score), which achieves 51.78 ± 6.24%. The same method used in the prototype car driving led to an ITR of ~33 bits/min with an F1-Score of 70.00% in a single-trial P300 detection context, allowing fluid usage of the BCI for driving purposes. The realized network has been validated on an STM32L4 microcontroller target, for complexity and implementation assessment. The implementation showed an overall resource occupation of 5.57% of the total available ROM and ~3% of the available RAM, requiring less than 3.5 ms to provide the classification outcome.
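
The two metrics reported in this abstract can be made concrete with a short sketch. The ITR function below uses the widely cited Wolpaw definition; the abstract does not state which ITR definition the authors use, so treat that choice as an assumption. The F1-Score is the harmonic mean of precision and recall, as the abstract says. Function and parameter names are ours.

```python
import math


def itr_bits_per_selection(n_classes, accuracy):
    """Wolpaw ITR in bits per selection for N classes and accuracy P.

    B = log2(N) + P*log2(P) + (1 - P)*log2((1 - P) / (N - 1))
    """
    if n_classes < 2:
        raise ValueError("n_classes must be at least 2")
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must lie in [0, 1]")
    p = accuracy
    bits = math.log2(n_classes)
    if p > 0.0:  # the term P*log2(P) tends to 0 as P -> 0
        bits += p * math.log2(p)
    if p < 1.0:  # the error term vanishes when P = 1
        bits += (1.0 - p) * math.log2((1.0 - p) / (n_classes - 1))
    return bits


def itr_bits_per_minute(n_classes, accuracy, seconds_per_selection):
    """Scale bits per selection by the number of selections per minute."""
    return itr_bits_per_selection(n_classes, accuracy) * 60.0 / seconds_per_selection


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical speller: 36 symbols, 90% accuracy, 10 s per selection.
    print(itr_bits_per_minute(36, 0.90, 10.0))
    print(f1_score(0.5, 1.0))
```

The formula shows why single-trial detection pays off: ITR scales inversely with the time per selection, so a detector that needs fewer stimulus repetitions can raise the bit rate even at moderate accuracy.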


2021 ◽  
Vol 11 (15) ◽  
pp. 7169
Author(s):  
Mohamed Allouche ◽  
Tarek Frikha ◽  
Mihai Mitrea ◽  
Gérard Memmi ◽  
Faten Chaabane

To bridge the current gap between Blockchain expectations and their intensive computation constraints, the present paper advances a lightweight processing solution, based on a load-balancing architecture, compatible with the lightweight/embedded processing paradigms. In this way, the execution of complex operations is securely delegated to an off-chain general-purpose computing machine while the intimate Blockchain operations are kept on-chain. The illustrations correspond to an on-chain Tezos configuration and to a multiprocessor ARM embedded platform (integrated into a Raspberry Pi). The performance is assessed in terms of security, execution time, and CPU consumption when carrying out a visual document fingerprint task. It is thus demonstrated that the advanced solution makes it possible for a computing-intensive application to be deployed under severely constrained computation and memory resources, as set by a Raspberry Pi 3. The experimental results show that up to nine Tezos nodes can be deployed on a single Raspberry Pi 3 and that the limitation is not derived from the memory but from the computation resources. The execution time with a limited number of fingerprints is 40% higher than using a classical PC solution (value computed with 95% relative error lower than 5%).


Author(s):  
Himansh Mulchandani ◽  
Raghunandan Betha ◽  
Jinali Bagadia ◽  
Mitalee Garg ◽  
Chirag N. Paunwala ◽  
...  
