Deep Strip-Based Network with Cascade Learning for Scene Text Localization

Author(s):  
Dao Wu ◽  
Rui Wang ◽  
Pengwen Dai ◽  
Yueying Zhang ◽  
Xiaochun Cao
Author(s):  
Mohammed Akallouch ◽  
Kaoutar Sefrioui Boujemaa ◽  
Afaf Bouhoute ◽  
Khalid Fardousse ◽  
Ismail Berrada

2017 ◽  
Vol 77 (6) ◽  
pp. 7615-7636 ◽  
Author(s):  
Jalil Ghavidel Neycharan ◽  
Alireza Ahmadyfard

Electronics ◽  
2020 ◽  
Vol 9 (1) ◽  
pp. 117 ◽  
Author(s):  
Thananop Kobchaisawat ◽  
Thanarat H. Chalidabhongse ◽  
Shin’ichi Satoh

Scene text localization is a crucial step in scene text recognition. Major challenges, such as varying sizes and shapes, unpredictable orientations, a wide range of colors and styles, occlusion, and local and global illumination variations, make the problem distinct from generic object detection. Unlike existing scene text localization methods, we present a segmentation-based text detector that can detect arbitrarily shaped scene text using polygon offsetting combined with border augmentation. This technique better distinguishes contiguous, arbitrarily shaped text instances from nearby non-text regions. Quantitative experimental results on the public ICDAR2015, ICDAR2017-MLT, ICDAR2019-MLT, and Total-Text benchmarks demonstrate the performance and robustness of the proposed method compared to previous approaches.
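Segmentation-based detectors of this kind typically shrink each annotated text polygon inward before training, so that adjacent instances remain separable in the segmentation map. A minimal sketch of that idea is below; the offset formula d = A(1 - r²)/L and the centroid-based shrinking are common simplifications in this family of methods, not the paper's exact algorithm (production systems usually use a polygon-clipping library instead).

```python
def polygon_area(pts):
    # Shoelace formula for the area of a simple polygon.
    n = len(pts)
    s = 0.0
    for i in range(n):
        x1, y1 = pts[i]
        x2, y2 = pts[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

def polygon_perimeter(pts):
    # Sum of edge lengths around the polygon.
    n = len(pts)
    p = 0.0
    for i in range(n):
        x1, y1 = pts[i]
        x2, y2 = pts[(i + 1) % n]
        p += ((x2 - x1) ** 2 + (y2 - y1) ** 2) ** 0.5
    return p

def shrink_offset(pts, r=0.4):
    # Offset distance d = Area * (1 - r^2) / Perimeter, a common
    # choice in segmentation-based text detectors (assumption here).
    return polygon_area(pts) * (1.0 - r ** 2) / polygon_perimeter(pts)

def shrink_toward_centroid(pts, d):
    # Crude inward offset: move each vertex toward the centroid by d.
    # A real implementation would offset each edge with a clipper.
    cx = sum(x for x, _ in pts) / len(pts)
    cy = sum(y for _, y in pts) / len(pts)
    out = []
    for x, y in pts:
        vx, vy = cx - x, cy - y
        norm = (vx * vx + vy * vy) ** 0.5 or 1.0
        out.append((x + vx / norm * d, y + vy / norm * d))
    return out
```

For a 100x20 rectangle, `shrink_offset` yields d = 2000 * 0.84 / 240 = 7.0, and the shrunk polygon has strictly smaller area than the original, which is what keeps neighboring text kernels apart in the label map.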


2019 ◽  
Author(s):  
Jhonatas Santos de Jesus Conceição ◽  
Allan Pinto ◽  
Luis Decker ◽  
Jose Luis Flores Campana ◽  
Manuel Cordova Neira ◽  
...  

Scene text localization and recognition is a topic in computer vision that aims to delimit candidate regions containing incidental scene text in an input image. The challenge lies in devising detectors that can handle a wide range of variability in font size, font style, color, background complexity, and language, among other factors. This work compares two strategies for building classification models, based on a convolutional neural network, to detect textual elements in multiple languages in images: (i) a classification model trained in a multi-lingual scenario; and (ii) a classification model trained in a language-specific scenario. The experiments indicate that the language-specific model outperforms the model trained in the multi-lingual scenario, with improvements of 14.79%, 8.94%, and 11.43% in precision, recall, and F-measure, respectively.
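The comparison above rests on three standard detection metrics. A minimal sketch of how they, and a relative-improvement figure of the kind the abstract reports, are computed is shown below; the confusion-matrix counts used in the usage note are illustrative, not taken from the paper.

```python
def precision_recall_f(tp, fp, fn):
    # Standard detection metrics from true positives (tp),
    # false positives (fp), and false negatives (fn).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return precision, recall, f_measure

def relative_improvement(new, old):
    # Percentage gain of one model over another, the form in which
    # the paper reports its language-specific vs. multi-lingual gaps.
    return (new - old) / old * 100.0
```

For example, with tp=80, fp=20, fn=20 all three metrics come out to 0.8, and a model scoring 0.92 against a baseline of 0.80 shows a 15.0% relative improvement.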

