A New Scheme for Text Line and Character Segmentation from Gray Scale Images of Palm Leaf Manuscript

Author(s):  
Made Windu Antara Kesiman ◽  
Jean-Christophe Burie ◽  
Jean-Marc Ogier
1996 ◽  
Vol 18 (10) ◽  
pp. 1045-1050 ◽  
Author(s):  
Seong-Whan Lee ◽  
Dong-June Lee ◽  
Hee-Seon Park

2016 ◽  
Vol 26 (1) ◽  
pp. 011011 ◽  
Author(s):  
Made Windu Antara Kesiman ◽  
Dona Valy ◽  
Jean-Christophe Burie ◽  
Erick Paulus ◽  
I. Made Gede Sunarya ◽  
...  

Automatic character recognition for Tamil palm leaf manuscripts can perform optimally only if touching characters are segmented efficiently. In this article, touching characters are segmented into individual characters so that the recognizer in an Optical Character Recognition (OCR) system can reach an optimum solution. The proposed method is a novel approach to touching character segmentation in Tamil palm leaf manuscripts. First, the background and the foreground characters of the palm leaf images are separated, using filtering and noise removal to discard unwanted fragments of characters. A thickening process then repairs small breaks in the characters. The aspect ratio of each character image is used to categorize it as single touching or multi touching, and single touching is further divided into horizontal and vertical touching. Finally, the proposed algorithm for horizontal and vertical character segmentation, named the HorVer method, is applied to the horizontally and vertically touching characters to segment them into independent characters. In experiments, the method achieves 91% accuracy in segmenting touching characters in Tamil palm leaf manuscript images collected from various sources and the Tamil Heritage Foundation (THF). The proposed algorithm thus offers a novel method for Tamil touching character segmentation.
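The aspect-ratio categorization described in this abstract can be illustrated with a minimal Python sketch. It is not the HorVer method itself: the function names, the category labels, and all three threshold values (`wide`, `tall`, `multi`) are hypothetical assumptions chosen only to show the idea, and the abstract does not state the actual rule used.

```python
def aspect_ratio(component):
    """Width / height of the bounding box of a binary component.

    `component` is a list of rows, each a list of 0 (background) or 1 (ink).
    """
    rows = [r for r, row in enumerate(component) if any(row)]
    cols = [c for row in component for c, v in enumerate(row) if v]
    if not rows:
        raise ValueError("component has no foreground pixels")
    height = rows[-1] - rows[0] + 1
    width = max(cols) - min(cols) + 1
    return width / height


def categorize(component, wide=1.5, tall=0.67, multi=2.5):
    """Label a component by aspect ratio.

    All thresholds are illustrative assumptions, not values from the article.
    """
    ratio = aspect_ratio(component)
    if ratio >= multi or ratio <= 1 / multi:
        return "multi"          # very elongated: assumed to hold several characters
    if ratio >= wide:
        return "horizontal"     # wider than one character: assumed side-by-side touching
    if ratio <= tall:
        return "vertical"       # taller than one character: assumed stacked touching
    return "isolated"           # roughly square: assumed to be a single character


wide_blob = [[1, 1, 1, 1], [1, 1, 1, 1]]   # 4 wide, 2 high
print(categorize(wide_blob))
```

In practice the thresholds would have to be tuned on labelled character images, since character proportions vary across scripts and scribes.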


Author(s):  
P. Soujanya ◽  
Vijaya Kumar Koppula ◽  
Kishore Gaddam

Text line segmentation is one of the important steps in an Optical Character Recognition (OCR) system and is a pre-processing step for word and character segmentation. It is simple for printed documents, which contain distinct spaces between the lines. It is more complex for documents in which text lines overlap, touch, or are curvilinear, or in which the spacing between lines varies, as in Telugu scripts and skewed documents. The main objective of this project is to investigate different text line segmentation algorithms, such as Projection Profiles, Run Length Smearing, and Adaptive Run Length Smearing, on low-quality documents. These methods are tested experimentally, and their accuracy and results are compared.
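Two of the techniques named in this abstract, projection profiles and run-length smearing, can be sketched in a few lines of Python. This is a generic illustration under stated assumptions, not the authors' implementation: the image is assumed to be already binarized (1 for ink, 0 for background), and the function names and the `min_ink` and `threshold` parameters are hypothetical. The adaptive variant of run-length smearing is not shown.

```python
def horizontal_profile(image):
    """Number of ink pixels in each row of a binary image (list of rows of 0/1)."""
    return [sum(row) for row in image]


def segment_lines(image, min_ink=1):
    """Return (first_row, last_row) for each run of rows with at least `min_ink` ink pixels."""
    profile = horizontal_profile(image)
    lines, start = [], None
    for y, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = y                      # a text line begins
        elif ink < min_ink and start is not None:
            lines.append((start, y - 1))   # the text line ended on the previous row
            start = None
    if start is not None:                  # line running to the bottom edge
        lines.append((start, len(profile) - 1))
    return lines


def smear_row(row, threshold):
    """Horizontal run-length smearing of one row.

    Background runs of length <= threshold that lie between two ink pixels
    are filled with ink, which merges nearby characters into blobs.
    """
    out = list(row)
    last_ink = None
    for x, value in enumerate(row):
        if value:
            if last_ink is not None and 0 < x - last_ink - 1 <= threshold:
                for k in range(last_ink + 1, x):
                    out[k] = 1
            last_ink = x
    return out


page = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
]
print(segment_lines(page))
print(smear_row([1, 0, 0, 1, 0, 0, 0, 1], threshold=2))
```

A plain projection profile relies on empty rows between lines, which is why it struggles with the touching, overlapping, and skewed lines the abstract mentions; smearing is usually applied before component or profile analysis to make line blobs more robust to gaps between characters.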


Optical Character Recognition (OCR) for ancient handwritten documents or palm leaf manuscripts proceeds in four phases: line segmentation, word segmentation, character segmentation, and character recognition. Colour images of palm leaf manuscripts are first converted into binary images using various pre-processing methods. The first phase of OCR has to overcome the hurdles of touching and overlapping lines, because character recognition becomes futile when line segmentation is erroneous. For Tamil palm leaf manuscript recognition, only a handful of line segmentation methods exist, and the available methods do not meet the required standards. This article proposes a method to fill this gap in line segmentation for Tamil document analysis. The efficiency of the proposed method is compared with that of line segmentation algorithms that work on binary images, such as Adaptive Partial Projection (APP) and A* Path Planning (A*PP). The evaluation tools and metrics are taken from the ICDAR 2013 Handwriting Segmentation Contest.
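The kind of evaluation used in the ICDAR 2013 Handwriting Segmentation Contest can be sketched as follows. The abstract does not give the formulas, so this rests on assumptions: each line region is represented as a set of foreground pixel coordinates, a ground-truth line and a detected line count as a one-to-one match when their intersection-over-union reaches a threshold (0.95 is assumed here for text lines), and the scores are detection rate DR = o2o / N, recognition accuracy RA = o2o / M, and F-measure FM = 2·DR·RA / (DR + RA), where N and M are the numbers of ground-truth and detected lines. The function names are hypothetical.

```python
def match_score(gt_pixels, result_pixels):
    """Intersection over union of two sets of (row, col) foreground pixels."""
    union = gt_pixels | result_pixels
    return len(gt_pixels & result_pixels) / len(union) if union else 0.0


def evaluate(gt_lines, result_lines, threshold=0.95):
    """Return (DR, RA, FM) from one-to-one matches above `threshold` (assumed value)."""
    used = set()
    o2o = 0
    for gt in gt_lines:
        for j, res in enumerate(result_lines):
            if j not in used and match_score(gt, res) >= threshold:
                used.add(j)       # each detected line may match only once
                o2o += 1
                break
    dr = o2o / len(gt_lines) if gt_lines else 0.0
    ra = o2o / len(result_lines) if result_lines else 0.0
    fm = 2 * dr * ra / (dr + ra) if dr + ra else 0.0
    return dr, ra, fm


ground_truth = [{(0, 0), (0, 1)}, {(1, 0), (1, 1)}]
detected = [{(0, 0), (0, 1)}, {(1, 0)}, {(1, 1)}]   # second line was split in two
print(evaluate(ground_truth, detected))
```

In this toy case the split line lowers both scores: only one of two ground-truth lines is recovered, and only one of three detected regions is correct.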

