Automatic CNN-Based Arabic Numeral Spotting and Handwritten Digit Recognition by Using Deep Transfer Learning in Ottoman Population Registers

2020, Vol 10 (16), pp. 5430
Author(s): Yekta Said Can, M. Erdem Kabadayı

Historical manuscripts and archival documents are handwritten texts that serve as the backbone sources for historical inquiry. Recent developments in the digital humanities and the need to extract information from historical documents have accelerated digitization. Cutting-edge machine learning methods are applied to extract meaning from these documents, and page segmentation (layout analysis), keyword, number, and symbol spotting, and handwritten text recognition algorithms have been tested on them. For most languages, these techniques are widely studied and high-performance methods have been developed. However, the properties of Arabic scripts (e.g., diacritics, varying script styles, and ligatures) create additional problems for these algorithms, and research on them is therefore limited. In this research, we first automatically spotted the Arabic numerals in the very first series of population registers of the Ottoman Empire, conducted in the mid-nineteenth century, and then recognized these numbers. The numerals are important because they hold information about the number of households, the registered individuals, and the ages of individuals. Taking advantage of the structure of the studied registers, in which numerals are written in red, we applied a red color filter to separate the numerals from the rest of the document. For spotting, we used a CNN-based segmentation method. In the second part, we annotated a local Arabic handwritten digit dataset by selecting the single-digit numerals among those spotted, and tested deep transfer learning from large open Arabic handwritten digit datasets for digit recognition. We achieved promising results for recognizing digits in these historical documents.
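The red color filter described above can be sketched as a per-pixel threshold on an RGB image. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `red_ink_mask` and `keep_red_ink` and the thresholds `min_red` and `min_margin` are hypothetical, and a real pipeline might instead threshold hue in HSV space or tune the values on scanned pages.

```python
import numpy as np

def red_ink_mask(rgb: np.ndarray, min_red: int = 120, min_margin: int = 50) -> np.ndarray:
    """Boolean mask of pixels that look like red ink (hypothetical thresholds).

    rgb: H x W x 3 uint8 array in RGB order. A pixel counts as red ink when
    its red channel is bright enough and exceeds both green and blue by at
    least `min_margin`.
    """
    rgb = rgb.astype(np.int16)  # widen to avoid uint8 wrap-around when subtracting
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return (r >= min_red) & (r - np.maximum(g, b) >= min_margin)

def keep_red_ink(rgb: np.ndarray, **kwargs) -> np.ndarray:
    """Return a copy where every non-red pixel is set to white."""
    mask = red_ink_mask(rgb, **kwargs)
    out = np.full_like(rgb, 255)  # white background, same dtype as input
    out[mask] = rgb[mask]         # copy only the red-ink pixels
    return out

# Toy 1 x 3 "page": a red-ink pixel, a black-ink pixel, and a paper pixel.
page = np.array([[[200, 30, 40], [20, 20, 20], [230, 220, 200]]], dtype=np.uint8)
mask = red_ink_mask(page)
cleaned = keep_red_ink(page)
print(mask.tolist())  # [[True, False, False]]
```

Requiring a margin over both other channels, rather than a bright red channel alone, keeps light paper pixels (high in all three channels) out of the mask.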

2020, Vol 6 (5), pp. 32
Author(s): Yekta Said Can, M. Erdem Kabadayı

Historical document analysis systems are gaining importance as efforts to digitize archives increase. Page segmentation and layout analysis are crucial steps for such systems: errors in these steps propagate into the output of handwritten text recognition and Optical Character Recognition (OCR) methods, which makes accurate page segmentation and layout analysis all the more important. Document degradation, digitization errors, and varying layout styles complicate the segmentation of historical documents. Properties of Arabic script such as connected letters, ligatures, diacritics, and different writing styles make Arabic-script historical documents even more challenging to process. In this study, we developed an automatic system that uses a CNN-based architecture to count registered individuals and assign them to populated places. To evaluate the system, we created a labeled dataset of registers drawn from the first wave of population registers of the Ottoman Empire, conducted between the 1840s and 1860s. We achieved promising results for classifying different types of objects, counting individuals, and assigning them to populated places.
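To make the counting-and-assignment task concrete, the following sketch assumes a detector has already classified page objects into place headings and person entries, each with a page number, a column index, and a vertical position. The `Detection` class, its `label` values, and `count_per_place` are hypothetical names, and the rule of attaching each person to the nearest preceding heading in reading order is an assumed simplification, not the authors' method. Column indices are assumed to be numbered in reading order already (right to left for Arabic script).

```python
from dataclasses import dataclass

@dataclass
class Detection:
    label: str   # hypothetical classes: "place" (heading) or "person" (entry)
    page: int    # page number
    column: int  # column index, assumed already numbered in reading order
    y: float     # top edge of the bounding box (smaller = higher on the page)

def count_per_place(detections: list[Detection]) -> list[int]:
    """Count person entries under each place heading, in reading order.

    Persons that appear before the first heading are ignored
    (an assumed policy for this sketch).
    """
    counts: list[int] = []
    for d in sorted(detections, key=lambda d: (d.page, d.column, d.y)):
        if d.label == "place":
            counts.append(0)        # open a new populated-place block
        elif d.label == "person" and counts:
            counts[-1] += 1         # attach to the most recent heading
    return counts

dets = [
    Detection("person", 1, 0, 300.0),
    Detection("place", 1, 0, 50.0),
    Detection("person", 1, 0, 120.0),
    Detection("place", 1, 1, 40.0),
    Detection("person", 1, 1, 90.0),
    Detection("person", 2, 0, 30.0),  # continues the second place on the next page
]
print(count_per_place(dets))  # [2, 2]
```

Sorting by (page, column, y) lets a place block continue across column and page breaks, which is why the last person above is still counted under the second heading.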


Author(s): Rami S. Alkhawaldeh, Moatsum Alawida, Nawaf Farhan Funkur Alshdaifat, Wafa’ Za’al Alma’aitah, Ammar Almasri

Author(s): Yojana Swapneel Samant

Humans have long shown great interest in machines and have advanced this domain to a very large extent. From identifying and classifying objects in pictures to editing captured images or video, much can now be done by machines and advanced systems. One part of this technology is machine learning, including deep learning, which comes with its own set of modules, algorithms, and techniques. One application in this domain is handwritten digit recognition, an area of research and development centered on numerals (digits), where the many possible shapes and combinations of digits must be handled to obtain accurate results. It can also be seen as the ability of computers to interpret input such as number plates, photographs, documents, or handwriting, and to convert it into a digital form that can be displayed on screen.


2020, Vol 8 (6), pp. 1187-1190

Arabic is one of the most widely used languages in the world, especially in the countries of the Arab League. These countries commonly use Arabic numerals in banking and business applications, postal zip codes, and data-entry applications. This research focuses on recognizing handwritten Arabic numerals, whose style and shape vary without limit across writers. The proposed method is a deep learning technique, the Convolutional Neural Network (CNN): the LeNet-5 architecture is used to train on and recognize handwritten Arabic numeral images, 70,000 in total, drawn from the MADbase dataset. In experiments on 10,000 images from this database, comparing different numbers of training epochs, the average accuracy is 97.67%.
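The LeNet-5 feature-map sizes mentioned above can be checked with a little arithmetic. The sketch below assumes 28×28 grayscale MADbase inputs zero-padded by 2 pixels (equivalent to LeNet-5's original 32×32 input) and the common simplified variant with parameter-free pooling and a fully connected C3 layer. The 1998 LeNet-5 used trainable subsampling and a partially connected C3, so its parameter count differs. The helper names `conv_out` and `lenet5_summary` are hypothetical.

```python
def conv_out(size: int, kernel: int, stride: int = 1, pad: int = 0) -> int:
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * pad - kernel) // stride + 1

def lenet5_summary(input_size: int = 28, pad: int = 2, num_classes: int = 10):
    """Trace (name, output shape, parameter count) through a simplified LeNet-5."""
    layers = []
    s = conv_out(input_size, 5, pad=pad)      # C1: 6 filters, 5x5, padded input
    layers.append(("C1", (6, s, s), 6 * (5 * 5 * 1 + 1)))
    s = conv_out(s, 2, stride=2)              # S2: 2x2 pooling, no parameters here
    layers.append(("S2", (6, s, s), 0))
    s = conv_out(s, 5)                        # C3: 16 filters, 5x5, fully connected to S2
    layers.append(("C3", (16, s, s), 16 * (5 * 5 * 6 + 1)))
    s = conv_out(s, 2, stride=2)              # S4: 2x2 pooling
    layers.append(("S4", (16, s, s), 0))
    flat = 16 * s * s                         # flattened S4 output
    layers.append(("F5", (120,), 120 * (flat + 1)))
    layers.append(("F6", (84,), 84 * (120 + 1)))
    layers.append(("Out", (num_classes,), num_classes * (84 + 1)))
    return layers

layers = lenet5_summary()
for name, shape, params in layers:
    print(f"{name:>3} {shape} params={params}")
total_params = sum(params for _, _, params in layers)
print("total", total_params)  # total 61706
```

Under these assumptions the network has roughly 62k trainable weights, small enough to train on a 70,000-image dataset such as MADbase without a large compute budget.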

