TAMIZHİ: Historical Tamil-Brahmi Script Recognition Using CNN and MobileNet
Computational epigraphy is the study of ancient scripts using computational and mathematical models built for epigraphy. The Tamil-Brahmi inscriptions are the most ancient extant written records of Tamil. They furnish valuable information on many aspects of life in the ancient Tamil country from a period anterior to the Sangam literary age. Recognition and systematic analysis of the script are therefore required. Recognition is complex because a single character can be written with various curves, and the writing style lets curves and lines overlap. Generating a corpus of the script is necessary, since it is the initial step for computational epigraphy. The archaeological department supplied the raw data that helped to develop a corpus of Tamizhi. In this article, we implement convolutional neural networks (CNNs) in several ways to recognize handwritten Brahmi characters: (i) training a CNN from scratch as a sequential model with a Softmax classifier, (ii) using MobileNet in a transfer-learning paradigm, starting from a pre-trained model and training on the Tamizhi dataset, (iii) building a model that combines a CNN with an SVM, and (iv) using an SVM, evaluating which approach gives the best accuracy. To train the CNN models, an extensive TAMIZHİ handwritten Brahmi dataset of 190,000 (1 lakh 90 thousand) isolated character samples has been created and deployed. The dataset covers 9 vowels, 18 consonants, and 209 classes, so that researchers can apply machine learning to it. MobileNet outperformed all the other models implemented, with an accuracy of 68.3% on the Tamizhi dataset, whereas the other algorithms ranged from 58% to 67%. The MobileNet model was also trained and tested on the datasets of vowels (8 classes), consonants (18 classes), and consonants and vowels together (26 classes), with accuracies of 98.1%, 97.7%, and 97.5%, respectively.
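As an illustration of the Softmax classifier named in approach (i), the following is a minimal sketch of how final-layer scores become class probabilities and a predicted label. It is not the article's implementation: the function names `softmax` and `predict`, the three class labels, and the score values are all hypothetical and chosen only for the example.

```python
import math


def softmax(scores):
    """Convert raw class scores (logits) into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(scores, labels):
    """Return the most probable label together with its probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Hypothetical final-layer scores for three illustrative character classes.
labels = ["a", "i", "u"]
label, prob = predict([2.0, 0.5, -1.0], labels)
print(label, round(prob, 3))
```

In a real network the scores would come from the last dense layer, with one entry per character class rather than the three assumed here.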