Chinese Character Boxes: Single Shot Detector Network for Chinese Character Detection

2019 ◽  
Vol 9 (2) ◽  
pp. 315 ◽  
Author(s):  
Junhwan Ryu ◽  
Sungho Kim

This paper proposes a deep learning-based network for detecting Chinese characters, a step that matters for character recognition and translation because both depend on locating the correct character area. Previous studies have relied on projection after image pre-processing, on segmentation-based recognition, and on hand-crafted features; unfortunately, these approaches are vulnerable to noise. More recently, deep learning-based recognition and translation systems have treated the whole process, from detection to translation, as a single step, but they have not addressed the inaccurate localization that arises in detectors. This paper proposes a Chinese character boxes (CCB) network, called CCB-SSD, which uses the single-shot multibox detector (SSD) as its baseline to detect character areas more accurately. The CCB-SSD network has a single prediction layer, obtained by removing unnecessary layers from the feature-pyramid structure. A training augmentation method is introduced, and the problem caused by the use of default boxes is solved with a proposed non-maximum suppression (NMS) method. On the character data-set and caoshu data-set used in this paper, the experiments yielded a 96.1% detection rate and a false positives per character (FPPC) value of 0.89, where FPPC is the false-positive index proposed for these data-sets. This is better than the conventional SSD, which achieved a 69.4% detection rate and an FPPC of 6.57.
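
The abstract names two mechanisms without detail: NMS over overlapping default-box predictions, and the FPPC index. Below is a minimal Python sketch of *standard* greedy IoU-based NMS, not the paper's proposed NMS variant, plus an FPPC counter that assumes FPPC is the number of unmatched detections divided by the number of ground-truth characters (an assumption; the abstract does not define the formula). The 0.5 thresholds are illustrative defaults.

```python
"""Sketch: standard greedy NMS and an assumed FPPC definition (not the CCB-SSD method)."""
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: Sequence[Box], scores: Sequence[float], thresh: float = 0.5) -> List[int]:
    """Keep the highest-scoring box, drop boxes overlapping it by more than thresh, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= thresh]
    return keep


def fppc(detections: Sequence[Box], ground_truth: Sequence[Box], match_iou: float = 0.5) -> float:
    """Assumed FPPC: detections matching no unused ground-truth box, per ground-truth character."""
    if not ground_truth:
        raise ValueError("need at least one ground-truth character")
    matched = set()
    false_pos = 0
    for det in detections:
        # Match each detection (in the given order) to its best still-unmatched ground truth.
        best_j, best_iou = -1, 0.0
        for j, gt in enumerate(ground_truth):
            if j in matched:
                continue
            overlap = iou(det, gt)
            if overlap > best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0 and best_iou >= match_iou:
            matched.add(best_j)
        else:
            false_pos += 1
    return false_pos / len(ground_truth)
```

With two heavily overlapping boxes around one character, the duplicate counts as a false positive before NMS and disappears after it, which is the kind of default-box redundancy the abstract says its NMS targets.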

2020 ◽  
Vol 17 (3) ◽  
pp. 299-305 ◽  
Author(s):  
Riaz Ahmad ◽  
Saeeda Naz ◽  
Muhammad Afzal ◽  
Sheikh Rashid ◽  
Marcus Liwicki ◽  
...  

This paper presents a deep learning benchmark on a complex dataset known as KFUPM Handwritten Arabic TexT (KHATT), which consists of complex patterns of handwritten Arabic text-lines. The paper makes three main contributions: (1) pre-processing, (2) a deep learning-based approach, and (3) data augmentation. The pre-processing step prunes extra white space and de-skews skewed text-lines. The deep learning approach is based on Multi-Dimensional Long Short-Term Memory (MDLSTM) networks and Connectionist Temporal Classification (CTC). The MDLSTM has the advantage of scanning Arabic text-lines in all directions (horizontal and vertical) to cover dots, diacritics, strokes and fine inflections. Combining data augmentation with the deep learning approach improves the Character Recognition (CR) rate to 80.02%, compared with a baseline of 75.08%.
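
A CTC-trained recognizer emits one label distribution per frame, and the simplest way to turn that into text is best-path decoding: take the per-frame argmax, merge consecutive repeats, then drop the blank label. The Python sketch below shows that step and a character recognition rate computed as 1 minus the normalized edit distance. Both the blank index 0 and this CR definition are assumptions for illustration; the abstract does not state how CR was computed or which decoder was used.

```python
from typing import List, Sequence


def best_path(probs: Sequence[Sequence[float]]) -> List[int]:
    """Per-frame argmax over label probabilities."""
    return [max(range(len(p)), key=p.__getitem__) for p in probs]


def ctc_greedy_decode(frame_labels: Sequence[int], blank: int = 0) -> List[int]:
    """CTC best-path collapse: merge consecutive repeats, then remove blanks."""
    out: List[int] = []
    prev = None
    for label in frame_labels:
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return out


def edit_distance(a: Sequence, b: Sequence) -> int:
    """Levenshtein distance with unit insertion, deletion and substitution costs."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def character_recognition_rate(pred: Sequence, ref: Sequence) -> float:
    """Assumed CR = 1 - edit_distance / len(ref); can go negative for very bad output."""
    return 1.0 - edit_distance(pred, ref) / len(ref)
```

Note that a blank between two identical labels keeps them distinct: frames `[1, 1, 0, 1]` decode to `[1, 1]`, which is how CTC represents doubled characters.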


2020 ◽  
Vol 10 (23) ◽  
pp. 8625
Author(s):  
Yali Song ◽  
Yinghong Wen

In the positioning process of a high-speed train, cumulative error can reduce positioning accuracy. Assisted positioning based on kilometer posts is an effective way to correct this cumulative error. However, traditional kilometer post detection methods are time-consuming and complex, which greatly reduces correction efficiency. This paper therefore proposes a kilometer post detection model based on deep learning. First, the Deep Convolutional Generative Adversarial Networks (DCGAN) algorithm is used to construct an effective kilometer post data set, which greatly reduces the cost of acquiring real data and provides a prerequisite for building the detection model. Then, drawing on existing optimizations and further simplifying the design of the Single Shot multibox Detector (SSD) model for the specific application scenario of this paper, a kilometer post detection model based on an improved SSD algorithm is established. Finally, the experimental results show that the model ensures both detection accuracy and efficiency: its accuracy reached 98.92%, while the detection time was only 35.43 ms. The model thus detects kilometer posts rapidly and accurately, and by optimizing the detection method it improves assisted positioning based on kilometer posts.
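
Simplifying SSD for a narrow application usually means pruning prediction layers and default (anchor) boxes. As background, the Python sketch below generates default boxes using the scale rule from the original SSD formulation, $s_k = s_{\min} + \frac{s_{\max} - s_{\min}}{m - 1}(k - 1)$ with widths $s_k\sqrt{a_r}$ and heights $s_k/\sqrt{a_r}$. The feature-map size, aspect ratios, and $s_{\min}=0.2$, $s_{\max}=0.9$ are the commonly cited defaults, not values from this paper, which does not state its configuration.

```python
import math
from typing import List, Sequence, Tuple


def default_box_scales(m: int, s_min: float = 0.2, s_max: float = 0.9) -> List[float]:
    """s_k = s_min + (s_max - s_min) * (k - 1) / (m - 1), for k = 1..m prediction layers."""
    if m == 1:
        return [s_min]
    return [s_min + (s_max - s_min) * (k - 1) / (m - 1) for k in range(1, m + 1)]


def default_boxes(fmap_size: int, scale: float, next_scale: float,
                  aspect_ratios: Sequence[float] = (1.0, 2.0, 0.5)
                  ) -> List[Tuple[float, float, float, float]]:
    """Centre-form (cx, cy, w, h) boxes, normalised to [0, 1], for one square feature map."""
    boxes = []
    for i in range(fmap_size):
        for j in range(fmap_size):
            # Box centres sit at the middle of each feature-map cell.
            cx = (j + 0.5) / fmap_size
            cy = (i + 0.5) / fmap_size
            for ar in aspect_ratios:
                boxes.append((cx, cy, scale * math.sqrt(ar), scale / math.sqrt(ar)))
            # Extra square box of scale sqrt(s_k * s_{k+1}), as in the original SSD.
            extra = math.sqrt(scale * next_scale)
            boxes.append((cx, cy, extra, extra))
    return boxes
```

Every removed layer or aspect ratio removes `fmap_size**2` boxes per choice from the output head, which is one plausible source of the speed gains such simplified models report.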


2021 ◽  
Vol 103 (1) ◽  
Author(s):  
Tiago Almeida ◽  
Vitor Santos ◽  
Oscar Martinez Mozos ◽  
Bernardo Lourenço

Data Matrix patterns imprinted as passive visual landmarks have been shown to be a valid solution for the self-localization of Automated Guided Vehicles (AGVs) on shop floors. However, existing Data Matrix decoding applications take a long time to detect and segment the markers in the input image. This paper therefore proposes a pipeline in which the detector is a real-time Deep Learning network and the decoder is a conventional method, namely the implementation in libdmtx. To this end, several types of Deep Neural Networks (DNNs) for object detection were studied, trained, compared, and assessed, with architectures ranging from region proposals (Faster R-CNN) to single-shot methods (SSD and YOLO). The study focused on performance and processing time to select the best Deep Learning (DL) model for detecting the visual markers. Additionally, a specific data set was created to evaluate these networks; this test set includes demanding situations, such as high illumination gradients in the same scene and Data Matrix markers positioned on skewed planes. The proposed approach outperformed the best-known and most widely used Data Matrix decoder, available in libraries such as libdmtx.
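
The pipeline's key idea is that the expensive search for markers is done once by a DNN, and the conventional decoder only sees small crops. The Python sketch below wires that together with the detector and decoder passed in as callables; `detect` and `decode` are hypothetical placeholders (in practice `decode` might wrap libdmtx), and the crop margin of 4 pixels is an assumed value to keep the marker's quiet zone.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Image = List[List[int]]          # grayscale image as rows of pixel values
Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2), with x2 and y2 exclusive


def crop_with_margin(img: Image, box: Box, margin: int = 4) -> Image:
    """Crop a detected region, padded by `margin` pixels and clamped to the image bounds."""
    h, w = len(img), len(img[0])
    x1, y1, x2, y2 = box
    x1, y1 = max(0, x1 - margin), max(0, y1 - margin)
    x2, y2 = min(w, x2 + margin), min(h, y2 + margin)
    return [row[x1:x2] for row in img[y1:y2]]


def detect_then_decode(img: Image,
                       detect: Callable[[Image], Sequence[Box]],
                       decode: Callable[[Image], Optional[str]],
                       margin: int = 4) -> List[str]:
    """Run the (hypothetical) DNN detector once, then decode only the cropped candidates."""
    results: List[str] = []
    for box in detect(img):
        text = decode(crop_with_margin(img, box, margin))
        if text is not None:  # decoder returns None when a crop cannot be read
            results.append(text)
    return results
```

Keeping the decoder behind a plain callable makes it easy to swap the DNN (Faster R-CNN, SSD, YOLO) without touching the decoding step, which mirrors how the study compares detectors against a fixed decoder.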


2020 ◽  
Vol 222 (1) ◽  
pp. 247-259 ◽  
Author(s):  
Davood Moghadas

SUMMARY Conventional geophysical inversion techniques suffer from several limitations, including computational cost and the nonlinearity, non-uniqueness and dimensionality of the inverse problem. Successful inversion of geophysical data has been a major challenge for decades. Here, a novel approach based on deep learning (DL) inversion via a convolutional neural network (CNN) is proposed to estimate subsurface electrical conductivity (σ) layering instantaneously from electromagnetic induction (EMI) data. To this end, a fully convolutional network was trained on a large synthetic data set generated with a 1-D EMI forward model. The accuracy of the proposed approach was examined using several synthetic scenarios. Moreover, the trained network was used to obtain subsurface electromagnetic conductivity images (EMCIs) from EMI data measured along two transects in the Chicken Creek catchment (Brandenburg, Germany). Dipole–dipole electrical resistivity tomography data were also measured to obtain reference subsurface σ distributions down to a depth of 6 m. The estimated models were compared with their counterparts obtained from a spatially constrained deterministic algorithm used as a standard code. Theoretical simulations demonstrated good performance of the algorithm even in the presence of noise in the data. Moreover, applying the DL inversion to subsurface imaging in the Chicken Creek catchment demonstrated the accuracy and robustness of the proposed approach for EMI inversion. The approach returns the subsurface σ distribution directly from EMI data in a single step, without any iterations. The proposed strategy considerably simplifies EMI inversion and allows rapid and accurate estimation of subsurface EMCIs from multiconfiguration EMI data.
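
Training a network on "a large synthetic data set generated based on a 1-D EMI forward model" means pairing random layered σ models with the EMI readings they would produce. The paper does not say which forward model it used; as a stand-in, the Python sketch below uses McNeill's low-induction-number cumulative response functions, $R_V(z) = (4z^2+1)^{-1/2}$ and $R_H(z) = \sqrt{4z^2+1} - 2z$ with $z$ = depth / coil spacing, and $\sigma_a = \sum_i \sigma_i\,[R(z_{i-1}) - R(z_i)]$. The coil spacings, conductivity range and layer count are hypothetical choices for illustration.

```python
import math
import random
from typing import List, Sequence, Tuple


def r_vertical(z: float) -> float:
    """Cumulative response above normalised depth z, vertical magnetic dipole."""
    return 1.0 / math.sqrt(4.0 * z * z + 1.0)


def r_horizontal(z: float) -> float:
    """Cumulative response above normalised depth z, horizontal magnetic dipole."""
    return math.sqrt(4.0 * z * z + 1.0) - 2.0 * z


def apparent_conductivity(sigmas: Sequence[float], interfaces: Sequence[float],
                          spacing: float, vertical: bool = True) -> float:
    """sigma_a = sum_i sigma_i * [R(z_{i-1}) - R(z_i)]; the last layer extends to infinity."""
    r = r_vertical if vertical else r_horizontal
    depths = [0.0] + list(interfaces)  # top depth (m) of each layer; len == len(sigmas)
    total = 0.0
    for i, sigma in enumerate(sigmas):
        top = r(depths[i] / spacing)
        bottom = r(depths[i + 1] / spacing) if i + 1 < len(depths) else 0.0
        total += sigma * (top - bottom)
    return total


def synthetic_pair(rng: random.Random, spacings: Sequence[float] = (0.32, 0.71, 1.18),
                   n_layers: int = 3, max_depth: float = 6.0
                   ) -> Tuple[List[float], List[float]]:
    """One training example: (EMI readings for each spacing/orientation, true layer sigmas)."""
    sigmas = [rng.uniform(0.001, 0.1) for _ in range(n_layers)]  # S/m, hypothetical range
    interfaces = sorted(rng.uniform(0.1, max_depth) for _ in range(n_layers - 1))
    data = [apparent_conductivity(sigmas, interfaces, s, v)
            for s in spacings for v in (True, False)]
    return data, sigmas
```

A network trained on many such pairs learns the inverse map from readings to layers directly, which is why inference then needs no iterations.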


2019 ◽  
Vol 89 (6) ◽  
pp. 903-909 ◽  
Author(s):  
Ji-Hoon Park ◽  
Hye-Won Hwang ◽  
Jun-Ho Moon ◽  
Youngsung Yu ◽  
Hansuk Kim ◽  
...  

ABSTRACT Objective: To compare the accuracy and computational efficiency of two of the latest deep-learning algorithms for automatic identification of cephalometric landmarks. Materials and Methods: A total of 1028 cephalometric radiographic images were selected as learning data to train the You-Only-Look-Once version 3 (YOLOv3) and Single Shot Multibox Detector (SSD) methods. Eighty landmarks were labeled as targets. After training, the algorithms were tested on a new test data set of 283 images. Accuracy was determined by measuring the point-to-point error and the successful detection rate, and was visualized with scattergrams. The computational time of both algorithms was also recorded. Results: YOLOv3 outperformed SSD in accuracy for 38 of the 80 landmarks; the remaining 42 landmarks showed no statistically significant difference between YOLOv3 and SSD. Error plots of YOLOv3 showed not only a smaller error range but also a more isotropic tendency. The mean computational time per image was 0.05 seconds for YOLOv3 and 2.89 seconds for SSD. YOLOv3 showed approximately 5% higher accuracy than the top benchmarks in the literature. Conclusions: Of the two latest deep-learning methods applied, YOLOv3 seemed more promising as a fully automated cephalometric landmark identification system for use in clinical practice.
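
The two accuracy measures named above are straightforward to compute. The Python sketch below gives the point-to-point (Euclidean) error per landmark, converted from pixels to millimetres, and the successful detection rate as the fraction of landmarks within a given radius. The pixel spacing and the 2/2.5/3/4 mm radii are assumptions; these radii are common in the cephalometric literature, but the abstract does not list the ones used.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def point_errors(pred: Sequence[Point], truth: Sequence[Point],
                 mm_per_px: float = 1.0) -> List[float]:
    """Euclidean point-to-point error for each landmark, scaled to millimetres."""
    return [math.dist(p, t) * mm_per_px for p, t in zip(pred, truth)]


def success_detection_rate(errors: Sequence[float],
                           radii: Sequence[float] = (2.0, 2.5, 3.0, 4.0)) -> Dict[float, float]:
    """Fraction of landmarks whose error is within each radius (mm)."""
    n = len(errors)
    return {r: sum(e <= r for e in errors) / n for r in radii}
```

For example, errors of 0, 2.5 and 0.5 mm give a 100% rate at 2.5 mm but only two of three landmarks within 1 mm.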


Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Xiaoli Ma ◽  
Hongyan Xu ◽  
Xiaoqian Zhang ◽  
Haoyong Wang

With the rapid development of artificial intelligence technology, multitask text translation has attracted increasing attention. In particular, since deep learning has been applied, the performance of text detection and recognition for multitask translation has improved greatly. However, because translated text in multitask settings suffers from interference, there is still a large gap between recognition performance and the requirements of real applications. For multitask translation text detection, this paper proposes a text localization method based on multichannel, multiscale detection of maximally stable extremal regions (MSER) and cascade filtering. The method selects appropriate color channels and scales, extracts MSERs as character candidate regions, and applies a coarse-to-fine cascade of filters to remove false detections. The coarse filter is based on simple morphological features and stroke-width features, and the fine filter is a trained two-class convolutional neural network. The remaining character candidate regions are merged into horizontal or multidirectional character strings through a graph model. Experimental results on the text data set demonstrate the effectiveness of the improved deep learning character model and the feasibility of the textual-implication translation analysis method based on it. In particular, the recognition results for translated characters in the text show that the model has good descriptive ability. Because of the model's characteristics, the method is insensitive to the scale of the sliding window, so it performs better than existing typical methods in retrieval tasks.
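
Two steps of this pipeline are easy to sketch once MSER candidates exist (MSER extraction itself, for example with OpenCV's MSER detector, is out of scope here): the coarse geometric filter and the graph-based merging of surviving candidates into text lines. The Python sketch below covers only the horizontal case, uses hypothetical thresholds, omits stroke-width features and the CNN fine filter, and groups boxes by connected components with union-find; it illustrates the idea and is not the paper's graph model.

```python
from typing import Dict, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x, y, w, h) of a character candidate


def coarse_filter(boxes: Sequence[Box], min_area: int = 30, max_aspect: float = 8.0) -> List[Box]:
    """Drop tiny or extremely elongated candidates (hypothetical thresholds)."""
    kept = []
    for x, y, w, h in boxes:
        if w <= 0 or h <= 0 or w * h < min_area:
            continue
        if max(w / h, h / w) > max_aspect:
            continue
        kept.append((x, y, w, h))
    return kept


def _linked(a: Box, b: Box, max_gap_ratio: float = 1.0,
            min_height_ratio: float = 0.6, min_v_overlap: float = 0.5) -> bool:
    """Edge rule: similar heights, enough vertical overlap, small horizontal gap."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    if min(ah, bh) / max(ah, bh) < min_height_ratio:
        return False
    overlap = min(ay + ah, by + bh) - max(ay, by)
    if overlap < min_v_overlap * min(ah, bh):
        return False
    gap = max(bx - (ax + aw), ax - (bx + bw))  # negative when boxes overlap horizontally
    return gap <= max_gap_ratio * max(ah, bh)


def group_into_lines(boxes: Sequence[Box]) -> List[List[Box]]:
    """Connected components of the candidate graph, each sorted left to right."""
    parent = list(range(len(boxes)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if _linked(boxes[i], boxes[j]):
                parent[find(i)] = find(j)
    groups: Dict[int, List[Box]] = {}
    for i, b in enumerate(boxes):
        groups.setdefault(find(i), []).append(b)
    lines = [sorted(g) for g in groups.values()]
    return sorted(lines, key=lambda g: min(b[1] for b in g))  # lines top to bottom
```

Because linking only needs local pairwise tests, a character that is far from its neighbour can still join the line through intermediate characters, which is the main advantage of a graph formulation over fixed sliding windows.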

