DetReco: Object-Text Detection and Recognition Based on Deep Neural Network

2020 ◽  
Vol 2020 ◽  
pp. 1-15
Author(s):  
Fan Zhang ◽  
Jiaxing Luan ◽  
Zhichao Xu ◽  
Wei Chen

Deep learning-based object detection methods have been applied in various fields, such as ITS (intelligent transportation systems) and ADS (autonomous driving systems). Meanwhile, text detection and recognition in different scenes have also attracted much attention and research effort. In this article, we propose a new object-text detection and recognition method termed “DetReco” to detect objects and texts and recognize the text contents. The proposed method is composed of an object-text detection network and a text recognition network. YOLOv3 is used as the algorithm for the object-text detection task, and CRNN is employed to deal with the text recognition task. We combine the datasets of general objects and texts together to train the networks. At test time, the detection network detects various objects in an image. Then, the detected text regions are passed to the text recognition network to derive the text contents. The experiments show that the proposed method achieves 78.3 mAP (mean Average Precision) for general objects and 72.8 AP (Average Precision) for texts in terms of detection performance. Furthermore, the proposed method is robust in detecting and recognizing affine-transformed or occluded texts. In addition, for texts detected around general objects, the text contents can be used as identifiers to distinguish the objects.
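The two-stage flow the abstract describes (detect objects, then recognize the content of regions labeled as text) can be sketched as below. `detect_objects` and `recognize_text` are hypothetical stand-ins for YOLOv3 and CRNN inference, not the authors' code; the boxes and string are placeholder values.

```python
def detect_objects(image):
    # Stand-in for YOLOv3: returns (class_name, box) pairs,
    # with boxes as (x1, y1, x2, y2). Values are illustrative.
    return [("car", (10, 10, 60, 40)), ("text", (70, 15, 120, 30))]

def recognize_text(region):
    # Stand-in for CRNN: maps a cropped text region to a string.
    return "TAXI"

def crop(image, box):
    # Cut out a rectangular region from a row-major image.
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in image[y1:y2]]

def det_reco(image):
    # Detection first; only regions classified as "text" are passed
    # on to the recognizer, mirroring the DetReco pipeline.
    results = []
    for label, box in detect_objects(image):
        if label == "text":
            results.append((label, box, recognize_text(crop(image, box))))
        else:
            results.append((label, box, None))
    return results
```

The key design point is that the recognizer never sees the full image, only detector-proposed crops, which keeps the two networks independently trainable.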

2021 ◽  
Vol 2137 (1) ◽  
pp. 012022
Author(s):  
Da Lu ◽  
Jia Liu ◽  
Helong Li

Abstract Recognizing irregular text in real industrial scenes is a challenging task due to background clutter, low resolution, or distortion. In this work, an attention-based text detection and recognition method for the terminals of a current transformer's secondary circuit is proposed. It consists of three major components: pre-processing, text detection and text recognition. In the text recognition module, a novel spatial-temporal embedding is designed to better utilize the positional information. During training, the proposed framework requires only sequence-level annotations, instead of the extra fine-grained character-level boxes or segmentation masks used in previous work. Despite its simplicity, the proposed method achieves good performance on a dataset collected in an actual working scene.
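The paper's exact spatial-temporal embedding is not reproduced here; a common generic way to inject positional information into sequence features, which the abstract's idea resembles, is the sinusoidal encoding sketched below (an illustrative stand-in, not the authors' design).

```python
import math

def positional_encoding(length, dim):
    # pe[t][i] alternates sine/cosine at geometrically spaced
    # frequencies, so each position t gets a unique, smooth code.
    pe = [[0.0] * dim for _ in range(length)]
    for t in range(length):
        for i in range(0, dim, 2):
            freq = 1.0 / (10000 ** (i / dim))
            pe[t][i] = math.sin(t * freq)
            if i + 1 < dim:
                pe[t][i + 1] = math.cos(t * freq)
    return pe
```

Such a code is typically added element-wise to the visual feature sequence before the attention layers, letting the recognizer distinguish otherwise identical glyph features at different positions.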


Author(s):  
Victor J. D. Tsai ◽  
Jyun-Han Chen ◽  
Hsun-Sheng Huang

Traffic sign detection and recognition (TSDR) has drawn considerable attention in the development of intelligent transportation systems (ITS) and autonomous vehicle driving systems (AVDS) since the 1980s. Unlike general TSDR systems that deal with real-time images captured by in-vehicle cameras, this research aims at developing techniques for detecting, extracting, and positioning traffic signs from Google Street View (GSV) images along user-selected routes for the low-cost, high-volume, and rapid establishment of a traffic sign infrastructure database that may be associated with Google Maps. The framework and techniques employed in the proposed system are described.


Electronics ◽  
2020 ◽  
Vol 9 (4) ◽  
pp. 560 ◽  
Author(s):  
Amira Mimouna ◽  
Ihsen Alouani ◽  
Anouar Ben Khalifa ◽  
Yassin El Hillali ◽  
Abdelmalik Taleb-Ahmed ◽  
...  

A reliable environment perception is a crucial task for autonomous driving, especially in dense traffic areas. Recent improvements and breakthroughs in scene understanding for intelligent transportation systems are mainly based on deep learning and the fusion of different modalities. In this context, we introduce OLIMP: A heterOgeneous Multimodal Dataset for Advanced EnvIronMent Perception. This is the first public, multimodal and synchronized dataset that includes UWB radar data, acoustic data, narrow-band radar data, and images. OLIMP comprises 407 scenes and 47,354 synchronized frames, covering four categories: pedestrian, cyclist, car and tram. The dataset includes various challenges related to dense urban traffic, such as cluttered environments and different weather conditions. To demonstrate the usefulness of the introduced dataset, we propose a fusion framework that combines the four modalities for multi-object detection. The obtained results are promising and encourage future research.
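The abstract does not detail the fusion framework, but a minimal late-fusion baseline over per-modality detections can be sketched as follows: candidates from each modality are grouped by box overlap and merged by averaging. The threshold and averaging scheme are illustrative assumptions, not the paper's method.

```python
def iou(a, b):
    # Intersection-over-union of two (x1, y1, x2, y2) boxes.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def fuse(detections_per_modality, iou_thr=0.5):
    # Greedily cluster overlapping boxes across modalities,
    # then average each cluster's box coordinates and scores.
    clusters = []
    for dets in detections_per_modality:
        for box, score in dets:
            for c in clusters:
                if iou(box, c[0][0]) >= iou_thr:
                    c.append((box, score))
                    break
            else:
                clusters.append([(box, score)])
    fused = []
    for c in clusters:
        n = len(c)
        box = tuple(sum(b[i] for b, _ in c) / n for i in range(4))
        fused.append((box, sum(s for _, s in c) / n))
    return fused
```

For example, a radar detection and a camera detection of the same pedestrian would collapse into a single fused box whose confidence reflects both sensors.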


Author(s):  
Hoa-Hung Nguyen ◽  
Han-You Jeong

A road network represents road objects in a given geographic area and their interconnections, and is an essential component of intelligent transportation systems (ITS) enabling emerging new applications such as dynamic route guidance, driving assistance systems, and autonomous driving. As the digitization of geospatial information becomes prevalent, a number of road networks with a wide variety of characteristics coexist. In this paper, we present an area partitioning approach to the conflation of two road networks with a large difference in level of detail. Our approach first partitions the geographic area by the Network Voronoi Area Diagram (NVAD) of the low-detail road network. Next, a subgraph of the high-detail road network corresponding to a complex intersection is extracted and then aggregated into a supernode so that a high matching precision can be achieved via 1:1 node matching. To improve the matching recall, we also present a few schemes that address the problems of missing corresponding objects and representation dissimilarity between these road networks. Numerical results at Yeouido, Korea's autonomous vehicle testing site, show that our area partitioning approach can significantly improve the performance of road network matching.
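The supernode aggregation step amounts to contracting the node set of a complex intersection into a single graph node, dropping the edges internal to the intersection. A minimal sketch of that contraction, with an edge-set graph representation and function names that are illustrative assumptions:

```python
def contract(edges, group, supernode):
    # Replace every node in `group` by `supernode`; edges entirely
    # inside the group (u2 == v2) are internal and disappear.
    out = set()
    for u, v in edges:
        u2 = supernode if u in group else u
        v2 = supernode if v in group else v
        if u2 != v2:
            out.add((u2, v2))
    return out
```

After contraction, the whole intersection appears as one node in the high-detail network, so it can be matched 1:1 against the single intersection node of the low-detail network.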


2019 ◽  
Vol 11 (11) ◽  
pp. 228 ◽  
Author(s):  
Giovanni Pau ◽  
Alessandro Severino ◽  
Antonino Canale

Intelligent transportation solutions and smart information and communication technologies will be at the core of future smart cities. These topics have therefore attracted noteworthy interest in research on smarter communication protocols, the application of artificial intelligence to connecting in-vehicle devices over wireless networks, and in-vehicle services for autonomous driving based on high-precision positioning and sensing systems. This special issue has focused on collecting high-quality papers aimed at solving open technical problems and challenges typical of mobile communications for intelligent transportation systems.


Sensors ◽  
2021 ◽  
Vol 21 (3) ◽  
pp. 843
Author(s):  
Lili Miao ◽  
John Jethro Virtusio ◽  
Kai-Lung Hua

C-V2X (Cellular Vehicle-to-Everything) is a state-of-the-art wireless technology used in autonomous driving and intelligent transportation systems (ITS). This technology has extended the coverage and blind-spot detection of autonomous driving vehicles. Economically, C-V2X is much more cost-effective than the traditional sensors commonly used by autonomous driving vehicles, which makes it more practical for large-scale deployment. PC5-based C-V2X uses RF (Radio Frequency) sidelink direct communication for low-latency, mission-critical vehicle sensor connectivity. Over C-V2X radio communications, an autonomous driving vehicle's sensing ability can be extended to distances as far as the network covers. In 2020, 5G was commercialized worldwide, with Taiwan at the forefront. Operators and governments are keen to see the implications for people's daily lives brought by its low latency, high reliability, and high throughput. Autonomous driving at level L3 (Conditional Automation) or L4 (High Automation) is a good example of 5G's advanced applications, in which mobile networks with URLLC (Ultra-Reliable Low-Latency Communication) are well demonstrated. Therefore, C-V2X evolution and 5G NR (New Radio) deployment coincide and form a new ecosystem. This ecosystem will change how people drive and how transportation is managed in the future. This paper covers the following topics: firstly, the benefits of C-V2X communication technology; secondly, the standards of C-V2X and C-V2X applications for automotive road safety systems, including V2P/V2I/V2V/V2N and artificial intelligence for VRU (Vulnerable Road User) detection, object recognition, and movement prediction for collision warning and prevention; thirdly, the global deployment status of PC5-based C-V2X, especially in Taiwan; and lastly, current challenges and conclusions of C-V2X development.


Author(s):  
Fazliddin Makhmudov ◽  
Mukhriddin Mukhiddinov ◽  
Akmalbek Abdusalomov ◽  
Kuldoshbay Avazov ◽  
Utkir Khamdamov ◽  
...  

Methods for text detection and recognition in images of natural scenes have become an active research topic in computer vision and have achieved encouraging results on several benchmarks. In this paper, we introduce a robust yet simple pipeline that produces accurate and fast text detection and recognition for the Uzbek language in natural scene images using a fully convolutional network and the Tesseract OCR engine. First, the text detection step quickly predicts text in random orientations in full-color images with a single fully convolutional neural network, discarding redundant intermediate stages. Then, the text recognition step recognizes the Uzbek language, including both the Latin and Cyrillic alphabets, using a trained Tesseract OCR engine. Finally, the recognized text can be pronounced using the Uzbek language text-to-speech synthesizer. The proposed method was tested on the ICDAR 2013, ICDAR 2015 and MSRA-TD500 datasets, and it showed an advantage in efficiently detecting and recognizing text from natural scene images for assisting the visually impaired.
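A single fully convolutional detector of this kind (EAST-style) outputs a per-pixel text/no-text score map, and the post-processing simply thresholds it and groups connected pixels into candidate boxes. The sketch below illustrates that pattern on a plain Python score map; it is an assumed simplification, not the paper's implementation, and it omits the rotated-geometry branch.

```python
def text_boxes(score_map, thr=0.5):
    # Threshold the score map and return an axis-aligned bounding box
    # (x1, y1, x2, y2) per connected component of text pixels.
    h, w = len(score_map), len(score_map[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if score_map[y][x] >= thr and not seen[y][x]:
                # Depth-first flood fill over the 4-connected component.
                stack, xs, ys = [(y, x)], [], []
                seen[y][x] = True
                while stack:
                    cy, cx = stack.pop()
                    xs.append(cx)
                    ys.append(cy)
                    for ny, nx in ((cy + 1, cx), (cy - 1, cx),
                                   (cy, cx + 1), (cy, cx - 1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and not seen[ny][nx]
                                and score_map[ny][nx] >= thr):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
                boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes
```

Each resulting box would then be cropped and handed to the Tesseract recognizer, which keeps the detector free of proposal and refinement stages.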


Actuators ◽  
2021 ◽  
Vol 10 (6) ◽  
pp. 120
Author(s):  
Pangwei Wang ◽  
Yunfeng Wang ◽  
Xu Wang ◽  
Ying Liu ◽  
Juan Zhang

Integration technologies of artificial intelligence (AI) and autonomous vehicles play important roles in intelligent transportation systems (ITS). In order to achieve better logistics distribution efficiency, this paper proposes an intelligent actuator for an indoor logistics system by fusing multiple involved sensors. Firstly, an actuator based on a four-wheel differential chassis is equipped with sensors, including an RGB camera, a lidar and an indoor inertial navigation system, by which autonomous driving can be realized. Secondly, cross-floor positioning is realized by multi-node simultaneous localization and mapping (SLAM) based on the Cartographer algorithm. Thirdly, the actuator can communicate with elevators and take the elevator to the designated delivery floor. Finally, a novel indoor route planning strategy is designed based on an A* algorithm and a genetic algorithm (GA), and an actual building is used as a test scenario. The experimental results show that the actuator can map the indoor environment and plan the optimal route effectively. At the same time, the actuator displays its superiority in detecting dynamic obstacles and actively avoiding collisions in the indoor scenario. By communicating with indoor elevators, the final delivery task can be completed accurately through autonomous driving.
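The A* core of such a route planner can be sketched on an occupancy grid as below; the grid encoding, 4-connected moves, and Manhattan heuristic are illustrative assumptions (the paper additionally combines A* with a GA, which is not reproduced here).

```python
import heapq

def astar(grid, start, goal):
    # grid: 0 = free cell, 1 = obstacle; cells are (row, col).
    # Returns the shortest path as a list of cells, or None.
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    open_set = [(h(start), 0, start, [start])]  # (f, g, cell, path)
    best = {start: 0}
    while open_set:
        _, g, cur, path = heapq.heappop(open_set)
        if cur == goal:
            return path
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (cur[0] + dr, cur[1] + dc)
            if (0 <= nxt[0] < len(grid) and 0 <= nxt[1] < len(grid[0])
                    and grid[nxt[0]][nxt[1]] == 0
                    and g + 1 < best.get(nxt, float("inf"))):
                best[nxt] = g + 1
                heapq.heappush(open_set,
                               (g + 1 + h(nxt), g + 1, nxt, path + [nxt]))
    return None
```

The admissible Manhattan heuristic guarantees the returned route is optimal on the grid, which is the property a GA-based refinement stage would then build on.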


2021 ◽  
pp. 1-11
Author(s):  
Guangcun Wei ◽  
Wansheng Rong ◽  
Yongquan Liang ◽  
Xinguang Xiao ◽  
Xiang Liu

Aiming at the problem that traditional OCR processing methods ignore the inherent connection between the text detection task and the text recognition task, this paper proposes a novel end-to-end text spotting framework. The framework includes three parts: a shared convolutional feature network, a text detector and a text recognizer. By sharing the convolutional feature network, the text detection network and the text recognition network can be jointly optimized at the same time. On the one hand, this reduces the computational burden; on the other hand, it effectively exploits the inherent connection between text detection and text recognition. The model adds a TCM (Text Context Module) on the basis of Mask R-CNN, which can effectively solve the negative sample problem in text detection tasks. This paper also proposes a text recognition model based on SAM-BiLSTM (spatial attention mechanism with BiLSTM), which can more effectively extract the semantic information between characters. The model significantly surpasses state-of-the-art methods on a number of text detection and text spotting benchmarks, including ICDAR 2015 and Total-Text.
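The core operation of a spatial attention mechanism like the one named above is a softmax-weighted pooling of feature columns. The sketch below shows that pattern with hand-picked scores standing in for learned attention weights; it is a generic illustration, not the SAM-BiLSTM architecture itself.

```python
import math

def attention_pool(features, scores):
    # features: list of T feature vectors (one per spatial column);
    # scores: list of T attention logits. Returns the weighted sum,
    # computed with a numerically stable softmax over the scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    dim = len(features[0])
    return [sum(w * f[i] for w, f in zip(weights, features))
            for i in range(dim)]
```

In a full recognizer, the logits would come from a small learned network over the shared features, letting the BiLSTM decoder focus on one character region per step.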

