Deep Learning-Based Detection of Articulatory Features in Arabic and English Speech

Sensors ◽  
2021 ◽  
Vol 21 (4) ◽  
pp. 1205
Author(s):  
Mohammed Algabri ◽  
Hassan Mathkour ◽  
Mansour M. Alsulaiman ◽  
Mohamed A. Bencherif

This study proposes using object detection techniques to recognize sequences of articulatory features (AFs) from speech utterances by treating the AFs of phonemes as multi-label objects in a speech spectrogram. The proposed system, called AFD-Obj, recognizes sequences of multi-label AFs in a speech signal and localizes them. AFD-Obj consists of two main stages. First, we formulate the problem of AF detection as an object detection problem and prepare the data to meet the requirements of object detectors by generating a spectral three-channel image from the speech signal and creating the corresponding annotation for each utterance. Second, we use the annotated images to train the proposed system to detect sequences of AFs and their boundaries. We test the system by feeding it spectrogram images, in which it recognizes and localizes multi-label AFs. We also investigate using these AFs to detect the utterance phonemes. The YOLOv3-tiny detector is selected because of its real-time property and its support for multi-label detection. We test our AFD-Obj system on Arabic and English using the KAPD and TIMIT corpora, respectively. Additionally, we propose using YOLOv3-tiny as an Arabic phoneme detection system (i.e., PD-Obj) to recognize and localize a sequence of Arabic phonemes from whole speech utterances. The proposed AFD-Obj and PD-Obj systems achieve excellent results on the Arabic corpus and results comparable to the state-of-the-art method on the English corpus. Moreover, we show that one-scale detection alone is suitable for AF detection or phoneme recognition.
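The annotation step described in this abstract can be illustrated with a minimal sketch. It is not the authors' code: it assumes the common Darknet/YOLO label format (`class x_center y_center width height`, all normalized to [0, 1]), assumes each box spans the full spectrogram height so that only the time boundaries carry information, and uses a hypothetical four-class AF inventory (`AF_CLASS_IDS`) chosen purely for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical AF inventory for illustration only; the real class list
# depends on the corpus and is not given in the abstract.
AF_CLASS_IDS: Dict[str, int] = {"voiced": 0, "nasal": 1, "fricative": 2, "vowel": 3}

# One phoneme segment: (start time in s, end time in s, AF labels it carries).
Segment = Tuple[float, float, List[str]]


def segments_to_yolo_lines(segments: List[Segment], duration_s: float) -> List[str]:
    """Emit one YOLO-format label line per (segment, AF) pair.

    Assumption: every box covers the full image height, so y_center is 0.5
    and height is 1.0; the horizontal extent encodes the time boundaries.
    Multi-label segments produce several boxes with identical coordinates.
    """
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    lines: List[str] = []
    for start_s, end_s, features in segments:
        if not 0 <= start_s < end_s <= duration_s:
            raise ValueError(f"invalid segment boundaries: {start_s}, {end_s}")
        # Normalize the time axis to the [0, 1] image width.
        x_center = (start_s + end_s) / 2 / duration_s
        width = (end_s - start_s) / duration_s
        for feature in features:
            class_id = AF_CLASS_IDS[feature]
            lines.append(f"{class_id} {x_center:.6f} 0.500000 {width:.6f} 1.000000")
    return lines
```

Under these assumptions, a 2-second utterance whose first half-second is a voiced vowel yields two label lines that share the same box coordinates and differ only in class index.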

2021 ◽  
Vol 11 (11) ◽  
pp. 4894
Author(s):  
Anna Scius-Bertrand ◽  
Michael Jungo ◽  
Beat Wolf ◽  
Andreas Fischer ◽  
Marc Bui

The current state of the art for automatic transcription of historical manuscripts is typically limited by the requirement of human-annotated learning samples, which are necessary to train specific machine learning models for specific languages and scripts. Transcription alignment is a simpler task that aims to find a correspondence between text in the scanned image and its existing Unicode counterpart, a correspondence which can then be used as training data. The alignment task can be approached with heuristic methods dedicated to certain types of manuscripts, or with weakly trained systems reducing the required amount of annotations. In this article, we propose a novel learning-based alignment method based on fully convolutional object detection that does not require any human annotation at all. Instead, the object detection system is initially trained on synthetic printed pages using a font and then adapted to the real manuscripts by means of self-training. On a dataset of historical Vietnamese handwriting, we demonstrate the feasibility of annotation-free alignment as well as the positive impact of self-training on the character detection accuracy, reaching a detection accuracy of 96.4% with a YOLOv5m model without using any human annotation.


2021 ◽  
Author(s):  
S J Fiona G Sathiaraj ◽  
S J Evelyn G Sathiaraj ◽  
Laxmi Bewoor

2020 ◽  
Vol 34 (07) ◽  
pp. 10778-10785
Author(s):  
Linpu Fang ◽  
Hang Xu ◽  
Zhili Liu ◽  
Sarah Parisot ◽  
Zhenguo Li

Object detectors trained on fully-annotated data currently yield state-of-the-art performance but require expensive manual annotations. On the other hand, weakly-supervised detectors have much lower performance and cannot be used reliably in a realistic setting. In this paper, we study the hybrid-supervised object detection problem, aiming to train a high-quality detector with only a limited amount of fully-annotated data while fully exploiting cheap data with image-level labels. State-of-the-art methods typically propose an iterative approach, alternating between generating pseudo-labels and updating a detector. This paradigm requires careful manual hyper-parameter tuning for mining good pseudo-labels at each round and is quite time-consuming. To address these issues, we present EHSOD, an end-to-end hybrid-supervised object detection system which can be trained in one shot on both fully and weakly-annotated data. Specifically, based on a two-stage detector, we propose two modules to fully utilize the information from both kinds of labels: 1) a CAM-RPN module that aims at finding foreground proposals guided by a class activation heat-map; 2) a hybrid-supervised cascade module that further refines the bounding-box position and classification with the help of an auxiliary head compatible with image-level data. Extensive experiments demonstrate the effectiveness of the proposed method, which achieves comparable results on multiple object detection benchmarks with only 30% fully-annotated data, e.g. 37.5% mAP on COCO. We will release the code and the trained models.


AI ◽  
2021 ◽  
Vol 2 (4) ◽  
pp. 552-577
Author(s):  
Mai Ibraheam ◽  
Kin Fun Li ◽  
Fayez Gebali ◽  
Leonard E. Sielecki

Object detection is one of the vital and challenging tasks of computer vision. It supports a wide range of applications in real life, such as surveillance, shipping, and medical diagnostics. Object detection techniques aim to detect objects of certain target classes in a given image and assign each object to a corresponding class label. These techniques differ in network architecture, training strategy and optimization function. In this paper, we focus on animal species detection as an initial step to mitigate the negative impacts of wildlife–human and wildlife–vehicle encounters in remote wilderness regions and on highways. Our goal is to provide a summary of object detection techniques based on R-CNN models, and to enhance the performance of detecting animal species in accuracy and speed, by using four different R-CNN models and a deformable convolutional neural network. Each model is applied to three wildlife datasets, and the results are compared and analyzed using four evaluation metrics. Based on the evaluation, an animal species detection system is proposed.


Author(s):  
Muhammad Ahmed ◽  
Khurram Azeem Hashmi ◽  
Alain Pagani ◽  
Marcus Liwicki ◽  
Didier Stricker ◽  
...  

Recent progress in deep learning has led to accurate and efficient generic object detection networks. Training of highly reliable models depends on large datasets with highly textured and rich images. However, in real-world scenarios, the performance of the generic object detection system decreases when (i) occlusions hide the objects, (ii) objects are present in low-light images, or (iii) they are merged with background information. In this paper, we refer to all these situations as challenging environments. With the recent rapid development in generic object detection algorithms, notable progress has been observed in the field of object detection in challenging environments. However, there is no consolidated reference to cover the state of the art in this domain. To the best of our knowledge, this paper presents the first comprehensive overview, covering recent approaches that have tackled the problem of object detection in challenging environments. Furthermore, we present a quantitative and qualitative performance analysis of these approaches and discuss the currently available challenging datasets. Moreover, this paper investigates the performance of current state-of-the-art generic object detection algorithms by benchmarking results on the three well-known challenging datasets. Finally, we highlight several current shortcomings and outline future directions.


IEEE Access ◽  
2020 ◽  
Vol 8 ◽  
pp. 54663-54680 ◽  
Author(s):  
Mohammed Algabri ◽  
Hassan Mathkour ◽  
Mohamed Abdelkader Bencherif ◽  
Mansour Alsulaiman ◽  
Mohamed Amine Mekhtiche

Sensors ◽  
2021 ◽  
Vol 21 (4) ◽  
pp. 1213
Author(s):  
Xiaoke Shen ◽  
Ioannis Stamos

Instance segmentation and object detection are significant problems in the fields of computer vision and robotics. We address those problems by proposing a novel object segmentation and detection system. First, we detect 2D objects based on RGB, depth-only, or RGB-D images. A 3D convolutional-based system, named Frustum VoxNet, is proposed. This system generates frustums from 2D detection results, proposes 3D candidate voxelized images for each frustum, and uses a 3D convolutional neural network (CNN) based on these candidate voxelized images to perform the 3D instance segmentation and object detection. Results on the SUN RGB-D dataset show that our RGB-D-based system’s 3D inference is much faster than state-of-the-art methods, without a significant loss of accuracy. At the same time, we can provide segmentation and detection results using depth-only images, with accuracy comparable to RGB-D-based systems. This is important since our methods can also work well in low lighting conditions, or with sensors that do not acquire RGB images. Finally, the use of segmentation as part of our pipeline increases detection accuracy, while at the same time providing 3D instance segmentation.


Sensors ◽  
2021 ◽  
Vol 21 (15) ◽  
pp. 5116
Author(s):  
Muhammad Ahmed ◽  
Khurram Azeem Hashmi ◽  
Alain Pagani ◽  
Marcus Liwicki ◽  
Didier Stricker ◽  
...  

Recent progress in deep learning has led to accurate and efficient generic object detection networks. Training of highly reliable models depends on large datasets with highly textured and rich images. However, in real-world scenarios, the performance of the generic object detection system decreases when (i) occlusions hide the objects, (ii) objects are present in low-light images, or (iii) they are merged with background information. In this paper, we refer to all these situations as challenging environments. With the recent rapid development in generic object detection algorithms, notable progress has been observed in the field of deep learning-based object detection in challenging environments. However, there is no consolidated reference to cover the state of the art in this domain. To the best of our knowledge, this paper presents the first comprehensive overview, covering recent approaches that have tackled the problem of object detection in challenging environments. Furthermore, we present a quantitative and qualitative performance analysis of these approaches and discuss the currently available challenging datasets. Moreover, this paper investigates the performance of current state-of-the-art generic object detection algorithms by benchmarking results on the three well-known challenging datasets. Finally, we highlight several current shortcomings and outline future directions.


2021 ◽  
Vol 8 (1) ◽  
pp. 60-70
Author(s):  
Usama Arshad

In the last decade, object detection has been one of the interesting topics that played an important role in revolutionizing the present era. Especially when it comes to computer vision, object detection is a challenging and most fundamental problem. Researchers in the last decade enhanced object detection and made many advanced discoveries using technological advancements. When we talk about object detection, we must also talk about deep learning and its advancements over time. This research work describes the advancements in object detection over the last 10 years (2010-2020). Different papers published in the last 10 years related to object detection and its types are discussed with respect to their role in the advancement of object detection. This research work also describes different types of object detection, which include text detection, face detection, etc. It clearly describes the changes in object detection techniques over the last 10 years. Object detection is divided into two groups: general detection and task-based detection. General detection is discussed chronologically and with its different variants, while task-based detection includes many state-of-the-art algorithms and techniques according to tasks. We also describe a basic comparison of how some algorithms and techniques have been updated and played a major role in advancements of different fields related to object detection. We conclude that the most important advancements happened in the last decade and that the future promises much more advancement in object detection on the basis of the work done in this decade.

