A 2D Convolutional Gating Mechanism for Mandarin Streaming Speech Recognition

Information ◽  
2021 ◽  
Vol 12 (4) ◽  
pp. 165
Author(s):  
Xintong Wang ◽  
Chuangang Zhao

Recent research shows that the recurrent neural network-Transducer (RNN-T) architecture has become a mainstream approach for streaming speech recognition. In this work, we investigate the VGG2 network as the input layer to the RNN-T in streaming speech recognition. Specifically, before the input feature is passed to the RNN-T, we introduce a gated-VGG2 block, which uses the first two layers of VGG16 to extract contextual information in the time domain and then uses an SENet-style gating mechanism to control what information in the channel domain is propagated to the RNN-T. The results show that the RNN-T model with the proposed gated-VGG2 block brings a significant performance improvement over the existing RNN-T model, and that it has lower latency and a lower character error rate than the Transformer-based model.
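
The channel gating described above can be illustrated with a minimal sketch of a squeeze-and-excitation (SENet-style) gate. It is not the authors' implementation: the function name `se_gate`, the weight matrices, the omission of bias terms, and the plain nested-list tensor layout are all assumptions made for illustration, and the two VGG16 convolution layers that would precede the gate are left out.

```python
import math


def se_gate(feature_map, w_reduce, w_expand):
    """Apply squeeze-and-excitation style channel gating (illustrative only).

    feature_map: list of C channels, each a 2D list (time x frequency).
    w_reduce:    R x C weight matrix (hypothetical bottleneck weights).
    w_expand:    C x R weight matrix (hypothetical expansion weights).
    Returns (gated_feature_map, gates).
    """
    # Squeeze: global average over time and frequency for each channel.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    # Excitation: bottleneck layer with ReLU, then sigmoid gates in (0, 1).
    hidden = [
        max(0.0, sum(w * s for w, s in zip(row, squeezed)))
        for row in w_reduce
    ]
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w_expand
    ]
    # Scale: every value in a channel is multiplied by that channel's gate.
    gated = [
        [[gate * value for value in row] for row in channel]
        for gate, channel in zip(gates, feature_map)
    ]
    return gated, gates


# Two channels of size 2 x 2; the weights are arbitrary example values.
example_map = [[[1.0, 1.0], [1.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]]]
example_gated, example_gates = se_gate(example_map, [[1.0, -1.0]], [[2.0], [-2.0]])
```

Each gate is a single scalar per channel, so the gate decides how much of a whole channel is passed on rather than selecting individual time-frequency cells.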

Author(s):  
Yongzhi Qu ◽  
Gregory W. Vogl ◽  
Zechao Wang

Abstract The frequency response function (FRF), defined as the ratio between the Fourier transform of the time-domain output and the Fourier transform of the time-domain input, is a common tool to analyze the relationships between inputs and outputs of a mechanical system. Learning the FRF for mechanical systems can facilitate system identification and condition-based health monitoring, and can improve performance metrics, by providing an input-output model that describes the system dynamics. Existing FRF identification assumes there is a one-to-one mapping between each input frequency component and output frequency component. However, during dynamic operations, the FRF can present complex dependencies with frequency cross-correlations due to modulation effects, nonlinearities, and mechanical noise. Furthermore, existing FRFs assume linearity between input and output spectra under varying mechanical loads, while in practice FRFs can depend on the operating conditions and show high nonlinearities. Outputs of existing neural networks are typically low-dimensional labels rather than real-time high-dimensional measurements. This paper proposes a vector regression method based on deep neural networks for the learning of runtime FRFs from measurement data under different operating conditions. More specifically, a neural network based on an encoder-decoder with a symmetric compression structure is proposed. The deep encoder-decoder network features simultaneous learning of the regression relationship between input and output embeddings, as well as a discriminative model for output spectrum classification under different operating conditions. The learning model is validated using experimental data from a high-pressure hydraulic test rig. The results show that the proposed model can learn the FRF between sensor measurements under different operating conditions with high accuracy and denoising capability. The learned FRF model provides an estimate of sensor measurements when a physical sensor is not feasible and can be used for operating condition recognition.
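
The classical FRF definition quoted in the abstract (output spectrum divided by input spectrum, bin by bin) can be sketched as follows. This illustrates only the conventional one-to-one definition that the paper argues is too restrictive, not the proposed encoder-decoder model. The helper names `dft` and `frf`, the threshold `eps`, and the test signal are assumptions chosen for illustration.

```python
import cmath
import math


def dft(x):
    """Plain O(n^2) discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def frf(input_signal, output_signal, eps=1e-9):
    """Bin-wise ratio Y[k] / X[k]; None where the input has no usable energy."""
    spectrum_in = dft(input_signal)
    spectrum_out = dft(output_signal)
    return [
        y / x if abs(x) > eps else None
        for x, y in zip(spectrum_in, spectrum_out)
    ]


# Hypothetical system: halves the amplitude and delays by one sample.
n = 16
x = [math.sin(2 * math.pi * 2 * t / n) for t in range(n)]  # tone in bin 2
y = [0.5 * x[(t - 1) % n] for t in range(n)]               # circular delay
H = frf(x, y)
# H[2] has magnitude 0.5 and phase -pi/4 (one sample of delay at bin 2).
```

Bins where the input carries no energy are returned as `None` because the ratio is undefined there, which is one practical reason measured FRFs are usually estimated from broadband excitation.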


Author(s):  
Hyeon Bae ◽  
Youn-Tae Kim ◽  
Sungshin Kim ◽  
Sang-Hyuk Lee ◽  
...  

The motor is the workhorse of industry. Preventive and condition-based maintenance, online monitoring, system fault detection, diagnosis, and prognosis are of increasing importance. This paper introduces fault detection for induction motors. Stator currents are measured by current meters and stored in the time domain. Because the time domain is not suitable for representing current signals, the signals are converted to the frequency domain with the Fourier transform. After conversion, signal features must be extracted by signal processing such as wavelet and spectrum analysis. The features are fed into a pattern classification model such as a neural network, a polynomial neural network, or a fuzzy inference model. This paper describes fault detection results obtained with Fourier and wavelet analysis. This combined approach is very useful and powerful for detecting signal features.


2017 ◽  
Vol 2017 ◽  
pp. 1-10 ◽  
Author(s):  
Dileep Kumar ◽  
Dezhan Tu ◽  
Naifu Zhu ◽  
Dibo Hou ◽  
Hongjian Zhang

Traditional leak detection techniques based on permanent acoustic sensors have proven very effective in water distribution pipes. However, these methods need long-distance deployment and proper positioning of sensors and cannot be implemented on underground pipelines. An inline-inspection device consisting of acoustic sensors is developed. The device travels with the flow of water through the pipes, recording all noise events and detecting small leaks. However, because it records all noise events, including background noise, the noisy time-domain acoustic signal cannot manifest complete features such as the leak flow rate, and so does not distinguish the leak signal from environmental disturbance. This paper presents a modular algorithm structure combining wavelets and a neural network, which joins the capability of the wavelet transform for analyzing leakage signals with the classification capability of artificial neural networks. This study validates that the time domain does not reveal the complete features of noisy leak signals and shows the significance of the choice of mother wavelet for extracting noise event features in water distribution pipes. The simulation results show that an appropriate mother wavelet has been selected and localized to extract the features of the signal with leak noise and background noise, and that the neural network implementation improves the classification performance of the extracted features.
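
A wavelet feature extraction step of the kind described above might look like the sketch below, which decomposes a signal and uses the energy in each sub-band as a feature vector for a classifier. The abstract does not name the mother wavelet that was selected, so the Haar wavelet is used here purely as an assumption, and the function names `haar_step` and `wavelet_energy_features` are hypothetical.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    scale = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * scale for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * scale for i in range(0, len(signal), 2)]
    return approx, detail


def wavelet_energy_features(signal, levels):
    """Energy of each detail band (finest first), then the final approximation."""
    if levels < 1 or len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be a multiple of 2**levels")
    features = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        features.append(sum(d * d for d in detail))
    features.append(sum(a * a for a in approx))
    return features


# A single impulse: its energy is spread across the bands.
impulse_features = wavelet_energy_features([0, 0, 0, 0, 1, 0, 0, 0], levels=2)
```

Because this Haar transform is orthonormal, the band energies sum to the energy of the original signal, so the feature vector describes how that energy is distributed across scales.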


2021 ◽  
Vol 38 (6) ◽  
pp. 1819-1827
Author(s):  
Jian-Da Wu ◽  
Che-Yuan Hsieh ◽  
Wen-Jun Luo

This study proposes training a convolutional neural network (CNN) on different types of figures to diagnose electric motorbike faults. Traditional motorbike maintenance is usually carried out by technicians who search for the problem step by step, which wastes resources and time. Due to rising environmental protection awareness, motorbike power systems have gradually moved from combustion engines to electric motors. The sound amplitude generated by a combustion engine is large and may mask other fault sounds. The sound amplitude of an electric power system is much lower, permitting various fault diagnoses to be performed by extracting the electric motor sound. With the development of computers and image processing, deep learning neural networks for image recognition have become more feasible. This study presents motor system sound visualization for fault diagnosis. First, the sound signals of the motor in five different operating states are obtained in the laboratory and in road tests, and the time-domain graph, frequency-domain graph, and spectrogram are drawn to be used as the test database. The resulting graphs of the various states are used to train a CNN. The signal states are then classified to achieve fault diagnosis. Experiments and identification results show that the spectrogram with the CNN identifies motorbike faults most effectively compared with the time-domain graph and the frequency-domain graph.
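
The spectrogram that serves as the CNN input can be sketched as a short-time Fourier transform: the signal is cut into overlapping frames, each frame is windowed, and the magnitude spectrum of each frame becomes one column of the image. The frame length, hop size, and Hann window below are assumptions, since the abstract does not state the settings used, and the CNN itself is not shown.

```python
import cmath
import math


def spectrogram(signal, frame_len, hop):
    """Magnitude spectrogram: one list of frame_len // 2 + 1 bins per frame."""
    # Periodic Hann window to reduce spectral leakage between bins.
    window = [
        0.5 * (1.0 - math.cos(2.0 * math.pi * i / frame_len))
        for i in range(frame_len)
    ]
    bins = frame_len // 2 + 1
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        segment = [signal[start + i] * window[i] for i in range(frame_len)]
        frames.append([
            abs(sum(
                segment[i] * cmath.exp(-2j * math.pi * k * i / frame_len)
                for i in range(frame_len)
            ))
            for k in range(bins)
        ])
    return frames


# A steady tone that falls exactly on bin 4 of a 16-sample frame.
tone = [math.sin(math.pi * t / 2) for t in range(64)]
tone_spec = spectrogram(tone, frame_len=16, hop=8)
```

For a steady tone every frame peaks in the same bin. A sound that changes over time would shift energy between bins from frame to frame, which is the pattern an image classifier can pick up.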


2004 ◽  
Vol 47 (2) ◽  
pp. 173-181 ◽  
Author(s):  
G. Manteuffel ◽  
P. C. Schön

Abstract. Title of the paper: STREMODO, an innovative technique for continuous stress assessment of pigs in housing and transport. Vocal utterances of animals are the results of emotional states in specific situations. Therefore, distress calls of pigs can be used as indicators of impaired welfare. An automatic system was developed that responds selectively to stress vocalisations and that registers and records their amount in the time domain. It can be applied in housing systems, during transport, and in abattoirs. The patented technique is based on sequential records of the actual sound events in short time windows (92 ms) and a parsimonious coding by 12 complex parameters (LPC coefficients). A subsequent artificial neural network trained with respective parameters from porcine stress vocalisations is able to detect stress utterances with an error rate of less than 5% even in noisy stables.
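
The coding step described above (short windows, each reduced to a handful of LPC coefficients) can be sketched with the standard autocorrelation method and the Levinson-Durbin recursion. This is not the patented STREMODO implementation: the sampling rate is not given in the abstract and must be supplied by the caller, non-overlapping windows are an assumption, and the function names are hypothetical. Only the 92 ms window and the order of 12 come from the text.

```python
def split_frames(signal, sample_rate, window_s=0.092):
    """Cut a signal into consecutive, non-overlapping windows of window_s seconds."""
    frame_len = round(window_s * sample_rate)
    return [
        signal[start:start + frame_len]
        for start in range(0, len(signal) - frame_len + 1, frame_len)
    ]


def autocorrelation(frame, max_lag):
    """Autocorrelation r[0..max_lag] of one frame."""
    n = len(frame)
    return [
        sum(frame[t] * frame[t + lag] for t in range(n - lag))
        for lag in range(max_lag + 1)
    ]


def lpc(frame, order=12):
    """LPC coefficients a[0..order-1] with x[t] ~ sum(a[k] * x[t - 1 - k]).

    Returns (coefficients, residual_energy) via the Levinson-Durbin recursion.
    """
    r = autocorrelation(frame, order)
    a = [0.0] * order
    err = r[0]
    for i in range(order):
        if err <= 0.0:  # silent or perfectly predictable frame
            break
        # Reflection coefficient for model order i + 1.
        k = (r[i + 1] - sum(a[j] * r[i - j] for j in range(i))) / err
        updated = a[:]
        updated[i] = k
        for j in range(i):
            updated[j] = a[j] - k * a[i - 1 - j]
        a = updated
        err *= 1.0 - k * k
    return a, err


# A decaying sequence x[t] = 0.5 ** t is predicted by x[t] ~ 0.5 * x[t - 1].
decay_coeffs, decay_err = lpc([0.5 ** t for t in range(50)], order=2)
```

Each window then yields one 12-element coefficient vector, which is the kind of compact input a small neural network classifier can be trained on.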

