Recognition of Transient Environmental Sounds Based on Temporal and Frequency Features

2019
Vol 13 (6)
pp. 803-809
Author(s):  
Shota Okubo
Zhihao Gong
Kento Fujita
Ken Sasaki

Environmental sound recognition (ESR) refers to the recognition of all sounds other than the human voice or musical sounds. Typical ESR methods utilize spectral information and variation within it with respect to time. However, in the case of transient sounds, spectral information is insufficient because only an average quantity of a given signal within a time period can be recognized. In this study, the waveform of sound signals and their spectrum were analyzed visually to extract temporal characteristics of the sound more directly. Based on the observations, features such as the initial rise time, duration, and smoothness of the sound signal; the distribution and smoothness of the spectrum; the clarity of the sustaining sound components; and the number and interval of collisions in chattering were proposed. Experimental feature values were obtained for eight transient environmental sounds, and the distributions of the values were evaluated. A recognition experiment was conducted on 11 transient sounds. The Mel-frequency cepstral coefficient (MFCC) was selected as reference. A support vector machine was adopted as the classification algorithm. The recognition rates obtained from the MFCC were below 50% for five of the 11 sounds, and the overall recognition rate was 69%. In contrast, the recognition rates obtained using the proposed features were above 50% for all sounds, and the overall rate was 86%.
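
Two of the temporal features named in the abstract above, initial rise time and duration, can be illustrated with a minimal sketch. The abstract does not give the authors' definitions, so everything here is an assumption: the envelope is the rectified waveform, rise time is the time the envelope takes to go from 10% to 90% of its peak, and duration is the span during which the envelope is at or above 10% of its peak. The function names and thresholds are hypothetical.

```python
def envelope(signal):
    """Rectified amplitude envelope: the absolute value of each sample."""
    return [abs(x) for x in signal]


def rise_time_and_duration(signal, sample_rate, low=0.1, high=0.9):
    """Return (rise_time, duration) in seconds.

    Assumed definitions (not taken from the paper):
      rise_time = time from the first sample >= low * peak
                  to the first sample >= high * peak
      duration  = time from the first to the last sample >= low * peak
    """
    env = envelope(signal)
    peak = max(env)
    if peak == 0:
        return 0.0, 0.0  # silent input has no onset
    above_low = [i for i, v in enumerate(env) if v >= low * peak]
    first_low, last_low = above_low[0], above_low[-1]
    first_high = next(i for i, v in enumerate(env) if v >= high * peak)
    rise_time = (first_high - first_low) / sample_rate
    duration = (last_low - first_low) / sample_rate
    return rise_time, duration


# Toy transient sampled at 1 kHz: fast attack, slower decay.
toy = [0.0, 0.0, 0.05, 0.2, 0.5, 1.0, 0.6, 0.3, 0.12, 0.05, 0.0]
rise, dur = rise_time_and_duration(toy, sample_rate=1000)
```

On real recordings a smoothed envelope (for example a moving average of the rectified signal) would likely be more robust than the raw rectified samples used here.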

2012
Vol 2012
pp. 1-6
Author(s):  
Xuan Guo ◽  
Yoshiyuki Toyoda ◽  
Huankang Li ◽  
Jie Huang ◽  
Shuxue Ding ◽  
...  

Environmental sound recognition is an important function of robots and intelligent computer systems. In this research, we use a multistage perceptron neural network system for environmental sound recognition. The input data is a combination of time-variance pattern of instantaneous powers and frequency-variance pattern with instantaneous spectrum at the power peak, referred to as a time-frequency intersection pattern. Spectra of many environmental sounds change more slowly than those of speech or voice, so the intersectional time-frequency pattern will preserve the major features of environmental sounds but with drastically reduced data requirements. Two experiments were conducted using an original database and an open database created by the RWCP project. The recognition rate for 20 kinds of environmental sounds was 92%. The recognition rate of the new method was about 12% higher than methods using only an instantaneous spectrum. The results are also comparable with HMM-based methods, although those methods need to treat the time variance of an input vector series with more complicated computations.
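
The time-frequency intersection pattern described above can be sketched as follows: a per-frame power sequence (the time pattern) is concatenated with the magnitude spectrum of the single frame where power peaks (the frequency pattern). This is an illustrative reading of the abstract, not the authors' implementation. The non-overlapping framing, the absence of a window function, the naive DFT, and all function names are assumptions.

```python
import cmath
import math


def frame_powers(signal, frame_len):
    """Split the signal into non-overlapping frames and return (frames, mean power per frame)."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, frame_len)]
    powers = [sum(x * x for x in frame) / frame_len for frame in frames]
    return frames, powers


def magnitude_spectrum(frame):
    """Magnitude of a naive DFT for bins 0..N/2 (fine for short frames)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]


def intersection_pattern(signal, frame_len):
    """Concatenate the power-over-time pattern with the spectrum at the power peak."""
    frames, powers = frame_powers(signal, frame_len)
    peak_index = max(range(len(powers)), key=powers.__getitem__)
    return powers + magnitude_spectrum(frames[peak_index])


# Toy input: silence, one frame of a cosine at DFT bin 2, then more silence.
burst = [math.cos(2 * math.pi * 2 * t / 8) for t in range(8)]
pattern = intersection_pattern([0.0] * 8 + burst + [0.0] * 16, frame_len=8)
```

The resulting vector has one entry per frame plus one entry per spectral bin, which is far smaller than a full spectrogram; that is the data reduction the abstract refers to.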


2020
Author(s):  
James Traer ◽  
Sam V. Norman-Haignere ◽  
Josh H. McDermott

Sound is caused by physical events in the world. Do humans infer these causes when recognizing sound sources? We tested whether the recognition of common environmental sounds depends on the inference of a basic physical variable – the source intensity (i.e. the power that produces a sound). A source’s intensity can be inferred from the intensity it produces at the ear and its distance, which is normally conveyed by reverberation. Listeners could thus use intensity at the ear and reverberation to constrain recognition by inferring the underlying source intensity. Alternatively, listeners might separate these acoustic cues from their representation of a sound’s identity in the interest of invariant recognition. We compared these two hypotheses by measuring recognition accuracy for sounds with typically low or high source intensity (e.g. pepper grinders vs. trucks) that were presented across a range of intensities at the ear or with reverberation cues to distance. The recognition of low-intensity sources (e.g. pepper grinders) was impaired by high presentation intensities or reverberation that conveyed distance, either of which imply high source intensity. Neither effect occurred for high-intensity sources. The results suggest that listeners implicitly use the intensity at the ear along with distance cues to infer a source’s power and constrain its identity. The recognition of real-world sounds thus appears to depend upon the inference of their physical generative parameters, even generative parameters whose cues might otherwise be separated from the representation of a sound’s identity.


Author(s):  
Sachin Chachada ◽  
C.-C. Jay Kuo

Although research in audio recognition has traditionally focused on speech and music signals, the problem of environmental sound recognition (ESR) has received more attention in recent years, and research on ESR has increased significantly in the past decade. Recent work has focused on the appraisal of non-stationary aspects of environmental sounds, and several new features predicated on non-stationary characteristics have been proposed. These features strive to maximize their information content pertaining to a signal's temporal and spectral characteristics. Furthermore, sequential learning methods have been used to capture the long-term variation of environmental sounds. In this paper, we offer a qualitative and elucidatory survey of recent developments. It includes four parts: (i) basic environmental sound-processing schemes, (ii) stationary ESR techniques, (iii) non-stationary ESR techniques, and (iv) performance comparison of selected methods. Finally, concluding remarks and future research and development trends in the ESR field are given.


2020
Vol 5 (2)
pp. 609
Author(s):  
Segun Aina ◽  
Kofoworola V. Sholesi ◽  
Aderonke R. Lawal ◽  
Samuel D. Okegbile ◽  
Adeniran I. Oluwaranti

This paper presents the application of Gaussian blur filters and Support Vector Machine (SVM) techniques for greeting recognition among the Yoruba tribe of Nigeria. Existing efforts have considered different recognition gestures. However, recognition of tribal greeting postures or gestures for the Nigerian geographical space has not been studied before. Some cultural gestures are not correctly identified by people of the same tribe, let alone by people from other tribes, which creates a risk of misinterpretation. Also, some cultural gestures are unknown to most people outside a tribe, which can hinder human interaction; hence there is a need to automate the recognition of Nigerian tribal greeting gestures. This work therefore develops a Gaussian blur and SVM based system capable of recognizing the Yoruba greeting postures for men and women. Videos of individuals performing various greeting gestures were collected and processed into image frames. The images were resized and a Gaussian blur filter was used to remove noise from them. A moment-based feature extraction algorithm was used to extract shape features, which were passed as input to the SVM. The SVM was trained to recognize two Nigerian tribal greeting postures. To confirm the robustness of the system, 20%, 25% and 30% of the dataset acquired from the preprocessed images were used to test the system. A recognition rate of 94% was achieved with the SVM, which indicates that the proposed method is effective.
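
The noise-removal step in the abstract above, a Gaussian blur applied to each frame, can be sketched in plain Python. The abstract does not state the kernel size, sigma, or border handling used, so the values below, the edge clamping, and the function names are assumptions chosen for illustration. The image is a list of rows of grayscale values.

```python
import math


def gaussian_kernel(sigma, radius):
    """Normalized 1-D Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_1d(values, kernel):
    """Convolve one row or column with the kernel, clamping indices at the borders."""
    radius = len(kernel) // 2
    last = len(values) - 1
    out = []
    for i in range(len(values)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), last)  # repeat the edge pixel
            acc += w * values[j]
        out.append(acc)
    return out


def gaussian_blur(image, sigma=1.0, radius=2):
    """Separable Gaussian blur: filter every row, then every column."""
    kernel = gaussian_kernel(sigma, radius)
    rows = [blur_1d(row, kernel) for row in image]
    cols = [blur_1d(list(col), kernel) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


# A single bright pixel is spread over its neighbours.
image = [[0.0] * 5 for _ in range(5)]
image[2][2] = 1.0
blurred = gaussian_blur(image, sigma=1.0, radius=1)
```

Filtering rows and then columns gives the same result as a full 2-D Gaussian kernel at lower cost, because the 2-D Gaussian is separable.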


2021
pp. 1-16
Author(s):  
Wenbo Huang
Changyuan Wang
Hongbo Jia

Traditional intention inference methods rely solely on EEG, eye movement or tactile feedback, and the recognition rate is low. To improve the accuracy of a pilot’s intention recognition, a human-computer interaction intention inference method is proposed in this paper with the fusion of EEG, eye movement and tactile feedback. Firstly, EEG signals are collected near the frontal lobe of the human brain from eight channels, i.e., AF7, F7, FT7, T7, AF8, F8, FT8, and T8, to extract features. Secondly, the signal data are preprocessed by baseline removal, normalization, and least-squares noise reduction. Thirdly, the support vector machine (SVM) is applied to carry out multiple binary classifications of the eye movement direction. Finally, the 8-direction recognition of the eye movement direction is realized through data fusion. Experimental results show that the classification accuracy of the proposed method reaches 75.77%, 76.7%, 83.38%, 83.64%, 60.49%, 60.93%, 66.03% and 64.49%, respectively. Compared with traditional methods, the proposed algorithm achieves higher classification accuracy and is simpler to realize. The results further verify the feasibility and effectiveness of using EEG signals to identify eye movement directions for intention recognition.


2020
pp. 002029402096482
Author(s):  
Sulaiman Khan ◽  
Abdul Hafeez ◽  
Hazrat Ali ◽  
Shah Nazir ◽  
Anwar Hussain

This paper presents an efficient OCR system for the recognition of offline Pashto isolated characters. The lack of an appropriate dataset makes it challenging to match against a reference and perform recognition. This research work addresses this problem by developing a medium-size database that comprises 4488 samples of handwritten Pashto characters, which can be used for further experiments. In the proposed OCR system, the recognition task is performed using a convolutional neural network. The performance of the proposed OCR system is validated by comparing its results with those of an artificial neural network and a support vector machine based on a zoning feature extraction technique. The results of the experiments show an accuracy of 56% for the support vector machine, 78% for the artificial neural network, and 80.7% for the proposed OCR system. The high recognition rate shows that the OCR system based on the convolutional neural network performs best among the techniques used.


Author(s):  
Hedieh Sajedi ◽  
Mehran Bahador

In this paper, a new approach for segmentation and recognition of Persian handwritten numbers is presented. This method utilizes the framing feature technique in combination with the outer profile feature, which we call the adapted framing feature. In our proposed approach, segmentation of the numbers into digits is carried out automatically. In the classification stage of the proposed method, Support Vector Machines (SVM) and k-Nearest Neighbors (k-NN) are used. Experiments are conducted on the IFHCDB database, consisting of 17,740 numeral images, and the HODA database, consisting of 102,352 numeral images. At the isolated digit level on IFHCDB, a recognition rate of 99.27% is achieved by using SVM with a polynomial kernel. Furthermore, at the isolated digit level on HODA, a recognition rate of 99.07% is achieved by using SVM with a polynomial kernel. The experiments illustrate that applying our proposed method results in higher accuracy than previous research.


2018
Vol 30 (03)
pp. 1850019
Author(s):  
Fatemeh Alimardani ◽  
Reza Boostani

Fingerprint verification systems have attracted much attention in secure organizations; however, conventional methods still suffer from unconvincing recognition rates for noisy fingerprint images. To design a robust verification system, this paper proposes wavelet and contourlet transforms (CTS) as efficient feature extraction techniques to elicit a comprehensive set of descriptive features that characterize fingerprint images. Contourlet coefficients capture the smooth contours of fingerprints, while wavelet coefficients reveal their rough details. Due to the high dimensionality of the elicited features, across group variance (AGV), greedy overall relevancy (GOR) and Davis–Bouldin fast feature reduction (DB-FFR) methods were adopted to remove the redundant features. The selected features were applied to three different classifiers: Boosting Direct Linear Discriminant Analysis (BDLDA), Support Vector Machine (SVM) and Modified Nearest Neighbor (MNN). The proposed method and state-of-the-art methods were evaluated on the FVC2004 dataset in terms of genuine acceptance rate (GAR), false acceptance rate (FAR) and equal error rate (EER). The features selected by AGV were the most significant ones and provided 95.12% GAR. Applying the features selected by the GOR method to the modified nearest neighbor classifier resulted in an average EER of [Formula: see text]%, which outperformed the compared methods. The comparative results imply the statistical superiority ([Formula: see text]) of the proposed approach over its counterparts.
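
The three evaluation metrics named above (GAR, FAR, EER) can be computed from lists of match scores. The sketch below assumes that a higher score means a closer match and that a comparison is accepted when its score is at or above the threshold; the abstract does not describe the matcher's score convention, and the function names and toy scores are hypothetical. The EER is approximated at the candidate threshold where FAR and the false rejection rate (FRR = 1 - GAR) are closest.

```python
def far_gar(genuine, impostor, threshold):
    """FAR = accepted impostor fraction, GAR = accepted genuine fraction."""
    far = sum(s >= threshold for s in impostor) / len(impostor)
    gar = sum(s >= threshold for s in genuine) / len(genuine)
    return far, gar


def equal_error_rate(genuine, impostor):
    """Scan every observed score as a threshold; return (eer, threshold).

    The EER is taken as the mean of FAR and FRR at the threshold where
    the two rates are closest, a common discrete approximation.
    """
    best = None
    for threshold in sorted(set(genuine) | set(impostor)):
        far, gar = far_gar(genuine, impostor, threshold)
        frr = 1.0 - gar
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2.0, threshold)
    return best[1], best[2]


# Toy similarity scores (made up for illustration only).
genuine_scores = [0.9, 0.8, 0.7, 0.4]
impostor_scores = [0.1, 0.2, 0.3, 0.6]
eer, threshold = equal_error_rate(genuine_scores, impostor_scores)
```

With few scores the FAR and FRR curves are step functions that may never cross exactly, which is why the sketch averages the two rates at the closest point rather than requiring equality.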


2015
Vol 13 (2)
pp. 50-58
Author(s):  
R. Khadim ◽  
R. El Ayachi ◽  
Mohamed Fakir

This paper focuses on the recognition of 3D objects using 2D attributes. To increase the recognition rate, we present a hybridization of three approaches for computing the attributes of a color image: Zernike moments, Gist descriptors, and a color descriptor (statistical moments). In the classification phase, three methods are adopted: Neural Network (NN), Support Vector Machine (SVM), and k-nearest neighbor (KNN). The COIL-100 database is used in the experiments.
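
The color descriptor based on statistical moments can be illustrated with a common formulation: the mean, standard deviation, and skewness of each color channel, giving nine values per image. The abstract does not say which moments the authors use, so this choice is an assumption, as are the function names. Skewness is taken here as the signed cube root of the third central moment, and pixels are (R, G, B) tuples.

```python
import math


def channel_moments(values):
    """Return (mean, standard deviation, skewness) for one color channel.

    Skewness is the signed cube root of the third central moment,
    an assumed convention that keeps all three values in pixel units.
    """
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    third = sum((v - mean) ** 3 for v in values) / n
    skewness = math.copysign(abs(third) ** (1.0 / 3.0), third)
    return mean, math.sqrt(variance), skewness


def color_moment_descriptor(pixels):
    """Concatenate the three moments of the R, G and B channels (9 values)."""
    descriptor = []
    for channel in range(3):
        descriptor.extend(channel_moments([p[channel] for p in pixels]))
    return descriptor


# Three-pixel toy image: R varies symmetrically, G and B are constant.
toy_pixels = [(0, 10, 5), (2, 10, 5), (4, 10, 5)]
descriptor = color_moment_descriptor(toy_pixels)
```

In a hybrid scheme such as the one described above, a vector like this would be concatenated with the Zernike and Gist features before classification; how the authors weight or normalize the parts is not stated in the abstract.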

