Recognition of Transient Environmental Sounds Based on Temporal and Frequency Features

International Journal of Automation Technology ◽

10.20965/ijat.2019.p0803 ◽

2019 ◽

Vol 13 (6) ◽

pp. 803-809

Author(s):

Shota Okubo ◽

Zhihao Gong ◽

Kento Fujita ◽

Ken Sasaki

Keyword(s):

Recognition Rate ◽

Sound Recognition ◽

Spectral Information ◽

Support Vector ◽

Environmental Sound ◽

Time Duration ◽

Environmental Sounds ◽

Recognition Experiment ◽

Average Quantity ◽

Feature Values

Environmental sound recognition (ESR) refers to the recognition of all sounds other than the human voice or musical sounds. Typical ESR methods utilize spectral information and variation within it with respect to time. However, in the case of transient sounds, spectral information is insufficient because only an average quantity of a given signal within a time period can be recognized. In this study, the waveform of sound signals and their spectrum were analyzed visually to extract temporal characteristics of the sound more directly. Based on the observations, features such as the initial rise time, duration, and smoothness of the sound signal; the distribution and smoothness of the spectrum; the clarity of the sustaining sound components; and the number and interval of collisions in chattering were proposed. Experimental feature values were obtained for eight transient environmental sounds, and the distributions of the values were evaluated. A recognition experiment was conducted on 11 transient sounds. The Mel-frequency cepstral coefficient (MFCC) was selected as reference. A support vector machine was adopted as the classification algorithm. The recognition rates obtained from the MFCC were below 50% for five of the 11 sounds, and the overall recognition rate was 69%. In contrast, the recognition rates obtained using the proposed features were above 50% for all sounds, and the overall rate was 86%.
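The abstract above names initial rise time and duration among its temporal features but does not define them. As a rough illustration only, the sketch below computes both from a rectified amplitude envelope using 10% and 90% of the peak as thresholds. The function name `temporal_features`, the threshold values, and the use of the rectified waveform as the envelope are all assumptions made here, not the authors' definitions.

```python
import numpy as np

def temporal_features(signal, sample_rate, low=0.1, high=0.9):
    """Return (rise_time_s, duration_s) of a transient.

    Assumed definitions (not taken from the paper):
      rise time = time from first crossing low*peak to first crossing high*peak
      duration  = time from first to last sample at or above low*peak
    """
    # Rectified waveform as a crude amplitude envelope (an assumption).
    envelope = np.abs(np.asarray(signal, dtype=float))
    peak = envelope.max()
    if peak == 0.0:
        return 0.0, 0.0

    # Indices of all samples at or above the low threshold.
    above_low = np.nonzero(envelope >= low * peak)[0]
    start, end = int(above_low[0]), int(above_low[-1])

    # First sample at or above the high threshold.
    reach = int(np.argmax(envelope >= high * peak))

    rise_time = (reach - start) / sample_rate
    duration = (end - start) / sample_rate
    return rise_time, duration
```

A feature vector built this way could be passed to any classifier; the abstract reports using a support vector machine, but the feature definitions and classifier settings there may differ from this sketch.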

Environmental Sound Recognition Using Time-Frequency Intersection Patterns

Applied Computational Intelligence and Soft Computing ◽

10.1155/2012/650818 ◽

2012 ◽

Vol 2012 ◽

pp. 1-6 ◽

Cited By ~ 4

Author(s):

Xuan Guo ◽

Yoshiyuki Toyoda ◽

Huankang Li ◽

Jie Huang ◽

Shuxue Ding ◽

...

Keyword(s):

Recognition Rate ◽

Sound Recognition ◽

Environmental Sound ◽

Frequency Pattern ◽

Instantaneous Spectrum ◽

Environmental Sounds ◽

Time Frequency ◽

Time Variance ◽

Neural Network System ◽

Environmental Sound Recognition

Environmental sound recognition is an important function of robots and intelligent computer systems. In this research, we use a multistage perceptron neural network system for environmental sound recognition. The input data is a combination of time-variance pattern of instantaneous powers and frequency-variance pattern with instantaneous spectrum at the power peak, referred to as a time-frequency intersection pattern. Spectra of many environmental sounds change more slowly than those of speech or voice, so the intersectional time-frequency pattern will preserve the major features of environmental sounds but with drastically reduced data requirements. Two experiments were conducted using an original database and an open database created by the RWCP project. The recognition rate for 20 kinds of environmental sounds was 92%. The recognition rate of the new method was about 12% higher than methods using only an instantaneous spectrum. The results are also comparable with HMM-based methods, although those methods need to treat the time variance of an input vector series with more complicated computations.
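The time-frequency intersection pattern described above combines a power-over-time pattern with the spectrum taken at the power peak. The sketch below is one possible reading of that idea, not the authors' implementation: the frame length, hop size, Hann window, and the function name `intersection_pattern` are assumptions chosen for illustration.

```python
import numpy as np

def intersection_pattern(signal, frame_len=256, hop=128):
    """Concatenate per-frame power with the spectrum of the highest-power frame.

    Returns (pattern, peak_frame_index). Assumes len(signal) >= frame_len.
    """
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(signal) - frame_len) // hop

    # Split the signal into overlapping frames.
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] for i in range(n_frames)]
    )

    # Time pattern: mean power of each frame.
    power = np.mean(frames ** 2, axis=1)
    peak = int(np.argmax(power))

    # Frequency pattern: magnitude spectrum of the frame at the power peak.
    window = np.hanning(frame_len)
    spectrum = np.abs(np.fft.rfft(frames[peak] * window))

    return np.concatenate([power, spectrum]), peak
```

The resulting vector has one entry per frame plus `frame_len // 2 + 1` spectral bins, which is far smaller than a full spectrogram; this is the data reduction the abstract refers to. For input to a neural network, the two parts would normally be normalized and resampled to fixed lengths, which this sketch omits.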

Causal inference in environmental sound recognition

10.1101/2020.07.13.200949 ◽

2020 ◽

Author(s):

James Traer ◽

Sam V. Norman-Haignere ◽

Josh H. McDermott

Keyword(s):

Real World ◽

High Intensity ◽

Recognition Accuracy ◽

Sound Recognition ◽

Acoustic Cues ◽

Environmental Sound ◽

Sound Sources ◽

Environmental Sounds ◽

Low Intensity ◽

The World

Sound is caused by physical events in the world. Do humans infer these causes when recognizing sound sources? We tested whether the recognition of common environmental sounds depends on the inference of a basic physical variable – the source intensity (i.e. the power that produces a sound). A source’s intensity can be inferred from the intensity it produces at the ear and its distance, which is normally conveyed by reverberation. Listeners could thus use intensity at the ear and reverberation to constrain recognition by inferring the underlying source intensity. Alternatively, listeners might separate these acoustic cues from their representation of a sound’s identity in the interest of invariant recognition. We compared these two hypotheses by measuring recognition accuracy for sounds with typically low or high source intensity (e.g. pepper grinders vs. trucks) that were presented across a range of intensities at the ear or with reverberation cues to distance. The recognition of low-intensity sources (e.g. pepper grinders) was impaired by high presentation intensities or reverberation that conveyed distance, either of which implies high source intensity. Neither effect occurred for high-intensity sources. The results suggest that listeners implicitly use the intensity at the ear along with distance cues to infer a source’s power and constrain its identity. The recognition of real-world sounds thus appears to depend upon the inference of their physical generative parameters, even generative parameters whose cues might otherwise be separated from the representation of a sound’s identity.

Environmental sound recognition: a survey

APSIPA Transactions on Signal and Information Processing ◽

10.1017/atsip.2014.12 ◽

2014 ◽

Vol 3 ◽

Cited By ~ 35

Author(s):

Sachin Chachada ◽

C.-C. Jay Kuo

Keyword(s):

Spectral Characteristics ◽

Performance Comparison ◽

Sound Recognition ◽

Future Research ◽

Environmental Sound ◽

Environmental Sounds ◽

Recent Developments ◽

Music Signals ◽

Environmental Sound Recognition

Although research in audio recognition has traditionally focused on speech and music signals, the problem of environmental sound recognition (ESR) has received more attention in recent years. Research on ESR has significantly increased in the past decade. Recent work has focused on the appraisal of non-stationary aspects of environmental sounds, and several new features predicated on non-stationary characteristics have been proposed. These features strive to maximize their information content pertaining to the signal's temporal and spectral characteristics. Furthermore, sequential learning methods have been used to capture the long-term variation of environmental sounds. In this paper, we offer a qualitative and elucidatory survey of recent developments. It includes four parts: (i) basic environmental sound-processing schemes, (ii) stationary ESR techniques, (iii) non-stationary ESR techniques, and (iv) performance comparison of selected methods. Finally, concluding remarks and future research and development trends in the ESR field are given.

GESTURE RECOGNITION SYSTEM FOR NIGERIAN TRIBAL GREETING POSTURES USING SUPPORT VECTOR MACHINE

MALAYSIAN JOURNAL OF COMPUTING ◽

10.24191/mjoc.v5i2.10347 ◽

2020 ◽

Vol 5 (2) ◽

pp. 609

Author(s):

Segun Aina ◽

Kofoworola V. Sholesi ◽

Aderonke R. Lawal ◽

Samuel D. Okegbile ◽

Adeniran I. Oluwaranti

Keyword(s):

Support Vector Machine ◽

Gesture Recognition ◽

Recognition Rate ◽

Recognition Task ◽

Recognition System ◽

Human Interaction ◽

Support Vector ◽

System A ◽

Extraction Algorithm ◽

Gaussian Blur

This paper presents the application of Gaussian blur filters and Support Vector Machine (SVM) techniques for greeting recognition among the Yoruba tribe of Nigeria. Existing efforts have considered different gesture recognition tasks. However, the recognition of tribal greeting postures or gestures for the Nigerian geographical space has not been studied before. Some cultural gestures are not correctly identified by people of the same tribe, let alone by people from other tribes, which poses a risk of misinterpretation. Also, some cultural gestures are unknown to most people outside a tribe, which could hinder human interaction; hence there is a need to automate the recognition of Nigerian tribal greeting gestures. This work therefore develops a Gaussian Blur – SVM based system capable of recognizing the Yoruba tribe greeting postures for men and women. Videos of individuals performing various greeting gestures were collected and processed into image frames. The images were resized and a Gaussian blur filter was used to remove noise from them. A moment-based feature extraction algorithm was used to extract shape features that were passed as input to the SVM. The SVM was trained to recognize two Nigerian tribe greeting postures. To confirm the robustness of the system, 20%, 25% and 30% of the dataset acquired from the preprocessed images were used to test the system. The results show that a recognition rate of 94% can be achieved when SVM is used, which indicates that the proposed method is effective.

Ergonomics analysis based on intention inference

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-210191 ◽

2021 ◽

pp. 1-16

Author(s):

Wenbo Huang ◽

Changyuan Wang ◽

Hongbo Jia

Keyword(s):

Eye Movement ◽

Recognition Rate ◽

Tactile Feedback ◽

Movement Direction ◽

Support Vector ◽

Eeg Signals ◽

Intention Recognition ◽

Baseline Removal ◽

Inference Methods ◽

Intention Inference

Traditional intention inference methods rely solely on EEG, eye movement, or tactile feedback, and their recognition rate is low. To improve the accuracy of a pilot’s intention recognition, this paper proposes a human-computer interaction intention inference method that fuses EEG, eye movement, and tactile feedback. Firstly, EEG signals are collected near the frontal lobe of the human brain from eight channels (AF7, F7, FT7, T7, AF8, F8, FT8, and T8) to extract features. Secondly, the signal data are preprocessed by baseline removal, normalization, and least-squares noise reduction. Thirdly, a support vector machine (SVM) is applied to carry out multiple binary classifications of the eye movement direction. Finally, 8-direction recognition of the eye movement direction is realized through data fusion. Experimental results show that the classification accuracy of the proposed method reaches 75.77%, 76.7%, 83.38%, 83.64%, 60.49%, 60.93%, 66.03%, and 64.49%, respectively. Compared with traditional methods, the proposed algorithm achieves higher classification accuracy and a simpler realization process. The feasibility and effectiveness of using EEG signals to identify eye movement directions for intention recognition are further verified.

Pioneer dataset and recognition of Handwritten Pashto characters using Convolution Neural Networks

Measurement and Control ◽

10.1177/0020294020964826 ◽

2020 ◽

pp. 002029402096482

Author(s):

Sulaiman Khan ◽

Abdul Hafeez ◽

Hazrat Ali ◽

Shah Nazir ◽

Anwar Hussain

Keyword(s):

Neural Network ◽

Artificial Neural Network ◽

Support Vector Machine ◽

Research Work ◽

Recognition Rate ◽

Recognition Task ◽

Convolution Neural Network ◽

Medium Size ◽

Support Vector ◽

Artificial Neural

This paper presents an efficient OCR system for the recognition of offline isolated Pashto characters. The lack of an appropriate dataset makes it challenging to match against a reference and perform recognition. This research work addresses this problem by developing a medium-size database comprising 4488 samples of handwritten Pashto characters, which can be further used for experimental purposes. In the proposed OCR system, the recognition task is performed using a convolution neural network. The performance of the proposed OCR system is validated by comparing its results with those of an artificial neural network and a support vector machine based on the zoning feature extraction technique. The results of the experiments show an accuracy of 56% for the support vector machine, 78% for the artificial neural network, and 80.7% for the proposed OCR system. The high recognition rate shows that the OCR system based on the convolution neural network performs best among the techniques used.

Environmental sound recognition using MP-based features

2008 IEEE International Conference on Acoustics, Speech and Signal Processing ◽

10.1109/icassp.2008.4517531 ◽

2008 ◽

Cited By ~ 24

Author(s):

Selina Chu ◽

Shrikanth Narayanan ◽

C.-C. Jay Kuo

Keyword(s):

Sound Recognition ◽

Environmental Sound ◽

Environmental Sound Recognition

Persian Handwritten Number Recognition Using Adapted Framing Feature and Support Vector Machines

International Journal of Computational Intelligence and Applications ◽

10.1142/s1469026816500048 ◽

2016 ◽

Vol 15 (01) ◽

pp. 1650004 ◽

Cited By ~ 3

Author(s):

Hedieh Sajedi ◽

Mehran Bahador

Keyword(s):

Support Vector Machines ◽

Recognition Rate ◽

Nearest Neighbors ◽

Polynomial Kernel ◽

Support Vector ◽

K Nearest Neighbors ◽

New Approach ◽

Number Recognition ◽

Vector Machines

In this paper, a new approach for the segmentation and recognition of Persian handwritten numbers is presented. The method combines the framing feature technique with an outer profile feature; we call the combination the adapted framing feature. In the proposed approach, segmentation of the numbers into digits is carried out automatically. In the classification stage, Support Vector Machines (SVM) and k-Nearest Neighbors (k-NN) are used. Experiments are conducted on the IFHCDB database, consisting of 17,740 numeral images, and the HODA database, consisting of 102,352 numeral images. At the isolated digit level on IFHCDB, a recognition rate of 99.27% is achieved using SVM with a polynomial kernel. At the isolated digit level on HODA, a recognition rate of 99.07% is achieved using SVM with a polynomial kernel. The experiments illustrate that the proposed method yields higher accuracy than previous work.

IMPROVEMENT OF THE PERFORMANCE OF FINGERPRINT VERIFICATION USING A COMBINATORIAL APPROACH

Biomedical Engineering Applications Basis and Communications ◽

10.4015/s1016237218500199 ◽

2018 ◽

Vol 30 (03) ◽

pp. 1850019

Author(s):

Fatemeh Alimardani ◽

Reza Boostani

Keyword(s):

Nearest Neighbor ◽

State Of The Art ◽

Recognition Rate ◽

Acceptance Rate ◽

Feature Reduction ◽

Support Vector ◽

Fingerprint Verification ◽

Linear Discriminant ◽

Comparative Results ◽

Verification Systems

Fingerprint verification systems have attracted much attention in secure organizations; however, conventional methods still suffer from unconvincing recognition rates for noisy fingerprint images. To design a robust verification system, in this paper, wavelet and contourlet transforms (CTS) were suggested as efficient feature extraction techniques to elicit a coverall set of descriptive features to characterize fingerprint images. Contourlet coefficients capture the smooth contours of fingerprints while wavelet coefficients reveal their rough details. Due to the high dimensionality of the elicited features, across group variance (AGV), greedy overall relevancy (GOR) and Davis–Bouldin fast feature reduction (DB-FFR) methods were adopted to remove the redundant features. These features were applied to three different classifiers: Boosting Direct Linear Discriminant Analysis (BDLDA), Support Vector Machine (SVM) and Modified Nearest Neighbor (MNN). The proposed method and state-of-the-art methods were evaluated on the FVC2004 dataset in terms of genuine acceptance rate (GAR), false acceptance rate (FAR) and equal error rate (EER). The features selected by AGV were the most significant ones and provided 95.12% GAR. Applying the features selected by the GOR method to the modified nearest neighbor classifier resulted in an average EER of [Formula: see text]%, which outperformed the compared methods. The comparative results imply the statistical superiority ([Formula: see text]) of the proposed approach compared to its counterparts.

Recognition of 3D Objects from 2D Views Features

Journal of Electronic Commerce in Organizations ◽

10.4018/jeco.2015040105 ◽

2015 ◽

Vol 13 (2) ◽

pp. 50-58

Author(s):

R. Khadim ◽

R. El Ayachi ◽

Mohamed Fakir

Keyword(s):

Neural Network ◽

Support Vector Machine ◽

Nearest Neighbor ◽

Color Image ◽

Recognition Rate ◽

Experimental Results ◽

Support Vector ◽

K Nearest Neighbor ◽

3D Objects ◽

Color Descriptor

This paper focuses on the recognition of 3D objects using 2D attributes. In order to increase the recognition rate, we present a hybridization of three approaches to compute the attributes of a color image, based on the combination of Zernike moments, Gist descriptors, and a color descriptor (statistical moments). In the classification phase, three methods are adopted: Neural Network (NN), Support Vector Machine (SVM), and k-nearest neighbor (KNN). The COIL-100 database is used in the experiments.