A Machine Hearing System for Robust Cough Detection Based on a High-Level Representation of Band-Specific Audio Features

2019 ◽  
Vol 66 (8) ◽  
pp. 2319-2330 ◽  
Author(s):  
Jesus Monge-Alvarez ◽  
Carlos Hoyos-Barcelo ◽  
Luis Miguel San-Jose-Revuelta ◽  
Pablo Casaseca-de-la-Higuera

2007 ◽  
Vol 01 (03) ◽  
pp. 377-402 ◽  
Author(s):  
SHU-CHING CHEN ◽  
NA ZHAO ◽  
MEI-LING SHYU

In this paper, a user-centered framework is proposed for video database modeling and retrieval that provides appealing multimedia experiences for content-based video queries. By incorporating the Hierarchical Markov Model Mediator (HMMM) mechanism, the source videos, segmented video shots, visual/audio features, semantic events, and high-level user perceptions are seamlessly integrated in a video database. With this hierarchical and stochastic design for video databases and semantic concept modeling, the proposed framework supports the retrieval of not only single events but also temporal sequences with multiple events. Additionally, an innovative method is proposed to capture an individual user's preferences by considering both the low-level features and the semantic concepts. The retrieval and ranking of video events and temporal patterns can be updated dynamically online to satisfy each user's interests and information requirements. Moreover, users' feedback is efficiently accumulated for the offline system training process so that the overall retrieval performance can be enhanced periodically and continuously. To evaluate the proposed approach, a soccer video retrieval system is developed, presented, and tested to demonstrate the overall retrieval performance improvement achieved by modeling and capturing user preferences.
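The preference-driven ranking idea in this abstract can be sketched as follows. This is an illustrative toy, not the paper's actual HMMM formulation: the feature names, events, weights, and update rule are all invented, and it only shows the general pattern of scoring events by a weighted mix of low-level and semantic similarity, then nudging the weights with relevance feedback.

```python
# Hypothetical sketch of preference-weighted event ranking: each candidate
# video event carries a low-level feature similarity and a semantic-concept
# similarity to the query; feedback shifts the mixing weights.

def rank(events, w_low, w_sem):
    """Order events by a weighted combination of the two similarity scores."""
    return sorted(events,
                  key=lambda e: w_low * e["low_sim"] + w_sem * e["sem_sim"],
                  reverse=True)

def update_weights(w_low, w_sem, feedback, lr=0.1):
    """Nudge weights toward scores that explain events marked relevant."""
    for event, relevant in feedback:
        direction = 1.0 if relevant else -1.0
        w_low += lr * direction * event["low_sim"]
        w_sem += lr * direction * event["sem_sim"]
    total = w_low + w_sem
    return w_low / total, w_sem / total  # keep the weights normalized

# Toy soccer events with invented similarity scores.
events = [
    {"id": "goal",   "low_sim": 0.9, "sem_sim": 0.8},
    {"id": "corner", "low_sim": 0.4, "sem_sim": 0.7},
    {"id": "crowd",  "low_sim": 0.8, "sem_sim": 0.1},
]
w_low, w_sem = 0.5, 0.5
order = [e["id"] for e in rank(events, w_low, w_sem)]
print(order)  # the "goal" event scores highest under equal weights
```

Accumulating many such feedback updates offline, as the abstract describes, would correspond to re-estimating the mixing weights from the stored feedback log rather than from a single interaction.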


2020 ◽  
Vol 2020 ◽  
pp. 1-15
Author(s):  
Gwenaelle Cunha Sergio ◽  
Minho Lee

Generating music with emotion similar to that of an input video is a very relevant issue nowadays. Video content creators and automatic movie directors benefit from keeping their viewers engaged, which can be facilitated by producing novel material that elicits stronger emotions in them. Moreover, there is currently a demand for more empathetic computers to aid humans in applications such as augmenting the perception ability of visually- and/or hearing-impaired people. Current approaches overlook the video’s emotional characteristics in the music generation step, only consider static images instead of videos, are unable to generate novel music, and require a high level of human effort and skill. In this study, we propose a novel hybrid deep neural network that uses an Adaptive Neuro-Fuzzy Inference System (ANFIS) to predict a video’s emotion from its visual features and a deep Long Short-Term Memory Recurrent Neural Network to generate its corresponding audio signals with a similar emotional character. The former is able to appropriately model emotions due to its fuzzy properties, and the latter is able to model data with dynamic time properties well due to the availability of the previous hidden state information. The novelty of our proposed method lies in the extraction of visual emotional features in order to transform them into audio signals with corresponding emotional aspects for users. Quantitative experiments show low mean absolute errors of 0.217 and 0.255 on the Lindsey and DEAP datasets, respectively, and similar global features in the spectrograms. This indicates that our model is able to appropriately perform domain transformation between visual and audio features. Based on the experimental results, our model can effectively generate audio that matches the scene and elicits a similar emotion from the viewer in both datasets, and music generated by our model is also chosen more often (code available online at https://github.com/gcunhase/Emotional-Video-to-Audio-with-ANFIS-DeepRNN).
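The two-stage pipeline described above (fuzzy emotion prediction feeding a recurrent generator) can be illustrated with a minimal sketch. Everything here is a toy stand-in: the Gaussian memberships, the two fuzzy rules, and the single tanh recurrence are invented placeholders for the paper's trained ANFIS and LSTM, and the brightness/motion features are made up.

```python
import math

# Stage 1: a toy fuzzy inference step mapping visual features to a valence
# score. Stage 2: a one-unit recurrence standing in for the LSTM's hidden
# update, so each output depends on the previous hidden state.

def gaussian_membership(x, center, sigma=0.2):
    return math.exp(-((x - center) ** 2) / (2 * sigma ** 2))

def fuzzy_emotion(brightness, motion):
    # Two toy rules: bright + fast -> positive, dark + slow -> negative.
    pos = min(gaussian_membership(brightness, 0.8),
              gaussian_membership(motion, 0.8))
    neg = min(gaussian_membership(brightness, 0.2),
              gaussian_membership(motion, 0.2))
    return (pos - neg) / (pos + neg + 1e-9)  # valence in roughly [-1, 1]

def recurrent_step(emotion, hidden, w_in=0.6, w_rec=0.4):
    # Previous hidden state carries temporal context into the next output.
    return math.tanh(w_in * emotion + w_rec * hidden)

frames = [(0.8, 0.8), (0.7, 0.9), (0.2, 0.2)]  # (brightness, motion) per frame
hidden, outputs = 0.0, []
for b, m in frames:
    hidden = recurrent_step(fuzzy_emotion(b, m), hidden)
    outputs.append(round(hidden, 3))
print(outputs)  # positive for the bright/fast frames, negative for the dark/slow one
```

In the actual system, the recurrent stage emits audio signal parameters rather than a single scalar, but the dependence on the previous hidden state, which the abstract highlights, is the same mechanism.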


2017 ◽  
Vol 96 (8) ◽  
pp. 297-310 ◽  
Author(s):  
Bahtiyar Celikgun ◽  
M. Tayyar Kalcioglu

This study aimed to determine the speech discrimination ability of the contralateral ear of users of a unilateral bone-anchored hearing system (BAHS). All patients used Oticon Medical Ponto Pro or Ponto Pro Power devices. Five BAHS users (3 men, 2 women) participated in the study. Pure-tone air-conduction thresholds at 250 to 6,000 Hz, masked and unmasked bone-conduction thresholds at 250 to 4,000 Hz, and participants’ speech discrimination scores in both ears were determined. Speech discrimination tests were carried out in a silent environment with monosyllabic and trisyllabic word lists. After this, the ipsilateral ear (the BAHS side) was masked with wide-band noise using an insert earphone, and the word tests were repeated. A mild decrease in monosyllabic word scores was observed under ipsilateral masking; however, this was not found to be statistically significant. Conversely, no decrease was observed in the repetition of trisyllabic words in any participant, even under high-level ipsilateral masking. These results suggested that unilateral BAHS application could prevent or reduce the neural deprivation of the contralateral ear.


2018 ◽  
Author(s):  
Eduardo R. Soares ◽  
Eduardo Barrére

Nowadays, video lectures are a very popular way to transmit knowledge, and because of that, there are many repositories on the web with large catalogs of such videos. Despite all the benefits that this high availability of video lectures brings, some problems also emerge from this scenario. One of these problems is that it is very difficult to find relevant content associated with those videos. Often, students must watch the entire video lecture to find the point of interest, and sometimes these points are not found at all. For that reason, the proposal of this master’s project is to investigate and propose a novel framework based on early fusion of low- and high-level audio features, enriched with external knowledge from open databases, for automatic topic segmentation in video lectures. We have performed preliminary experiments on two sets of video lectures using the current state of our work. The results obtained were very satisfactory, which evidences the potential of our proposal.
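The "early fusion" idea in this abstract, combining low- and high-level features into one vector before segmentation rather than segmenting on each feature stream separately, can be sketched as follows. The window features, keyword flags, and distance threshold are all invented for illustration; the actual framework's features and segmentation algorithm are not described in the abstract.

```python
# Hypothetical early-fusion sketch: each lecture window gets one joint
# vector of low-level audio statistics plus high-level semantic flags,
# and a topic boundary is proposed where consecutive fused vectors diverge.

def fuse(low_level, high_level):
    """Early fusion: concatenate the feature groups into one joint vector."""
    return low_level + high_level

def boundaries(windows, threshold=0.8):
    """Propose a cut before each window whose fused vector differs sharply
    (Euclidean distance) from the previous window's vector."""
    cuts = []
    for i in range(1, len(windows)):
        dist = sum((a - b) ** 2 for a, b in zip(windows[i - 1], windows[i])) ** 0.5
        if dist > threshold:
            cuts.append(i)
    return cuts

# Toy windows: [mfcc_mean, energy] + [keyword "sorting" seen, keyword "graphs" seen]
windows = [
    fuse([0.10, 0.50], [1.0, 0.0]),
    fuse([0.12, 0.48], [1.0, 0.0]),
    fuse([0.60, 0.20], [0.0, 1.0]),  # topic shift: new keywords AND new acoustics
]
print(boundaries(windows))  # a single cut, before the third window
```

The advantage of fusing before segmentation, as opposed to late fusion, is that a boundary supported weakly by each stream alone can still exceed the threshold when the streams shift together.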

