A Machine Hearing System for Robust Cough Detection Based on a High-Level Representation of Band-Specific Audio Features

2019 ◽  
Vol 66 (8) ◽  
pp. 2319-2330 ◽  
Author(s):  
Jesus Monge-Alvarez ◽  
Carlos Hoyos-Barcelo ◽  
Luis Miguel San-Jose-Revuelta ◽  
Pablo Casaseca-de-la-Higuera

2007 ◽  
Vol 01 (03) ◽  
pp. 377-402 ◽  
Author(s):  
SHU-CHING CHEN ◽  
NA ZHAO ◽  
MEI-LING SHYU

In this paper, a user-centered framework is proposed for video database modeling and retrieval that provides appealing multimedia experiences for content-based video queries. By incorporating the Hierarchical Markov Model Mediator (HMMM) mechanism, the source videos, segmented video shots, visual/audio features, semantic events, and high-level user perceptions are seamlessly integrated in a video database. With this hierarchical and stochastic design for video databases and semantic concept modeling, the proposed framework supports the retrieval of not only single events but also temporal sequences with multiple events. Additionally, an innovative method is proposed to capture an individual user's preferences by considering both the low-level features and the semantic concepts. The retrieval and ranking of video events and temporal patterns can be updated dynamically online to satisfy each user's interests and information requirements. Moreover, users' feedback is efficiently accumulated for the offline system training process so that the overall retrieval performance can be enhanced periodically and continuously. To evaluate the proposed approach, a soccer video retrieval system is developed, presented, and tested to demonstrate the overall retrieval performance improvement achieved by modeling and capturing user preferences.
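The preference-driven ranking idea in this abstract can be sketched as follows. This is an illustrative toy, not the paper's actual HMMM formulation: the feature names, events, weights, and update rule are all invented, and it only shows the general pattern of scoring events by a weighted mix of low-level and semantic similarity, then nudging the weights with relevance feedback.

```python
# Hypothetical sketch of preference-weighted event ranking: each candidate
# video event carries a low-level feature similarity and a semantic-concept
# similarity to the query; feedback shifts the mixing weights.

def rank(events, w_low, w_sem):
    """Order events by a weighted combination of the two similarity scores."""
    return sorted(events,
                  key=lambda e: w_low * e["low_sim"] + w_sem * e["sem_sim"],
                  reverse=True)

def update_weights(w_low, w_sem, feedback, lr=0.1):
    """Nudge weights toward scores that explain events marked relevant."""
    for event, relevant in feedback:
        direction = 1.0 if relevant else -1.0
        w_low += lr * direction * event["low_sim"]
        w_sem += lr * direction * event["sem_sim"]
    total = w_low + w_sem
    return w_low / total, w_sem / total  # keep the weights normalized

# Toy soccer events with invented similarity scores.
events = [
    {"id": "goal",   "low_sim": 0.9, "sem_sim": 0.8},
    {"id": "corner", "low_sim": 0.4, "sem_sim": 0.7},
    {"id": "crowd",  "low_sim": 0.8, "sem_sim": 0.1},
]
w_low, w_sem = 0.5, 0.5
order = [e["id"] for e in rank(events, w_low, w_sem)]
print(order)  # the "goal" event scores highest under equal weights
```

Accumulating many such feedback updates offline, as the abstract describes, would correspond to re-estimating the mixing weights from the stored feedback log rather than from a single interaction.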


2020 ◽  
Vol 2020 ◽  
pp. 1-15
Author(s):  
Gwenaelle Cunha Sergio ◽  
Minho Lee

Generating music with emotion similar to that of an input video is a very relevant issue nowadays. Video content creators and automatic movie directors benefit from keeping their viewers engaged, which can be facilitated by producing novel material that elicits stronger emotions in them. Moreover, there is currently a demand for more empathetic computers to aid humans in applications such as augmenting the perception ability of visually- and/or hearing-impaired people. Current approaches overlook the video’s emotional characteristics in the music generation step, only consider static images instead of videos, are unable to generate novel music, and require a high level of human effort and skill. In this study, we propose a novel hybrid deep neural network that uses an Adaptive Neuro-Fuzzy Inference System (ANFIS) to predict a video’s emotion from its visual features and a deep Long Short-Term Memory Recurrent Neural Network to generate its corresponding audio signals with a similar emotional character. The former is able to appropriately model emotions due to its fuzzy properties, and the latter is able to model data with dynamic time properties well due to the availability of the previous hidden state information. The novelty of our proposed method lies in the extraction of visual emotional features in order to transform them into audio signals with corresponding emotional aspects for users. Quantitative experiments show low mean absolute errors of 0.217 and 0.255 on the Lindsey and DEAP datasets, respectively, and similar global features in the spectrograms. This indicates that our model is able to appropriately perform domain transformation between visual and audio features. Based on the experimental results, our model can effectively generate audio that matches the scene and elicits a similar emotion from the viewer in both datasets, and music generated by our model is also chosen more often (code available online at https://github.com/gcunhase/Emotional-Video-to-Audio-with-ANFIS-DeepRNN).
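The two-stage pipeline described above (fuzzy emotion prediction feeding a recurrent generator) can be illustrated with a minimal sketch. Everything here is a toy stand-in: the Gaussian memberships, the two fuzzy rules, and the single tanh recurrence are invented placeholders for the paper's trained ANFIS and LSTM, and the brightness/motion features are made up.

```python
import math

# Stage 1: a toy fuzzy inference step mapping visual features to a valence
# score. Stage 2: a one-unit recurrence standing in for the LSTM's hidden
# update, so each output depends on the previous hidden state.

def gaussian_membership(x, center, sigma=0.2):
    return math.exp(-((x - center) ** 2) / (2 * sigma ** 2))

def fuzzy_emotion(brightness, motion):
    # Two toy rules: bright + fast -> positive, dark + slow -> negative.
    pos = min(gaussian_membership(brightness, 0.8),
              gaussian_membership(motion, 0.8))
    neg = min(gaussian_membership(brightness, 0.2),
              gaussian_membership(motion, 0.2))
    return (pos - neg) / (pos + neg + 1e-9)  # valence in roughly [-1, 1]

def recurrent_step(emotion, hidden, w_in=0.6, w_rec=0.4):
    # Previous hidden state carries temporal context into the next output.
    return math.tanh(w_in * emotion + w_rec * hidden)

frames = [(0.8, 0.8), (0.7, 0.9), (0.2, 0.2)]  # (brightness, motion) per frame
hidden, outputs = 0.0, []
for b, m in frames:
    hidden = recurrent_step(fuzzy_emotion(b, m), hidden)
    outputs.append(round(hidden, 3))
print(outputs)  # positive for the bright/fast frames, negative for the dark/slow one
```

In the actual system, the recurrent stage emits audio signal parameters rather than a single scalar, but the dependence on the previous hidden state, which the abstract highlights, is the same mechanism.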


2017 ◽  
Vol 96 (8) ◽  
pp. 297-310 ◽  
Author(s):  
Bahtiyar Celikgun ◽  
M. Tayyar Kalcioglu

This study aimed to determine the speech discrimination ability of the contralateral ear of users of a unilateral bone-anchored hearing system (BAHS). All patients used Oticon Medical Ponto Pro or Ponto Pro Power devices. Five BAHS users (3 men, 2 women) participated in the study. Pure-tone air-conduction thresholds at 250 to 6,000 Hz, masked and unmasked bone-conduction thresholds at 250 to 4,000 Hz, and participants’ speech discrimination scores in both ears were determined. Speech discrimination tests were carried out in a silent environment with monosyllabic and trisyllabic word lists. After this, the ipsilateral ear (the BAHS side) was masked with wide-band noise using an insert earphone, and the word tests were repeated. A mild decrease in monosyllabic word scores was observed under ipsilateral masking; however, this was not found to be statistically significant. Conversely, no decrease was observed in the repetition of trisyllabic words in any participant, even under high-level ipsilateral masking. These results suggested that unilateral BAHS application could prevent or reduce the neural deprivation of the contralateral ear.


2018 ◽  
Author(s):  
Eduardo R. Soares ◽  
Eduardo Barrére

Nowadays, video lectures are a very popular way to transmit knowledge, and because of that, there are many repositories on the web with large catalogs of such videos. Despite all the benefits that this high availability of video lectures brings, some problems also emerge from this scenario. One of these problems is that it is very difficult to find relevant content associated with those videos. Often, students must watch the entire video lecture to find the point of interest, and sometimes these points are not found at all. For that reason, the proposal of this master’s project is to investigate and propose a novel framework based on early fusion of low- and high-level audio features, enriched with external knowledge from open databases, for automatic topic segmentation in video lectures. We have performed preliminary experiments on two sets of video lectures using the current state of our work. The results obtained were very satisfactory, which evidences the potential of our proposal.
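The "early fusion" idea in this abstract, combining low- and high-level features into one vector before segmentation rather than segmenting on each feature stream separately, can be sketched as follows. The window features, keyword flags, and distance threshold are all invented for illustration; the actual framework's features and segmentation algorithm are not described in the abstract.

```python
# Hypothetical early-fusion sketch: each lecture window gets one joint
# vector of low-level audio statistics plus high-level semantic flags,
# and a topic boundary is proposed where consecutive fused vectors diverge.

def fuse(low_level, high_level):
    """Early fusion: concatenate the feature groups into one joint vector."""
    return low_level + high_level

def boundaries(windows, threshold=0.8):
    """Propose a cut before each window whose fused vector differs sharply
    (Euclidean distance) from the previous window's vector."""
    cuts = []
    for i in range(1, len(windows)):
        dist = sum((a - b) ** 2 for a, b in zip(windows[i - 1], windows[i])) ** 0.5
        if dist > threshold:
            cuts.append(i)
    return cuts

# Toy windows: [mfcc_mean, energy] + [keyword "sorting" seen, keyword "graphs" seen]
windows = [
    fuse([0.10, 0.50], [1.0, 0.0]),
    fuse([0.12, 0.48], [1.0, 0.0]),
    fuse([0.60, 0.20], [0.0, 1.0]),  # topic shift: new keywords AND new acoustics
]
print(boundaries(windows))  # a single cut, before the third window
```

The advantage of fusing before segmentation, as opposed to late fusion, is that a boundary supported weakly by each stream alone can still exceed the threshold when the streams shift together.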

