musical audio
Recently Published Documents


TOTAL DOCUMENTS

80
(FIVE YEARS 6)

H-INDEX

13
(FIVE YEARS 0)

Electronics, 2021, Vol. 10 (13), pp. 1518
Author(s):  
António S. Pinto ◽  
Sebastian Böck ◽  
Jaime S. Cardoso ◽  
Matthew E. P. Davies

The extraction of the beat from musical audio signals represents a foundational task in the field of music information retrieval. While great advances in performance have been achieved due to the use of deep neural networks, significant shortcomings still remain. In particular, performance is generally much lower on musical content that differs from that contained in the existing annotated datasets used for neural network training, as well as in the presence of challenging musical conditions such as rubato. In this paper, we positioned our approach to beat tracking from a real-world perspective, in which an end-user targets very high accuracy on specific music pieces for which the current state of the art is not effective. To this end, we explored the use of targeted fine-tuning of a state-of-the-art deep neural network based on a very limited temporal region of annotated beat locations. We demonstrated the success of our approach via improved performance across existing annotated datasets and a new annotation-correction approach for evaluation. Furthermore, we highlighted the ability of content-specific fine-tuning to learn both what is and what is not the beat in challenging musical conditions.
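The fine-tuning idea above, reduced to its essentials, can be sketched as follows. This is not the authors' actual deep beat-tracking network; it is a minimal NumPy stand-in in which a pretrained logistic "beat-activation" head (hypothetical weights `w`, `b`, synthetic frame features and annotations) is adapted by gradient descent on a small set of annotated frames from a single excerpt:

```python
import numpy as np

def fine_tune(w, b, feats, labels, lr=0.5, steps=200):
    """Adapt a logistic beat-activation head to a short annotated excerpt
    (a toy stand-in for fine-tuning the full deep network)."""
    for _ in range(steps):
        z = feats @ w + b
        p = 1.0 / (1.0 + np.exp(-z))        # predicted beat probability per frame
        grad = p - labels                   # dL/dz for binary cross-entropy
        w -= lr * feats.T @ grad / len(labels)
        b -= lr * grad.mean()
    return w, b

def loss(w, b, feats, labels):
    p = 1.0 / (1.0 + np.exp(-(feats @ w + b)))
    return -np.mean(labels * np.log(p + 1e-9) + (1 - labels) * np.log(1 - p + 1e-9))

rng = np.random.default_rng(0)
feats = rng.normal(size=(64, 8))                # frame-level features (synthetic)
true_w = rng.normal(size=8)
labels = (feats @ true_w > 0).astype(float)     # synthetic beat annotations

w, b = rng.normal(size=8), 0.0                  # "pretrained" weights, wrong for this piece
before = loss(w, b, feats, labels)
w, b = fine_tune(w, b, feats, labels)
after = loss(w, b, feats, labels)
```

The key point mirrored from the abstract is that only a small annotated region is needed: the update loop sees a single short excerpt, yet the adapted head fits that content much better than the generic "pretrained" weights did.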


Electronics, 2021, Vol. 10 (11), pp. 1349
Author(s):  
Stefan Lattner ◽  
Javier Nistal

Lossy audio codecs compress (and decompress) digital audio streams by removing information that tends to be inaudible to human perception. At high compression rates, such codecs may introduce a variety of impairments in the audio signal. Many works have tackled audio enhancement and compression-artifact removal using deep-learning techniques; however, only a few address the restoration of heavily compressed audio signals in the musical domain. In such a scenario, there is no unique solution for the restoration of the original signal. Therefore, in this study, we test the stochastic generator of a Generative Adversarial Network (GAN) architecture for this task. Such a stochastic generator, conditioned on highly compressed musical audio signals, could one day generate outputs indistinguishable from high-quality releases. The present study may therefore yield insights into more efficient musical data storage and transmission. We train stochastic and deterministic generators on MP3-compressed audio signals at 16, 32, and 64 kbit/s. We perform an extensive evaluation of the different experiments using objective metrics and listening tests. We find that the models can improve the quality of the audio signals over the MP3 versions at 16 and 32 kbit/s, and that the stochastic generators are capable of generating outputs closer to the original signals than those of the deterministic generators.
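The distinction between the deterministic and stochastic generators can be illustrated with a toy sketch. This is not the paper's GAN; it is a NumPy illustration with made-up weight matrices `W` and `V` and a random feature vector standing in for compressed audio. The deterministic generator maps a compressed input to exactly one output, while the stochastic generator additionally consumes a noise vector `z`, so repeated sampling yields different plausible restorations of the same input:

```python
import numpy as np

rng = np.random.default_rng(0)

def deterministic_gen(compressed, W):
    # One restored output per input: cannot express the ambiguity of restoration.
    return np.tanh(compressed @ W)

def stochastic_gen(compressed, z, W, V):
    # Conditioned on the compressed signal AND a noise vector z, so each
    # sample of z yields a different candidate restoration.
    return np.tanh(compressed @ W + z @ V)

W = 0.1 * rng.normal(size=(16, 16))   # toy generator weights
V = 0.1 * rng.normal(size=(4, 16))    # toy noise-injection weights
x = rng.normal(size=(1, 16))          # stand-in for compressed-audio features

d1 = deterministic_gen(x, W)
d2 = deterministic_gen(x, W)          # identical to d1: same input, same output
s1 = stochastic_gen(x, rng.normal(size=(1, 4)), W, V)
s2 = stochastic_gen(x, rng.normal(size=(1, 4)), W, V)  # differs from s1
```

This is exactly the property the abstract exploits: because heavy compression discards information, many originals are consistent with one compressed input, and only the stochastic generator can represent that one-to-many mapping.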


2020
Author(s):  
Paul Alexander Bloom ◽  
Ella Bartlett ◽  
Nicholas Kathios ◽  
Sameah Algharazi ◽  
Matthew Siegelman ◽  
...  

Familiar music facilitates memory retrieval in adults with Alzheimer's disease and other forms of dementia. This raises the possibility that music can be used as a rehabilitative tool to aid memory abilities more generally. However, the mechanisms behind this effect, and its generality, are unclear because of a lack of parallel work in healthy aging. In particular, exposure to familiar music enhances spontaneous recall of memories directly cued by the music, but it is unknown whether such effects extend to deliberate recall more generally, e.g., to memories not directly linked to the music being played. It is also unclear whether familiar music cues boost recall of specific episodes versus more generalized semantic memories, or whether their effects are partly driven by domain-general mechanisms (e.g., improved mood). In the current study, we will examine the effects of familiar music on deliberate recall and differentiate potential underlying mechanisms. We will expose healthy adults aged 65 to 80 (N = 75) to familiar music clips from earlier in life, unfamiliar music clips, and non-musical audio clips across three study sessions. Immediately after each clip, we will assess free recall of remote memories for pre-selected events. Those memories will then be scored for episodic and semantic details using the Autobiographical Interview. We hypothesize that familiar music may enhance recall of specific events, such that participants will recall more episodic details after exposure to familiar music than after unfamiliar music or non-musical audio. We will also test a competing hypothesis that familiar music may prompt more general recollections of periods of life, and thus will increase recall of semantic details in comparison with the unfamiliar-music and no-music conditions. The results of this study will advance knowledge of the mechanisms by which music affects memory, with potential implications for the use of music as a therapeutic device for declining memory.


Fingerprint design is the cornerstone of audio recognition systems, which aim for robustness and fast retrieval. Short-time Fourier transform and Mel-spectral representations are common choices for this task; however, these extraction methods can be unstable and have limited spectral-spatial resolution. The scattering wavelet transform (SWT) offers an alternative that addresses these limitations by recovering information loss while ensuring translation invariance and stability. We propose a two-stage feature extraction framework that couples the SWT with a deep Siamese hashing model for musical audio recognition. Similarity-preserving hashes serve as the final fingerprints, and in the projected embedding space, similarity is defined by a distance metric. The hashing model is trained on roughly aligned and non-matching audio snippets to model musical audio data via a two-layer scattering spectrum. Our proposed framework achieves competitive performance in identifying audio signals superimposed with environmental noise, which models real-world obstacles for music recognition. With a very compact storage footprint (256 bytes/s), we achieve a 98.2% ROC AUC score on the GTZAN dataset.
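The retrieval side of such a fingerprinting scheme can be sketched with a random-projection stand-in for the learned Siamese hashing layer (hypothetical: the paper trains this projection for similarity preservation rather than drawing it at random, and its features come from a scattering transform, not from random vectors). Each 2048-bit hash occupies 256 bytes, matching the per-second footprint quoted above if one fingerprint is emitted per second, and matching is done by Hamming distance:

```python
import numpy as np

N_BITS = 2048                        # 2048 bits = 256 bytes per fingerprint
rng = np.random.default_rng(0)

# Fixed random projection: a stand-in for the trained Siamese hashing layer.
P = rng.normal(size=(128, N_BITS))

def fingerprint(features):
    """Binarize projected features into a compact similarity-preserving hash."""
    return features @ P > 0          # boolean vector of N_BITS entries

def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return int(np.count_nonzero(a != b))

clean = rng.normal(size=128)                    # scattering-like feature vector
noisy = clean + 0.1 * rng.normal(size=128)      # same audio + environmental noise
other = rng.normal(size=128)                    # unrelated audio

fp_clean, fp_noisy, fp_other = map(fingerprint, (clean, noisy, other))
```

The design point this illustrates is why binary hashes suit the task: a query over millions of stored fingerprints reduces to cheap bitwise XOR-and-popcount comparisons, while small feature perturbations (noise) flip only a small fraction of bits, so the noisy copy stays far closer to its clean original than any unrelated track.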

