Psychoacoustic measures of blind audio source separation performance

Mingu Lee; Inseok Heo; Nakjin Choi; Koeng‐Mo Sung

doi:10.1121/1.2935809

Audio Source Separation using Sparse Representations

Machine Audition ◽

10.4018/978-1-61520-919-4.ch010 ◽

2010 ◽

pp. 246-265 ◽

Cited By ~ 1

Author(s):

Andrew Nesbit ◽

Maria G. Jafar ◽

Emmanuel Vincent ◽

Mark D. Plumbley

Keyword(s):

Source Separation ◽

Feedback System ◽

Decomposition Methods ◽

Audio Coding ◽

Future Research ◽

Separation Performance ◽

Audio Source Separation ◽

Sparse Component Analysis ◽

Good Signal ◽

Music Signals

The authors address the problem of audio source separation, namely, the recovery of audio signals from recordings of mixtures of those signals. The sparse component analysis framework is a powerful method for achieving this. Sparse orthogonal transforms, in which only few transform coefficients differ significantly from zero, are developed; once the signal has been transformed, energy is apportioned from each transform coefficient to each estimated source, and, finally, the signal is reconstructed using the inverse transform. The overriding aim of this chapter is to demonstrate how this framework, as exemplified here by two different decomposition methods which adapt to the signal to represent it sparsely, can be used to solve different problems in different mixing scenarios. To address the instantaneous (neither delays nor echoes) and underdetermined (more sources than mixtures) mixing model, a lapped orthogonal transform is adapted to the signal by selecting a basis from a library of predetermined bases. This method is highly related to the windowing methods used in the MPEG audio coding framework. In considering the anechoic (delays but no echoes) and determined (equal number of sources and mixtures) mixing case, a greedy adaptive transform is used based on orthogonal basis functions that are learned from the observed data, instead of being selected from a predetermined library of bases. This is found to encode the signal characteristics, by introducing a feedback system between the bases and the observed data. Experiments on mixtures of speech and music signals demonstrate that these methods give good signal approximations and separation performance, and indicate promising directions for future research.

Download Full-text

Real-time blind audio source separation: performance assessment on an advanced digital signal processor

The Journal of Supercomputing ◽

10.1007/s11227-014-1252-4 ◽

2014 ◽

Vol 70 (3) ◽

pp. 1555-1576 ◽

Cited By ~ 3

Author(s):

Danilo Pani ◽

Alessandro Pani ◽

Luigi Raffo

Keyword(s):

Performance Assessment ◽

Real Time ◽

Digital Signal Processor ◽

Source Separation ◽

Digital Signal ◽

Separation Performance ◽

Audio Source Separation ◽

Signal Processor

Download Full-text

Integrating Dilated Convolution into DenseLSTM for Audio Source Separation

Applied Sciences ◽

10.3390/app11020789 ◽

2021 ◽

Vol 11 (2) ◽

pp. 789

Author(s):

Woon-Haeng Heo ◽

Hyemi Kim ◽

Oh-Wook Kwon

Keyword(s):

Deep Learning ◽

Speech Signal ◽

Source Separation ◽

Series Data ◽

Separation Performance ◽

Time Frequency ◽

Dilated Convolution ◽

Audio Source Separation ◽

Music Signal ◽

Learning Architectures

Herein, we proposed a multi-scale multi-band dilated time-frequency densely connected convolutional network (DenseNet) with long short-term memory (LSTM) for audio source separation. Because the spectrogram of the acoustic signal can be thought of as images as well as time series data, it is suitable for convolutional recurrent neural network (CRNN) architecture. We improved the audio source separation performance by applying the dilated block with a dilated convolution to CRNN architecture. The dilated block has the role of effectively increasing the receptive field in the spectrogram. In addition, it was designed in consideration of the acoustic characteristics that the frequency axis and the time axis in the spectrogram are changed by independent influences such as speech rate and pitch. In speech enhancement experiments, we estimated the speech signal using various deep learning architectures from a signal in which the music, noise, and speech were mixed. We conducted the subjective evaluation on the estimated speech signal. In addition, speech quality, intelligibility, separation, and speech recognition performance were also measured. In music signal separation, we estimated the music signal using several deep learning architectures from the mixture of the music and speech signal. After that, the separation performance and music identification accuracy were measured using the estimated music signal. Overall, the proposed architecture shows the best performance compared to other deep learning architectures not only in speech experiments but also in music experiments.

Download Full-text

A Bayesian Hierarchical Model for Blind Audio Source Separation

2020 28th European Signal Processing Conference (EUSIPCO) ◽

10.23919/eusipco47968.2020.9287348 ◽

2021 ◽

Author(s):

Yaron Laufer ◽

Sharon Gannot

Keyword(s):

Hierarchical Model ◽

Source Separation ◽

Bayesian Hierarchical Model ◽

Bayesian Hierarchical ◽

Audio Source Separation

Download Full-text

Monoaural Audio Source Separation Using Variational Autoencoders

10.21437/interspeech.2018-1140 ◽

2018 ◽

Cited By ~ 3

Author(s):

Laxmi Pandey ◽

Anurendra Kumar ◽

Vinay Namboodiri

Keyword(s):

Source Separation ◽

Audio Source Separation

Download Full-text

Towards Listening to 10 People Simultaneously: An Efficient Permutation Invariant Training of Audio Source Separation Using Sinkhorn’s Algorithm

ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) ◽

10.1109/icassp39728.2021.9414508 ◽

2021 ◽

Author(s):

Hideyuki Tachibana

Keyword(s):

Source Separation ◽

Audio Source Separation

Download Full-text

Real Time Blind Audio Source Separation Based on Machine Learning Algorithms

2020 2nd Novel Intelligent and Leading Emerging Sciences Conference (NILES) ◽

10.1109/niles50944.2020.9257891 ◽

2020 ◽

Author(s):

Arwa Alghamdi ◽

Graham Healy ◽

Hoda Abdelhafez

Keyword(s):

Machine Learning ◽

Real Time ◽

Learning Algorithms ◽

Source Separation ◽

Machine Learning Algorithms ◽

Audio Source Separation

Download Full-text

Enhanced Audio Source Separation and Musical Component Analysis

2020 IEEE International Symposium on Sustainable Energy, Signal Processing and Cyber Security (iSSSC) ◽

10.1109/isssc50941.2020.9358850 ◽

2020 ◽

Author(s):

Tanmay Bhagwat ◽

Shubham Deolalkar ◽

Jayesh Lokhande ◽

Leena Ragha

Keyword(s):

Source Separation ◽

Component Analysis ◽

Audio Source Separation

Download Full-text

Multichannel Audio Source Separation With Deep Neural Networks

IEEE/ACM Transactions on Audio Speech and Language Processing ◽

10.1109/taslp.2016.2580946 ◽

2016 ◽

Vol 24 (9) ◽

pp. 1652-1664 ◽

Cited By ~ 92

Author(s):

Aditya Arie Nugraha ◽

Antoine Liutkus ◽

Emmanuel Vincent

Keyword(s):

Neural Networks ◽

Deep Neural Networks ◽

Source Separation ◽

Multichannel Audio ◽

Audio Source Separation

Download Full-text

Blind Complex Source Separation Based on Cyclostationary Statistics

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.519-520.1051 ◽

2014 ◽

Vol 519-520 ◽

pp. 1051-1056

Author(s):

Jie Guo ◽

An Quan Wei ◽

Lei Tang

Keyword(s):

Blind Source Separation ◽

Source Separation ◽

Separation Performance ◽

Separation Algorithm ◽

Complex Signals ◽

Cyclic Frequency ◽

Complex Source ◽

Simulation Results ◽

Better Than ◽

Separation Model

This paper analyzed a blind source separation algorithm based on cyclic frequency of complex signals. Under the blind source separation model, we firstly gave several useful assumptions. Then we discussed the derivation of the BSS algorithm, including the complex signals and the normalization situation. Later, we analyzed the complex WCW-CS algorithm, which was compared with NGA, NEASI and NGA-CS algorithms. Simulation results show that the complex WCW-CS algorithm has the best convergence and separation performance. It can also effectively separate mixed image signals, whose performance was better than NGA algorithm.

Download Full-text