Source Separation Using Dilated Time-Frequency DenseNet for Music Identification in Broadcast Contents

Woon-Haeng Heo; Hyemi Kim; Oh-Wook Kwon

doi:10.3390/app10051727

Applied Sciences ◽

10.3390/app10051727 ◽

2020 ◽

Vol 10 (5) ◽

pp. 1727 ◽

Cited By ~ 3

Author(s):

Woon-Haeng Heo ◽

Hyemi Kim ◽

Oh-Wook Kwon

Keyword(s):

High Frequency ◽

Contextual Information ◽

Source Separation ◽

Background Music ◽

Separation Scheme ◽

Singing Voice ◽

Time Frequency ◽

Dilated Convolution ◽

Music Identification ◽

Singing Voice Separation

We propose a source separation architecture using a dilated time-frequency DenseNet for background music identification in broadcast content. We apply source separation techniques to mixed signals of music and speech. For source separation, we propose a new architecture that adds a time-frequency dilated convolution to the conventional DenseNet in order to effectively increase the receptive field. In addition, we apply different convolutions to each frequency band of the spectrogram to reflect the different characteristics of the low- and high-frequency bands. To verify the performance of the proposed architecture, we perform singing-voice separation and music-identification experiments. We confirm that the proposed architecture produces the best performance in both experiments because it uses the dilated convolution to reflect wide contextual information.
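
The link between dilation and receptive field in the abstract above can be illustrated with a small calculation. This is a minimal sketch, not the authors' architecture: the function name `receptive_field`, the kernel size 3, and the dilation schedules are assumptions chosen for illustration, and stride 1 is assumed throughout.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field along one axis (frequency bins or time frames) of a
    stack of stride-1 convolutions, one layer per entry in `dilations`.

    Each layer with dilation d widens the field by (kernel_size - 1) * d.
    """
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


# Four stacked 3-tap layers without dilation (hypothetical configuration).
plain = receptive_field(3, [1, 1, 1, 1])    # 9

# Same depth and parameter count, with exponentially growing dilation.
dilated = receptive_field(3, [1, 2, 4, 8])  # 31
```

With the same number of layers and weights, the dilated stack covers more than three times as many bins or frames, which is the sense in which dilation gives wider context at no extra parameter cost.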

Monaural Singing Voice Separation with Skip-Filtering Connections and Recurrent Inference of Time-Frequency Mask

2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) ◽

10.1109/icassp.2018.8461822 ◽

2018 ◽

Cited By ~ 7

Author(s):

Stylianos Ioannis Mimilakis ◽

Konstantinos Drossos ◽

Joao F. Santos ◽

Gerald Schuller ◽

Tuomas Virtanen ◽

...

Keyword(s):

Singing Voice ◽

Time Frequency ◽

Singing Voice Separation

Monaural Singing Voice Separation Using Fusion-Net with Time-Frequency Masking

2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) ◽

10.1109/apsipaasc47483.2019.9023055 ◽

2019 ◽

Author(s):

Feng Li ◽

Kaizhi Qian ◽

Mark Hasegawa-Johnson ◽

Masato Akagi

Keyword(s):

Singing Voice ◽

Time Frequency ◽

Singing Voice Separation

Time-frequency clustering with weighted and contextual information for convolutive blind source separation

2014 IEEE Workshop on Statistical Signal Processing (SSP) ◽

10.1109/ssp.2014.6884599 ◽

2014 ◽

Author(s):

Ingrid Jafari ◽

Matt Atcheson ◽

Roberto Togneri ◽

Sven Nordholm

Keyword(s):

Blind Source Separation ◽

Contextual Information ◽

Source Separation ◽

Time Frequency ◽

Convolutive Blind Source Separation

Monaural Music-Speech Source Separation Based on Convolutional Neural Network for Background Music Identification in TV Shows

The Journal of Korean Institute of Communications and Information Sciences ◽

10.7840/kics.2020.45.5.855 ◽

2020 ◽

Vol 45 (5) ◽

pp. 855-866

Author(s):

Hyemi Kim ◽

Woon-Haeng Heo ◽

Junghyun Kim ◽

Jihyun Park

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Source Separation ◽

Background Music ◽

Music Identification ◽

TV Shows

On the perceptual relevance of objective source separation measures for singing voice separation

2015 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) ◽

10.1109/waspaa.2015.7336923 ◽

2015 ◽

Cited By ~ 4

Author(s):

Udit Gupta ◽

Elliot Moore ◽

Alexander Lerch

Keyword(s):

Source Separation ◽

Singing Voice ◽

Singing Voice Separation

Latent time-frequency component analysis: A novel pitch-based approach for singing voice separation

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) ◽

10.1109/icassp.2015.7177946 ◽

2015 ◽

Cited By ~ 1

Author(s):

Xiu Zhang ◽

Wei Li ◽

Bilei Zhu

Keyword(s):

Component Analysis ◽

Frequency Component ◽

Singing Voice ◽

Time Frequency ◽

Latent Time ◽

Singing Voice Separation

Integrating Dilated Convolution into DenseLSTM for Audio Source Separation

Applied Sciences ◽

10.3390/app11020789 ◽

2021 ◽

Vol 11 (2) ◽

pp. 789

Author(s):

Woon-Haeng Heo ◽

Hyemi Kim ◽

Oh-Wook Kwon

Keyword(s):

Deep Learning ◽

Speech Signal ◽

Source Separation ◽

Series Data ◽

Separation Performance ◽

Time Frequency ◽

Dilated Convolution ◽

Audio Source Separation ◽

Music Signal ◽

Learning Architectures

Herein, we propose a multi-scale multi-band dilated time-frequency densely connected convolutional network (DenseNet) with long short-term memory (LSTM) for audio source separation. Because the spectrogram of an acoustic signal can be treated both as an image and as time series data, it is well suited to a convolutional recurrent neural network (CRNN) architecture. We improved the audio source separation performance by applying a dilated block with a dilated convolution to the CRNN architecture. The dilated block effectively increases the receptive field in the spectrogram. In addition, it was designed in consideration of the acoustic characteristic that the frequency axis and the time axis of the spectrogram are changed by independent influences such as speech rate and pitch. In speech enhancement experiments, we estimated the speech signal using various deep learning architectures from a signal in which music, noise, and speech were mixed. We conducted a subjective evaluation of the estimated speech signal. In addition, speech quality, intelligibility, separation, and speech recognition performance were measured. In music signal separation, we estimated the music signal using several deep learning architectures from the mixture of music and speech signals. The separation performance and music identification accuracy were then measured using the estimated music signal. Overall, the proposed architecture shows the best performance compared to other deep learning architectures in both the speech and the music experiments.
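
The idea of treating the frequency and time axes independently can be sketched as a 2-D convolution with a separate dilation rate per axis. The code below is an illustrative assumption, not the dilated block from the paper: the function `dilated_conv2d`, its parameters `dil_f` and `dil_t`, and the toy input are hypothetical, and plain Python lists stand in for a spectrogram.

```python
def dilated_conv2d(x, kernel, dil_f=1, dil_t=1):
    """Valid 2-D cross-correlation of a spectrogram-like array.

    x      : list of rows, indexed as x[frequency][time]
    kernel : list of rows, indexed the same way
    dil_f  : dilation rate along the frequency axis (assumed parameter)
    dil_t  : dilation rate along the time axis (assumed parameter)
    """
    kf, kt = len(kernel), len(kernel[0])
    span_f = (kf - 1) * dil_f + 1   # bins covered on the frequency axis
    span_t = (kt - 1) * dil_t + 1   # frames covered on the time axis
    out_f = len(x) - span_f + 1
    out_t = len(x[0]) - span_t + 1
    if out_f <= 0 or out_t <= 0:
        raise ValueError("input is smaller than the dilated kernel span")

    out = [[0.0] * out_t for _ in range(out_f)]
    for i in range(out_f):
        for j in range(out_t):
            acc = 0.0
            for a in range(kf):
                for b in range(kt):
                    # Kernel taps are spaced dil_f bins and dil_t frames apart.
                    acc += kernel[a][b] * x[i + a * dil_f][j + b * dil_t]
            out[i][j] = acc
    return out


# Toy 5 x 7 input: value encodes position as 10 * frequency + time.
spec = [[10 * f + t for t in range(7)] for f in range(5)]
ones = [[1.0] * 3 for _ in range(3)]

# Dilate only along time: a 3 x 3 kernel now spans 3 bins by 5 frames.
y = dilated_conv2d(spec, ones, dil_f=1, dil_t=2)
```

Setting `dil_t` larger than `dil_f` widens the temporal context while leaving frequency resolution untouched, and the reverse choice does the opposite.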

Electrophysiological aftereffects of high-frequency transcranial random noise stimulation (hf-tRNS): an EEG investigation

Experimental Brain Research ◽

10.1007/s00221-021-06142-4 ◽

2021 ◽

Author(s):

Filippo Ghin ◽

Louise O’Hare ◽

Andrea Pavan

Keyword(s):

High Frequency ◽

Discrimination Task ◽

Temporal Dynamics ◽

Random Noise ◽

Decomposition Analysis ◽

Oscillatory Activity ◽

Motion Direction ◽

Time Frequency ◽

Direction Discrimination ◽

Transcranial Random Noise Stimulation

There is evidence that high-frequency transcranial random noise stimulation (hf-tRNS) is effective in improving behavioural performance in several visual tasks. However, so far there has been limited research into the spatial and temporal characteristics of hf-tRNS-induced facilitatory effects. In the present study, electroencephalogram (EEG) was used to investigate the spatial and temporal dynamics of cortical activity modulated by offline hf-tRNS on performance on a motion direction discrimination task. We used EEG to measure the amplitude of motion-related VEPs over the parieto-occipital cortex, as well as oscillatory power spectral density (PSD) at rest. A time–frequency decomposition analysis was also performed to investigate the shift in event-related spectral perturbation (ERSP) in response to the motion stimuli between the pre- and post-stimulation period. The results showed that the accuracy of the motion direction discrimination task was not modulated by offline hf-tRNS. Although the motion task was able to elicit motion-dependent VEP components (P1, N2, and P2), none of them showed any significant change between pre- and post-stimulation. We also found a time-dependent increase of the PSD in alpha and beta bands regardless of the stimulation protocol. Finally, time–frequency analysis showed a modulation of ERSP power in the hf-tRNS condition for gamma activity when compared to pre-stimulation periods and Sham stimulation. Overall, these results show that offline hf-tRNS may induce moderate aftereffects in brain oscillatory activity.

Comparison of wavelet continuous transform and signal averaged ECG for high frequency content and late potential detection in healthy individuals

European Heart Journal ◽

10.1093/ehjci/ehaa946.3447 ◽

2020 ◽

Vol 41 (Supplement_2) ◽

Author(s):

D Garcia Iglesias ◽

J.M Rubin Lopez ◽

D Perez Diez ◽

C Moris De La Tassa ◽

F.J De Cos Juez ◽

...

Keyword(s):

High Frequency ◽

Classical Method ◽

Total Power ◽

Frequency Content ◽

Healthy Individuals ◽

Time Frequency ◽

QT Intervals ◽

Long Time ◽

Power Content ◽

High Frequency Content

Introduction: The Signal Averaged ECG (SAECG) is a classical method for Sudden Cardiac Death (SCD) risk assessment by means of Late Potentials (LP) in the filtered QRS (fQRS) [1]. However, it is highly dependent on noise and requires long recordings, which makes it tedious to use. The Wavelet Continuous Transform (WCT), meanwhile, is easier to use and may allow us to measure the High Frequency Content (HFC) of the QRS and QT intervals, which also correlates with the risk of SCD [2,3]. Whether the HFC of the QRS and QT measured with the WCT is a possible surrogate of LP has never been demonstrated. Objective: To determine whether there is any relationship between the HFC measured with the WCT and the LP analyzed with the SAECG. Methods: Data were collected from 50 consecutive healthy individuals. The standard ECG was digitally recorded for 3 consecutive minutes. For the WCT analysis, 8 consecutive QT complexes were used, and for the SAECG analysis, all available QRS complexes were used. The time-frequency data of each QT complex were collected using the WCT as previously described [3], and the Total, QRS and QT power were obtained for each patient. For the SAECG, bipolar X, Y and Z leads were used with a bidirectional filter at 40 to 250 Hz [1]. LP were defined as less than 0.05 z in the terminal part of the filtered QRS, and the duration (SAECG LP duration) and root mean square (SAECG LP content) of this LP were calculated. Pearson's test was used to correlate the power content from the WCT analysis with the LP in the SAECG. Results: There is a strong correlation between Total Power and the SAECG LP content (r=0.621, p<0.001). Both ST Power (r=0.567, p<0.001) and QRS Power (r=0.404, p=0.004) are related to the SAECG LP content. No correlation was found between the power content (Total, QRS or ST Power) and the SAECG LP duration. Also, no correlation was found between the SAECG LP content and duration. Conclusions: Total, QRS and ST Power measured with the WCT are good surrogates of SAECG LP content. No correlation was found between the WCT analysis and the SAECG LP duration, nor between the SAECG LP content and duration. This can be of high interest, since WCT is an easier technique that does not need long recordings and is less affected by noise. Funding Acknowledgement: Type of funding source: None
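
The correlation statistic used in the abstract above is Pearson's r. A minimal sketch of how it is computed follows; the function name `pearson_r` and the toy numbers are assumptions for illustration only and are not data from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up values standing in for a power measure and an LP measure.
toy_power = [1.0, 2.0, 3.0, 4.0]
toy_lp = [2.0, 4.0, 6.0, 8.0]
r = pearson_r(toy_power, toy_lp)  # perfectly linear toy data gives r close to 1
```

A value near 1 or -1 indicates a strong linear relationship and a value near 0 indicates none, which is how figures such as r=0.621 in the abstract are read.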

Linear Multichannel Blind Source Separation based on Time-Frequency Mask Obtained by Harmonic/Percussive Sound Separation

ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) ◽

10.1109/icassp39728.2021.9413494 ◽

2021 ◽

Author(s):

Soichiro Oyabu ◽

Daichi Kitamura ◽

Kohei Yatabe

Keyword(s):

Blind Source Separation ◽

Source Separation ◽

Time Frequency ◽

Sound Separation
