Source Separation Using Dilated Time-Frequency DenseNet for Music Identification in Broadcast Contents

2020 ◽  
Vol 10 (5) ◽  
pp. 1727 ◽  
Author(s):  
Woon-Haeng Heo ◽  
Hyemi Kim ◽  
Oh-Wook Kwon

We propose a source separation architecture using a dilated time-frequency DenseNet for background music identification in broadcast content. We apply source separation to mixed signals of music and speech. The proposed architecture adds a time-frequency dilated convolution to the conventional DenseNet in order to increase the receptive field effectively. In addition, we apply different convolutions to each frequency band of the spectrogram in order to reflect the different characteristics of the low- and high-frequency bands. To verify the performance of the proposed architecture, we perform singing-voice separation and music-identification experiments. We confirm that the proposed architecture produces the best performance in both experiments because the dilated convolution captures wide contextual information.
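The effect of dilation on the receptive field can be illustrated with a little arithmetic. The sketch below is not the authors' network: it assumes stride-1 convolutions, a hypothetical kernel size of 3, and illustrative dilation rates, and it computes the receptive field along a single spectrogram axis (time or frequency).

```python
def receptive_field(kernel_size, dilations):
    """Receptive field, in bins or frames, of stacked stride-1 convolutions.

    Each layer with dilation d widens the field by (kernel_size - 1) * d.
    """
    field = 1
    for d in dilations:
        field += (kernel_size - 1) * d
    return field


# Four layers, kernel size 3 (illustrative values, not taken from the paper).
plain = receptive_field(3, [1, 1, 1, 1])    # no dilation
dilated = receptive_field(3, [1, 2, 4, 8])  # exponentially growing dilation
print(plain, dilated)  # 9 31
```

With the same number of layers and weights, the dilated stack covers 31 positions instead of 9, which is the sense in which dilation widens the context without adding parameters.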

2021 ◽  
Vol 11 (2) ◽  
pp. 789
Author(s):  
Woon-Haeng Heo ◽  
Hyemi Kim ◽  
Oh-Wook Kwon

Herein, we propose a multi-scale multi-band dilated time-frequency densely connected convolutional network (DenseNet) with long short-term memory (LSTM) for audio source separation. Because the spectrogram of an acoustic signal can be treated as an image as well as time-series data, it is well suited to a convolutional recurrent neural network (CRNN) architecture. We improve the audio source separation performance by adding a dilated block, built on dilated convolution, to the CRNN architecture. The dilated block effectively increases the receptive field in the spectrogram. In addition, it is designed to account for the fact that the frequency axis and the time axis of the spectrogram vary under independent influences such as speech rate and pitch. In the speech enhancement experiments, we estimated the speech signal with various deep learning architectures from a signal in which music, noise, and speech were mixed. We conducted a subjective evaluation of the estimated speech signal and also measured speech quality, intelligibility, separation, and speech recognition performance. In the music signal separation experiments, we estimated the music signal with several deep learning architectures from a mixture of music and speech, and then measured the separation performance and music identification accuracy on the estimated music signal. Overall, the proposed architecture shows the best performance among the compared deep learning architectures in both the speech and the music experiments.


Author(s):  
Filippo Ghin ◽  
Louise O’Hare ◽  
Andrea Pavan

Abstract

There is evidence that high-frequency transcranial random noise stimulation (hf-tRNS) is effective in improving behavioural performance in several visual tasks. However, so far there has been limited research into the spatial and temporal characteristics of hf-tRNS-induced facilitatory effects. In the present study, electroencephalogram (EEG) was used to investigate the spatial and temporal dynamics of cortical activity modulated by offline hf-tRNS during a motion direction discrimination task. We used EEG to measure the amplitude of motion-related visual evoked potentials (VEPs) over the parieto-occipital cortex, as well as oscillatory power spectral density (PSD) at rest. A time–frequency decomposition analysis was also performed to investigate the shift in event-related spectral perturbation (ERSP) in response to the motion stimuli between the pre- and post-stimulation periods. The results showed that accuracy in the motion direction discrimination task was not modulated by offline hf-tRNS. Although the motion task elicited motion-dependent VEP components (P1, N2, and P2), none of them showed any significant change between pre- and post-stimulation. We also found a time-dependent increase of the PSD in the alpha and beta bands regardless of the stimulation protocol. Finally, time–frequency analysis showed a modulation of ERSP power for gamma activity in the hf-tRNS condition when compared to pre-stimulation periods and Sham stimulation. Overall, these results show that offline hf-tRNS may induce moderate aftereffects in brain oscillatory activity.
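The abstract does not say how ERSP was normalised. One common convention expresses ERSP as the change in spectral power relative to a pre-stimulus baseline, in decibels. A minimal sketch under that assumption, using a hypothetical function name and made-up power values rather than anything from the study:

```python
import math


def ersp_db(power, baseline_power):
    """Power change relative to baseline, in dB (assumed convention)."""
    if power <= 0 or baseline_power <= 0:
        raise ValueError("power values must be positive")
    return 10.0 * math.log10(power / baseline_power)


# Made-up values for a single time-frequency bin.
print(round(ersp_db(2.0, 1.0), 2))  # doubling of power: 3.01 dB
print(ersp_db(1.0, 1.0))            # no change: 0.0 dB
```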


2020 ◽  
Vol 41 (Supplement_2) ◽  
Author(s):  
D Garcia Iglesias ◽  
J.M Rubin Lopez ◽  
D Perez Diez ◽  
C Moris De La Tassa ◽  
F.J De Cos Juez ◽  
...  

Abstract

Introduction: The Signal Averaged ECG (SAECG) is a classical method for Sudden Cardiac Death (SCD) risk assessment by means of Late Potentials (LP) in the filtered QRS (fQRS) [1]. However, it is highly sensitive to noise and requires long recordings, which makes it tedious to use. The Wavelet Continuous Transform (WCT), meanwhile, is easier to use and may allow us to measure the High Frequency Content (HFC) of the QRS and QT intervals, which also correlates with the risk of SCD [2,3]. Whether the HFC of the QRS and QT measured with the WCT is a possible surrogate of LP has never been demonstrated.

Objective: To determine whether there is any relationship between the HFC measured with the WCT and the LP analyzed with the SAECG.

Methods: Data were obtained from 50 consecutive healthy individuals. The standard ECG was digitally recorded for 3 consecutive minutes. For the WCT analysis, 8 consecutive QT complexes were used; for the SAECG analysis, all available QRS complexes were used. The time-frequency data of each QT complex were collected using the WCT as previously described [3], and the Total, QRS and QT power were obtained for each patient. For the SAECG, bipolar X, Y and Z leads were used with a bidirectional filter at 40 to 250 Hz [1]. LP were defined as less than 0.05 z in the terminal part of the filtered QRS, and the duration (SAECG LP duration) and root mean square (SAECG LP content) of this LP were calculated. Pearson's test was used to correlate the power content from the WCT analysis with the LP in the SAECG.

Results: There is a strong correlation between Total Power and the SAECG LP content (r=0.621, p<0.001). Both ST Power (r=0.567, p<0.001) and QRS Power (r=0.404, p=0.004) are related to the SAECG LP content. No correlation was found between the power content (Total, QRS or ST Power) and the SAECG LP duration, nor between the SAECG LP content and duration.

Conclusions: Total, QRS and ST Power measured with the WCT are good surrogates of SAECG LP content. No correlation was found between the WCT analysis and the SAECG LP duration, nor between the SAECG LP content and duration. This is of considerable interest, since WCT is an easier technique that does not need long recordings and is less affected by noise.

Funding Acknowledgement: Type of funding source: None
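The Methods name two simple computations: the root mean square of the late-potential segment and Pearson's correlation between the wavelet power and the SAECG measures. A minimal sketch of both follows, using the standard textbook formulas. The function names and the numbers are hypothetical and synthetic; they are not the study's data or the authors' code.

```python
import math


def rms(samples):
    """Root mean square of a signal segment."""
    if not samples:
        raise ValueError("segment is empty")
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Synthetic per-subject values, for illustration only.
total_power = [1.0, 2.0, 3.0, 4.0, 5.0]
lp_content = [2.1, 3.9, 6.2, 8.0, 9.8]
print(pearson_r(total_power, lp_content))

# Synthetic terminal-QRS segment.
print(rms([3.0, 4.0]))
```

A significance test such as the p-values reported above would need an additional step (for example a t-distribution on n - 2 degrees of freedom), which this sketch leaves out.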

