Improving Singing Voice Separation Using Curriculum Learning on Recurrent Neural Networks

Applied Sciences ◽

10.3390/app10072465 ◽

2020 ◽

Vol 10 (7) ◽

pp. 2465

Author(s):

Seungtae Kang ◽

Jeong-Sik Park ◽

Gil-Jin Jang

Keyword(s):

Single Channel ◽

The Other ◽

Superior Performance ◽

Difficulty Level ◽

Difficult Case ◽

Single Source ◽

Singing Voice ◽

Learning Framework ◽

Easy Case ◽

Singing Voice Separation

Single-channel singing voice separation is considered a difficult task, as it requires predicting two different audio sources independently from mixed vocal and instrument sounds recorded by a single microphone. We propose a new singing voice separation approach based on the curriculum learning framework, in which learning starts with only easy examples and task difficulty is then gradually increased. In this study, we regard data showing obviously dominant characteristics of a single source as an easy case and all other data as a difficult case. To quantify the dominance between the two sources, we define a dominance factor that determines the difficulty level according to the relative intensity of the vocal sound and the instrument sound. If a given data sample shows obviously dominant characteristics of a single source according to this factor, it is regarded as an easy case; otherwise, it is a difficult case. Early stages of learning focus on easy cases, allowing the model to rapidly learn the overall characteristics of each source. Later stages handle difficult cases, allowing more careful and sophisticated learning. In experiments conducted on three song datasets, the proposed approach demonstrated superior performance compared to the conventional approaches.
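The abstract does not give the formula for the dominance factor or the training schedule, so the following is only a minimal sketch of the idea under stated assumptions. It assumes the factor is the per-frame share of vocal energy, that a frame counts as "easy" when this share lies far from 0.5, and that the curriculum widens the training set by shrinking a margin over stages. The names `dominance`, `easy_mask`, and `curriculum_margins` and all numeric values are hypothetical, not taken from the paper.

```python
import numpy as np

def dominance(vocal_mag, inst_mag, eps=1e-8):
    """Share of vocal energy per frame, in [0, 1] (assumed definition)."""
    v = np.sum(vocal_mag ** 2, axis=0)   # vocal energy per frame
    m = np.sum(inst_mag ** 2, axis=0)    # instrument energy per frame
    return v / (v + m + eps)

def easy_mask(dom, margin):
    """True for frames where one source clearly dominates,
    i.e. the dominance value is at least `margin` away from 0.5."""
    return np.abs(dom - 0.5) >= margin

def curriculum_margins(n_stages, start=0.4, end=0.0):
    """Hypothetical schedule: a large margin first (easy frames only),
    shrinking to zero so that the last stage sees every frame."""
    return np.linspace(start, end, n_stages)

# Toy magnitude spectrograms: 4 frequency bins x 5 frames.
vocal = np.array([[3.0, 1.0, 0.1, 2.0, 1.0]] * 4)
inst = np.array([[0.1, 1.0, 3.0, 1.0, 1.0]] * 4)

dom = dominance(vocal, inst)
for stage, margin in enumerate(curriculum_margins(3)):
    selected = np.flatnonzero(easy_mask(dom, margin))
    print(f"stage {stage}: margin={margin:.2f}, frames={selected.tolist()}")
```

In this toy setup the first stage would train only on frames where the voice or the accompaniment is overwhelmingly dominant, and later stages add the more balanced, harder frames. A real system would feed the selected frames to the recurrent network at each stage.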

Towards Real-Time Single-Channel Singing-Voice Separation with Pruned Multi-Scaled Densenets

ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) ◽

10.1109/icassp40776.2020.9053542 ◽

2020 ◽

Cited By ~ 1

Author(s):

Markus Huber ◽

Gunther Schindler ◽

Christian Schorkhuber ◽

Wolfgang Roth ◽

Franz Pernkopf ◽

...

Keyword(s):

Real Time ◽

Single Channel ◽

Singing Voice ◽

Singing Voice Separation

HMD-ARG: hierarchical multi-task deep learning for annotating antibiotic resistance genes

Microbiome ◽

10.1186/s40168-021-01002-3 ◽

2021 ◽

Vol 9 (1) ◽

Author(s):

Yu Li ◽

Zeling Xu ◽

Wenkai Han ◽

Huiluo Cao ◽

Ramzan Umarov ◽

...

Keyword(s):

Antibiotic Resistance ◽

Deep Learning ◽

Resistance Genes ◽

Antibiotic Resistance Genes ◽

Third Party ◽

Superior Performance ◽

Beta Lactamase ◽

Learning Framework ◽

Sequence Encoding ◽

Global Threat

Background: The spread of antibiotic resistance has become one of the most urgent threats to global health and is estimated to cause 700,000 deaths each year globally. Its surrogates, antibiotic resistance genes (ARGs), are highly transmittable between food, water, animals, and humans and reduce the efficacy of antibiotics. Accurately identifying ARGs is thus an indispensable step toward understanding the ecology and transmission of ARGs between environmental and human-associated reservoirs. Unfortunately, previous computational methods for identifying ARGs are mostly based on sequence alignment, which cannot identify novel ARGs, and their applications are limited by currently incomplete knowledge about ARGs. Results: Here, we propose an end-to-end Hierarchical Multi-task Deep learning framework for ARG annotation (HMD-ARG). Taking raw sequence encoding as input, HMD-ARG can identify, without querying against existing sequence databases, multiple ARG properties simultaneously, including whether the input protein sequence is an ARG and, if so, what antibiotic family it is resistant to, what resistance mechanism the ARG uses, and whether the ARG is intrinsic or acquired. In addition, if the predicted antibiotic family is beta-lactamase, HMD-ARG further predicts the subclass of beta-lactamase that the ARG is resistant to. Comprehensive experiments, including cross-fold validation, third-party dataset validation in human gut microbiota, wet-experimental functional validation, and structural investigation of predicted conserved sites, demonstrate not only the superior performance of our method over the state-of-the-art methods, but also the effectiveness and robustness of the proposed method. Conclusions: We propose a hierarchical multi-task method, HMD-ARG, which is based on deep learning and can provide detailed annotations of ARGs from three important aspects: resistant antibiotic class, resistance mechanism, and gene mobility. We believe that HMD-ARG can serve as a powerful tool to identify antibiotic resistance genes and therefore mitigate their global threat. Our method and the constructed database are available at http://www.cbrc.kaust.edu.sa/HMDARG/.
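The hierarchical decision order described above (ARG or not, then class, mechanism, and mobility, then a subclass only for the beta-lactamase family) can be illustrated with a small sketch. This is not the HMD-ARG code or its API. The `Annotation` type, the `annotate` function, the layout of the `outputs` dictionary, the 0.5 threshold, and every label string are hypothetical stand-ins for the scores a trained multi-task network might emit.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class Annotation:
    is_arg: bool
    antibiotic_class: Optional[str] = None
    mechanism: Optional[str] = None
    mobility: Optional[str] = None
    subclass: Optional[str] = None

def best_label(scores: Dict[str, float]) -> str:
    """Return the label with the highest score."""
    return max(scores, key=scores.get)

def annotate(outputs: dict, threshold: float = 0.5) -> Annotation:
    """Apply the hierarchy: lower-level heads are read only if the
    sequence is first predicted to be an ARG (assumed threshold)."""
    if outputs["arg_probability"] < threshold:
        return Annotation(is_arg=False)
    ann = Annotation(
        is_arg=True,
        antibiotic_class=best_label(outputs["class"]),
        mechanism=best_label(outputs["mechanism"]),
        mobility=best_label(outputs["mobility"]),
    )
    # The subclass head is consulted only for the beta-lactamase family.
    if ann.antibiotic_class == "beta-lactamase":
        ann.subclass = best_label(outputs["subclass"])
    return ann

# Made-up scores for one protein sequence.
example = {
    "arg_probability": 0.93,
    "class": {"beta-lactamase": 0.8, "tetracycline": 0.2},
    "mechanism": {"inactivation": 0.7, "efflux": 0.3},
    "mobility": {"acquired": 0.6, "intrinsic": 0.4},
    "subclass": {"A": 0.5, "B": 0.3, "C": 0.2},
}
print(annotate(example))
```

The point of the sketch is only the control flow: a sequence rejected at the first level receives no further labels, and the subclass is filled in for one family only.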

Singing Voice Separation and Vocal F0 Estimation Based on Mutual Combination of Robust Principal Component Analysis and Subharmonic Summation

IEEE/ACM Transactions on Audio Speech and Language Processing ◽

10.1109/taslp.2016.2577879 ◽

2016 ◽

Vol 24 (11) ◽

pp. 2084-2095 ◽

Cited By ~ 14

Author(s):

Yukara Ikemiya ◽

Katsutoshi Itoyama ◽

Kazuyoshi Yoshii

Keyword(s):

Principal Component Analysis ◽

Principal Component ◽

Component Analysis ◽

Singing Voice ◽

Robust Principal Component Analysis ◽

Singing Voice Separation

Semi-Supervised Singing Voice Separation With Noisy Self-Training

ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) ◽

10.1109/icassp39728.2021.9413723 ◽

2021 ◽

Author(s):

Zhepei Wang ◽

Ritwik Giri ◽

Umut Isik ◽

Jean-Marc Valin ◽

Arvindh Krishnaswamy

Keyword(s):

Singing Voice ◽

Singing Voice Separation

A recurrent encoder-decoder approach with skip-filtering connections for monaural singing voice separation

2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP) ◽

10.1109/mlsp.2017.8168117 ◽

2017 ◽

Cited By ~ 11

Author(s):

Stylianos Ioannis Mimilakis ◽

Konstantinos Drossos ◽

Tuomas Virtanen ◽

Gerald Schuller

Keyword(s):

Singing Voice ◽

Singing Voice Separation

Effects of Speech Synthesis on the Proofreading Efficiency of Postsecondary Students with Learning Disabilities

Learning Disability Quarterly ◽

10.2307/1511201 ◽

1995 ◽

Vol 18 (2) ◽

pp. 141-158 ◽

Cited By ~ 58

Author(s):

Marshall H. Raskind ◽

Eleanor Higgins

Keyword(s):

Learning Disabilities ◽

Speech Synthesis ◽

Written Language ◽

The Other ◽

Read Aloud ◽

Synthesis Condition ◽

Superior Performance ◽

Postsecondary Students ◽

Students With Learning Disabilities ◽

Synthesis System

This study investigated the effects of speech synthesis on the proofreading efficiency of postsecondary students with learning disabilities. Subjects proofread self-generated written language samples under three conditions: (a) using a speech synthesis system that simultaneously highlighted and “spoke” words on a computer monitor, (b) having the text read aloud to them by another person, and (c) receiving no assistance. Using the speech synthesis system enabled subjects to detect a significantly higher percentage of total errors than either of the other two proofreading conditions. In addition, subjects were able to locate a significantly higher percentage of capitalization, spelling, usage and typographical errors under the speech synthesis condition. However, having the text read aloud by another person significantly outperformed the other conditions in finding “grammar-mechanical” errors. Results are discussed with regard to underlying reasons for the overall superior performance of the speech synthesis system and the implications of using speech synthesis as a compensatory writing aid for postsecondary students with learning disabilities.

Multi-Stage Non-Negative Matrix Factorization for Monaural Singing Voice Separation

IEEE Transactions on Audio Speech and Language Processing ◽

10.1109/tasl.2013.2266773 ◽

2013 ◽

Vol 21 (10) ◽

pp. 2096-2107 ◽

Cited By ~ 29

Author(s):

Bilei Zhu ◽

Wei Li ◽

Ruijiang Li ◽

Xiangyang Xue

Keyword(s):

Matrix Factorization ◽

Singing Voice ◽

Multi Stage ◽

Singing Voice Separation ◽

Non Negative Matrix Factorization

Comparison of the productivity of Texel and Rouge de l’Ouest ewes and their crosses

Animal Science ◽

10.1017/s1357729800053224 ◽

2002 ◽

Vol 75 (3) ◽

pp. 459-468 ◽

Cited By ~ 4

Author(s):

L. E. R. Dawson ◽

A. F. Carson ◽

L. O. W. McClinton

Keyword(s):

Growth Rate ◽

Weight Gain ◽

Live Weight ◽

The Other ◽

Superior Performance ◽

Live Weight Gain ◽

Low Levels ◽

Lamb Growth ◽

Average Figure

An experiment was undertaken to compare the productivity of crossbred ewes, produced by crossing Texel sires with Rouge de l’Ouest (Rouge) dams and Rouge sires with Texel dams, relative to purebred Texel and Rouge ewes. The purebred and crossbred ewes were crossed with Rouge and Texel sires. The proportion of productive ewes was similar in the purebred and crossbred ewes with an average figure of 0·92. Irrespective of crossing sire, Rouge ewes produced 0·48 more lambs per ewe lambed than Texel ewes (P < 0·001). The two crossbred ewe types (Texel ✕ Rouge and Rouge ✕ Texel) each produced similar numbers of lambs (on average 1·92 lambs per ewe lambed). Individual heterosis values for ewe fertility and prolificacy were small and not significant (–1·67 for the proportion of productive ewes and –3·14 for the number of lambs born per ewe lambed). Maternal heterosis values were also not significant but were of larger magnitude (6·26 for ewe fertility and 3·12 for prolificacy). Lamb mortality (number of lambs born dead per ewe lambed) at birth was similar for purebred Rouge (0·44) and Texel (0·30) ewes and was significantly reduced by crossbred matings and mating the crossbred ewes (individual heterosis –30·68, P < 0·10; maternal heterosis –80·23, P < 0·001). Individual and maternal heterosis values for lamb growth rate from birth to six weeks were 8 (P < 0·05) and 4 (P > 0·05) respectively. Lamb growth rate from birth to weaning was significantly lower in lambs from Texel ewes compared with those from the other genotypes (P < 0·05). Individual and maternal heterosis values for live-weight gain from birth to weaning were 5 (P < 0·10) and 5 (P < 0·01). The results of the current study demonstrate the superior performance of purebred Rouge ewes compared with purebred Texel ewes in terms of prolificacy and lamb growth rate from birth to weaning. However, both breeds had high lamb mortality at birth. Crossbreeding led to the production of hybrid ewes which had relatively high prolificacy with low levels of dystocia and lamb mortality.

Neural networks to learn protein sequence-function relationships from deep mutational scanning data

10.1101/2020.10.25.353946 ◽

2020 ◽

Author(s):

Sam Gelman ◽

Philip A. Romero ◽

Anthony Gitter

Keyword(s):

Protein Structure ◽

Protein Sequence ◽

Internal Representation ◽

Superior Performance ◽

Network Architectures ◽

Convolutional Network ◽

Learning Framework ◽

And Function ◽

Multiple Neural Network ◽

Function Mapping

The mapping from protein sequence to function is highly complex, making it challenging to predict how sequence changes will affect a protein’s behavior and properties. We present a supervised deep learning framework to learn the sequence-function mapping from deep mutational scanning data and make predictions for new, uncharacterized sequence variants. We test multiple neural network architectures, including a graph convolutional network that incorporates protein structure, to explore how a network’s internal representation affects its ability to learn the sequence-function mapping. Our supervised learning approach displays superior performance over physics-based and unsupervised prediction methods. We find networks that capture nonlinear interactions and share parameters across sequence positions are important for learning the relationship between sequence and function. Further analysis of the trained models reveals the networks’ ability to learn biologically meaningful information about protein structure and mechanism. Our software is available from https://github.com/gitter-lab/nn4dms.

Authoring Bioconductor workflows with BiocWorkflowTools

F1000Research ◽

10.12688/f1000research.14399.1 ◽

2018 ◽

Vol 7 ◽

pp. 431

Author(s):

Mike L. Smith ◽

Andrzej K. Oleś ◽

Wolfgang Huber

Keyword(s):

Data Analysis ◽

The Other ◽

Technical Solution ◽

Single Source ◽

Journal Publication ◽

Source Document ◽

Manual Intervention ◽

Starting Point ◽

The Moment ◽

High Degree

The Bioconductor Gateway on the F1000Research platform is a channel for peer-reviewed and citable publication of end-to-end data analysis workflows rooted in the Bioconductor ecosystem. In addition to the largely static journal publication, it is hoped that authors will also deposit their workflows as executable documents on Bioconductor, where the benefits of regular code testing and easy updating can be realized. Ideally these two endpoints would be met from a single source document. However, so far this has not been easy, due to lack of a technical solution that meets both the requirements of the F1000Research article submission format and the executable documents on Bioconductor. Submission to the platform requires a LaTeX file, which many authors traditionally have produced by writing an Rnw document for Sweave or knitr. On the other hand, to produce the HTML rendering of the document hosted by Bioconductor, the most straightforward starting point is the R Markdown format. Tools such as pandoc enable conversion between many formats, but typically a high degree of manual intervention used to be required to satisfactorily handle aspects such as floating figures, cross-references, literature references, and author affiliations. The BiocWorkflowTools package aims to solve this problem by enabling authors to work with R Markdown right up until the moment they wish to submit to the platform.
