Vocal Melody Extraction via HRNet-Based Singing Voice Separation and Encoder-Decoder-Based F0 Estimation

Yongwei Gao; Xulong Zhang; Wei Li

doi:10.3390/electronics10030298

Electronics ◽

10.3390/electronics10030298 ◽

2021 ◽

Vol 10 (3) ◽

pp. 298

Author(s):

Yongwei Gao ◽

Xulong Zhang ◽

Wei Li

Keyword(s):

Information Retrieval ◽

High Resolution ◽

Fundamental Frequency ◽

State Of The Art ◽

Main Difficulty ◽

Harmonic Structure ◽

Singing Voice ◽

Pitch Estimation ◽

Music Information ◽

Singing Voice Separation

Vocal melody extraction is an important and challenging task in music information retrieval. One main difficulty is that, most of the time, various instruments and singing voices are mixed according to harmonic structure, making it hard to identify the fundamental frequency (F0) of a singing voice. Reducing the interference of the accompaniment is therefore beneficial to pitch estimation of the singing voice. In this paper, we first adopt a high-resolution network (HRNet) to separate vocals from polyphonic music, then design an encoder-decoder network to estimate the vocal F0 values. Experimental results demonstrate the effectiveness of the HRNet-based singing voice separation method in reducing the interference of the accompaniment with vocal melody extraction, and show that the proposed vocal melody extraction (VME) system outperforms other state-of-the-art algorithms in most cases.

Contextual music information retrieval and recommendation: State of the art and challenges

Computer Science Review ◽

10.1016/j.cosrev.2012.04.002 ◽

2012 ◽

Vol 6 (2-3) ◽

pp. 89-119 ◽

Cited By ~ 100

Author(s):

Marius Kaminskas ◽

Francesco Ricci

Keyword(s):

Information Retrieval ◽

State Of The Art ◽

Music Information Retrieval ◽

Music Information

A Robust Cover Song Identification System with Two-Level Similarity Fusion and Post-Processing

Applied Sciences ◽

10.3390/app8081383 ◽

2018 ◽

Vol 8 (8) ◽

pp. 1383 ◽

Cited By ~ 2

Author(s):

Mingyu Li ◽

Ning Chen

Keyword(s):

Information Retrieval ◽

State Of The Art ◽

Identification System ◽

Post Processing ◽

Similarity Functions ◽

Processing Level ◽

Retrieval Scheme ◽

The Common ◽

Cover Song ◽

Music Information

Similarity measurement plays an important role in various information retrieval tasks. In this paper, a music information retrieval scheme based on two-level similarity fusion and post-processing is proposed. At the similarity fusion level, to take full advantage of the common and complementary properties among different descriptors and different similarity functions, first, the track-by-track similarity graphs generated from the same descriptor but different similarity functions are fused with the similarity network fusion (SNF) technique. Then, the obtained first-level fused similarities based on different descriptors are further fused with the mixture Markov model (MMM) technique. At the post-processing level, diffusion is first performed on the two-level fused similarity graph to utilize the underlying track manifold contained within it. Then, a mutual proximity (MP) algorithm is adopted to refine the diffused similarity scores, which helps to reduce the bad influence caused by the “hubness” phenomenon contained in the scores. The performance of the proposed scheme is tested in the cover song identification (CSI) task on three cover song datasets (Covers80, Covers40, and Second Hand Songs (SHS)). The experimental results demonstrate that the proposed scheme outperforms state-of-the-art CSI schemes based on single similarity or similarity fusion.
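
The mutual proximity (MP) refinement mentioned in this abstract can be illustrated with a small sketch. This is the commonly used empirical form of MP applied to a similarity matrix, not necessarily the exact variant used in the paper: the score for a pair of tracks becomes the fraction of other tracks that are less similar to both of them. The function name `mutual_proximity` and the toy matrix are assumptions made for illustration.

```python
def mutual_proximity(sim):
    """Empirical mutual proximity for a symmetric similarity matrix.

    For each pair (i, j), count the other tracks k that are less similar
    to i than j is AND less similar to j than i is, then divide by the
    number of other tracks. Illustrative variant; the paper may differ.
    """
    n = len(sim)
    mp = [[1.0] * n for _ in range(n)]  # diagonal kept at 1.0 by convention
    for i in range(n):
        for j in range(i + 1, n):
            others = [k for k in range(n) if k != i and k != j]
            hits = sum(
                1 for k in others
                if sim[i][k] < sim[i][j] and sim[j][k] < sim[j][i]
            )
            score = hits / len(others) if others else 0.0
            mp[i][j] = mp[j][i] = score
    return mp


# Toy data (hypothetical): track 3 is a "hub", moderately similar to everything.
sim = [
    [1.0, 0.8, 0.1, 0.6],
    [0.8, 1.0, 0.2, 0.6],
    [0.1, 0.2, 1.0, 0.6],
    [0.6, 0.6, 0.6, 1.0],
]
refined = mutual_proximity(sim)
```

In this toy case the genuinely close pair (tracks 0 and 1) keeps a high score, while every pair involving the hub drops to zero, because the hub is nobody's clear nearest neighbour once both sides of each pair are taken into account.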

Symbolic Melodic Similarity: State of the Art and Future Challenges

Computer Music Journal ◽

10.1162/comj_a_00359 ◽

2016 ◽

Vol 40 (2) ◽

pp. 70-83 ◽

Cited By ~ 9

Author(s):

Valerio Velardo ◽

Mauro Vallati ◽

Steven Jan

Keyword(s):

Information Retrieval ◽

Comparative Analysis ◽

State Of The Art ◽

Music Information Retrieval ◽

The State ◽

Information Retrieval Evaluation ◽

Future Challenges ◽

Music Information

Fostered by the introduction of the Music Information Retrieval Evaluation Exchange (MIREX) competition, the number of systems that calculate symbolic melodic similarity has recently increased considerably. To understand the state of the art, we provide a comparative analysis of existing algorithms. The analysis is based on eight criteria that help to characterize the systems, highlighting strengths and weaknesses. We also propose a taxonomy that classifies algorithms based on their approach. Both taxonomy and criteria are fruitfully exploited to provide input for new, forthcoming research in the area.

An Effective Framework for Speech and Music Segregation

The International Arab Journal of Information Technology ◽

10.34028/iajit/17/4/9 ◽

2020 ◽

Vol 17 (4) ◽

pp. 507-514

Author(s):

Sidra Sajid ◽

Ali Javed ◽

Aun Irtaza

Keyword(s):

Information Retrieval ◽

Single Channel ◽

State Of The Art ◽

Binary Mask ◽

Multimedia Information Retrieval ◽

Time Frequency ◽

Wide Range ◽

Ideal Binary Mask ◽

Music Information ◽

Singer Identification

Speech and music segregation from a single channel is a challenging task due to background interference and intermingled signals of voice and music channels. It is of immense importance due to its utility in a wide range of applications such as music information retrieval, singer identification, and lyrics recognition and alignment. This paper presents an effective method for speech and music segregation. Considering the repeating nature of music, we first detect the local repeating structures in the signal using a locally defined window for each segment. After detecting the repeating structures, we extract them and perform separation using a soft time-frequency mask. We apply an ideal binary mask to enhance speech and music intelligibility. We evaluated the proposed method on mixtures set at -5 dB, 0 dB, and 5 dB from the Multimedia Information Retrieval-1000 clips (MIR-1K) dataset. Experimental results demonstrate that the proposed method for speech and music segregation outperforms the existing state-of-the-art methods in terms of Global-Normalized-Signal-to-Distortion Ratio (GNSDR) values.
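
As a rough illustration of the two mask types named in this abstract, the sketch below builds a soft (ratio) mask and an ideal binary mask from magnitude spectrograms stored as nested lists. It assumes the standard textbook definitions, in which the ideal binary mask is computed from known source magnitudes with a local criterion in dB. How the paper estimates the sources and combines the two masks is not described here, so the function names, the `threshold_db` parameter, and the toy values are all hypothetical.

```python
def soft_mask(speech_mag, music_mag):
    """Ratio mask: share of each time-frequency bin attributed to speech."""
    mask = []
    for s_row, m_row in zip(speech_mag, music_mag):
        row = []
        for s, m in zip(s_row, m_row):
            total = s + m
            row.append(s / total if total > 0 else 0.0)
        mask.append(row)
    return mask


def ideal_binary_mask(speech_mag, music_mag, threshold_db=0.0):
    """Binary mask: 1 where speech exceeds music by more than threshold_db."""
    factor = 10 ** (threshold_db / 20)  # dB threshold as a magnitude ratio
    return [
        [1 if s > m * factor else 0 for s, m in zip(s_row, m_row)]
        for s_row, m_row in zip(speech_mag, music_mag)
    ]


def apply_mask(mixture_mag, mask):
    """Element-wise product of a mixture spectrogram and a mask."""
    return [
        [x * w for x, w in zip(x_row, w_row)]
        for x_row, w_row in zip(mixture_mag, mask)
    ]


# Toy 2x2 magnitude "spectrograms" (rows: frequency bins, columns: frames).
speech = [[3.0, 1.0], [0.5, 2.0]]
music = [[1.0, 2.0], [0.5, 1.0]]
mixture = [[4.0, 3.0], [1.0, 3.0]]

soft = soft_mask(speech, music)
binary = ideal_binary_mask(speech, music)
speech_estimate = apply_mask(mixture, binary)
```

The binary mask keeps only the bins where speech dominates, which discards more interference than the soft mask at the cost of zeroing bins that contain some speech energy.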

Singing voice identification and lyrics transcription for music information retrieval invited paper

2013 7th Conference on Speech Technology and Human - Computer Dialogue (SpeD) ◽

10.1109/sped.2013.6682644 ◽

2013 ◽

Cited By ~ 2

Author(s):

Annamaria Mesaros

Keyword(s):

Information Retrieval ◽

Music Information Retrieval ◽

Singing Voice ◽

Voice Identification ◽

Music Information

Multipitch estimation using judge-based model

Bulletin of the Polish Academy of Sciences Technical Sciences ◽

10.2478/bpasts-2014-0081 ◽

2014 ◽

Vol 62 (4) ◽

pp. 751-757

Author(s):

K. Rychlicki-Kicior ◽

B. Stasiak

Keyword(s):

Information Retrieval ◽

Fundamental Frequency ◽

Music Information Retrieval ◽

Real Life ◽

Musical Instruments ◽

Music Information ◽

Test Database

Multipitch estimation, also known as multiple fundamental frequency (F0) estimation, is an important part of the Music Information Retrieval (MIR) field. Although many different approaches have been proposed, none of them has exceeded the abilities of a trained musician. In this work, an iterative cancellation method is analysed as applied to three different sound representations: a salience spectrum obtained using the Constant-Q Transform, the cepstrum, and the enhanced autocorrelation result. Real-life recordings of different musical instruments are used as a database, and the parameters of the solution are optimized using a simple yet effective metaheuristic approach, the Luus-Jaakola algorithm. The presented approach results in 85% efficiency on the test database.
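
For readers unfamiliar with the Luus-Jaakola algorithm named in this abstract, the following is a minimal sketch of the basic heuristic: sample a candidate uniformly inside a box around the current best point, accept it if it improves the objective, and otherwise shrink the box. The abstract does not give the objective function, parameter ranges, or shrink factor used in the paper, so `toy_objective`, the bounds, and the default `shrink=0.95` are assumptions chosen only to make the example runnable.

```python
import random


def luus_jaakola(objective, x0, lower, upper, iterations=500, shrink=0.95, seed=0):
    """Minimise `objective` with the basic Luus-Jaakola random search."""
    rng = random.Random(seed)
    best = list(x0)
    best_cost = objective(best)
    # Initial sampling half-width per dimension: full width of the search box.
    radius = [hi - lo for lo, hi in zip(lower, upper)]
    for _ in range(iterations):
        # Sample around the current best and clamp to the search box.
        candidate = [
            min(max(x + rng.uniform(-r, r), lo), hi)
            for x, r, lo, hi in zip(best, radius, lower, upper)
        ]
        cost = objective(candidate)
        if cost < best_cost:
            best, best_cost = candidate, cost  # keep the improvement
        else:
            radius = [r * shrink for r in radius]  # narrow the search
    return best, best_cost


def toy_objective(params):
    """Hypothetical stand-in for a tuning error; minimum at (0.3, 0.7)."""
    a, b = params
    return (a - 0.3) ** 2 + (b - 0.7) ** 2


start = [0.0, 0.0]
best, cost = luus_jaakola(toy_objective, start, lower=[0.0, 0.0], upper=[1.0, 1.0])
```

Because the method needs only objective evaluations and no gradients, it suits tuning problems where the objective is the measured accuracy of a signal-processing pipeline.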

On the suitability of state-of-the-art music information retrieval methods for analyzing, categorizing and accessing non-Western and ethnic music collections

Signal Processing ◽

10.1016/j.sigpro.2009.09.014 ◽

2010 ◽

Vol 90 (4) ◽

pp. 1032-1048 ◽

Cited By ~ 22

Author(s):

Thomas Lidy ◽

Carlos N. Silla ◽

Olmo Cornelis ◽

Fabien Gouyon ◽

Andreas Rauber ◽

...

Keyword(s):

Information Retrieval ◽

State Of The Art ◽

Music Information Retrieval ◽

Art Music ◽

Ethnic Music ◽

Information Retrieval Methods ◽

Music Information

The State of the Art Ten Years After a State of the Art: Future Research in Music Information Retrieval

Journal of New Music Research ◽

10.1080/09298215.2014.894533 ◽

2014 ◽

Vol 43 (2) ◽

pp. 147-172 ◽

Cited By ~ 32

Author(s):

Bob L. Sturm

Keyword(s):

Information Retrieval ◽

State Of The Art ◽

Music Information Retrieval ◽

The State ◽

Future Research ◽

Music Information

Tools for Music Information Retrieval and Playing

Intelligent Music Information Systems ◽

10.4018/978-1-59904-663-1.ch006 ◽

2011 ◽

pp. 120-145

Author(s):

Antonello D’Aguanno

Keyword(s):

Feature Extraction ◽

Information Retrieval ◽

Performance Analysis ◽

State Of The Art ◽

Automatic Recognition ◽

Musical Instruments ◽

Pitch Tracking ◽

Ieee Standard ◽

Points Of View ◽

Music Information

State-of-the-art MIR issues are presented and discussed from both the symbolic and audio points of view. For the symbolic aspects, different approaches are presented to provide an overview of the available solutions for particular MIR tasks. This section ends with an overview of MX, the IEEE standard XML language specifically designed to support interchange between musical notation, performance, analysis, and retrieval applications. At the audio level, we first focus on blind tasks such as beat and tempo tracking, pitch tracking, and automatic recognition of musical instruments. We then present algorithms that work on both compressed and uncompressed data. We analyze the relationships between MIR and feature extraction, presenting examples of possible applications. Finally, we focus on automatic music synchronization and introduce a new audio player that supports the MX logic layer and allows both score and audio to be played coherently.

Distortion and Rock Guitar Harmony

Music Perception An Interdisciplinary Journal ◽

10.1525/mp.2019.36.4.335 ◽

2019 ◽

Vol 36 (4) ◽

pp. 335-352 ◽

Cited By ~ 1

Author(s):

Jan-Peter Herbst

Keyword(s):

Heavy Metal ◽

Information Retrieval ◽

Individual Differences ◽

Rock Music ◽

Music Information Retrieval ◽

Harmonic Structure ◽

Acoustic Features ◽

Electric Guitar ◽

Listening Test ◽

Music Information

Research on rock harmony accords with common practice in guitar playing in that power chords (fifth interval) with an indeterminate chord quality, as well as major chords, are preferred to more complex chords when played with a distorted tone. This study explored the interrelated effects of distortion and harmonic structure on acoustic features and perceived pleasantness of electric guitar chords. Extracting psychoacoustic parameters from guitar tones with Music Information Retrieval technology revealed that the level of distortion and the complexity of interval relations affect sensorial pleasantness. A listening test demonstrated that power and major chords are perceived as significantly more pleasant than minor and altered dominant chords when played with an overdriven or distorted guitar tone. This result accords with musical practice within rock genres. Relatively clean rock styles such as blues or classic rock use major chords frequently, whereas subgenres with more distorted guitars such as heavy metal largely prefer power chords. Considering individual differences, electric guitar players rated overdriven and distorted chords as significantly more pleasant. Results were ambiguous in terms of gender but indicated that women perceive distorted guitar tones as less pleasant than men. Rock music listeners were more tolerant of sensorially unpleasant sounds.