Vocal Melody Extraction via HRNet-Based Singing Voice Separation and Encoder-Decoder-Based F0 Estimation

Electronics ◽  
2021 ◽  
Vol 10 (3) ◽  
pp. 298
Author(s):  
Yongwei Gao ◽  
Xulong Zhang ◽  
Wei Li

Vocal melody extraction is an important and challenging task in music information retrieval. One main difficulty is that, most of the time, various instruments and singing voices are mixed according to harmonic structure, making it hard to identify the fundamental frequency (F0) of a singing voice. Therefore, reducing the interference of accompaniment is beneficial to pitch estimation of the singing voice. In this paper, we first adopt a high-resolution network (HRNet) to separate vocals from polyphonic music, then design an encoder-decoder network to estimate the vocal F0 values. Experimental results demonstrate the effectiveness of the HRNet-based singing voice separation method in reducing the interference of accompaniment with the extraction of vocal melody, and show that the proposed vocal melody extraction (VME) system outperforms other state-of-the-art algorithms in most cases.

2018 ◽  
Vol 8 (8) ◽  
pp. 1383 ◽  
Author(s):  
Mingyu Li ◽  
Ning Chen

Similarity measurement plays an important role in various information retrieval tasks. In this paper, a music information retrieval scheme based on two-level similarity fusion and post-processing is proposed. At the similarity fusion level, to take full advantage of the common and complementary properties among different descriptors and different similarity functions, the track-by-track similarity graphs generated from the same descriptor but different similarity functions are first fused with the similarity network fusion (SNF) technique. Then, the resulting first-level fused similarities based on different descriptors are further fused with the mixture Markov model (MMM) technique. At the post-processing level, diffusion is first performed on the two-level fused similarity graph to exploit the underlying track manifold contained within it. Then, a mutual proximity (MP) algorithm is adopted to refine the diffused similarity scores, which helps to reduce the adverse effect of the “hubness” phenomenon in the scores. The performance of the proposed scheme is tested in the cover song identification (CSI) task on three cover song datasets (Covers80, Covers40, and Second Hand Songs (SHS)). The experimental results demonstrate that the proposed scheme outperforms state-of-the-art CSI schemes based on single similarity or similarity fusion.
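To make the mutual proximity step more concrete, the following is a minimal sketch of one common empirical formulation, written for a similarity matrix rather than a distance matrix. It is an illustration under stated assumptions, not the authors' implementation: the function name `mutual_proximity`, the normalization by the number of other tracks, and the toy matrix `sim` are all hypothetical. For a pair (x, y), the rescaled score is the fraction of other tracks that are less similar to x than y is and, at the same time, less similar to y than x is.

```python
def mutual_proximity(sim):
    """Empirical mutual proximity for a symmetric similarity matrix.

    Assumption: `sim` is a square list of lists with sim[x][y] == sim[y][x].
    The diagonal of the result is left at 0.0 because self-similarity is
    not rescaled in this sketch.
    """
    n = len(sim)
    mp = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in range(x + 1, n):
            others = [j for j in range(n) if j != x and j != y]
            if not others:
                continue
            # Count tracks that BOTH x and y consider less similar
            # than they consider each other.
            count = sum(
                1
                for j in others
                if sim[x][j] < sim[x][y] and sim[y][j] < sim[y][x]
            )
            mp[x][y] = mp[y][x] = count / len(others)
    return mp


# Toy example (hypothetical values): track 0 behaves like a "hub" that is
# highly similar to every other track.
sim = [
    [1.0, 0.9, 0.8, 0.8],
    [0.9, 1.0, 0.3, 0.2],
    [0.8, 0.3, 1.0, 0.7],
    [0.8, 0.2, 0.7, 1.0],
]
mp = mutual_proximity(sim)
```

In this toy matrix the raw similarity between the hub and track 2 (0.8) exceeds that between tracks 2 and 3 (0.7), yet after rescaling the pair (2, 3) scores higher than the pair (0, 2), because the hub is not a mutually close neighbour of track 2.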


2016 ◽  
Vol 40 (2) ◽  
pp. 70-83 ◽  
Author(s):  
Valerio Velardo ◽  
Mauro Vallati ◽  
Steven Jan

Fostered by the introduction of the Music Information Retrieval Evaluation Exchange (MIREX) competition, the number of systems that calculate symbolic melodic similarity has recently increased considerably. To understand the state of the art, we provide a comparative analysis of existing algorithms. The analysis is based on eight criteria that help to characterize the systems, highlighting strengths and weaknesses. We also propose a taxonomy that classifies algorithms based on their approach. Both taxonomy and criteria are fruitfully exploited to provide input for new, forthcoming research in the area.


2020 ◽  
Vol 17 (4) ◽  
pp. 507-514
Author(s):  
Sidra Sajid ◽  
Ali Javed ◽  
Aun Irtaza

Speech and music segregation from a single channel is a challenging task due to background interference and the intermingled signals of voice and music channels. It is of immense importance due to its utility in a wide range of applications such as music information retrieval, singer identification, and lyrics recognition and alignment. This paper presents an effective method for speech and music segregation. Considering the repeating nature of music, we first detect the local repeating structures in the signal using a locally defined window for each segment. After detecting the repeating structures, we extract them and perform separation using a soft time-frequency mask. We apply an ideal binary mask to enhance the speech and music intelligibility. We evaluated the proposed method on mixtures set at -5 dB, 0 dB, and 5 dB from the Multimedia Information Retrieval-1000 clips (MIR-1K) dataset. Experimental results demonstrate that the proposed method for speech and music segregation outperforms the existing state-of-the-art methods in terms of Global Normalized Signal-to-Distortion Ratio (GNSDR) values.
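The difference between a soft time-frequency mask and an ideal binary mask can be sketched on toy magnitude spectrograms. This is only an illustration under assumptions: the helper names (`soft_mask`, `ideal_binary_mask`, `apply_mask`), the ratio form of the soft mask, and the toy values are hypothetical and are not taken from the paper. It also assumes that magnitudes add in the mixture, which is only an approximation for real signals.

```python
EPS = 1e-12  # avoids division by zero in silent time-frequency bins


def soft_mask(target, interference):
    """Ratio mask in [0, 1]: the share of each bin attributed to the target."""
    return [
        [t / (t + i + EPS) for t, i in zip(t_row, i_row)]
        for t_row, i_row in zip(target, interference)
    ]


def ideal_binary_mask(target, interference):
    """1.0 where the target dominates the bin, 0.0 otherwise.

    "Ideal" because it needs the true source magnitudes (an oracle).
    """
    return [
        [1.0 if t > i else 0.0 for t, i in zip(t_row, i_row)]
        for t_row, i_row in zip(target, interference)
    ]


def apply_mask(mask, mixture):
    """Element-wise product of a mask and a mixture magnitude spectrogram."""
    return [
        [m * x for m, x in zip(m_row, x_row)]
        for m_row, x_row in zip(mask, mixture)
    ]


# Toy 2x2 magnitude spectrograms (rows: frequency bins, columns: frames).
speech = [[4.0, 1.0], [0.0, 3.0]]
music = [[1.0, 3.0], [2.0, 1.0]]
mixture = [
    [s + m for s, m in zip(s_row, m_row)]
    for s_row, m_row in zip(speech, music)
]

soft = soft_mask(speech, music)
binary = ideal_binary_mask(speech, music)
speech_soft = apply_mask(soft, mixture)
speech_binary = apply_mask(binary, mixture)
```

The soft mask splits the energy of each bin between the sources, whereas the binary mask assigns each bin entirely to whichever source dominates it.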


2014 ◽  
Vol 62 (4) ◽  
pp. 751-757
Author(s):  
K. Rychlicki-Kicior ◽  
B. Stasiak

Multipitch estimation, also known as multiple fundamental frequency (F0) estimation, is an important part of the Music Information Retrieval (MIR) field. Although many different approaches have been proposed, none of them has ever exceeded the abilities of a trained musician. In this work, an iterative cancellation method is analysed and applied to three different sound representations: a salience spectrum obtained using the Constant-Q Transform, the cepstrum, and an enhanced autocorrelation result. Real-life recordings of different musical instruments are used as a database, and the parameters of the solution are optimized using a simple yet effective metaheuristic approach, the Luus-Jaakola algorithm. The presented approach results in 85% efficiency on the test database.
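For readers unfamiliar with the Luus-Jaakola algorithm, the sketch below shows one common variant of this random search: a candidate is drawn uniformly from a box around the current point, accepted if it improves the objective, and the box is shrunk otherwise. Everything specific here is an assumption made for illustration: the function name `luus_jaakola`, the shrink factor of 0.95, the iteration count, and the quadratic `toy_objective`. In the paper the objective would be an error measure of the multipitch system, which is not given in the abstract.

```python
import random


def luus_jaakola(f, lower, upper, iterations=2000, shrink=0.95, seed=0):
    """Minimize f over a box using a Luus-Jaakola style random search.

    Assumptions (not from the paper): the search radius shrinks only after
    an unsuccessful sample, and candidates are clamped to the bounds.
    """
    rng = random.Random(seed)
    x = [rng.uniform(lo, hi) for lo, hi in zip(lower, upper)]
    fx = f(x)
    # Start with a radius that covers the whole range of each parameter.
    radius = [hi - lo for lo, hi in zip(lower, upper)]
    for _ in range(iterations):
        candidate = [
            min(max(xi + rng.uniform(-r, r), lo), hi)
            for xi, r, lo, hi in zip(x, radius, lower, upper)
        ]
        fc = f(candidate)
        if fc < fx:
            x, fx = candidate, fc  # keep the improvement
        else:
            radius = [r * shrink for r in radius]  # narrow the search
    return x, fx


def toy_objective(p):
    """Hypothetical stand-in for a parameter-tuning error; minimum at (1, -2)."""
    return (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2


best, best_value = luus_jaakola(toy_objective, [-5.0, -5.0], [5.0, 5.0])
```

The method needs no gradients, only objective evaluations, which suits tuning parameters of a signal-processing pipeline whose output quality is measured on a database.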


Author(s):  
Antonello D’Aguanno

State-of-the-art MIR issues are presented and discussed from both the symbolic and audio points of view. As for the symbolic aspects, different approaches are presented in order to provide an overview of the available solutions for particular MIR tasks. This section ends with an overview of MX, the IEEE standard XML language specifically designed to support interchange between musical notation, performance, analysis, and retrieval applications. As for the audio level, we first focus on blind tasks such as beat and tempo tracking, pitch tracking, and automatic recognition of musical instruments. Then we present algorithms that work on both compressed and uncompressed data. We analyze the relationships between MIR and feature extraction, presenting examples of possible applications. Finally, we focus on automatic music synchronization and introduce a new audio player that supports the MX logic layer and allows score and audio to be played coherently.


2019 ◽  
Vol 36 (4) ◽  
pp. 335-352 ◽  
Author(s):  
Jan-Peter Herbst

Research on rock harmony accords with common practice in guitar playing in that power chords (fifth interval) with an indeterminate chord quality as well as major chords are preferred to more complex chords when played with a distorted tone. This study explored the interrelated effects of distortion and harmonic structure on acoustic features and perceived pleasantness of electric guitar chords. Extracting psychoacoustic parameters from guitar tones with Music Information Retrieval technology revealed that the level of distortion and the complexity of interval relations affect sensorial pleasantness. A listening test demonstrated that power and major chords are perceived as significantly more pleasant than minor and altered dominant chords when played with an overdriven or distorted guitar tone. This result accords with musical practice within rock genres. Rather clean rock styles such as blues or classic rock use major chords frequently, whereas subgenres with more distorted guitars such as heavy metal largely prefer power chords. Considering individual differences, electric guitar players rated overdriven and distorted chords as significantly more pleasant. Results were ambiguous in terms of gender but indicated that women perceive distorted guitar tones as less pleasant than men. Rock music listeners were more tolerant of sensorially unpleasant sounds.

