Matching Subsequence Music Retrieval in a Software Integration Environment

Complexity, 2021, Vol. 2021, pp. 1-12
Author(s): Zhencong Li, Qin Yao, Wanzhi Ma

This paper first introduces the basic concepts of music, then presents the detailed design of a music retrieval system built on that knowledge, and analyzes feature extraction and matching algorithms that exploit musical features. Feature extraction from audio data is a central focus of the paper. Main melody, MFCC, GFCC, and rhythm features are extracted from the audio data. Because principal component analysis (PCA) dimensionality reduction can effectively reduce noise and improve retrieval efficiency, a feature fusion algorithm is proposed that reduces the dimensionality of the GFCC and rhythm features with PCA and fuses the resulting vectors into a new feature. Matching audio features is an important task in music retrieval: the dynamic time warping (DTW) algorithm is chosen as the main retrieval algorithm, and classification-based retrieval is achieved with the k-nearest neighbor (KNN) algorithm. After these algorithms are studied and improved, they are integrated into the system to perform audio preprocessing, feature extraction, feature postprocessing, and matching retrieval. A library of 100 different MP3 music tracks is used; 4 pieces are randomly selected in each test, and the system is tested under different system parameters, recording durations, and levels of environmental noise. This research improves the efficiency of music retrieval and provides theoretical support for the design of integrated music retrieval software systems.
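
The abstract names DTW as the main matching algorithm, and the title stresses subsequence matching (a short recording matched against part of a longer track). A minimal sketch of subsequence DTW follows, assuming each track is already a sequence of feature frames (for example fused GFCC/rhythm vectors). The start and end of the match are left free, so the query may align with any contiguous stretch of the reference. The function names and the toy library are hypothetical; this is not the authors' implementation.

```python
import math
from typing import Dict, List, Sequence, Tuple

Frame = Sequence[float]


def subsequence_dtw(query: Sequence[Frame], reference: Sequence[Frame]) -> Tuple[float, int]:
    """Best DTW alignment of the whole query against any contiguous stretch
    of the reference. Returns (cost, index of the last matched reference frame)."""
    n, m = len(query), len(reference)
    # D[i][j]: cheapest alignment of query[:i] with a stretch ending at reference[j - 1]
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    for j in range(m + 1):
        D[0][j] = 0.0  # free start: the match may begin at any reference frame
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(query[i - 1], reference[j - 1])  # Euclidean frame distance
            D[i][j] = cost + min(
                D[i - 1][j - 1],  # advance in both sequences
                D[i - 1][j],      # several query frames map to one reference frame
                D[i][j - 1],      # one query frame spans several reference frames
            )
    # free end: the match may stop at any reference frame
    best_j = min(range(1, m + 1), key=lambda j: D[n][j])
    return D[n][best_j], best_j - 1


def rank_library(query: Sequence[Frame], library: Dict[str, Sequence[Frame]]) -> List[Tuple[str, float]]:
    """Rank every track in a (hypothetical) feature library by subsequence-DTW cost."""
    scores = [(name, subsequence_dtw(query, frames)[0]) for name, frames in library.items()]
    return sorted(scores, key=lambda item: item[1])


library = {"track_a": [[5.0], [5.0], [5.0]], "track_b": [[0.0], [1.0], [2.0], [3.0]]}
print(rank_library([[1.0], [2.0], [3.0]], library))  # [('track_b', 0.0), ('track_a', 9.0)]
```

The quadratic table is fine for short queries; a real system would typically add a band constraint or prefilter candidates before running DTW on every track.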

Author(s): Shengzhou Yi, Koshiro Mochitomi, Isao Suzuki, Xueting Wang, Toshihiko Yamasaki

In this study, a multimodal neural network is proposed to automatically predict a professional consultant team's evaluation of press conferences from text and audio data. Seven publicly available press conference videos were collected, and all question-and-answer (Q&A) pairs between speakers and journalists were annotated by the consultant team. The proposed network consists of a language model, an audio model, and a feature fusion network. Each word is represented by an ELMo token embedding together with a type embedding. The language model is an LSTM with an attention layer. The audio model uses a six-layer CNN to extract segment-level features, together with an attention network that measures the importance of each segment. Two feature fusion approaches are proposed: a shared attention network, and the product of the text and audio features. The former can explain the relative importance of speech content and speaking style; the latter achieved the best performance, with an average accuracy of 60.1% across all evaluation criteria.
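
The audio model weights CNN segment features with attention, and one fusion approach multiplies the text and audio features. The sketch below shows both ideas with NumPy, under two stated assumptions: attention is reduced to a single scoring vector `w` followed by a softmax, and the product is taken element-wise after both modalities have the same dimension. Neither detail is specified in the abstract; the random arrays are stand-ins for real model outputs.

```python
import numpy as np


def attention_pool(segments: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Collapse (n_segments, d) features into one d-vector.
    Each segment gets a score segments @ w; softmax turns scores into weights."""
    scores = segments @ w
    scores = scores - scores.max()            # subtract max for numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()
    return alpha @ segments                   # weighted average of the segments


def product_fusion(text_vec: np.ndarray, audio_vec: np.ndarray) -> np.ndarray:
    """Element-wise product of two equally sized modality vectors (assumed fusion rule)."""
    if text_vec.shape != audio_vec.shape:
        raise ValueError("project both modalities to the same dimension first")
    return text_vec * audio_vec


rng = np.random.default_rng(0)
audio_segments = rng.standard_normal((6, 4))   # stand-in for CNN outputs of 6 segments
text_vector = rng.standard_normal(4)           # stand-in for the LSTM + attention output
w_audio = rng.standard_normal(4)               # would be learned in a real model
fused = product_fusion(text_vector, attention_pool(audio_segments, w_audio))
print(fused.shape)  # (4,)
```

With `w` set to zeros every segment gets the same weight and the pooling reduces to a plain mean, which is a handy sanity check.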


2020, Vol. 10 (14), pp. 4966
Author(s): Maryam Nisa, Jamal Hussain Shah, Shansa Kanwal, Mudassar Raza, Muhammad Attique Khan, ...

As the number of internet users grows, so does the number of malicious attacks using malware. Detecting malicious code is becoming critical, and existing approaches need to be improved. Here, we propose a feature fusion method that combines features extracted from the pre-trained AlexNet and Inception-v3 deep neural networks with features obtained by segmentation-based fractal texture analysis (SFTA) of images representing the malware code. Deep convolutional neural network (CNN) features are extracted from these two distinct pre-trained models because they capture different, complementary features, which improves the accuracy of the malware classifier. The fused features form a multimodal representation of the malicious code that is used to classify the grayscale images into 25 malware classes. The features extracted from the malware images are classified with several variants of support vector machine (SVM), k-nearest neighbor (KNN), decision tree (DT), and other classifiers. To further improve the classification results, we also apply data augmentation based on affine image transforms. The method is evaluated on the Malimg malware image dataset and achieves an accuracy of 99.3%, the best among the competing approaches.
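
The abstract does not state how the AlexNet, Inception-v3, and SFTA features are combined. One common choice, assumed here, is serial fusion: normalize each feature block per sample and concatenate the blocks. The 4096 and 2048 widths match AlexNet's fully connected layer and Inception-v3's pooled output; the SFTA length of 48 is hypothetical, and random arrays stand in for real features.

```python
import numpy as np


def l2_normalize(X: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale every row (one sample) to unit Euclidean length."""
    return X / (np.linalg.norm(X, axis=1, keepdims=True) + eps)


def serial_fusion(*blocks: np.ndarray) -> np.ndarray:
    """Concatenate per-sample feature blocks after normalizing each block,
    so that a high-dimensional block does not dominate distances on its own."""
    n = blocks[0].shape[0]
    if any(b.shape[0] != n for b in blocks):
        raise ValueError("all blocks must describe the same samples")
    return np.hstack([l2_normalize(b) for b in blocks])


rng = np.random.default_rng(0)
n_images = 5
alexnet_feats = rng.standard_normal((n_images, 4096))    # stand-in for a 4096-d fully connected layer
inception_feats = rng.standard_normal((n_images, 2048))  # stand-in for the 2048-d pooled output
sfta_feats = rng.standard_normal((n_images, 48))         # hypothetical SFTA length
fused = serial_fusion(alexnet_feats, inception_feats, sfta_feats)
print(fused.shape)  # (5, 6192)
```

The fused matrix can then be passed to any of the classifiers the abstract lists (SVM, KNN, DT).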


2021, Vol. 2021, pp. 1-14
Author(s): Yuhua Li, Zhiqiang He, Sunan Wang, Zicheng Wang, Wanwei Huang

To improve the recognition accuracy of clothing style and fully exploit the ability of deep learning to extract deep semantic features, from global to local, from clothing images, this paper uses object detection and a deep residual network (ResNet) to extract comprehensive clothing features, so that the feature extraction process focuses on the clothing itself. On this basis, we propose a multi-deep-feature fusion algorithm for clothing image style recognition. First, an improved object detection model extracts the global area, main part, and part areas of the clothing in the image, weakening the influence of the background and other interfering factors. Then, the three regions are fed separately into an improved, pre-trained ResNet for feature extraction. The ResNet is improved by optimizing the convolution layer in the residual block and adjusting the order of the batch normalization layer and the activation layer. Finally, multi-category fusion features are obtained by combining the features of the clothing image from the global area and the main part down to the part areas. Experimental results show that the proposed algorithm eliminates the influence of interfering factors, focuses recognition on the clothing itself, greatly improves clothing style recognition accuracy, and outperforms traditional methods based on deep residual networks.


2018
Author(s): I Wayan Agus Surya Darma

Balinese character recognition is the task of recognizing the features or patterns of Balinese characters. These features are obtained through a feature extraction process, which in this research is applied to handwritten Balinese characters and produces semantic and direction features. Recognition uses the k-nearest neighbor (KNN) algorithm to recognize 81 handwritten Balinese characters: the features of each test image are compared with reference features. With K = 3 and the number of references set to 10, the recognition system achieves a success rate of 97.53%.
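
The recognition step compares a test image's features with reference features and lets the K nearest references vote. A minimal KNN sketch with K = 3 follows, assuming features are plain numeric vectors and Euclidean distance is used (the abstract does not name the metric). The two-dimensional references and the labels `char_a` and `char_b` are hypothetical placeholders for real semantic and direction features.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def knn_predict(query: Sequence[float],
                references: List[Sequence[float]],
                labels: List[Hashable],
                k: int = 3) -> Hashable:
    """Majority vote among the k references closest to `query` (Euclidean distance)."""
    ranked = sorted(zip(references, labels), key=lambda pair: math.dist(query, pair[0]))
    votes = Counter(label for _, label in ranked[:k])
    # equal counts keep first-encountered order, so ties favour the label that appears nearest
    return votes.most_common(1)[0][0]


refs = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [5.0, 5.0], [5.0, 6.0]]
labels = ["char_a", "char_a", "char_a", "char_b", "char_b"]
print(knn_predict([0.2, 0.2], refs, labels))  # char_a
print(knn_predict([5.0, 5.4], refs, labels))  # char_b
```

The second query shows why K matters: with K = 5 the three distant `char_a` references would outvote the two nearby `char_b` ones.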


Sensors, 2021, Vol. 21 (4), pp. 1274
Author(s): Daniel Bonet-Solà, Rosa Ma Alsina-Pagès

Acoustic event detection and analysis have been widely developed in the last few years for their valuable applications in monitoring elderly or dependent people, surveillance, multimedia retrieval, and even biodiversity metrics in natural environments. Sound source identification is key to providing a smart technological answer for all of these applications. The diversity of sounds and environments, together with a number of application-specific challenges, widens the range of candidate artificial intelligence algorithms. This paper presents a comparative study combining several feature extraction algorithms (Mel Frequency Cepstral Coefficients (MFCC), Gammatone Cepstral Coefficients (GTCC), and Narrow Band (NB)) with a group of machine learning algorithms (k-Nearest Neighbor (kNN), Neural Networks (NN), and Gaussian Mixture Model (GMM)), tested over five different acoustic environments. The goal of this work is to detail a best-practice method and to evaluate the reliability of this general-purpose algorithm across all classes. Preliminary results show that most combinations of feature extraction and machine learning give acceptable results on most of the described corpora. Nevertheless, one combination outperforms the others: GTCC together with kNN, whose results are analyzed further for all corpora.
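
MFCC and GTCC follow the same cepstral recipe and differ mainly in the filterbank: MFCC uses triangular filters spaced on the mel scale, while GTCC uses a gammatone filterbank. The NumPy sketch below computes MFCCs (frame, window, power spectrum, mel filterbank, log, DCT). The frame length of 512, hop of 256, 26 filters, and 13 coefficients are common defaults assumed here, not settings taken from the paper; swapping `mel_filterbank` for a gammatone filterbank would give a GTCC-style feature.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters: int, n_fft: int, sr: int) -> np.ndarray:
    """Triangular filters equally spaced on the mel scale (rows: filters, cols: FFT bins)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, centre, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, centre):   # rising edge
            fb[i - 1, k] = (k - left) / (centre - left)
        for k in range(centre, right):  # falling edge
            fb[i - 1, k] = (right - k) / (right - centre)
    return fb


def dct2(x: np.ndarray, n_out: int) -> np.ndarray:
    """Orthonormal DCT-II along the last axis, keeping the first n_out coefficients."""
    N = x.shape[-1]
    n = np.arange(N)
    k = np.arange(n_out)[:, None]
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * N))  # (n_out, N)
    scale = np.full(n_out, np.sqrt(2.0 / N))
    scale[0] = np.sqrt(1.0 / N)
    return (x @ basis.T) * scale


def mfcc(signal: np.ndarray, sr: int, n_fft: int = 512, hop: int = 256,
         n_filters: int = 26, n_ceps: int = 13) -> np.ndarray:
    # 1. split into overlapping frames and apply a Hamming window
    n_frames = 1 + (len(signal) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(n_fft)
    # 2. power spectrum of each frame
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    # 3. mel filterbank energies, log-compressed (small offset avoids log(0))
    log_e = np.log(power @ mel_filterbank(n_filters, n_fft, sr).T + 1e-10)
    # 4. decorrelate with a DCT and keep the first n_ceps coefficients
    return dct2(log_e, n_ceps)


sr = 16000
t = np.arange(sr) / sr
coeffs = mfcc(np.sin(2 * np.pi * 440.0 * t), sr)  # one second of a 440 Hz tone
print(coeffs.shape)  # (61, 13)
```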


2021, Vol. 8 (1)
Author(s): Federico Calesella, Alberto Testolin, Michele De Filippo De Grazia, Marco Zorzi

Multivariate prediction of human behavior from resting-state data is gaining popularity in the neuroimaging community, with far-reaching translational implications for neurology and psychiatry. However, the high dimensionality of neuroimaging data increases the risk of overfitting, calling for dimensionality reduction methods to build robust predictive models. In this work, we assess the ability of four well-known dimensionality reduction techniques to extract relevant features from resting-state functional connectivity matrices of stroke patients; these features are then used to build a predictive model of the associated deficits based on cross-validated regularized regression. In particular, we investigated prediction ability for different neuropsychological scores referring to the language, verbal memory, and spatial memory domains. Principal Component Analysis (PCA) and Independent Component Analysis (ICA) were the two best methods at extracting representative features, followed by Dictionary Learning (DL) and Non-Negative Matrix Factorization (NNMF). Consistent with these results, features extracted by PCA and ICA were the best predictors of the neuropsychological scores across all considered cognitive domains. For each feature extraction method, we also examined the impact of the regularization method, model complexity (in terms of the number of features that entered the model), and the quality of the maps that display predictive edges in the resting-state networks. We conclude that PCA-based models, especially when combined with L1 (LASSO) regularization, provide the best balance between prediction accuracy, model complexity, and interpretability.
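
The best-performing pipeline reported here is PCA features fed into L1-regularized (LASSO) regression. A minimal NumPy sketch of that combination follows: PCA via SVD, then LASSO fitted by cyclic coordinate descent on the objective (1/2n)·||y − Zβ − β₀||² + λ·||β||₁. It assumes each patient's connectivity matrix is flattened to its upper-triangle edges; the data, the number of components, and λ are synthetic and hypothetical, and the cross-validation used in the study is omitted.

```python
import numpy as np


def pca_fit_transform(X: np.ndarray, n_components: int):
    """Project centred data onto its top principal axes (computed with SVD)."""
    Xc = X - X.mean(axis=0)
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]                      # orthonormal rows
    explained_ratio = (S ** 2 / (S ** 2).sum())[:n_components]
    return Xc @ components.T, components, explained_ratio


def soft_threshold(z: float, t: float) -> float:
    return np.sign(z) * max(abs(z) - t, 0.0)


def lasso_cd(Z: np.ndarray, y: np.ndarray, lam: float, n_iter: int = 200):
    """LASSO by cyclic coordinate descent; returns (coefficients, intercept)."""
    n, p = Z.shape
    Zc, yc = Z - Z.mean(axis=0), y - y.mean()
    col_sq = (Zc ** 2).sum(axis=0) / n
    beta, resid = np.zeros(p), yc.copy()
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            resid += Zc[:, j] * beta[j]                 # remove feature j's contribution
            rho = Zc[:, j] @ resid / n
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
            resid -= Zc[:, j] * beta[j]                 # add the updated contribution back
    return beta, y.mean() - Z.mean(axis=0) @ beta


rng = np.random.default_rng(0)
n_patients, n_regions = 40, 12
iu = np.triu_indices(n_regions, k=1)                    # unique edges of a connectivity matrix
X = rng.standard_normal((n_patients, n_regions, n_regions))[:, iu[0], iu[1]]  # (40, 66)
y = X[:, :5].sum(axis=1) + 0.1 * rng.standard_normal(n_patients)  # synthetic "score"
scores, comps, ratio = pca_fit_transform(X, n_components=10)
beta, b0 = lasso_cd(scores, y, lam=0.05)
print(np.count_nonzero(beta), "of", beta.size, "components kept")
```

The L1 penalty drives some component weights exactly to zero, which is the model-complexity effect the abstract discusses.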


2021, pp. 1-18
Author(s): R.S. Rampriya, Sabarinathan, R. Suganya

In the near future, the combination of UAVs (unmanned aerial vehicles) and computer vision will play a vital role in periodically monitoring the condition of railroads to ensure passenger safety. The most significant module in railroad visual processing is obstacle detection, where the concern is obstacles that have fallen near the track gage, either inside or outside it. This makes it important to detect and segment the railroad into three key regions: gage inside, rails, and background. Traditional railroad segmentation methods depend either on manual feature selection or on expensive dedicated devices such as LiDAR, and are typically less reliable for railroad semantic segmentation. Moreover, although cameras mounted on moving vehicles such as drones can produce high-resolution images, segmenting precise pixel information from these aerial images is challenging because of the cluttered railroad surroundings. RSNet is a multi-level feature fusion algorithm for segmenting railroad aerial images captured by UAVs. It uses an attention-based efficient convolutional encoder for feature extraction, which is robust and computationally efficient, and a modified residual decoder for segmentation, which considers only essential features and incurs less overhead while achieving higher performance, even on real-time railroad drone imagery. The network is trained and tested on a railroad scenic view segmentation dataset (RSSD), which we built from real-time UAV images, and achieves a Dice coefficient of 0.973 and a Jaccard index of 0.94 on test data, outperforming existing approaches such as the residual unit and residual squeeze net.
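
The results are reported as a Dice coefficient and a Jaccard index. For binary masks P and T these are Dice = 2|P ∩ T| / (|P| + |T|) and Jaccard = |P ∩ T| / |P ∪ T|, and for a single mask they are linked by J = D / (2 − D). The sketch below computes both, plus per-class scores for a label map; the label encoding (0 background, 1 rails, 2 gage inside) and the convention of scoring two empty masks as 1.0 are assumptions, not details from the paper.

```python
import numpy as np


def dice_and_jaccard(pred, target):
    """Dice coefficient and Jaccard index of two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    inter = np.logical_and(pred, target).sum()
    total = pred.sum() + target.sum()
    union = np.logical_or(pred, target).sum()
    # convention: two empty masks count as a perfect match
    dice = 2 * inter / total if total else 1.0
    jaccard = inter / union if union else 1.0
    return float(dice), float(jaccard)


def per_class_scores(pred_labels: np.ndarray, true_labels: np.ndarray, classes):
    """(Dice, Jaccard) for each class of a multi-class label map."""
    return {c: dice_and_jaccard(pred_labels == c, true_labels == c) for c in classes}


# hypothetical encoding: 0 = background, 1 = rails, 2 = gage inside
pred = np.array([[0, 1, 2], [0, 1, 2]])
true = np.array([[0, 1, 2], [0, 2, 2]])
print(per_class_scores(pred, true, [0, 1, 2]))
```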


2021, Vol. 13 (10), pp. 1950
Author(s): Cuiping Shi, Xin Zhao, Liguo Wang

In recent years, with the rapid development of computer vision, increasing attention has been paid to remote sensing image scene classification. To improve classification performance, many studies have increased the depth and width of convolutional neural networks (CNNs) to extract more deep features, which increases model complexity. To address this problem, we propose a lightweight convolutional neural network based on attention-oriented multi-branch feature fusion (AMB-CNN) for remote sensing image scene classification. First, we propose two convolution combination modules for feature extraction, through which the deep features of images can be fully extracted by multiple cooperating convolutions. Then, the feature weights are calculated, and the extracted deep features are sent to an attention mechanism for further feature extraction. Next, all extracted features are fused across multiple branches. Finally, depthwise separable convolution and asymmetric convolution are used to greatly reduce the number of parameters. Experimental results show that, compared with several state-of-the-art methods, the proposed method retains a clear advantage in classification accuracy with very few parameters.
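
The parameter savings from depthwise separable and asymmetric convolution can be checked with simple arithmetic. The functions below count weights only (biases ignored) for a standard k×k convolution, a depthwise separable one (one k×k filter per input channel, then a 1×1 pointwise convolution), and an asymmetric factorization (a k×1 convolution followed by a 1×k convolution). The 3×3, 128-channel layer is a hypothetical example, not the AMB-CNN configuration.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """k x k kernel over all input channels, for every output channel."""
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """One k x k filter per input channel (depthwise), then a 1 x 1 conv mixing channels."""
    return k * k * c_in + c_in * c_out


def asymmetric_conv_params(k: int, c_in: int, c_out: int) -> int:
    """A k x 1 convolution (c_in -> c_out) followed by a 1 x k convolution (c_out -> c_out)."""
    return k * c_in * c_out + k * c_out * c_out


for name, fn in [("standard", standard_conv_params),
                 ("depthwise separable", depthwise_separable_params),
                 ("asymmetric", asymmetric_conv_params)]:
    print(f"{name:>20}: {fn(3, 128, 128):>7,d} weights")
```

For this layer the script prints 147,456, 17,536, and 98,304 weights respectively, so the depthwise separable version needs roughly 12% of the standard layer's weights.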


2009, Vol. 03 (02), pp. 209-234
Author(s): Yi Yu, Kazuki Joe, Vincent Oria, Fabian Moerchen, J. Stephen Downie, ...

Research on audio-based music retrieval has primarily concentrated on refining audio features to improve search quality. However, much less work has been done on improving the time efficiency of music audio searches. Representing music audio documents in an indexable format provides a mechanism for achieving efficiency. To address this issue, this work proposes Exact Locality Sensitive Mapping (ELSM) to join concatenated feature sets and soft hash values. On this basis, we propose two audio-based music indexing techniques, ELSM and Soft Locality Sensitive Hash (SoftLSH), using an optimized Feature Union (FU) set of extracted audio features. Two contributions are made. First, the principle of similarity invariance is applied to summarize audio feature sequences and is used to train regression-based semantic audio representations. Second, soft hash values are precalculated to locate the search range more accurately and to improve the collision probability among similar features. Our algorithms are implemented in a demonstration system that shows how to retrieve and evaluate multi-version audio documents. Experimental evaluation on a real "multi-version" audio dataset confirms the practicality of ELSM and SoftLSH with FU and shows that our algorithms are effective for both multi-version detection (online query, one query vs. multiple objects) and same-content detection (batch queries, multiple queries vs. one object).
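
ELSM and SoftLSH are specific to this paper and are not reproduced here. As a generic point of reference, the sketch below shows the standard Euclidean (p-stable) LSH bucket lookup: each hash is floor((a · v + b) / w) with a drawn from a Gaussian and b uniform in [0, w), several hashes form one bucket key, and only vectors sharing the query's bucket are re-ranked by exact distance. The bucket width `w`, the number of hashes, and the random "track summary" vectors are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

import numpy as np


class PStableLSH:
    """Euclidean LSH: each hash is floor((a . v + b) / w), a ~ N(0, I), b ~ U[0, w).
    Concatenating several hashes into one key makes collisions more selective."""

    def __init__(self, dim: int, n_hashes: int, w: float, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.A = rng.standard_normal((n_hashes, dim))
        self.b = rng.uniform(0.0, w, size=n_hashes)
        self.w = w

    def key(self, v: np.ndarray) -> Tuple[int, ...]:
        return tuple(int(h) for h in np.floor((self.A @ v + self.b) / self.w))


def build_index(lsh: PStableLSH, vectors: np.ndarray) -> Dict[Tuple[int, ...], List[int]]:
    buckets: Dict[Tuple[int, ...], List[int]] = defaultdict(list)
    for i, v in enumerate(vectors):
        buckets[lsh.key(v)].append(i)
    return buckets


def query(lsh: PStableLSH, buckets, vectors: np.ndarray, q: np.ndarray) -> List[int]:
    """Candidates sharing q's bucket, re-ranked by exact Euclidean distance."""
    candidates = buckets.get(lsh.key(q), [])
    return sorted(candidates, key=lambda i: float(np.linalg.norm(vectors[i] - q)))


rng = np.random.default_rng(1)
features = rng.standard_normal((200, 16))      # stand-in for per-track summary vectors
lsh = PStableLSH(dim=16, n_hashes=4, w=4.0)
index = build_index(lsh, features)
print(query(lsh, index, features, features[7])[:3])
```

Only the bucket's candidates are compared exactly, which is where the time savings over a linear scan come from; a practical index would usually keep several independent hash tables to raise recall.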

