Joint Detection and Classification of Singing Voice Melody Using Convolutional Recurrent Neural Networks

Sangeun Kum; Juhan Nam

doi:10.3390/app9071324

Joint Detection and Classification of Singing Voice Melody Using Convolutional Recurrent Neural Networks

Applied Sciences ◽

10.3390/app9071324 ◽

2019 ◽

Vol 9 (7) ◽

pp. 1324 ◽

Cited By ~ 2

Author(s):

Sangeun Kum ◽

Juhan Nam

Keyword(s):

Loss Function ◽

State Of The Art ◽

Joint Detection ◽

Singing Voice ◽

Pitch Estimation ◽

Proposed Model ◽

Pitch Contours ◽

Multi Level ◽

Extraction Performance

Singing melody extraction essentially involves two tasks: one is detecting the activity of a singing voice in polyphonic music, and the other is estimating the pitch of a singing voice in the detected voiced segments. In this paper, we present a joint detection and classification (JDC) network that conducts the singing voice detection and the pitch estimation simultaneously. The JDC network is composed of the main network that predicts the pitch contours of the singing melody and an auxiliary network that facilitates the detection of the singing voice. The main network is built with a convolutional recurrent neural network with residual connections and predicts pitch labels that cover the vocal range with a high resolution, as well as non-voice status. The auxiliary network is trained to detect the singing voice using multi-level features shared from the main network. The two optimization processes are tied with a joint melody loss function. We evaluate the proposed model on multiple melody extraction and vocal detection datasets, including cross-dataset evaluation. The experiments demonstrate how the auxiliary network and the joint melody loss function improve the melody extraction performance. Furthermore, the results show that our method outperforms state-of-the-art algorithms on the datasets.
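
The abstract does not state the form of the joint melody loss, so the following is only a minimal sketch of how two such objectives might be tied together. It assumes the joint loss is a weighted sum of a cross-entropy over pitch classes (with the last class standing for non-voice) and a binary cross-entropy for voice detection. The function names and the weight `alpha` are hypothetical and do not come from the paper.

```python
import math


def cross_entropy(probs, target_index):
    """Negative log-likelihood of the target class under a probability vector."""
    return -math.log(max(probs[target_index], 1e-12))


def joint_melody_loss(pitch_probs, pitch_target, voice_prob, is_voiced, alpha=0.5):
    """Hypothetical joint loss: pitch classification plus weighted voice detection.

    pitch_probs : probabilities over pitch classes, last index = non-voice
    pitch_target: index of the true pitch class (or the non-voice index)
    voice_prob  : auxiliary output, probability that the frame is voiced
    is_voiced   : True if the frame contains singing voice
    alpha       : assumed weight on the detection term (not from the paper)
    """
    pitch_loss = cross_entropy(pitch_probs, pitch_target)
    detect_probs = [1.0 - voice_prob, voice_prob]
    detect_loss = cross_entropy(detect_probs, 1 if is_voiced else 0)
    return pitch_loss + alpha * detect_loss
```

Setting `alpha=0` reduces the sketch to the pitch classification loss alone, which gives a simple baseline for checking whether the detection term helps.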

ECG Heartbeat Classification Based on an Improved ResNet-18 Model

Computational and Mathematical Methods in Medicine ◽

10.1155/2021/6649970 ◽

2021 ◽

Vol 2021 ◽

pp. 1-13

Author(s):

Enbiao Jing ◽

Haiyang Zhang ◽

ZhiGang Li ◽

Yazhi Liu ◽

Zhanlin Ji ◽

...

Keyword(s):

State Of The Art ◽

Classification Performance ◽

Classification Models ◽

Heartbeat Classification ◽

ECG Signals ◽

Residual Structure ◽

Proposed Model ◽

Model Training ◽

Electrocardiogram ECG

Based on a convolutional neural network (CNN) approach, this article proposes an improved ResNet-18 model for heartbeat classification of electrocardiogram (ECG) signals through appropriate model training and parameter adjustment. Due to the unique residual structure of the model, the utilized CNN layered structure can be deepened in order to achieve better classification performance. The results of applying the proposed model to the MIT-BIH arrhythmia database demonstrate that the model achieves higher accuracy (96.50%) compared to other state-of-the-art classification models, while specifically for the ventricular ectopic heartbeat class, its sensitivity is 93.83% and the precision is 97.44%.
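
For readers unfamiliar with the per-class figures quoted above, here is a small sketch of how sensitivity and precision for one class, and overall accuracy, can be read off a confusion matrix. The function names are illustrative assumptions, and no counts from the paper are reproduced here.

```python
def per_class_metrics(confusion, cls):
    """Sensitivity and precision for one class, treated one-vs-rest.

    confusion[i][j] = number of samples with true class i predicted as class j.
    """
    tp = confusion[cls][cls]
    fn = sum(confusion[cls]) - tp                  # true cls, predicted as other
    fp = sum(row[cls] for row in confusion) - tp   # other class, predicted as cls
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return sensitivity, precision


def accuracy(confusion):
    """Fraction of all samples that lie on the diagonal."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total
```

With a toy matrix such as `[[90, 10], [5, 95]]`, calling `per_class_metrics(matrix, 1)` treats class 1 as the positive class and every other class as negative.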

Image-Based Malware Classification Using VGG19 Network and Spatial Convolutional Attention

Electronics ◽

10.3390/electronics10192444 ◽

2021 ◽

Vol 10 (19) ◽

pp. 2444

Author(s):

Mazhar Javed Awan ◽

Osama Ahmed Masood ◽

Mazin Abed Mohammed ◽

Awais Yasin ◽

Azlan Mohd Zain ◽

...

Keyword(s):

High Performance ◽

State Of The Art ◽

Malware Detection ◽

Detection Methods ◽

Learning Framework ◽

Malware Classification ◽

Proposed Model ◽

Communication Devices ◽

Digital Assets

In recent years, the amount of malware spreading through the internet and infecting computers and other communication devices has increased tremendously. To date, countless techniques and methodologies have been proposed to detect and neutralize these malicious agents. However, as new and automated malware generation techniques emerge, a lot of malware continues to be produced that can bypass some state-of-the-art malware detection methods. Therefore, there is a need for the classification and detection of these adversarial agents, which can compromise the security of people, organizations, and countless other forms of digital assets. In this paper, we propose a spatial attention and convolutional neural network (SACNN), based on a deep learning framework, for image-based classification of 25 well-known malware families with and without class balancing. Performance was evaluated on the Malimg benchmark dataset using precision, recall, specificity, precision, and F1 score, on which our proposed model with class balancing reached 97.42%, 97.95%, 97.33%, 97.11%, and 97.32%. Experiments on SACNN with class balancing on the benign class also produced scores above 97%. The results indicate that our proposed model can be used for image-based malware detection with high performance, despite being simpler than other available solutions.

A Novel Architecture to Classify Histopathology Images Using Convolutional Neural Networks

Applied Sciences ◽

10.3390/app10082929 ◽

2020 ◽

Vol 10 (8) ◽

pp. 2929 ◽

Cited By ~ 2

Author(s):

Ibrahem Kandel ◽

Mauro Castelli

Keyword(s):

Neural Network ◽

Neural Networks ◽

State Of The Art ◽

Treatment Plan ◽

Tissue Structure ◽

Activation Functions ◽

Proposed Model ◽

Histopathology Images ◽

Fully Connected

Histopathology is the study of tissue structure under the microscope to determine if the cells are normal or abnormal. Histopathology is a very important exam that is used to determine the patients’ treatment plan. The classification of histopathology images is very difficult even for an experienced pathologist, and a second opinion is often needed. Convolutional neural networks (CNNs), a particular type of deep learning architecture, have obtained outstanding results in computer vision tasks like image classification. In this paper, we propose a novel CNN architecture to classify histopathology images. The proposed model consists of 15 convolution layers and two fully connected layers. A comparison between different activation functions was performed to detect the most efficient one, taking into account two different optimizers. To train and evaluate the proposed model, the publicly available PatchCamelyon dataset was used. The dataset consists of 220,000 annotated images for training and 57,000 unannotated images for testing. The proposed model achieved higher performance compared to the state-of-the-art architectures with an AUC of 95.46%.

Histopathological Classification of Breast Cancer Images Using a Multi-Scale Input and Multi-Feature Network

Cancers ◽

10.3390/cancers12082031 ◽

2020 ◽

Vol 12 (8) ◽

pp. 2031 ◽

Cited By ~ 2

Author(s):

Taimoor Shakeel Sheikh ◽

Yonghee Lee ◽

Migyung Cho

Keyword(s):

State Of The Art ◽

Texture Features ◽

Feature Maps ◽

Histopathological Classification ◽

Multi Scale ◽

Machine Learning Methods ◽

Proposed Model ◽

Benchmark Datasets ◽

Histopathological Images

Diagnosis of pathologies using histopathological images can be time-consuming when many images with different magnification levels need to be analyzed. State-of-the-art computer vision and machine learning methods can help automate the diagnostic pathology workflow and thus reduce the analysis time. Automated systems can also be more efficient and accurate, and can increase the objectivity of diagnosis by reducing operator variability. We propose a multi-scale input and multi-feature network (MSI-MFNet) model, which can learn the overall structures and texture features of different scale tissues by fusing multi-resolution hierarchical feature maps from the network’s dense connectivity structure. The MSI-MFNet predicts the probability of a disease on the patch and image levels. We evaluated the performance of our proposed model on two public benchmark datasets. Furthermore, through ablation studies of the model, we found that multi-scale input and multi-feature maps play an important role in improving the performance of the model. Our proposed model outperformed the existing state-of-the-art models by demonstrating better accuracy, sensitivity, and specificity.

EnCNN-UPMWS: Waste Classification by a CNN Ensemble Using the UPM Weighting Strategy

Electronics ◽

10.3390/electronics10040427 ◽

2021 ◽

Vol 10 (4) ◽

pp. 427

Author(s):

Hua Zheng ◽

Yu Gu

Keyword(s):

Ensemble Learning ◽

State Of The Art ◽

Learning Model ◽

Predicted Probability ◽

Proposed Model ◽

Potential Applications ◽

Weighting Strategy ◽

Validation Set ◽

Weight Coefficients

The accurate and effective classification of household solid waste (HSW) is an indispensable component in the current procedure of waste disposal. In this paper, a novel ensemble learning model called EnCNN-UPMWS, which is based on convolutional neural networks (CNNs) and an unequal precision measurement weighting strategy (UPMWS), is proposed for the classification of HSW via waste images. First, three state-of-the-art CNNs, namely GoogLeNet, ResNet-50, and MobileNetV2, are used as ingredient classifiers to separately predict and obtain three predicted probability vectors, which are significant elements that affect the prediction performance by providing complementary information about the patterns to be classified. Then, the UPMWS is introduced to determine the weight coefficients of the ensemble models. The actual one-hot encoding labels of the validation set and the predicted probability vectors from the CNN ensemble are creatively used to calculate the weights for each classifier during the training phase, which can bring the aggregated prediction vector closer to the target label and improve the performance of the ensemble model. The proposed model was applied to two datasets, namely TrashNet (an open-access dataset) and FourTrash, which was constructed by collecting a total of 47,332 common HSW images containing four types of waste (wet waste, recyclables, harmful waste, and dry waste). The experimental results demonstrate the effectiveness of the proposed method in terms of its accuracy and F1-scores. Moreover, it was found that the UPMWS can simply and effectively enhance the performance of the ensemble learning model, and has potential applications in similar tasks of classification via ensemble learning.
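
The abstract does not give the UPMWS formula. As a rough illustration of the general idea of weighting classifiers by how closely their validation predictions match the one-hot labels, the sketch below assumes weights inversely proportional to each classifier's mean squared error, in the spirit of inverse-variance weighting for measurements of unequal precision. This is an assumption, not the authors' method, and all names are hypothetical.

```python
def inverse_error_weights(val_probs_per_model, val_onehot):
    """One normalized weight per classifier, inversely proportional to its
    mean squared error against the one-hot validation labels (assumed rule)."""
    errors = []
    for probs in val_probs_per_model:
        squared = [
            (p - t) ** 2
            for pvec, tvec in zip(probs, val_onehot)
            for p, t in zip(pvec, tvec)
        ]
        errors.append(sum(squared) / len(squared))
    inverse = [1.0 / max(e, 1e-12) for e in errors]
    total = sum(inverse)
    return [w / total for w in inverse]


def ensemble_predict(prob_vectors, weights):
    """Weighted sum of the classifiers' probability vectors for one sample."""
    n_classes = len(prob_vectors[0])
    return [
        sum(w * pv[c] for w, pv in zip(weights, prob_vectors))
        for c in range(n_classes)
    ]
```

Because the weights are normalized to sum to one, the aggregated vector remains a valid probability distribution whenever each input vector is one.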

Adaptive Feature Pyramid Network to Predict Crisp Boundaries via NMS Layer and ODS F-Measure Loss Function

Information ◽

10.3390/info13010032 ◽

2022 ◽

Vol 13 (1) ◽

pp. 32

Author(s):

Gang Sun ◽

Hancheng Yu ◽

Xiangtao Jiang ◽

Mingkui Feng

Keyword(s):

Edge Detection ◽

Loss Function ◽

State Of The Art ◽

Cross Entropy ◽

Post Processing ◽

Multi Scale ◽

Feature Pyramid ◽

Multi Level ◽

Different Levels ◽

F Measure

Edge detection is one of the fundamental computer vision tasks. Recent methods for edge detection based on a convolutional neural network (CNN) typically employ the weighted cross-entropy loss. Their predicted results are thick and need post-processing before the optimal dataset scale (ODS) F-measure can be calculated for evaluation. To achieve end-to-end training, we propose a non-maximum suppression (NMS) layer to obtain sharp boundaries without the need for post-processing. The ODS F-measure can be calculated based on these sharp boundaries. We therefore propose the ODS F-measure loss function to train the network. In addition, we propose an adaptive multi-level feature pyramid network (AFPN) to better fuse different levels of features. Furthermore, to enrich the multi-scale features learned by AFPN, we introduce a pyramid context module (PCM) that includes dilated convolution to extract multi-scale features. Experimental results indicate that the proposed AFPN achieves state-of-the-art performance on the BSDS500 dataset (ODS F-score of 0.837) and the NYUDv2 dataset (ODS F-score of 0.780).
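
The ODS F-measure picks a single threshold for the whole dataset, the one that maximizes the F-measure over all images pooled together. The sketch below illustrates that selection under a strong simplification: predictions and ground truth are compared pixel by pixel, whereas standard edge benchmarks match edges within a spatial tolerance. The function name and the flattened-list input format are assumptions made for illustration.

```python
def ods_f_measure(pred_maps, gt_maps, thresholds):
    """Return (best_f, best_threshold) using one threshold for all images.

    pred_maps : list of flattened edge-probability maps (lists of floats)
    gt_maps   : list of flattened binary ground-truth maps (lists of 0/1)
    """
    best_f, best_t = 0.0, None
    for t in thresholds:
        tp = fp = fn = 0
        # Pool the counts over the whole dataset before computing F.
        for pred, gt in zip(pred_maps, gt_maps):
            for p, g in zip(pred, gt):
                is_edge = p >= t
                if is_edge and g:
                    tp += 1
                elif is_edge and not g:
                    fp += 1
                elif g:
                    fn += 1
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f = 2 * precision * recall / denom if denom else 0.0
        if f > best_f:
            best_f, best_t = f, t
    return best_f, best_t
```

Thick predictions inflate the false-positive count in this kind of evaluation, which is why thinning (for example by non-maximum suppression) is normally applied before the F-measure is computed.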

Automatic X-ray COVID-19 Lung Image Classification System based on Multi-Level Thresholding and Support Vector Machine

10.1101/2020.03.30.20047787 ◽

2020 ◽

Cited By ~ 9

Author(s):

Lamia Nabil Mahdy ◽

Kadry Ali Ezzat ◽

Haytham H. Elmousalami ◽

Hassan Aboul Ella ◽

Aboul Ella Hassanien

Keyword(s):

Early Detection ◽

Support Vector ◽

Average Sensitivity ◽

X Ray ◽

Infected People ◽

Proposed Model ◽

Medical Health Care ◽

Multi Level ◽

Sensitivity Specificity

The early detection of SARS-CoV-2, the causative agent of COVID-19, is now a critical task for clinical practitioners. The spread of COVID-19 was declared a pandemic by the WHO on 11 March 2020. Consequently, it is a top priority to identify infected people so that prevention procedures can be applied to minimize the spread of COVID-19 and early medical care can begin for those infected. In this paper, a deep learning based methodology is proposed for the detection of COVID-19 infected patients using X-ray images. The support vector machine separates the corona-affected X-ray images from the others using the deep features. The technique is useful to clinical practitioners for the early detection of COVID-19 infected patients. The proposed system of multi-level thresholding plus SVM achieved high accuracy in classifying lungs infected with COVID-19. All images were of the same size and stored in JPEG format with 512 × 512 pixels. The average sensitivity, specificity, and accuracy of the lung classification using the proposed model were 95.76%, 99.7%, and 97.48%, respectively.

Gradient Harmonized Single-Stage Detector

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33018577 ◽

2019 ◽

Vol 33 ◽

pp. 8577-8584 ◽

Cited By ~ 45

Author(s):

Buyu Li ◽

Yu Liu ◽

Xiaogang Wang

Keyword(s):

Loss Function ◽

State Of The Art ◽

Gradient Flow ◽

Parameter Tuning ◽

Substantial Improvement ◽

Single Stage ◽

Future Research ◽

Great Success ◽

Proposed Model ◽

Ablation Study

Despite the great success of two-stage detectors, the single-stage detector is still a more elegant and efficient approach, yet it suffers from two well-known disharmonies during training, i.e., the huge difference in quantity between positive and negative examples as well as between easy and hard examples. In this work, we first point out that the essential effect of the two disharmonies can be summarized in terms of the gradient. Further, we propose a novel gradient harmonizing mechanism (GHM) as a hedge against the disharmonies. The philosophy behind GHM can be easily embedded into both classification loss functions like cross-entropy (CE) and regression loss functions like smooth-L1 (SL1) loss. To this end, two novel loss functions called GHM-C and GHM-R are designed to balance the gradient flow for anchor classification and bounding box refinement, respectively. An ablation study on MS COCO demonstrates that, without laborious hyper-parameter tuning, both GHM-C and GHM-R can bring substantial improvement for the single-stage detector. Without any bells and whistles, the proposed model achieves 41.6 mAP on the COCO test-dev set, which surpasses the state-of-the-art method, Focal Loss (FL) + SL1, by 0.8. The code is released to facilitate future research.
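
To make the idea of harmonizing gradients concrete, the following is a simplified sketch of a GHM-C style classification loss. It assumes the histogram approximation of gradient density: gradient norms |p - p*| are grouped into equal-width bins, and each example's cross-entropy is weighted by N divided by the density of its bin. The function name, the default bin count, and the pure-Python form are illustrative assumptions, not the released code.

```python
import math


def ghm_c_loss(probs, targets, bins=10):
    """Density-weighted binary cross-entropy (simplified GHM-C style sketch).

    probs   : predicted probabilities after the sigmoid
    targets : ground-truth labels, 0 or 1
    bins    : number of equal-width regions over the gradient norm range [0, 1]
    """
    n = len(probs)
    # For sigmoid cross-entropy, the gradient norm w.r.t. the logit is |p - p*|.
    grad_norms = [abs(p - t) for p, t in zip(probs, targets)]
    bin_index = [min(int(g * bins), bins - 1) for g in grad_norms]
    counts = [0] * bins
    for i in bin_index:
        counts[i] += 1
    total = 0.0
    for p, t, i in zip(probs, targets, bin_index):
        # Density = examples in the region divided by the region width (1/bins).
        density = counts[i] * bins
        beta = n / density  # crowded regions get small weights
        p_true = p if t == 1 else 1.0 - p
        total += beta * -math.log(max(p_true, 1e-12))
    return total / n
```

When the gradient norms are spread evenly across the bins, every weight equals one and the sketch reduces to the ordinary mean cross-entropy. Large clusters of very easy (or very hard) examples share one crowded bin and are down-weighted.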

TReC: Transferred ResNet and CBAM for Detecting Brain Diseases

Frontiers in Neuroinformatics ◽

10.3389/fninf.2021.781551 ◽

2021 ◽

Vol 15 ◽

Author(s):

Yuteng Xiao ◽

Hongsheng Yin ◽

Shui-Hua Wang ◽

Yu-Dong Zhang

Keyword(s):

State Of The Art ◽

Specific Model ◽

Brain Diseases ◽

Small Scale ◽

Brain Images ◽

New Approach ◽

Proposed Model ◽

Residual Block ◽

Fully Connected

Early diagnosis of pathological brains leads to early interventions in brain diseases, which may help control the illness, prolong the life of patients, and even cure them. Therefore, the classification of brain diseases is a challenging but helpful task. However, it is hard to collect brain images, and the superabundance of images is also a great challenge for computing resources. This study proposes a new approach named TReC: Transferred Residual Networks (ResNet)-Convolutional Block Attention Module (CBAM), a specific model for small-scale samples, to detect brain diseases based on MRI. First, the ResNet model, which is pre-trained on the ImageNet dataset, serves as initialization. Subsequently, a simple attention mechanism named CBAM is introduced and added into every ResNet residual block. At the same time, the fully connected (FC) layers of the ResNet are replaced with new FC layers, which meet the goal of classification. Finally, all the parameters of our model, including the ResNet, the CBAM, and the new FC layers, are retrained. The effectiveness of the proposed model is evaluated on brain magnetic resonance (MR) datasets for multi-class and two-class tasks. Compared with other state-of-the-art models, our model reaches the best performance for two-class and multi-class tasks on brain diseases.

HAF-SVG: Hierarchical Stochastic Video Generation with Aligned Features

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/138 ◽

2020 ◽

Author(s):

Zhihui Lin ◽

Chun Yuan ◽

Maomao Li

Keyword(s):

Prediction Accuracy ◽

Spatial Information ◽

State Of The Art ◽

Independence Assumption ◽

Main Challenge ◽

Proposed Model ◽

Generation Network ◽

Multi Level ◽

The One

Stochastic video generation methods predict diverse videos based on observed frames, where the main challenge lies in modeling the complex future uncertainty and generating realistic frames. Numerous Recurrent-VAE-based methods have achieved state-of-the-art results. However, on the one hand, the independence assumption on the variables of the approximate posterior limits the inference performance. On the other hand, although these methods adopt skip connections between encoder and decoder to utilize multi-level features, they still produce blurry frames due to the spatial misalignment between encoder and decoder features at different time steps. In this paper, we propose a hierarchical recurrent VAE with a feature aligner, which not only relaxes the independence assumption in a typical VAE but also uses the feature aligner to enable the decoder to obtain aligned spatial information from the last observed frames. The proposed model is named Hierarchical Stochastic Video Generation network with Aligned Features, referred to as HAF-SVG. Experiments on the Moving-MNIST, BAIR, and KTH datasets demonstrate that the hierarchical structure helps model future uncertainty more accurately, and that the feature aligner helps generate realistic frames. In addition, HAF-SVG exceeds SVG in both prediction accuracy and the quality of generated frames.
