A Review of Deep Learning Based Methods for Acoustic Scene Classification

Jakob Abeßer

doi:10.3390/app10062020

Applied Sciences ◽

10.3390/app10062020 ◽

2020 ◽

Vol 10 (6) ◽

pp. 2020 ◽

Cited By ~ 6

Author(s):

Jakob Abeßer

Keyword(s):

Deep Learning ◽

Data Augmentation ◽

Real Life ◽

Network Architectures ◽

Data Preparation ◽

Scene Classification ◽

Feature Representations ◽

Audio Recordings ◽

Number Of Publications

The number of publications on acoustic scene classification (ASC) in environmental audio recordings has constantly increased over the last few years. This was mainly stimulated by the annual Detection and Classification of Acoustic Scenes and Events (DCASE) competition with its first edition in 2013. All competitions so far involved one or multiple ASC tasks. With a focus on deep learning based ASC algorithms, this article summarizes and groups existing approaches for data preparation, i.e., feature representations, feature pre-processing, and data augmentation, and for data modeling, i.e., neural network architectures and learning paradigms. Finally, the paper discusses current algorithmic limitations and open challenges in order to preview possible future developments towards the real-life application of ASC systems.


An Imbalanced Image Classification Method for the Cell Cycle Phase

Information ◽

10.3390/info12060249 ◽

2021 ◽

Vol 12 (6) ◽

pp. 249

Author(s):

Xin Jin ◽

Yuanwen Zou ◽

Zhongbing Huang

Keyword(s):

Cell Cycle ◽

Deep Learning ◽

Image Classification ◽

Classification Accuracy ◽

Data Augmentation ◽

Cycle Phase ◽

Generative Adversarial Network ◽

Adversarial Network ◽

Cellular Life

The cell cycle is an important process in cellular life. In recent years, several image processing methods have been developed to determine the cell cycle stages of individual cells. However, most of these methods require cells to be segmented and their features to be extracted. During feature extraction, some important information may be lost, resulting in lower classification accuracy. We therefore used a deep learning method that retains all cell features. To address the insufficient number and imbalanced distribution of original images, we used the Wasserstein generative adversarial network with gradient penalty (WGAN-GP) for data augmentation, and a residual network (ResNet), one of the most widely used deep learning classification networks, for image classification. Our method reached a classification accuracy of 83.88% on cell cycle images, an increase of 4.48 percentage points over the 79.40% obtained in previous experiments. On a second dataset used to verify the model, our accuracy was 12.52 percentage points higher than previous results. The results showed that our new cell cycle image classification system based on WGAN-GP and ResNet is useful for the classification of imbalanced images. Moreover, our method could potentially address the low classification accuracy in biomedical images caused by insufficient numbers of original images and the imbalanced distribution of original images.
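The abstract does not state the training objective. For orientation, the critic loss usually associated with WGAN-GP can be written as follows; the symbols ($D$ for the critic, $P_r$ and $P_g$ for the real and generated distributions, $\lambda$ for the penalty weight) are the conventional ones and are assumptions here, not taken from the paper.

```latex
L_D = \mathbb{E}_{\tilde{x} \sim P_g}\big[D(\tilde{x})\big]
    - \mathbb{E}_{x \sim P_r}\big[D(x)\big]
    + \lambda \, \mathbb{E}_{\hat{x} \sim P_{\hat{x}}}
      \Big[\big(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\big)^2\Big],
\qquad
\hat{x} = \epsilon x + (1 - \epsilon)\tilde{x}, \quad \epsilon \sim U[0, 1]
```

The reported gain is a difference in percentage points: $83.88 - 79.40 = 4.48$.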


An Analysis of State-of-the-art Activation Functions For Supervised Deep Neural Network

10.31219/osf.io/2zk6a ◽

2021 ◽

Author(s):

Anh Nguyen ◽

Khoa Pham ◽

Dat Ngo ◽

Thanh Ngo ◽

Lam Pham

Keyword(s):

Neural Network ◽

Supervised Classification ◽

Deep Neural Network ◽

State Of The Art ◽

Network Architectures ◽

Activation Functions ◽

Scene Classification ◽

Learning Network ◽

Deep Learning Network

This paper provides an analysis of state-of-the-art activation functions for supervised classification with deep neural networks. These activation functions comprise the Rectified Linear Unit (ReLU), Exponential Linear Unit (ELU), Scaled Exponential Linear Unit (SELU), Gaussian Error Linear Unit (GELU), and the Inverse Square Root Linear Unit (ISRLU). For the evaluation, experiments are conducted over two deep learning network architectures that integrate these activation functions. The first model, based on a Multilayer Perceptron (MLP), is evaluated on the MNIST dataset to compare the performance of these activation functions. The second model, a VGGish-like architecture, is applied to Acoustic Scene Classification (ASC) Task 1A of the DCASE 2018 challenge, in order to evaluate whether these activation functions work well across different datasets as well as different network architectures.
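The five activation functions named above can be sketched as scalar functions. This is a minimal illustration using their commonly cited definitions, not the authors' implementation; the default `alpha` values and the exact (erf-based) form of GELU are assumptions, since the abstract gives no parameters.

```python
import math

# SELU constants as commonly published for self-normalizing networks.
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805


def relu(x):
    """Rectified Linear Unit: zero for negative inputs, identity otherwise."""
    return max(0.0, x)


def elu(x, alpha=1.0):
    """Exponential Linear Unit: alpha * (exp(x) - 1) for non-positive inputs."""
    return x if x > 0 else alpha * math.expm1(x)


def selu(x):
    """Scaled ELU: an ELU with fixed alpha, multiplied by a fixed scale."""
    return SELU_SCALE * elu(x, SELU_ALPHA)


def gelu(x):
    """Gaussian Error Linear Unit, exact form: x * Phi(x)."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def isrlu(x, alpha=1.0):
    """Inverse Square Root Linear Unit: x / sqrt(1 + alpha * x^2) for x < 0."""
    return x if x >= 0 else x / math.sqrt(1.0 + alpha * x * x)


if __name__ == "__main__":
    for name, fn in [("relu", relu), ("elu", elu), ("selu", selu),
                     ("gelu", gelu), ("isrlu", isrlu)]:
        print(name, [round(fn(v), 4) for v in (-2.0, -0.5, 0.0, 0.5, 2.0)])
```

All five agree closely for large positive inputs and differ mainly in how they treat negative inputs, which is the region such a comparison would probe.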


Classification of Brain Tumor MRIs Using Deep Learning and Data Augmentation

Advances in Intelligent Systems and Computing - Progress in Advanced Computing and Intelligent Engineering ◽

10.1007/978-981-33-4299-6_6 ◽

2021 ◽

pp. 69-83

Author(s):

Gulshansingh Bhagbut ◽

Zahra Mungloo-Dilmohamud

Keyword(s):

Deep Learning ◽

Brain Tumor ◽

Data Augmentation


PulseNetOne: Fast Unsupervised Pruning of Convolutional Neural Networks for Remote Sensing

Remote Sensing ◽

10.3390/rs12071092 ◽

2020 ◽

Vol 12 (7) ◽

pp. 1092

Author(s):

David Browne ◽

Michael Giering ◽

Steven Prestwich

Keyword(s):

Remote Sensing ◽

Neural Networks ◽

Deep Learning ◽

Convolutional Neural Networks ◽

Data Augmentation ◽

Recognition Task ◽

Scene Recognition ◽

Training Data ◽

Learning Approach ◽

Scene Classification

Scene classification is an important aspect of image/video understanding and segmentation. However, remote-sensing scene classification is a challenging image recognition task, partly due to the limited training data, which causes deep-learning Convolutional Neural Networks (CNNs) to overfit. Another difficulty is that images often have very different scales and orientations (viewing angles). Yet another is that the resulting networks may be very large, again making them prone to overfitting and unsuitable for deployment on memory- and energy-limited devices. We propose an efficient deep-learning approach to tackle these problems. We use transfer learning to compensate for the lack of data, and data augmentation to tackle varying scale and orientation. To reduce network size, we use a novel unsupervised learning approach based on k-means clustering, applied to all parts of the network. By contrast, most network reduction methods use computationally expensive supervised learning and apply only to the convolutional or the fully connected layers, but not both. In experiments, we set new standards in classification accuracy on four remote-sensing and two scene-recognition image datasets.


Comparison of Deep Learning, Data Augmentation and Bag of-Visual-Words for Classification of Imbalanced Image Datasets

Communications in Computer and Information Science - Recent Trends in Image Processing and Pattern Recognition ◽

10.1007/978-981-13-9181-1_49 ◽

2019 ◽

pp. 561-571 ◽

Cited By ~ 4

Author(s):

Manisha Saini ◽

Seba Susan

Keyword(s):

Deep Learning ◽

Data Augmentation ◽

Bag Of Visual Words ◽

Visual Words ◽

Learning Data ◽

Image Datasets


Identification and classification of dental implant systems using various deep learning‐based convolutional neural network architectures

Clinical Oral Implants Research ◽

10.1111/clr.175_13509 ◽

2019 ◽

Vol 30 (S19) ◽

pp. 217-217

Author(s):

Lee Jae‐Hong

Keyword(s):

Neural Network ◽

Deep Learning ◽

Convolutional Neural Network ◽

Dental Implant ◽

Network Architectures ◽

Neural Network Architectures


Large-Scale Whale-Call Classification by Transfer Learning on Multi-Scale Waveforms and Time-Frequency Features

Applied Sciences ◽

10.3390/app9051020 ◽

2019 ◽

Vol 9 (5) ◽

pp. 1020 ◽

Cited By ~ 6

Author(s):

Lilun Zhang ◽

Dezhi Wang ◽

Changchun Bao ◽

Yongxian Wang ◽

Kele Xu

Keyword(s):

Transfer Learning ◽

Large Scale ◽

Data Augmentation ◽

Feature Representation ◽

Biological Research ◽

Time Frequency ◽

Feature Representations ◽

Multi Scale ◽

Data Driven Approach

Whale vocal calls contain valuable information and abundant characteristics that are important for the classification of whale sub-populations and related biological research. In this study, an effective data-driven approach based on pre-trained Convolutional Neural Networks (CNNs) using multi-scale waveforms and time-frequency feature representations is developed to classify whale calls from a large open-source dataset recorded by sensors carried by whales. Specifically, the classification is carried out through a transfer learning approach using pre-trained state-of-the-art CNN models from the field of computer vision. 1D raw waveforms and 2D log-mel features of the whale-call data are each used as input to the CNN models. For raw waveform input, windows are applied to capture multiple sketches of a whale-call clip at different time scales, and the features from the different sketches are stacked for classification. When using the log-mel features, the delta and delta-delta features are also calculated to produce a 3-channel feature representation for analysis. During training, 4-fold cross-validation is employed to reduce overfitting, and the Mix-up technique is applied for data augmentation to further improve system performance. The results show that the proposed method improves accuracy by more than 20 percentage points for the classification into 16 whale pods, compared with the baseline method using groups of 2D shape descriptors of spectrograms and the Fisher discriminant scores on the same dataset. Moreover, classifications based on log-mel features are shown to have higher accuracies than those based directly on raw waveforms. A phylogeny graph is also produced to illustrate the relationships among the whale sub-populations.
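Two of the steps described above, the 3-channel log-mel/delta/delta-delta representation and Mix-up augmentation, can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the authors' code: the abstract does not say how deltas are computed, so a simple central difference with repeated edge frames is assumed (audio libraries often use a regression over a wider window), and the function names, the `alpha` default, and the one-hot label format are hypothetical.

```python
import random


def delta(frames):
    """Central-difference delta along time; edge frames are repeated.

    `frames` is a list of time frames, each a list of band values.
    """
    n = len(frames)
    out = []
    for t in range(n):
        prev = frames[max(t - 1, 0)]
        nxt = frames[min(t + 1, n - 1)]
        out.append([(b - a) / 2.0 for a, b in zip(prev, nxt)])
    return out


def three_channel(log_mel):
    """Stack log-mel, delta, and delta-delta as three channels."""
    d1 = delta(log_mel)
    d2 = delta(d1)
    return [log_mel, d1, d2]


def mixup(x_a, y_a, x_b, y_b, alpha=0.2, rng=random):
    """Mix two examples and their one-hot labels with a Beta-drawn weight.

    alpha is an assumed hyperparameter; lam ~ Beta(alpha, alpha).
    """
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y, lam


if __name__ == "__main__":
    # Toy "log-mel" input: 4 time frames, 1 band.
    channels = three_channel([[0.0], [1.0], [4.0], [9.0]])
    print(channels[1])  # delta channel
    print(channels[2])  # delta-delta channel

    x, y, lam = mixup([1.0, 2.0], [1.0, 0.0], [3.0, 6.0], [0.0, 1.0],
                      rng=random.Random(0))
    print(lam, x, y)
```

Because the mixed label is a convex combination of two one-hot vectors, it still sums to one, so it can be used directly with a cross-entropy loss over soft targets.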


Classification and Visualisation of Normal and Abnormal Radiographs; A Comparison between Eleven Convolutional Neural Network Architectures

Sensors ◽

10.3390/s21165381 ◽

2021 ◽

Vol 21 (16) ◽

pp. 5381

Author(s):

Ananda Ananda ◽

Kwun Ho Ngan ◽

Cefa Karabağ ◽

Aram Ter-Sarkisov ◽

Eduardo Alonso ◽

...

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Data Augmentation ◽

Kappa Coefficient ◽

Network Architectures ◽

Radiographic Images ◽

Cohen's Kappa ◽

Neural Network Architectures ◽

Activation Mapping

This paper investigates the classification of radiographic images with eleven convolutional neural network (CNN) architectures (GoogleNet, VGG-19, AlexNet, SqueezeNet, ResNet-18, Inception-v3, ResNet-50, VGG-16, ResNet-101, DenseNet-201 and Inception-ResNet-v2). The CNNs were used to classify a series of wrist radiographs from the Stanford Musculoskeletal Radiographs (MURA) dataset into two classes: normal and abnormal. The architectures were compared across different hyper-parameters in terms of accuracy and Cohen’s kappa coefficient. The best two results were then explored with data augmentation. Without augmentation, the best results were provided by Inception-ResNet-v2 (mean accuracy = 0.723, mean kappa = 0.506). With augmentation, the results for Inception-ResNet-v2 improved significantly (mean accuracy = 0.857, mean kappa = 0.703). Finally, Class Activation Mapping was applied to interpret the activation of the network against the location of an anomaly in the radiographs.
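Cohen's kappa, reported above alongside accuracy, corrects observed agreement for the agreement expected by chance. A minimal sketch of the standard definition follows; the function name and the handling of the degenerate case are choices made here for illustration, not taken from the paper.

```python
from collections import Counter


def cohens_kappa(y_true, y_pred):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e).

    p_o is the observed agreement (plain accuracy); p_e is the agreement
    expected by chance from the marginal label frequencies.
    """
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")

    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n

    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)

    if p_e == 1.0:
        # Both label sequences contain a single identical class; kappa is
        # undefined. Returning NaN is a convention chosen for this sketch.
        return float("nan")
    return (p_o - p_e) / (1.0 - p_e)


if __name__ == "__main__":
    # 1 = abnormal, 0 = normal (toy labels, not data from the paper).
    print(cohens_kappa([1, 1, 0, 0], [1, 0, 0, 0]))
```

In the toy call, accuracy is 0.75 while chance agreement is 0.5, so kappa is (0.75 - 0.5) / (1 - 0.5) = 0.5, which illustrates why kappa values sit below the corresponding accuracies.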


Deep learning in remote sensing scene classification: a data augmentation enhanced convolutional neural network framework

GIScience & Remote Sensing ◽

10.1080/15481603.2017.1323377 ◽

2017 ◽

Vol 54 (5) ◽

pp. 741-758 ◽

Cited By ~ 58

Author(s):

Xingrui Yu ◽

Xiaomin Wu ◽

Chunbo Luo ◽

Peng Ren

Keyword(s):

Neural Network ◽

Remote Sensing ◽

Deep Learning ◽

Convolutional Neural Network ◽

Data Augmentation ◽

Scene Classification


Classification and Visualisation of Normal and Abnormal Radiographs; a comparison between Eleven Convolutional Neural Network Architectures

10.1101/2021.06.16.21259014 ◽

2021 ◽

Author(s):

Ananda Ananda ◽

Kwun Ho Ngan ◽

Cefa Karabag ◽

Eduardo Alonso ◽

Alex Ter-Sarkisov ◽

...

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Data Augmentation ◽

Kappa Coefficient ◽

Network Architectures ◽

Radiographic Images ◽

Cohen's Kappa ◽

Neural Network Architectures ◽

Activation Mapping

This paper investigates the classification of radiographic images with eleven convolutional neural network (CNN) architectures (GoogleNet, VGG-19, AlexNet, SqueezeNet, ResNet-18, Inception-v3, ResNet-50, VGG-16, ResNet-101, DenseNet-201 and Inception-ResNet-v2). The CNNs were used to classify a series of wrist radiographs from the Stanford Musculoskeletal Radiographs (MURA) dataset into two classes: normal and abnormal. The architectures were compared across different hyper-parameters in terms of accuracy and Cohen's kappa coefficient. The best two results were then explored with data augmentation. Without augmentation, the best results were provided by Inception-ResNet-v2 (mean accuracy = 0.723, mean kappa = 0.506). With augmentation, the results for Inception-ResNet-v2 improved significantly (mean accuracy = 0.857, mean kappa = 0.703). Finally, Class Activation Mapping was applied to interpret the activation of the network against the location of an anomaly in the radiographs.
