Classification between Elderly Voices and Young Voices Using an Efficient Combination of Deep Learning Classifiers and Various Parameters

2021 ◽  
Vol 11 (21) ◽  
pp. 9836
Author(s):  
Ji-Yeoun Lee

The objective of this research was to combine deep learning classifiers with various acoustic parameters to provide an accurate and objective system for classifying elderly and young voice signals. This work focused on deep learning methods, such as the feedforward neural network (FNN) and convolutional neural network (CNN), for the detection of elderly voice signals using mel-frequency cepstral coefficients (MFCCs), linear prediction cepstrum coefficients (LPCCs), skewness, and kurtosis parameters. In total, voice samples from 126 subjects (63 elderly and 63 young) were obtained from the Saarbruecken voice database. The highest performance, 93.75%, was achieved when skewness was added to the MFCC and MFCC delta parameters, although fusing both the skewness and kurtosis parameters also had a positive effect on the overall classification accuracy. The results of this study also revealed that the FNN outperformed the CNN. With respect to gender, most parameters estimated from male data samples demonstrated good performance. Rather than using mixed female and male data, this work recommends developing separate systems, each achieving its best performance through its own optimized parameters, using data from independent male and female samples.
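The fusion of higher-order statistics with cepstral features described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the signal is synthetic, the MFCC matrix is a random stand-in (real MFCCs would come from a library such as librosa), and the frame length and hop size are assumed values.

```python
import numpy as np
from scipy.stats import skew, kurtosis

def frame_signal(x, frame_len=400, hop=160):
    """Split a 1-D signal into overlapping frames (one frame per row)."""
    n = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n)])

def fuse_hos(mfcc, frames):
    """Append per-frame skewness and kurtosis columns to an MFCC matrix.

    mfcc: (n_frames, n_coeffs); frames: (n_frames, frame_len)."""
    sk = skew(frames, axis=1)[:, None]
    ku = kurtosis(frames, axis=1)[:, None]
    return np.hstack([mfcc, sk, ku])

rng = np.random.default_rng(0)
x = rng.standard_normal(16000)                 # 1 s of noise at 16 kHz
frames = frame_signal(x)
mfcc = rng.standard_normal((len(frames), 13))  # stand-in for real MFCCs
feats = fuse_hos(mfcc, frames)
print(feats.shape)                             # (n_frames, 13 + 2)
```

The fused matrix would then be fed to the FNN or CNN classifier as one feature set per frame.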

2021 ◽  
Vol 11 (15) ◽  
pp. 7149
Author(s):  
Ji-Yeoun Lee

This work focused on deep learning methods, such as the feedforward neural network (FNN) and convolutional neural network (CNN), for pathological voice detection using mel-frequency cepstral coefficients (MFCCs), linear prediction cepstrum coefficients (LPCCs), and higher-order statistics (HOS) parameters. In total, 518 voice samples were obtained from the publicly available Saarbruecken voice database (SVD), comprising recordings of 259 healthy and 259 pathological women and men producing the /a/, /i/, and /u/ vowels at normal pitch. Significant differences between the normal and pathological voice signals were observed for normalized skewness (p = 0.000) and kurtosis (p = 0.000), except for the normalized kurtosis estimated from the /u/ samples in women (p = 0.051). These parameters are therefore useful and meaningful for classifying pathological voice signals. The highest accuracy, 82.69%, was achieved by the CNN classifier with the LPCC parameters for the /u/ vowel in men. The second-best performance, 80.77%, was obtained with a combination of the FNN classifier, MFCCs, and HOS parameters for the /i/ vowel samples in women. Combining the acoustic measures with HOS parameters improved characterization in terms of accuracy, and the combination of various parameters with deep learning methods was useful for distinguishing normal from pathological voices.
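The group-difference testing of normalized HOS parameters can be sketched as below. This is a hedged illustration with synthetic stand-in signals and an assumed two-sample t-test; the paper does not specify which significance test produced its p-values.

```python
import numpy as np
from scipy.stats import skew, kurtosis, ttest_ind

def normalized_hos(signal):
    """Normalized skewness and kurtosis of a voice-signal stand-in."""
    z = (signal - signal.mean()) / signal.std()
    return skew(z), kurtosis(z)

rng = np.random.default_rng(1)
# Stand-ins for vowel recordings: "normal" Gaussian-like signals vs.
# "pathological" heavy-tailed signals.
normal = [rng.standard_normal(4000) for _ in range(30)]
patho = [rng.standard_normal(4000) ** 3 for _ in range(30)]

ku_n = [normalized_hos(s)[1] for s in normal]
ku_p = [normalized_hos(s)[1] for s in patho]
t, p = ttest_ind(ku_n, ku_p)
print(f"t = {t:.2f}, p = {p:.2e}")
```

A small p-value here indicates the kurtosis parameter separates the two groups, mirroring how the abstract argues the HOS parameters are meaningful classification features.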


2020 ◽  
Vol 8 ◽  
Author(s):  
Adil Khadidos ◽  
Alaa O. Khadidos ◽  
Srihari Kannan ◽  
Yuvaraj Natarajan ◽  
Sachi Nandan Mohanty ◽  
...  

In this paper, a data mining model built on a hybrid deep learning framework is designed to diagnose the medical conditions of patients infected with the coronavirus disease 2019 (COVID-19) virus. The hybrid model, named DeepSense, combines a convolutional neural network (CNN) and a recurrent neural network (RNN). It is designed as a series of layers that extract and classify features of COVID-19 infection from the lungs. Computed tomography images are used as input data, and the classifier eases the classification process by learning the multidimensional input through the Expert Hidden layers. The model is validated against medical image datasets to predict infections using deep learning classifiers. The results show that the DeepSense classifier achieves higher accuracy than conventional deep learning and machine learning classifiers. The proposed method is validated against three different datasets with 70%, 80%, and 90% of the data used for training, and it specifically quantifies the quality of the diagnostic method adopted for predicting COVID-19 infection in a patient.
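The CNN-plus-RNN hybrid idea can be sketched in plain numpy: a convolutional stage extracts features from each CT slice (flattened here as a 1-D stand-in), and a recurrent stage accumulates evidence across slices. Every dimension, weight, and the single-kernel architecture below is an assumption for illustration, not the DeepSense design.

```python
import numpy as np

rng = np.random.default_rng(2)

def relu(a):
    return np.maximum(a, 0.0)

def hybrid_forward(slices, conv_w, rnn_wx, rnn_wh, out_w):
    """CNN feature extractor per CT slice, then an Elman RNN over slices.

    slices: (T, D) array; T slices, D pixels each (flattened stand-in)."""
    h = np.zeros(rnn_wh.shape[0])
    for s in slices:
        f = relu(np.convolve(s, conv_w, mode="valid"))  # conv features
        f = f.reshape(-1, 4).max(axis=1)                # max-pool by 4
        h = np.tanh(rnn_wx @ f + rnn_wh @ h)            # recurrent update
    logit = out_w @ h
    return 1.0 / (1.0 + np.exp(-logit))                 # P(infection)

T, D, K, H = 8, 67, 4, 16            # slices, pixels, kernel, hidden size
slices = rng.standard_normal((T, D))
conv_w = rng.standard_normal(K) * 0.1
feat_len = (D - K + 1) // 4          # pooled feature length
rnn_wx = rng.standard_normal((H, feat_len)) * 0.1
rnn_wh = rng.standard_normal((H, H)) * 0.1
out_w = rng.standard_normal(H) * 0.1
p = hybrid_forward(slices, conv_w, rnn_wx, rnn_wh, out_w)
print(round(float(p), 3))
```

In practice both stages would be trained jointly in a deep learning framework rather than used with random weights as here.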


2021 ◽  
Author(s):  
Yunfan Chen ◽  
Chong Zhang ◽  
Chengyu Liu ◽  
Yiming Wang ◽  
Xiangkui Wan

Abstract Atrial fibrillation is one of the most common arrhythmias in the clinic and has a great impact on people's physical and mental health. Electrocardiogram (ECG) based arrhythmia detection is widely used for early atrial fibrillation detection; however, ECGs must be checked manually in clinical practice, which is time-consuming and labor-intensive, so an automatic atrial fibrillation detection system is needed. Recent research has demonstrated that deep learning can improve the performance of automatic ECG classification models. To this end, this work proposes an effective deep learning technique to automatically detect atrial fibrillation. First, preprocessing algorithms based on the wavelet transform and sliding window filtering (SWF) are introduced to reduce noise in the ECG signal and to filter its high-frequency components, respectively. Then, a robust R-wave detection algorithm is developed, achieving 99.22% detection sensitivity, a 98.55% positive recognition rate, and 2.25% deviance on the MIT-BIH arrhythmia database. In addition, a feedforward neural network (FNN) is proposed to detect atrial fibrillation from ECG records. Experiments verified by a 10-fold cross-validation strategy show that the proposed model achieves competitive detection performance and can be applied to wearable detection devices. On a mixed dataset composed of the Challenge2017 and MIT-BIH arrhythmia databases, the model achieves an accuracy of 84.00%, a sensitivity of 84.26%, a specificity of 93.23%, and an area under the receiver operating characteristic curve of 89.40%.
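The preprocessing-then-R-wave-detection pipeline can be sketched as below. This is a simplified stand-in, not the paper's algorithm: the sliding window filter is a plain moving average, the detector is a threshold with a refractory period, and the "ECG" is a synthetic spike train.

```python
import numpy as np

def sliding_window_filter(sig, width=5):
    """Moving-average smoothing (a simple sliding window filter)."""
    kernel = np.ones(width) / width
    return np.convolve(sig, kernel, mode="same")

def detect_r_peaks(sig, fs, thresh_frac=0.6, refractory_s=0.25):
    """Threshold-based R-wave detector with a refractory period.

    Returns the sample indices of detected R peaks."""
    thresh = thresh_frac * sig.max()
    refractory = int(refractory_s * fs)
    peaks, last = [], -refractory
    for i in range(1, len(sig) - 1):
        if (sig[i] > thresh and sig[i] >= sig[i - 1]
                and sig[i] > sig[i + 1] and i - last >= refractory):
            peaks.append(i)
            last = i
    return np.array(peaks)

# Synthetic ECG stand-in: one "R wave" per second plus noise at 250 Hz.
fs = 250
ecg = np.zeros(10 * fs)
ecg[10::fs] = 1.0
rng = np.random.default_rng(3)
ecg += 0.05 * rng.standard_normal(len(ecg))
peaks = detect_r_peaks(sliding_window_filter(ecg), fs)
print(len(peaks))          # one detection per synthetic beat
```

The intervals between detected R peaks (the RR series) are what an atrial fibrillation classifier such as the proposed FNN would consume.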


Author(s):  
Mohamed Esmail Karar ◽  
Ezz El-Din Hemdan ◽  
Marwa A. Shouman

Abstract Computer-aided diagnosis (CAD) systems are a powerful tool that helps physicians identify the novel Coronavirus Disease 2019 (COVID-19) in medical imaging. This article therefore proposes a new framework of cascaded deep learning classifiers to enhance the performance of these CAD systems for highly suspected COVID-19 and pneumonia cases in X-ray images. The proposed deep learning framework makes two major advances. First, the complicated multi-label classification of X-ray images has been simplified into a series of binary classifiers, one for each tested health status, mimicking the clinical process of diagnosing a patient's potential diseases. Second, the cascaded architecture of COVID-19 and pneumonia classifiers can flexibly use different fine-tuned deep learning models simultaneously, achieving the best performance in confirming infected cases. This study includes eleven pre-trained convolutional neural network models, such as the Visual Geometry Group network (VGG) and the residual neural network (ResNet), which were successfully tested and evaluated on a public X-ray image dataset covering normal subjects and three diseased cases. The results of the proposed cascaded classifiers showed that the VGG16, ResNet50V2, and densely connected convolutional network (DenseNet169) models achieved the best detection accuracy for COVID-19, viral (non-COVID-19) pneumonia, and bacterial pneumonia images, respectively. Furthermore, the performance of the cascaded deep learning classifiers is superior to the multi-label COVID-19 and pneumonia classification methods of previous studies. The proposed framework is therefore a good candidate for clinical routine use to assist the diagnostic procedures of COVID-19 infection.
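The cascade-of-binary-classifiers structure can be sketched in a few lines. The classifier stubs below are hypothetical placeholders standing in for the fine-tuned CNNs (VGG16, ResNet50V2, DenseNet169); only the control flow reflects the described design.

```python
from typing import Callable, List, Tuple

def cascade_diagnose(image,
                     stages: List[Tuple[str, Callable]]) -> str:
    """Run binary classifiers in sequence; stop at the first positive.

    Each stage is (label, classifier) where classifier(image) -> bool.
    COVID-19 is checked first, then viral pneumonia, then bacterial
    pneumonia; 'normal' is returned if every stage is negative."""
    for label, clf in stages:
        if clf(image):
            return label
    return "normal"

# Hypothetical stand-ins for fine-tuned CNNs; each just inspects a
# fake image tag instead of running inference on pixels.
stages = [
    ("COVID-19",            lambda img: img == "covid"),
    ("viral pneumonia",     lambda img: img == "viral"),
    ("bacterial pneumonia", lambda img: img == "bacterial"),
]

print(cascade_diagnose("viral", stages))   # viral pneumonia
print(cascade_diagnose("clear", stages))   # normal
```

A design note: because each stage is an independent binary model, any stage can be swapped for a better fine-tuned network without retraining the rest of the cascade, which is the flexibility the abstract highlights.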


2020 ◽  
Vol 12 (5) ◽  
pp. 1-8
Author(s):  
Nahyan Al Mahmud ◽  
Shahfida Amjad Munni

The performance of various acoustic feature extraction methods is compared in this work using a Long Short-Term Memory (LSTM) neural network in a Bangla speech recognition system. Acoustic features are a series of vectors that represent the speech signal; they can be classified into either words or sub-word units such as phonemes. In this work, linear predictive coding (LPC) is first used as the acoustic vector extraction technique, chosen for its widespread popularity. Other extraction techniques, Mel-frequency cepstral coefficients (MFCC) and perceptual linear prediction (PLP), are then applied; these two methods more closely resemble the human auditory system. The LSTM neural network is trained on these feature vectors, and the resulting models of different phonemes are compared using two statistical tools, the Bhattacharyya distance and the Mahalanobis distance, to investigate the nature of the acoustic features.
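The two comparison measures named above have standard closed forms for Gaussian models, which can be computed directly. The toy two-dimensional "phoneme models" below are illustrative assumptions, not data from the paper.

```python
import numpy as np

def mahalanobis(x, mean, cov):
    """Mahalanobis distance from point x to a Gaussian (mean, cov)."""
    d = x - mean
    return float(np.sqrt(d @ np.linalg.inv(cov) @ d))

def bhattacharyya_gauss(m1, c1, m2, c2):
    """Bhattacharyya distance between two multivariate Gaussians."""
    c = (c1 + c2) / 2
    d = m1 - m2
    term1 = 0.125 * d @ np.linalg.inv(c) @ d
    term2 = 0.5 * np.log(np.linalg.det(c) /
                         np.sqrt(np.linalg.det(c1) * np.linalg.det(c2)))
    return float(term1 + term2)

# Toy 2-D "phoneme models": identical covariances, shifted means.
m1, m2 = np.array([0.0, 0.0]), np.array([2.0, 0.0])
c = np.eye(2)
print(mahalanobis(m2, m1, c))             # 2.0
print(bhattacharyya_gauss(m1, c, m2, c))  # 0.5
```

Larger distances between two phoneme models indicate that the underlying feature extraction method separates those phonemes more cleanly, which is how such measures support the comparison in this work.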


2020 ◽  
Vol 5 (3) ◽  
pp. 229-233
Author(s):  
Olaide Ayodeji Agbolade

This research presents a neural network based voice conversion model. While it is known that voiced sounds and prosody are the most important components of a voice conversion framework, their objective contributions, particularly in a noisy and uncontrolled environment, are not known. This model uses a three-layer feedforward neural network to map the linear prediction analysis coefficients of a source speaker to the acoustic vector space of the target speaker, with a view to objectively determining the contributions of the voiced, unvoiced, and supra-segmental components of speech to the voice conversion model. Results showed that the vowels "a", "i", and "o" contribute most significantly to conversion success. The voiceless sounds were found to be most affected by noisy training data: an average noise level of 40 dB above the noise floor degraded the voice conversion success by 55.14 percent relative to the voiced sounds. The results also show that for cross-gender voice conversion, prosody conversion is more significant when a female is the target speaker.
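The linear prediction analysis coefficients that this model maps between speakers can be estimated with the standard autocorrelation method. The sketch below is a generic implementation under assumed parameters, verified on a synthetic second-order autoregressive "voiced" signal rather than real speech.

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def lpc(signal, order=10):
    """LPC coefficients via the autocorrelation method.

    Solves the Toeplitz normal equations R a = r for the predictor a,
    where s[t] is approximated by sum_k a[k] * s[t-1-k]."""
    n = len(signal)
    r = np.array([signal[:n - k] @ signal[k:] for k in range(order + 1)])
    return solve_toeplitz(r[:order], r[1:])

rng = np.random.default_rng(4)
# Synthetic "voiced" signal: AR(2), s[t] = 1.5 s[t-1] - 0.8 s[t-2] + e[t]
e = rng.standard_normal(8000)
s = np.zeros_like(e)
for t in range(2, len(s)):
    s[t] = 1.5 * s[t - 1] - 0.8 * s[t - 2] + e[t]
a = lpc(s, order=2)
print(np.round(a, 2))    # close to [1.5, -0.8]
```

In a conversion system, per-frame coefficient vectors like `a` from the source speaker would form the network's input and those of the target speaker its training targets.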


Sensors ◽  
2019 ◽  
Vol 19 (19) ◽  
pp. 4229 ◽  
Author(s):  
Krzysztof K. Cwalina ◽  
Piotr Rajchowski ◽  
Olga Blaszkiewicz ◽  
Alicja Olejniczak ◽  
Jaroslaw Sadowski

In this article, the use of deep learning (DL) in ultra-wideband (UWB) Wireless Body Area Networks (WBANs) is presented. The developed approach, which uses the channel impulse response, identifies direct-visibility conditions between nodes in off-body communication more efficiently than the methods described in the literature. The effectiveness of the proposed deep feedforward neural network was verified on measurement data from dynamic scenarios in an indoor environment. The obtained results clearly prove the validity of the proposed DL approach in UWB WBANs, with high efficiency (over 98.6% in most cases) for LOS and NLOS condition classification.
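Channel-impulse-response inputs for LOS/NLOS classification are commonly summarized by shape and dispersion statistics before (or alongside) feeding a network. The sketch below computes three such generic features on synthetic CIRs; the feature choice, sampling rate, and channel models are illustrative assumptions, not the article's pipeline.

```python
import numpy as np
from scipy.stats import kurtosis

def cir_features(cir, fs):
    """Shape/dispersion features of a channel impulse response.

    Returns (amplitude kurtosis, mean excess delay, RMS delay spread),
    delays in seconds."""
    p = np.abs(cir) ** 2
    p = p / p.sum()                                   # power delay profile
    t = np.arange(len(cir)) / fs
    tau_m = (p * t).sum()                             # mean excess delay
    tau_rms = np.sqrt((p * (t - tau_m) ** 2).sum())   # RMS delay spread
    return kurtosis(np.abs(cir)), tau_m, tau_rms

fs = 1e9                   # assume 1 GHz sampling (~1 ns resolution)
rng = np.random.default_rng(5)
# LOS-like CIR: one dominant early tap; NLOS-like: diffuse decaying taps.
los = np.zeros(128); los[5] = 1.0
los += 0.001 * rng.standard_normal(128)
nlos = np.exp(-np.arange(128) / 40.0) * rng.standard_normal(128)
k_los, _, rms_los = cir_features(los, fs)
k_nlos, _, rms_nlos = cir_features(nlos, fs)
print(k_los > k_nlos, rms_los < rms_nlos)   # LOS: peaky, low spread
```

A feedforward network fed either these statistics or the raw CIR taps can then learn the LOS/NLOS decision boundary.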


2021 ◽  
Vol 18 (2(Suppl.)) ◽  
pp. 0925
Author(s):  
Asroni Asroni ◽  
Ku Ruhana Ku-Mahamud ◽  
Cahya Damarjati ◽  
Hasan Basri Slamat

Deep learning convolutional neural networks (CNNs) have been widely used to recognize or classify voice. Various techniques have been used together with a CNN to prepare voice data before training a classification model; however, not every model produces good classification accuracy, as there are many types of voice and speech. Classification of Arabic alphabet pronunciation is one such voice type, and accurate pronunciation is required when learning to read the Qur'an. Processing the pronunciation data and training on the processed data therefore require a specific approach. To address this, a method based on padding and a deep learning CNN is proposed to evaluate the pronunciation of the Arabic alphabet. Voice data recorded from six school children were used to test the performance of the proposed method. The padding technique augments the voice data before they are fed to the CNN structure to develop the classification model. In addition, three other feature extraction techniques were introduced to enable comparison with the proposed padding-based method. The performance of the proposed method with the padding technique is on par with the spectrogram and better than the mel-spectrogram and mel-frequency cepstral coefficients. Results also show that the proposed method was able to distinguish the Arabic letters that are difficult to pronounce. The proposed padding-based method may be extended to assess pronunciation abilities beyond the Arabic alphabet.
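The basic padding step, which gives a CNN a rectangular input batch from recordings of different durations, can be sketched as follows. The clip lengths and zero-padding policy are assumptions for illustration; the paper's exact padding scheme may differ.

```python
import numpy as np

def pad_batch(signals, target_len=None):
    """Zero-pad variable-length recordings to one fixed length.

    Produces a rectangular (n_clips, target_len) batch without
    resampling or truncating short pronunciations."""
    if target_len is None:
        target_len = max(len(s) for s in signals)
    out = np.zeros((len(signals), target_len))
    for i, s in enumerate(signals):
        n = min(len(s), target_len)
        out[i, :n] = s[:n]          # trailing zeros pad short clips
    return out

rng = np.random.default_rng(6)
# Pronunciation clips of different durations (synthetic stand-ins).
clips = [rng.standard_normal(n) for n in (4000, 5200, 3600)]
batch = pad_batch(clips)
print(batch.shape)    # (3, 5200)
```

Padding to the longest clip preserves every sample of each pronunciation, which matters here because the discriminative cues of hard-to-pronounce letters may sit anywhere in the recording.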

