Classification between Elderly Voices and Young Voices Using an Efficient Combination of Deep Learning Classifiers and Various Parameters

2021 ◽  
Vol 11 (21) ◽  
pp. 9836
Author(s):  
Ji-Yeoun Lee

The objective of this research was to combine deep learning classifiers with various acoustic parameters to provide an accurate and objective system for classifying elderly and young voice signals. This work focused on deep learning methods, such as the feedforward neural network (FNN) and convolutional neural network (CNN), for the detection of elderly voice signals using mel-frequency cepstral coefficients (MFCCs), linear prediction cepstrum coefficients (LPCCs), skewness, and kurtosis parameters. In total, voice samples from 126 subjects (63 elderly and 63 young) were obtained from the Saarbruecken voice database. The highest performance, 93.75%, was achieved when skewness was added to the MFCC and MFCC delta parameters, although fusing both the skewness and kurtosis parameters also had a positive effect on the overall classification accuracy. The results of this study also revealed that the FNN outperformed the CNN. With respect to gender, most parameters estimated from male data samples demonstrated good performance. Rather than using mixed female and male data, this work recommends developing separate systems, each achieving its best performance through its own optimized parameters, using data from independent male and female samples.
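The fusion of higher-order statistics with cepstral features described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the signal is synthetic, the MFCC matrix is a random stand-in (real MFCCs would come from a library such as librosa), and the frame length and hop size are assumed values.

```python
import numpy as np
from scipy.stats import skew, kurtosis

def frame_signal(x, frame_len=400, hop=160):
    """Split a 1-D signal into overlapping frames (one frame per row)."""
    n = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n)])

def fuse_hos(mfcc, frames):
    """Append per-frame skewness and kurtosis columns to an MFCC matrix.

    mfcc: (n_frames, n_coeffs); frames: (n_frames, frame_len)."""
    sk = skew(frames, axis=1)[:, None]
    ku = kurtosis(frames, axis=1)[:, None]
    return np.hstack([mfcc, sk, ku])

rng = np.random.default_rng(0)
x = rng.standard_normal(16000)                 # 1 s of noise at 16 kHz
frames = frame_signal(x)
mfcc = rng.standard_normal((len(frames), 13))  # stand-in for real MFCCs
feats = fuse_hos(mfcc, frames)
print(feats.shape)                             # (n_frames, 13 + 2)
```

The fused matrix would then be fed to the FNN or CNN classifier as one feature set per frame.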

2021 ◽  
Vol 11 (15) ◽  
pp. 7149
Author(s):  
Ji-Yeoun Lee

This work focused on deep learning methods, such as the feedforward neural network (FNN) and convolutional neural network (CNN), for pathological voice detection using mel-frequency cepstral coefficients (MFCCs), linear prediction cepstrum coefficients (LPCCs), and higher-order statistics (HOS) parameters. In total, 518 voice samples were obtained from the publicly available Saarbruecken voice database (SVD), comprising recordings of 259 healthy and 259 pathological women and men producing the /a/, /i/, and /u/ vowels at normal pitch. Significant differences between the normal and pathological voice signals were observed for normalized skewness (p = 0.000) and kurtosis (p = 0.000), except for the normalized kurtosis estimated from the /u/ samples in women (p = 0.051). These parameters are therefore useful and meaningful for classifying pathological voice signals. The highest accuracy, 82.69%, was achieved by the CNN classifier with the LPCC parameters for the /u/ vowel in men. The second-best performance, 80.77%, was obtained with a combination of the FNN classifier, MFCCs, and HOS parameters for the /i/ vowel samples in women. Combining the acoustic measures with HOS parameters improved characterization in terms of accuracy, and the combination of various parameters with deep learning methods was useful for distinguishing normal from pathological voices.
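The group-difference testing of normalized HOS parameters can be sketched as below. This is a hedged illustration with synthetic stand-in signals and an assumed two-sample t-test; the paper does not specify which significance test produced its p-values.

```python
import numpy as np
from scipy.stats import skew, kurtosis, ttest_ind

def normalized_hos(signal):
    """Normalized skewness and kurtosis of a voice-signal stand-in."""
    z = (signal - signal.mean()) / signal.std()
    return skew(z), kurtosis(z)

rng = np.random.default_rng(1)
# Stand-ins for vowel recordings: "normal" Gaussian-like signals vs.
# "pathological" heavy-tailed signals.
normal = [rng.standard_normal(4000) for _ in range(30)]
patho = [rng.standard_normal(4000) ** 3 for _ in range(30)]

ku_n = [normalized_hos(s)[1] for s in normal]
ku_p = [normalized_hos(s)[1] for s in patho]
t, p = ttest_ind(ku_n, ku_p)
print(f"t = {t:.2f}, p = {p:.2e}")
```

A small p-value here indicates the kurtosis parameter separates the two groups, mirroring how the abstract argues the HOS parameters are meaningful classification features.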


2020 ◽  
Vol 8 ◽  
Author(s):  
Adil Khadidos ◽  
Alaa O. Khadidos ◽  
Srihari Kannan ◽  
Yuvaraj Natarajan ◽  
Sachi Nandan Mohanty ◽  
...  

In this paper, a data mining model built on a hybrid deep learning framework is designed to diagnose the medical conditions of patients infected with the coronavirus disease 2019 (COVID-19) virus. The hybrid model, named DeepSense, combines a convolutional neural network (CNN) and a recurrent neural network (RNN). It is designed as a series of layers that extract and classify features of COVID-19 infection from the lungs. Computed tomography images are used as input data, and the classifier eases the classification process by learning the multidimensional input through the Expert Hidden layers. The model is validated against medical image datasets to predict infections using deep learning classifiers. The results show that the DeepSense classifier achieves higher accuracy than conventional deep learning and machine learning classifiers. The proposed method is validated against three different datasets with 70%, 80%, and 90% of the data used for training, and it specifically quantifies the quality of the diagnostic method adopted for predicting COVID-19 infection in a patient.
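The CNN-plus-RNN hybrid idea can be sketched in plain numpy: a convolutional stage extracts features from each CT slice (flattened here as a 1-D stand-in), and a recurrent stage accumulates evidence across slices. Every dimension, weight, and the single-kernel architecture below is an assumption for illustration, not the DeepSense design.

```python
import numpy as np

rng = np.random.default_rng(2)

def relu(a):
    return np.maximum(a, 0.0)

def hybrid_forward(slices, conv_w, rnn_wx, rnn_wh, out_w):
    """CNN feature extractor per CT slice, then an Elman RNN over slices.

    slices: (T, D) array; T slices, D pixels each (flattened stand-in)."""
    h = np.zeros(rnn_wh.shape[0])
    for s in slices:
        f = relu(np.convolve(s, conv_w, mode="valid"))  # conv features
        f = f.reshape(-1, 4).max(axis=1)                # max-pool by 4
        h = np.tanh(rnn_wx @ f + rnn_wh @ h)            # recurrent update
    logit = out_w @ h
    return 1.0 / (1.0 + np.exp(-logit))                 # P(infection)

T, D, K, H = 8, 67, 4, 16            # slices, pixels, kernel, hidden size
slices = rng.standard_normal((T, D))
conv_w = rng.standard_normal(K) * 0.1
feat_len = (D - K + 1) // 4          # pooled feature length
rnn_wx = rng.standard_normal((H, feat_len)) * 0.1
rnn_wh = rng.standard_normal((H, H)) * 0.1
out_w = rng.standard_normal(H) * 0.1
p = hybrid_forward(slices, conv_w, rnn_wx, rnn_wh, out_w)
print(round(float(p), 3))
```

In practice both stages would be trained jointly in a deep learning framework rather than used with random weights as here.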


2021 ◽  
Author(s):  
Yunfan Chen ◽  
Chong Zhang ◽  
Chengyu Liu ◽  
Yiming Wang ◽  
Xiangkui Wan

Abstract Atrial fibrillation is one of the most common arrhythmias in the clinic and has a great impact on people's physical and mental health. Electrocardiogram (ECG) based arrhythmia detection is widely used for early atrial fibrillation detection; however, ECGs must be checked manually in clinical practice, which is time-consuming and labor-intensive, so an automatic atrial fibrillation detection system is needed. Recent research has demonstrated that deep learning can improve the performance of automatic ECG classification models. To this end, this work proposes an effective deep learning technique to automatically detect atrial fibrillation. First, preprocessing algorithms based on the wavelet transform and sliding window filtering (SWF) are introduced to reduce noise in the ECG signal and to filter its high-frequency components, respectively. Then, a robust R-wave detection algorithm is developed, achieving 99.22% detection sensitivity, a 98.55% positive recognition rate, and 2.25% deviance on the MIT-BIH arrhythmia database. In addition, a feedforward neural network (FNN) is proposed to detect atrial fibrillation from ECG records. Experiments verified by a 10-fold cross-validation strategy show that the proposed model achieves competitive detection performance and can be applied to wearable detection devices. On a mixed dataset composed of the Challenge2017 and MIT-BIH arrhythmia databases, the model achieves an accuracy of 84.00%, a sensitivity of 84.26%, a specificity of 93.23%, and an area under the receiver operating characteristic curve of 89.40%.
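The preprocessing-then-R-wave-detection pipeline can be sketched as below. This is a simplified stand-in, not the paper's algorithm: the sliding window filter is a plain moving average, the detector is a threshold with a refractory period, and the "ECG" is a synthetic spike train.

```python
import numpy as np

def sliding_window_filter(sig, width=5):
    """Moving-average smoothing (a simple sliding window filter)."""
    kernel = np.ones(width) / width
    return np.convolve(sig, kernel, mode="same")

def detect_r_peaks(sig, fs, thresh_frac=0.6, refractory_s=0.25):
    """Threshold-based R-wave detector with a refractory period.

    Returns the sample indices of detected R peaks."""
    thresh = thresh_frac * sig.max()
    refractory = int(refractory_s * fs)
    peaks, last = [], -refractory
    for i in range(1, len(sig) - 1):
        if (sig[i] > thresh and sig[i] >= sig[i - 1]
                and sig[i] > sig[i + 1] and i - last >= refractory):
            peaks.append(i)
            last = i
    return np.array(peaks)

# Synthetic ECG stand-in: one "R wave" per second plus noise at 250 Hz.
fs = 250
ecg = np.zeros(10 * fs)
ecg[10::fs] = 1.0
rng = np.random.default_rng(3)
ecg += 0.05 * rng.standard_normal(len(ecg))
peaks = detect_r_peaks(sliding_window_filter(ecg), fs)
print(len(peaks))          # one detection per synthetic beat
```

The intervals between detected R peaks (the RR series) are what an atrial fibrillation classifier such as the proposed FNN would consume.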


Author(s):  
Mohamed Esmail Karar ◽  
Ezz El-Din Hemdan ◽  
Marwa A. Shouman

Abstract Computer-aided diagnosis (CAD) systems are a powerful tool that helps physicians identify the novel Coronavirus Disease 2019 (COVID-19) in medical imaging. This article therefore proposes a new framework of cascaded deep learning classifiers to enhance the performance of these CAD systems for highly suspected COVID-19 and pneumonia cases in X-ray images. The proposed deep learning framework makes two major advances. First, the complicated multi-label classification of X-ray images has been simplified into a series of binary classifiers, one for each tested health status, mimicking the clinical process of diagnosing a patient's potential diseases. Second, the cascaded architecture of COVID-19 and pneumonia classifiers can flexibly use different fine-tuned deep learning models simultaneously, achieving the best performance in confirming infected cases. This study includes eleven pre-trained convolutional neural network models, such as the Visual Geometry Group network (VGG) and the residual neural network (ResNet), which were successfully tested and evaluated on a public X-ray image dataset covering normal subjects and three diseased cases. The results of the proposed cascaded classifiers showed that the VGG16, ResNet50V2, and densely connected convolutional network (DenseNet169) models achieved the best detection accuracy for COVID-19, viral (non-COVID-19) pneumonia, and bacterial pneumonia images, respectively. Furthermore, the performance of the cascaded deep learning classifiers is superior to the multi-label COVID-19 and pneumonia classification methods of previous studies. The proposed framework is therefore a good candidate for clinical routine use to assist the diagnostic procedures of COVID-19 infection.
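The cascade-of-binary-classifiers structure can be sketched in a few lines. The classifier stubs below are hypothetical placeholders standing in for the fine-tuned CNNs (VGG16, ResNet50V2, DenseNet169); only the control flow reflects the described design.

```python
from typing import Callable, List, Tuple

def cascade_diagnose(image,
                     stages: List[Tuple[str, Callable]]) -> str:
    """Run binary classifiers in sequence; stop at the first positive.

    Each stage is (label, classifier) where classifier(image) -> bool.
    COVID-19 is checked first, then viral pneumonia, then bacterial
    pneumonia; 'normal' is returned if every stage is negative."""
    for label, clf in stages:
        if clf(image):
            return label
    return "normal"

# Hypothetical stand-ins for fine-tuned CNNs; each just inspects a
# fake image tag instead of running inference on pixels.
stages = [
    ("COVID-19",            lambda img: img == "covid"),
    ("viral pneumonia",     lambda img: img == "viral"),
    ("bacterial pneumonia", lambda img: img == "bacterial"),
]

print(cascade_diagnose("viral", stages))   # viral pneumonia
print(cascade_diagnose("clear", stages))   # normal
```

A design note: because each stage is an independent binary model, any stage can be swapped for a better fine-tuned network without retraining the rest of the cascade, which is the flexibility the abstract highlights.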


2020 ◽  
Vol 12 (5) ◽  
pp. 1-8
Author(s):  
Nahyan Al Mahmud ◽  
Shahfida Amjad Munni

The performance of various acoustic feature extraction methods is compared in this work using a Long Short-Term Memory (LSTM) neural network in a Bangla speech recognition system. Acoustic features are a series of vectors that represent the speech signal; they can be classified into either words or sub-word units such as phonemes. In this work, linear predictive coding (LPC) is first used as the acoustic vector extraction technique, chosen for its widespread popularity. Other extraction techniques, Mel-frequency cepstral coefficients (MFCC) and perceptual linear prediction (PLP), are then applied; these two methods more closely resemble the human auditory system. The LSTM neural network is trained on these feature vectors, and the resulting models of different phonemes are compared using two statistical tools, the Bhattacharyya distance and the Mahalanobis distance, to investigate the nature of the acoustic features.
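The two comparison measures named above have standard closed forms for Gaussian models, which can be computed directly. The toy two-dimensional "phoneme models" below are illustrative assumptions, not data from the paper.

```python
import numpy as np

def mahalanobis(x, mean, cov):
    """Mahalanobis distance from point x to a Gaussian (mean, cov)."""
    d = x - mean
    return float(np.sqrt(d @ np.linalg.inv(cov) @ d))

def bhattacharyya_gauss(m1, c1, m2, c2):
    """Bhattacharyya distance between two multivariate Gaussians."""
    c = (c1 + c2) / 2
    d = m1 - m2
    term1 = 0.125 * d @ np.linalg.inv(c) @ d
    term2 = 0.5 * np.log(np.linalg.det(c) /
                         np.sqrt(np.linalg.det(c1) * np.linalg.det(c2)))
    return float(term1 + term2)

# Toy 2-D "phoneme models": identical covariances, shifted means.
m1, m2 = np.array([0.0, 0.0]), np.array([2.0, 0.0])
c = np.eye(2)
print(mahalanobis(m2, m1, c))             # 2.0
print(bhattacharyya_gauss(m1, c, m2, c))  # 0.5
```

Larger distances between two phoneme models indicate that the underlying feature extraction method separates those phonemes more cleanly, which is how such measures support the comparison in this work.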


2020 ◽  
Vol 5 (3) ◽  
pp. 229-233
Author(s):  
Olaide Ayodeji Agbolade

This research presents a neural network based voice conversion model. While it is known that voiced sounds and prosody are the most important components of a voice conversion framework, their objective contributions, particularly in a noisy and uncontrolled environment, are not known. This model uses a three-layer feedforward neural network to map the linear prediction analysis coefficients of a source speaker to the acoustic vector space of the target speaker, with a view to objectively determining the contributions of the voiced, unvoiced, and supra-segmental components of speech to the voice conversion model. Results showed that the vowels "a", "i", and "o" contribute most significantly to conversion success. The voiceless sounds were found to be most affected by noisy training data: an average noise level of 40 dB above the noise floor degraded the voice conversion success by 55.14 percent relative to the voiced sounds. The results also show that for cross-gender voice conversion, prosody conversion is more significant when a female is the target speaker.
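The linear prediction analysis coefficients that this model maps between speakers can be estimated with the standard autocorrelation method. The sketch below is a generic implementation under assumed parameters, verified on a synthetic second-order autoregressive "voiced" signal rather than real speech.

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def lpc(signal, order=10):
    """LPC coefficients via the autocorrelation method.

    Solves the Toeplitz normal equations R a = r for the predictor a,
    where s[t] is approximated by sum_k a[k] * s[t-1-k]."""
    n = len(signal)
    r = np.array([signal[:n - k] @ signal[k:] for k in range(order + 1)])
    return solve_toeplitz(r[:order], r[1:])

rng = np.random.default_rng(4)
# Synthetic "voiced" signal: AR(2), s[t] = 1.5 s[t-1] - 0.8 s[t-2] + e[t]
e = rng.standard_normal(8000)
s = np.zeros_like(e)
for t in range(2, len(s)):
    s[t] = 1.5 * s[t - 1] - 0.8 * s[t - 2] + e[t]
a = lpc(s, order=2)
print(np.round(a, 2))    # close to [1.5, -0.8]
```

In a conversion system, per-frame coefficient vectors like `a` from the source speaker would form the network's input and those of the target speaker its training targets.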


Sensors ◽  
2019 ◽  
Vol 19 (19) ◽  
pp. 4229 ◽  
Author(s):  
Krzysztof K. Cwalina ◽  
Piotr Rajchowski ◽  
Olga Blaszkiewicz ◽  
Alicja Olejniczak ◽  
Jaroslaw Sadowski

In this article, the use of deep learning (DL) in ultra-wideband (UWB) Wireless Body Area Networks (WBANs) is presented. The developed approach, which uses the channel impulse response, identifies direct-visibility conditions between nodes in off-body communication more efficiently than the methods described in the literature. The effectiveness of the proposed deep feedforward neural network was verified on measurement data from dynamic scenarios in an indoor environment. The obtained results clearly prove the validity of the proposed DL approach in UWB WBANs, with high efficiency (over 98.6% in most cases) for LOS and NLOS condition classification.
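Channel-impulse-response inputs for LOS/NLOS classification are commonly summarized by shape and dispersion statistics before (or alongside) feeding a network. The sketch below computes three such generic features on synthetic CIRs; the feature choice, sampling rate, and channel models are illustrative assumptions, not the article's pipeline.

```python
import numpy as np
from scipy.stats import kurtosis

def cir_features(cir, fs):
    """Shape/dispersion features of a channel impulse response.

    Returns (amplitude kurtosis, mean excess delay, RMS delay spread),
    delays in seconds."""
    p = np.abs(cir) ** 2
    p = p / p.sum()                                   # power delay profile
    t = np.arange(len(cir)) / fs
    tau_m = (p * t).sum()                             # mean excess delay
    tau_rms = np.sqrt((p * (t - tau_m) ** 2).sum())   # RMS delay spread
    return kurtosis(np.abs(cir)), tau_m, tau_rms

fs = 1e9                   # assume 1 GHz sampling (~1 ns resolution)
rng = np.random.default_rng(5)
# LOS-like CIR: one dominant early tap; NLOS-like: diffuse decaying taps.
los = np.zeros(128); los[5] = 1.0
los += 0.001 * rng.standard_normal(128)
nlos = np.exp(-np.arange(128) / 40.0) * rng.standard_normal(128)
k_los, _, rms_los = cir_features(los, fs)
k_nlos, _, rms_nlos = cir_features(nlos, fs)
print(k_los > k_nlos, rms_los < rms_nlos)   # LOS: peaky, low spread
```

A feedforward network fed either these statistics or the raw CIR taps can then learn the LOS/NLOS decision boundary.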


2021 ◽  
Vol 18 (2(Suppl.)) ◽  
pp. 0925
Author(s):  
Asroni Asroni ◽  
Ku Ruhana Ku-Mahamud ◽  
Cahya Damarjati ◽  
Hasan Basri Slamat

Deep learning convolutional neural networks (CNNs) have been widely used to recognize or classify voice. Various techniques have been used together with a CNN to prepare voice data before training a classification model; however, not every model produces good classification accuracy, as there are many types of voice and speech. Classification of Arabic alphabet pronunciation is one such voice type, and accurate pronunciation is required when learning to read the Qur'an. Processing the pronunciation data and training on the processed data therefore require a specific approach. To address this, a method based on padding and a deep learning CNN is proposed to evaluate the pronunciation of the Arabic alphabet. Voice data recorded from six school children were used to test the performance of the proposed method. The padding technique augments the voice data before they are fed to the CNN structure to develop the classification model. In addition, three other feature extraction techniques were introduced to enable comparison with the proposed padding-based method. The performance of the proposed method with the padding technique is on par with the spectrogram and better than the mel-spectrogram and mel-frequency cepstral coefficients. Results also show that the proposed method was able to distinguish the Arabic letters that are difficult to pronounce. The proposed padding-based method may be extended to assess pronunciation abilities beyond the Arabic alphabet.
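The basic padding step, which gives a CNN a rectangular input batch from recordings of different durations, can be sketched as follows. The clip lengths and zero-padding policy are assumptions for illustration; the paper's exact padding scheme may differ.

```python
import numpy as np

def pad_batch(signals, target_len=None):
    """Zero-pad variable-length recordings to one fixed length.

    Produces a rectangular (n_clips, target_len) batch without
    resampling or truncating short pronunciations."""
    if target_len is None:
        target_len = max(len(s) for s in signals)
    out = np.zeros((len(signals), target_len))
    for i, s in enumerate(signals):
        n = min(len(s), target_len)
        out[i, :n] = s[:n]          # trailing zeros pad short clips
    return out

rng = np.random.default_rng(6)
# Pronunciation clips of different durations (synthetic stand-ins).
clips = [rng.standard_normal(n) for n in (4000, 5200, 3600)]
batch = pad_batch(clips)
print(batch.shape)    # (3, 5200)
```

Padding to the longest clip preserves every sample of each pronunciation, which matters here because the discriminative cues of hard-to-pronounce letters may sit anywhere in the recording.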

