Bidirectional neural network for pathological voice detection

Author(s): Iman Esmaili, Nader Jafarnia Dabanloo, Keyvan Maghooli

2021, Vol 11 (15), pp. 7149
Author(s): Ji-Yeoun Lee

This work focuses on deep learning methods, namely a feedforward neural network (FNN) and a convolutional neural network (CNN), for pathological voice detection using mel-frequency cepstral coefficients (MFCCs), linear prediction cepstrum coefficients (LPCCs), and higher-order statistics (HOS) parameters. In total, 518 voice samples were obtained from the publicly available Saarbruecken voice database (SVD), comprising recordings of 259 healthy and 259 pathological speakers, both women and men, producing the vowels /a/, /i/, and /u/ at normal pitch. Significant differences between the normal and the pathological voice signals were observed for normalized skewness (p < 0.001) and normalized kurtosis (p < 0.001), with one exception: normalized kurtosis estimated from the /u/ samples in women (p = 0.051). These parameters are therefore useful and meaningful for classifying pathological voice signals. The highest accuracy, 82.69%, was achieved by the CNN classifier with LPCCs for the /u/ vowel in men. The second-best accuracy, 80.77%, was obtained by the FNN classifier with a combination of MFCCs and HOS parameters for the /i/ vowel in women. Combining the acoustic measures with HOS parameters improved characterization in terms of accuracy, and combining various parameters with deep learning methods was also useful for distinguishing normal from pathological voices.
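As a rough illustration of the HOS parameters mentioned in the abstract, normalized skewness and kurtosis can be computed as the third and fourth central moments divided by the appropriate power of the variance. The sketch below is a minimal version under stated assumptions: the abstract does not give the paper's exact definitions (for example, whether 3 is subtracted from the kurtosis, or how frames are chosen), and the function name `normalized_skewness_kurtosis` and the synthetic sine frame are hypothetical stand-ins, not taken from the paper.

```python
import math


def normalized_skewness_kurtosis(samples):
    """Return (skewness, kurtosis) as standardized 3rd and 4th central moments.

    Assumption: population moments (divide by n) and non-excess kurtosis;
    the paper's exact convention is not stated in the abstract.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = sum(samples) / n
    # Central moments of order 2, 3, and 4.
    m2 = sum((x - mean) ** 2 for x in samples) / n
    m3 = sum((x - mean) ** 3 for x in samples) / n
    m4 = sum((x - mean) ** 4 for x in samples) / n
    if m2 == 0:
        raise ValueError("constant signal: normalized moments are undefined")
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return skewness, kurtosis


# Hypothetical frame: one period of a sampled sine wave,
# standing in for a short segment of a sustained vowel.
frame = [math.sin(2 * math.pi * k / 64) for k in range(64)]
skewness, kurtosis = normalized_skewness_kurtosis(frame)
```

For a pure sine wave the skewness is close to 0 and the kurtosis close to 1.5; real vowel recordings would deviate from these values, and such deviations are the kind of feature a classifier could use.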


2020
Author(s): Hao-Chun Hu, Shyue-Yih Chang, Chuen-Heng Wang, Kai-Jun Li, Hsiao-Yun Cho, ...

BACKGROUND: Dysphonia reduces quality of life by interfering with communication. However, laryngoscopic examination is expensive and not readily accessible in primary care units, and an accurate diagnosis requires experienced laryngologists.

OBJECTIVE: This study sought to detect various vocal fold diseases through pathological voice recognition using artificial intelligence.

METHODS: We collected 29 normal voice samples and 527 samples from individuals with voice disorders, including vocal atrophy (n=210), unilateral vocal paralysis (n=43), organic vocal fold lesions (n=244), and adductor spasmodic dysphonia (n=30). The 556 samples were divided into a training set of 440 samples and a testing set of 116 samples. A convolutional neural network was trained on these data, and its results were compared with those of human specialists.

RESULTS: The convolutional neural network model achieved a sensitivity of 0.70, a specificity of 0.90, and an overall accuracy of 65.5% for distinguishing normal voice, vocal atrophy, unilateral vocal paralysis, organic vocal fold lesions, and adductor spasmodic dysphonia. By comparison, the overall accuracy of the human specialists was 58.6% and 49.1% for the two laryngologists, and 38.8% and 34.5% for the two general ear, nose, and throat doctors.

CONCLUSIONS: We developed an artificial intelligence-based screening tool for common vocal fold diseases, which showed high specificity after training with our Mandarin pathological voice database. This approach has clinical potential for general vocal fold disease screening via voice, for example as a quick survey during a general health examination. It can also be applied in telemedicine in areas where primary care units lack laryngoscopy capabilities.
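The three figures reported in the results (sensitivity, specificity, overall accuracy) can be related to per-sample predictions with a short sketch. This is only an illustration under assumptions: the abstract does not say how sensitivity and specificity were defined for the five-class task, so the code assumes a screening view in which any disorder label counts as positive and "normal" as negative, while accuracy is the exact-match rate over all classes. The function name `screening_metrics`, the label strings, and the toy data are hypothetical.

```python
def screening_metrics(y_true, y_pred, normal_label="normal"):
    """Return (sensitivity, specificity, accuracy).

    Assumption: sensitivity and specificity treat every non-normal label as
    "disordered" (positive); accuracy counts exact class matches.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    tp = fn = tn = fp = correct = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            correct += 1
        truth_pos = truth != normal_label
        pred_pos = pred != normal_label
        if truth_pos and pred_pos:
            tp += 1
        elif truth_pos:
            fn += 1
        elif pred_pos:
            fp += 1
        else:
            tn += 1
    # Guard against a test set that lacks one of the two groups.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    accuracy = correct / len(y_true)
    return sensitivity, specificity, accuracy


# Toy labels for illustration only; these are not data from the study.
truth = ["normal", "normal", "atrophy", "paralysis", "atrophy"]
preds = ["normal", "atrophy", "atrophy", "atrophy", "normal"]
sensitivity, specificity, accuracy = screening_metrics(truth, preds)
```

Note that under this assumed definition a disordered sample assigned to the wrong disorder still counts toward sensitivity but not toward accuracy, which is one way a model can show higher sensitivity than overall accuracy.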


IEEE Access, 2020, Vol 8, pp. 66749-66776
Author(s): Rumana Islam, Mohammed Tarique, Esam Abdel-Raheem

2021
Author(s): Zhang Yihua, Zhu Xincheng, Wu Yuanbo, Zhang Xiaojun, Xu Yishen, ...
