Class-Imbalanced Voice Pathology Detection and Classification Using Fuzzy Cluster Oversampling Method

2021 ◽  
Vol 11 (8) ◽  
pp. 3450
Author(s):  
Ziqi Fan ◽  
Yuanbo Wu ◽  
Changwei Zhou ◽  
Xiaojun Zhang ◽  
Zhi Tao

The Massachusetts Eye and Ear Infirmary (MEEI) database is an international-standard training database for voice pathology detection (VPD) systems. However, its samples are class-imbalanced, both between normal and pathological voices and among the different types of pathological voice. This study aimed to develop a VPD system that uses the fuzzy clustering synthetic minority oversampling technique (FC-SMOTE) algorithm to automatically detect and classify four types of pathological voices in a multi-class imbalanced database. The proposed FC-SMOTE algorithm processes the initial class-imbalanced dataset, and a set of machine learning models was evaluated and validated using the resulting class-balanced dataset as input. The effectiveness of the VPD system with FC-SMOTE was further verified on an external validation set and on another pathological voice database, the Saarbruecken Voice Database (SVD). The experimental results show that the proposed method can significantly improve diagnostic accuracy in the multi-class classification of pathological voices from a class-imbalanced dataset. FC-SMOTE also outperforms the traditional oversampling algorithms for imbalanced data, and it is preferred for imbalanced voice diagnosis in practical applications.
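
The interpolation step shared by SMOTE-family oversamplers can be sketched as follows. This is a minimal illustration of plain SMOTE only, not the FC-SMOTE algorithm of the paper, whose fuzzy-clustering step is not described in this abstract. The function name `smote_oversample`, its parameters, and the toy data are hypothetical.

```python
import math
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (plain SMOTE).
    Assumption: this is NOT FC-SMOTE; no clustering is performed."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical two-dimensional feature vectors for a minority class.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote_oversample(minority, n_new=6)
```

Each synthetic point lies on the line segment between two existing minority samples, so it stays inside the region the minority class already occupies.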


Author(s):  
Gerardo M. Casañola-Martín ◽  
Mahmud Tareq Hassan Khan ◽  
Huong Le-Thi-Thu ◽  
Yovani Marrero-Ponce ◽  
Ramón García-Domenech ◽  
...  

In this paper, the authors present an effort to increase the applicability domain (AD) by retraining models using a database of 701 highly dissimilar molecules with anti-tyrosinase activity and 728 drugs with other uses. Atom-based linear indices and best subset linear discriminant analysis (LDA) were used to develop individual classification models. Eighteen individual classification-based QSAR models for tyrosinase inhibitory activity were obtained, with global accuracy ranging from 88.15% to 91.60% in the training set and Matthews correlation coefficients (C) ranging from 0.76 to 0.82. In the external validation set, global classification accuracy was above 85.99% and C was above 0.72. All individual models were validated and fulfilled the OECD principles. A brief analysis of the AD for the training set of 478 compounds and the new active compounds included in the retraining was carried out. Various assembled multiclassifier systems containing the eighteen models were obtained using different selection criteria, which makes it possible to select the best strategy for a particular problem. The assembled multiclassifier systems also estimated the potency of the compounds identified as active. Eighteen potency models validated according to the OECD principles were used.
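
The two statistics reported above, global accuracy and the Matthews correlation coefficient (C), can both be computed from a binary confusion matrix. The sketch below uses the standard definitions; the function names and the confusion-matrix counts are hypothetical and are not taken from the paper.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all compounds that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary confusion matrix.
    Returns 0.0 when any marginal total is zero (a common convention)."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


# Hypothetical counts: actives are positives, inactives are negatives.
acc = accuracy(tp=90, tn=85, fp=15, fn=10)
mcc = matthews_corrcoef(tp=90, tn=85, fp=15, fn=10)
```

Unlike accuracy, C uses all four cells of the confusion matrix, so it is less flattering when one class dominates the dataset.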


2011 ◽  
Vol 8 (3) ◽  
pp. 105-117 ◽  
Author(s):  
Rosalía Laza ◽  
Reyes Pavón ◽  
Miguel Reboiro-Jato ◽  
Florentino Fdez-Riverola

Summary: Document classification has become an active research field, partly because of the increasing availability of biomedical information in digital form that needs to be catalogued and organized. In this context, machine learning techniques are usually applied to text classification through a general inductive process that automatically builds a text classifier from a set of pre-classified documents. In this domain, imbalanced data is a well-known problem in many practical applications of knowledge discovery, and its effects on the performance of standard classifiers are considerable. In this paper, we investigate the application of a Bayesian Network (BN) model to the triage of documents, which are represented by the association of different MeSH terms. Our results show that BNs are adequate for describing conditional independencies between MeSH terms and that the MeSH ontology is a valuable resource for representing Medline documents at different abstraction levels. Moreover, we perform an extensive experimental evaluation to investigate whether the classification of Medline documents using a BN classifier poses additional challenges when dealing with class-imbalanced prediction. The evaluation involves two methods, under-sampling and cost-sensitive learning. We conclude that the BN classifier is sensitive to both balancing strategies and that existing techniques can improve its overall performance.
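
The two balancing strategies named above can be illustrated in a few lines. This is a sketch under stated assumptions: the abstract does not say which under-sampling or cost-sensitive variant the authors used, so the code shows random under-sampling and inverse-frequency class weights as common stand-ins. The function names and label values are hypothetical.

```python
import random
from collections import Counter


def random_undersample(labels, seed=0):
    """Return sorted indices that keep, for every class, only as many
    randomly chosen samples as the rarest class has."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    kept = []
    for cls in counts:
        indices = [i for i, y in enumerate(labels) if y == cls]
        kept.extend(rng.sample(indices, target))
    return sorted(kept)


def balanced_class_weights(labels):
    """Cost-sensitive weights: n_samples / (n_classes * class_count),
    so errors on rare classes cost more."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * c) for cls, c in counts.items()}


# Hypothetical triage labels: few relevant documents, many irrelevant ones.
labels = ["relevant"] * 5 + ["irrelevant"] * 45
kept = random_undersample(labels)
weights = balanced_class_weights(labels)
```

Under-sampling discards majority documents, whereas cost-sensitive weighting keeps every document and changes how much each misclassification counts.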


2020 ◽  
Vol 31 (2) ◽  
pp. 25
Author(s):  
Liqaa M. Shoohi ◽  
Jamila H. Saud

Classification of imbalanced data is an important issue. Many classification algorithms have been developed, such as Back Propagation (BP) neural networks, decision trees, and Bayesian networks, and they have been used repeatedly in many fields. These algorithms suffer from the problem of imbalanced data, in which some classes contain many more instances than others. Imbalanced data result in poor performance and bias toward one class at the expense of the others. In this paper, we propose three techniques based on the Over-Sampling (O.S.) technique for processing an imbalanced dataset, redistributing it, and converting it into a balanced dataset. These techniques are Improved Synthetic Minority Over-Sampling Technique (Improved SMOTE), Borderline-SMOTE + Imbalanced Ratio (IR), and Adaptive Synthetic Sampling (ADASYN) + IR. Each technique generates synthetic samples for the minority class to achieve balance between the minority and majority classes and then calculates the IR between them. Experimental results show that the Improved SMOTE algorithm outperforms the Borderline-SMOTE + IR and ADASYN + IR algorithms because it achieves a high degree of balance between the minority and majority classes.
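
The IR used to compare the techniques above can be sketched as a simple ratio of class counts. The abstract does not give the formula, so this assumes the common definition (largest class size divided by smallest class size); the function name and the label counts are hypothetical.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Size of the largest class divided by the size of the smallest.
    Assumed definition; a value of 1.0 means perfectly balanced."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


# Hypothetical dataset: 90 majority samples (0) and 10 minority samples (1).
before = [0] * 90 + [1] * 10
# Stand-in for the dataset after an oversampler added 80 synthetic minority samples.
after = before + [1] * 80

ir_before = imbalance_ratio(before)
ir_after = imbalance_ratio(after)
```

An oversampler that leaves the IR closer to 1 has produced a more evenly balanced dataset.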


Author(s):  
Thomas Dratsch ◽  
Michael Korenkov ◽  
David Zopfs ◽  
Sebastian Brodehl ◽  
Bettina Baessler ◽  
...  

Abstract
Objectives: The goal of the present study was to classify the most common types of plain radiographs using a neural network and to validate the network’s performance on internal and external data. Such a network could help improve various radiological workflows.
Methods: All radiographs from the year 2017 (n = 71,274) acquired at our institution were retrieved from the PACS. The 30 largest categories (n = 58,219, 81.7% of all radiographs performed in 2017) were used to develop and validate a neural network (MobileNet v1.0) using transfer learning. Image categories were extracted from DICOM metadata (study and image description) and mapped to the WHO manual of diagnostic imaging. As an independent, external validation set, we used images from other institutions that had been stored in our PACS (n = 5324).
Results: In the internal validation, the overall accuracy of the model was 90.3% (95% CI: 89.2–91.3%), whereas, for the external validation set, the overall accuracy was 94.0% (95% CI: 93.3–94.6%).
Conclusions: Using data from one single institution, we were able to classify the most common categories of radiographs with a neural network. The network showed good generalizability on the external validation set and could be used to automatically organize a PACS, preselect radiographs so that they can be routed to more specialized networks for abnormality detection, or help with other parts of the radiological workflow (e.g., automated hanging protocols; check if ordered image and performed image are the same). The final AI algorithm is publicly available for evaluation and extension.
Key Points
• Data from one single institution can be used to train a neural network for the correct detection of the 30 most common categories of plain radiographs.
• The trained model achieved a high accuracy for the majority of categories and showed good generalizability to images from other institutions.
• The neural network is made publicly available and can be used to automatically organize a PACS or to preselect radiographs so that they can be routed to more specialized neural networks for abnormality detection.
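
An accuracy figure with a 95% confidence interval, as reported above, can be obtained from the counts of correct and total predictions. The abstract does not state how its intervals were computed, so the sketch below assumes a Wilson score interval as one standard choice. The function name and the counts are hypothetical.

```python
import math


def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a proportion; z = 1.96 gives roughly 95%.
    Assumed method, not necessarily the one used in the study."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half


# Hypothetical validation result: 940 of 1000 radiographs classified correctly.
low, high = wilson_interval(correct=940, total=1000)
```

The interval narrows as the validation set grows, which is why results on several thousand images can be reported with bounds only about one percentage point apart.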


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Sylvia Kalli ◽  
Carla Araya-Cloutier ◽  
Jos Hageman ◽  
Jean-Paul Vincken

Abstract: High resistance towards traditional antibiotics has urged the development of new, natural therapeutics against methicillin-resistant Staphylococcus aureus (MRSA). Prenylated (iso)flavonoids, present mainly in the Fabaceae, can serve as promising candidates. Herein, the anti-MRSA properties of 23 prenylated (iso)flavonoids were assessed in-vitro. The di-prenylated (iso)flavonoids, glabrol (flavanone) and 6,8-diprenyl genistein (isoflavone), together with the mono-prenylated, 4′-O-methyl glabridin (isoflavan), were the most active anti-MRSA compounds (Minimum Inhibitory Concentrations (MIC) ≤ 10 µg/mL, 30 µM). The in-house activity data was complemented with literature data to yield an extended, curated dataset of 67 molecules for the development of robust in-silico prediction models. A QSAR model having a good fit (R²adj 0.61), low average prediction errors and a good predictive power (Q²) for the training (4% and Q²LOO 0.57, respectively) and the test set (5% and Q²test 0.75, respectively) was obtained. Furthermore, the model predicted well the activity of an external validation set (on average 5% prediction errors), as well as the level of activity (low, moderate, high) of prenylated (iso)flavonoids against other Gram-positive bacteria. For the first time, the importance of formal charge, besides hydrophobic volume and hydrogen-bonding, in the anti-MRSA activity was highlighted, thereby suggesting potentially different modes of action of the different prenylated (iso)flavonoids.
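
The leave-one-out predictive power (Q²LOO) quoted above can be sketched for a one-descriptor linear model. This assumes the common definition Q² = 1 − PRESS / TSS, where PRESS sums the squared errors of predictions made with each compound held out in turn; the abstract does not give the exact formula or descriptors. The function names and data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def q2_loo(xs, ys):
    """Leave-one-out Q2: refit without each point, predict it, and compare
    the summed squared prediction errors (PRESS) with the total variance."""
    press = 0.0
    for i in range(len(xs)):
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    mean_y = sum(ys) / len(ys)
    tss = sum((y - mean_y) ** 2 for y in ys)
    return 1 - press / tss


# Hypothetical descriptor values and activities for six compounds.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
activity = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
q2 = q2_loo(descriptor, activity)
```

Because every prediction is made for a compound the model has not seen, Q²LOO is normally lower than the fitted R² of the same model.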

