Class-Imbalanced Voice Pathology Detection and Classification Using Fuzzy Cluster Oversampling Method

Retrained Classification of Tyrosinase Inhibitors and “In Silico” Potency Estimation by Using Atom-Type Linear Indices

International Journal of Chemoinformatics and Chemical Engineering

10.4018/ijcce.2012070104

2012

Vol 2 (2)

pp. 42-144

Author(s):

Gerardo M. Casañola-Martín

Mahmud Tareq Hassan Khan

Huong Le-Thi-Thu

Yovani Marrero-Ponce

Ramón García-Domenech

...

Keyword(s):

External Validation

Correlation Coefficients

Training Set

Atom Type

Linear Discriminant

OECD Principles

QSAR Models

Validation Set

Global Accuracy

In this paper, the authors present an effort to extend the applicability domain (AD) by retraining models on a database of 701 highly dissimilar molecules with anti-tyrosinase activity and 728 drugs with other uses. Atom-based linear indices and best-subset linear discriminant analysis (LDA) were used to develop individual classification models. Eighteen individual classification-based QSAR models for tyrosinase inhibitory activity were obtained, with global accuracies of 88.15-91.60% in the training set and Matthews correlation coefficients (C) of 0.76-0.82. In the external validation set, global classification accuracy was above 85.99% and C was above 0.72. All individual models were validated according to the OECD principles. A brief analysis of the AD was carried out for the training set of 478 compounds and for the new active compounds included in the retraining. Various multiclassifier systems, assembled from the eighteen models using different selection criteria, were obtained, making it possible to select the best strategy for a particular problem. These multiclassifier systems were also used to estimate the potency of the identified active compounds, using eighteen potency models validated according to the OECD principles.
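The abstract reports each model by two figures: global accuracy and the Matthews correlation coefficient (C). The minimal Python sketch below shows how both are computed from a binary confusion matrix. The counts are hypothetical and are not taken from the paper.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient (MCC) for a binary confusion matrix.

    Returns 0.0 when any row or column total is zero, a common convention
    for the otherwise undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Global accuracy: fraction of all compounds classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical counts: actives = positive class, other drugs = negative class
tp, tn, fp, fn = 90, 85, 10, 15
print(f"accuracy = {accuracy(tp, tn, fp, fn):.4f}")
print(f"MCC      = {matthews_corrcoef(tp, tn, fp, fn):.4f}")
```

With these counts the accuracy is 0.875 and the MCC is about 0.75. MCC ranges from -1 to 1 and uses all four cells of the matrix, so it is less flattering than accuracy when one class dominates.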

Evaluating the effect of unbalanced data in biomedical document classification

Journal of Integrative Bioinformatics

10.1515/jib-2011-177

2011

Vol 8 (3)

pp. 105-117

Cited By ~ 9

Author(s):

Rosalía Laza

Reyes Pavón

Miguel Reboiro-Jato

Florentino Fdez-Riverola

Keyword(s):

Imbalanced Data

Document Classification

Research Field

Machine Learning Techniques

Practical Applications

MeSH Terms

Learning Techniques

Under Sampling

Abstraction Levels

Summary: Document classification has become an active research field, partly because of the increasing availability of biomedical information in digital form, which must be catalogued and organized. In this context, machine learning techniques are usually applied to text classification through a general inductive process that automatically builds a text classifier from a set of pre-classified documents. Within this domain, imbalanced data is a well-known problem in many practical applications of knowledge discovery, and its effects on the performance of standard classifiers are considerable. In this paper, we investigate the application of a Bayesian Network (BN) model to the triage of documents represented by associations of MeSH terms. Our results show that BNs are adequate for describing conditional independencies between MeSH terms and that the MeSH ontology is a valuable resource for representing Medline documents at different abstraction levels. Moreover, we perform an extensive experimental evaluation to investigate whether classifying Medline documents with a BN classifier poses additional challenges in class-imbalanced prediction. The evaluation involves two methods: under-sampling and cost-sensitive learning. We conclude that the BN classifier is sensitive to both balancing strategies and that existing techniques can improve its overall performance.
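The two balancing strategies evaluated in this study can be illustrated generically. The Python sketch below uses only the standard library. It shows plain random under-sampling and one common cost-sensitive scheme, inverse-class-frequency weights. It is not the authors' BN pipeline. The weighting formula is an assumption, chosen because it is widely used; other cost matrices are possible.

```python
import random
from collections import Counter


def random_undersample(samples, labels, seed=0):
    """Randomly drop samples from larger classes until every class has as
    many samples as the smallest class (plain random under-sampling)."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    n_min = min(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        for x in rng.sample(xs, n_min):
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


def inverse_frequency_weights(labels):
    """Cost-sensitive alternative: weight each class by
    n_total / (n_classes * n_class), so errors on the rare class cost more."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


# Hypothetical triage data: 2 relevant documents vs 8 irrelevant ones
docs = [f"doc{i}" for i in range(10)]
labels = ["relevant"] * 2 + ["irrelevant"] * 8
xs, ys = random_undersample(docs, labels)
print(Counter(ys))
print(inverse_frequency_weights(labels))
```

Under-sampling discards information from the majority class. Cost-sensitive weighting keeps every document and changes only how errors are penalised.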

Adaptation Proposed Methods for Handling Imbalanced Datasets based on Over-Sampling Technique

Al-Mustansiriyah Journal of Science

10.23851/mjs.v31i2.740

2020

Vol 31 (2)

pp. 25

Author(s):

Liqaa M. Shoohi

Jamila H. Saud

Keyword(s):

Neural Networks

Decision Tree

Back Propagation

Imbalanced Data

Sampling Technique

Poor Performance

Imbalanced Dataset

Minority Class

Data Result

Classification of imbalanced data is an important issue. Many classification algorithms, such as Back Propagation (BP) neural networks, decision trees and Bayesian networks, have been developed and used repeatedly in many fields. These algorithms suffer from the imbalanced-data problem, in which some classes contain far more instances than others. Imbalanced data lead to poor performance and a bias toward one class at the expense of the others. In this paper, we propose three techniques based on Over-Sampling (O.S.) that process an imbalanced dataset and redistribute it into a balanced one: Improved Synthetic Minority Over-Sampling Technique (Improved SMOTE), Borderline-SMOTE + Imbalanced Ratio (IR), and Adaptive Synthetic Sampling (ADASYN) + IR. Each technique generates synthetic samples for the minority class to balance the minority and majority classes, and then calculates the IR between the minority and majority classes. Experimental results show that the Improved SMOTE algorithm outperforms the Borderline-SMOTE + IR and ADASYN + IR algorithms because it achieves a high degree of balance between the minority and majority classes.
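The core idea these techniques share is interpolating new minority samples between existing ones and tracking the imbalance ratio. The Python sketch below shows the basic SMOTE interpolation only, not the paper's "Improved SMOTE". Borderline-SMOTE and ADASYN differ mainly in which minority samples receive more synthetic points. IR is taken here as majority size divided by minority size. That is a common definition but an assumption with respect to this paper.

```python
import math
import random


def imbalance_ratio(n_majority: int, n_minority: int) -> float:
    """IR = majority-class size / minority-class size (1.0 means balanced)."""
    return n_majority / n_minority


def smote(minority, n_new, k=3, seed=0):
    """Basic SMOTE: each synthetic point lies on the line segment between a
    randomly chosen minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of `base`, excluding base itself
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (q - b) for b, q in zip(base, nb)))
    return synthetic


# Hypothetical 2-D data: 20 majority samples, 4 minority samples
majority = [(float(x), 0.0) for x in range(20)]
minority = [(0.0, 5.0), (1.0, 5.0), (0.0, 6.0), (1.0, 6.0)]
print("IR before:", imbalance_ratio(len(majority), len(minority)))
new = smote(minority, n_new=len(majority) - len(minority))
print("IR after: ", imbalance_ratio(len(majority), len(minority) + len(new)))
```

Because every synthetic point is a convex combination of two minority samples, the new points stay inside the region spanned by the minority class. The IR drops from 5.0 to 1.0.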

Practical applications of deep learning: classifying the most common categories of plain radiographs in a PACS using a neural network

European Radiology

10.1007/s00330-020-07241-6

2020

Author(s):

Thomas Dratsch

Michael Korenkov

David Zopfs

Sebastian Brodehl

Bettina Baessler

...

Keyword(s):

Neural Network

External Validation

Abnormality Detection

Image Description

Single Institution

Internal Validation

Plain Radiographs

Practical Applications

External Data

Validation Set

Abstract

Objectives: The goal of the present study was to classify the most common types of plain radiographs using a neural network and to validate the network's performance on internal and external data. Such a network could help improve various radiological workflows.

Methods: All radiographs acquired at our institution in 2017 (n = 71,274) were retrieved from the PACS. The 30 largest categories (n = 58,219, 81.7% of all radiographs performed in 2017) were used to develop and validate a neural network (MobileNet v1.0) using transfer learning. Image categories were extracted from DICOM metadata (study and image description) and mapped to the WHO manual of diagnostic imaging. As an independent, external validation set, we used images from other institutions that had been stored in our PACS (n = 5324).

Results: In the internal validation, the overall accuracy of the model was 90.3% (95% CI: 89.2–91.3%), whereas for the external validation set the overall accuracy was 94.0% (95% CI: 93.3–94.6%).

Conclusions: Using data from a single institution, we were able to classify the most common categories of radiographs with a neural network. The network showed good generalizability on the external validation set. It could be used to automatically organize a PACS, to preselect radiographs so that they can be routed to more specialized networks for abnormality detection, or to help with other parts of the radiological workflow (e.g., automated hanging protocols, or checking whether the ordered and performed images are the same). The final AI algorithm is publicly available for evaluation and extension.

Key Points
• Data from a single institution can be used to train a neural network to correctly detect the 30 most common categories of plain radiographs.
• The trained model achieved high accuracy for the majority of categories and showed good generalizability to images from other institutions.
• The neural network is publicly available and can be used to automatically organize a PACS or to preselect radiographs so that they can be routed to more specialized neural networks for abnormality detection.
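The abstract gives accuracies with 95% confidence intervals but does not say, in this summary, how they were computed. For a proportion such as accuracy, one common choice is the Wilson score interval. The Python sketch below assumes that method. It uses a hypothetical count of 903 correct out of 1000, so it is not meant to reproduce the published intervals.

```python
import math


def wilson_interval(correct: int, total: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion such as accuracy.

    z = 1.96 corresponds to a two-sided 95% interval.
    """
    p = correct / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half


# Hypothetical: 903 correct predictions out of 1000 test radiographs
lo, hi = wilson_interval(903, 1000)
print(f"accuracy 90.3% (95% CI: {lo:.1%}-{hi:.1%})")
```

The interval narrows roughly with the square root of the number of test images. This is one reason the narrower external interval (n = 5324) should not be read as evidence of better performance by itself.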

Faculty Opinions recommendation of The 2008 WHO classification of lymphoid neoplasms and beyond: evolving concepts and practical applications.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature

10.3410/f.8536957.11931054

2011

Author(s):

Thomas Habermann

Keyword(s):

WHO Classification

Practical Applications

Lymphoid Neoplasms

Multi-classification of audio signal based on modified SVM

IET International Communication Conference on Wireless Mobile & Computing (CCWMC 2009)

10.1049/cp.2009.1958

2009

Author(s):

Junwei Liu

Xiaoqing Yu

Wanggen Wan

Changlian Li

Keyword(s):

Audio Signal

Multi Classification

Multi-Classification of Brain Tumor MRI Images Using Deep Convolutional Neural Network with Fully Optimized Framework

Iranian Journal of Science and Technology Transactions of Electrical Engineering

10.1007/s40998-021-00426-9

2021

Author(s):

Emrah Irmak

Keyword(s):

Neural Network

Brain Tumor

Convolutional Neural Network

Deep Convolutional Neural Network

Multi Classification

Tumor MRI

A practical system based on CNN-BLSTM network for accurate classification of ECG heartbeats of MIT-BIH imbalanced dataset

2021 26th International Computer Conference, Computer Society of Iran (CSICC)

10.1109/csicc52343.2021.9420620

2021

Author(s):

Armin Shoughi

Mohammad Bagher Dowlatshahi

Keyword(s):

Imbalanced Dataset

Practical System

Insights into the molecular properties underlying antibacterial activity of prenylated (iso)flavonoids against MRSA

Scientific Reports

10.1038/s41598-021-92964-9

2021

Vol 11 (1)

Author(s):

Sylvia Kalli

Carla Araya-Cloutier

Jos Hageman

Jean-Paul Vincken

Keyword(s):

Prediction Models

External Validation

QSAR Model

Prediction Errors

Activity Data

Gram-Positive Bacteria

Level of Activity

Formal Charge

Validation Set

Abstract: High resistance to traditional antibiotics has prompted the development of new, natural therapeutics against methicillin-resistant Staphylococcus aureus (MRSA). Prenylated (iso)flavonoids, present mainly in the Fabaceae, can serve as promising candidates. Herein, the anti-MRSA properties of 23 prenylated (iso)flavonoids were assessed in vitro. The di-prenylated (iso)flavonoids glabrol (flavanone) and 6,8-diprenyl genistein (isoflavone), together with the mono-prenylated 4′-O-methyl glabridin (isoflavan), were the most active anti-MRSA compounds (Minimum Inhibitory Concentrations (MIC) ≤ 10 µg/mL, 30 µM). The in-house activity data were complemented with literature data to yield an extended, curated dataset of 67 molecules for the development of robust in silico prediction models. The resulting QSAR model had a good fit (R²adj 0.61), low average prediction errors and good predictive power (Q²) for both the training set (4% and Q²LOO 0.57, respectively) and the test set (5% and Q²test 0.75, respectively). Furthermore, the model accurately predicted the activity of an external validation set (5% prediction error on average), as well as the level of activity (low, moderate, high) of prenylated (iso)flavonoids against other Gram-positive bacteria. For the first time, the importance of formal charge, besides hydrophobic volume and hydrogen bonding, in anti-MRSA activity was highlighted, suggesting that the different prenylated (iso)flavonoids may act through different modes of action.
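The model statistics quoted above (R²adj, Q²LOO, Q²test) follow standard definitions. The Python sketch below implements them for a hypothetical one-descriptor linear model. The published model uses several descriptors (formal charge, hydrophobic volume, hydrogen bonding), and the data here are invented for illustration. Conventions for Q²test also vary: some authors use the training-set mean in the denominator instead of the test-set mean used here.

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R^2 for a model with p descriptors fitted on n compounds."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


def q2(y_obs, y_pred):
    """Q^2 = 1 - PRESS / SS_tot, with y_pred predicted for compounds left out of fitting."""
    mean = sum(y_obs) / len(y_obs)
    press = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))
    ss_tot = sum((o - mean) ** 2 for o in y_obs)
    return 1 - press / ss_tot


def fit_line(xs, ys):
    """Ordinary least squares for a single (hypothetical) descriptor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def loo_predictions(xs, ys):
    """Leave-one-out: refit without compound i, then predict compound i."""
    preds = []
    for i in range(len(xs)):
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        preds.append(slope * xs[i] + intercept)
    return preds


# Hypothetical descriptor values and activities for six compounds
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
print("Q2_LOO =", round(q2(ys, loo_predictions(xs, ys)), 3))
print("R2_adj =", round(adjusted_r2(0.8, n=21, p=2), 3))
```

Q²LOO is computed from predictions for compounds the model never saw during fitting, so it is usually lower than the fitted R². The gap between R²adj 0.61 and Q²LOO 0.57 in the abstract reflects this.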
