Comparison of Combinatorial Clustering Methods on Pharmacological Data Sets Represented by Machine Learning-Selected Real Molecular Descriptors

Oscar Miguel Rivera-Borroto; Yovani Marrero-Ponce; José Manuel García-de la Vega; Ricardo del Corazón Grau-Ábalo

doi:10.1021/ci2000083

Dunn’s index for cluster tendency assessment of pharmacological data sets

Canadian Journal of Physiology and Pharmacology ◽

10.1139/y2012-002 ◽

2012 ◽

Vol 90 (4) ◽

pp. 425-433 ◽

Cited By ~ 4

Author(s):

Oscar Miguel Rivera-Borroto ◽

Mónica Rabassa-Gutiérrez ◽

Ricardo del Corazón Grau-Ábalo ◽

Yovani Marrero-Ponce ◽

José Manuel García-de la Vega

Keyword(s):

Classification Accuracy ◽

Molecular Descriptors ◽

Visual Assessment ◽

Data Sets ◽

Single Linkage ◽

Cluster Algorithms ◽

Pharmacological Data ◽

Cluster Separability ◽

Intensity Image ◽

Weak Tendency

Cluster tendency assessment is an important stage in cluster analysis. In this context, a group of promising techniques named visual assessment of tendency (VAT) has emerged in the literature. The presence of clusters can be detected easily through direct observation of a dark-block structure along the main diagonal of the intensity image. Alternatively, a Dunn’s index greater than 1 for a single linkage partition is a good indication of this blocklike structure. In this report, Dunn’s index is applied as a novel measure of tendency on 8 pharmacological data sets, represented by machine-learning-selected molecular descriptors. In all cases, observed values are less than 1, thus indicating a weak tendency for the data to form compact clusters. Other results suggest that there is an increasing relationship between Dunn’s index as a measure of cluster separability and the classification accuracy of various cluster algorithms tested on the same data sets.
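As a rough illustration of the criterion described in the abstract, the sketch below computes Dunn's index in its classical form: the smallest distance between points of different clusters divided by the largest cluster diameter. The function name `dunn_index`, the use of Euclidean distance, and the toy points are assumptions made for this example; the paper's own data, descriptors, and implementation are not reproduced here, and the cluster labels are simply taken as given rather than derived from a single linkage run.

```python
from itertools import combinations
from math import dist


def dunn_index(points, labels):
    """Classical Dunn's index: min between-cluster distance / max cluster diameter.

    `points` is a sequence of coordinate tuples and `labels` gives the cluster
    of each point. Euclidean distance is an assumption of this sketch.
    """
    # Group the points by their cluster label.
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("Dunn's index needs at least two clusters")

    # Largest diameter: the greatest distance between two points of one cluster.
    max_diameter = max(
        (dist(a, b) for members in clusters.values()
         for a, b in combinations(members, 2)),
        default=0.0,
    )

    # Smallest separation: the closest pair of points from different clusters.
    min_separation = min(
        dist(a, b)
        for first, second in combinations(list(clusters.values()), 2)
        for a in first
        for b in second
    )

    if max_diameter == 0.0:
        return float("inf")  # every cluster is a single point
    return min_separation / max_diameter


# Two tight, well-separated groups: index above 1 (blocklike structure).
compact = dunn_index([(0, 0), (0, 1), (5, 0), (5, 1)], [0, 0, 1, 1])

# Two stretched groups lying close together: index below 1 (weak tendency).
diffuse = dunn_index([(0, 0), (3, 0), (4, 0), (7, 0)], [0, 0, 1, 1])
```

A value above 1 means every cluster is narrower than the gap to its nearest neighbor, which matches the dark-block reading of the intensity image; a value below 1 corresponds to the weak tendency reported for the 8 data sets.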


Machine Learning Approaches for the Analysis of Non-Metallic Inclusion Data Sets

AISTech2019 Proceedings of the Iron and Steel Technology Conference ◽

10.33313/377/275 ◽

2019 ◽

Author(s):

M. Webler ◽

B. Abdulsalam

Keyword(s):

Machine Learning ◽

Data Sets ◽

Learning Approaches ◽

Metallic Inclusion


Molecular Topology and Other Promiscuity Determinants as Predictors of Therapeutic Class - A Theoretical Framework to Guide Drug Repositioning?

Current Topics in Medicinal Chemistry ◽

10.2174/1568026618666180801091642 ◽

2018 ◽

Vol 18 (13) ◽

pp. 1110-1122 ◽

Cited By ~ 2

Author(s):

Juan F. Morales ◽

Lucas N. Alberca ◽

Sara Chuguransky ◽

Mauricio E. Di Ianni ◽

Alan Talevi ◽

...

Keyword(s):

Molecular Descriptors ◽

Drug Repositioning ◽

Drug Repurposing ◽

Topological Descriptors ◽

Log P ◽

Acidity Constant ◽

Molecular Topology ◽

Clustering Methods ◽

Mean Values ◽

Qsar Models

Much attention has been paid in the last decade to molecular predictors of promiscuity, including molecular weight, log P, molecular complexity, acidity constant and molecular topology, with correlations between promiscuity and those descriptors seemingly being context-dependent. It has been observed that certain therapeutic categories (e.g. mood disorder therapies) display a tendency to include multi-target agents (i.e. selective non-selectivity). Numerous QSAR models based on topological descriptors suggest that the topology of a given drug could be used to infer its therapeutic applications. Here, we have used descriptive statistics to explore the distribution of molecular topology descriptors and other promiscuity predictors across different therapeutic categories. Working with the publicly available ChEMBL database and 14 molecular descriptors, both hierarchical and non-hierarchical clustering methods were applied to the mean descriptor values of the therapeutic categories after refinement of the database (770 drugs grouped into 34 therapeutic categories). In addition, another publicly available database (repoDB) was used to retrieve clinically approved drug repositioning examples that could be classified into the therapeutic categories considered by the aforementioned clusters (111 cases), and the correspondence between the two studies was evaluated. Interestingly, a 3-cluster hierarchical clustering scheme based on only 14 molecular descriptors linked to promiscuity seems to explain up to 82.9% of the approved cases of drug repurposing retrieved from repoDB. Therapeutic categories seem to display distinctive molecular patterns, which could be used as a basis for drug screening and drug design campaigns, and to unveil drug repurposing opportunities between particular therapeutic categories.


Trade-off Predictivity and Explainability for Machine-Learning Powered Predictive Toxicology: An in-Depth Investigation with Tox21 Data Sets

Chemical Research in Toxicology ◽

10.1021/acs.chemrestox.0c00373 ◽

2021 ◽

Vol 34 (2) ◽

pp. 541-549 ◽

Cited By ~ 1

Author(s):

Leihong Wu ◽

Ruili Huang ◽

Igor V. Tetko ◽

Zhonghua Xia ◽

Joshua Xu ◽

...

Keyword(s):

Machine Learning ◽

Data Sets ◽

Predictive Toxicology ◽

Trade Off


Using Machine Learning Methods to Identify Particle Types from Doppler Lidar Measurements in Iceland

Remote Sensing ◽

10.3390/rs13132433 ◽

2021 ◽

Vol 13 (13) ◽

pp. 2433

Author(s):

Shu Yang ◽

Fengchao Peng ◽

Sibylle von Löwis ◽

Guðrún Nína Petersen ◽

David Christian Finger

Keyword(s):

Machine Learning ◽

Weather Conditions ◽

Dust Storms ◽

Machine Learning Algorithms ◽

Lidar Data ◽

Data Sets ◽

Doppler Lidar ◽

Lidar Measurements ◽

Using Data ◽

Filter Noise

Doppler lidars are used worldwide for wind monitoring and recently also for the detection of aerosols. Automatic algorithms that classify the signals retrieved from lidar measurements are very useful for users. In this study, we explore the value of machine learning to classify backscattered signals from Doppler lidars using data from Iceland. We combined supervised and unsupervised machine learning algorithms with conventional lidar data processing methods and trained two models to filter noise signals and classify Doppler lidar observations into different classes, including clouds, aerosols and rain. The results reveal a high accuracy for noise identification and for aerosol and cloud classification. However, precipitation detection is underestimated. The method was tested on data sets from two instruments during different weather conditions, including three dust storms during the summer of 2019. Our results reveal that this method can provide an efficient, accurate and real-time classification of lidar measurements. Accordingly, we conclude that machine learning can open new opportunities for lidar data end-users, such as aviation safety operators, to monitor dust in the vicinity of airports.


A top-level model of case-based argumentation for explanation: Formalisation and experiments

Argument & Computation ◽

10.3233/aac-210009 ◽

2021 ◽

pp. 1-36

Author(s):

Henry Prakken ◽

Rosa Ratsma

Keyword(s):

Machine Learning ◽

Decision Making ◽

Linear Models ◽

Evaluation Studies ◽

Data Sets ◽

Machine Learning Applications ◽

Level Model ◽

Similarities And Differences ◽

Further Development ◽

Case Based

This paper proposes a formal top-level model of explaining the outputs of machine-learning-based decision-making applications and evaluates it experimentally with three data sets. The model draws on AI & law research on argumentation with cases, which models how lawyers draw analogies to past cases and discuss their relevant similarities and differences in terms of relevant factors and dimensions in the problem domain. A case-based approach is natural since the input data of machine-learning applications can be seen as cases. While the approach is motivated by legal decision making, it also applies to other kinds of decision making, such as commercial decisions about loan applications or employee hiring, as long as the outcome is binary and the input conforms to this paper’s factor or dimension format. The model is top-level in that it can be extended with more refined accounts of similarities and differences between cases. It is shown to overcome several limitations of similar argumentation-based explanation models, which only have binary features and do not represent the tendency of features towards particular outcomes. The results of the experimental evaluation studies indicate that the model may be feasible in practice, but that further development and experimentation are needed to confirm its usefulness as an explanation model. The main challenges here are selecting from a large number of possible explanations, reducing the number of features in the explanations and adding more meaningful information to them. It also remains to be investigated how suitable our approach is for explaining non-linear models.


A novel framework for designing a multi-DoF prosthetic wrist control using machine learning

Scientific Reports ◽

10.1038/s41598-021-94449-1 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Chinmay P. Swami ◽

Nicholas Lenhard ◽

Jiyeon Kang

Keyword(s):

Machine Learning ◽

Random Forest ◽

Upper Limb ◽

Daily Living ◽

Machine Learning Algorithms ◽

Data Sets ◽

Random Forest Regression ◽

Prosthetic Devices ◽

Upper Limb Function ◽

The Neural Network

Prosthetic arms can significantly increase the upper limb function of individuals with upper limb loss; however, despite the development of various multi-DoF prosthetic arms, the rate of prosthesis abandonment is still high. One of the major challenges is to design a multi-DoF controller that has high precision, robustness, and intuitiveness for daily use. The present study demonstrates a novel framework for developing a controller leveraging machine learning algorithms and movement synergies to implement natural control of a 2-DoF prosthetic wrist for activities of daily living (ADL). The data was collected during ADL tasks of ten individuals with a wrist brace emulating the absence of wrist function. Using this data, the neural network classifies the movement and then random forest regression computes the desired velocity of the prosthetic wrist. The models were trained and tested with ADLs, and their robustness was tested using cross-validation and holdout data sets. The proposed framework demonstrated high accuracy (F-1 score of 99% for the classifier and Pearson’s correlation of 0.98 for the regression). Additionally, the interpretable nature of random forest regression was used to verify the targeted movement synergies. The present work provides a novel and effective framework to develop an intuitive control for multi-DoF prosthetic devices.


Analysis of Risk Factors in Dementia Through Machine Learning

Journal of Alzheimer's Disease ◽

10.3233/jad-200955 ◽

2020 ◽

pp. 1-17

Author(s):

Francisco Javier Balea-Fernandez ◽

Beatriz Martinez-Vega ◽

Samuel Ortega ◽

Himar Fabelo ◽

Raquel Leon ◽

...

Keyword(s):

Machine Learning ◽

Optimization Algorithms ◽

Progressive Increase ◽

Control Group ◽

Data Sets ◽

Modifiable Factors ◽

Validation Set ◽

The One ◽

And Control ◽

Potential Tool

Background: Sociodemographic data indicate a progressive increase in life expectancy and in the prevalence of Alzheimer’s disease (AD). AD is regarded as one of the greatest public health problems. Its etiology is twofold: on the one hand, non-modifiable factors, and on the other, modifiable ones. Objective: This study aims to develop a processing framework based on machine learning (ML) and optimization algorithms to study sociodemographic, clinical, and analytical variables, selecting the best combination among them for an accurate discrimination between controls and subjects with major neurocognitive disorder (MNCD). Methods: This research is based on an observational-analytical design. Two research groups were established: MNCD group (n = 46) and control group (n = 38). ML and optimization algorithms were employed to automatically diagnose MNCD. Results: Twelve out of 37 variables were identified in the validation set as the most relevant for MNCD diagnosis. Sensitivity of 100% and specificity of 71% were achieved using a Random Forest classifier. Conclusion: ML is a potential tool for automatic prediction of MNCD which can be applied to relatively small preclinical and clinical data sets. These results can be interpreted to support the influence of the environment on the development of AD.


Experiments of Image Classification Using Dissimilarity Spaces Built with Siamese Networks

Sensors ◽

10.3390/s21051573 ◽

2021 ◽

Vol 21 (5) ◽

pp. 1573

Author(s):

Loris Nanni ◽

Giovanni Minchio ◽

Sheryl Brahnam ◽

Gianluca Maguolo ◽

Alessandra Lumini

Keyword(s):

Vector Space ◽

Image Classification ◽

Ad Hoc ◽

Feature Space ◽

Medical Data ◽

Training Data ◽

Data Sets ◽

Large Set ◽

Clustering Methods ◽

Siamese Networks

Traditionally, classifiers are trained to predict patterns within a feature space. The image classification system presented here trains classifiers to predict patterns within a vector space by combining the dissimilarity spaces generated by a large set of Siamese Neural Networks (SNNs). A set of centroids from the patterns in the training data sets is calculated with supervised k-means clustering. The centroids are used to generate the dissimilarity space via the Siamese networks. The vector space descriptors are extracted by projecting patterns onto the similarity spaces, and SVMs classify an image by its dissimilarity vector. The versatility of the proposed approach in image classification is demonstrated by evaluating the system on different types of images across two domains: two medical data sets and two animal audio data sets with vocalizations represented as images (spectrograms). Results show that the proposed system performs competitively against the best-performing methods in the literature, obtaining state-of-the-art performance on one of the medical data sets, and does so without ad hoc optimization of the clustering methods on the tested data sets.
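The projection step described in the abstract can be pictured with a small sketch: a pattern is represented by the vector of its dissimilarities to a fixed set of centroids. This is only an illustration under stated assumptions. Euclidean distance stands in for the dissimilarity that the paper learns with Siamese networks, and the function name `dissimilarity_vector`, the `dissimilarity` parameter, and the toy centroids are hypothetical.

```python
from math import dist


def dissimilarity_vector(pattern, centroids, dissimilarity=dist):
    """Describe `pattern` by its dissimilarity to each centroid.

    `dissimilarity` defaults to Euclidean distance purely as a stand-in;
    in the paper this role is played by trained Siamese networks.
    """
    return [dissimilarity(pattern, centroid) for centroid in centroids]


# Hypothetical centroids, e.g. as produced by a clustering step.
centroids = [(0.0, 0.0), (3.0, 4.0)]

# Each pattern becomes a vector with one entry per centroid.
descriptor = dissimilarity_vector((0.0, 0.0), centroids)
```

The resulting vectors have one dimension per centroid, so a downstream classifier such as an SVM works in a space whose size is set by the number of centroids rather than by the raw image size.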


Evaluation of Unsupervised Clustering Methods on Hyperspectral Image Data Sets

2018 IEEE International Conference on Progress in Informatics and Computing (PIC) ◽

10.1109/pic.2018.8706315 ◽

2018 ◽

Author(s):

Wei Zhang ◽

Zhichao Lian ◽

Chanying Huang

Keyword(s):

Hyperspectral Image ◽

Image Data ◽

Unsupervised Clustering ◽

Data Sets ◽

Clustering Methods ◽

Hyperspectral Image Data
