VALIDATION OF CLUSTERING METHODS FOR MEDICAL DATA SETS

Azam Orooji; Farzaneh Kermani

doi:10.19082/ah116

Experiments of Image Classification Using Dissimilarity Spaces Built with Siamese Networks

Sensors ◽

10.3390/s21051573 ◽

2021 ◽

Vol 21 (5) ◽

pp. 1573

Author(s):

Loris Nanni ◽

Giovanni Minchio ◽

Sheryl Brahnam ◽

Gianluca Maguolo ◽

Alessandra Lumini

Keyword(s):

Vector Space ◽

Image Classification ◽

Ad Hoc ◽

Feature Space ◽

Medical Data ◽

Training Data ◽

Data Sets ◽

Large Set ◽

Clustering Methods ◽

Siamese Networks

Traditionally, classifiers are trained to predict patterns within a feature space. The image classification system presented here trains classifiers to predict patterns within a vector space by combining the dissimilarity spaces generated by a large set of Siamese Neural Networks (SNNs). A set of centroids from the patterns in the training data sets is calculated with supervised k-means clustering. The centroids are used to generate the dissimilarity space via the Siamese networks. The vector space descriptors are extracted by projecting patterns onto the similarity spaces, and SVMs classify an image by its dissimilarity vector. The versatility of the proposed approach in image classification is demonstrated by evaluating the system on different types of images across two domains: two medical data sets and two animal audio data sets with vocalizations represented as images (spectrograms). Results show that the proposed system performs competitively against the best-performing methods in the literature, obtaining state-of-the-art performance on one of the medical data sets, and does so without ad-hoc optimization of the clustering methods on the tested data sets.
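The dissimilarity-space idea in this abstract can be illustrated with a minimal sketch. Several parts are assumptions made for illustration only: plain Euclidean distance stands in for the trained Siamese network, "supervised k-means" is read as running k-means separately within each class and pooling the centroids, and the names `kmeans`, `class_centroids`, and `dissimilarity_vector` are hypothetical. The resulting vectors would then be passed to an SVM, which is omitted here.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    """Stand-in dissimilarity (assumption); the paper uses trained Siamese networks."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means with deterministic init (the first k points)."""
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iters):
        groups = defaultdict(list)
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda j: euclidean(p, centroids[j]))
            groups[nearest].append(p)
        # Keep the previous centroid when a cluster ends up empty.
        centroids = [mean(groups[j]) if groups[j] else centroids[j]
                     for j in range(len(centroids))]
    return centroids


def class_centroids(X, y, k_per_class):
    """Assumed reading of 'supervised k-means': cluster each class on its own."""
    by_class = defaultdict(list)
    for x, label in zip(X, y):
        by_class[label].append(tuple(x))
    pooled = []
    for label in sorted(by_class):
        pooled.extend(kmeans(by_class[label], k_per_class))
    return pooled


def dissimilarity_vector(x, centroids, dissim=euclidean):
    """Describe a pattern by its dissimilarity to every centroid."""
    return [dissim(x, c) for c in centroids]


# Toy data: two well-separated classes in 2-D.
X = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
y = [0, 0, 1, 1]
centroids = class_centroids(X, y, k_per_class=1)
features = [dissimilarity_vector(x, centroids) for x in X]
```

Each row of `features` has one entry per centroid, so the descriptor length depends on the number of centroids rather than on the dimensionality of the original pattern.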


Evaluation of Unsupervised Clustering Methods on Hyperspectral Image Data Sets

2018 IEEE International Conference on Progress in Informatics and Computing (PIC) ◽

10.1109/pic.2018.8706315 ◽

2018 ◽

Author(s):

Wei Zhang ◽

Zhichao Lian ◽

Chanying Huang

Keyword(s):

Hyperspectral Image ◽

Image Data ◽

Unsupervised Clustering ◽

Data Sets ◽

Clustering Methods ◽

Hyperspectral Image Data


A simple clustering technique to extract subsets of data for function approximation

Journal of Hydroinformatics ◽

10.2166/hydro.2015.065 ◽

2015 ◽

Vol 17 (5) ◽

pp. 719-732

Author(s):

Dulakshi Santhusitha Kumari Karunasingha ◽

Shie-Yui Liong

Keyword(s):

Function Approximation ◽

Prediction Models ◽

Data Extraction ◽

Single Parameter ◽

Subtractive Clustering ◽

Data Sets ◽

Clustering Methods ◽

Clustering Method ◽

Data Set ◽

Functional Relationships

A simple clustering method is proposed for extracting representative subsets from lengthy data sets. The main purpose of the extracted subset is to build prediction models (in the form of approximated functional relationships) without using the entire large data set. Such smaller subsets of data are often required in the exploratory analysis stages of studies that involve resource-consuming investigations. A few recent studies have used a subtractive clustering method (SCM) for such data extraction, in the absence of clustering methods designed for function approximation. SCM, however, requires several parameters to be specified. This study proposes a clustering method that requires only a single parameter to be specified, yet is shown to be as effective as SCM. A method to find suitable values for the parameter is also proposed. Because it has only a single parameter, the proposed clustering method is shown to be orders of magnitude more efficient to use than SCM. The effectiveness of the proposed method is demonstrated on phase space prediction of three univariate time series and prediction of two multivariate data sets. Some drawbacks of SCM when applied to data extraction are identified, and the proposed method is shown to be a solution for them.


Sparse Matrix Approach in Neural Networks for Effective Medical Data Sets Classifications

Journal of Basic and Applied Research in Biomedicine ◽

10.51152/jbarbiomed.v6i2.113 ◽

2020 ◽

Vol 6 (2) ◽

pp. 90-97

Author(s):

Sagir Masanawa ◽

Hamza Abubakar

Keyword(s):

Intelligent System ◽

Sparse Matrix ◽

Data Classification ◽

Medical Data ◽

Data Sets ◽

Matrix Approach ◽

Neural Network Learning ◽

Network Learning ◽

Hybrid Intelligent System ◽

Medical Data Classification

In this paper, a hybrid intelligent system that incorporates the sparse matrix approach into a neural network learning model is presented as a decision support tool for medical data classification. The main objective of this research is to develop an effective intelligent system that can be used by medical practitioners to accelerate diagnosis and treatment processes. The sparse matrix approach is incorporated into the neural network learning algorithm to improve scalability, reduce memory usage, shorten implementation time, and speed up the analysis of the medical data classification problem. The hybrid intelligent system aims to exploit the advantages of the constituent models and, at the same time, alleviate their limitations. The proposed system maximizes the number of correctly classified medical records and minimizes the number of trends inaccurately identified. To evaluate the effectiveness of the hybrid intelligent system, three benchmark medical data sets from the UCI Repository of Machine Learning are used: Hepatitis, SPECT Heart, and Cleveland Heart. Performance is measured with metrics commonly used in medical applications, including accuracy, sensitivity, and specificity. The results were analyzed and compared with those from other methods published in the literature. The experimental outcomes demonstrate that the hybrid intelligent system was effective in undertaking medical data classification tasks.
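The three metrics named in this abstract have standard definitions for a binary classifier, sketched below from confusion-matrix counts. The function name `binary_metrics` and the toy labels are illustrative assumptions and are not taken from the paper.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity (true-positive rate), specificity (true-negative rate)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Undefined (None) when the relevant class never occurs in y_true.
        "sensitivity": tp / (tp + fn) if (tp + fn) else None,
        "specificity": tn / (tn + fp) if (tn + fp) else None,
    }


# Toy labels (illustrative only): 1 = disease present, 0 = absent.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = binary_metrics(y_true, y_pred)
```

Reporting sensitivity and specificity alongside accuracy matters on imbalanced medical data, where a classifier can reach high accuracy while missing most positive cases.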


Adaptive Structure Concept Factorization for Multiview Clustering

Neural Computation ◽

10.1162/neco_a_01055 ◽

2018 ◽

Vol 30 (4) ◽

pp. 1080-1103 ◽

Cited By ~ 10

Author(s):

Kun Zhan ◽

Jinhui Shi ◽

Jing Wang ◽

Haibo Wang ◽

Yuange Xie

Keyword(s):

Nonnegative Matrix Factorization ◽

State Of The Art ◽

Nonnegative Matrix ◽

Adaptive Method ◽

Data Sets ◽

Clustering Methods ◽

Normalized Mutual Information ◽

Adaptive Structure ◽

Concept Factorization ◽

Multiview Clustering

Most existing multiview clustering methods require that graph matrices in different views are computed beforehand and that each graph is obtained independently. However, this requirement ignores the correlation between multiple views. In this letter, we tackle the problem of multiview clustering by jointly optimizing the graph matrix to make full use of the data correlation between views. Using this inter-view correlation, a concept factorization-based multiview clustering method is developed for data integration, and the adaptive method correlates the affinity weights of all views. This method differs from nonnegative matrix factorization-based clustering methods in that it is applicable to data sets containing negative values. Experiments are conducted to demonstrate the effectiveness of the proposed method in comparison with state-of-the-art approaches in terms of accuracy, normalized mutual information, and purity.
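Two of the evaluation measures named here, purity and normalized mutual information (NMI), can be sketched as follows. NMI has several normalizations in the literature; this sketch assumes the geometric mean of the two entropies, and the abstract does not state which variant the authors used. The function names are illustrative.

```python
import math
from collections import Counter


def purity(true_labels, cluster_labels):
    """Fraction of points that belong to the majority true class of their cluster."""
    majority_total = 0
    for c in set(cluster_labels):
        members = [t for t, k in zip(true_labels, cluster_labels) if k == c]
        majority_total += Counter(members).most_common(1)[0][1]
    return majority_total / len(true_labels)


def entropy(labels):
    """Shannon entropy (natural log) of a labeling."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information between two labelings of the same points."""
    n = len(a)
    count_a, count_b, joint = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((nij / n) * math.log(nij * n / (count_a[i] * count_b[j]))
               for (i, j), nij in joint.items())


def nmi(a, b):
    """NMI with geometric-mean normalization (one of several conventions)."""
    ha, hb = entropy(a), entropy(b)
    if ha == 0 or hb == 0:
        # Degenerate single-cluster labelings: treat two of them as identical.
        return 1.0 if ha == hb else 0.0
    return mutual_information(a, b) / math.sqrt(ha * hb)


truth = [0, 0, 0, 1, 1, 1]
found = [0, 0, 1, 1, 1, 1]
p = purity(truth, found)
score = nmi(truth, found)
```

Both measures are invariant to how cluster identifiers are numbered, which is why they suit clustering evaluation where predicted labels carry no inherent meaning.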


Future of Medical Research in Rare Diseases and Cancers: Shift from Pharma to Biotech and the Golden Age of Medical Advancement

Cancer and Clinical Oncology ◽

10.5539/cco.v6n2p12 ◽

2017 ◽

Vol 6 (2) ◽

pp. 12

Author(s):

Abhith Pallegar

Keyword(s):

Big Data ◽

Medical Research ◽

Rare Diseases ◽

Network Effects ◽

Medical Knowledge ◽

Economic Cost ◽

Medical Data ◽

Data Sets ◽

Leading Role ◽

Diverse Data

The objective of the paper is to elucidate how interconnected biological systems can be better mapped and understood using the rapidly growing area of Big Data. We can harness network efficiencies by analyzing diverse medical data and probe how we can effectively lower the economic cost of finding cures for rare diseases. Most rare diseases are due to genetic abnormalities, and many forms of cancer develop due to genetic mutations. Finding cures for rare diseases requires us to understand the biology and biological processes of the human body. In this paper, we explore what the historical shift of focus from pharmacology to biotechnology means for accelerating biomedical solutions. With biotechnology playing a leading role in the field of medical research, we explore how network efficiencies can be harnessed by strengthening the existing knowledge base. Studying rare or orphan diseases provides rich observable statistical data that can be leveraged for finding solutions. Network effects can be gained from working with diverse data sets, which enable us to generate the highest-quality medical knowledge with the fewest resources. This paper examines gene manipulation technologies like Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) that can prevent diseases of genetic origin. We further explore the role of the emerging field of Big Data in analyzing large quantities of medical data with the rapid growth of computing power and some of the network efficiencies gained from this endeavor.


clusTCR: a Python interface for rapid clustering of large sets of CDR3 sequences

10.1101/2021.02.22.432291 ◽

2021 ◽

Author(s):

Sebastiaan Valkiers ◽

Max Van Houcke ◽

Kris Laukens ◽

Pieter Meysman

Keyword(s):

T Cell ◽

Large Data ◽

Cell Receptor ◽

Amino Acid Sequences ◽

Large Data Sets ◽

Data Sets ◽

Clustering Methods ◽

Link Type ◽

Large Sets ◽

Similar Accuracy

The T-cell receptor (TCR) determines the specificity of a T-cell towards an epitope. As of yet, the rules for antigen recognition remain largely undetermined. Current methods for grouping TCRs according to their epitope specificity remain limited in performance and scalability. Multiple methodologies have been developed, but all of them fail to efficiently cluster large data sets exceeding 1 million sequences. To address this limitation, we developed clusTCR, a rapid TCR clustering alternative that efficiently scales up to millions of CDR3 amino acid sequences. Benchmarking comparisons revealed similar accuracy of clusTCR with other TCR clustering methods. clusTCR offers a drastic improvement in clustering speed, which allows clustering of millions of TCR sequences in just a few minutes through efficient similarity searching and sequence hashing. clusTCR was written in Python 3. It is available as an Anaconda package (https://anaconda.org/svalkiers/clustcr) and on GitHub (https://github.com/svalkiers/clusTCR).
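To give a sense of how hashing can avoid all-against-all sequence comparison, here is a generic sketch that groups sequences differing at exactly one position. It is not clusTCR's implementation or API: the masked-key scheme, the function names, and the toy sequences are all assumptions chosen for illustration.

```python
from collections import defaultdict


def hamming1_pairs(seqs):
    """Find pairs at Hamming distance 1 by hashing each sequence with one position removed."""
    buckets = defaultdict(set)
    for s in set(seqs):
        for i in range(len(s)):
            # Same-length sequences sharing this key can differ only at position i.
            buckets[(i, s[:i] + s[i + 1:])].add(s)
    pairs = set()
    for members in buckets.values():
        ordered = sorted(members)
        for a in range(len(ordered)):
            for b in range(a + 1, len(ordered)):
                pairs.add((ordered[a], ordered[b]))
    return pairs


def cluster_sequences(seqs):
    """Connected components of the Hamming-distance-1 graph, via union-find."""
    parent = {s: s for s in set(seqs)}

    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]  # path halving
            s = parent[s]
        return s

    for a, b in hamming1_pairs(seqs):
        parent[find(a)] = find(b)
    groups = defaultdict(set)
    for s in parent:
        groups[find(s)].add(s)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


# Toy strings (illustrative only, not real CDR3 data).
toy = ["CASSF", "CASTF", "CAATF", "CQQQF"]
pairs = hamming1_pairs(toy)
clusters = cluster_sequences(toy)
```

Each sequence of length L produces L hash keys, so the bucketing step grows linearly with the number of sequences instead of quadratically, which is the general reason hashing-based approaches scale to large repertoires.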


Unsupervised Clustering Methods for Medical Data: An Application to Thyroid Gland Data

Artificial Neural Networks and Neural Information Processing — ICANN/ICONIP 2003 - Lecture Notes in Computer Science ◽

10.1007/3-540-44989-2_83 ◽

2003 ◽

pp. 695-701 ◽

Cited By ~ 4

Author(s):

Songül Albayrak

Keyword(s):

Thyroid Gland ◽

Medical Data ◽

Unsupervised Clustering ◽

Clustering Methods


GRASP for Instance Selection in Medical Data Sets

Advances in Intelligent and Soft Computing - Advances in Bioinformatics ◽

10.1007/978-3-642-13214-8_7 ◽

2010 ◽

pp. 53-60 ◽

Cited By ~ 1

Author(s):

Alfonso Fernández ◽

Abraham Duarte ◽

Rosa Hernández ◽

Ángel Sánchez

Keyword(s):

Medical Data ◽

Instance Selection ◽

Data Sets


Analysis and Integration of Biological Data

Knowledge Discovery Practices and Emerging Applications of Data Mining - Advances in Data Mining and Database Management ◽

10.4018/978-1-60960-067-9.ch014 ◽

2010 ◽

pp. 287-314

Author(s):

Diego Milone ◽

Georgina Stegmayer ◽

Matías Gerard ◽

Laura Kamenetzky ◽

Mariana López ◽

...

Keyword(s):

Data Integration ◽

New Technologies ◽

Biological Significance ◽

Neural Model ◽

Biological Data ◽

Data Sets ◽

Clustering Methods ◽

Gene Expressions ◽

Internal Coherence ◽

Biological Data Integration

The volume of information derived from post-genomic technologies is rapidly increasing. Due to the amount of data involved, novel computational methods are needed for analysis and knowledge discovery in the massive data sets produced by these new technologies. Furthermore, data integration is also gaining attention for merging signals from different sources in order to discover unknown relations. This chapter presents a pipeline for biological data integration and discovery of a priori unknown relationships between gene expressions and metabolite accumulations. In this pipeline, two standard clustering methods are compared against a novel neural network approach. The neural model provides a simple visualization interface for identifying coordinated pattern variations, independently of the number of clusters produced. Several quality measurements have been defined for the evaluation of the clustering results obtained on a case study involving transcriptomic and metabolomic profiles from tomato fruits. Moreover, a method is proposed for the evaluation of the biological significance of the clusters found. The neural model has shown high performance in most of the quality measures, with internal coherence in all the identified clusters and better visualization capabilities.
