Experiments of Image Classification Using Dissimilarity Spaces Built with Siamese Networks

Loris Nanni; Giovanni Minchio; Sheryl Brahnam; Gianluca Maguolo; Alessandra Lumini

doi:10.3390/s21051573

Sensors ◽

10.3390/s21051573 ◽

2021 ◽

Vol 21 (5) ◽

pp. 1573

Author(s):

Loris Nanni ◽

Giovanni Minchio ◽

Sheryl Brahnam ◽

Gianluca Maguolo ◽

Alessandra Lumini

Keyword(s):

Vector Space ◽

Image Classification ◽

Ad Hoc ◽

Feature Space ◽

Medical Data ◽

Training Data ◽

Data Sets ◽

Large Set ◽

Clustering Methods ◽

Siamese Networks

Traditionally, classifiers are trained to predict patterns within a feature space. The image classification system presented here trains classifiers to predict patterns within a vector space by combining the dissimilarity spaces generated by a large set of Siamese Neural Networks (SNNs). A set of centroids from the patterns in the training data sets is calculated with supervised k-means clustering. The centroids are used to generate the dissimilarity space via the Siamese networks. The vector space descriptors are extracted by projecting patterns onto the similarity spaces, and SVMs classify an image by its dissimilarity vector. The versatility of the proposed approach in image classification is demonstrated by evaluating the system on different types of images across two domains: two medical data sets and two animal audio data sets with vocalizations represented as images (spectrograms). Results show that the proposed system's performance is competitive with the best-performing methods in the literature, obtaining state-of-the-art performance on one of the medical data sets, and does so without ad-hoc optimization of the clustering methods on the tested data sets.
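The projection step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: plain Euclidean distance stands in for the dissimilarity learned by the Siamese networks, one mean per class stands in for supervised k-means, and the names `class_centroids` and `project` are hypothetical.

```python
import math

def euclidean(a, b):
    """Stand-in dissimilarity (assumption); the paper uses trained Siamese networks."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def class_centroids(patterns, labels):
    """One centroid per class: the simplest supervised case (k = 1 per class)."""
    sums, counts = {}, {}
    for p, y in zip(patterns, labels):
        acc = sums.setdefault(y, [0.0] * len(p))
        for i, v in enumerate(p):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    # Centroids are returned in sorted label order.
    return [[v / counts[y] for v in sums[y]] for y in sorted(sums)]

def project(pattern, centroids, dissimilarity=euclidean):
    """Dissimilarity vector: one distance from the pattern to each centroid."""
    return [dissimilarity(pattern, c) for c in centroids]

# Toy data (made up for illustration).
train = [[0.0, 0.0], [0.0, 2.0], [4.0, 0.0], [4.0, 2.0]]
labels = ["a", "a", "b", "b"]
centroids = class_centroids(train, labels)
descriptor = project([0.0, 1.0], centroids)
```

In the described system, descriptors of this kind would then be passed to an SVM; the classifier is left out here to keep the sketch dependency-free.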

MULFE: Multi-Label Learning via Label-Specific Feature Space Ensemble

ACM Transactions on Knowledge Discovery from Data ◽

10.1145/3451392 ◽

2021 ◽

Vol 16 (1) ◽

pp. 1-24

Author(s):

Yaojin Lin ◽

Qinghua Hu ◽

Jinghua Liu ◽

Xingquan Zhu ◽

Xindong Wu

Keyword(s):

Empirical Studies ◽

Feature Space ◽

Training Data ◽

Data Sets ◽

Learning Framework ◽

Feature Spaces ◽

Public Data ◽

Margin Distribution ◽

Label Correlations ◽

Label Correlation

In multi-label learning, label correlations commonly exist in the data. Such correlation not only provides useful information, but also imposes significant challenges for multi-label learning. Recently, label-specific feature embedding has been proposed to explore label-specific features from the training data, and uses features highly customized to the multi-label set for learning. While such feature embedding methods have demonstrated good performance, the creation of the feature embedding space is only based on a single label, without considering label correlations in the data. In this article, we propose to combine multiple label-specific feature spaces, using label correlation, for multi-label learning. The proposed algorithm, multi-label-specific feature space ensemble (MULFE), takes into consideration label-specific features, label correlation, and the weighted ensemble principle to form a learning framework. By conducting clustering analysis on each label's negative and positive instances, MULFE first creates features customized to each label. After that, MULFE utilizes the label correlation to optimize the margin distribution of the base classifiers which are induced by the related label-specific feature spaces. By combining multiple label-specific features, label correlation based weighting, and ensemble learning, MULFE achieves the maximum margin multi-label classification goal through the underlying optimization framework. Empirical studies on 10 public data sets demonstrate the effectiveness of MULFE.

Kernel Treelets

Advances in Data Science and Adaptive Analysis ◽

10.1142/s2424922x19500062 ◽

2019 ◽

Vol 11 (03n04) ◽

pp. 1950006

Author(s):

Hedi Xia ◽

Hector D. Ceniceros

Keyword(s):

Hierarchical Clustering ◽

Reproducing Kernel ◽

Reproducing Kernel Hilbert Space ◽

Feature Space ◽

Coefficient Matrix ◽

Data Sets ◽

Clustering Methods ◽

Orthonormal Bases ◽

Data Points ◽

General Data

A new method for hierarchical clustering of data points is presented. It combines treelets, a particular multiresolution decomposition of data, with a mapping on a reproducing kernel Hilbert space. The proposed approach, called kernel treelets (KT), uses this mapping to go from a hierarchical clustering over attributes (the natural output of treelets) to a hierarchical clustering over data. KT effectively substitutes the correlation coefficient matrix used in treelets with a symmetric and positive semi-definite matrix efficiently constructed from a symmetric and positive semi-definite kernel function. Unlike most clustering methods, which require data sets to be numeric, KT can be applied to more general data and yields a multiresolution sequence of orthonormal bases on the data directly in feature space. The effectiveness and potential of KT in clustering analysis are illustrated with some examples.

SatImNet: Structured and Harmonised Training Data for Enhanced Satellite Imagery Classification

Remote Sensing ◽

10.3390/rs12203358 ◽

2020 ◽

Vol 12 (20) ◽

pp. 3358

Author(s):

Vasileios Syrris ◽

Ondrej Pesek ◽

Pierre Soille

Keyword(s):

Neural Networks ◽

Image Classification ◽

Supervised Classification ◽

Deep Neural Networks ◽

Satellite Image ◽

Data Retrieval ◽

Remote Sensing Image ◽

Training Data ◽

Data Sets ◽

Remote Sensing Image Classification

Automatic supervised classification with complex modelling such as deep neural networks requires the availability of representative training data sets. While there exists a plethora of data sets that can be used for this purpose, they are usually very heterogeneous and not interoperable. In this context, the present work has a twofold objective: (i) to describe procedures of open-source training data management, integration, and data retrieval, and (ii) to demonstrate the practical use of varying source training data for remote sensing image classification. For the former, we propose SatImNet, a collection of open training data, structured and harmonized according to specific rules. For the latter, two modelling approaches based on convolutional neural networks have been designed and configured to deal with satellite image classification and segmentation.

The Active Segmentation Platform for Microscopic Image Classification and Segmentation

Brain Sciences ◽

10.3390/brainsci11121645 ◽

2021 ◽

Vol 11 (12) ◽

pp. 1645

Author(s):

Sumit K. Vohra ◽

Dimiter Prodanov

Keyword(s):

Machine Learning ◽

Image Segmentation ◽

Image Classification ◽

Domain Knowledge ◽

Feature Space ◽

Ground Truth ◽

Classification Problem ◽

Data Sets ◽

Learning Approaches ◽

Data Set

Image segmentation still represents an active area of research since no universal solution can be identified. Traditional image segmentation algorithms are problem-specific and limited in scope. On the other hand, machine learning offers an alternative paradigm where predefined features are combined into different classifiers, providing pixel-level classification and segmentation. However, machine learning alone cannot address the question as to which features are appropriate for a certain classification problem. The article presents an automated image segmentation and classification platform, called Active Segmentation, which is based on ImageJ. The platform integrates expert domain knowledge, providing partial ground truth, with geometrical feature extraction based on multi-scale signal processing combined with machine learning. The approach in image segmentation is exemplified on the ISBI 2012 image segmentation challenge data set. As a second application we demonstrate whole image classification functionality based on the same principles. The approach is exemplified using the HeLa and HEp-2 data sets. Obtained results indicate that feature space enrichment properly balanced with feature selection functionality can achieve performance comparable to deep learning approaches. In summary, differential geometry can substantially improve the outcome of machine learning since it can enrich the underlying feature space with new geometrical invariant objects.

Motor Imagery EEG Classification with Biclustering Based Fuzzy Inference

Journal of Medical Imaging and Health Informatics ◽

10.1166/jmihi.2020.3040 ◽

2020 ◽

Vol 10 (7) ◽

pp. 1486-1493

Author(s):

Jianjun Sun

Keyword(s):

Fuzzy Inference ◽

Feature Space ◽

Small Sample ◽

Majority Voting ◽

Training Data ◽

Rule Base ◽

Data Sets ◽

Common Spatial Pattern ◽

Eeg Classification ◽

Subsequent Step

The rehabilitation of armless or footless patients is of great importance. One option is to use an electroencephalograph (EEG) brain-computer interface to help patients communicate with the outside world. Classifying the EEG signals generated by mental activity is one of the most important technologies for this purpose. However, existing classification methods often suffer from overfitting, caused by small training data sets combined with a high-dimensional feature space. Fuzzy inference can imitate human judgement, effectively dealing with uncertainty and small-sample learning problems. In addition, biclustering has shown excellent performance in constructing rule bases. This paper proposes a novel biclustering-based fuzzy inference method for EEG classification. It can be divided into five steps. The first step is generating features with common spatial pattern. The second step is searching for local coherent patterns with column-nearly-constant biclustering. The third step is transforming the patterns into if-then rules with a column averaging and majority voting strategy. The fourth step is employing Mamdani fuzzy inference to map the input feature vector into decimals. Finally, particle swarm optimization is utilized to generate the optimal threshold for linear classification. Experiments on several commonly used data sets show that the proposed method has advantages over competitors in terms of classification accuracy.

THE USE OF MACHINE LEARNING METHODS FOR BINARY CLASSIFICATION OF THE WORKING CONDITION OF BEARINGS USING THE SIGNALS OF VIBRATION ACCELERATION

Bulletin of National Technical University KhPI Series System Analysis Control and Information Technologies ◽

10.20998/2079-0023.2021.02.03 ◽

2021 ◽

pp. 15-22

Author(s):

Ruslan Babudzhan ◽

Konstantyn Isaienkov ◽

Danilo Krasiy ◽

Oleksii Vodka ◽

Ivan Zadorozhny ◽

...

Keyword(s):

Machine Learning ◽

Binary Classification ◽

Fractal Dimensions ◽

Feature Space ◽

Training Data ◽

Supervised Machine Learning ◽

Support Vector ◽

Data Sets ◽

Vibration Acceleration ◽

K Nearest Neighbors

The paper investigates the relationship between vibration acceleration of bearings with their operational state. To determine these dependencies, a testbench was built and 112 experiments were carried out with different bearings: 100 bearings that developed an internal defect during operation and 12 bearings without a defect. From the obtained records, a dataset was formed, which was used to build classifiers. Dataset is freely available. A method for classifying new and used bearings was proposed, which consists in searching for dependencies and regularities of the signal using descriptive functions: statistical, entropy, fractal dimensions and others. In addition to processing the signal itself, the frequency domain of the bearing operation signal was also used to complement the feature space. The paper considered the possibility of generalizing the classification for its application on those signals that were not obtained in the course of laboratory experiments. An extraneous dataset was found in the public domain. This dataset was used to determine how accurate a classifier was when it was trained and tested on significantly different signals. Training and validation were carried out using the bootstrapping method to eradicate the effect of randomness, given the small amount of training data available. To estimate the quality of the classifiers, the F1-measure was used as the main metric due to the imbalance of the data sets. The following supervised machine learning methods were chosen as classifier models: logistic regression, support vector machine, random forest, and K nearest neighbors. The results are presented in the form of plots of density distribution and diagrams.
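The evaluation idea in this abstract, the F1-measure combined with bootstrapping, can be sketched as follows. This is a simplified illustration rather than the paper's procedure: it only resamples (label, prediction) pairs to obtain a distribution of F1 values, whereas the paper bootstraps training and validation. The function names, the choice of `1` as the positive class, and the toy labels are assumptions.

```python
import random

def f1_score(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # also avoids division by zero when nothing is predicted positive
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def bootstrap_f1(y_true, y_pred, n_rounds=200, seed=0):
    """Resample index positions with replacement and score each resample."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_rounds):
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(f1_score([y_true[i] for i in idx],
                               [y_pred[i] for i in idx]))
    return scores

# Toy labels (made up): 1 = defective bearing, 0 = healthy bearing.
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 1, 0, 1, 0, 1]
point_estimate = f1_score(y_true, y_pred)
scores = bootstrap_f1(y_true, y_pred)
```

The spread of `scores` gives a rough sense of how much the metric varies with the sample, which matters when the data set is small and imbalanced.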

VALIDATION OF CLUSTERING METHODS FOR MEDICAL DATA SETS

Acta healthmedica ◽

10.19082/ah116 ◽

2017 ◽

Vol 2 (1) ◽

pp. 116-116

Author(s):

Azam Orooji ◽

Farzaneh Kermani

Keyword(s):

Medical Data ◽

Data Sets ◽

Clustering Methods

Animal Sound Classification Using Dissimilarity Spaces

Applied Sciences ◽

10.3390/app10238578 ◽

2020 ◽

Vol 10 (23) ◽

pp. 8578

Author(s):

Loris Nanni ◽

Sheryl Brahnam ◽

Alessandra Lumini ◽

Gianluca Maguolo

Keyword(s):

Ad Hoc ◽

Space Representation ◽

Support Vector ◽

Clustering Methods ◽

Audio Classification ◽

Environmental Sound ◽

Clustering Techniques ◽

Sound Classification ◽

Animal Vocalization ◽

Siamese Networks

The classifier system proposed in this work combines the dissimilarity spaces produced by a set of Siamese neural networks (SNNs) designed using four different backbones with different clustering techniques for training SVMs for automated animal audio classification. The system is evaluated on two animal audio datasets: one for cat and another for bird vocalizations. The proposed approach uses clustering methods to determine a set of centroids (in both a supervised and unsupervised fashion) from the spectrograms in the dataset. Such centroids are exploited to generate the dissimilarity space through the Siamese networks. In addition to feeding the SNNs with spectrograms, experiments process the spectrograms using the heterogeneous auto-similarities of characteristics. Once the similarity spaces are computed, each pattern is “projected” into the space to obtain a vector space representation; this descriptor is then coupled to a support vector machine (SVM) to classify a spectrogram by its dissimilarity vector. Results demonstrate that the proposed approach performs competitively (without ad-hoc optimization of the clustering methods) on both animal vocalization datasets. To further demonstrate the power of the proposed system, the best standalone approach is also evaluated on the challenging Dataset for Environmental Sound Classification (ESC50) dataset.

Nearest labelset using double distances for multi-label classification

PeerJ Computer Science ◽

10.7717/peerj-cs.242 ◽

2019 ◽

Vol 5 ◽

pp. e242

Author(s):

Hyukjun Gweon ◽

Matthias Schonlau ◽

Stefan H. Steiner

Keyword(s):

Maximum Likelihood ◽

Supervised Learning ◽

Feature Space ◽

Training Data ◽

Model Parameters ◽

Data Sets ◽

Weighted Sum ◽

Novel Approach ◽

Binomial Regression ◽

F Measure

Multi-label classification is a type of supervised learning where an instance may belong to multiple labels simultaneously. Predicting each label independently has been criticized for not exploiting any correlation between labels. In this article we propose a novel approach, Nearest Labelset using Double Distances (NLDD), that predicts the labelset observed in the training data that minimizes a weighted sum of the distances in both the feature space and the label space to the new instance. The weights specify the relative tradeoff between the two distances. The weights are estimated from a binomial regression of the number of misclassified labels as a function of the two distances. Model parameters are estimated by maximum likelihood. NLDD only considers labelsets observed in the training data, thus implicitly taking into account label dependencies. Experiments on benchmark multi-label data sets show that the proposed method on average outperforms other well-known approaches in terms of 0/1 loss and multi-label accuracy, and ranks second on the F-measure (after a method called ECC) and on Hamming loss (after a method called RF-PCT).
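The selection rule described above, choosing the training labelset that minimizes a weighted sum of two distances, can be sketched as below. Several points are assumptions made for illustration: both distances are Euclidean, the label-space position of the new instance is given as a vector of per-label probability estimates (`p_new`), and the weights are passed in directly rather than estimated by binomial regression as in the article. The function name `nearest_labelset` is hypothetical.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_labelset(x_new, p_new, train_x, train_y,
                     w_feature=1.0, w_label=1.0):
    """Return the observed labelset with the smallest weighted double distance.

    x_new   : feature vector of the new instance
    p_new   : per-label probability estimates for the new instance (assumption)
    train_x : training feature vectors
    train_y : training labelsets as 0/1 tuples
    """
    best_cost, best_labelset = None, None
    for x, y in zip(train_x, train_y):
        cost = w_feature * euclidean(x_new, x) + w_label * euclidean(p_new, y)
        if best_cost is None or cost < best_cost:
            best_cost, best_labelset = cost, y
    return best_labelset

# Toy training data (made up for illustration).
train_x = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
train_y = [(1, 0, 0), (1, 1, 0), (0, 0, 1)]

pred_both = nearest_labelset([0.9, 0.1], (0.9, 0.8, 0.1), train_x, train_y)
# With w_feature = 0 only the label-space distance matters.
pred_label_only = nearest_labelset([0.9, 0.1], (0.1, 0.1, 0.9),
                                   train_x, train_y, w_feature=0.0)
```

Because only labelsets seen in training can be returned, label dependencies are respected implicitly, which is the property the abstract highlights.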

Training data sets for TensorFlow models from TeleEcho data.

10.35543/osf.io/jrk4y ◽

2020 ◽

Author(s):

Anil Kumar Bheemaiah

Keyword(s):

Decision Support ◽

Gpu Computing ◽

Feature Space ◽

Training Data ◽

Data Sets ◽

Tensor Model ◽

Accelerometer Data ◽

Data Repositories ◽

Digital Medicine ◽

Mental Wellness

Abstract: Data streams are persisted and visualized for a practice of biofeedback based therapy, with the option of @edge decision support for premium services, in the form of on-demand telemedical services and CDS based decision support services, and integrated services like Amazon Pharmacy.

Keywords: Digital Medicine, CDS HL7 webhooks, bio-feedback, LSL streams, AWS S3, Wolfram cloud, feature extraction functions, visualization of filters.

What: Extraction of data by data-mining from hyperscale data from tele-echo data repositories, to create training data sets for a specific thread for Tensorflow model templates for transfer learning, with deployment of pre-trained networks using TensorFlow lite. Pre-Trained models are evaluated for prediction accuracy in integrated feature space and classification fitness models, for scalable deployment.

How: We consider the use of TensorFlow Models, and train the models on an EC2 P3 image using GPU computing on SageMaker, using a Thread for the purpose. We consider the creation of the following: A MUSE 2 headset for PPG, Gyro Accelerometer data for breath and heart diagnostics is made using a python script and a 1D tensor model. (alexandrebarachant n.d.; “tf.nn.conv1d | TensorFlow Core r2.0” n.d., “tf.keras.layers.Conv1D | TensorFlow Core r2.0” n.d., “Tensorflow - Math behind 1D Convolution with Advanced Examples in TF | Tensorflow Tutorial” n.d.; Lee 2018)

Why: Digital Medicine is accessible in the mental wellness community with an EEG wearable such as MUSE 2, which has ppg and accelerometer data which can be data mined with a classifier 1D convolution Tensor Net for detecting any anomalies, requiring telemedicine.
