Robust Semi-Supervised Manifold Learning Algorithm for Classification

Mathematical Problems in Engineering ◽

10.1155/2018/2382803 ◽

2018 ◽

Vol 2018 ◽

pp. 1-8 ◽

Cited By ~ 1

Author(s):

Mingxia Chen ◽

Jing Wang ◽

Xueqing Li ◽

Xiaolong Sun

Keyword(s):

Manifold Learning ◽

Optimization Model ◽

Learning Algorithm ◽

Learning Algorithms ◽

Data Sets ◽

Regularization Term ◽

Partially Labeled Data ◽

Label Information ◽

Low Dimensional ◽

The Impact

In recent years, manifold learning methods have been widely used in data classification to tackle the curse of dimensionality, since they can discover the intrinsic low-dimensional structure of high-dimensional data. Given partially labeled data, semi-supervised manifold learning algorithms have been proposed to predict the labels of the unlabeled points by taking the label information into account. However, these semi-supervised manifold learning algorithms are not robust against noisy points, especially when the labeled data contain noise. In this paper, we propose a framework for robust semi-supervised manifold learning (RSSML) to address this problem. The noise levels of the labeled points are first predicted, and then a regularization term is constructed to reduce the impact of labeled points containing noise. A new robust semi-supervised optimization model is proposed by adding the regularization term to the traditional semi-supervised optimization model. Numerical experiments are given to show the improvement and efficiency of RSSML on noisy data sets.

LTSA Algorithm for Dimension Reduction of Microarray Data

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.645.192 ◽

2013 ◽

Vol 645 ◽

pp. 192-195 ◽

Cited By ~ 1

Author(s):

Xiao Zhou Chen

Keyword(s):

Dimensionality Reduction ◽

Dimension Reduction ◽

Manifold Learning ◽

Microarray Data ◽

Learning Algorithm ◽

Medical Applications ◽

Data Sets ◽

Learning Method ◽

Data Dimensionality Reduction ◽

Low Dimensional

Dimension reduction is an important step in understanding microarray data. In this study, we propose an efficient approach for dimensionality reduction of microarray data. Our method applies the manifold learning algorithm to the dimensionality reduction of microarray data. The intra-/inter-category distances were used as the criteria to quantitatively evaluate the effects of data dimensionality reduction. Colon cancer and leukaemia gene expression datasets were selected for our investigation. When the neighborhood parameter was set effectively, the intrinsic dimensions of all data sets were low. Therefore, manifold learning is used to study microarray data in the low-dimensional projection space. Our results indicate that the manifold learning method performs better than linear methods in the analysis of microarray data, which makes it suitable for clinical diagnosis and other medical applications.
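
The intra-/inter-category distance criterion mentioned in this abstract can be illustrated with a small sketch. The abstract does not give the exact formula, so the version below is an assumption: it compares the mean pairwise distance within categories to the mean pairwise distance across categories. The function name `intra_inter_distance`, the toy embedding, and the labels are all hypothetical.

```python
from itertools import combinations
from math import dist
from statistics import mean


def intra_inter_distance(points, labels):
    """Return (mean intra-category distance, mean inter-category distance)."""
    intra, inter = [], []
    # Visit every unordered pair of embedded points exactly once.
    for (p, lp), (q, lq) in combinations(zip(points, labels), 2):
        (intra if lp == lq else inter).append(dist(p, q))
    return mean(intra), mean(inter)


# Toy 2-D embedding with two categories (made-up values, not microarray data).
embedding = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]
labels = ["tumor", "tumor", "normal", "normal"]

intra, inter = intra_inter_distance(embedding, labels)
ratio = intra / inter  # a smaller ratio means tighter, better-separated categories
```

Under this assumed definition, an embedding that keeps same-category samples close and different-category samples far apart yields a ratio well below 1.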

Classification of Infrared Objects in Manifold Space Using Kullback-Leibler Divergence of Gaussian Distributions of Image Points

Symmetry ◽

10.3390/sym12030434 ◽

2020 ◽

Vol 12 (3) ◽

pp. 434 ◽

Cited By ~ 2

Author(s):

Huilin Ge ◽

Zhiyu Zhu ◽

Kang Lou ◽

Wei Wei ◽

Runbang Liu ◽

...

Keyword(s):

Manifold Learning ◽

Gaussian Distribution ◽

Classification Accuracy ◽

Infrared Image ◽

Data Sets ◽

Dimensional Manifold ◽

Infrared Images ◽

Leibler Divergence ◽

Data Points ◽

Low Dimensional

Infrared image recognition technology can work day and night and has a long detection distance. However, infrared objects carry less prior information, and external factors in the real-world environment easily interfere with them. Therefore, infrared object classification is a very challenging research area. Manifold learning can be used to improve the classification accuracy of infrared images in the manifold space. In this article, we propose a novel manifold learning algorithm for infrared object detection and classification. First, a manifold space is constructed with each pixel of the infrared object image as a dimension. Infrared images are represented as data points in this constructed manifold space. Next, we model the probability distribution of the infrared data points with a Gaussian distribution in the manifold space. Then, based on the Gaussian distribution information in the manifold space, the distribution characteristics of the data points of the infrared image in the low-dimensional space are derived. The proposed algorithm uses the Kullback-Leibler (KL) divergence to minimize the loss function between two symmetrical distributions, and finally completes the classification in the low-dimensional manifold space. The efficiency of the algorithm is validated on two public infrared image data sets. The experiments show that the proposed method achieves 97.46% classification accuracy and competitive speed on the analyzed data sets.
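
For readers unfamiliar with the KL divergence used as the loss here, the following is a minimal sketch of the standard discrete definition, KL(P || Q) = sum_i p_i log(p_i / q_i). It is not the authors' implementation; the abstract does not specify how the two distributions are built, so `p` and `q` below are made-up example distributions.

```python
from math import log


def kl_divergence(p, q):
    """KL(P || Q) for two discrete distributions given as equal-length lists."""
    # Terms with p_i == 0 contribute nothing by convention.
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical distributions over two outcomes.
p = [0.5, 0.5]
q = [0.25, 0.75]

kl_pq = kl_divergence(p, q)  # positive because p and q differ
kl_pp = kl_divergence(p, p)  # zero because the distributions are identical
```

The divergence is zero only when the two distributions match, which is why minimizing it pulls the low-dimensional distribution toward the high-dimensional one.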

Enhancing Both Efficiency and Representational Capability of Isomap by Extensive Landmark Selection

Mathematical Problems in Engineering ◽

10.1155/2015/241436 ◽

2015 ◽

Vol 2015 ◽

pp. 1-18 ◽

Cited By ~ 2

Author(s):

Dong Liang ◽

Chen Qiao ◽

Zongben Xu

Keyword(s):

Manifold Learning ◽

Computational Efficiency ◽

New Method ◽

Data Sets ◽

Learning Approaches ◽

Real World Data ◽

Efficiency Property ◽

Data Points ◽

Low Dimensional ◽

High Computational Efficiency

The problems of improving computational efficiency and extending representational capability are the two hottest topics in global manifold learning approaches. In this paper, a new method called extensive landmark Isomap (EL-Isomap) is presented, addressing both topics simultaneously. On the one hand, originating from landmark Isomap (L-Isomap), which is known for its high computational efficiency, EL-Isomap also achieves high computational efficiency by using a small set of landmarks to embed all data points. On the other hand, EL-Isomap significantly extends the representational capability of L-Isomap and other global manifold learning approaches by using only an available subset of the whole landmark set, instead of all landmarks, to embed each point. In particular, compared with other manifold learning approaches, data manifolds with intrinsic low-dimensional concave topologies and essential loops can be unwrapped more successfully by the new method, as shown by simulation results on a series of synthetic and real-world data sets. Moreover, the accuracy, robustness, and computational complexity of EL-Isomap are analyzed in this paper, and the relation between EL-Isomap and L-Isomap is also discussed theoretically.
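
The efficiency gain of landmark-based Isomap comes from computing geodesic (shortest-path) distances only from a few landmarks rather than between all pairs of points. The sketch below illustrates that one step with Dijkstra's algorithm on a toy neighborhood graph. It does not implement EL-Isomap's landmark subset selection, which the abstract does not detail; the graph, the landmark choice, and the function name `geodesic_from` are hypothetical.

```python
import heapq


def geodesic_from(source, graph):
    """Dijkstra shortest-path distances from one node over a neighborhood graph."""
    distances = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > distances.get(node, float("inf")):
            continue  # stale queue entry, a shorter path was already found
        for neighbor, weight in graph[node]:
            candidate = d + weight
            if candidate < distances.get(neighbor, float("inf")):
                distances[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return distances


# Toy neighborhood graph: a chain 0-1-2-3 with unit edge lengths.
graph = {
    0: [(1, 1.0)],
    1: [(0, 1.0), (2, 1.0)],
    2: [(1, 1.0), (3, 1.0)],
    3: [(2, 1.0)],
}
landmarks = [0, 3]

# One shortest-path run per landmark instead of one per data point.
landmark_distances = {lm: geodesic_from(lm, graph) for lm in landmarks}
```

With k landmarks and n points this produces a k-by-n distance table, which is what a landmark method embeds from instead of the full n-by-n matrix.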

Improved Isomap Algorithm Based on Supervision

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.427-429.1896 ◽

2013 ◽

Vol 427-429 ◽

pp. 1896-1899 ◽

Cited By ~ 1

Author(s):

Zhong Yong Wu ◽

Li Li Gan

Keyword(s):

Manifold Learning ◽

Isometric Embedding ◽

Learning Algorithm ◽

Learning Algorithms ◽

Classification Problem ◽

Classification Problems ◽

Series Of Experiments ◽

Effective Use

This paper focuses on the Isomap isometric embedding algorithm and proposes an improved supervised isometric embedding algorithm (SIsomap). The supervised manifold learning algorithm introduces adjustable parameters so that class information is used effectively in classification problems, which makes manifold learning algorithms more effective for classification. Finally, a series of experiments illustrates the effectiveness of the proposed improvement: the proposed supervised manifold learning algorithm can more effectively enhance manifold learning algorithms for classification problems.

Transductive Bounds for the Multi-Class Majority Vote Classifier

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33013566 ◽

2019 ◽

Vol 33 ◽

pp. 3566-3573

Author(s):

Vasilii Feofanov ◽

Emilie Devijver ◽

Massih-Reza Amini

Keyword(s):

Learning Algorithm ◽

Majority Vote ◽

Confusion Matrix ◽

Data Sets ◽

Bayes Classifier ◽

Partially Labeled Data ◽

Margin Distribution ◽

Training Examples ◽

Multi Class Classification ◽

Self Learning

In this paper, we propose a transductive bound over the risk of the majority vote classifier learned with partially labeled data for multi-class classification. The bound is obtained by considering the class confusion matrix as an error indicator, and it involves the margin distribution of the classifier over each class and a bound over the risk of the associated Gibbs classifier. When this latter bound is tight and the errors of the majority vote classifier per class are concentrated in a low-margin zone, we prove that the bound over the Bayes classifier's risk is tight. As an application, we extend the self-learning algorithm to the multi-class case. The algorithm iteratively assigns pseudo-labels to a subset of unlabeled training examples whose associated class margin is above a threshold obtained from the proposed transductive bound. Empirical results on different data sets show the effectiveness of our approach compared to the same algorithm where the threshold is fixed manually, to the extension of TSVM to multi-class classification, and to a graph-based semi-supervised algorithm.
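
The pseudo-labeling step of self-learning described above can be sketched as follows. This is only an illustration of the selection rule with a manually fixed threshold, which is the baseline the abstract compares against; the paper derives its threshold from the transductive bound, and that derivation is not reproduced here. The vote shares stand in for the class margin, and the function name `pseudo_label` and all values are hypothetical.

```python
def pseudo_label(vote_scores, threshold):
    """Select unlabeled examples whose top class vote share reaches the threshold.

    vote_scores: list of dicts mapping class -> vote share of the ensemble.
    Returns a dict {example index: pseudo-label}.
    """
    selected = {}
    for i, scores in enumerate(vote_scores):
        label = max(scores, key=scores.get)  # class with the largest vote share
        if scores[label] >= threshold:
            selected[i] = label
    return selected


# Made-up vote shares for three unlabeled examples over classes a, b, c.
votes = [
    {"a": 0.9, "b": 0.1, "c": 0.0},
    {"a": 0.4, "b": 0.35, "c": 0.25},
    {"a": 0.1, "b": 0.2, "c": 0.7},
]
chosen = pseudo_label(votes, threshold=0.6)
```

In a full self-learning loop, the chosen examples would be moved into the labeled set, the classifier retrained, and the selection repeated until no example passes the threshold.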

Performance Assessment of Learning Algorithms on Multi-Domain Data Sets

International Journal of Knowledge Discovery in Bioinformatics ◽

10.4018/ijkdb.2018010103 ◽

2018 ◽

Vol 8 (1) ◽

pp. 27-41

Author(s):

Amit Kumar ◽

Bikash Kanti Sarkar

Keyword(s):

Data Mining ◽

Performance Assessment ◽

Learning Algorithm ◽

Wide Spectrum ◽

Learning Algorithms ◽

Complex Nature ◽

Data Sets ◽

Real World Data ◽

Comparative Performance ◽

Assessment Of Learning

This article describes how, over the last few decades, data mining research has made significant progress in a wide spectrum of applications. Research in prediction of multi-domain data sets is a challenging task due to the imbalanced, voluminous, conflicting, and complex nature of data sets. A learning algorithm is the most important technique for solving these problems. Learning algorithms are widely used for classification purposes, but choosing the learners that perform best for data sets of particular domains is a challenging task in data mining. This article provides a comparative performance assessment of various state-of-the-art learning algorithms over multi-domain data sets to search for the effective classifier(s) for a particular domain, e.g., artificial, natural, semi-natural, etc. In the present article, a total of 14 real-world data sets are selected from the University of California, Irvine (UCI) machine learning repository for conducting experiments using three competent individual learners and their hybrid combinations.

Wearable Devices Data for Activity Prediction Using Machine Learning Algorithms

International Journal of Big Data and Analytics in Healthcare ◽

10.4018/ijbdah.2019010103 ◽

2019 ◽

Vol 4 (1) ◽

pp. 32-46

Author(s):

Lakshmi Prayaga ◽

Krishna Devulapalli ◽

Chandra Prayaga

Keyword(s):

Machine Learning ◽

Learning Algorithm ◽

Learning Algorithms ◽

Wearable Devices ◽

Machine Learning Algorithms ◽

Embedded Sensors ◽

Data Sets ◽

Activity Prediction ◽

Related Data ◽

Recent Trends

Wearable devices are contributing heavily towards the proliferation of data and creating a rich minefield for data analytics. Recent trends in the design of wearable devices include several embedded sensors which also provide useful data for many applications. This research presents results obtained from studying human-activity-related data collected from wearable devices. The activities considered for this study were working at the computer, standing and walking, standing, walking, walking up and down the stairs, and talking while walking. The research entails the use of a portion of the data to train machine learning algorithms and build a model. The rest of the data is used as test data for predicting the activity of an individual. Details of data collection, processing, and presentation are also discussed. After studying the literature and the data sets, a Random Forest machine learning algorithm was determined to be the most applicable algorithm for analyzing data from wearable devices. The software used in this research includes the R statistical package and the SensorLog app.

Multi-label feature selection based on logistic regression and manifold learning

10.20944/preprints202107.0341.v1 ◽

2021 ◽

Author(s):

Yao Zhang ◽

Yingcang Ma ◽

Xiaofei Yang

Keyword(s):

Logistic Regression ◽

Feature Selection ◽

Manifold Learning ◽

High Dimensional ◽

Data Sets ◽

Efficiency Improvement ◽

Effective Technique ◽

Sparse Regularization ◽

Label Information ◽

Feature Weight

Like traditional single-label learning, multi-label learning also faces the curse of dimensionality. Feature selection is an effective technique for dimensionality reduction and learning efficiency improvement of high-dimensional data. In this paper, logistic regression, manifold learning, and sparse regularization are combined to construct a joint framework for multi-label feature selection (LMFS). Firstly, the sparsity of the feature weight matrix is constrained by the $L_{2,1}$-norm. Secondly, the feature manifold and label manifold constrain the feature weight matrix to make it fit the data information and label information better. An iterative updating algorithm is designed and the convergence of the algorithm is proved. Finally, the LMFS algorithm is compared with DRMFS, SCLS, and other algorithms on eight classical multi-label data sets. The experimental results show the effectiveness of the LMFS algorithm.

Increasing the Accuracy of Predictive Algorithms

Encyclopedia of Information Science and Technology, Second Edition ◽

10.4018/978-1-60566-026-4.ch300 ◽

2011 ◽

pp. 1906-1910

Author(s):

Sotiris Kotsiantis ◽

Dimitris Kanellopoulos ◽

Panayotis Pintelas

Keyword(s):

Artificial Intelligence ◽

Machine Learning ◽

Bayesian Networks ◽

Learning Algorithm ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Training Data ◽

Data Sets ◽

Combining Classifiers ◽

Predictive Algorithms

In classification learning, the learning scheme is presented with a set of classified examples from which it is expected to learn a way of classifying unseen examples (see Table 1). Formally, the problem can be stated as follows: Given training data {(x1, y1)…(xn, yn)}, produce a classifier h: X → Y that maps an object x ∈ X to its classification label y ∈ Y. A large number of classification techniques have been developed based on artificial intelligence (logic-based techniques, perceptron-based techniques) and statistics (Bayesian networks, instance-based techniques). No single learning algorithm can uniformly outperform other algorithms over all data sets. The concept of combining classifiers is proposed as a new direction for the improvement of the performance of individual machine learning algorithms. Numerous methods have been suggested for the creation of ensembles of classifiers (Dietterich, 2000). Although, or perhaps because, many methods of ensemble creation have been proposed, there is as yet no clear picture of which method is best.
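
One simple way to combine classifiers is plurality (majority) voting, sketched below. This is only one of the many ensemble schemes the abstract alludes to, chosen here as an illustration. Each classifier plays the role of a function h: X → Y; the three threshold rules and the function name `majority_vote` are hypothetical.

```python
from collections import Counter


def majority_vote(classifiers, x):
    """Return the label predicted by the largest number of classifiers for x."""
    votes = Counter(h(x) for h in classifiers)
    return votes.most_common(1)[0][0]


# Three toy classifiers h: X -> Y with different decision thresholds.
classifiers = [
    lambda x: "pos" if x > 0 else "neg",
    lambda x: "pos" if x > 2 else "neg",
    lambda x: "pos" if x > -1 else "neg",
]

label_at_1 = majority_vote(classifiers, 1)        # two of three vote "pos"
label_at_minus_2 = majority_vote(classifiers, -2)  # all three vote "neg"
```

The ensemble can be right even when one member is wrong, which is the basic motivation for combining classifiers.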

Nonlinear Feature Extraction Through Manifold Learning in an Electronic Tongue Classification Task

Sensors ◽

10.3390/s20174834 ◽

2020 ◽

Vol 20 (17) ◽

pp. 4834

Author(s):

Jersson X. Leon-Medina ◽

Maribel Anaya ◽

Francesc Pozo ◽

Diego Tibaduiza

Keyword(s):

Feature Extraction ◽

Manifold Learning ◽

Classification Accuracy ◽

Learning Algorithm ◽

Learning Algorithms ◽

Electronic Tongue ◽

Locally Linear Embedding ◽

Data Set ◽

Nonlinear Feature Extraction ◽

Nonlinear Feature

A nonlinear feature extraction-based approach using manifold learning algorithms is developed in order to improve the classification accuracy in an electronic tongue sensor array. The developed signal processing methodology is composed of four stages: data unfolding, scaling, feature extraction, and classification. This study aims to compare seven manifold learning algorithms: Isomap, Laplacian Eigenmaps, Locally Linear Embedding (LLE), modified LLE, Hessian LLE, Local Tangent Space Alignment (LTSA), and t-Distributed Stochastic Neighbor Embedding (t-SNE) to find the best classification accuracy in a multifrequency large-amplitude pulse voltammetry electronic tongue. A sensitivity study of the parameters of each manifold learning algorithm is also included. A data set of seven different aqueous matrices is used to validate the proposed data processing methodology. Leave-one-out cross-validation was employed on 63 samples. The best accuracy (96.83%) was obtained when the methodology uses Mean-Centered Group Scaling (MCGS) for data normalization, the t-SNE algorithm for feature extraction, and k-nearest neighbors (kNN) as the classifier.
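
The validation scheme named in this abstract, leave-one-out cross-validation with a kNN classifier, can be sketched in a few lines. The sketch covers only the classification stage; the unfolding, MCGS scaling, and t-SNE stages are not reproduced. The feature vectors and the class names `water` and `juice` are made-up placeholders, not the paper's data.

```python
from collections import Counter
from math import dist


def knn_predict(train, query, k=1):
    """Predict the most common label among the k nearest training samples."""
    neighbors = sorted(train, key=lambda item: dist(item[0], query))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]


def leave_one_out_accuracy(samples, k=1):
    """Hold out each sample once, train on the rest, and average the hits."""
    hits = 0
    for i, (features, label) in enumerate(samples):
        train = samples[:i] + samples[i + 1:]
        hits += knn_predict(train, features, k) == label
    return hits / len(samples)


# Toy 2-D features for two well-separated classes (hypothetical values).
samples = [
    ((0.0, 0.0), "water"), ((0.1, 0.2), "water"), ((0.2, 0.1), "water"),
    ((3.0, 3.0), "juice"), ((3.1, 2.9), "juice"), ((2.9, 3.2), "juice"),
]
accuracy = leave_one_out_accuracy(samples, k=1)
```

Leave-one-out is a natural choice for a small data set such as the 63 samples mentioned above, because every sample is used for testing exactly once while the rest are used for training.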
