A System for Outlier Detection of High Dimensional Data

International Journal of Computer Science and Informatics ◽

10.47893/ijcsi.2012.1037 ◽

2012 ◽

pp. 197-201

Author(s):

Bharat Gupta ◽

Durga Toshniwal

Keyword(s):

Outlier Detection ◽

High Dimensional Data ◽

Research Problem ◽

High Dimensional ◽

Full Data ◽

Data Set ◽

Detection Techniques ◽

New Concepts ◽

Low Dimensional ◽

Important Research Problem

In high-dimensional data, a large number of outliers are embedded in low-dimensional subspaces; these are known as projected outliers. Most existing outlier detection techniques cannot find projected outliers because they search for abnormal patterns in the full data space. Outlier detection in high-dimensional data is therefore an important research problem. In this paper we propose an approach for outlier detection in high-dimensional data. We modify the existing SPOT approach by adding three new concepts: adaptation of the Sparse Sub-Space Template (SST), different combinations of PCS parameters, and a set of non-outlying cells for the testing data set.
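
To illustrate what a projected outlier is, here is a minimal sketch in Python. It is not the SPOT method or the authors' modification of it; the grid-cell counting, the function name `sparse_points`, and the parameters `cell_width` and `min_count` are assumptions made only for this example. The last point looks ordinary in every single dimension but sits alone in one two-dimensional subspace.

```python
from collections import Counter
from itertools import combinations

def sparse_points(points, subspace, cell_width, min_count):
    """Return indices of points whose grid cell in `subspace` holds fewer than `min_count` points."""
    def cell(p):
        # Project the point onto the chosen dimensions and bucket it into a grid cell.
        return tuple(int(p[d] // cell_width) for d in subspace)
    counts = Counter(cell(p) for p in points)
    return {i for i, p in enumerate(points) if counts[cell(p)] < min_count}

# Two dense groups plus one point that mixes coordinates from both groups.
cluster_a = [(0.1 + 0.1 * k, 0.1 + 0.1 * k, 0.5) for k in range(5)]
cluster_b = [(2.1 + 0.1 * k, 2.1 + 0.1 * k, 0.5) for k in range(5)]
points = cluster_a + cluster_b + [(0.3, 2.3, 0.5)]  # index 10 is the planted outlier

# Check every 1-D and 2-D subspace of the 3 dimensions (hypothetical settings).
findings = {}
for size in (1, 2):
    for subspace in combinations(range(3), size):
        hits = sparse_points(points, subspace, cell_width=1.0, min_count=3)
        if hits:
            findings[subspace] = hits

print(findings)
```

Only the subspace spanned by dimensions 0 and 1 flags the planted point, which is why a search restricted to single dimensions or to the full space with a coarse criterion can miss it.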

A Novel Density-based Technique for Outlier Detection of High Dimensional Data Utilizing Full Feature Space

Information Technology And Control ◽

10.5755/j01.itc.50.1.25588 ◽

2021 ◽

Vol 50 (1) ◽

pp. 138-152

Author(s):

Mujeeb Ur Rehman ◽

Dost Muhammad Khan

Keyword(s):

Data Mining ◽

Outlier Detection ◽

High Dimensional Data ◽

Research Work ◽

Feature Space ◽

High Dimensional ◽

Data Set ◽

Data Points ◽

Low Dimensional ◽

Intrinsic Feature

Anomaly detection has recently attracted growing attention from data mining researchers, as its use has risen steadily in practical domains such as product marketing, fraud detection, medical diagnosis, and fault detection. Outlier detection in high-dimensional data poses exceptional challenges because of the curse of dimensionality, under which distant and neighboring points come to resemble one another. Traditional algorithms and techniques for outlier detection were evaluated on the full feature space. Conventional methods concentrate largely on low-dimensional data and are therefore ineffective at discovering anomalies in data sets with a large number of dimensions. Finding anomalies in a high-dimensional data set becomes very difficult and tedious when all subsets of projections need to be explored. All data points in high-dimensional data behave like similar observations because of an intrinsic feature: the relative difference between the distances to the nearest and farthest observations approaches zero as the number of dimensions tends to infinity. This research work proposes a novel technique that explores the deviation among all data points and embeds its findings in well-established density-based techniques. The technique opens a new direction of research toward resolving the inherent problems of high-dimensional data where outliers reside within clusters of different densities. A high-dimensional data set from the UCI Machine Learning Repository is used to test the proposed technique, and its results are compared with those of density-based techniques to evaluate its efficiency.
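
The distance behavior this abstract describes can be observed numerically. The sketch below is an illustration only, not the paper's technique: it assumes points drawn uniformly from the unit hypercube and measures the relative contrast (d_max - d_min) / d_min of the distances from one query point. The function name `relative_contrast` and all parameter values are choices made for this example.

```python
import math
import random

def relative_contrast(n_points, n_dims, rng):
    """(d_max - d_min) / d_min over distances from one random query to n_points random points."""
    query = [rng.random() for _ in range(n_dims)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(n_dims)]
        dists.append(math.dist(query, point))  # Euclidean distance (Python 3.8+)
    d_min, d_max = min(dists), max(dists)
    return (d_max - d_min) / d_min

rng = random.Random(0)  # fixed seed so repeated runs give the same numbers
contrasts = {d: relative_contrast(500, d, rng) for d in (2, 20, 200, 2000)}
for d, c in contrasts.items():
    print(f"dims={d:5d}  relative contrast={c:.3f}")
```

The contrast shrinks as the dimension grows, so the nearest and farthest neighbors become hard to tell apart by distance alone. The absolute distances themselves grow with the dimension; it is their relative spread that collapses.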

Outlier Detection in the Framework of Dimensionality Reduction

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001415500172 ◽

2015 ◽

Vol 29 (04) ◽

pp. 1550017 ◽

Cited By ~ 3

Author(s):

Qiang Ye ◽

Weifeng Zhi

Keyword(s):

Dimensionality Reduction ◽

Outlier Detection ◽

Nonlinear Models ◽

High Dimensional Data ◽

Detection Algorithm ◽

High Dimensional ◽

Dimensional Manifold ◽

Data Set ◽

Manifold Models ◽

Low Dimensional

We propose an effective outlier detection algorithm for high-dimensional data. We consider manifold models of data as is typically assumed in dimensionality reduction/manifold learning. Namely, we consider a noisy data set sampled from a low-dimensional manifold in a high-dimensional data space. Our algorithm uses local geometric structure to determine inliers, from which the outliers are identified. The algorithm is applicable to both linear and nonlinear models of data. We also discuss various implementation issues and we present several examples to demonstrate the effectiveness of the new approach.

Outlier Detection Algorithm Basing on Similarity Measurement Relation

Advanced Engineering Forum ◽

10.4028/www.scientific.net/aef.6-7.621 ◽

2012 ◽

Vol 6-7 ◽

pp. 621-624

Author(s):

Hong Bin Fang

Keyword(s):

Outlier Detection ◽

Credit Card ◽

High Dimensional Data ◽

Detection Algorithm ◽

Experimental Result ◽

Similarity Measurement ◽

High Dimensional ◽

Data Set ◽

Network Intrusion ◽

Metric Function

Outlier detection is an important field of data mining and is widely used in credit card fraud detection, network intrusion detection, and similar applications. The paper defines a similarity metric function for high-dimensional data and the concept of class density, based on a combination of hierarchical clustering and similarity. After redefining density outliers for high dimensions, it presents an outlier detection algorithm based on similarity measurement. The experimental results indicate that the algorithm has some value for detecting outliers in high-dimensional data sets.

An Efficient Method to Detect Outliers in High Dimensional Data

Journal of Computational and Theoretical Nanoscience ◽

10.1166/jctn.2019.8274 ◽

2019 ◽

Vol 16 (9) ◽

pp. 3938-3944

Author(s):

Atul Garg ◽

Kamaljeet Kaur

Keyword(s):

Outlier Detection ◽

High Dimensional Data ◽

Detection Algorithm ◽

High Dimensional ◽

The Novel ◽

Linear Discriminant ◽

Detection Techniques ◽

Detection Algorithms ◽

Large Sets ◽

Detection Of Outliers

Detecting outliers or anomalies in high-dimensional data remains a great challenge. Outlier detection techniques distinguish normal data from anomalous data by classifying new data as normal or abnormal. Researchers have proposed many outlier detection algorithms for high-dimensional data, and each has its own benefits and limitations. This work considers a few of them: the Dice-Coefficient Index (DCI), the MapReduce function, and Linear Discriminant Analysis (LDA). MapReduce is used to overcome the problem of large data sets, and LDA is used to reduce the dimensionality of the data. A novel Hybrid Outlier Detection Algorithm (HbODA) is proposed for efficient detection of outliers in high-dimensional data. The performance of the hybrid algorithm is analyzed in terms of efficiency, accuracy, computation cost, precision, recall, and related measures. Experimental results on large real data sets show that the proposed algorithm detects outliers better than other traditional methods.
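
For readers unfamiliar with the Dice coefficient named in this abstract, a minimal Python sketch of the standard set-based definition, 2|A ∩ B| / (|A| + |B|), follows. How HbODA applies it is not stated in the abstract, so the feature sets below are hypothetical and this is not the paper's algorithm.

```python
def dice_coefficient(a, b):
    """Dice similarity of two collections treated as sets: 2|A & B| / (|A| + |B|)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen for this example: two empty sets are identical
    return 2 * len(a & b) / (len(a) + len(b))

# Hypothetical feature sets for two records; two of three features are shared.
similarity = dice_coefficient({"tcp", "http", "port80"}, {"tcp", "http", "port443"})
print(round(similarity, 3))
```

A record whose Dice similarity to every other record is low shares few features with the rest of the data, which is one simple way such an index could feed an outlier score.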

Visualization of High-Dimensional Data by Pairwise Fusion Matrices Using t-SNE

Symmetry ◽

10.3390/sym11010107 ◽

2019 ◽

Vol 11 (1) ◽

pp. 107 ◽

Cited By ~ 6

Author(s):

Mujtaba Husnain ◽

Malik Missen ◽

Shahzad Mumtaz ◽

Muhammad Luqman ◽

Mickaël Coustaty ◽

...

Keyword(s):

Local Structure ◽

High Dimensional Data ◽

Three Dimensional ◽

Principal Component ◽

Large Data ◽

High Dimensional ◽

Data Set ◽

Novel Approach ◽

Critical Issues ◽

Low Dimensional

We applied t-distributed stochastic neighbor embedding (t-SNE) to visualize Urdu handwritten numerals (or digits). The data set used consists of 28 × 28 images of handwritten Urdu numerals. The data set was created by inviting authors from different categories of native Urdu speakers. One of the challenging and critical issues for the correct visualization of Urdu numerals is shape similarity between some of the digits. This issue was resolved using t-SNE, by exploiting local and global structures of the large data set at different scales. The global structure consists of geometrical features and local structure is the pixel-based information for each class of Urdu digits. We introduce a novel approach that allows the fusion of these two independent spaces using Euclidean pairwise distances in a highly organized and principled way. The fusion matrix embedded with t-SNE helps to locate each data point in a two (or three-) dimensional map in a very different way. Furthermore, our proposed approach focuses on preserving the local structure of the high-dimensional data while mapping to a low-dimensional plane. The visualizations produced by t-SNE outperformed other classical techniques like principal component analysis (PCA) and auto-encoders (AE) on our handwritten Urdu numeral dataset.

A New Outlier Detection Algorithms Based on Markov Chain

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.366.456 ◽

2011 ◽

Vol 366 ◽

pp. 456-459 ◽

Cited By ~ 3

Author(s):

Jun Yang ◽

Ying Long Wang

Keyword(s):

Markov Chain ◽

Outlier Detection ◽

High Dimensional Data ◽

Weighted Graph ◽

Real Data ◽

Curse Of Dimensionality ◽

High Dimensional ◽

Large Set ◽

Data Set ◽

Novel Approach

Detecting outliers in a large set of data objects is a major data mining task aiming at finding different mechanisms responsible for different groups of objects in a data set. In high-dimensional data, existing approaches are bound to deteriorate due to the notorious “curse of dimensionality”. In this paper, we propose a novel approach named ODMC (Outlier Detection Based on Markov Chain), in which the effects of the “curse of dimensionality” are alleviated compared to purely distance-based approaches. A main advantage of our approach is that it uses a major feature of an undirected weighted graph to calculate the outlier degree of each node. In a thorough experimental evaluation, we compare ODMC to ABOD and FindFPOF on various artificial and real data sets and show that ODMC performs especially well on high-dimensional data.
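
The abstract does not say which graph feature ODMC uses, so the following Python sketch shows only the general idea of scoring nodes with a Markov chain on an undirected weighted graph. It is a generic random-walk score, not ODMC: the Gaussian similarity weights, the `bandwidth` parameter, and the use of the stationary distribution as an inverse outlier degree are all assumptions made for this example.

```python
import math

def random_walk_scores(points, bandwidth=1.0, iters=200):
    """Approximate the stationary distribution of a random walk on a similarity graph."""
    n = len(points)
    # Symmetric Gaussian similarities define an undirected weighted graph (no self-loops).
    w = [[0.0 if i == j else
          math.exp(-math.dist(points[i], points[j]) ** 2 / (2 * bandwidth ** 2))
          for j in range(n)] for i in range(n)]
    # Row-normalize the weights into Markov transition probabilities.
    p = [[wij / sum(row) for wij in row] for row in w]
    # Power iteration starting from the uniform distribution.
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * p[i][j] for i in range(n)) for j in range(n)]
    return pi

points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05), (3.0, 3.0)]
scores = random_walk_scores(points)
# The node the walk visits least often is the strongest outlier candidate.
outlier_index = min(range(len(points)), key=scores.__getitem__)
print(outlier_index, [round(s, 4) for s in scores])
```

On an undirected graph the stationary probability of a node is proportional to its total edge weight, so a point that is weakly connected to everything else receives a very small score.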

A Comparison of Outlier Detection Techniques for High-Dimensional Data

International Journal of Computational Intelligence Systems ◽

10.2991/ijcis.11.1.50 ◽

2018 ◽

Vol 11 (1) ◽

pp. 652 ◽

Cited By ~ 8

Author(s):

Xiaodan Xu ◽

Huawen Liu ◽

Li Li ◽

Minghai Yao

Keyword(s):

Outlier Detection ◽

High Dimensional Data ◽

High Dimensional ◽

Detection Techniques

Hubness in Unsupervised Outlier Detection Techniques for High Dimensional Data –A Survey

International Journal of Computer Applications Technology and Research ◽

10.7753/ijcatr0411.1004 ◽

2015 ◽

Vol 4 (11) ◽

pp. 797-801

Author(s):

R.Lakshmi Devi ◽

R. Amalraj

Keyword(s):

Outlier Detection ◽

High Dimensional Data ◽

High Dimensional ◽

Detection Techniques ◽

Unsupervised Outlier Detection

Outlier Detection in High Dimensional Data Based on the Anti-Hub and Regression Technique

International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2017.8219 ◽

2017 ◽

Vol V (VIII) ◽

pp. 1543-1551

Author(s):

Golla Hemalatha

Keyword(s):

Outlier Detection ◽

High Dimensional Data ◽

Regression Technique ◽

High Dimensional

Subspace Outlier Detection in High Dimensional Data using Ensemble of PCA-based Subspaces

2021 26th International Computer Conference, Computer Society of Iran (CSICC) ◽

10.1109/csicc52343.2021.9420589 ◽

2021 ◽

Author(s):

Mahboobeh Riahi-Madvar ◽

Babak Nasersharif ◽

Ahmad Akbari Azirani

Keyword(s):

Outlier Detection ◽

High Dimensional Data ◽

High Dimensional
