Hierarchical Hexagonal Clustering and Indexing

Symmetry ◽  
2019 ◽  
Vol 11 (6) ◽  
pp. 731 ◽  
Author(s):  
Vojtěch Uher ◽  
Petr Gajdoš ◽  
Václav Snášel ◽  
Yu-Chi Lai ◽  
Michal Radecký

Space-filling curves (SFCs) represent an efficient and straightforward method for sparse-space indexing to transform an n-dimensional space into a one-dimensional representation. This is often applied for multidimensional point indexing which brings a better perspective for data analysis, visualization and queries. SFCs are involved in many areas such as big data analysis and visualization, image decomposition, computer graphics and geographic information systems (GISs). The indexing methods subdivide the space into logic clusters of close points and they differ in various parameters including the cluster order, the distance metrics, and the pattern shape. Besides the simple and widely preferred triangular and square uniform grids, hexagonal uniform grids have gained high interest, especially in areas such as GISs, image processing and data visualization, owing to the uniform distance between cells and the high effectiveness of circle coverage. While the linearization of hexagons is an obvious approach for memory representation, it seems there is no hexagonal SFC indexing method generally used in practice. The main limitation of hexagons lies in the lack of an infinite decomposition into sub-hexagons and of self-similarity between tiles on different levels of the hierarchy. Our research aims at defining a fast and robust hexagonal SFC method. The Gosper fractal is utilized to preserve the benefits of hexagonal grids and to efficiently and hierarchically linearize points in a hexagonal grid while solving the non-convex shape and recursive transformation issues of the fractal. A comparison to other SFCs and grids is conducted to verify the robustness and effectiveness of our hexagonal method.
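The Gosper (flowsnake) fractal the abstract refers to can be generated with a standard L-system. The sketch below is only an illustration of that construction plus a brute-force linearization of a 2-D point against the curve's vertices; it is not the authors' indexing method, and `gosper_index` is a hypothetical helper name.

```python
import math

# L-system for the Gosper (flowsnake) curve; 'A' and 'B' both mean
# "step forward", '+' turns left 60 degrees, '-' turns right 60 degrees.
RULES = {"A": "A-B--B+A++AA+B-", "B": "+A-BB--B-A++A+B"}

def gosper_points(level):
    """Return the vertices of a level-`level` Gosper curve as (x, y) tuples."""
    s = "A"
    for _ in range(level):
        s = "".join(RULES.get(c, c) for c in s)
    x = y = angle = 0.0
    pts = [(x, y)]
    for c in s:
        if c in "AB":
            x += math.cos(angle)
            y += math.sin(angle)
            pts.append((x, y))
        elif c == "+":
            angle += math.pi / 3
        elif c == "-":
            angle -= math.pi / 3
    return pts

def gosper_index(p, pts):
    """Linearize a 2-D point: index of the nearest curve vertex (brute force)."""
    return min(range(len(pts)),
               key=lambda i: (pts[i][0] - p[0]) ** 2 + (pts[i][1] - p[1]) ** 2)
```

Each rewriting level replaces one segment with seven, which is what gives the hexagonal hierarchy its sevenfold cluster structure.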

2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Hirokazu Tanaka

Abstract EEG is known to contain considerable inter-trial and inter-subject variability, which poses a challenge for any group-level EEG analysis. A true experimental effect must be reproducible even with variabilities in trials, sessions, and subjects. Extracting components that are reproducible across trials and subjects benefits both understanding common mechanisms in neural processing of cognitive functions and building robust brain-computer interfaces. This study extends our previous method (task-related component analysis, TRCA) by maximizing not only trial-by-trial reproducibility within single subjects but also similarity across a group of subjects, hence referred to as group TRCA (gTRCA). The problem of maximizing reproducibility of time series across trials and subjects is formulated as a generalized eigenvalue problem. We applied gTRCA to EEG data recorded from 35 subjects during a steady-state visual-evoked potential (SSVEP) experiment. The results revealed: (1) the group-representative data computed by gTRCA showed higher and more consistent spectral peaks than other conventional methods; (2) scalp maps obtained by gTRCA showed estimated source locations consistently within the occipital lobe; and (3) the high-dimensional features extracted by gTRCA are consistently mapped to a low-dimensional space. We conclude that gTRCA offers a framework for group-level EEG data analysis and brain-computer interfaces, as an alternative or complement to grand averaging.
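The TRCA-style formulation behind this abstract can be sketched as a generalized eigenvalue problem S w = λ Q w, with S the cross-trial covariance and Q the total covariance. The code below is a simplified single-subject illustration of that formulation (solved by Cholesky whitening), not the authors' gTRCA implementation; the function name and shapes are assumptions.

```python
import numpy as np

def trca_weights(trials):
    """Spatial filter w maximizing trial-to-trial reproducibility, posed as
    the generalized eigenvalue problem S w = lambda * Q w.
    trials: array of shape (n_trials, n_channels, n_samples)."""
    trials = trials - trials.mean(axis=2, keepdims=True)   # center each trial
    X = np.concatenate(trials, axis=1)                     # channels x all samples
    Q = X @ X.T                                            # total covariance
    s = trials.sum(axis=0)
    S = s @ s.T - sum(t @ t.T for t in trials)             # cross-trial covariance
    L = np.linalg.cholesky(Q)                              # whiten with Q = L L^T
    A = np.linalg.solve(L, S)                              # inv(L) S
    B = np.linalg.solve(L, A.T).T                          # inv(L) S inv(L)^T, symmetric
    vals, vecs = np.linalg.eigh(B)
    return np.linalg.solve(L.T, vecs[:, -1])               # top generalized eigenvector
```

Whitening reduces the generalized problem to an ordinary symmetric one, so `eigh` can be used; the returned vector corresponds to the largest eigenvalue, i.e. the most reproducible component.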


Author(s):  
N. Puviarasan ◽  
R. Bhavani

In content-based image retrieval (CBIR) applications, the idea of indexing is to map the descriptors extracted from images into a high-dimensional space. In this paper, visual features such as color, texture and shape are considered. The color features are extracted using a color coherence vector (CCV), and the texture features are obtained from Segmentation-based Fractal Texture Analysis (SFTA). The shape features of an image are extracted using Fourier Descriptors (FD), a contour-based feature extraction method. The color, texture and shape features are then combined using appropriate weights, and a quadtree is used to index the images. It is experimentally found that the proposed indexing method using a quadtree gives better performance than other existing indexing methods.
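The weighted combination step described above can be sketched as follows; the normalization scheme, weights and function name are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def fuse_features(color, texture, shape, weights=(0.4, 0.3, 0.3)):
    """Hypothetical weighted fusion: L2-normalize each descriptor (CCV,
    SFTA, FD), scale by its weight, and concatenate into one vector
    suitable for quadtree indexing."""
    parts = []
    for vec, w in zip((color, texture, shape), weights):
        v = np.asarray(vec, dtype=float)
        n = np.linalg.norm(v)
        parts.append(w * v / n if n > 0 else w * v)
    return np.concatenate(parts)
```

Normalizing each descriptor before weighting keeps one feature family (e.g. a long texture vector) from dominating the combined distance metric.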


1999 ◽  
Vol 31 (03) ◽  
pp. 625-631
Author(s):  
Tilmann Gneiting

A popular procedure in spatial data analysis is to fit a line segment of the form c(x) = 1 - α ||x||, ||x|| < 1, to observed correlations at (appropriately scaled) spatial lag x in d-dimensional space. We show that such an approach is permissible if and only if the slope α does not exceed an upper bound that depends on the spatial dimension d. The proof relies on Matheron's turning bands operator and an extension theorem for positive definite functions due to Rudin. Side results and examples include a general discussion of isotropic correlation functions defined on d-dimensional balls.
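The fitting step itself is a one-parameter least-squares problem with a closed form. The sketch below shows only that estimation step; it does not implement the paper's permissibility bound, which must still be checked against the dimension-dependent limit on α.

```python
import numpy as np

def fit_slope(lags, correlations):
    """Least-squares fit of c(x) = 1 - alpha * ||x|| to observed
    correlations at scaled lags with ||x|| < 1 (alpha is the only free
    parameter).  Minimizing sum((1 - alpha*h - c)^2) over alpha gives
    alpha = sum(h * (1 - c)) / sum(h^2)."""
    h = np.asarray(lags, dtype=float)
    c = np.asarray(correlations, dtype=float)
    mask = h < 1
    h, c = h[mask], c[mask]
    return float(np.sum(h * (1 - c)) / np.sum(h * h))
```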


Author(s):  
Eric Bonabeau ◽  
Marco Dorigo ◽  
Guy Theraulaz

In the previous two chapters, foraging and division of labor were shown to be useful metaphors to design optimization and resource allocation algorithms. In this chapter, we will see that the clustering and sorting behavior of ants has stimulated researchers to design new algorithms for data analysis and graph partitioning. Several species of ants cluster corpses to form a “cemetery,” or sort their larvae into several piles. This behavior is still not fully understood, but a simple model, in which agents move randomly in space and pick up and deposit items on the basis of local information, may account for some of the characteristic features of clustering and sorting in ants. The model can also be applied to data analysis and graph partitioning: objects with different attributes or the nodes of a graph can be considered items to be sorted. Objects placed next to each other by the sorting algorithm have similar attributes, and nodes placed next to each other by the sorting algorithm are tightly connected in the graph. The sorting algorithm takes place in a two-dimensional space, thereby offering a low-dimensional representation of the objects or of the graph. Distributed clustering, and more recently sorting, by a swarm of robots have served as benchmarks for swarm-based robotics. In all cases, the robots exhibit extremely simple behavior, act on the basis of purely local information, and communicate indirectly except for collision avoidance. In several species of ants, workers have been reported to form piles of corpses—literally cemeteries—to clean up their nests. Chretien [72] has performed experiments with the ant Lasius niger to study the organization of cemeteries. Other experiments on the ant Pheidole pallidula are also reported in Deneubourg et al. [88], and many species actually organize a cemetery. Figure 4.1 shows the dynamics of cemetery organization in another ant, Messor sancta. 
If corpses, or, more precisely, sufficiently large parts of corpses are randomly distributed in space at the beginning of the experiment, the workers form cemetery clusters within a few hours.
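The pick-up-and-deposit model described above can be sketched as follows. This is a minimal Deneubourg-style illustration, assuming the usual threshold form of the probabilities; the constants k1, k2 and the neighborhood radius are illustrative, not values from the chapter.

```python
import random

# Illustrative thresholds: small k1 makes isolated items easy to pick up,
# small k2 makes dense neighborhoods attractive for dropping.
K1, K2 = 0.1, 0.15

def local_density(grid, x, y, r=1):
    """Fraction of occupied cells in the (2r+1)^2 neighborhood of (x, y),
    excluding the cell itself, on a toroidal grid."""
    size, occupied, cells = len(grid), 0, 0
    for dx in range(-r, r + 1):
        for dy in range(-r, r + 1):
            if dx == 0 and dy == 0:
                continue
            cells += 1
            if grid[(x + dx) % size][(y + dy) % size]:
                occupied += 1
    return occupied / cells

def step(grid, agent, rng=random):
    """One move of a single agent; agent = [x, y, carrying_flag]."""
    size = len(grid)
    x, y, carrying = agent
    f = local_density(grid, x, y)
    if not carrying and grid[x][y]:
        if rng.random() < (K1 / (K1 + f)) ** 2:   # likely pick-up when isolated
            grid[x][y] = 0
            agent[2] = True
    elif carrying and not grid[x][y]:
        if rng.random() < (f / (K2 + f)) ** 2:    # likely drop near dense spots
            grid[x][y] = 1
            agent[2] = False
    agent[0] = (x + rng.choice((-1, 0, 1))) % size   # random walk
    agent[1] = (y + rng.choice((-1, 0, 1))) % size
```

Run with many agents over many steps, this positive feedback (drops attract further drops) produces the cluster growth seen in the ant experiments.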


Medicina ◽  
2010 ◽  
Vol 46 (6) ◽  
pp. 408 ◽  
Author(s):  
Ricardo Duarte ◽  
Duarte Araújo ◽  
Orlando Fernandes ◽  
Cristina Fonseca ◽  
Vanda Correia ◽  
...  

Background and objective. In recent years, several motion analysis methods have been developed without considering representative contexts for sports performance. The purpose of this paper was to explain and underscore a straightforward method to measure human behavior in these contexts. Material and methods. Procedures combining manual video tracking (with the TACTO device) and bidimensional reconstruction (through direct linear transformation) using a single camera were used to capture the kinematic data required to compute collective variable(s) and control parameter(s). These procedures were applied to a 1-vs-1 association football task as an illustrative subphase of team sports and are presented in a tutorial fashion. Results. Preliminary analysis of distance and velocity data identified a collective variable (the difference between the distances of the attacker and the defender to a target defensive area) and two nested control parameters (interpersonal distance and relative velocity). Conclusions. The findings demonstrated that the complementary use of the TACTO software and direct linear transformation permits capturing and reconstructing complex human actions in their context in a low-dimensional space (information reduction).
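The planar direct linear transformation used for the bidimensional reconstruction can be sketched as a homography estimated from at least four known image-to-pitch correspondences. This is a generic textbook DLT, not the TACTO pipeline; the function names are assumptions.

```python
import numpy as np

def dlt_homography(image_pts, world_pts):
    """Planar direct linear transformation: estimate the 3x3 homography H
    mapping image coordinates to world (pitch) coordinates from at least
    four correspondences, via SVD of the stacked constraint matrix."""
    A = []
    for (u, v), (x, y) in zip(image_pts, world_pts):
        A.append([u, v, 1, 0, 0, 0, -x * u, -x * v, -x])
        A.append([0, 0, 0, u, v, 1, -y * u, -y * v, -y])
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)          # null vector = flattened homography
    return H / H[2, 2]

def apply_h(H, pt):
    """Map one image point to world coordinates (homogeneous divide)."""
    p = H @ np.array([pt[0], pt[1], 1.0])
    return p[:2] / p[2]
```

Once H is calibrated from pitch landmarks, every tracked player position can be projected to metric pitch coordinates with `apply_h`.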


Author(s):  
Muhammad Amjad

Advances in manifold learning have proven to be of great benefit in reducing the dimensionality of large, complex datasets. Elements of an intricate dataset typically lie in a high-dimensional space, as the number of individual features or independent variables is extensive. However, these elements can often be integrated into a low-dimensional manifold with well-defined parameters. By constructing a low-dimensional manifold embedded in the high-dimensional feature space, the dataset can be simplified for easier interpretation. Despite this dimensionality reduction, the dataset's constituents lose no information; rather, the reduction filters it in the hope of elucidating the relevant structure. This paper will explore the importance of this method of data analysis, its applications, and its extensions into topological data analysis.
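One of the simplest concrete instances of such an embedding is classical multidimensional scaling, which recovers low-dimensional coordinates from pairwise distances alone. The sketch below is a generic illustration of that idea, not a method from the paper.

```python
import numpy as np

def classical_mds(d2, k=2):
    """Classical multidimensional scaling.
    d2: (n, n) matrix of squared pairwise distances.
    Returns an (n, k) embedding whose pairwise distances approximate
    the originals (exactly, when the points truly lie in k dimensions)."""
    n = d2.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ d2 @ J                        # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1][:k]             # top-k eigenpairs
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))
```

Nonlinear manifold learners (Isomap, LLE, etc.) generalize this step by first replacing Euclidean distances with distances measured along the manifold.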


Author(s):  
Firas A. Khasawneh ◽  
Elizabeth Munch

This paper explores the possibility of using techniques from topological data analysis for studying datasets generated from dynamical systems described by stochastic delay equations. The dataset is generated using Euler–Maruyama simulation for two first-order systems with stochastic parameters drawn from a normal distribution. The first system contains additive noise, whereas the second one contains parametric or multiplicative noise. Using Takens' embedding, the dataset is converted into a point cloud in a high-dimensional space. Persistent homology is then employed to analyze the structure of the point cloud in order to study equilibria and periodic solutions of the underlying system. Our results show that persistent homology successfully differentiates between different types of equilibria. Therefore, we believe this approach will prove useful for automatic data analysis of vibration measurements. For example, our approach can be used in machining processes for chatter detection and prevention.
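The delay-coordinate (Takens) embedding step that turns a scalar time series into a point cloud is standard and can be sketched directly; the embedding dimension and lag below are illustrative parameters, and the persistent homology stage is not reproduced here.

```python
import numpy as np

def takens_embedding(series, dim, delay):
    """Delay-coordinate (Takens) embedding: map a scalar time series to a
    point cloud in `dim`-dimensional space, where point i is
    (s[i], s[i + delay], ..., s[i + (dim-1)*delay])."""
    series = np.asarray(series, dtype=float)
    n = len(series) - (dim - 1) * delay
    if n <= 0:
        raise ValueError("series too short for this (dim, delay)")
    return np.column_stack([series[i * delay: i * delay + n]
                            for i in range(dim)])
```

The resulting (n, dim) point cloud is what a persistence computation (e.g. a Vietoris–Rips filtration) would then consume to detect loops corresponding to periodic solutions.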


2019 ◽  
Vol 20 (S19) ◽  
Author(s):  
Thomas A. Geddes ◽  
Taiyun Kim ◽  
Lihao Nan ◽  
James G. Burchfield ◽  
Jean Y. H. Yang ◽  
...  

Abstract Background Single-cell RNA-sequencing (scRNA-seq) is a transformative technology, allowing global transcriptomes of individual cells to be profiled with high accuracy. An essential task in scRNA-seq data analysis is the identification of cell types from complex samples or tissues profiled in an experiment. To this end, clustering has become a key computational technique for grouping cells based on their transcriptome profiles, enabling subsequent cell type identification from each cluster of cells. Due to the high feature-dimensionality of the transcriptome (i.e. the large number of measured genes in each cell) and because only a small fraction of genes are cell type-specific and therefore informative for generating cell type-specific clusters, clustering directly on the original feature/gene dimension may lead to uninformative clusters and hinder correct cell type identification. Results Here, we propose an autoencoder-based cluster ensemble framework in which we first take random subspace projections from the data, then compress each random projection to a low-dimensional space using an autoencoder artificial neural network, and finally apply ensemble clustering across all encoded datasets to generate clusters of cells. We employ four evaluation metrics to benchmark clustering performance and our experiments demonstrate that the proposed autoencoder-based cluster ensemble can lead to substantially improved cell type-specific clusters when applied with both the standard k-means clustering algorithm and a state-of-the-art kernel-based clustering algorithm (SIMLR) designed specifically for scRNA-seq data. Compared to directly using these clustering algorithms on the original datasets, the performance improvement in some cases is up to 100%, depending on the evaluation metric used. Conclusions Our results suggest that the proposed framework can facilitate more accurate cell type identification as well as other downstream analyses. 
The code for creating the proposed autoencoder-based cluster ensemble framework is freely available from https://github.com/gedcom/scCCESS
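The ensemble idea in this abstract (random subspace views, per-view compression and clustering, then consensus) can be sketched in simplified form. The sketch below substitutes random Gaussian projections for the autoencoder encodings and uses a co-association matrix for the consensus step; it is an illustration of the framework's shape, not the scCCESS implementation.

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Tiny Lloyd's k-means with deterministic farthest-point initialization."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])          # farthest point from chosen centers
    centers = np.array(centers, dtype=float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(axis=2), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def cluster_ensemble(X, k, n_views=10, sub_dim=5, seed=0):
    """Simplified cluster-ensemble sketch: cluster several randomly
    projected views, accumulate a cell-by-cell co-association matrix,
    and cluster that matrix once more for the consensus labels."""
    rng = np.random.default_rng(seed)
    n = len(X)
    co = np.zeros((n, n))
    for v in range(n_views):
        P = rng.standard_normal((X.shape[1], sub_dim))   # random projection
        labels = kmeans(X @ P, k, seed=v)
        co += labels[:, None] == labels[None, :]
    return kmeans(co / n_views, k)
```

The co-association matrix averages agreement across views, so cells that cluster together only by chance in one projection are down-weighted in the consensus.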


Author(s):  
Ronnie Alves ◽  
Joel Ribeiro ◽  
Orlando Belo ◽  
Jiawei Han

Business organizations must pay attention to interesting changes in customer behavior in order to anticipate their needs and act accordingly with appropriate business actions. Tracking customers' commercial paths through the products they are interested in is an essential technique to improve business and increase customer satisfaction. Data warehousing (DW) allows us to do so, giving the basic means to record every customer transaction based on the different business strategies established. Although managing such huge amounts of records may imply business advantage, their exploration, especially in a multi-dimensional space (MDS), is a nontrivial task. The more dimensions we want to explore, the higher the computational costs involved in multi-dimensional data analysis (MDA). To make MDA practical in real-world business problems, DW researchers have been working on combining data cubing and mining techniques to detect interesting changes in an MDS. Such changes can also be detected through gradient queries. While those studies have provided the basis for future research in MDA, only a few of them point to preference query selection in an MDS. Thus, not only is the exploration of changes in an MDS an essential task, but even more important is ranking the most interesting gradients. In this chapter, the authors investigate how to mine and rank the most interesting changes in an MDS applying a TOP-K gradient strategy. Additionally, the authors also propose a gradient-based cubing method to evaluate interesting gradient regions in an MDS. So, the challenge is to find maximum gradient regions (MGRs) that maximize the task of ranking gradients in an MDS. The authors' evaluation study demonstrates that the proposed method presents a promising strategy for ranking gradients in an MDS.
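The core notion of a gradient query (comparing the measure of cube cells that differ in one dimension and ranking the largest changes) can be sketched as follows. This is a brute-force toy illustration of TOP-K gradient ranking, not the authors' gradient-based cubing method; the cell encoding is an assumption.

```python
import heapq
from itertools import combinations

def topk_gradients(cells, k=3):
    """Toy TOP-K gradient query. `cells` maps attribute tuples to an
    aggregate measure; a gradient pair is two cells differing in exactly
    one attribute. Returns the k pairs with the largest measure ratio,
    as (ratio, cell_a, cell_b) tuples."""
    grads = []
    for (a, ma), (b, mb) in combinations(cells.items(), 2):
        if sum(x != y for x, y in zip(a, b)) == 1 and min(ma, mb) > 0:
            grads.append((max(ma, mb) / min(ma, mb), a, b))
    return heapq.nlargest(k, grads)
```

A realistic cubing method would prune this quadratic pair enumeration by restricting the search to promising gradient regions, which is exactly the role the chapter's MGRs play.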

