A Memory-Saving and Efficient Data Transformation Technique for Mixed Data Sets Visualization

Auto-weighted concept factorization for joint feature map and data representation learning

Journal of Intelligent & Fuzzy Systems, 2021, pp. 1-13. DOI: 10.3233/jifs-200298

Author(s): Yikai Zhang, Yong Peng, Hongyu Bian, Yuan Ge, Feiwei Qin, ...

Keyword(s): Objective Function, Optimization Procedure, Feature Space, Representation Learning, Data Representation, Data Sets, Reconstruction Process, Factorization Model, Efficient Data, Concept Factorization

Concept factorization (CF) is an effective matrix factorization model that has been widely used in many applications. In CF, linear combinations of the data points serve as the dictionary, which allows CF to be performed both in the original feature space and in a reproducing kernel Hilbert space (RKHS). Conventional CF treats each dimension of the feature vector equally during data reconstruction, which may contradict the intuition that different features have different discriminative abilities and therefore contribute differently to pattern recognition. In this paper, we introduce an auto-weighting variable into the conventional CF objective function to adaptively learn the contributions of different features, and we propose a new model termed Auto-Weighted Concept Factorization (AWCF). In AWCF, on the one hand, feature importance is quantified by the auto-weighting variable, which assigns larger weights to features with better discriminative ability; on the other hand, we obtain a more efficient data representation that captures the semantic information of the data. The detailed optimization procedure for the AWCF objective function is derived, and its complexity and convergence are analyzed. Experiments are conducted on both synthetic and representative benchmark data sets, and the clustering results demonstrate the effectiveness of AWCF compared with related models.
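
The abstract does not give AWCF's update rules. As a minimal sketch, assuming features are the rows of a d × n data matrix X (so CF reconstructs X as X W V^T, with the basis X W built from data points), the snippet below computes each feature's reconstruction error and converts it into weights with a hypothetical entropy-regularized rule θ_i ∝ exp(−e_i/γ). Using reconstruction error as a stand-in for discriminative ability, the parameter γ, and every function name here are illustrative assumptions, not the paper's method.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def feature_residuals(X, W, V):
    """Squared CF reconstruction error of each feature (row of the d x n matrix X).

    CF reconstructs X as X W V^T, i.e. the basis X W is a combination of data points.
    """
    R = matmul(matmul(X, W), transpose(V))
    return [sum((x - r) ** 2 for x, r in zip(X[i], R[i])) for i in range(len(X))]


def auto_weights(residuals, gamma=1.0):
    """Hypothetical entropy-regularised rule: theta_i proportional to exp(-e_i / gamma)."""
    m = min(residuals)  # shift for numerical stability; the normalised result is unchanged
    exps = [math.exp(-(e - m) / gamma) for e in residuals]
    total = sum(exps)
    return [v / total for v in exps]


def weighted_cf_loss(X, W, V, theta):
    """Feature-weighted CF objective: sum_i theta_i * ||x^i - (X W V^T)^i||^2."""
    return sum(t * e for t, e in zip(theta, feature_residuals(X, W, V)))
```

For instance, with X = [[1, 0, 1], [0, 1, 0]], W = [[1], [0], [0]] and V = [[1], [0], [1]], the first feature is reconstructed exactly and receives the larger weight, so it dominates the weighted loss.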

A SELF-ORGANIZING MAP FOR MIXED CONTINUOUS AND CATEGORICAL DATA

International Journal of Computing, 2011, pp. 24-32. DOI: 10.47839/ijc.10.1.733

Cited By ~ 1

Author(s): Nicoleta Rogovschi, Mustapha Lebbah, Younès Bennani

Keyword(s): Clustering Algorithm, Clustering Algorithms, Mixed Data, Categorical Variables, Data Sets, Self Organizing Map, Data Set, Public Data, Self Organizing

Most traditional clustering algorithms are limited to handling data sets that contain either continuous or categorical variables. However, data sets with mixed types of variables are common in the data mining field. In this paper we introduce a weighted self-organizing map for clustering, analysis and visualization of mixed data (continuous/binary). Weights and prototypes are learned simultaneously, ensuring an optimized data clustering. The higher a variable's weight, the more the clustering algorithm takes into account the information carried by that variable. The learning of these topological maps is combined with a weighting process that computes variable weights, which influence the quality of clustering. We illustrate the power of this method with data sets taken from a public data set repository: a handwritten digit data set, the Zoo data set and three other mixed data sets. The results show good topological ordering and homogeneous clustering.
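
To make the role of variable weights concrete, here is a rough sketch of one online step of a 1-D self-organizing map on mixed data. It assumes continuous features are compared by squared difference and binary features by mismatch (Hamming), each scaled by a fixed weight. The paper learns weights and prototypes jointly, which this sketch does not attempt; the binary update rule (close units copy the input's bits) and all names are hypothetical.

```python
import math


def mixed_distance(x, w, cont_idx, bin_idx, weights):
    """Weighted squared difference on continuous features plus weighted mismatch on binary ones."""
    d = sum(weights[j] * (x[j] - w[j]) ** 2 for j in cont_idx)
    d += sum(weights[j] * (x[j] != w[j]) for j in bin_idx)
    return d


def best_matching_unit(x, prototypes, cont_idx, bin_idx, weights):
    """Index of the prototype closest to x under the weighted mixed distance."""
    return min(range(len(prototypes)),
               key=lambda c: mixed_distance(x, prototypes[c], cont_idx, bin_idx, weights))


def som_step(x, prototypes, cont_idx, bin_idx, weights, lr=0.5, sigma=1.0):
    """One online update of a 1-D SOM; prototypes are modified in place."""
    bmu = best_matching_unit(x, prototypes, cont_idx, bin_idx, weights)
    for c, w in enumerate(prototypes):
        # Gaussian neighbourhood strength based on grid distance to the BMU
        h = math.exp(-((c - bmu) ** 2) / (2 * sigma ** 2))
        for j in cont_idx:
            w[j] += lr * h * (x[j] - w[j])  # move the continuous part toward x
        if h > 0.5:  # hypothetical rule: nearby units adopt x's binary values
            for j in bin_idx:
                w[j] = x[j]
    return bmu
```

Setting a variable's weight to zero removes it from the best-matching-unit search, while a large weight lets that variable dominate the matching, which mirrors the behaviour the abstract describes.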

Airport Trends Analytics Engine using the ARIMA Model

International Journal of Engineering & Technology, 2018, Vol 7 (3.12), pp. 239. DOI: 10.14419/ijet.v7i3.12.16033

Author(s): Chitransh Rajesh, Yash Jain, J Jayapradha

Keyword(s): Data Analytics, Arima Model, Interactive Visualization, Database Management System, Data Sets, Arima Models, End User, Report Generation, Efficient Data, Visualization Of Data

Data analytics is the process of analyzing unprocessed data to draw conclusions by studying and inspecting patterns in the data. Several algorithms and conceptual methods are commonly followed to derive valid and accurate results. Efficient data handling is important for interactive visualization of data sets. Drawing on recent research and analytical theories on column-oriented database management systems, we are developing a new data engine using R and Tableau to predict airport trends. The engine uses univariate datasets (for example, the Perth Airport Passenger Movement Dataset and the Newark Airport Cargo Stats Dataset) to analyze and accurately predict trends. Data analysis and prediction are performed with time series analysis and an ARIMA model for each module. The modules are developed in RStudio, while Tableau is used for interactive visualization and end-user report generation. The Airport Trends Analytics Engine is built with R and Tableau 10.4 and is optimized for use on desktop and server environments.
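
The engine described above is built in R. Purely to illustrate the ARIMA idea, here is a small Python sketch of an ARIMA(1,1,0) forecaster without a constant term: difference the series once (the "I" part), fit an AR(1) coefficient to the differences by conditional least squares, then undo the differencing while forecasting. The model order, the estimator, and the function names are assumptions for illustration; real work would use a proper implementation such as R's `arima()` with order selection and residual diagnostics.

```python
def difference(series):
    """First difference: the 'I' (d = 1) step of ARIMA."""
    return [b - a for a, b in zip(series, series[1:])]


def fit_ar1(y):
    """Conditional least-squares estimate of phi in y_t = phi * y_{t-1} + e_t (no intercept)."""
    num = sum(cur * prev for cur, prev in zip(y[1:], y[:-1]))
    den = sum(prev * prev for prev in y[:-1])
    if den == 0:
        raise ValueError("cannot fit AR(1) to an all-zero series")
    return num / den


def arima110_forecast(series, steps):
    """Forecast `steps` values ahead with an ARIMA(1,1,0) model without constant."""
    if len(series) < 3:
        raise ValueError("need at least three observations")
    diffs = difference(series)
    phi = fit_ar1(diffs)
    level, last_diff = series[-1], diffs[-1]
    forecasts = []
    for _ in range(steps):
        last_diff = phi * last_diff  # AR(1) forecast of the next difference
        level = level + last_diff    # integrate back to the original scale
        forecasts.append(level)
    return forecasts
```

For a steadily growing series such as monthly passenger counts 10, 12, 14, 16, the fitted coefficient is 1 and the forecasts continue the trend (18, 20, ...).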

CLUSTERING CATEGORICAL AND NUMERICAL DATA: A NEW PROCEDURE USING MULTIDIMENSIONAL SCALING

International Journal of Information Technology & Decision Making, 2003, Vol 02 (01), pp. 135-159. DOI: 10.1142/s0219622003000549

Cited By ~ 12

Author(s): SUNG-GI LEE, DEOK-KYUN YUN

Keyword(s): Multidimensional Scaling, Clustering Algorithms, Numerical Data, Large Data, Careful Analysis, Mixed Data, Coordinate Space, Data Sets, Categorical Attributes, Categorical Attribute

In this paper, we present a concept based on the similarity of categorical attribute values considering implicit relationships and propose a new and effective clustering procedure for mixed data. Our procedure obtains similarities between categorical values from careful analysis and maps the values in each categorical attribute into points in two-dimensional coordinate space using multidimensional scaling. These mapped values make it possible to interpret the relationships between attribute values and to directly apply categorical attributes to clustering algorithms using a Euclidean distance. After trivial modifications, our procedure for clustering mixed data uses the k-means algorithm, well known for its efficiency in clustering large data sets. We use the familiar soybean disease and adult data sets to demonstrate the performance of our clustering procedure. The satisfactory results that we have obtained demonstrate the effectiveness of our algorithm in discovering structure in data.
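
The abstract does not say how the similarities between categorical values are obtained, so the sketch below takes a dissimilarity matrix between the values of one categorical attribute as a given (hypothetical) input and maps those values to 2-D points with classical multidimensional scaling. Power iteration with deflation stands in for a library eigendecomposition so that only the Python standard library is needed; the paper's exact MDS variant may differ.

```python
import math
import random


def classical_mds(D, dims=2, iters=1000, seed=0):
    """Classical (Torgerson) MDS: embed n items, given an n x n symmetric
    dissimilarity matrix D, into `dims` Euclidean coordinates."""
    n = len(D)
    sq = [[d * d for d in row] for row in D]
    row_mean = [sum(row) / n for row in sq]  # equal to column means because D is symmetric
    grand_mean = sum(row_mean) / n
    # Double centring: B = -1/2 * J D^2 J, with J the centring matrix
    B = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean) for j in range(n)]
         for i in range(n)]
    rng = random.Random(seed)
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        # Power iteration for the leading eigenvector of the (deflated) B
        v = [rng.random() - 0.5 for _ in range(n)]
        for _ in range(iters):
            w = [sum(B[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(t * t for t in w))
            if norm < 1e-12:
                break
            v = [t / norm for t in w]
        s = math.sqrt(sum(t * t for t in v))
        v = [t / s for t in v]
        lam = sum(v[i] * B[i][j] * v[j] for i in range(n) for j in range(n))
        # Deflate so the next pass finds the next eigenvector
        for i in range(n):
            for j in range(n):
                B[i][j] -= lam * v[i] * v[j]
        scale = math.sqrt(max(lam, 0.0))  # negative eigenvalues (non-Euclidean D) are dropped
        for i in range(n):
            coords[i][k] = v[i] * scale
    return coords
```

Each categorical value would then be replaced by its two coordinates, after which k-means with Euclidean distance can run on the numeric attributes together with the mapped ones.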

Enabling nanotechnology entrepreneurship in a French context

Journal of Small Business and Enterprise Development, 2016, Vol 23 (4), pp. 1009-1031. DOI: 10.1108/jsbed-10-2015-0139

Cited By ~ 1

Author(s): Anna Glaser, Sonia Ben Slimane, Claire Auplat, Régis Coeurderoy

Keyword(s): Early Stage, Theoretical Framework, Mixed Data, Data Sets, Enabling Factors, Content Type, Entrepreneurial Behaviour, Starting Point, University Scientists

Purpose: The purpose of this paper is to build a holistic theoretical framework of enabling factors contributing to the development of enterprise in nanotechnology-related industries, in a French context.

Design/methodology/approach: A systematic literature review methodology was adopted. The review used three gauges to identify enabling factors contributing to the development of enterprise in nanotechnology-related industries in a French context. First, it analysed the literature on the development of nanotechnologies from a sustainability perspective, taking a multidisciplinary stance (“Green view”). Second, it took a disciplinary stance by exploring academic journals in the field of entrepreneurship (“Entrepreneurship view”). Third, it studied the perspective of France (“French view”).

Findings: The main finding is that, despite different approaches and sometimes seemingly conflicting stances, the three views converge on three enabling factors: the importance of knowledge sharing across boundaries, access to university scientists and facilities, and government intervention. However, each view also has its particularities: the “Green view” emphasizes the need for civil society inclusion, the “Entrepreneurship view” underlines the importance of early-stage capital and entrepreneurial behaviour, and the “French view” concentrates on the role of clusters.

Research limitations/implications: The paper provides a theoretical framework and a starting point for further work on entrepreneurial nanotechnology facilitation. Its findings constitute a benchmark that may be tested in empirical cases. The focus on the French context may be seen as a limitation, but also as a source of interesting comparative work focussing on other national or regional contexts.

Practical implications: The paper shows that public policy is an important element in the nascent field of enterprise development for nano-based materials. It outlines how different contexts create different barriers to entrepreneurship, and it proposes recommendations to overcome some of these barriers.

Originality/value: In this paper, findings result from an exploration of the nanotechnology literature that focusses solely on nanotechnology data sets and not on mixed data sets. The use of three different gauges leads to the construction of a holistic theoretical framework that includes enabling factors as well as the types of barriers that entrepreneurs have to overcome to succeed.

Hamming Distance based Clustering Algorithm

International Journal of Information Retrieval Research, 2012, Vol 2 (1), pp. 11-20. DOI: 10.4018/ijirr.2012010102

Cited By ~ 3

Author(s): Ritu Vijay, Prerna Mahajan, Rekha Kandwal

Keyword(s): Machine Learning, Clustering Algorithm, Hamming Distance, Promising Result, Clustering Algorithms, Distribution Patterns, Mixed Data, Binary Representation, Data Sets, Performance Study

Cluster analysis has been extensively used in machine learning and data mining to discover distribution patterns in data. Clustering algorithms are generally based on a distance metric, partitioning the data into small groups such that data instances in the same group are more similar to each other than to instances in different groups. In this paper the authors have extended the concept of Hamming distance to categorical data. As a data preprocessing step, they have transformed the data into a binary representation, and they have used the proposed algorithm to group the data points into clusters. Experiments are carried out on data sets from the UCI machine learning repository to study the algorithm's performance. The authors conclude that the proposed algorithm shows promising results and can be extended to handle numeric as well as mixed data.
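
The two steps named above (binary transformation, then Hamming-based grouping) can be sketched as follows. This is a hypothetical illustration, not the authors' algorithm: records are one-hot encoded with one bit per (attribute, value) pair, and a k-modes-style loop assigns each vector to its nearest centre by Hamming distance and updates each centre by a bitwise majority vote, which minimises the total Hamming distance to the cluster's members. The function names and the choice of initial centres are assumptions.

```python
def one_hot(records):
    """Map each categorical record to a binary vector, one bit per (attribute, value) pair."""
    levels = [sorted({r[a] for r in records}) for a in range(len(records[0]))]
    return [[1 if r[a] == v else 0 for a in range(len(r)) for v in levels[a]]
            for r in records]


def hamming(u, v):
    """Number of positions at which two equal-length binary vectors differ."""
    return sum(a != b for a, b in zip(u, v))


def binary_kmodes(vectors, init_idx, iters=10):
    """Cluster binary vectors; init_idx picks which vectors serve as initial centres."""
    centers = [list(vectors[i]) for i in init_idx]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assign each vector to the nearest centre (ties go to the lower index)
        labels = [min(range(len(centers)), key=lambda c: hamming(v, centers[c]))
                  for v in vectors]
        for c in range(len(centers)):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                # Bitwise majority vote minimises the total Hamming distance to the members
                centers[c] = [1 if 2 * sum(col) > len(members) else 0 for col in zip(*members)]
    return labels, centers
```

With records such as ("red", "small") and ("blue", "large"), the one-hot vectors of matching records coincide, so identical records always land in the same cluster.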

A clustering method for very large mixed data sets

Proceedings 2001 IEEE International Conference on Data Mining, 2002. DOI: 10.1109/icdm.2001.989590

Cited By ~ 3

Author(s): G. Sanchez-Diaz, J. Ruiz-Shulcloper

Keyword(s): Mixed Data, Data Sets, Clustering Method

Applications of PSO and data transformation technique in interval type-2 fuzzy identification

Proceeding of the 11th World Congress on Intelligent Control and Automation, 2014. DOI: 10.1109/wcica.2014.7052805

Author(s): Shu'en Wang, Jinmei Dou, Fucai Liu, Jianxiong Li

Keyword(s): Data Transformation, Fuzzy Identification, Transformation Technique, Interval Type

Principle Component Analysis Based Data Mining for Contingency Analysis on IEEE 30 Bus Power System

International Journal of Innovative Technology and Exploring Engineering - Special Issue, 2019, Vol 9 (2), pp. 878-882. DOI: 10.35940/ijitee.b7116.129219

Keyword(s): Data Mining, Power System, Performance Index, Data Transformation, Component Analysis, Support Vector, Power Performance, Contingency Analysis, Transformation Technique, Bus System

This paper attempts to develop a data mining tool for contingency analysis of the power system. By mining the large volume of power system data for early detection of contingencies, greater cost savings can be planned. Because mining reduces the computational complexity of contingency analysis, this approach should also reduce hardware use. The paper uses a Multiclass Relevance Vector Machine (MCRVM) and a Multiclass Support Vector Machine (MCSVM) to mine data that include the voltage, power generated, power angles, and power demand in different lines of the power system. Data mining requires a data transformation technique that reduces the dimensionality of the input data; in this paper, data cleansing combined with Principal Component Analysis serves as that transformation. A MATLAB-based simulation is carried out on the IEEE 30-bus system for contingency analysis, incorporating a loading risk assessment strategy using the multiclass SVM and RVM, and the results are compared and tabulated. The active power performance index and the reactive power performance index are used in the contingency analysis of the IEEE 30-bus system, and the classification accuracy and classification speed of the different methods, together with the contingency rankings, are computed and presented.
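
To illustrate the "data cleansing + PCA" transformation described above, here is a small Python sketch (the paper itself works in MATLAB). Rows containing missing or non-finite values are dropped, the remaining columns are mean-centred, and the leading eigenvectors of the sample covariance matrix are found by power iteration with deflation. The cleansing rule, the use of power iteration instead of a full eigendecomposition, and the function names are assumptions chosen to keep the example dependency-free; the projected scores are what a classifier such as an SVM would then receive.

```python
import math
import random


def cleanse(rows):
    """Drop records containing missing (None) or non-finite values."""
    return [r for r in rows if all(v is not None and math.isfinite(v) for v in r)]


def pca(rows, k=1, iters=500, seed=0):
    """Return (eigenvalues, components, scores) for the top-k principal components."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    X = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d)
    C = [[sum(x[a] * x[b] for x in X) / (n - 1) for b in range(d)] for a in range(d)]
    rng = random.Random(seed)
    eigvals, comps = [], []
    for _ in range(k):
        v = [rng.random() - 0.5 for _ in range(d)]
        s = math.sqrt(sum(t * t for t in v))
        v = [t / s for t in v]
        # Power iteration converges to the eigenvector with the largest variance
        for _ in range(iters):
            w = [sum(C[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(t * t for t in w))
            if norm < 1e-12:
                break
            v = [t / norm for t in w]
        lam = sum(v[a] * C[a][b] * v[b] for a in range(d) for b in range(d))
        # Deflate so the next component is orthogonal to this one
        for a in range(d):
            for b in range(d):
                C[a][b] -= lam * v[a] * v[b]
        eigvals.append(lam)
        comps.append(v)
    # Project the centred data onto the components (reduced-dimension features)
    scores = [[sum(x[j] * c[j] for j in range(d)) for c in comps] for x in X]
    return eigvals, comps, scores
```

The sign of each component is arbitrary (it depends on the random start), so downstream code should not rely on it.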

An internal evaluation of higher education curriculum using data transformation model

Asian Journal of Assessment in Teaching and Learning, 2020, Vol 10 (2), pp. 1-9. DOI: 10.37134/ajatel.vol10.2.1.2020

Author(s): Michael Bobias Cahapay

Keyword(s): Higher Education, Data Transformation, Transformation Model, Data Sets, Education Curriculum, Evaluation Theory, Internal Evaluation, Higher Education Curriculum, Mixed Method Design, Using Data

A curriculum does not exist in a void; internal members play a key role in responding to the different forces that continually shape it. One approach to evaluation is internal evaluation from the perspective of the insiders who work with the curriculum. However, internal evaluation may be limited by innate subjective human judgment. In light of this, this paper performed a pilot internal evaluation of a selected aspect of a higher education curriculum using a triangulation mixed-method design called the data transformation model. The evaluation using the data transformation model identified important points of agreement and discrepancy in the data sets. The implications for evaluation theory and curriculum practice are discussed. It is suggested that the current formative internal evaluation be extended, continuing the tradition of the data transformation model but progressively focusing on larger aspects of the curriculum.
