A Memory-Saving and Efficient Data Transformation Technique for Mixed Data Sets Visualization

Author(s):  
Sun Yang ◽  
Zhao Xiang ◽  
Tang Daquan ◽  
Xiao Weidong
2021 ◽  
pp. 1-13
Author(s):  
Yikai Zhang ◽  
Yong Peng ◽  
Hongyu Bian ◽  
Yuan Ge ◽  
Feiwei Qin ◽  
...  

Concept factorization (CF) is an effective matrix factorization model which has been widely used in many applications. In CF, a linear combination of data points serves as the dictionary, which allows CF to be performed both in the original feature space and in a reproducing kernel Hilbert space (RKHS). Conventional CF treats each dimension of the feature vector equally during the data reconstruction process, which may contradict the common-sense observation that different features have different discriminative abilities and therefore contribute differently to pattern recognition. In this paper, we introduce an auto-weighting variable into the conventional CF objective function to adaptively learn the contributions of different features and propose a new model termed Auto-Weighted Concept Factorization (AWCF). In AWCF, on the one hand, feature importance can be quantitatively measured by the auto-weighting variable, in which features with better discriminative abilities are assigned larger weights; on the other hand, we can obtain a more efficient data representation to depict its semantic information. A detailed optimization procedure for the AWCF objective function is derived, and its complexity and convergence are analyzed. Experiments are conducted on both synthetic and representative benchmark data sets, and the clustering results demonstrate the effectiveness of AWCF in comparison with related models.
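For orientation, the conventional CF objective that this abstract builds on can be written as follows. This is the standard formulation from the CF literature, not the AWCF objective, whose weighted form the abstract does not state. The notation is an assumption made here: X is the d x n data matrix, W and V are n x c non-negative factors, and K = X^T X is the Gram matrix, which may be replaced by a kernel matrix in the RKHS setting.

```latex
% Conventional CF (not AWCF): each concept is a non-negative combination of data points.
\min_{W \ge 0,\; V \ge 0} \; \left\| X - X W V^{\top} \right\|_F^2

% Standard multiplicative updates, with K = X^{\top} X:
w_{jk} \leftarrow w_{jk} \, \frac{(K V)_{jk}}{(K W V^{\top} V)_{jk}},
\qquad
v_{jk} \leftarrow v_{jk} \, \frac{(K W)_{jk}}{(V W^{\top} K W)_{jk}}
```

Because the updates depend on the data only through K, the same rules apply unchanged when K is a kernel matrix.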


2011 ◽  
pp. 24-32 ◽  
Author(s):  
Nicoleta Rogovschi ◽  
Mustapha Lebbah ◽  
Younès Bennani

Most traditional clustering algorithms are limited to handling data sets that contain either continuous or categorical variables. However, data sets with mixed types of variables are common in the data mining field. In this paper we introduce a weighted self-organizing map for clustering, analysis and visualization of mixed data (continuous/binary). The weights and prototypes are learned simultaneously, ensuring an optimized data clustering. The higher the weight of a variable, the more the clustering algorithm takes into account the information carried by that variable. The learning of these topological maps is combined with a weighting process for the different variables, computing weights which influence the quality of clustering. We illustrate the power of this method with data sets taken from a public data set repository: a handwritten digit data set, the Zoo data set and three other mixed data sets. The results show a good quality of the topological ordering and homogeneous clustering.
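The effect of variable weights on prototype assignment can be illustrated with a minimal sketch. The distance below (a weighted squared difference, which for 0/1 features reduces to a weighted mismatch count) is an assumption chosen for illustration; the abstract does not specify the distance or the weight-learning rule, and the prototypes and weights here are hypothetical fixed values rather than learned ones.

```python
def weighted_distance(x, prototype, weights):
    """Weighted squared distance between a sample and a prototype.

    For binary (0/1) features the squared difference is 1 on a mismatch
    and 0 otherwise, so the same formula covers both variable types.
    """
    return sum(w * (a - b) ** 2 for a, b, w in zip(x, prototype, weights))


def best_matching_unit(x, prototypes, weights):
    """Index of the prototype closest to x under the weighted distance."""
    return min(
        range(len(prototypes)),
        key=lambda i: weighted_distance(x, prototypes[i], weights),
    )


# Hypothetical data: two continuous features followed by two binary features.
prototypes = [[0.1, 0.2, 0, 1], [0.9, 0.8, 1, 0]]
sample = [0.8, 0.9, 0, 1]

uniform_weights = [1.0, 1.0, 1.0, 1.0]
continuous_heavy_weights = [5.0, 5.0, 0.1, 0.1]

bmu_uniform = best_matching_unit(sample, prototypes, uniform_weights)
bmu_continuous = best_matching_unit(sample, prototypes, continuous_heavy_weights)
```

With uniform weights the binary features dominate and the sample maps to the first prototype; emphasizing the continuous features moves it to the second, which is the sense in which highly weighted variables drive the clustering.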


2018 ◽  
Vol 7 (3.12) ◽  
pp. 239
Author(s):  
Chitransh Rajesh ◽  
Yash Jain ◽  
J Jayapradha

Data analytics is the process of analyzing unprocessed data to draw conclusions by studying and inspecting various patterns in the data. Several algorithms and conceptual methods are often followed to derive valid and accurate results. Efficient data handling is important for interactive visualization of data sets. Considering recent research and analytical theories on column-oriented database management systems, we are developing a new data engine using R and Tableau to predict airport trends. The engine uses univariate data sets (for example, the Perth Airport Passenger Movement Dataset and the Newark Airport Cargo Stats Dataset) to analyze and predict trends accurately. Data analysis and prediction are done with time series analysis and the respective ARIMA models for the respective modules. Modules are developed using RStudio, whereas Tableau is used for interactive visualization and end-user report generation. The Airport Trends Analytics Engine is an integral part of R and Tableau 10.4 and is optimized for use on desktop and server environments.
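As a rough illustration of the kind of model involved, the sketch below fits a zero-mean ARIMA(1,1,0) model by conditional least squares and forecasts a few steps ahead. It is written in plain Python rather than the R tooling the abstract describes, the model order is an assumption, and the series is made-up toy data, not the airport data sets.

```python
def fit_ar1_on_differences(series):
    """Estimate phi for ARIMA(1,1,0) without drift.

    The series is differenced once, then an AR(1) coefficient is fitted
    to the differences by conditional least squares.
    """
    diffs = [b - a for a, b in zip(series, series[1:])]
    numerator = sum(prev * cur for prev, cur in zip(diffs[:-1], diffs[1:]))
    denominator = sum(prev * prev for prev in diffs[:-1])
    return numerator / denominator if denominator else 0.0


def forecast(series, steps):
    """Forecast future levels by extrapolating the differences with phi."""
    phi = fit_ar1_on_differences(series)
    level = series[-1]
    diff = series[-1] - series[-2]
    predictions = []
    for _ in range(steps):
        diff = phi * diff          # expected next difference
        level = level + diff       # integrate back to the original scale
        predictions.append(level)
    return predictions


# Toy series whose increments halve each step: 8, 4, 2, 1.
toy_series = [0, 8, 12, 14, 15]
phi_hat = fit_ar1_on_differences(toy_series)
next_two = forecast(toy_series, 2)
```

In practice a library routine would also select the model order and estimate a drift term; this sketch only shows the differencing and autoregression steps that the "I" and "AR" in ARIMA refer to.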


Author(s):  
SUNG-GI LEE ◽  
DEOK-KYUN YUN

In this paper, we present a concept based on the similarity of categorical attribute values considering implicit relationships and propose a new and effective clustering procedure for mixed data. Our procedure obtains similarities between categorical values from careful analysis and maps the values in each categorical attribute into points in two-dimensional coordinate space using multidimensional scaling. These mapped values make it possible to interpret the relationships between attribute values and to directly apply categorical attributes to clustering algorithms using a Euclidean distance. After trivial modifications, our procedure for clustering mixed data uses the k-means algorithm, well known for its efficiency in clustering large data sets. We use the familiar soybean disease and adult data sets to demonstrate the performance of our clustering procedure. The satisfactory results that we have obtained demonstrate the effectiveness of our algorithm in discovering structure in data.


2016 ◽  
Vol 23 (4) ◽  
pp. 1009-1031 ◽  
Author(s):  
Anna Glaser ◽  
Sonia Ben Slimane ◽  
Claire Auplat ◽  
Régis Coeurderoy

Purpose: The purpose of this paper is to build a holistic theoretical framework of enabling factors contributing to the development of enterprise in nanotechnology-related industries, in a French context. Design/methodology/approach: A systematic literature review methodology was adopted. The review used three gauges to identify enabling factors contributing to the development of enterprise in nanotechnology-related industries in a French context: first, it analysed the literature related to the development of nanotechnologies in a perspective of sustainability in a multidisciplinary stance (“Green view”). Second, it took a disciplinary stance by exploring academic journals in the field of entrepreneurship (“Entrepreneurship view”). Third, it studied the perspective of France (“French view”). Findings: The main finding is that in spite of different approaches and sometimes seemingly conflicting stances, the three views converge on three enabling factors: the importance of knowledge sharing across boundaries, access to university scientists and facilities, and government intervention. However, each view also has its particularities: the “Green view” emphasizes the need for civil society inclusion, the “Entrepreneurship view” underlines the importance of early stage capital and entrepreneurial behaviour and the “French view” concentrates on the role of clusters. Research limitations/implications: The paper provides a theoretical framework and a starting point for further work on entrepreneurial nanotechnology facilitation. Its findings constitute a benchmark which may be tested in empirical cases. The focus on the French context may be seen as a limitation but also as a source of interesting comparative work focussing on other national or regional contexts. Practical implications: The paper shows that public policy is an important element in the nascent field of enterprise development for nano-based materials. It outlines how different contexts create different barriers to entrepreneurship, and it proposes recommendations to overcome some of these barriers. Originality/value: In this paper, findings result from an exploration of the nanotechnology literature that focusses solely on nanotechnology data sets and not on mixed data sets. The use of three different gauges leads to the construction of a holistic theoretical framework that includes enabling factors as well as the types of barriers that entrepreneurs have to overcome to succeed.


2012 ◽  
Vol 2 (1) ◽  
pp. 11-20 ◽  
Author(s):  
Ritu Vijay ◽  
Prerna Mahajan ◽  
Rekha Kandwal

Cluster analysis has been extensively used in machine learning and data mining to discover distribution patterns in data. Clustering algorithms are generally based on a distance metric that partitions the data into small groups such that data instances in the same group are more similar to each other than to instances belonging to different groups. In this paper the authors extend the concept of Hamming distance to categorical data. As a data preprocessing step, they transform the data into a binary representation. The authors then use the proposed algorithm to group data points into clusters. Experiments are carried out on data sets from the UCI machine learning repository to analyze performance. They conclude that the proposed algorithm shows promising results and can be extended to handle numeric as well as mixed data.
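The binary transformation and distance computation described above can be sketched as follows. This shows plain one-hot encoding with the ordinary Hamming distance, assumed here for illustration; the abstract does not detail the authors' extension of the distance or their exact encoding, and the records are made-up examples.

```python
def one_hot_encode(records):
    """Turn categorical records into 0/1 tuples, one bit per (attribute, value) pair."""
    n_attributes = len(records[0])
    # Sorting each attribute's observed values gives a deterministic bit layout.
    domains = [sorted({r[j] for r in records}) for j in range(n_attributes)]
    encoded = []
    for record in records:
        bits = []
        for j, domain in enumerate(domains):
            bits.extend(1 if record[j] == value else 0 for value in domain)
        encoded.append(tuple(bits))
    return encoded


def hamming(a, b):
    """Number of bit positions in which two equal-length codes differ."""
    return sum(x != y for x, y in zip(a, b))


# Hypothetical records with two categorical attributes: colour and size.
records = [("red", "small"), ("red", "large"), ("blue", "large")]
codes = one_hot_encode(records)
distances = [[hamming(a, b) for b in codes] for a in codes]
```

With one-hot codes, every attribute on which two records disagree contributes exactly two differing bits, so the Hamming distance is twice the number of mismatched attributes.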


This paper is an attempt to develop a data mining tool for contingency analysis of the power system. By mining the big data in the power system and detecting contingencies early, larger cost savings can be planned. As mining would reduce the computational complexity of the contingency analysis, this attempt would lead to a reduction in hardware use. This paper uses a Multiclass Relevance Vector Machine (MCRVM) and a Multiclass Support Vector Machine (MCSVM) to mine the data, which include the voltage, power generated, power angles and power demand in different lines of the power system. Data mining needs a data transformation technique that reduces the dimensionality of the data introduced for mining. The combination of data cleansing and Principal Component Analysis acts as the data transformation technique in this paper. A Matlab-based simulation is carried out using the IEEE 30 bus system for the contingency analysis, incorporating the loading risk assessment strategy using the multiclass SVM and RVM; the results are compared and the outputs are tabulated. The active power performance index and the reactive power performance index are used in the contingency analysis of the IEEE 30 bus system, and the accuracy of classification, the speed of classification with the different methods and the contingency rankings are found and displayed.
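A minimal sketch of a cleansing-plus-PCA transformation is given below, in plain Python rather than the Matlab used in the paper. Several details are assumptions made for illustration: cleansing is reduced to dropping rows with missing values, only the first principal component is extracted (by power iteration), and the two-feature rows are made-up numbers rather than IEEE 30 bus measurements.

```python
import math


def clean(rows):
    """Drop rows containing a missing value (None); a stand-in for data cleansing."""
    return [r for r in rows if all(v is not None for v in r)]


def first_principal_component(rows, iterations=200):
    """Return the leading principal direction and the projected 1-D scores."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the centered data.
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    # Power iteration converges to the eigenvector with the largest eigenvalue,
    # provided the data are not constant and the start vector is not orthogonal to it.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores


# Hypothetical two-feature measurements; one row has a missing reading.
raw_rows = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (None, 1.0), (4.0, 8.0)]
cleaned_rows = clean(raw_rows)
component, scores = first_principal_component(cleaned_rows)
```

The resulting scores are the reduced, one-dimensional representation that would be passed on to a classifier in place of the original features.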


2020 ◽  
Vol 10 (2) ◽  
pp. 1-9
Author(s):  
Michael Bobias Cahapay

A curriculum does not exist in a void; internal members play a key role in responding to the different forces that continually shape it. One approach to evaluation is internal evaluation from the perspective of the inside members who work with the curriculum. However, internal evaluation may be restricted by innate subjective human judgment. Considering these contexts, this paper performed a pilot internal evaluation of a selected aspect of a higher education curriculum using a triangulation mixed-methods design called the data transformation model. Based on the results, the evaluation using the data transformation model probed important points of agreement and discrepancy in the data sets. The implications for evaluation theory and curriculum practice are discussed. It is suggested that an extension of the current formative internal evaluation, continuing the tradition of the data transformation model but progressively focusing on larger aspects of the curriculum, should be conducted.

