A Novel Parallel Model for Self-Organizing Map and its Efficient Implementation on a Data-Driven Multiprocessor

Ruck Thawonmas; ...; Makoto Iwata; Satoshi Fukunaga; ...

doi:10.20965/jaciii.2003.p0355

Journal of Advanced Computational Intelligence and Intelligent Informatics ◽

10.20965/jaciii.2003.p0355 ◽

2003 ◽

Vol 7 (3) ◽

pp. 355-361 ◽

Cited By ~ 1

Author(s):

Ruck Thawonmas ◽

◽

Makoto Iwata ◽

Satoshi Fukunaga ◽

◽

...

Keyword(s):

Large Data ◽

Data Driven ◽

Data Sets ◽

Self Organizing Map ◽

Parallel Model ◽

Dynamic Data ◽

Training Time ◽

Load Imbalance ◽

Time Critical ◽

Self Organizing

The self-organizing map (SOM), with its related extensions, is one of the most widely used artificial neural network algorithms in unsupervised learning and has a wide variety of applications. For very large data sets, however, the training time on a single processor is too long to be acceptable in time-critical application domains. To cope with this problem, we present a scheme consisting of a novel parallel model and its implementation on a dynamic data-driven multiprocessor. The parallel model ensures that no load imbalance will occur, while the dynamic data-driven multiprocessor yields high scalability. We demonstrate the effectiveness of the scheme by comparing the parallel model with an existing parallel model, and the proposed implementation with an implementation on another multiprocessor.
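
For orientation, the following is a minimal sketch of the standard sequential (online) SOM training loop, that is, the single-processor baseline whose training time the abstract describes as too long. It is not the parallel model proposed in the paper, which the abstract does not detail. The function names, the linear decay schedules, and the Gaussian neighborhood are illustrative assumptions.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the index of the unit whose weight vector is closest to x."""
    def sq_dist(w):
        return sum((wi - xi) ** 2 for wi, xi in zip(w, x))
    return min(range(len(weights)), key=lambda i: sq_dist(weights[i]))


def som_step(weights, grid, x, lr, sigma):
    """One online update: pull every unit toward x, scaled by a Gaussian
    neighborhood centered on the best-matching unit (BMU)."""
    bmu = best_matching_unit(weights, x)
    br, bc = grid[bmu]
    for i, (r, c) in enumerate(grid):
        d2 = (r - br) ** 2 + (c - bc) ** 2       # squared grid distance to BMU
        h = math.exp(-d2 / (2.0 * sigma ** 2))   # neighborhood strength in (0, 1]
        weights[i] = [w + lr * h * (xi - w) for w, xi in zip(weights[i], x)]
    return bmu


def train_som(data, rows, cols, epochs=20, lr0=0.5, sigma0=1.0, seed=0):
    """Train a rows x cols map; the schedules here are assumed, not from the paper."""
    rng = random.Random(seed)
    dim = len(data[0])
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    weights = [[rng.random() for _ in range(dim)] for _ in grid]
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs              # linear decay from 1 toward 0
        lr = lr0 * frac
        sigma = max(sigma0 * frac, 0.1)          # keep the neighborhood width positive
        for x in data:
            som_step(weights, grid, x, lr, sigma)
    return weights, grid
```

Each sample requires a distance computation against every unit followed by an update of every unit, so the cost per epoch grows with the product of data set size and map size. This is the work that a parallel scheme would have to divide among processors.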

An Online Cellular Probabilistic Self-Organizing Map for Static and Dynamic Data Sets

IEEE Transactions on Circuits and Systems I Fundamental Theory and Applications ◽

10.1109/tcsi.2004.826213 ◽

2004 ◽

Vol 51 (4) ◽

pp. 732-747 ◽

Cited By ~ 12

Author(s):

T.W.S. Chow ◽

S. Wu

Keyword(s):

Data Sets ◽

Self Organizing Map ◽

Dynamic Data ◽

Self Organizing

VISUAL APPROACH TO SUPERVISED VARIABLE SELECTION BY SELF-ORGANIZING MAP

International Journal of Neural Systems ◽

10.1142/s0129065705000098 ◽

2005 ◽

Vol 15 (01n02) ◽

pp. 101-110 ◽

Cited By ~ 1

Author(s):

TIMO SIMILÄ ◽

SAMPSA LAINE

Keyword(s):

Variable Selection ◽

The Self ◽

Data Sets ◽

Self Organizing Map ◽

Robust Method ◽

Relevant Variables ◽

Visual Approach ◽

Predefined Criterion ◽

Target Data ◽

Self Organizing

Practical data analysis often encounters data sets with both relevant and useless variables. Supervised variable selection is the task of selecting the relevant variables based on some predefined criterion. We propose a robust method for this task. The user manually selects a set of target variables and trains a Self-Organizing Map with these data. This sets a criterion for variable selection and is an illustrative description of the user's problem, even for multivariate target data. The user also defines another set of variables that are potentially related to the problem. Our method returns a subset of these variables, which best corresponds to the description provided by the Self-Organizing Map and, thus, agrees with the user's understanding of the problem. The method is conceptually simple and, based on experiments, allows an accessible approach to supervised variable selection.

Data-driven campaigns in public sensemaking: Discursive positions, contextualization, and maneuvers in American, British, and German debates around computational politics

Communications ◽

10.1515/commun-2019-0125 ◽

2020 ◽

Vol 45 (s1) ◽

pp. 535-559

Author(s):

Christian Pentzold ◽

Lena Fölsche

Keyword(s):

United States ◽

United Kingdom ◽

Presidential Election ◽

Data Analytics ◽

Large Data ◽

Large Data Sets ◽

Data Driven ◽

Data Sets ◽

Federal Election ◽

Online Comments

Our article examines how journalistic reports and online comments have made sense of computational politics. It treats the discourse around data-driven campaigns as its object of analysis and codifies four main perspectives that have structured the debates about the use of large data sets and data analytics in elections. We study American, British, and German sources on the 2016 United States presidential election, the 2017 United Kingdom general election, and the 2017 German federal election. There, groups of speakers maneuvered between enthusiastic, skeptical, agnostic, or admonitory stances and so cannot be clearly mapped onto these four discursive positions. Coming along with the inconsistent accounts, public sensemaking was marked by an atmosphere of speculation about the substance and effects of computational politics. We conclude that this equivocality helped journalists and commentators to sideline prior reporting on the issue in order to repeatedly rediscover the practices they had already covered.

A SELF-ORGANIZING MAP FOR MIXED CONTINUOUS AND CATEGORICAL DATA

International Journal of Computing ◽

10.47839/ijc.10.1.733 ◽

2011 ◽

pp. 24-32 ◽

Cited By ~ 1

Author(s):

Nicoleta Rogovschi ◽

Mustapha Lebbah ◽

Younès Bennani

Keyword(s):

Clustering Algorithm ◽

Clustering Algorithms ◽

Mixed Data ◽

Categorical Variables ◽

Data Sets ◽

Self Organizing Map ◽

Data Set ◽

Public Data ◽

Self Organizing

Most traditional clustering algorithms are limited to handling data sets that contain either continuous or categorical variables. However, data sets with mixed types of variables are common in the data mining field. In this paper we introduce a weighted self-organizing map for the clustering, analysis, and visualization of mixed data (continuous/binary). The weights and prototypes are learned simultaneously, ensuring an optimized data clustering. The higher a variable's weight, the more the clustering algorithm takes into account the information carried by that variable. The learning of these topological maps is combined with a weighting process for the different variables, which computes weights that influence the quality of the clustering. We illustrate the power of this method with data sets taken from a public data set repository: a handwritten digit data set, the Zoo data set, and three other mixed data sets. The results show a good quality of the topological ordering and homogeneous clustering.

Distributed training and scalability for the particle clustering method UCluster

EPJ Web of Conferences ◽

10.1051/epjconf/202125102054 ◽

2021 ◽

Vol 251 ◽

pp. 02054

Author(s):

Olga Sunneborn Gudnadottir ◽

Daniel Gedon ◽

Colin Desmarais ◽

Karl Bengtsson Bernander ◽

Raazesh Sainudiin ◽

...

Keyword(s):

Particle Physics ◽

Hadron Collider ◽

Large Data ◽

Large Data Sets ◽

Data Sets ◽

Training Time ◽

Distributed Training ◽

Machine Learning Methods ◽

Multi Class Classification

In recent years, machine-learning methods have become increasingly important for the experiments at the Large Hadron Collider (LHC). They are utilised in everything from trigger systems to reconstruction and data analysis. The recent UCluster method is a general model providing unsupervised clustering of particle physics data that can be easily modified to provide solutions for a variety of different decision problems. In the current paper, we improve on the UCluster method by adding the option of training the model in a scalable and distributed fashion, thereby extending its utility to learning from arbitrarily large data sets. UCluster combines a graph-based neural network called ABCnet with a clustering step, using a combined loss function in the training phase. The original code is publicly available in TensorFlow v1.14 and has previously been trained on a single GPU. It shows a clustering accuracy of 81% when applied to the problem of multi-class classification of simulated jet events. Our implementation adds the distributed training functionality by utilising the Horovod distributed training framework, which necessitated a migration of the code to TensorFlow v2. Together with the use of parquet files for splitting data up between different compute nodes, the distributed training makes the model scalable to any amount of input data, something that will be essential for use with real LHC data sets. We find that the model is well suited for distributed training, with the training time decreasing in direct relation to the number of GPUs used. However, further improvement through a more exhaustive and possibly distributed hyper-parameter search is required in order to achieve the reported accuracy of the original UCluster method.

A robust data-driven approach identifies four personality types across four large data sets

Nature Human Behaviour ◽

10.1038/s41562-018-0419-z ◽

2018 ◽

Vol 2 (10) ◽

pp. 735-742 ◽

Cited By ~ 41

Author(s):

Martin Gerlach ◽

Beatrice Farb ◽

William Revelle ◽

Luís A. Nunes Amaral

Keyword(s):

Large Data ◽

Personality Types ◽

Large Data Sets ◽

Data Driven ◽

Data Sets ◽

Data Driven Approach

Diagnosis of Breast Cancer Using Intelligent Information Systems Techniques

Nature-Inspired Computing ◽

10.4018/978-1-5225-0788-8.ch010 ◽

2016 ◽

pp. 203-214 ◽

Cited By ~ 1

Author(s):

Ahmad Al-Khasawneh

Keyword(s):

Breast Cancer ◽

Neural Network ◽

Cancer Diagnosis ◽

Multilayer Perceptron ◽

Breast Cancer Diagnosis ◽

Machine Learning Algorithms ◽

Data Sets ◽

Self Organizing Map ◽

Intelligent Information ◽

Self Organizing

Breast cancer is the second leading cause of cancer deaths in women worldwide. Early diagnosis of this illness can increase the chances of long-term survival of cancer patients. To aid in this, computerized breast cancer diagnosis systems are being developed. Machine learning algorithms and data mining techniques play a central role in the diagnosis. This paper describes neural network based approaches to breast cancer diagnosis. The aim of this research is to investigate and compare the performance of supervised and unsupervised neural networks in diagnosing breast cancer. A multilayer perceptron has been implemented as a supervised neural network and a self-organizing map as an unsupervised one. Both models were simulated using a variety of parameters and tested using several combinations of those parameters in independent experiments. It was concluded that the multilayer perceptron neural network outperforms Kohonen's self-organizing maps in diagnosing breast cancer even with small data sets.

Data-Driven Microstructure and Microhardness Design in Additive Manufacturing Using a Self-Organizing Map

Engineering ◽

10.1016/j.eng.2019.03.014 ◽

2019 ◽

Vol 5 (4) ◽

pp. 730-735 ◽

Cited By ~ 7

Author(s):

Zhengtao Gan ◽

Hengyang Li ◽

Sarah J. Wolff ◽

Jennifer L. Bennett ◽

Gregory Hyatt ◽

...

Keyword(s):

Additive Manufacturing ◽

Data Driven ◽

Self Organizing Map ◽

Self Organizing

A MODIFIED KOHONEN SELF-ORGANIZING MAP (KSOM) CLUSTERING FOR FOUR CATEGORICAL DATA

Jurnal Teknologi ◽

10.11113/jt.v78.9275 ◽

2016 ◽

Vol 78 (6-13) ◽

Author(s):

Azlin Ahmad ◽

Rubiyah Yusof

Keyword(s):

Breast Cancer ◽

Categorical Data ◽

Data Sets ◽

Complex Data ◽

Self Organizing Map ◽

Distance Calculation ◽

The Neural Network ◽

Complex Data Sets ◽

Separable Problems ◽

Self Organizing

The Kohonen Self-Organizing Map (KSOM) is an unsupervised neural network learning algorithm. It is used to solve problems in various areas, especially in clustering complex data sets. Despite its advantages, the KSOM algorithm has a few drawbacks, such as overlapping clusters and non-linearly separable problems. Therefore, this paper proposes a modified KSOM inspired by the pheromone approach in Ant Colony Optimization. The modification focuses on the distance calculation among objects. The proposed algorithm has been tested on four real categorical data sets obtained from the UCI machine learning repository: Iris, Seeds, Glass, and Wisconsin Breast Cancer Database. The results show that the modified KSOM produces accurate clustering results and that all clusters can be clearly identified.

A PROBABILISTIC SELF-ORGANIZING MAP FOR BINARY DATA TOPOGRAPHIC CLUSTERING

International Journal of Computational Intelligence and Applications ◽

10.1142/s1469026808002351 ◽

2008 ◽

Vol 07 (04) ◽

pp. 363-383 ◽

Cited By ~ 10

Author(s):

MUSTAPHA LEBBAH ◽

YOUNÈS BENNANI ◽

NICOLETA ROGOVSCHI

Keyword(s):

Binary Data ◽

Learning Algorithm ◽

Data Sets ◽

Self Organizing Map ◽

Data Set ◽

Binary Coding ◽

Public Data ◽

Multivariate Binary Data ◽

Self Organizing

This paper introduces a probabilistic self-organizing map for topographic clustering, analysis and visualization of multivariate binary data or categorical data using binary coding. We propose a probabilistic formalism dedicated to binary data in which cells are represented by a Bernoulli distribution. Each cell is characterized by a prototype with the same binary coding as used in the data space and the probability of being different from this prototype. The learning algorithm that we propose, Bernoulli on self-organizing map, is an application of the standard EM algorithm. We illustrate the power of this method with six data sets taken from a public data set repository. The results show a good quality of the topological ordering and homogeneous clustering.
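
As a rough illustration of the cell model described above, the sketch below computes the likelihood of a binary vector under one cell, given a binary prototype and a probability of differing from it. It assumes a single probability `eps` shared by all components of the cell and independent components; the abstract does not say whether the paper uses one value per cell or one per component, and the function names are hypothetical. The EM learning procedure itself is not shown.

```python
def hamming(x, prototype):
    """Number of positions where the binary vector x differs from the prototype."""
    return sum(1 for xi, wi in zip(x, prototype) if xi != wi)


def bernoulli_cell_likelihood(x, prototype, eps):
    """Likelihood of x under a cell with the given binary prototype.

    Each component is assumed to differ from the prototype independently
    with probability eps, so the likelihood depends only on the Hamming
    distance d: eps**d * (1 - eps)**(n - d).
    """
    d = hamming(x, prototype)
    return (eps ** d) * ((1.0 - eps) ** (len(x) - d))
```

For example, with prototype `[1, 1, 1]` and `eps = 0.1`, the vector `[1, 0, 1]` differs in one position and receives likelihood 0.1 × 0.9 × 0.9 = 0.081. Summing the likelihood over all binary vectors of the same length gives 1, so each cell defines a proper distribution over the data space.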
