A Novel Parallel Model for Self-Organizing Map and its Efficient Implementation on a Data-Driven Multiprocessor

Author(s):
Ruck Thawonmas
Makoto Iwata
Satoshi Fukunaga
...

The self-organizing map (SOM), together with its extensions, is one of the most widely used artificial neural network algorithms for unsupervised learning and is applied in a wide variety of domains. With very large data sets, however, training on a single processor takes too long to be acceptable in time-critical applications. To address this problem, we present a scheme consisting of a novel parallel model and its implementation on a dynamic data-driven multiprocessor. The parallel model ensures that no load imbalance occurs, while the dynamic data-driven multiprocessor provides high scalability. We demonstrate the effectiveness of the scheme by comparing the parallel model with an existing parallel model, and the proposed implementation with an implementation on another multiprocessor.
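
The abstract does not spell out the novel parallel model. As a generic illustration of how a data-partitioned SOM can keep processors evenly loaded, the sketch below runs one epoch of the standard batch SOM with the training set split into near-equal shards: each shard computes partial sums, and the partial sums are then combined into new prototypes. This is not the authors' model; the 1-D map, the Gaussian neighborhood, and the helper names `bmu`, `partial_sums` and `batch_epoch` are assumptions, and the workers are simulated sequentially.

```python
import math

def bmu(x, weights):
    """Index of the best-matching unit (smallest squared Euclidean distance)."""
    return min(range(len(weights)),
               key=lambda j: sum((xi - wi) ** 2 for xi, wi in zip(x, weights[j])))

def neighborhood(i, j, sigma):
    """Gaussian neighborhood on a 1-D map grid (assumed for brevity)."""
    return math.exp(-((i - j) ** 2) / (2.0 * sigma ** 2))

def partial_sums(shard, weights, sigma):
    """Work done by one processor: neighborhood-weighted sums over its shard."""
    dim, units = len(weights[0]), len(weights)
    num = [[0.0] * dim for _ in range(units)]
    den = [0.0] * units
    for x in shard:
        c = bmu(x, weights)
        for j in range(units):
            h = neighborhood(j, c, sigma)
            den[j] += h
            for d in range(dim):
                num[j][d] += h * x[d]
    return num, den

def batch_epoch(data, weights, sigma, n_workers):
    """One batch-SOM epoch with the data split into near-equal shards."""
    shards = [data[k::n_workers] for k in range(n_workers)]
    # In a real system each shard would be processed on its own processor.
    results = [partial_sums(s, weights, sigma) for s in shards]
    new_weights = []
    for j, w in enumerate(weights):
        den = sum(r[1][j] for r in results)
        if den == 0.0:  # only possible with no data at all
            new_weights.append(list(w))
            continue
        new_weights.append([sum(r[0][j][d] for r in results) / den
                            for d in range(len(w))])
    return new_weights
```

Because the reduction only adds partial sums, the result does not depend on how many shards are used, up to floating-point rounding, and equal shard sizes keep the per-processor work roughly equal.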

2005
Vol 15 (01n02)
pp. 101-110
Author(s):
TIMO SIMILÄ
SAMPSA LAINE

Practical data analysis often encounters data sets with both relevant and useless variables. Supervised variable selection is the task of selecting the relevant variables based on some predefined criterion. We propose a robust method for this task. The user manually selects a set of target variables and trains a Self-Organizing Map with these data. This sets a criterion for variable selection and provides an illustrative description of the user's problem, even for multivariate target data. The user also defines another set of variables that are potentially related to the problem. Our method returns a subset of these variables that best corresponds to the description provided by the Self-Organizing Map and thus agrees with the user's understanding of the problem. The method is conceptually simple and, based on experiments, offers an accessible approach to supervised variable selection.


2020
Vol 45 (s1)
pp. 535-559
Author(s):
Christian Pentzold
Lena Fölsche

Our article examines how journalistic reports and online comments have made sense of computational politics. It treats the discourse around data-driven campaigns as its object of analysis and codifies four main perspectives that have structured the debates about the use of large data sets and data analytics in elections. We study American, British, and German sources on the 2016 United States presidential election, the 2017 United Kingdom general election, and the 2017 German federal election. In these debates, groups of speakers maneuvered between enthusiastic, skeptical, agnostic, and admonitory stances and so cannot be clearly mapped onto these four discursive positions. Alongside these inconsistent accounts, public sensemaking was marked by an atmosphere of speculation about the substance and effects of computational politics. We conclude that this equivocality helped journalists and commentators sideline prior reporting on the issue and thereby repeatedly rediscover the practices they had already covered.


2011
pp. 24-32
Author(s):
Nicoleta Rogovschi
Mustapha Lebbah
Younès Bennani

Most traditional clustering algorithms are limited to handling data sets that contain either continuous or categorical variables. However, data sets with mixed types of variables are common in the data mining field. In this paper we introduce a weighted self-organizing map for clustering, analysis and visualization of mixed (continuous/binary) data. The weights and prototypes are learned simultaneously so as to optimize the data clustering. The higher a variable's weight, the more the clustering algorithm takes into account the information carried by that variable. The learning of these topological maps is combined with a weighting process that computes a weight for each variable, and these weights influence the quality of the clustering. We illustrate the power of this method with data sets taken from a public data set repository: a handwritten digit data set, the Zoo data set, and three other mixed data sets. The results show good topological ordering and homogeneous clusters.
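
The abstract states that each variable carries a weight and that higher-weighted variables influence the clustering more, but it does not give the distance or the weight-learning rule. A minimal sketch, assuming a weighted sum of squared differences for continuous variables and mismatch indicators for binary ones, shows how fixed weights change which prototype wins. `mixed_distance` and `best_matching_unit` are hypothetical names, and the learning of the weights is not shown.

```python
def mixed_distance(x, prototype, weights, is_binary):
    """Weighted distance between a mixed-type sample and a prototype (assumed form).

    Continuous variables contribute a squared difference; binary variables
    contribute 1 on mismatch and 0 on match. Each term is scaled by that
    variable's weight, so heavily weighted variables dominate the match.
    """
    total = 0.0
    for xv, pv, w, binary in zip(x, prototype, weights, is_binary):
        if binary:
            total += w * (0.0 if xv == pv else 1.0)
        else:
            total += w * (xv - pv) ** 2
    return total

def best_matching_unit(x, prototypes, weights, is_binary):
    """Index of the prototype with the smallest weighted mixed distance."""
    return min(range(len(prototypes)),
               key=lambda j: mixed_distance(x, prototypes[j], weights, is_binary))
```

For example, with the sample `[0.5, 1]` and prototypes `[0.5, 0]` and `[0.9, 1]` (first variable continuous, second binary), equal weights select the second prototype, whereas weights `[10.0, 0.01]`, which emphasize the continuous variable, select the first.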


2021
Vol 251
pp. 02054
Author(s):
Olga Sunneborn Gudnadottir
Daniel Gedon
Colin Desmarais
Karl Bengtsson Bernander
Raazesh Sainudiin
...  

In recent years, machine-learning methods have become increasingly important for the experiments at the Large Hadron Collider (LHC). They are utilised in everything from trigger systems to reconstruction and data analysis. The recent UCluster method is a general model providing unsupervised clustering of particle physics data that can easily be modified to provide solutions for a variety of different decision problems. In the current paper, we improve on the UCluster method by adding the option of training the model in a scalable and distributed fashion, thereby extending its utility to learning from arbitrarily large data sets. UCluster combines a graph-based neural network called ABCnet with a clustering step, using a combined loss function in the training phase. The original code is publicly available in TensorFlow v1.14 and has previously been trained on a single GPU. It shows a clustering accuracy of 81% when applied to the problem of multi-class classification of simulated jet events. Our implementation adds distributed training functionality by utilising the Horovod distributed training framework, which necessitated migrating the code to TensorFlow v2. Together with using parquet files to split the data between different compute nodes, distributed training makes the model scalable to any amount of input data, something that will be essential for use with real LHC data sets. We find that the model is well suited for distributed training, with the training time decreasing in direct relation to the number of GPUs used. However, further improvements, through a more exhaustive and possibly distributed hyper-parameter search, are required to achieve the reported accuracy of the original UCluster method.
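
The abstract mentions using parquet files to split the data between compute nodes. One simple way to do this, sketched here under the assumption that the data are already stored as many parquet files, is to give each worker a round-robin subset of the file list. In a Horovod job, `rank` and `size` would typically come from `hvd.rank()` and `hvd.size()` after `hvd.init()`; the helper `files_for_rank` and the file names are hypothetical and not taken from the UCluster code.

```python
def files_for_rank(files, rank, size):
    """Round-robin assignment of input files to one worker (hypothetical helper).

    Every file goes to exactly one rank, and the number of files per rank
    differs by at most one. Sorting first makes all workers agree on the
    order without any communication.
    """
    if not 0 <= rank < size:
        raise ValueError("rank must satisfy 0 <= rank < size")
    return sorted(files)[rank::size]

# Example: 10 files spread over 4 workers.
example_files = [f"part-{i:03d}.parquet" for i in range(10)]
example_split = [files_for_rank(example_files, r, 4) for r in range(4)]
```

Each worker then reads only its own files, so adding workers reduces the amount of data each one touches per epoch.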


2018
Vol 2 (10)
pp. 735-742
Author(s):
Martin Gerlach
Beatrice Farb
William Revelle
Luís A. Nunes Amaral

2016
pp. 203-214
Author(s):  
Ahmad Al-Khasawneh

Breast cancer is the second leading cause of cancer deaths in women worldwide. Early diagnosis of this illness can increase the chances of long-term survival for cancer patients. To aid diagnosis, computerized breast cancer diagnosis systems are being developed. Machine learning algorithms and data mining techniques play a central role in the diagnosis. This paper describes neural network based approaches to breast cancer diagnosis. The aim of this research is to investigate and compare the performance of supervised and unsupervised neural networks in diagnosing breast cancer. A multilayer perceptron has been implemented as the supervised neural network and a self-organizing map as the unsupervised one. Both models were simulated using a variety of parameters and tested using several combinations of those parameters in independent experiments. It was concluded that the multilayer perceptron outperforms Kohonen's self-organizing map in diagnosing breast cancer, even with small data sets.
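
The abstract does not say how the unsupervised self-organizing map was turned into a diagnostic classifier. A common approach, sketched here as an assumption, labels each trained unit with the majority class of the training samples it wins and then classifies a new case by the label of its best-matching unit. Training of the prototypes is omitted, and the class names `benign` and `malignant` are illustrative.

```python
from collections import Counter

def bmu(x, prototypes):
    """Index of the closest prototype (squared Euclidean distance)."""
    return min(range(len(prototypes)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(x, prototypes[j])))

def label_units(prototypes, samples, labels):
    """Give each unit the majority label of the training samples it wins.

    Units that win no samples are left unlabeled (None).
    """
    votes = [Counter() for _ in prototypes]
    for x, y in zip(samples, labels):
        votes[bmu(x, prototypes)][y] += 1
    return [v.most_common(1)[0][0] if v else None for v in votes]

def classify(x, prototypes, unit_labels):
    """Predict the label of the best-matching unit (None if it is unlabeled)."""
    return unit_labels[bmu(x, prototypes)]
```

Unlabeled units are one reason such a classifier can lag behind a supervised model such as a multilayer perceptron, since some inputs may map to units that carry no class information.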


Engineering
2019
Vol 5 (4)
pp. 730-735
Author(s):
Zhengtao Gan
Hengyang Li
Sarah J. Wolff
Jennifer L. Bennett
Gregory Hyatt
...  

2016
Vol 78 (6-13)
Author(s):
Azlin Ahmad
Rubiyah Yusof

The Kohonen Self-Organizing Map (KSOM) is an unsupervised neural network learning algorithm. It is used to solve problems in various areas, especially clustering complex data sets. Despite its advantages, the KSOM algorithm has a few drawbacks, such as overlapping clusters and non-linearly separable problems. This paper therefore proposes a modified KSOM inspired by the pheromone approach in Ant Colony Optimization. The modification focuses on the distance calculation among objects. The proposed algorithm has been tested on four real data sets obtained from the UCI Machine Learning Repository: Iris, Seeds, Glass and the Wisconsin Breast Cancer Database. The results show that the modified KSOM produces accurate clustering results and that all clusters can be clearly identified.
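
The proposed modification changes the distance calculation, but the pheromone-based rule itself is not described in the abstract. For reference, the sketch below shows one standard online Kohonen update with the distance passed in as a parameter, which marks where a modified distance would plug in. The 1-D map, the Gaussian neighborhood, and the names `ksom_step` and `squared_euclidean` are assumptions.

```python
import math

def squared_euclidean(x, w):
    """Default winner-selection distance of the standard KSOM."""
    return sum((a - b) ** 2 for a, b in zip(x, w))

def ksom_step(x, weights, lr, sigma, distance=squared_euclidean):
    """One online Kohonen update on a 1-D map; modifies `weights` in place.

    The winner c is chosen with `distance`; every unit j then moves toward x
    by lr * h(j, c) * (x - w_j), where h is a Gaussian neighborhood around c.
    Returns the index of the winner.
    """
    c = min(range(len(weights)), key=lambda j: distance(x, weights[j]))
    for j, w in enumerate(weights):
        h = math.exp(-((j - c) ** 2) / (2.0 * sigma ** 2))
        for d in range(len(w)):
            w[d] += lr * h * (x[d] - w[d])
    return c
```

Only winner selection depends on `distance`; the update rule itself is unchanged, so a modified distance of the kind the paper proposes could be supplied without touching the rest of the step.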


Author(s):  
MUSTAPHA LEBBAH
YOUNÈS BENNANI
NICOLETA ROGOVSCHI

This paper introduces a probabilistic self-organizing map for topographic clustering, analysis and visualization of multivariate binary data, or of categorical data using binary coding. We propose a probabilistic formalism dedicated to binary data in which each cell is represented by a Bernoulli distribution. Each cell is characterized by a prototype with the same binary coding as used in the data space and by the probability of differing from this prototype. The learning algorithm we propose, Bernoulli on self-organizing map, is an application of the standard EM algorithm. We illustrate the power of this method with six data sets taken from a public data set repository. The results show good topological ordering and homogeneous clusters.
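
Following the abstract's description, a cell with binary prototype w and mismatch probability eps assigns a binary vector x of length n the likelihood eps^H * (1 - eps)^(n - H), where H is the Hamming distance between x and w. The sketch below computes this likelihood and the resulting E-step posterior over cells. It assumes one eps per cell (0 < eps < 1) and equal cell priors, and it omits any topological neighborhood smoothing as well as the M-step; the function names are hypothetical.

```python
import math

def hamming(x, w):
    """Number of positions where two binary vectors differ."""
    return sum(1 for a, b in zip(x, w) if a != b)

def log_bernoulli(x, prototype, eps):
    """log p(x | cell) when each bit differs from the prototype with prob. eps."""
    h = hamming(x, prototype)
    return h * math.log(eps) + (len(x) - h) * math.log(1.0 - eps)

def responsibilities(x, cells):
    """Posterior probability of each cell for x, assuming equal cell priors.

    `cells` is a list of (prototype, eps) pairs. Log-sum-exp keeps the
    normalization numerically stable for long binary vectors.
    """
    logs = [log_bernoulli(x, w, eps) for w, eps in cells]
    m = max(logs)
    exps = [math.exp(l - m) for l in logs]
    z = sum(exps)
    return [e / z for e in exps]
```

With a small eps, a vector that is one bit away from a cell's prototype receives a much higher posterior for that cell than for a cell whose prototype is three bits away.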

