The N-Grams Based Text Similarity Detection Approach Using Self-Organizing Maps and Similarity Measures

2019 ◽  
Vol 9 (9) ◽  
pp. 1870 ◽  
Author(s):  
Pavel Stefanovič ◽  
Olga Kurasova ◽  
Rokas Štrimaitis

This paper proposes a word-level n-gram based approach to finding similarity between texts. The approach combines two separate and independent techniques: the self-organizing map (SOM) and text similarity measures. The SOM is distinctive in that the results of data clustering, as well as dimensionality reduction, are presented in visual form. Four measures have been evaluated: cosine, Dice, extended Jaccard, and overlap. First, the texts have to be converted to a numerical representation. For that purpose, each text is split into word-level n-grams, and a bag of n-grams is created. The n-gram frequencies are then calculated, and the frequency matrix of the dataset is formed. Various filters are used when creating the bag of n-grams: stemming algorithms, number and punctuation removers, stop-word lists, etc. All experiments were carried out on a corpus of plagiarized short answers.
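
The pipeline in this abstract (word-level n-grams, a frequency matrix, then the four similarity measures) can be illustrated with a minimal Python sketch. The function names are hypothetical, and the formulas are the common vector-space forms of the cosine, Dice, extended Jaccard, and overlap measures, assumed here because the abstract does not spell them out. Stemming and stop-word filtering are omitted; only lowercasing and removal of numbers and punctuation are shown.

```python
import math
import re
from collections import Counter


def word_ngrams(text, n=2):
    """Lowercase the text, drop digits and punctuation, return word-level n-grams."""
    # Note: this simple pattern keeps only ASCII letters (an illustrative shortcut).
    words = re.findall(r"[a-z]+", text.lower())
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def frequency_matrix(texts, n=2):
    """Return (vocabulary, rows) with one n-gram frequency row per text."""
    counts = [Counter(word_ngrams(t, n)) for t in texts]
    vocab = sorted(set().union(*counts))
    rows = [[c[g] for g in vocab] for c in counts]
    return vocab, rows


def _dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def similarities(x, y):
    """Assumed vector-space definitions of the four measures."""
    xy, xx, yy = _dot(x, y), _dot(x, x), _dot(y, y)
    if xx == 0 or yy == 0:
        # An empty text has no n-grams; treat it as dissimilar to everything.
        return {"cosine": 0.0, "dice": 0.0, "jaccard": 0.0, "overlap": 0.0}
    return {
        "cosine": xy / math.sqrt(xx * yy),
        "dice": 2 * xy / (xx + yy),
        "jaccard": xy / (xx + yy - xy),  # extended Jaccard (Tanimoto)
        "overlap": xy / min(xx, yy),
    }


# Toy sentences, made up for illustration only.
sample_texts = ["the cat sat on the mat", "the cat sat on the sofa"]
vocab, rows = frequency_matrix(sample_texts, n=2)
print(similarities(rows[0], rows[1]))
```

In the approach described above, the rows of such a frequency matrix would also serve as the input vectors for the SOM.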

2009 ◽  
Vol 50 ◽  
pp. 334-339
Author(s):  
Pavel Stefanovič ◽  
Olga Kurasova

The article examines and compares three self-organizing neural network (SOM) systems: NeNet, SOM-Toolbox, and Databionic ESOM. The main purpose of these systems is to group data into clusters by similarity and to present the clusters on a SOM. The systems differ from one another in how data are presented, in their learning rules, and in their visualization capabilities, so their similarities and differences are discussed here. The iris and glass datasets are used to train and visualize the SOMs.

Comparative Analysis of Self-Organizing Map Systems
Pavel Stefanovič, Olga Kurasova

Summary: In the article, we compare three systems of self-organizing maps: NeNet, SOM-Toolbox, and Databionic ESOM. The main purpose of these systems is data clustering and the graphical presentation of the clusters on a self-organizing map (SOM). Self-organizing maps are one type of artificial neural network. The SOM systems differ from one another in their interfaces, data pre-processing, learning rules, visualization methods, etc. The similarities and differences of the systems are highlighted here. The experiments were carried out with two datasets: iris and glass. The quantization and topographic errors of the SOMs were estimated as well.
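
The two quality measures named in this entry can be computed from a trained map as in the following sketch. It uses the usual definitions, assumed here because the entry does not give them: the quantization error is the mean distance between each data vector and its best-matching unit, and the topographic error is the share of data vectors whose best and second-best units are not grid neighbours. The function name, the dictionary-based codebook layout, and the 4-neighbourhood adjacency rule on a rectangular grid are illustrative assumptions, not the conventions of NeNet, SOM-Toolbox, or Databionic ESOM.

```python
import math


def _dist(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def som_errors(data, codebook):
    """Return (quantization_error, topographic_error).

    codebook maps a grid position (row, col) to that neuron's weight vector
    and must contain at least two neurons.
    """
    qe_sum, te_count = 0.0, 0
    for x in data:
        # Rank neurons by distance to x; the first two are the best-matching units.
        ranked = sorted(codebook, key=lambda pos: _dist(x, codebook[pos]))
        best, second = ranked[0], ranked[1]
        qe_sum += _dist(x, codebook[best])
        # Assumed adjacency rule: neighbours differ by one step along one grid axis.
        grid_gap = abs(best[0] - second[0]) + abs(best[1] - second[1])
        if grid_gap > 1:
            te_count += 1
    return qe_sum / len(data), te_count / len(data)


# Toy 1x3 map whose weights are ordered along the grid (made-up values).
ordered_map = {(0, 0): [0.0], (0, 1): [1.0], (0, 2): [2.0]}
print(som_errors([[0.2], [1.9]], ordered_map))
```

A lower quantization error indicates that the neurons fit the data more closely, while a lower topographic error indicates that the map preserves neighbourhood relations better.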


2017 ◽  
Vol 25 (6) ◽  
pp. 1020-1033 ◽  
Author(s):  
Leandro Antonio Pasa ◽  
José Alfredo F. Costa ◽  
Marcial Guerra de Medeiros

Data clustering aims to discover groups within the data based on similarities, with minimal, if any, knowledge of their structure. Variations in the results may occur due to many factors, including algorithm parameters, initialization, and stopping criteria. The use of different attributes, or even different subsets of the data, usually leads to different results. Self-organizing maps (SOMs) have been widely used for a variety of data analysis tasks, including data visualization and clustering. A machine committee, or ensemble, is a set of neural networks working independently, together with some system that enables the individual results to be combined into a single output, with the aim of achieving better generalization than a single neural network. This article presents a new ensemble method that uses SOM networks. Cluster validity indexes are used to combine neuron weights from different maps of different sizes. Results are shown from simulations with real and synthetic data from the UCI Repository and the Fundamental Clustering Problems Suite. The proposed method showed promising results, with increased performance compared with a conventional single Kohonen map.


Medicina ◽  
2021 ◽  
Vol 57 (3) ◽  
pp. 235
Author(s):  
Diego Galvan ◽  
Luciane Effting ◽  
Hágata Cremasco ◽  
Carlos Adam Conte-Junior

Background and objective: In the current pandemic scenario, data mining tools are fundamental to evaluating the measures adopted to contain the spread of COVID-19. In this study, unsupervised neural networks of the Self-Organizing Map (SOM) type were used to assess the spatial and temporal spread of COVID-19 in Brazil, according to the number of cases and deaths in regions, states, and cities. Materials and methods: The SOM applied in this context does not evaluate which of the applied measures have helped contain the spread of the disease, but these datasets represent the repercussions of the measures the country implemented to contain the virus’s spread. Results: This approach demonstrated that the spread of the disease in Brazil does not follow a standard behavior, but changes according to the region, state, or city. The analyses showed that cities and states in the north and northeast regions of the country were the most affected by the disease, with the highest numbers of cases and deaths registered per 100,000 inhabitants. Conclusions: The SOM clustering was able to spatially group cities, states, and regions with similar behavior according to their coronavirus cases. Thus, it is possible to benefit from the use of similar strategies to deal with the virus’s spread in these cities, states, and regions.
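
The grouping step described in this abstract rests on standard SOM training, which can be sketched in a few lines of plain Python. This is a generic online Kohonen update with a Gaussian neighbourhood and linearly decaying learning rate and radius, not the configuration used in the study. The function names, parameter values, and the toy input (two features scaled to [0, 1], standing in for cases and deaths per 100,000 inhabitants) are assumptions made for illustration.

```python
import math
import random


def best_matching_unit(x, weights):
    """Grid position of the neuron whose weight vector is closest to x."""
    return min(weights, key=lambda pos: sum((a - b) ** 2 for a, b in zip(x, weights[pos])))


def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a rows x cols SOM; returns {(row, col): weight_vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    sigma0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
        sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighbourhood radius shrinks
        for x in rng.sample(data, len(data)):    # visit samples in random order
            bmu = best_matching_unit(x, weights)
            for pos, w in weights.items():
                grid_d2 = (pos[0] - bmu[0]) ** 2 + (pos[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma * sigma))
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])  # pull the neuron towards x
    return weights


# Made-up, pre-scaled values: [cases per 100,000, deaths per 100,000].
toy_rates = [[0.10, 0.05], [0.15, 0.10], [0.90, 0.80], [0.85, 0.95]]
som = train_som(toy_rates, rows=2, cols=2)
for row in toy_rates:
    print(row, "->", best_matching_unit(row, som))
```

After training, inputs that map to the same or neighbouring neurons can be read as belonging to the same group.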


2017 ◽  
Vol 2017 ◽  
pp. 1-11 ◽  
Author(s):  
Adeoluwa Akande ◽  
Ana Cristina Costa ◽  
Jorge Mateu ◽  
Roberto Henriques

The explosion of data in the information age has provided an opportunity to explore the possibility of characterizing the climate patterns using data mining techniques. Nigeria has a unique tropical climate with two precipitation regimes: low precipitation in the north leading to aridity and desertification and high precipitation in parts of the southwest and southeast leading to large scale flooding. In this research, four indices have been used to characterize the intensity, frequency, and amount of rainfall over Nigeria. A type of Artificial Neural Network called the self-organizing map has been used to reduce the multiplicity of dimensions and produce four unique zones characterizing extreme precipitation conditions in Nigeria. This approach allowed for the assessment of spatial and temporal patterns in extreme precipitation in the last three decades. Precipitation properties in each cluster are discussed. The cluster closest to the Atlantic has high values of precipitation intensity, frequency, and duration, whereas the cluster closest to the Sahara Desert has low values. A significant increasing trend has been observed in the frequency of rainy days at the center of the northern region of Nigeria.


2021 ◽  
Vol 11 (4) ◽  
pp. 1933
Author(s):  
Hiroomi Hikawa ◽  
Yuta Ichikawa ◽  
Hidetaka Ito ◽  
Yutaka Maeda

In this paper, a real-time dynamic hand gesture recognition system with a gesture spotting function is proposed. In the proposed system, input video frames are converted to feature vectors, which are used to form a posture sequence vector that represents the input gesture. Then, gesture identification and gesture spotting are carried out in the self-organizing map (SOM)-Hebb classifier. The gesture spotting function detects the end of the gesture by using the vector distance between the posture sequence vector and the winner neuron’s weight vector. The proposed gesture recognition method was tested by simulation and in a real-time gesture recognition experiment. Results revealed that the system could recognize nine types of gestures with an accuracy of 96.6%, and that it successfully output the recognition result at the end of the gesture using the spotting result.
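
The distance-based spotting idea can be sketched as follows. The abstract states only that the distance between the posture sequence vector and the winner neuron's weight vector is used; the fixed threshold rule, the function name, and the toy weights below are assumptions for illustration, and the Hebbian classification stage is not shown.

```python
import math


def spot_gesture(sequence_vector, neuron_weights, threshold):
    """Return (winner_index, distance, spotted) for one posture sequence vector.

    The winner is the neuron whose weight vector is closest to the input.
    Assumed rule: the gesture is treated as complete when that distance
    does not exceed the threshold.
    """
    distances = [math.dist(sequence_vector, w) for w in neuron_weights]
    winner = min(range(len(distances)), key=distances.__getitem__)
    return winner, distances[winner], distances[winner] <= threshold


# Made-up neuron weights and input vectors.
toy_weights = [[0.0, 0.0], [3.0, 4.0]]
print(spot_gesture([3.0, 4.5], toy_weights, threshold=1.0))
print(spot_gesture([1.0, 1.0], toy_weights, threshold=1.0))
```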


Author(s):  
Macario O. Cordel ◽  
Arnulfo P. Azcarraga

Several time-critical problems that rely on large amounts of data, e.g., business trends, disaster response, and disease outbreaks, require cost-effective, timely, and accurate data summaries and visualizations in order to reach efficient and effective decisions. The self-organizing map (SOM) is a very effective data clustering and visualization tool, as it provides an intuitive display of data in a lower-dimensional space. However, with [Formula: see text] complexity, the SOM becomes inappropriate for large datasets. In this paper, we propose a force-directed visualization method that emulates the SOM’s capability to display the data clusters with [Formula: see text] complexity. The main idea is to perform a force-directed fine-tuning of the 2D representation of the data. To demonstrate the efficiency and the vast potential of the proposed method as a fast visualization tool, the methodology is used to produce a 2D projection of the MNIST handwritten digits dataset.


2019 ◽  
Vol 1 (1) ◽  
pp. 194-202
Author(s):  
Adrian Costea

This paper assesses the financial performance of Romania’s non-banking financial institutions (NFIs) using a neural network training algorithm proposed by Kohonen, namely the Self-Organizing Maps algorithm. The algorithm takes the financial dataset and positions each observation on a self-organizing map (a two-dimensional map), which can later be used to visualize the trajectory of an individual NFI and explain it in terms of different performance dimensions, such as capital adequacy, asset quality, and profitability. Further, we use the map as an early-warning system that would accurately forecast the NFIs’ future performance (whether they would stay in or be eliminated from the NFI’s Special Register three quarters into the future). The results are promising: the model is able to correctly predict NFIs’ performance movements. Finally, we compared the results of our SOM-based model with those obtained by applying a multivariate logit-based model. The SOM model performed worse in discriminating the NFIs’ performance: the performance classes were not clearly defined, and the model’s results lacked interpretability. On the contrary, the multivariate logit coefficients are readily interpretable, and an individual default probability estimate is obtained for each new observation. However, we can benefit from the results of both techniques: the visualization capabilities of the SOM model and the interpretability of the multivariate logit-based model.


2009 ◽  
Vol 18 (04) ◽  
pp. 603-611 ◽  
Author(s):  
Chih-Fong Tsai ◽  
Yuah-Chiao Lin ◽  
Yi-Ting Wang

Stock trading is popular in many countries. Generally, investors with various backgrounds have different preferences regarding the stocks they trade. In the literature, a number of studies examine institutions’ holding preferences for certain stock characteristics when choosing a security portfolio. However, very few studies investigate the stock trading preferences of individual investors. In this paper, we focus on two factors that affect investors’ portfolio choices: stock characteristics and investor features. In particular, a self-organizing map (SOM) is used to group a chosen dataset into a certain number of clusters. Then, a decision tree model is used to extract useful rules from the clusters that contain the most trading records in the sample. We find that investors who are female, less wealthy, and trade stocks less frequently are more careful and conservative. On the other hand, investors who are male, have a high level of wealth, and trade stocks very often tend to choose stocks with high EPS, high market-to-book ratios, and high prices.

