A Hybrid SOM-Based Document Organization System

Author(s):  
Renato Correa ◽  
Teresa Ludermir
2017 ◽  
Vol 21 (6) ◽  
pp. 480-497 ◽  
Author(s):  
Anna Potocki ◽  
Christine Ros ◽  
Nicolas Vibert ◽  
Jean-François Rouet

Author(s):  
Thomas Seifried ◽  
Matthew Jervis ◽  
Michael Haller ◽  
Masood Masoodian ◽  
Nicolas Villar

Document organization is necessary for making better use of documents. Organizing online documents is a complex problem because each document should be assigned to its appropriate group as it appears on the web. Classification is one of the best solutions for organizing documents, and naive Bayes categorization plays a vital role in it. Naive Bayes is one of the simplest probabilistic Bayesian classifiers; it assumes that the effect of an attribute value on a given category is independent of the values of the other attributes. Document classification is an essential organization task and is necessary for the efficient management of textual information systems. Classification methods may be unsupervised, supervised, or semi-supervised. This paper reviews and studies various document organization approaches based on naive Bayesian classification, together with other related existing document organization methods.
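
The independence assumption described in this abstract can be illustrated with a minimal sketch of a multinomial naive Bayes text classifier. The function names (`train_naive_bayes`, `classify`), the add-one (Laplace) smoothing, and the toy training data are assumptions made for illustration; the abstract does not specify an implementation.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labeled_docs):
    """Count documents per category and term occurrences per category.

    labeled_docs is a list of (tokens, category) pairs.
    """
    class_counts = Counter()
    term_counts = defaultdict(Counter)
    vocabulary = set()
    for tokens, category in labeled_docs:
        class_counts[category] += 1
        term_counts[category].update(tokens)
        vocabulary.update(tokens)
    return class_counts, term_counts, vocabulary


def classify(tokens, model):
    """Return the category with the highest log posterior score."""
    class_counts, term_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_category, best_score = None, -math.inf
    for category, n_docs in class_counts.items():
        # Log prior of the category.
        score = math.log(n_docs / total_docs)
        total_terms = sum(term_counts[category].values())
        for term in tokens:
            if term not in vocabulary:
                continue  # ignore terms never seen in training
            # Independence assumption: each term contributes its own
            # factor, regardless of the other terms in the document.
            # Add-one smoothing avoids zero probabilities (an assumption).
            likelihood = (term_counts[category][term] + 1) / (
                total_terms + len(vocabulary)
            )
            score += math.log(likelihood)
        if score > best_score:
            best_category, best_score = category, score
    return best_category


# Hypothetical toy corpus, for illustration only.
docs = [
    (["goal", "match", "team"], "sports"),
    (["team", "score", "win"], "sports"),
    (["election", "vote", "party"], "politics"),
    (["party", "policy", "vote"], "politics"),
]
model = train_naive_bayes(docs)
print(classify(["team", "win"], model))
print(classify(["vote", "policy"], model))
```

Summing per-term log likelihoods is what makes the classifier "naive": the score for a document is the log prior plus one independent contribution per term.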


2021 ◽  
Vol 20 (1) ◽  
pp. 168
Author(s):  
Paulo Daniel Marcos dos Santos ◽  
Daiana da Conceição Alves de Magalhães ◽  
Nelma Camêlo de Araujo

A thesaurus is a type of listing scheme that serves as an instrument of document organization in which words are semantically related within a specific subject or theme. The relationship is established hierarchically through descriptors that set a standard and give greater specificity to the theme, here the chosen theme “Crustaceans used in Alagoas cuisine”. While producing this article, it became apparent that the literature on seafood on the coast of Alagoas is not arranged in an organized way, so the theme was judged suitable for building this experimental thesaurus. Following the instructions of the Thesaurus system, it was possible to analyse, retrieve, and index the information, making the theme available as a standardized documentary record that can help other researchers and scholars interested in the subject, whether for the crustaceans theme or for the structuring approach.


Author(s):  
Nguyen Chi Thanh ◽  
Koichi Yamada ◽  
Muneyuki Unehara

Document clustering is a text mining technique for unsupervised document organization. It helps users browse and navigate large sets of documents. Ho et al. proposed a Tolerance Rough Set Model (TRSM) [1] that improves the vector space model, which represents documents as vectors of terms, and applied it to document clustering. In this paper we analyze their model and propose a new model for efficient clustering of documents. We introduce the Similarity Rough Set Model (SRSM) as another model for representing documents in document clustering. The model is evaluated by experiments on test collections. The experimental results show that the SRSM document clustering method outperforms the one based on TRSM, and that the results of SRSM are less affected by the value of the parameter than those of TRSM.
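
The abstract does not spell out the TRSM definitions, so the following is only a sketch under a stated assumption: that terms are grouped into tolerance classes when they co-occur in at least `theta` documents, and that a document is enriched with every term whose tolerance class overlaps it (its upper approximation). The function names, the threshold name `theta`, and the toy documents are hypothetical. SRSM itself is not sketched because the abstract gives no detail about it.

```python
from collections import Counter
from itertools import combinations


def tolerance_classes(docs, theta):
    """Map each term to the set of terms it co-occurs with in at least
    theta documents, plus the term itself.

    docs is a list of sets of terms.
    """
    cooccurrence = Counter()
    vocabulary = set()
    for doc in docs:
        vocabulary.update(doc)
        # Sorting gives each unordered pair a single canonical key.
        for a, b in combinations(sorted(doc), 2):
            cooccurrence[(a, b)] += 1
    classes = {term: {term} for term in vocabulary}
    for (a, b), count in cooccurrence.items():
        if count >= theta:
            classes[a].add(b)
            classes[b].add(a)
    return classes


def upper_approximation(doc, classes):
    """Return all terms whose tolerance class shares a term with doc."""
    return {term for term, cls in classes.items() if cls & doc}


# Hypothetical toy documents, for illustration only.
docs = [
    {"rough", "set", "cluster"},
    {"rough", "set", "model"},
    {"cluster", "document"},
]
classes = tolerance_classes(docs, theta=2)
print(sorted(upper_approximation({"rough", "cluster"}, classes)))
```

In this sketch the threshold decides how many extra terms each document gains, which is one way to see why clustering results can be sensitive to the parameter value.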

