Author(s):  
A.A. Malykh ◽  
A.V. Mantsivoda

2018 ◽  
Vol 11 (4) ◽  
pp. 77 ◽  
Author(s):  
Malek Mouhoub ◽  
Mustakim Al Helal

Topic modeling is a powerful technique for unsupervised analysis of large document collections. Topic models have a wide range of applications in text mining, information retrieval and statistical language modeling, including tag recommendation, text categorization, keyword extraction and similarity search. Research on topic modeling is steadily gaining popularity. Various efficient topic modeling techniques are available for English, as it is one of the most spoken languages in the world, but not for other languages. Bangla, the seventh most spoken native language in the world by number of speakers, needs automation in many respects. This paper deals with finding the core topics of a Bangla news corpus and classifying news with similarity measures. The document models are built using LDA (Latent Dirichlet Allocation) with bigrams.
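The idea of classifying news by similarity over bigram-aware document models can be illustrated with a minimal sketch. It is not the authors' pipeline: it swaps the LDA topic vectors for raw unigram-plus-bigram counts so that it runs with the standard library alone, and the function names (`bigram_counts`, `cosine_similarity`, `most_similar`) and the English sample texts are assumptions made for illustration.

```python
import math
from collections import Counter


def bigram_counts(text):
    """Count unigrams plus adjacent-word bigrams in whitespace-tokenized text."""
    tokens = text.lower().split()
    bigrams = [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]
    return Counter(tokens + bigrams)


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query, labelled_docs):
    """Return the label whose reference text is closest to the query."""
    q = bigram_counts(query)
    return max(
        labelled_docs,
        key=lambda label: cosine_similarity(q, bigram_counts(labelled_docs[label])),
    )


# Hypothetical reference texts, one per news category.
docs = {
    "sports": "the team won the cricket match",
    "economy": "the central bank raised interest rates",
}
print(most_similar("cricket match result", docs))
```

In a topic-model setting, the count vectors would presumably be replaced by per-document topic distributions, with the same similarity step applied on top.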


Author(s):  
A. V. Mantsivoda ◽  
D. K. Ponomaryov ◽  
...  

1992 ◽  
Vol 4 (4) ◽  
pp. 355-362 ◽  
Author(s):  
Trevor J.M. Bench-Capon ◽  
Frans Coenen ◽  
Paul E.S. Dunne

2009 ◽  
Vol 5 (4) ◽  
pp. 44-57 ◽  
Author(s):  
Min Song ◽  
Xiaohua Hu ◽  
Illhoi Yoo ◽  
Eric Koppel

As an unsupervised learning process, document clustering has been used to improve information retrieval performance by grouping similar documents and to support text mining approaches by providing them with high-quality input. In this article, the authors propose a novel hybrid clustering technique that incorporates semantic smoothing of document models into a neural network framework. Recently, it has been reported that the semantic smoothing model enhances retrieval quality in Information Retrieval (IR). Inspired by that, the authors developed and applied a context-sensitive semantic smoothing model to boost the accuracy of the clustering generated by a dynamic growing cell structure algorithm, a variation of the neural network technique. They evaluated the proposed technique on biomedical article sets from MEDLINE, the largest biomedical digital library in the world. Their experimental evaluations show that the proposed algorithm significantly improves clustering quality over traditional clustering techniques, including k-means and the self-organizing map (SOM).
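To make "smoothing of document models" concrete, the sketch below interpolates a document's maximum-likelihood unigram model with a background corpus model (linear, Jelinek-Mercer-style smoothing). This is a simpler stand-in, not the context-sensitive semantic smoothing model the abstract describes; the function names, the weight `lam` and the toy tokens are assumptions.

```python
from collections import Counter


def unigram_model(tokens):
    """Maximum-likelihood unigram probabilities for a token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def smoothed_model(doc_tokens, corpus_tokens, lam=0.5):
    """Mix the document model with the corpus model using weight lam.

    Assumes every document token also occurs in corpus_tokens, so the
    result is a probability distribution over the corpus vocabulary.
    """
    doc = unigram_model(doc_tokens)
    background = unigram_model(corpus_tokens)
    return {
        w: (1 - lam) * doc.get(w, 0.0) + lam * p
        for w, p in background.items()
    }


# Toy data: the document never mentions "protein" or "cell",
# but smoothing still assigns them non-zero probability.
corpus = "gene protein cell gene".split()
doc = "gene gene".split()
print(smoothed_model(doc, corpus, lam=0.5))
```

Smoothed models of this kind give unseen but related words non-zero weight, which is one reason smoothing can help when documents are compared for clustering.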


Author(s):  
Steven DeRose

Models for XML documents often focus on text documents, but XML is used for many other kinds of data as well: databases, math, music, vector graphic images, and more. This paper examines how basic document models from the “text” world do and do not fit a quite different kind of data: vector graphic images, and in particular their very common application to many kinds of diagrams.
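As a small illustration of how a diagram expressed in XML differs from a text document, the sketch below parses a hand-written SVG fragment with Python's standard `xml.etree.ElementTree` and separates the textual content from the element structure. The sample SVG and the function name `element_profile` are hypothetical and are not taken from the paper.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def element_profile(svg_source):
    """Return (tag counts, text labels) for an SVG document given as a string."""
    root = ET.fromstring(svg_source)
    tags = Counter()
    labels = []
    for el in root.iter():
        # Drop the "{namespace}" prefix that ElementTree puts on tag names.
        tags[el.tag.split("}")[-1]] += 1
        if el.text and el.text.strip():
            labels.append(el.text.strip())
    return tags, labels


# Hypothetical diagram: a labelled box and a connecting line.
svg = (
    '<svg xmlns="http://www.w3.org/2000/svg">'
    '<g><rect width="10" height="5"/><text x="1" y="4">Start</text></g>'
    '<line x1="0" y1="0" x2="5" y2="5"/>'
    "</svg>"
)
tags, labels = element_profile(svg)
print(dict(tags))
print(labels)
```

Here most of the information sits in element names and coordinate attributes rather than in character data, which is the kind of mismatch with text-oriented models that the abstract points to.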


2020 ◽  
Vol 4 (4) ◽  
pp. 12-22
Author(s):  
Olga V. Marchenko

The subject. The article examines the main historical trends and legal problems in the unification of documents used by Russian authorities in different historical periods. The purpose of the article is to identify the prerequisites for the origin of document unification and to characterize the periods of development and main directions of document flow standardization in pre-revolutionary Russia. The methodology includes the historical-legal method, the formal-legal method, a systematic approach, the chronological method, analysis and synthesis. The main results of research. Scientific understanding of the historical and legal aspects of document flow standardization is closely related to the main stages of its development. The problem of periodizing document flow standardization in Russia, for the purpose of systematizing and generalizing this field of knowledge, therefore comes to the fore. The chronological approach was chosen as the most appropriate criterion: it makes it possible to trace the evolution of document management standardization and to link it with the general history of office work in Russia and with state policy in this area. The research will help to determine ways to improve the current system of document management standardization in Russia. The topic has become especially relevant over the past decade in Russia, as international standards are increasingly implemented and foreign practice in documentation management is widely applied. Generalizing and analyzing the country's historical experience in this area makes it possible to identify the national specifics of document management and its standardization, and helps to determine the prospects for implementing international standards.
Conclusions. The study of the history of documentation practice in Russia shows that the rationalization of document flow has been of great importance since the 17th century. Considerable experience in document unification was accumulated in pre-revolutionary Russia. The beginnings of document unification arose at the dawn of the 17th century and developed gradually as the office system in Russia took shape and grew more complex. At the first stage, unification took the form of consolidating spontaneously formed norms and rules for drawing up business papers; by the end of the 19th century it had become an independent element of document management. The gradual evolution of the document form, together with the introduction of stamped paper, led in the 19th century to legally established document forms with permanent details, and the first unified documentation systems were created. The appearance of collections of sample business papers showed that the government and Russian society understood the importance of using stable document models to streamline document flow.

