A common architecture for different text processing techniques in an information retrieval environment

Author(s):  
G. Thurmair
2021 ◽  
Vol 2021 (2) ◽  
pp. 19-23
Author(s):  
Anastasiya Ivanova ◽  
Aleksandr Kuz'menko ◽  
Rodion Filippov ◽  
Lyudmila Filippova ◽  
Anna Sazonova ◽  
...  

Building a chatbot based on a neural network requires machine processing of text, which in turn involves various methods and techniques for analyzing phrases and sentences. The article considers the most popular solutions and models for analyzing text data: lemmatization, vectorization, and machine learning methods. Particular attention is paid to text processing techniques; after analyzing them, the best method was identified and tested.


Author(s):  
Francisco M. Couto ◽  
Mário J. Silva ◽  
Vivian Lee ◽  
Emily Dimmer ◽  
Evelyn Camon ◽  
...  

Molecular Biology research projects have produced vast amounts of data, part of which has been preserved in a variety of public databases. However, a large portion of the data contains a significant number of errors and therefore requires careful verification by curators, a painful and costly task, before it is reliable enough to derive valid conclusions from. On the other hand, research in biomedical information retrieval and information extraction is now delivering Text Mining solutions that can help curators work more efficiently and deliver better data resources. Over the past decades, automatic text processing systems have successfully exploited biomedical scientific literature to reduce researchers' efforts to keep up to date, but many of these systems still rely on domain knowledge that is integrated manually, leading to unnecessary overheads and restrictions in their use. A more efficient approach would acquire the domain knowledge automatically from publicly available biological sources, such as BioOntologies, rather than using manually inserted domain knowledge. An example of this approach is GOAnnotator, a tool that assists the verification of uncurated protein annotations. It provided correct evidence text to the curators at 93% precision and thus achieved promising results. GOAnnotator was implemented as a web tool that is freely available at http://xldb.di.fc.ul.pt/rebil/tools/goa/.


2008 ◽  
Vol 5 (1) ◽  
pp. 17-36 ◽  
Author(s):  
Margaret R. Garnsey ◽  
Ingrid E. Fisher

ABSTRACT: Accounting language evolves as the transactions and organizations it provides guidance for change. We provide a preliminary analysis of terms used in official accounting pronouncements and annual corporate financial statements. Initial results show statistical natural language-processing techniques provide a means of identifying new terms as they enter the lexicon. These techniques should be valuable in deriving a complete accounting lexicon as well as in constructing and maintaining an accounting thesaurus to support information retrieval.


2020 ◽  
Author(s):  
Rianto Rianto ◽  
Achmad Benny Mutiara ◽  
Eri Prasetyo Wibowo ◽  
Paulus Insap Santosa

Abstract: Stemming, which reduces affixed words to their root words, has long been used in data pre-processing for information retrieval. However, there are not many stemming methods for non-formal Indonesian text processing. The existing stemming method has high accuracy for formal Indonesian but low accuracy for non-formal Indonesian. Thus, a stemming method that yields high accuracy in a non-formal Indonesian classifier model is still an open challenge. This study introduces a new stemming method to solve problems in non-formal Indonesian text data pre-processing. Furthermore, this study aims to provide comprehensive research on improving the accuracy of text classifier models by strengthening the stemming method. Using the Support Vector Machine algorithm, a text classifier model is developed, and its accuracy is checked. The experimental evaluation was done by testing 550 datasets in Indonesian using two different stemming methods. The results show that the text classifier model has higher accuracy with the proposed stemming method than with the existing method, with scores of 0.85 and 0.73, respectively. In the future, the proposed stemming method can be used to develop Indonesian text classifier models for various purposes, including text clustering, summarization, hate speech detection, and other text processing applications.
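To make the idea of stemming concrete, here is a minimal Python sketch of dictionary-guided affix stripping. It is not the method proposed in the study, whose details the abstract does not give. The root lexicon, the affix lists, and the function name `stem` are illustrative assumptions, and only a handful of formal Indonesian affixes are covered.

```python
# Hypothetical mini-lexicon of root words (assumption for illustration);
# a real stemmer would load a full dictionary.
ROOT_WORDS = {"makan", "main", "buku", "baca"}

# A few common formal Indonesian affixes (deliberately incomplete).
PREFIXES = ("meng", "mem", "men", "me", "ber", "ter", "di", "ke", "pe")
SUFFIXES = ("kan", "nya", "lah", "an", "i")


def stem(word, roots=ROOT_WORDS):
    """Return the root of `word` if stripping one prefix and/or one
    suffix yields a known root; otherwise return the word unchanged."""
    word = word.lower()
    if word in roots:
        return word

    # The word itself plus every variant with one matching prefix removed.
    prefix_stripped = [word] + [
        word[len(p):] for p in PREFIXES if word.startswith(p)
    ]

    # For each of those, also try removing one matching suffix.
    candidates = []
    for base in prefix_stripped:
        candidates.append(base)
        candidates.extend(
            base[:-len(s)] for s in SUFFIXES if base.endswith(s)
        )

    # Accept the first candidate that the lexicon confirms as a root.
    for cand in candidates:
        if cand in roots:
            return cand
    return word


if __name__ == "__main__":
    for w in ["makanan", "dimakan", "bermain", "bukunya", "membaca"]:
        print(w, "->", stem(w))
```

Checking candidates against a lexicon avoids over-stripping: in "dimakan", removing the apparent suffix "kan" would leave "dima", which the lexicon rejects. Non-formal text would need additional affix rules and vocabulary that this sketch does not attempt to model.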


Author(s):  
Utkarsh Malik ◽  
Harpreet Kaur ◽  
Aditi Chaudhary ◽  
...  

The importance of social media in today's technology era cannot be disregarded. The Internet is in almost every hand. People use various social media platforms to express themselves and their thinking about topics such as politics, entertainment, and sports. In the data science industry, trend analysis can be used for several purposes, such as marketing or product analysis. Twitter data has been used to analyze political polarization and the spread of protest movements. Twitter is one of the most popular social media platforms, allowing users to spread and share information. Twitter publishes a list of the latest topics, named "Trending Topics", which shows what is happening in the world and what people's opinions about those topics are. The Trend Analyzer takes a given set of tweets and generates a graph showing the comparative popularity of the hashtags used. It examines the tweets using Python and text-processing techniques.
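The hashtag-popularity step described above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: the function names, the regular expression, and the sample tweets are assumptions, and the graphing step is left out.

```python
import re
from collections import Counter

# Matches a '#' followed by letters, digits, or underscores (a simplification
# of real hashtag rules).
HASHTAG_RE = re.compile(r"#(\w+)")


def hashtag_counts(tweets):
    """Count how often each hashtag occurs across a list of tweet texts."""
    counts = Counter()
    for tweet in tweets:
        # Lowercase so that '#Python' and '#python' count as one hashtag.
        counts.update(tag.lower() for tag in HASHTAG_RE.findall(tweet))
    return counts


def top_hashtags(tweets, n=10):
    """Return the n most frequent hashtags as (hashtag, count) pairs."""
    return hashtag_counts(tweets).most_common(n)


if __name__ == "__main__":
    # Hypothetical sample data for demonstration only.
    sample = [
        "Loving #Python and #NLP",
        "#python is great",
        "No tags here",
        "#NLP #nlp #Sports",
    ]
    for tag, count in top_hashtags(sample, n=3):
        print(f"#{tag}: {count}")
```

The resulting (hashtag, count) pairs could then be passed to a plotting library to draw the comparative popularity graph the abstract mentions.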

