Tagger: BeCalm API for rapid named entity recognition

2017 ◽  
Author(s):  
Lars Juhl Jensen

Abstract Most BioCreative tasks to date have focused on assessing the quality of text-mining annotations in terms of precision and recall. Interoperability, speed, and stability are, however, other important factors to consider for practical applications of text mining. The new BioCreative/BeCalm TIPS task focuses purely on these factors. To participate in this task, I implemented a BeCalm API within the real-time tagging server also used by the Reflect and EXTRACT tools. In addition to retrieval of patent abstracts, PubMed abstracts, and PubMed Central open-access articles as required in the TIPS task, the BeCalm API implementation facilitates retrieval of documents from other sources specified as custom request parameters. As in earlier tests, the tagger proved to be both highly efficient and stable, consistently processing requests of 5000 abstracts in less than half a minute, including retrieval of the document text.
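The source-configurable retrieval described above can be sketched as a simple dispatch table; all names here are illustrative stand-ins, not the actual BeCalm request schema or the tagger's internals.

```python
# Sketch of document retrieval dispatched by source, loosely mirroring the idea
# of supporting extra sources via custom request parameters. The fetchers are
# placeholder stubs, not real PubMed/patent clients.

def fetch_pubmed(doc_id):
    return f"<abstract text for PubMed document {doc_id}>"  # stub fetcher

def fetch_patent(doc_id):
    return f"<abstract text for patent {doc_id}>"  # stub fetcher

# New sources could be registered here based on custom request parameters.
SOURCES = {"PUBMED": fetch_pubmed, "PATENT": fetch_patent}

def retrieve(source, doc_id):
    """Return the document text for (source, doc_id), or fail loudly."""
    try:
        return SOURCES[source](doc_id)
    except KeyError:
        raise ValueError(f"unknown document source: {source}")
```

A batch request would then map `retrieve` over its document list before tagging.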

2011 ◽  
Vol 46 (4) ◽  
pp. 543-563 ◽  
Author(s):  
Harith Al-Jumaily ◽  
Paloma Martínez ◽  
José L. Martínez-Fernández ◽  
Erik Van der Goot

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Nícia Rosário-Ferreira ◽  
Victor Guimarães ◽  
Vítor S. Costa ◽  
Irina S. Moreira

Abstract Background: Blood cancers (BCs) are responsible for over 720,000 deaths worldwide every year. Their prevalence and mortality rate underline the relevance of research related to BCs. Despite the availability of different resources establishing Disease–Disease Associations (DDAs), the knowledge is scattered and not accessible to the scientific community in a straightforward way. Here, we propose SicknessMiner, a biomedical Text-Mining (TM) approach towards the centralization of DDAs. Our methodology encompasses Named Entity Recognition (NER) and Named Entity Normalization (NEN) steps, and the DDAs retrieved were compared to the DisGeNET resource for qualitative and quantitative comparison. Results: We obtained the DDAs via co-mention using our SicknessMiner or via gene- or variant-disease similarity on DisGeNET. SicknessMiner was able to retrieve around 92% of the DisGeNET results, and nearly 15% of the SicknessMiner results were specific to our pipeline. Conclusions: SicknessMiner is a valuable tool to extract disease–disease relationships from a raw input corpus.
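Co-mention extraction of DDAs, as used above, amounts to counting disease pairs that appear in the same abstract after NER and NEN. A minimal sketch, with made-up disease names standing in for normalized identifiers:

```python
from collections import Counter
from itertools import combinations

def co_mention_ddas(abstract_mentions):
    """Count disease-disease co-mentions across abstracts.

    abstract_mentions: one set of normalized disease names per abstract,
    i.e. the output of an NER + NEN pipeline.
    """
    pairs = Counter()
    for mentions in abstract_mentions:
        # Sort so each unordered pair is counted under one canonical key.
        for a, b in combinations(sorted(mentions), 2):
            pairs[(a, b)] += 1
    return pairs

corpus = [
    {"leukemia", "anemia"},
    {"leukemia", "anemia", "lymphoma"},
    {"lymphoma"},
]
print(co_mention_ddas(corpus)[("anemia", "leukemia")])  # → 2
```

Thresholding or scoring these counts would then separate robust associations from chance co-occurrences.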


2016 ◽  
Vol 12 (4) ◽  
pp. 21-44 ◽  
Author(s):  
R. Hema ◽  
T. V. Geetha

The two main challenges in chemical entity recognition are: (i) new chemical compounds are constantly being synthesized, and (ii) chemical representation is highly ambiguous, with a single chemical entity described by several different nomenclatures. The identification and maintenance of chemical terminologies is therefore a difficult task. Since most existing text-mining methods follow term-based approaches, the problems of polysemy and synonymy arise. To address this, a Named Entity Recognition (NER) system based on pattern matching in the chemical domain is developed to extract chemical entities from chemical documents. The Tf-idf and PMI association measures are used to filter out non-chemical terms. An F-score of 92.19% is achieved for chemical NER. The proposed method is compared with a baseline method and other existing approaches. As a final step, the filtered chemical entities are classified into sixteen functional groups using an SVM one-against-all multiclass classification approach, achieving an accuracy of 87%. One-way ANOVA is used to compare the quality of the pattern-matching method with other existing chemical NER methods.
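The PMI filtering step mentioned above can be sketched from corpus counts; the counts and threshold below are illustrative, not the paper's values.

```python
import math

def pmi(term_freq, context_freq, joint_freq, n):
    """Pointwise mutual information between a candidate term and a
    chemistry-indicative context word, computed from corpus counts:
    PMI = log2( p(t, c) / (p(t) * p(c)) ).
    """
    p_t = term_freq / n
    p_c = context_freq / n
    p_tc = joint_freq / n
    return math.log2(p_tc / (p_t * p_c))

# A candidate term is kept as chemical if its association with chemistry
# cues (e.g. "solvent", "reaction") exceeds a chosen threshold.
score = pmi(term_freq=50, context_freq=200, joint_freq=40, n=10_000)
print(score > 0)  # positive PMI: co-occurrence above chance
```

A positive score means the candidate co-occurs with chemical context more often than independence would predict; non-chemical terms tend to score near or below zero.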


IEEE Access ◽  
2019 ◽  
Vol 7 ◽  
pp. 73729-73740 ◽  
Author(s):  
Donghyeon Kim ◽  
Jinhyuk Lee ◽  
Chan Ho So ◽  
Hwisang Jeon ◽  
Minbyul Jeong ◽  
...  

2021 ◽  
Author(s):  
Afia Fairoose Abedin ◽  
Amirul Islam Al Mamun ◽  
Rownak Jahan Nowrin ◽  
Amitabha Chakrabarty ◽  
Moin Mostakim ◽  
...  

In recent times, a large number of people have become involved in establishing their own businesses. Unlike humans, chatbots can serve multiple customers at a time, are available 24/7, and reply within a fraction of a second. Though chatbots perform well in task-oriented activities, in most cases they fail to understand personalized opinions, statements, or even queries, which later impacts the organization through poor service management. A lack of understanding capability makes humans lose interest in continuing conversations with bots. Usually, chatbots give absurd responses when they are unable to interpret a user's text accurately. By extracting client reviews from chatbot conversations, organizations can reduce the major gap of understanding between users and the chatbot and improve the quality of their products and services. Thus, in our research we incorporated all the key elements that are necessary for a chatbot to analyse and understand an input text precisely and accurately. We performed sentiment analysis, emotion detection, intent classification, and named-entity recognition using deep learning to develop chatbots with humanistic understanding and intelligence. The efficiency of our approach is demonstrated by the detailed analysis.


Author(s):  
Girish Keshav Palshikar

While building and using a fully semantic understanding of Web contents is a distant goal, named entities (NEs) provide a small, tractable set of elements carrying a well-defined semantics. Generic named entities are names of persons, locations, organizations, phone numbers, and dates, while domain-specific named entities include, for example, names of proteins, enzymes, organisms, genes, and cells in the biological domain. An ability to automatically perform named entity recognition (NER) – i.e., to identify occurrences of NEs in Web contents – can have multiple benefits, such as improving the expressiveness of queries and also improving the quality of the search results. A number of factors make building highly accurate NER a challenging task. Given the importance of NER in semantic processing of text, this chapter presents a detailed survey of NER techniques for English text.
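The generic NE types named above (dates, phone numbers) are the easiest to illustrate with a rule-based recognizer; the patterns below are toy examples, far from the coverage of the surveyed techniques.

```python
import re

# Minimal rule-based recognizer for two generic NE types. The patterns are
# deliberately narrow and illustrative only.
PATTERNS = {
    "DATE": re.compile(
        r"\b\d{1,2} (?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]* \d{4}\b"
    ),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}

def recognize(text):
    """Return (label, surface form) for every pattern match in text."""
    return [
        (label, m.group())
        for label, rx in PATTERNS.items()
        for m in rx.finditer(text)
    ]

print(recognize("Call 555-123-4567 before 3 March 2024."))
# → [('DATE', '3 March 2024'), ('PHONE', '555-123-4567')]
```

Real NER systems replace such hand-written rules with gazetteers and statistical or neural sequence models, precisely because rules like these break down on ambiguous names.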


2013 ◽  
pp. 400-426 ◽  
Author(s):  
Girish Keshav Palshikar

While building and using a fully semantic understanding of Web contents is a distant goal, named entities (NEs) provide a small, tractable set of elements carrying a well-defined semantics. Generic named entities are names of persons, locations, organizations, phone numbers, and dates, while domain-specific named entities include, for example, names of proteins, enzymes, organisms, genes, and cells in the biological domain. An ability to automatically perform named entity recognition (NER) – i.e., to identify occurrences of NEs in Web contents – can have multiple benefits, such as improving the expressiveness of queries and also improving the quality of the search results. A number of factors make building highly accurate NER a challenging task. Given the importance of NER in semantic processing of text, this chapter presents a detailed survey of NER techniques for English text.


2018 ◽  
Vol 4 (2) ◽  
pp. 81
Author(s):  
Fatra Nonggala Putra ◽  
Chastine Fatichah

An incident detection system based on Twitter data aims to obtain information in real time as a low-cost alternative incident detection system. Research on incident detection systems has been conducted before. One of the main modules of an incident detection system is the incident-type classification module. Information can be classified as an important incident only if it contains an entity representing where the incident occurred. Several previous studies still rely on handcrafted features or pipeline-based feature models such as n-grams as the key classification features, which are ineffective and yield suboptimal performance. Therefore, this research proposes combining Neuro Named Entity Recognition (NeuroNER) with a Recurrent Convolutional Neural Network (RCNN) classifier as an effective and optimal method for incident detection. First, the system performs named entity recognition on tweets to identify location entities in the tweet text, because incident information must contain at least one location entity. Second, if a tweet is found to contain a location entity, the incident is classified using the RCNN classifier. Experimental results show that the incident detection system combining NeuroNER and RCNN works very well, with average precision, recall, and f-measure of 94.87%, 92.73%, and 93.73%, respectively.
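The two-stage gating described above (require a location entity, then classify) can be sketched with stub components; the gazetteer lookup and keyword classifier below are toy stand-ins for NeuroNER and the RCNN.

```python
# Toy sketch of the two-stage incident detection pipeline: tweets without a
# location entity are discarded before classification. Both stages are stubs.

GAZETTEER = {"jakarta", "surabaya", "bandung"}  # illustrative location list

def find_locations(tweet):
    """Stage 1 stand-in for NeuroNER: naive gazetteer lookup."""
    return [w for w in tweet.lower().split() if w.strip(".,!?") in GAZETTEER]

def classify_incident(tweet):
    """Stage 2 stand-in for the RCNN: keyword-based incident typing."""
    return "flood" if "flood" in tweet.lower() else "other"

def detect(tweet):
    if not find_locations(tweet):      # incident info needs >= 1 location entity
        return None
    return classify_incident(tweet)    # only then classify the incident type

print(detect("Flood reported in Jakarta today"))  # → "flood"
print(detect("Flood somewhere"))                  # → None (no location entity)
```

The gate keeps the classifier from ever seeing tweets that cannot be actionable incident reports, which is the division of labor the paper's pipeline relies on.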


2020 ◽  
Vol 2019 ◽  
Author(s):  
Andrea Bertino ◽  
Luca Foppiano ◽  
Laurent Romary ◽  
Pierre Mounier

This paper addresses the integration of a Named Entity Recognition and Disambiguation (NERD) service within a group of open access (OA) publishing digital platforms and considers its potential impact on both research and scholarly publishing. The software powering this service, called entity-fishing, was initially developed by Inria in the context of the EU FP7 project CENDARI and provides automatic entity recognition and disambiguation using the Wikipedia and Wikidata data sets. The application is distributed with an open-source licence, and it has been deployed as a web service in DARIAH's infrastructure hosted by the French infrastructure Huma-Num. In the paper, we focus on the specific issues related to its integration on five OA platforms specialized in the publication of scholarly monographs in the social sciences and humanities (SSH), as part of the work carried out within the EU H2020 project HIRMEOS (High Integration of Research Monographs in the European Open Science infrastructure). In the first section, we give a brief overview of the current status and evolution of OA publications, considering specifically the challenges that OA monographs are encountering. In the second part, we show how the HIRMEOS project aims to face these challenges by optimizing five OA digital platforms for the publication of monographs from the SSH and ensuring their interoperability. In sections three and four we give a comprehensive description of the entity-fishing service, focusing on its concrete applications in real use cases together with some further possible ideas on how to exploit the annotations generated. We show that entity-fishing annotations can improve both the research and the publishing process. In the last section, we briefly present further possible application scenarios that could be made available through infrastructural projects.
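A platform would typically call entity-fishing over HTTP with a JSON query. The payload below follows the field names in the public entity-fishing documentation as I recall them; treat the exact schema (and any endpoint URL) as an assumption to verify against the current docs before use.

```python
import json

# Sketch of a query payload for entity-fishing's disambiguation service.
# Field names ("text", "language", "mentions") are assumptions based on the
# public documentation, not verified against a live deployment.
query = {
    "text": "Named entity recognition links mentions to Wikidata.",
    "language": {"lang": "en"},       # disambiguation is language-specific
    "mentions": ["ner", "wikipedia"], # which mention detectors to apply
}
payload = json.dumps(query)
print(json.loads(payload)["language"]["lang"])  # round-trips as valid JSON
```

The service would respond with recognized entities linked to Wikipedia/Wikidata identifiers, which is what the platforms surface as annotations.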


2017 ◽  
Author(s):  
David Westergaard ◽  
Hans-Henrik Stærfeldt ◽  
Christian Tønsberg ◽  
Lars Juhl Jensen ◽  
Søren Brunak

Abstract Across academia and industry, text mining has become a popular strategy for keeping up with the rapid growth of the scientific literature. Text mining of the scientific literature has mostly been carried out on collections of abstracts, due to their availability. Here we present an analysis of 15 million English scientific full-text articles published during the period 1823–2016. We describe the development in article length and publication sub-topics during these nearly 200 years. We showcase the potential of text mining by extracting published protein–protein, disease–gene, and protein subcellular associations using a named entity recognition system, and quantitatively report on their accuracy using gold standard benchmark data sets. We subsequently compare the findings to corresponding results obtained on 16.5 million abstracts included in MEDLINE and show that text mining of full-text articles consistently outperforms using abstracts only.
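Benchmarking extracted associations against a gold standard, as done above, reduces to set comparison of association pairs. A minimal sketch with invented example pairs:

```python
def precision_recall(extracted, gold):
    """Score extracted association pairs against a gold-standard set.

    precision = |extracted ∩ gold| / |extracted|
    recall    = |extracted ∩ gold| / |gold|
    """
    extracted, gold = set(extracted), set(gold)
    tp = len(extracted & gold)
    precision = tp / len(extracted) if extracted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall

# Illustrative pairs only, not benchmark data from the paper.
gold = {("BRCA1", "breast cancer"), ("TP53", "lung cancer")}
found = {("BRCA1", "breast cancer"), ("EGFR", "glioma")}
print(precision_recall(found, gold))  # → (0.5, 0.5)
```

Running the same scoring on abstract-only versus full-text extractions is what supports the paper's comparison between the two corpora.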

