scholarly journals WikiPathways: connecting communities

2020 ◽  
Vol 49 (D1) ◽  
pp. D613-D621 ◽  
Author(s):  
Marvin Martens ◽  
Ammar Ammar ◽  
Anders Riutta ◽  
Andra Waagmeester ◽  
Denise N Slenter ◽  
...  

Abstract WikiPathways (https://www.wikipathways.org) is a biological pathway database known for its collaborative nature and open science approaches. With the core idea of the scientific community developing and curating biological knowledge in pathway models, WikiPathways lowers all barriers for accessing and using its content. Increasingly more content creators, initiatives, projects and tools have started using WikiPathways. Central in this growth and increased use of WikiPathways are the various communities that focus on particular subsets of molecular pathways such as for rare diseases and lipid metabolism. Knowledge from published pathway figures helps prioritize pathway development, using optical character and named entity recognition. We show the growth of WikiPathways over the last three years, highlight the new communities and collaborations of pathway authors and curators, and describe various technologies to connect to external resources and initiatives. The road toward a sustainable, community-driven pathway database goes through integration with other resources such as Wikidata and allowing more use, curation and redistribution of WikiPathways content.

2021 ◽  
pp. 1-10
Author(s):  
Zhucong Li ◽  
Zhen Gan ◽  
Baoli Zhang ◽  
Yubo Chen ◽  
Jing Wan ◽  
...  

Abstract This paper describes our approach for the Chinese Medical named entity recognition(MER) task organized by the 2020 China conference on knowledge graph and semantic computing(CCKS) competition. In this task, we need to identify the entity boundary and category labels of six entities from Chinese electronic medical record(EMR). We construct a hybrid system composed of a semi-supervised noisy label learning model based on adversarial training and a rule postprocessing module. The core idea of the hybrid system is to reduce the impact of data noise by optimizing the model results. Besides, we use post-processing rules to correct three cases of redundant labeling, missing labeling, and wrong labeling in the model prediction results. Our method proposed in this paper achieved strict criteria of 0.9156 and relax criteria of 0.9660 on the final test set, ranking first.


2021 ◽  
pp. 1-13
Author(s):  
Xia Li ◽  
Qinghua Wen ◽  
Zengtao Jiao ◽  
Jiangtao Zhang

Abstract The China Conference on Knowledge Graph and Semantic Computing (CCKS) 2020 Evaluation Task 3 presented clinical named entity recognition and event extraction for the Chinese electronic medical records. Two annotated data sets and some other additional resources for these two subtasks were provided for participators. This evaluation competition attracted 354 teams and 46 of them successfully submitted the valid results. The pre-trained language models are widely applied in this evaluation task. Data argumentation and external resources are also helpful.


2020 ◽  
Vol 2019 ◽  
Author(s):  
Andrea Bertino ◽  
Luca Foppiano ◽  
Laurent Romary ◽  
Pierre Mounier

This paper addresses the integration of a Named Entity Recognition and Disambiguation (NERD) service within a group of open access (OA) publishing digital platforms and considers its potential impact on both research and scholarly publishing. The software powering this service, called entity-fishing, was initially developed by Inria in the context of the EU FP7 project CENDARI and provides automatic entity recognition and disambiguation using the Wikipedia and Wikidata data sets. The application is distributed with an open-source licence, and it has been deployed as a web service in DARIAH's infrastructure hosted by the French HumaNum. In the paper, we focus on the specific issues related to its integration on five OA platforms specialized in the publication of scholarly monographs in the social sciences and humanities (SSH), as part of the work carried out within the EU H2020 project HIRMEOS (High Integration of Research Monographs in the European Open Science infrastructure). In the first section, we give a brief overview of the current status and evolution of OA publications, considering specifically the challenges that OA monographs are encountering. In the second part, we show how the HIRMEOS project aims to face these challenges by optimizing five OA digital platforms for the publication of monographs from the SSH and ensuring their interoperability. In sections three and four we give a comprehensive description of the entity-fishing service, focusing on its concrete applications in real use cases together with some further possible ideas on how to exploit the annotations generated. We show that entity-fishing annotations can improve both research and publishing process. In the last chapter, we briefly present further possible application scenarios that could be made available through infrastructural projects.


2021 ◽  
Vol 22 (S11) ◽  
Author(s):  
Kwangmin Kim ◽  
Doheon Lee

Abstract Background Concept recognition is a term that corresponds to the two sequential steps of named entity recognition and named entity normalization, and plays an essential role in the field of bioinformatics. However, the conventional dictionary-based methods did not sufficiently addressed the variation of the concepts in actual use in literature, resulting in the particularly degraded performances in recognition of multi-token concepts. Results In this paper, we propose a concept recognition method of multi-token biological entities using neural models combined with literature contexts. The key aspect of our method is utilizing the contextual information from the biological knowledge-bases for concept normalization, which is followed by named entity recognition procedure. The model showed improved performances over conventional methods, particularly for multi-token concepts with higher variations. Conclusions We expect that our model can be utilized for effective concept recognition and variety of natural language processing tasks on bioinformatics.


2017 ◽  
Author(s):  
Bennett Kleinberg ◽  
Maximilian Mozes ◽  
Yaloe van der Toolen ◽  
Bruno Verschuere

Background: The shift towards open science, implies that researchers should share their data. Often there is a dilemma between publicly sharing data and protecting their subjects' confidentiality. Moreover, the case of unstructured text data (e.g. stories) poses an additional dilemma: anonymizing texts without deteriorating their content for secondary research. Existing text anonymization systems either deteriorate the content of the original or have not been tested empirically. We propose and empirically evaluate NETANOS: named entity-based text anonymization for open science. NETANOS is an open-source context-preserving anonymization system that identifies and modifies named entities (e.g. persons, locations, times, dates). The aim is to assist researchers in sharing their raw text data.Method & Results: NETANOS anonymizes critical, contextual information through a stepwise named entity recognition (NER) implementation: it identifies contextual information (e.g. "Munich") and then replaces them with a context-preserving category label (e.g. "Location_1"). We assessed how good participants were in re-identifying several travel stories (e.g. locations, names) that were presented in the original (“Max”), human anonymized (“Max” → “Person1”), NETANOS (”Max” → “Person1”), and in a context-deteriorating state (“Max” → “XXX”). Bayesian testing revealed that the NETANOS anonymization was practically equivalent to the human baseline anonymization.Conclusions: Named entity recognition can be applied to the anonymization of critical, identifiable information in text data. The proposed stepwise anonymization procedure provides a fully automated, fast system for text anonymization. NETANOS might be an important step to address researchers' dilemmas when sharing text data within the open science movement.


2020 ◽  
Author(s):  
Shintaro Tsuji ◽  
Andrew Wen ◽  
Naoki Takahashi ◽  
Hongjian Zhang ◽  
Katsuhiko Ogasawara ◽  
...  

BACKGROUND Named entity recognition (NER) plays an important role in extracting the features of descriptions for mining free-text radiology reports. However, the performance of existing NER tools is limited because the number of entities depends on its dictionary lookup. Especially, the recognition of compound terms is very complicated because there are a variety of patterns. OBJECTIVE The objective of the study is to develop and evaluate a NER tool concerned with compound terms using the RadLex for mining free-text radiology reports. METHODS We leveraged the clinical Text Analysis and Knowledge Extraction System (cTAKES) to develop customized pipelines using both RadLex and SentiWordNet (a general-purpose dictionary, GPD). We manually annotated 400 of radiology reports for compound terms (Cts) in noun phrases and used them as the gold standard for the performance evaluation (precision, recall, and F-measure). Additionally, we also created a compound-term-enhanced dictionary (CtED) by analyzing false negatives (FNs) and false positives (FPs), and applied it for another 100 radiology reports for validation. We also evaluated the stem terms of compound terms, through defining two measures: an occurrence ratio (OR) and a matching ratio (MR). RESULTS The F-measure of the cTAKES+RadLex+GPD was 32.2% (Precision 92.1%, Recall 19.6%) and that of combined the CtED was 67.1% (Precision 98.1%, Recall 51.0%). The OR indicated that stem terms of “effusion”, "node", "tube", and "disease" were used frequently, but it still lacks capturing Cts. The MR showed that 71.9% of stem terms matched with that of ontologies and RadLex improved about 22% of the MR from the cTAKES default dictionary. The OR and MR revealed that the characteristics of stem terms would have the potential to help generate synonymous phrases using ontologies. CONCLUSIONS We developed a RadLex-based customized pipeline for parsing radiology reports and demonstrated that CtED and stem term analysis has the potential to improve dictionary-based NER performance toward expanding vocabularies.


Sign in / Sign up

Export Citation Format

Share Document