Text Mining of Disease-lifestyle Associations to Explain Comorbidities in Electronic Health Registries

Mapping Intimacies ◽

10.1101/168211 ◽

2017 ◽

Author(s):

Lars Juhl Jensen

Keyword(s):

Text Mining ◽

Named Entity Recognition ◽

Lifestyle Factors ◽

Entity Recognition ◽

Named Entity ◽

Danish Health ◽

Underlying Causes ◽

Electronic Health ◽

Substance Consumption ◽

Health Registry

Mining of electronic health registries can reveal vast numbers of disease correlations (hereafter referred to as comorbidities for simplicity). However, the underlying causes can be hard to identify, in part because health registries usually do not record important lifestyle factors such as diet, substance consumption, and physical activity. To address this challenge, I developed a text-mining approach that uses dictionaries of diseases and lifestyle factors for named entity recognition and subsequently for co-occurrence extraction of disease–lifestyle associations from Medline. I show that this approach is able to extract many correct associations and provide proof-of-concept that these can provide plausible explanations for comorbidities observed in Danish health registry data.
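The dictionary-based tagging and co-occurrence counting described in this abstract can be illustrated with a minimal sketch. It is not the author's implementation: the two dictionaries, the example sentences, and the function names `tag` and `cooccurrences` are hypothetical, and a real system would use far larger dictionaries and a scoring scheme rather than raw counts.

```python
import re
from collections import Counter
from itertools import product

# Hypothetical toy dictionaries: lowercase surface form -> normalized entity name.
DISEASES = {
    "type 2 diabetes": "type 2 diabetes",
    "lung cancer": "lung cancer",
    "copd": "COPD",
}
LIFESTYLE = {
    "smoking": "smoking",
    "physical activity": "physical activity",
    "alcohol": "alcohol",
}


def tag(text, dictionary):
    """Return the set of dictionary entities whose names occur in the text."""
    lowered = text.lower()
    found = set()
    for surface, name in dictionary.items():
        # Word boundaries avoid matching inside longer words.
        if re.search(r"\b" + re.escape(surface) + r"\b", lowered):
            found.add(name)
    return found


def cooccurrences(abstracts):
    """Count abstracts that mention both a disease and a lifestyle factor."""
    counts = Counter()
    for text in abstracts:
        for pair in product(tag(text, DISEASES), tag(text, LIFESTYLE)):
            counts[pair] += 1
    return counts


# Invented example sentences, used only to exercise the functions.
abstracts = [
    "Smoking is a major risk factor for lung cancer and COPD.",
    "Physical activity lowers the risk of type 2 diabetes.",
    "Smoking cessation reduces lung cancer mortality.",
]
counts = cooccurrences(abstracts)
```

Each key of `counts` is a (disease, lifestyle factor) pair, and each value is the number of texts in which both were tagged. Two diseases that share a frequently co-mentioned lifestyle factor would then be candidates for a lifestyle-based explanation of their comorbidity.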


De-identifying Spanish medical texts - named entity recognition applied to radiology reports

Journal of Biomedical Semantics ◽

10.1186/s13326-021-00236-2 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Irene Pérez-Díez ◽

Raúl Pérez-Moraga ◽

Adolfo López-Cerdán ◽

Jose-Maria Salinas-Serrano ◽

María de la Iglesia-Vayá

Keyword(s):

Electronic Health Records ◽

English Language ◽

Personal Information ◽

Named Entity Recognition ◽

Entity Recognition ◽

Medical Texts ◽

Health Records ◽

Named Entity ◽

Radiology Reports ◽

Electronic Health

Background: Medical texts such as radiology reports or electronic health records are a powerful source of data for researchers. Anonymization methods must be developed to de-identify documents containing personal information from both patients and medical staff. Although currently there are several anonymization strategies for the English language, they are also language-dependent. Here, we introduce a named entity recognition strategy for Spanish medical texts, translatable to other languages. Results: We tested 4 neural networks on our radiology reports dataset, achieving a recall of 97.18% of the identifying entities. Alongside, we developed a randomization algorithm to substitute the detected entities with new ones from the same category, making it virtually impossible to differentiate real data from synthetic data. The three best architectures were tested with the MEDDOCAN challenge dataset of electronic health records as an external test, achieving a recall of 69.18%. Conclusions: The strategy proposed, combining named entity recognition tasks with randomization of entities, is suitable for Spanish radiology reports. It does not require a big training corpus, thus it could be easily extended to other languages and medical texts, such as electronic health records.
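The randomization step, which swaps each detected entity for another of the same category, can be sketched as follows. This only illustrates the general idea and is not the authors' algorithm: the surrogate pools, the category labels, the example sentence, and the helpers `span` and `substitute` are all hypothetical, and the entity spans are supplied by hand in place of a neural tagger's output.

```python
import random

# Hypothetical surrogate pools, one per entity category.
SURROGATES = {
    "NAME": ["Ana Torres", "Luis Romero", "Marta Vidal"],
    "HOSPITAL": ["Hospital Central", "Hospital del Norte"],
}


def span(text, mention, category):
    """Build a (start, end, category) span for the first occurrence of a mention."""
    start = text.index(mention)
    return (start, start + len(mention), category)


def substitute(text, entities, rng):
    """Replace each detected span with a random surrogate from the same category."""
    # Work from the last span to the first so earlier offsets stay valid.
    for start, end, category in sorted(entities, reverse=True):
        text = text[:start] + rng.choice(SURROGATES[category]) + text[end:]
    return text


# Invented report sentence; the spans stand in for a tagger's predictions.
report = "Paciente Juan Prueba, atendido en Hospital Ejemplo."
entities = [
    span(report, "Juan Prueba", "NAME"),
    span(report, "Hospital Ejemplo", "HOSPITAL"),
]
anonymized = substitute(report, entities, random.Random(0))
```

Replacing spans from right to left is the main design point: a surrogate may be longer or shorter than the original mention, which would otherwise shift the offsets of the spans that follow it.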


SicknessMiner: a deep-learning-driven text-mining tool to abridge disease-disease associations

BMC Bioinformatics ◽

10.1186/s12859-021-04397-w ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Nícia Rosário-Ferreira ◽

Victor Guimarães ◽

Vítor S. Costa ◽

Irina S. Moreira

Keyword(s):

Text Mining ◽

Named Entity Recognition ◽

Entity Recognition ◽

Named Entity ◽

Disease Similarity ◽

Disease Associations ◽

Named Entity Normalization ◽

Mining Tool ◽

Or Gene ◽

Text Mining Tool

Background: Blood cancers (BCs) are responsible for over 720 K yearly deaths worldwide. Their prevalence and mortality rate uphold the relevance of research related to BCs. Despite the availability of different resources establishing Disease-Disease Associations (DDAs), the knowledge is scattered and not accessible in a straightforward way to the scientific community. Here, we propose SicknessMiner, a biomedical Text-Mining (TM) approach towards the centralization of DDAs. Our methodology encompasses Named Entity Recognition (NER) and Named Entity Normalization (NEN) steps, and the DDAs retrieved were compared to the DisGeNET resource for qualitative and quantitative comparison. Results: We obtained the DDAs via co-mention using our SicknessMiner or gene- or variant-disease similarity on DisGeNET. SicknessMiner was able to retrieve around 92% of the DisGeNET results and nearly 15% of the SicknessMiner results were specific to our pipeline. Conclusions: SicknessMiner is a valuable tool to extract disease-disease relationships from a raw input corpus.


A Neural Named Entity Recognition and Multi-Type Normalization Tool for Biomedical Text Mining

IEEE Access ◽

10.1109/access.2019.2920708 ◽

2019 ◽

Vol 7 ◽

pp. 73729-73740 ◽

Cited By ~ 13

Author(s):

Donghyeon Kim ◽

Jinhyuk Lee ◽

Chan Ho So ◽

Hwisang Jeon ◽

Minbyul Jeong ◽

...

Keyword(s):

Text Mining ◽

Named Entity Recognition ◽

Entity Recognition ◽

Biomedical Text ◽

Biomedical Text Mining ◽

Named Entity


Tagger: BeCalm API for rapid named entity recognition

10.1101/115022 ◽

2017 ◽

Cited By ~ 2

Author(s):

Lars Juhl Jensen

Keyword(s):

Open Access ◽

Text Mining ◽

Real Time ◽

Named Entity Recognition ◽

Entity Recognition ◽

The Real ◽

Practical Applications ◽

Named Entity ◽

Highly Efficient

Most BioCreative tasks to date have focused on assessing the quality of text-mining annotations in terms of precision and recall. Interoperability, speed, and stability are, however, other important factors to consider for practical applications of text mining. The new BioCreative/BeCalm TIPS task focuses purely on these. To participate in this task, I implemented a BeCalm API within the real-time tagging server also used by the Reflect and EXTRACT tools. In addition to retrieval of patent abstracts, PubMed abstracts, and PubMed Central open-access articles as required in the TIPS task, the BeCalm API implementation facilitates retrieval of documents from other sources specified as custom request parameters. As in earlier tests, the tagger proved to be both highly efficient and stable, being able to consistently process requests of 5000 abstracts in less than half a minute including retrieval of the document text.


Named Entity Recognition Using BERT BiLSTM CRF for Chinese Electronic Health Records

2019 12th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI) ◽

10.1109/cisp-bmei48845.2019.8965823 ◽

2019 ◽

Cited By ~ 5

Author(s):

Zhenjin Dai ◽

Xutao Wang ◽

Pin Ni ◽

Yuming Li ◽

Gangmin Li ◽

...

Keyword(s):

Electronic Health Records ◽

Named Entity Recognition ◽

Entity Recognition ◽

Health Records ◽

Named Entity ◽

Electronic Health


Concept Attribute Labeling and Context-Aware Named Entity Recognition in Electronic Health Records

International Journal of Reliable and Quality E-Healthcare ◽

10.4018/ijrqeh.2018010101 ◽

2018 ◽

Vol 7 (1) ◽

pp. 1-15 ◽

Cited By ~ 1

Author(s):

Alexandra Pomares-Quimbaya ◽

Rafael A. Gonzalez ◽

Oscar Mauricio Muñoz Velandia ◽

Angel Alberto Garcia Peña ◽

Julián Camilo Daza Rodríguez ◽

...

Keyword(s):

Electronic Health Records ◽

Ad Hoc ◽

Named Entity Recognition ◽

Ensemble Classification ◽

Entity Recognition ◽

Classification Model ◽

Health Records ◽

Named Entity ◽

Electronic Health ◽

Concept Attribute

Extracting valuable knowledge from Electronic Health Records (EHR) represents a challenging task due to the presence of both structured and unstructured data, including codified fields, images and test results. Narrative text in particular contains a variety of notes which are diverse in language and detail, as well as being full of ad hoc terminology, including acronyms and jargon, which is especially challenging in non-English EHR, where there is a dearth of annotated corpora or trained case sets. This paper proposes an approach for NER and concept attribute labeling for EHR that takes into consideration the contextual words around the entity of interest to determine its sense. The approach combines three different NER methods with an analysis of the context (neighboring words) using an ensemble classification model. This helps to disambiguate NER results and to label the concept as confirmed, negated, speculative, pending or antecedent. Results show an improvement in recall and a limited impact on precision for the NER process.


Text mining of 15 million full-text scientific articles

10.1101/162099 ◽

2017 ◽

Cited By ~ 5

Author(s):

David Westergaard ◽

Hans-Henrik Stærfeldt ◽

Christian Tønsberg ◽

Lars Juhl Jensen ◽

Søren Brunak

Keyword(s):

Text Mining ◽

Full Text ◽

Disease Gene ◽

Scientific Literature ◽

Named Entity Recognition ◽

Recognition System ◽

Entity Recognition ◽

Data Sets ◽

Named Entity ◽

Benchmark Data

Across academia and industry, text mining has become a popular strategy for keeping up with the rapid growth of the scientific literature. Text mining of the scientific literature has mostly been carried out on collections of abstracts, due to their availability. Here we present an analysis of 15 million English scientific full-text articles published during the period 1823–2016. We describe the development in article length and publication sub-topics during these nearly 200 years. We showcase the potential of text mining by extracting published protein–protein, disease–gene, and protein subcellular associations using a named entity recognition system, and quantitatively report on their accuracy using gold standard benchmark data sets. We subsequently compare the findings to corresponding results obtained on 16.5 million abstracts included in MEDLINE and show that text mining of full-text articles consistently outperforms using abstracts only.


A BERT-Based Hybrid System for Chemical Identification and Indexing in Full-Text Articles

10.1101/2021.10.27.466183 ◽

2021 ◽

Author(s):

Arslan Erdengasileng ◽

Keqiao Li ◽

Qing Han ◽

Shubo Tian ◽

Jian Wang ◽

...

Keyword(s):

Text Mining ◽

Information Extraction ◽

Full Text ◽

Data Augmentation ◽

Named Entity Recognition ◽

Entity Recognition ◽

Chemical Identification ◽

Dictionary Matching ◽

Named Entity ◽

Chemical Named Entity Recognition

Identification and indexing of chemical compounds in full-text articles are essential steps in biomedical article categorization, information extraction, and biological text mining. The BioCreative Challenge was established to evaluate methods for biological text mining and information extraction. Track 2 of BioCreative VII (summer 2021) consists of two subtasks: chemical identification and chemical indexing in full-text PubMed articles. The chemical identification subtask also includes two parts: chemical named entity recognition (NER) and chemical normalization. In this paper, we present our work on developing a hybrid pipeline for chemical named entity recognition, chemical normalization, and chemical indexing in full-text PubMed articles. Specifically, we applied BERT-based methods for chemical NER and chemical indexing, and a sieve-based dictionary matching method for chemical normalization. For subtask 1, we used PubMedBERT with data augmentation on the chemical NER task. Several chemical-MeSH dictionaries including MeSH.XML, SUPP.XML, MRCONSO.RRF, and PubTator chemical annotations are used in a specific order to get the best performance on chemical normalization. We achieved an F1 score of 0.86 and 0.7668 on chemical NER and chemical normalization, respectively. For subtask 2, we formulated it as a binary prediction problem for each individual chemical compound name. We then used a BERT-based model with engineered features and achieved a strict F1 score of 0.4825 on the test set, which is substantially higher than the median F1 score (0.3971) of all the submissions.
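Sieve-based dictionary matching, in which dictionaries are consulted in a fixed order and the first hit wins, can be sketched as below. This is an assumption-laden illustration, not the authors' pipeline: the two toy dictionaries, their names, the placeholder identifiers (which are not real MeSH IDs), and the function `normalize` are all hypothetical.

```python
# Hypothetical ordered sieves; each maps a normalized mention to a placeholder ID.
SIEVES = [
    ("primary", {"aspirin": "ID:0001"}),
    ("supplementary", {"acetylsalicylic acid": "ID:0001", "compound x": "ID:0002"}),
]


def normalize(mention):
    """Return (identifier, sieve name) from the first sieve that knows the mention."""
    # Lowercase and collapse whitespace so trivial variants match the same key.
    key = " ".join(mention.lower().split())
    for name, dictionary in SIEVES:
        if key in dictionary:
            return dictionary[key], name
    # No sieve matched: the mention stays unnormalized.
    return None, None


hit = normalize("Aspirin")
fallback = normalize("  Compound   X ")
miss = normalize("unknown substance")
```

The order of `SIEVES` encodes how much each source is trusted: a mention found in an earlier dictionary is never overridden by a later one, which is why the abstract stresses using the dictionaries in a specific order.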


Clinical Named Entity Recognition From Chinese Electronic Health Records via Machine Learning Methods

JMIR Medical Informatics ◽

10.2196/medinform.9965 ◽

2018 ◽

Vol 6 (4) ◽

pp. e50 ◽

Cited By ~ 10

Author(s):

Yu Zhang ◽

Xuwen Wang ◽

Zhen Hou ◽

Jiao Li

Keyword(s):

Machine Learning ◽

Electronic Health Records ◽

Named Entity Recognition ◽

Entity Recognition ◽

Learning Methods ◽

Health Records ◽

Named Entity ◽

Machine Learning Methods ◽

Electronic Health


Named Entity Recognition for Clinical Portuguese Corpus with Conditional Random Fields and Semantic Groups

10.5753/sbcas.2019.6269 ◽

2019 ◽

Cited By ~ 1

Author(s):

João Vitor Andrioli De Souza ◽

Yohan Bonescki Gumiel ◽

Lucas Emanuel Silva e Oliveira ◽

Claudia Maria Cabral Moro

Keyword(s):

Electronic Health Records ◽

Random Fields ◽

Conditional Random Fields ◽

Named Entity Recognition ◽

Entity Recognition ◽

Health Records ◽

Named Entity ◽

Electronic Health

Considering the difficulties of extracting entities from Electronic Health Records (EHR) texts in Portuguese, we explore the Conditional Random Fields (CRF) algorithm to build a Named Entity Recognition (NER) system based on a corpus of clinical Portuguese data annotated by experts. We present the challenges and methods involved in classifying Abbreviations, Disorders, Procedures and Chemicals within the texts. With a meaningful set of features and the best-performing parameters, the results demonstrate that the method is promising and may support other biomedical tasks. Nonetheless, further experiments with more features, different architectures and sophisticated preprocessing steps are needed.
