Text Mining Biomedical Literature for Discovering Gene-to-Gene Relationships: A Comparative Study of Algorithms

Author(s):  
Ying Liu ◽  
S.B. Navathe ◽  
J. Civera ◽  
V. Dasigi ◽  
A. Ram ◽  
...  
Author(s):  
P. Tamije Selvy ◽  
V. Suriya Prakash ◽  
S. Shriram ◽  
N. Vimalesh

The number of social media users has increased rapidly in recent years, and a great deal of information, both valuable and irrelevant, is shared on social media, where it can reach many people in a short period of time. The valuable information shared on social media can therefore be used for many types of analysis. In this paper, tweets posted about a disaster are collected and used to build an alert system, which notifies users after checking the received data against a centralized database. The paper also presents a comparative study of the algorithms used to extract data from social media, reporting the accuracy rates of different algorithms that can be used for text mining.
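The abstract does not describe how tweets are checked against the centralized database, so the following is only a minimal sketch under stated assumptions: the "database" is a hypothetical in-memory set of confirmed (disaster type, location) pairs, tweets are matched by simple keyword tokens, and the accuracy rate used to compare algorithms is plain label agreement. The names `CENTRAL_DB`, `DISASTER_TERMS`, `check_tweet`, and `accuracy` are illustrative and do not come from the paper.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical stand-in for the centralized database: confirmed events
# keyed by (disaster type, location). Not the paper's actual schema.
CENTRAL_DB = {("flood", "chennai"), ("earthquake", "kathmandu")}

# Hypothetical keyword list used to spot disaster-related tweets.
DISASTER_TERMS = {"flood", "earthquake", "cyclone", "fire"}


@dataclass
class Alert:
    disaster: str
    location: str
    tweet: str


def tokenize(text: str) -> set[str]:
    """Lowercase the tweet and split it into alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def check_tweet(text: str, known_locations: set[str]) -> list[Alert]:
    """Alert only when a tweet's (disaster, location) pair is confirmed in CENTRAL_DB."""
    tokens = tokenize(text)
    alerts = []
    for disaster in sorted(tokens & DISASTER_TERMS):
        for location in sorted(tokens & known_locations):
            if (disaster, location) in CENTRAL_DB:
                alerts.append(Alert(disaster, location, text))
    return alerts


def accuracy(predicted: list[int], gold: list[int]) -> float:
    """Fraction of tweets whose predicted label matches the gold label."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


if __name__ == "__main__":
    locations = {"chennai", "kathmandu", "mumbai"}
    print(check_tweet("Heavy flood reported in Chennai #help", locations))
    print(check_tweet("Flood rumours in Mumbai", locations))  # not confirmed, so no alert
```

Requiring a database match before alerting is one way to keep unverified rumours from triggering notifications; a real system would likely use a trained classifier rather than a fixed keyword list.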


2020 ◽  
Vol 11 ◽  
Author(s):  
Maria-Theodora Pandi ◽  
Peter J. van der Spek ◽  
Maria Koromina ◽  
George P. Patrinos

Text mining in biomedical literature is an emerging field that has already been shown to have a variety of applications in many research areas, including genetics, personalized medicine, and pharmacogenomics. In this study, we describe a novel text-mining approach for the extraction of pharmacogenomics associations. The code was implemented in the R programming language, either through custom scripts, where needed, or through functions from existing libraries. Articles (abstracts or full texts) that correspond to a specified query were extracted from PubMed, while concept annotations were derived from PubTator Central. Terms that denote a Mutation or a Gene, as well as Chemical terms corresponding to drug compounds, were normalized, and the sentences containing these terms were filtered and preprocessed to create appropriate training sets. Finally, after training and adequate hyperparameter tuning, four text classifiers were created and evaluated (FastText, linear-kernel SVMs, XGBoost, and Lasso and Elastic-Net Regularized Generalized Linear Models) with regard to their performance in identifying pharmacogenomics associations. Although further improvements are essential before this text-mining approach can be properly implemented in clinical practice, our study stands as a comprehensive, simplified, and up-to-date approach for the identification and assessment of research articles enriched in clinically relevant pharmacogenomics relationships. Furthermore, this work highlights a series of challenges concerning the effective application of text mining to biomedical literature, whose resolution could substantially contribute to the further development of this field.
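The authors worked in R; purely for illustration, the sentence-filtering step can be sketched in Python. This is a minimal sketch assuming annotations arrive as character-offset spans with a concept type and a normalized identifier (similar in spirit to PubTator output, but the `Annotation` record here is hypothetical). It keeps only sentences containing at least one Gene or Mutation and at least one Chemical, then replaces each mention with a placeholder built from its type and identifier. The naive regex sentence splitter is a simplification that would mis-split abbreviations such as "e.g.".

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    start: int          # character offset into the text (inclusive)
    end: int            # character offset into the text (exclusive)
    concept_type: str   # e.g. "Gene", "Mutation", "Chemical"
    concept_id: str     # normalized identifier (hypothetical values in examples)


def split_sentences(text: str) -> list[tuple[int, int]]:
    """Naive splitter: (start, end) spans ending at '.', '!' or '?'."""
    spans = []
    for match in re.finditer(r"[^.!?]+[.!?]?", text):
        if match.group().strip():
            spans.append((match.start(), match.end()))
    return spans


def candidate_sentences(text: str, annotations: list[Annotation]) -> list[str]:
    """Keep sentences mentioning a Gene/Mutation and a Chemical; normalize mentions."""
    results = []
    for s_start, s_end in split_sentences(text):
        inside = [a for a in annotations if s_start <= a.start and a.end <= s_end]
        types = {a.concept_type for a in inside}
        if not (types & {"Gene", "Mutation"} and "Chemical" in types):
            continue
        sentence = text[s_start:s_end]
        # Replace mentions right-to-left so earlier offsets stay valid.
        for a in sorted(inside, key=lambda a: a.start, reverse=True):
            rel_start, rel_end = a.start - s_start, a.end - s_start
            token = f"{a.concept_type.upper()}_{a.concept_id}"
            sentence = sentence[:rel_start] + token + sentence[rel_end:]
        results.append(sentence.strip())
    return results
```

Replacing mentions with type-tagged placeholders lets a classifier learn from sentence structure rather than from specific gene or drug names; whether the original study did exactly this is not stated in the abstract.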


2019 ◽  
Vol 19 (S13) ◽  
Author(s):  
Christian Simon ◽  
Kristian Davidsen ◽  
Christina Hansen ◽  
Emily Seymour ◽  
Mike Bogetofte Barnkob ◽  
...  

2020 ◽  
Author(s):  
Samir Gupta ◽  
Shruti Rao ◽  
Trisha Miglani ◽  
Yasaswini Iyer ◽  
Junxia Lin ◽  
...  

Interpretation of a given variant’s pathogenicity is one of the most profound challenges to realizing the promise of genomic medicine. A large amount of the information about associations between variants and diseases that curators and researchers use to interpret variant pathogenicity is buried in biomedical literature. Text-mining tools that can extract relevant information from the literature will speed up and assist the variant interpretation curation process. In this work, we present MACE2k, a text-mining tool that extracts evidence sentences containing associations between variants and diseases from full-length PMC Open Access articles. We use different machine learning models (classical and deep learning) to identify evidence sentences with variant-disease associations. Evaluation shows promising results, with a best F1-score of 82.9% and AUC-ROC of 73.9%. Classical ML models had better recall (96.6% for Random Forest) than deep learning models. The deep learning model, a Convolutional Neural Network, had the best precision (75.6%), which is essential for any curation task.
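The recall-versus-precision trade-off reported above follows directly from how these metrics are defined. The sketch below computes precision, recall, and F1 from binary labels (1 = evidence sentence); it is a generic illustration, not MACE2k's evaluation code, and the `Counts` record and toy counts in the usage lines are hypothetical rather than data from the study.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    tp: int  # evidence sentences correctly flagged
    fp: int  # non-evidence sentences wrongly flagged
    fn: int  # evidence sentences the model missed


def precision(c: Counts) -> float:
    """Share of flagged sentences that really are evidence."""
    flagged = c.tp + c.fp
    return c.tp / flagged if flagged else 0.0


def recall(c: Counts) -> float:
    """Share of true evidence sentences that were flagged."""
    actual = c.tp + c.fn
    return c.tp / actual if actual else 0.0


def f1(c: Counts) -> float:
    """Harmonic mean of precision and recall."""
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


def counts_from_labels(predicted: list[int], gold: list[int]) -> Counts:
    """Tally TP/FP/FN for binary labels where 1 = evidence sentence."""
    tp = sum(1 for p, g in zip(predicted, gold) if p == 1 and g == 1)
    fp = sum(1 for p, g in zip(predicted, gold) if p == 1 and g == 0)
    fn = sum(1 for p, g in zip(predicted, gold) if p == 0 and g == 1)
    return Counts(tp, fp, fn)


if __name__ == "__main__":
    # Toy counts (not from the study): a model that flags almost everything
    # versus a more conservative one.
    liberal = Counts(tp=29, fp=21, fn=1)
    conservative = Counts(tp=20, fp=4, fn=10)
    for name, c in [("liberal", liberal), ("conservative", conservative)]:
        print(name, round(precision(c), 3), round(recall(c), 3), round(f1(c), 3))
```

A model that flags nearly every sentence scores high recall but low precision, which is why precision matters more when curators must review every flagged sentence.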

