TMT-HCC: A tool for text mining the biomedical literature for hepatocellular carcinoma (HCC) biomarkers identification

Rania A. Abul Seoud; Mai S. Mabrouk

doi:10.1016/j.cmpb.2013.07.014

A Novel Text-Mining Approach for Retrieving Pharmacogenomics Associations From the Literature

Frontiers in Pharmacology ◽

10.3389/fphar.2020.602030 ◽

2020 ◽

Vol 11 ◽

Author(s):

Maria-Theodora Pandi ◽

Peter J. van der Spek ◽

Maria Koromina ◽

George P. Patrinos

Keyword(s):

Text Mining ◽

Generalized Linear Models ◽

Linear Models ◽

Biomedical Literature ◽

Linear Kernel ◽

R Programming Language ◽

Research Areas ◽

Text Classifiers ◽

R Programming ◽

Further Development

Text mining in biomedical literature is an emerging field which has already been shown to have a variety of implementations in many research areas, including genetics, personalized medicine, and pharmacogenomics. In this study, we describe a novel text-mining approach for the extraction of pharmacogenomics associations. The code used toward this end was implemented in the R programming language, either through custom scripts, where needed, or through functions from existing libraries. Articles (abstracts or full texts) that correspond to a specified query were extracted from PubMed, while concept annotations were derived from PubTator Central. Terms that denote a Mutation or a Gene, as well as Chemical compound terms corresponding to drug compounds, were normalized, and the sentences containing these terms were filtered and preprocessed to create appropriate training sets. Finally, after training and adequate hyperparameter tuning, four text classifiers were created and evaluated (FastText, Linear kernel SVMs, XGBoost, and Lasso and Elastic-Net Regularized Generalized Linear Models) with regard to their performance in identifying pharmacogenomics associations. Although further improvements are essential toward proper implementation of this text-mining approach in clinical practice, our study stands as a comprehensive, simplified, and up-to-date approach for the identification and assessment of research articles enriched in clinically relevant pharmacogenomics relationships. Furthermore, this work highlights a series of challenges concerning the effective application of text mining in biomedical literature, whose resolution could substantially contribute to the further development of this field.
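The sentence-filtering step described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' R code: the `(term, concept_type)` annotation format, the naive sentence splitter, and the example text are all assumptions made for illustration, and real PubTator Central annotations carry offsets and identifiers that this sketch ignores.

```python
import re

def split_sentences(text):
    """Naive splitter (assumption): break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def candidate_sentences(text, annotations):
    """Keep sentences that mention a Gene or Mutation term AND a Chemical term.

    `annotations` is a hypothetical list of (term, concept_type) pairs;
    matching is plain case-insensitive substring search.
    """
    kept = []
    for sentence in split_sentences(text):
        lowered = sentence.lower()
        # Concept types whose terms appear in this sentence.
        types = {ctype for term, ctype in annotations if term.lower() in lowered}
        if "Chemical" in types and types & {"Gene", "Mutation"}:
            kept.append(sentence)
    return kept

# Illustrative input only, not taken from the study's corpus.
abstract = (
    "CYP2C19 variants alter clopidogrel response. "
    "Clopidogrel is widely prescribed. "
    "CYP2C19 is highly polymorphic."
)
annotations = [("CYP2C19", "Gene"), ("clopidogrel", "Chemical")]
print(candidate_sentences(abstract, annotations))
```

Only the first sentence survives the filter, because it is the only one containing both a gene term and a chemical term. In a pipeline like the one described, such sentences would then be labeled and passed to the classifiers.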

BioReader: a text mining tool for performing classification of biomedical literature

BMC Bioinformatics ◽

10.1186/s12859-019-2607-x ◽

2019 ◽

Vol 19 (S13) ◽

Cited By ~ 9

Author(s):

Christian Simon ◽

Kristian Davidsen ◽

Christina Hansen ◽

Emily Seymour ◽

Mike Bogetofte Barnkob ◽

...

Keyword(s):

Text Mining ◽

Biomedical Literature ◽

Mining Tool ◽

Text Mining Tool

Textpresso Central: a customizable platform for searching, text mining, viewing, and curating biomedical literature

BMC Bioinformatics ◽

10.1186/s12859-018-2103-8 ◽

2018 ◽

Vol 19 (1) ◽

Cited By ~ 27

Author(s):

H.-M. Müller ◽

K. M. Van Auken ◽

Y. Li ◽

P. W. Sternberg

Keyword(s):

Text Mining ◽

Biomedical Literature

Identifying Potential Early Biomarkers Of Acute Myocardial Infarction In The Biomedical Literature: A Comparison Of Text Mining And Manual Sifting Techniques

Value in Health ◽

10.1016/j.jval.2016.09.120 ◽

2016 ◽

Vol 19 (7) ◽

pp. A367 ◽

Cited By ~ 1

Author(s):

S Paisley ◽

J Seva ◽

M Stevenson ◽

R Archer ◽

L Preston ◽

...

Keyword(s):

Myocardial Infarction ◽

Acute Myocardial Infarction ◽

Text Mining ◽

Biomedical Literature ◽

Early Biomarkers

BCISeach: A Searching Platform of Breast Cancer Text Mining for Biomedical Literature

2016 12th International Conference on Semantics, Knowledge and Grids (SKG) ◽

10.1109/skg.2016.034 ◽

2016 ◽

Cited By ~ 1

Author(s):

Lejun Gong ◽

Ronggen Yang ◽

Haoyu Yang ◽

Kaiyu Jiang ◽

Zhenjiang Dong ◽

...

Keyword(s):

Breast Cancer ◽

Text Mining ◽

Biomedical Literature

MACE2K: A Text-Mining Tool to Extract Literature-based Evidence for Variant Interpretation using Machine Learning

10.1101/2020.12.03.409094 ◽

2020 ◽

Author(s):

Samir Gupta ◽

Shruti Rao ◽

Trisha Miglani ◽

Yasaswini Iyer ◽

Junxia Lin ◽

...

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Text Mining ◽

Genomic Medicine ◽

Relevant Information ◽

Biomedical Literature ◽

Variant Interpretation ◽

Learning Models ◽

Mining Tool ◽

Text Mining Tool

Interpretation of a given variant’s pathogenicity is one of the most profound challenges to realizing the promise of genomic medicine. A large amount of information about associations between variants and diseases, used by curators and researchers for interpreting variant pathogenicity, is buried in biomedical literature. The development of text-mining tools that can extract relevant information from the literature will speed up and assist the variant interpretation curation process. In this work, we present a text-mining tool, MACE2k, that extracts evidence sentences containing associations between variants and diseases from full-length PMC Open Access articles. We use different machine learning models (classical and deep learning) to identify evidence sentences with variant-disease associations. Evaluation shows promising results, with the best F1-score of 82.9% and AUC-ROC of 73.9%. Classical ML models had a better recall (96.6% for Random Forest) compared to deep learning models. The deep learning model, a Convolutional Neural Network, had the best precision (75.6%), which is essential for any curation task.
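For readers less familiar with the metrics quoted in this abstract, the following Python sketch shows how precision, recall, and F1-score follow from confusion counts. The counts in the example are invented for illustration and are not taken from the MACE2K evaluation; note also that the figures reported above come from different models and cannot be combined into a single F1.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from true-positive, false-positive
    and false-negative counts; each metric is 0.0 when its denominator is zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        # F1 is the harmonic mean of precision and recall.
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1

# Hypothetical counts: 80 evidence sentences found correctly,
# 20 flagged wrongly, 10 missed.
p, r, f = precision_recall_f1(tp=80, fp=20, fn=10)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

High recall means few evidence sentences are missed, while high precision means curators spend less time rejecting false hits, which is why the abstract singles out precision for curation work.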

Classification of Biomedical Literature in Hypertension and Diabetes

International Journal on Data Science ◽

10.18517/ijods.1.2.114-119.2020 ◽

2020 ◽

Vol 1 (2) ◽

pp. 114-119

Author(s):

Nur Aniq Syafiq Rodzuan ◽

Shahreen Kasim ◽

Mohanavali Sithambranathan ◽

Muhammad Zaki Hassan

Keyword(s):

Text Mining ◽

New Technology ◽

Biomedical Literature ◽

Text Documents ◽

Textual Databases ◽

The Difference ◽

Classification Evaluation ◽

Linguistic Approaches ◽

Clear Information

Textual information is presented in words and characters, which makes it easy for humans to understand. To extract this kind of information, text mining was introduced as a new technology. Text mining is the process of extracting non-trivial patterns or knowledge from text documents or from textual databases. The purpose of this research paper is to perform and compare keyword extraction using statistical and linguistic extraction tools on 120 text documents related to hypertension and diabetes. To draw this comparison, RStudio, a statistical-based tool, and TerMine, a linguistic-based tool, were used to demonstrate the process of extracting the specified keywords from the biomedical literature. Classification evaluation using a Naïve Bayes classifier is then carried out to evaluate and compare the performance of the statistical and linguistic approaches using these tools. The experimental results show how the two tools compare and differ in keyword extraction.

ProPheno: An online dataset for completely characterizing the human protein-phenotype landscape in biomedical literature

10.7287/peerj.preprints.27479v1 ◽

2019 ◽

Author(s):

Morteza Pourreza Shahri ◽

Indika Kahanda

Keyword(s):

Text Mining ◽

Predictive Models ◽

Complex Diseases ◽

Biomedical Literature ◽

Human Protein ◽

Mining Tool ◽

Text Mining Tool

Identifying protein-phenotype relations is of paramount importance for applications such as uncovering rare and complex diseases. One of the best resources that captures protein-phenotype relationships is the biomedical literature. In this work, we introduce ProPheno, a comprehensive online dataset composed of human protein/phenotype mentions extracted from the complete corpora of Medline and PubMed. Moreover, it includes co-occurrences of protein-phenotype pairs within different spans of text such as sentences and paragraphs. We use ProPheno for completely characterizing the human protein-phenotype landscape in biomedical literature. ProPheno, the reported findings, and the insight gained have implications for (1) biocurators for expediting their curation efforts, (2) researchers for quickly finding relevant articles, and (3) text mining tool developers for training their predictive models. The RESTful API of ProPheno is freely available at http://propheno.cs.montana.edu.
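The idea of sentence-level co-occurrence mentioned in this abstract can be sketched in a few lines of Python. This is an illustrative assumption of how such counts might be computed, not ProPheno's actual extraction pipeline or API: the function name, the plain substring matching, and the example sentences are all hypothetical.

```python
from collections import Counter
from itertools import product

def cooccurrences(sentences, proteins, phenotypes):
    """Count how often each (protein, phenotype) pair appears in the same sentence.

    Matching is naive case-insensitive substring search (assumption);
    a real system would rely on entity recognition and normalization.
    """
    counts = Counter()
    for sentence in sentences:
        lowered = sentence.lower()
        found_proteins = [p for p in proteins if p.lower() in lowered]
        found_phenotypes = [q for q in phenotypes if q.lower() in lowered]
        # Every protein found in the sentence pairs with every phenotype found.
        counts.update(product(found_proteins, found_phenotypes))
    return counts

# Illustrative sentences only, not drawn from the dataset.
sentences = [
    "BRCA1 loss is linked to breast carcinoma.",
    "TP53 and BRCA1 were both sequenced.",
    "TP53 mutations are frequent in breast carcinoma.",
]
counts = cooccurrences(sentences, ["BRCA1", "TP53"], ["breast carcinoma"])
print(counts)
```

The second sentence contributes nothing because it names two proteins but no phenotype. Applying the same function to paragraphs instead of sentences would give the wider-span counts the abstract refers to.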

Mining microbe–disease interactions from literature via a transfer learning model

BMC Bioinformatics ◽

10.1186/s12859-021-04346-7 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Chengkun Wu ◽

Xinyi Xiao ◽

Canqun Yang ◽

JinXiang Chen ◽

Jiacai Yi ◽

...

Keyword(s):

Text Mining ◽

Large Scale ◽

Named Entity Recognition ◽

Learning Model ◽

Biomedical Literature ◽

Fine Tuning ◽

Entity Recognition ◽

Interaction Extraction ◽

Biomedical Texts ◽

Data Browsing

Background: Interactions between microbes and diseases are of great importance for biomedical research. However, a large number of microbe–disease interactions are hidden in the biomedical literature, and structured databases for microbe–disease interactions are limited in number. In this paper, we aim to construct a large-scale database for microbe–disease interactions automatically. We attained this goal by applying text mining methods based on a deep learning model with a moderate curation cost. We also built a user-friendly web interface that allows researchers to navigate and query required information. Results: Firstly, we manually constructed a gold-standard corpus and a silver-standard corpus (SSC) for microbe–disease interactions for curation. Moreover, we proposed a text mining framework for microbe–disease interaction extraction based on a pretrained model, BERE. We applied named entity recognition tools to detect microbe and disease mentions in free biomedical texts. After that, we fine-tuned the pretrained model BERE, which was originally built for drug–target interactions or drug–drug interactions, to recognize relations between targeted entities. The introduction of the SSC for model fine-tuning greatly improved detection performance for microbe–disease interactions, with an average reduction in error of approximately 10%. The MDIDB website offers data browsing, custom searching for specific diseases or microbes, and batch downloading. Conclusions: Evaluation results demonstrate that our method outperforms the baseline model (rule-based PKDE4J) with an average F1-score of 73.81%. For further validation, we randomly sampled nearly 1000 interactions predicted by our model and manually checked the correctness of each interaction, which gives a 73% accuracy. The MDIDB website is freely available through http://dbmdi.com/index/

A text-mining technique for extracting gene-disease associations from the biomedical literature

International Journal of Bioinformatics Research and Applications ◽

10.1504/ijbra.2010.034075 ◽

2010 ◽

Vol 6 (3) ◽

pp. 270 ◽

Cited By ~ 11

Author(s):

Hisham Al Mubaid ◽

Rajit K. Singh

Keyword(s):

Text Mining ◽

Biomedical Literature ◽

Mining Technique ◽

Disease Associations
