Where to search top-K biomedical ontologies?

Daniela Oliveira; Anila Sahar Butt; Armin Haller; Dietrich Rebholz-Schuhmann; Ratnesh Sahay

doi:10.1093/bib/bby015

Briefings in Bioinformatics ◽

10.1093/bib/bby015 ◽

2018 ◽

Vol 20 (4) ◽

pp. 1477-1491 ◽

Cited By ~ 1

Author(s):

Daniela Oliveira ◽

Anila Sahar Butt ◽

Armin Haller ◽

Dietrich Rebholz-Schuhmann ◽

Ratnesh Sahay

Keyword(s):

Search Engines ◽

Ground Truth ◽

Systematic Evaluation ◽

Free Text ◽

Biomedical Data ◽

Biomedical Ontologies ◽

Daily Work ◽

Ranking Algorithms ◽

Retrieval Mechanism ◽

The Right

Abstract. Motivation: Searching for precise terms and terminological definitions in the biomedical data space is problematic, as researchers find overlapping, closely related and even equivalent concepts in a single or multiple ontologies. Search engines that retrieve ontological resources often suggest an extensive list of search results for a given input term, which leads to the tedious task of selecting the best-fit ontological resource (class or property) for the input term and reduces user confidence in the retrieval engines. A systematic evaluation of these search engines is necessary to understand their strengths and weaknesses in different search requirements. Result: We have implemented seven comparable Information Retrieval ranking algorithms to search through ontologies and compared them against four search engines for ontologies. Free-text queries have been performed, the outcomes have been judged by experts and the ranking algorithms and search engines have been evaluated against the expert-based ground truth (GT). In addition, we propose a probabilistic GT that is developed automatically to provide deeper insights and confidence to the expert-based GT as well as evaluating a broader range of search queries. Conclusion: The main outcome of this work is the identification of key search factors for biomedical ontologies together with search requirements and a set of recommendations that will help biomedical experts and ontology engineers to select the best-suited retrieval mechanism in their search scenarios. We expect that this evaluation will allow researchers and practitioners to apply the current search techniques more reliably and that it will help them to select the right solution for their daily work. Availability: The source code (of seven ranking algorithms), ground truths and experimental results are available at https://github.com/danielapoliveira/bioont-search-benchmark
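
As an illustration of what evaluating a ranking against an expert-based ground truth can look like, here is a minimal sketch. The abstract does not say which measures the authors used; precision at k and average precision are assumed here as common choices, and the function names and resource identifiers are hypothetical.

```python
from typing import Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k ranked resources that the ground truth marks relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked[:k]
    return sum(1 for resource in top_k if resource in relevant) / k


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Mean of the precision values taken at each rank where a relevant resource appears."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, resource in enumerate(ranked, start=1):
        if resource in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


# Hypothetical search results for one free-text query, best match first.
ranked_results = ["onto:A", "onto:B", "onto:C", "onto:D"]
# Hypothetical expert judgment: the resources considered a good fit for the query.
ground_truth = {"onto:A", "onto:C"}

p_at_2 = precision_at_k(ranked_results, ground_truth, 2)
ap = average_precision(ranked_results, ground_truth)
```

Averaging such per-query scores over all queries gives one number per algorithm or search engine, which is one way the systems could be compared.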

Obstacles to the reuse of study metadata in ClinicalTrials.gov

10.1101/850578 ◽

2019 ◽

Cited By ~ 1

Author(s):

Laura Miron ◽

Rafael S. Gonçalves ◽

Mark A. Musen

Keyword(s):

Free Text ◽

Biomedical Data ◽

Biomedical Ontologies ◽

Experimental Protocol ◽

Data Types ◽

Eligibility Criteria ◽

Government Regulations ◽

Link Type ◽

Contact Information ◽

MeSH Terms

Abstract Metadata that are structured using principled schemas and that use terms from ontologies are essential to making biomedical data findable and reusable for downstream analyses. The largest source of metadata that describes the experimental protocol, funding, and scientific leadership of clinical studies is ClinicalTrials.gov. We evaluated whether values in 302,091 trial records adhere to expected data types and use terms from biomedical ontologies, whether records contain fields required by government regulations, and whether structured elements could replace free-text elements. Contact information, outcome measures, and study design are frequently missing or underspecified. Important fields for search, such as condition and intervention, are not restricted to ontologies, and almost half of the conditions are not denoted by MeSH terms, as recommended. Eligibility criteria are stored as semi-structured free text. Enforcing the presence of all required elements, requiring values for certain fields to be drawn from ontologies, and creating a structured eligibility criteria element would improve the reusability of data from ClinicalTrials.gov in systematic reviews, meta-analyses, and matching of eligible patients to trials.

Obstacles to the reuse of study metadata in ClinicalTrials.gov

Scientific Data ◽

10.1038/s41597-020-00780-z ◽

2020 ◽

Vol 7 (1) ◽

Author(s):

Laura Miron ◽

Rafael S. Gonçalves ◽

Mark A. Musen

Keyword(s):

Clinical Studies ◽

Free Text ◽

Biomedical Data ◽

Biomedical Ontologies ◽

Experimental Protocol ◽

Data Types ◽

Eligibility Criteria ◽

Government Regulations ◽

Contact Information ◽

MeSH Terms

ICD10Net: An Artificial Intelligence Algorithm with Medical Background Conducts ICD-10-CM Coding Task with Outstanding Performance (Preprint)

10.2196/preprints.13677 ◽

2019 ◽

Author(s):

Chin Lin ◽

Yu-Sheng Lou ◽

Chia-Cheng Lee ◽

Chia-Jung Hsu ◽

Ding-Chung Wu ◽

...

Keyword(s):

Artificial Intelligence ◽

General Hospital ◽

Pearson Correlation ◽

Model Performance ◽

International Classification Of Diseases ◽

Free Text ◽

Daily Work ◽

Medical Background ◽

ICD-10 ◽

F Measure

BACKGROUND An artificial intelligence-based algorithm has shown a powerful ability for coding the International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM) in discharge notes. However, its performance still requires improvement compared with human experts. The major disadvantage of the previous algorithm is its lack of understanding of medical terminology. OBJECTIVE We propose some methods based on the human learning process and conduct a series of experiments to validate their improvements. METHODS We compared two data sources for training the word-embedding model: English Wikipedia and PubMed journal abstracts. Moreover, the fixed, changeable, and double-channel embedding tables were used to test their performance. Some additional tricks were also applied to improve accuracy. We used these methods to identify the three-chapter-level ICD-10-CM diagnosis codes in a set of discharge notes. Subsequently, 94,483 labeled discharge notes from June 1, 2015 to June 30, 2017 were used from the Tri-Service General Hospital in Taipei, Taiwan. To evaluate performance, 24,762 discharge notes from July 1, 2017 to December 31, 2017, from the same hospital were used. Moreover, 74,324 additional discharge notes collected from seven other hospitals were also tested. The F-measure is the major global measure of effectiveness. RESULTS In understanding medical terminology, the PubMed-embedding model (Pearson correlation = 0.60/0.57) shows a better performance compared with the Wikipedia-embedding model (Pearson correlation = 0.35/0.31). In the accuracy of ICD-10-CM coding, the changeable model using both the PubMed- and Wikipedia-embedding models has the highest testing mean F-measure (0.7311 and 0.6639 in Tri-Service General Hospital and the seven other hospitals, respectively). Moreover, a proposed method called a hybrid sampling method, an augmentation trick to avoid algorithms identifying negative terms, was found to additionally improve the model performance. CONCLUSIONS The proposed model architecture and training method is named ICD10Net, which is the first expert-level model practically applied to daily work. This model can also be applied in unstructured information extraction from free-text medical writing. We have developed a web app to demonstrate our work (https://linchin.ndmctsgh.edu.tw/app/ICD10/).
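
For readers unfamiliar with the F-measure reported above, the following sketch shows one way a mean F-measure over discharge notes could be computed. It assumes the balanced form (beta = 1) and per-note averaging, neither of which the abstract specifies; the function names and the example codes are hypothetical and are not taken from the study's data.

```python
from typing import List, Set, Tuple


def f_measure(predicted: Set[str], actual: Set[str], beta: float = 1.0) -> float:
    """F-measure for one note: weighted harmonic mean of precision and recall."""
    true_positives = len(predicted & actual)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(actual)
    beta_sq = beta * beta
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


def mean_f_measure(notes: List[Tuple[Set[str], Set[str]]]) -> float:
    """Average the per-note F-measure over (predicted, actual) code sets."""
    if not notes:
        return 0.0
    return sum(f_measure(pred, act) for pred, act in notes) / len(notes)


# Two hypothetical notes: (codes predicted by a model, codes assigned by a coder).
example_notes = [
    ({"I10", "E11", "J18"}, {"I10", "E11"}),  # precision 2/3, recall 1
    ({"N39"}, {"K21"}),                       # no overlap, so the score is 0
]
score = mean_f_measure(example_notes)
```

In the first note, precision 2/3 and recall 1 give an F-measure of 0.8; averaged with the second note's 0, the mean is 0.4.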

P005 Rheumatology virtual clinics during COVID-19: are our patients satisfied?

Rheumatology ◽

10.1093/rheumatology/keab247.004 ◽

2021 ◽

Vol 60 (Supplement_1) ◽

Author(s):

Fajer A Altamimi ◽

Una Martin

Keyword(s):

Medical Information ◽

Worker Safety ◽

University Hospital ◽

Free Text ◽

Risk Of Infection ◽

Further Training ◽

Return Envelope ◽

The Right ◽

Reduced Risk

Abstract Background/Aims Telemedicine can be broadly defined as the use of telecommunication technologies to provide medical information and services. It can be audio, visual, or text. Its use has increased dramatically during the COVID-19 pandemic to ensure patient and healthcare worker safety. Any healthcare professional can engage with it. It carries benefits like reduced stress and expense of traveling, maintenance of social distancing, and reduced risk of infection. There are some potential drawbacks such as lack of physical examination, liability and technological issues. Methods A questionnaire was sent to 200 patients, selected from different virtual clinics (new and review, doctor and ANP led) run between March and May 2020 in the rheumatology department of University Hospital Waterford. We formulated 14 questions to cover the following aspects: demography, the purpose of the consult, punctuality, feedback, medico-legal concerns, and free text for comments. A self-addressed return envelope was included. Results 83 responses were received. 2 were excluded. The ratio of female to male respondents was 59:41, with the majority over 60 years old. The main appointment type was review (67, 83%). 80% of patients were called either before or at the time of their scheduled appointment. The vast majority (98.8%) of our patients had confidence in our data protection and trusted our system to maintain their confidentiality. 95% stated that they felt comfortable, were given enough time to explain their health problem and felt free from stress. The respondents who preferred attending the clinic in person (17 in total) rather than virtually were mostly follow-up patients (12 vs. 5 new). Conclusion Patient satisfaction among those surveyed was high, despite having to introduce the service abruptly during the COVID-19 pandemic. There are many improvements we can adopt to improve our service and even maintain after the pandemic as a way of communicating with our stable patients. As we are covering a large geographical catchment, we can continue to implement the virtual clinic for some appointments. We should prioritize our efforts on identifying the right patient and the type of service we can offer, further training of staff, and increasing awareness of the patients as to how to get the most out of a virtual appointment. Disclosure F.A. Altamimi: None. U. Martin: None. C. Sheehy: None.

Sentiment Analysis Techniques Applied to Raw-Text Data from a Csq-8 Questionnaire about Mindfulness in Times of COVID-19 to Improve Strategy Generation

International Journal of Environmental Research and Public Health ◽

10.3390/ijerph18126408 ◽

2021 ◽

Vol 18 (12) ◽

pp. 6408

Author(s):

Mario Jojoa Acosta ◽

Gema Castillo-Sánchez ◽

Begonya Garcia-Zapirain ◽

Isabel de la Torre Díez ◽

Manuel Franco-Martín

Keyword(s):

Health Care ◽

Natural Language Processing ◽

Natural Language ◽

Sentiment Analysis ◽

Transfer Learning ◽

Language Processing ◽

Health Care Professionals ◽

Ground Truth ◽

Relevant Information ◽

Free Text

The use of artificial intelligence in health care has grown quickly. In this sense, we present our work related to the application of Natural Language Processing techniques, as a tool to analyze the sentiment perception of users who answered two questions from the CSQ-8 questionnaires with raw Spanish free-text. Their responses are related to mindfulness, which is a novel technique used to control stress and anxiety caused by different factors in daily life. As such, we proposed an online course where this method was applied in order to improve the quality of life of health care professionals in COVID-19 pandemic times. We also carried out an evaluation of the satisfaction level of the participants involved, with a view to establishing strategies to improve future experiences. To automatically perform this task, we used Natural Language Processing (NLP) models such as swivel embedding, neural networks, and transfer learning, so as to classify the inputs into the following three categories: negative, neutral, and positive. Due to the limited amount of data available (86 registers for the first question and 68 for the second), transfer learning techniques were required. The length of the text had no limit from the user’s standpoint, and our approach attained a maximum accuracy of 93.02% and 90.53%, respectively, based on ground truth labeled by three experts. Finally, we proposed a complementary analysis, using computer graphic text representation based on word frequency, to help researchers identify relevant information about the opinions with an objective approach to sentiment. The main conclusion drawn from this work is that the application of NLP techniques in small amounts of data using transfer learning is able to obtain enough accuracy in sentiment analysis and text classification stages.

Community Detection in Multiplex Networks

ACM Computing Surveys ◽

10.1145/3444688 ◽

2021 ◽

Vol 54 (3) ◽

pp. 1-35

Author(s):

Matteo Magnani ◽

Obaida Hanteer ◽

Roberto Interdonato ◽

Luca Rossi ◽

Andrea Tagarelli

Keyword(s):

Community Detection ◽

Experimental Evaluation ◽

Network Models ◽

Ground Truth ◽

Community Structures ◽

Multiplex Networks ◽

Detection Algorithms ◽

Multiplex Network ◽

The Right ◽

Modes Of Interaction

A multiplex network models different modes of interaction among same-type entities. In this article, we provide a taxonomy of community detection algorithms in multiplex networks. We characterize the different algorithms based on various properties and we discuss the type of communities detected by each method. We then provide an extensive experimental evaluation of the reviewed methods to answer three main questions: to what extent the evaluated methods are able to detect ground-truth communities, to what extent different methods produce similar community structures, and to what extent the evaluated methods are scalable. One goal of this survey is to help scholars and practitioners to choose the right methods for the data and the task at hand, while also emphasizing when such choice is problematic.

“It still haunts me whether we did the right thing”: a qualitative analysis of free text survey data on the bereavement experiences and support needs of family caregivers

BMC Palliative Care ◽

10.1186/s12904-016-0165-9 ◽

2016 ◽

Vol 15 (1) ◽

Cited By ~ 23

Author(s):

Emily Harrop ◽

Fiona Morgan ◽

Anthony Byrne ◽

Annmarie Nelson

Keyword(s):

Qualitative Analysis ◽

Family Caregivers ◽

Survey Data ◽

Support Needs ◽

Free Text ◽

The Right

Workforce experience of the implementation of an advanced clinical practice framework in England: a mixed methods evaluation

Human Resources for Health ◽

10.1186/s12960-020-00539-y ◽

2020 ◽

Vol 18 (1) ◽

Author(s):

Jessica Lawler ◽

Katrina Maclaine ◽

Alison Leary

Keyword(s):

Clinical Practice ◽

Large Scale ◽

Medical Model ◽

Free Text ◽

Advanced Practice ◽

Full Time ◽

Mixed Methods Evaluation ◽

Future Direction ◽

The Right ◽

Practice Framework

Abstract Background This study aims to understand how the implementation of the advanced clinical practice framework in England (2017) was experienced by the workforce to check assumptions for a national workforce modelling project. The advanced clinical practice framework was introduced in England in 2017 by Health Education England to clarify the role of advanced practice in the National Health Service. Methods As part of a large-scale workforce modelling project, a self-completed questionnaire was distributed via the Association of Advanced Practice Educators UK aimed at those studying to be an Advanced Clinical Practitioner or who are practicing at this level in order to check assumptions. Semi-structured phone interviews were carried out with this same group. Questionnaires were summarised using descriptive statistics in Excel for categorical responses, and interviews and survey free-text were analysed using thematic analysis in NVivo 10. Results The questionnaire received over 500 respondents (ten times that expected) and 15 interviews were carried out. Advanced clinical practice was considered by many respondents the only viable clinical career progression. Respondents felt that employers were not clear about what practicing at this level involved or its future direction. 54% (287) thought that ‘ACP’ was the right job title for them. 19% (98) of respondents wanted their origin registered profession to be included in their title. Balancing advanced clinical practice education concurrently with a full-time role was challenging; participants underestimated the workload and expectations of employer’s training. There is an apparent dichotomy that has developed from the implementation of the 2017 framework: that of advanced clinical practice as an advanced level of practice within a profession, and that of Advanced Clinical Practitioner as a new generic role in the medical model. Conclusions Efforts to establish further clarity and structure around advanced clinical practice are needed for both the individuals practising at this level and their employers. A robust evaluation of the introduction of this role should take place.

Benchmarking joint multi-omics dimensionality reduction approaches for cancer study

10.1101/2020.01.14.905760 ◽

2020 ◽

Cited By ~ 3

Author(s):

Laura Cantini ◽

Pooya Zakeri ◽

Celine Hernandez ◽

Aurelien Naldi ◽

Denis Thieffry ◽

...

Keyword(s):

Dimensionality Reduction ◽

Ground Truth ◽

Systematic Evaluation ◽

Omics Data ◽

Biological Processes ◽

Cancer Data ◽

Practical Guidelines ◽

Cell Data ◽

Omics Data Integration

Abstract High-dimensional multi-omics data are now standard in biology. They can greatly enhance our understanding of biological systems when effectively integrated. To achieve this multi-omics data integration, Joint Dimensionality Reduction (jDR) methods are among the most efficient approaches. However, several jDR methods are available, urging the need for a comprehensive benchmark with practical guidelines. We performed a systematic evaluation of nine representative jDR methods using three complementary benchmarks. First, we evaluated their performances in retrieving ground-truth sample clustering from simulated multi-omics datasets. Second, we used TCGA cancer data to assess their strengths in predicting survival, clinical annotations and known pathways/biological processes. Finally, we assessed their classification of multi-omics single-cell data. From these in-depth comparisons, we observed that intNMF performs best in clustering, while MCIA offers a consistent and effective behavior across many contexts. The full code of this benchmark is implemented in a Jupyter notebook - multi-omics mix (momix) - to foster reproducibility, and support data producers, users and future developers.

Text mining-based word representations for biomedical data analysis and machine learning tasks

10.1101/2020.12.09.417733 ◽

2020 ◽

Author(s):

Halima Alachram ◽

Hryhorii Chereda ◽

Tim Beißbarth ◽

Edgar Wingender ◽

Philip Stegmaier

Keyword(s):

Text Mining ◽

Gene Networks ◽

Free Text ◽

Biomedical Data ◽

Science Literature ◽

Biological Databases ◽

Biomedical Analysis ◽

Protein Protein Interaction ◽

PPI Networks ◽

Corpus Size

Abstract Biomedical and life science literature is an essential way to publish experimental results. With the rapid growth of the number of new publications, the amount of scientific knowledge represented in free text is increasing remarkably. There has been much interest in developing techniques that can extract this knowledge and make it accessible to aid scientists in discovering new relationships between biological entities and answering biological questions. Making use of the word2vec approach, we generated word vector representations based on a corpus consisting of over 16 million PubMed abstracts. We developed a text mining pipeline to produce word2vec embeddings with different properties and performed validation experiments to assess their utility for biomedical analysis. An important pre-processing step consisted of the substitution of synonymous terms by their preferred terms in biomedical databases. Furthermore, we extracted gene-gene networks from two embedding versions and used them as prior knowledge to train Graph-Convolutional Neural Networks (CNNs) on breast cancer gene expression data to predict the occurrence of metastatic events. Performances of resulting models were compared to Graph-CNNs trained with protein-protein interaction (PPI) networks or with networks derived using other word embedding algorithms. We also assessed the effect of corpus size on the variability of word representations. Finally, we created a web service with a graphical and a RESTful interface to extract and explore relations between biomedical terms using annotated embeddings. Comparisons to biological databases showed that relations between entities such as known PPIs, signaling pathways and cellular functions, or narrower disease ontology groups correlated with higher cosine similarity. Graph-CNNs trained with word2vec-embedding-derived networks performed best for the metastatic event prediction task compared to other networks. Word representations as produced by text mining algorithms like word2vec therefore capture biologically meaningful relations between entities.
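
The cosine similarity mentioned in the abstract can be illustrated with a short sketch. The vectors below are made-up toy values, not embeddings from the study, and the function name is an assumption for illustration only.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two word vectors (1.0 means same direction)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


# Toy 3-dimensional vectors standing in for learned word embeddings.
term_a = [1.0, 2.0, 3.0]
term_b = [2.0, 4.0, 6.0]   # same direction as term_a
term_c = [3.0, 0.0, -1.0]  # orthogonal to term_a

similar = cosine_similarity(term_a, term_b)
unrelated = cosine_similarity(term_a, term_c)
```

Here `similar` is 1 up to floating-point rounding and `unrelated` is 0. In an embedding-based analysis, term pairs with higher cosine similarity would be candidates for a biological relation.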
