Identification of Similar Patients Through Medical Concept Embedding from Electronic Health Records: A Feasibility Study for Rare Disease Diagnosis

Studies in Health Technology and Informatics - Public Health and Informatics ◽

10.3233/shti210241 ◽

2021 ◽

Author(s):

Xiaoyi Chen ◽

Carole Faviez ◽

Marc Vincent ◽

Nicolas Garcelon ◽

Sophie Saunier ◽

...

Keyword(s):

Rare Disease ◽

Large Scale ◽

Disease Diagnosis ◽

Fine Tuning ◽

Patient Identification ◽

Medical Concept ◽

Large Scale Dataset ◽

Precise Diagnosis ◽

Clinical Profiles ◽

Medical Concepts

To identify patients with similar clinical profiles and derive insights from the records and outcomes of similar patients can help fast and precise diagnosis and other clinical decisions for rare diseases. Similarity methods are required to take into account the semantic relations between medical concepts and also the different relevance of all medical concepts presented in patients’ medical records. In this paper, we introduce the methods developed in the context of rare disease screening/diagnosis from clinical data warehouse using medical concept embedding and adjusted aggregations. Our methods provided better preliminary results than baseline methods, with a significant improvement of precision among the top ranked similar patients, which is encouraging for further fine-tuning and application on a large-scale dataset for new/candidate patient identification.

Download Full-text

TANDA: Transfer and Adapt Pre-Trained Transformer Models for Answer Sentence Selection

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i05.6282 ◽

2020 ◽

Vol 34 (05) ◽

pp. 7780-7788

Author(s):

Siddhant Garg ◽

Thuy Vu ◽

Alessandro Moschitti

Keyword(s):

Large Scale ◽

Question Answering ◽

Positive Impact ◽

Fine Tuning ◽

Target Domain ◽

Domain Specific ◽

Transfer Step ◽

Industrial Setting ◽

Large Scale Dataset ◽

Effective Use

We propose TandA, an effective technique for fine-tuning pre-trained Transformer models for natural language tasks. Specifically, we first transfer a pre-trained model into a model for a general task by fine-tuning it with a large and high-quality dataset. We then perform a second fine-tuning step to adapt the transferred model to the target domain. We demonstrate the benefits of our approach for answer sentence selection, which is a well-known inference task in Question Answering. We built a large scale dataset to enable the transfer step, exploiting the Natural Questions dataset. Our approach establishes the state of the art on two well-known benchmarks, WikiQA and TREC-QA, achieving the impressive MAP scores of 92% and 94.3%, respectively, which largely outperform the the highest scores of 83.4% and 87.5% of previous work. We empirically show that TandA generates more stable and robust models reducing the effort required for selecting optimal hyper-parameters. Additionally, we show that the transfer step of TandA makes the adaptation step more robust to noise. This enables a more effective use of noisy datasets for fine-tuning. Finally, we also confirm the positive impact of TandA in an industrial setting, using domain specific datasets subject to different types of noise.

Download Full-text

Clinical Profiles of Selected Biomarkers Identifying Infection and Cancer Patients: A Gorzów Hospital Example

Disease Markers ◽

10.1155/2019/6826127 ◽

2019 ◽

Vol 2019 ◽

pp. 1-8 ◽

Cited By ~ 1

Author(s):

Katarzyna Brzeźniakiewicz-Janus ◽

Marcus Daniel Lancé ◽

Andrzej Tukiendorf ◽

Mirosław Franków ◽

Joanna Rupa-Matysek ◽

...

Keyword(s):

Cancer Patients ◽

Statistical Methods ◽

Medical Management ◽

Large Scale ◽

Roc Analysis ◽

Assessment Tool ◽

Risk Groups ◽

Disease Diagnosis ◽

Study Data ◽

Clinical Profiles

Introduction. Many pathobiological processes that manifest in a patient’s organs could be associated with biomarker levels that are detectable in different human systems. However, biomarkers that promote early disease diagnosis should not be tested only in personalized medicine but also in large-scale diagnostic evaluations of patients, such as for medical management. Objective. We aimed to create an easy algorithmic risk assessment tool that is based on obtainable “everyday” biomarkers, identifying infection and cancer patients. Patients. We obtained the study data from the electronic medical records of 517 patients (186 infection and 331 cancer episodes) hospitalized at Gorzów Hospital, Poland, over a one and a half-year period from the 1st of January 2017 to the 30th of June 2018. Methods and Results. A set of consecutive statistical methods (cluster analysis, ANOVA, and ROC analysis) was used to predict infection and cancer. For in-hospital diagnosis, our approach showed independent clusters of patients by age, sex, MPV, and disease fractions. From the set of available “everyday” biomarkers, we established the most likely bioindicators for infection and cancer together with their classification cutoffs. Conclusions. Despite infection and cancer being very different diseases in their clinical characteristics, it seems possible to discriminate them using “everyday” biomarkers and popular statistical methods. The estimated cutoffs for the specified biomarkers can be used to allocate patients to appropriate risk groups for stratification purposes (medical management or epidemiological administration).

Download Full-text

Towards Generating and Evaluating Iconographic Image Captions of Artworks

Journal of Imaging ◽

10.3390/jimaging7080123 ◽

2021 ◽

Vol 7 (8) ◽

pp. 123

Author(s):

Eva Cetinic

Keyword(s):

Large Scale ◽

Historical Context ◽

Fine Tuning ◽

Natural Image ◽

Learning Approaches ◽

Image Captioning ◽

Ongoing Research ◽

Qualitative Approaches ◽

Large Scale Dataset ◽

Research Challenge

To automatically generate accurate and meaningful textual descriptions of images is an ongoing research challenge. Recently, a lot of progress has been made by adopting multimodal deep learning approaches for integrating vision and language. However, the task of developing image captioning models is most commonly addressed using datasets of natural images, while not many contributions have been made in the domain of artwork images. One of the main reasons for that is the lack of large-scale art datasets of adequate image-text pairs. Another reason is the fact that generating accurate descriptions of artwork images is particularly challenging because descriptions of artworks are more complex and can include multiple levels of interpretation. It is therefore also especially difficult to effectively evaluate generated captions of artwork images. The aim of this work is to address some of those challenges by utilizing a large-scale dataset of artwork images annotated with concepts from the Iconclass classification system. Using this dataset, a captioning model is developed by fine-tuning a transformer-based vision-language pretrained model. Due to the complex relations between image and text pairs in the domain of artwork images, the generated captions are evaluated using several quantitative and qualitative approaches. The performance is assessed using standard image captioning metrics and a recently introduced reference-free metric. The quality of the generated captions and the model’s capacity to generalize to new data is explored by employing the model to another art dataset to compare the relation between commonly generated captions and the genre of artworks. The overall results suggest that the model can generate meaningful captions that indicate a stronger relevance to the art historical context, particularly in comparison to captions obtained from models trained only on natural image datasets.

Download Full-text

CNN-Based Target Recognition and Identification for Infrared Imaging in Defense Systems

Sensors ◽

10.3390/s19092040 ◽

2019 ◽

Vol 19 (9) ◽

pp. 2040 ◽

Cited By ~ 7

Author(s):

Antoine d’Acremont ◽

Ronan Fablet ◽

Alexandre Baussard ◽

Guillaume Quin

Keyword(s):

Large Scale ◽

Data Augmentation ◽

Infrared Imaging ◽

State Of The Art ◽

Object Identification ◽

Fine Tuning ◽

Support Vector ◽

Defense Systems ◽

Large Scale Dataset ◽

In The Wild

Convolutional neural networks (CNNs) have rapidly become the state-of-the-art models for image classification applications. They usually require large groundtruthed datasets for training. Here, we address object identification and recognition in the wild for infrared (IR) imaging in defense applications, where no such large-scale dataset is available. With a focus on robustness issues, especially viewpoint invariance, we introduce a compact and fully convolutional CNN architecture with global average pooling. We show that this model trained from realistic simulation datasets reaches a state-of-the-art performance compared with other CNNs with no data augmentation and fine-tuning steps. We also demonstrate a significant improvement in the robustness to viewpoint changes with respect to an operational support vector machine (SVM)-based scheme.

Download Full-text

Development of a Consumer Health Vocabulary by Mining Health Forum Texts Based on Word Embedding: Semiautomatic Approach (Preprint)

10.2196/preprints.12704 ◽

2018 ◽

Author(s):

Gen Gu ◽

Xingting Zhang ◽

Xingeng Zhu ◽

Zhe Jian ◽

Ken Chen ◽

...

Keyword(s):

Vector Space ◽

Information Seeking ◽

Large Scale ◽

Medical Knowledge ◽

Semantic Distance ◽

Word Embedding ◽

Fine Tuning ◽

Consumer Health ◽

Text Information ◽

Medical Concepts

BACKGROUND The vocabulary gap between consumers and professionals in the medical domain hinders information seeking and communication. Consumer health vocabularies have been developed to aid such informatics applications. This purpose is best served if the vocabulary evolves with consumers’ language. OBJECTIVE Our objective is to develop a method for identifying and adding new terms to consumer health vocabularies, so that it can keep up with the constantly evolving medical knowledge and language use. METHODS In this paper, we propose a consumer health term–finding framework based on a distributed word vector space model. We first learned word vectors from a large-scale text corpus and then adopted a supervised method with existing consumer health vocabularies for learning vector representation of words, which can provide additional supervised fine tuning after unsupervised word embedding learning. With a fine-tuned word vector space, we identified pairs of professional terms and their consumer variants by their semantic distance in the vector space. A subsequent manual review of the extracted and labeled pairs of entities was conducted to validate the results generated by the proposed approach. The results were evaluated using mean reciprocal rank (MRR). RESULTS Manual evaluation showed that it is feasible to identify alternative medical concepts by using professional or consumer concepts as queries in the word vector space without fine tuning, but the results are more promising in the final fine-tuned word vector space. The MRR values indicated that on an average, a professional or consumer concept is about 14th closest to its counterpart in the word vector space without fine tuning, and the MRR in the final fine-tuned word vector space is 8. Furthermore, the results demonstrate that our method can collect abbreviations and common typos frequently used by consumers. CONCLUSIONS By integrating a large amount of text information and existing consumer health vocabularies, our method outperformed several baseline ranking methods and is effective for generating a list of candidate terms for human review during consumer health vocabulary development.

Download Full-text

Rare disease diagnosis: A review of web search, social media and large-scale data-mining approaches

Rare Diseases ◽

10.1080/21675511.2015.1083145 ◽

2015 ◽

Vol 3 (1) ◽

pp. e1083145 ◽

Cited By ~ 25

Author(s):

Dan Svenstrup ◽

Henrik L Jørgensen ◽

Ole Winther

Keyword(s):

Data Mining ◽

Social Media ◽

Rare Disease ◽

Large Scale ◽

Web Search ◽

Disease Diagnosis ◽

Large Scale Data ◽

Scale Data

Download Full-text

Advancing Genomics for Rare Disease Diagnosis and Therapy Development

10.3389/978-2-88966-162-6 ◽

2020 ◽

Keyword(s):

Rare Disease ◽

Disease Diagnosis ◽

Diagnosis And Therapy ◽

Therapy Development

Download Full-text

Survey of Clustering Methods for Large Scale Dataset

International Journal of Computer Sciences and Engineering ◽

10.26438/ijcse/v7i5.13381344 ◽

2019 ◽

Vol 7 (5) ◽

pp. 1338-1344

Author(s):

Anupama Jawale ◽

Ganesh Magar

Keyword(s):

Large Scale ◽

Clustering Methods ◽

Large Scale Dataset

Download Full-text

The long road to rare disease diagnosis: a hero’s journey

Molecular Genetics and Metabolism ◽

10.1016/s1096-7192(21)00542-4 ◽

2021 ◽

Vol 132 ◽

pp. S297

Author(s):

Colin Halverson ◽

Clair Francomano

Keyword(s):

Rare Disease ◽

Disease Diagnosis ◽

Hero's Journey

Download Full-text

Joint regression and learning from pairwise rankings for personalized image aesthetic assessment

Computational Visual Media ◽

10.1007/s41095-021-0207-y ◽

2021 ◽

Author(s):

Jin Zhou ◽

Qing Zhang ◽

Jian-Hao Fan ◽

Wei Sun ◽

Wei-Shi Zheng

Keyword(s):

Large Scale ◽

Assessment Model ◽

Generic Model ◽

Small Subset ◽

Deep Convolutional Neural Networks ◽

Personal Taste ◽

Hinge Loss ◽

Novel Approach ◽

Large Scale Dataset ◽

Image Pairs

AbstractRecent image aesthetic assessment methods have achieved remarkable progress due to the emergence of deep convolutional neural networks (CNNs). However, these methods focus primarily on predicting generally perceived preference of an image, making them usually have limited practicability, since each user may have completely different preferences for the same image. To address this problem, this paper presents a novel approach for predicting personalized image aesthetics that fit an individual user’s personal taste. We achieve this in a coarse to fine manner, by joint regression and learning from pairwise rankings. Specifically, we first collect a small subset of personal images from a user and invite him/her to rank the preference of some randomly sampled image pairs. We then search for the K-nearest neighbors of the personal images within a large-scale dataset labeled with average human aesthetic scores, and use these images as well as the associated scores to train a generic aesthetic assessment model by CNN-based regression. Next, we fine-tune the generic model to accommodate the personal preference by training over the rankings with a pairwise hinge loss. Experiments demonstrate that our method can effectively learn personalized image aesthetic preferences, clearly outperforming state-of-the-art methods. Moreover, we show that the learned personalized image aesthetic benefits a wide variety of applications.

Download Full-text