Can Additional Patient Information Improve the Diagnostic Performance of Deep Learning for the Interpretation of Knee Osteoarthritis Severity?

2020 ◽  
Vol 9 (10) ◽  
pp. 3341
Author(s):  
Dong Hyun Kim ◽  
Kyong Joon Lee ◽  
Dongjun Choi ◽  
Jae Ik Lee ◽  
Han Gyeol Choi ◽  
...  

This study compares the diagnostic performance of deep learning (DL) with that of the radiologists' original readings of the Kellgren–Lawrence (KL) grade and evaluates whether additional patient data can improve the diagnostic performance of DL. From March 2003 to February 2017, 3000 patients with 4366 knee AP radiographs were randomly selected. DL was trained using knee images and clinical information in two stages: in the first stage, DL was trained with images only; in the second stage, it was trained with image data and clinical information. In the test set with image data only, the areas under the receiver operating characteristic curve (AUCs) of the DL algorithm in diagnosing KL 0 to KL 4 were 0.91 (95% confidence interval (CI), 0.88–0.95), 0.80 (95% CI, 0.76–0.84), 0.69 (95% CI, 0.64–0.73), 0.86 (95% CI, 0.83–0.89), and 0.96 (95% CI, 0.94–0.98), respectively. In the test set with image data and additional patient information, the AUCs of the DL algorithm in diagnosing KL 0 to KL 4 were 0.97 (95% CI, 0.71–0.74), 0.85 (95% CI, 0.80–0.86), 0.75 (95% CI, 0.66–0.73), 0.86 (95% CI, 0.79–0.85), and 0.95 (95% CI, 0.91–0.97), respectively. Image data with additional patient information showed a statistically significantly higher AUC than image data alone in diagnosing KL 0, 1, and 2 (p-values of 0.008, 0.020, and 0.027, respectively). The diagnostic performance of DL was comparable to that of the radiologists' original readings of the knee osteoarthritis KL grade. Additional patient information improved DL diagnosis in interpreting early knee osteoarthritis.
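
To make the two-stage design concrete, the following is a minimal sketch, not the authors' implementation: a CNN backbone is first trained on the radiographs alone, and its image embedding is then concatenated with clinical variables for the second stage. The ResNet-18 backbone, the layer sizes, and the number of clinical inputs are all illustrative assumptions.

```python
import torch
import torch.nn as nn
from torchvision import models

class KneeKLNet(nn.Module):
    """Illustrative two-stage KL-grade classifier (image-only, then fusion)."""

    def __init__(self, n_clinical: int = 3, n_grades: int = 5):
        super().__init__()
        backbone = models.resnet18(weights=None)            # any CNN backbone works
        self.features = nn.Sequential(*list(backbone.children())[:-1])
        self.image_head = nn.Linear(512, n_grades)          # stage 1: image only
        self.fusion_head = nn.Sequential(                   # stage 2: image + clinical
            nn.Linear(512 + n_clinical, 128),
            nn.ReLU(),
            nn.Linear(128, n_grades),
        )

    def forward(self, image, clinical=None):
        z = self.features(image).flatten(1)                 # (B, 512) image embedding
        if clinical is None:
            return self.image_head(z)                       # stage-1 prediction
        return self.fusion_head(torch.cat([z, clinical], dim=1))

# Stage 1 trains with model(images); stage 2 fine-tunes with model(images, clinical).
model = KneeKLNet()
logits = model(torch.randn(2, 3, 224, 224), torch.randn(2, 3))
print(logits.shape)  # torch.Size([2, 5])
```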

Sensors ◽  
2020 ◽  
Vol 20 (9) ◽  
pp. 2556
Author(s):  
Liyang Wang ◽  
Yao Mu ◽  
Jing Zhao ◽  
Xiaoya Wang ◽  
Huilian Che

The clinical symptoms of prediabetes are mild and easy to overlook, but prediabetes may develop into diabetes if early intervention is not performed. In this study, a deep learning model—referred to as IGRNet—is developed to effectively detect and diagnose prediabetes in a non-invasive, real-time manner using a 12-lead electrocardiogram (ECG) lasting 5 s. After searching for an appropriate activation function, we compared two mainstream deep neural networks (AlexNet and GoogLeNet) and three traditional machine learning algorithms to verify the superiority of our method. The diagnostic accuracy of IGRNet is 0.781, and the area under the receiver operating characteristic curve (AUC) is 0.777 after testing on the independent test set including a mixed group. Furthermore, the accuracy and AUC are 0.856 and 0.825, respectively, in the normal-weight-range test set. The experimental results indicate that IGRNet diagnoses prediabetes with high accuracy using ECGs, outperforming other existing machine learning methods; this suggests its potential for application in clinical practice as a non-invasive prediabetes diagnosis technology.
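
As a minimal sketch of how the reported accuracy and AUC would be computed on an independent test set, the snippet below uses scikit-learn on synthetic placeholder labels and model scores; `y_true` and `y_prob` are illustrative stand-ins, not the study's data.

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200)                          # 0 = normal, 1 = prediabetes
y_prob = np.clip(y_true * 0.4 + rng.random(200) * 0.6, 0, 1)   # stand-in model scores

accuracy = accuracy_score(y_true, (y_prob >= 0.5).astype(int))  # fixed 0.5 threshold
auc = roc_auc_score(y_true, y_prob)                             # threshold-free AUC
print(f"accuracy={accuracy:.3f}  AUC={auc:.3f}")
```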


Diagnostics ◽  
2021 ◽  
Vol 11 (7) ◽  
pp. 1182
Author(s):  
Cheng-Yi Kao ◽  
Chiao-Yun Lin ◽  
Cheng-Chen Chao ◽  
Han-Sheng Huang ◽  
Hsing-Yu Lee ◽  
...  

We aimed to set up an Automated Radiology Alert System (ARAS) for the detection of pneumothorax in chest radiographs by a deep learning model, and to compare its efficiency and diagnostic performance with those of the existing Manual Radiology Alert System (MRAS) at a tertiary medical center. This study retrospectively collected 1235 chest radiographs with pneumothorax labeling from 2013 to 2019 and 337 chest radiographs with negative findings in 2019, which were separated into training and validation datasets for the deep learning model of the ARAS. The efficiency before and after using the model was compared in terms of alert time and report time. During parallel running of the two systems from September to October 2020, chest radiographs prospectively acquired in the emergency department from patients older than 6 years served as the testing dataset for the comparison of diagnostic performance. Efficiency improved after using the model, with the mean alert time falling from 8.45 min to 0.69 min and the mean report time from 2.81 days to 1.59 days. The comparison of the diagnostic performance of both systems using 3739 chest radiographs acquired during parallel running showed that the ARAS was better than the MRAS in terms of sensitivity (recall), area under the receiver operating characteristic curve, and F1 score (0.837 vs. 0.256, 0.914 vs. 0.628, and 0.754 vs. 0.407, respectively), but worse in terms of positive predictive value (PPV) (precision) (0.686 vs. 1.000). This study successfully designed a deep learning model for pneumothorax detection on chest radiographs and set up an ARAS with improved efficiency and overall diagnostic performance.
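
The head-to-head metrics reported here can be computed from each system's per-radiograph alerts against the final ground truth with a few lines of scikit-learn. The sketch below uses small illustrative placeholder arrays, not the study's data.

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

y_true = np.array([1, 1, 1, 0, 0, 0, 1, 0])   # pneumothorax ground truth per radiograph
y_alert = np.array([1, 1, 0, 0, 0, 1, 1, 0])  # alerts raised by one system

precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_alert, average="binary"
)
print(f"PPV={precision:.3f}  sensitivity={recall:.3f}  F1={f1:.3f}")
```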


Sci ◽  
2022 ◽  
Vol 4 (1) ◽  
pp. 3
Author(s):  
Steinar Valsson ◽  
Ognjen Arandjelović

With the increase in the availability of annotated X-ray image data, there has been an accompanying and consequent increase in research on machine-learning-based, and in particular deep-learning-based, X-ray image analysis. A major problem with this body of work lies in how newly proposed algorithms are evaluated. Usually, comparative analysis is reduced to the presentation of a single metric, often the area under the receiver operating characteristic curve (AUROC), which does not provide much clinical value or insight and thus fails to communicate the applicability of proposed models. In the present paper, we address this limitation of previous work by presenting a thorough analysis of a state-of-the-art learning approach, and hence illuminate various weaknesses of similar algorithms in the literature which have not yet been fully acknowledged and appreciated. Our analysis was performed on the ChestX-ray14 dataset, which has 14 lung disease labels and metainformation such as patient age, gender, and the relative X-ray direction. We examined the diagnostic significance of different metrics used in the literature, including those proposed by the International Medical Device Regulators Forum, and present a qualitative assessment of the spatial information learned by the model. We show that models that have very similar AUROCs can exhibit widely differing clinical applicability. As a result, our work demonstrates the importance of detailed reporting and analysis of the performance of machine-learning approaches in this field, which is crucial both for progress in the field and for the adoption of such models in practice.
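
To make the argument concrete, the snippet below is a minimal sketch of such multi-metric reporting: alongside AUROC, it computes sensitivity, specificity, PPV, and NPV at a fixed operating point for each disease label. The three labels shown are a subset of the 14 in ChestX-ray14, and all scores are synthetic placeholders, not results from the paper.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, confusion_matrix

rng = np.random.default_rng(1)
labels = ["Atelectasis", "Effusion", "Pneumothorax"]               # 3 of the 14 labels
y = rng.integers(0, 2, size=(500, 3))                              # synthetic ground truth
scores = np.clip(y * 0.3 + rng.normal(0.35, 0.2, (500, 3)), 0, 1)  # synthetic model scores

for i, name in enumerate(labels):
    auroc = roc_auc_score(y[:, i], scores[:, i])
    tn, fp, fn, tp = confusion_matrix(y[:, i], scores[:, i] >= 0.5).ravel()
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)                                           # positive predictive value
    npv = tn / (tn + fn)                                           # negative predictive value
    print(f"{name}: AUROC={auroc:.2f} sens={sensitivity:.2f} "
          f"spec={specificity:.2f} PPV={ppv:.2f} NPV={npv:.2f}")
```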


2020 ◽  
Author(s):  
Sharon Zhou ◽  
Henrik Marklund ◽  
Ondrej Blaha ◽  
Manisha Desai ◽  
Brock Martin ◽  
...  

Aims: Deep learning (DL), a sub-area of artificial intelligence, has demonstrated great promise at automating diagnostic tasks in pathology, yet its translation into clinical settings has been slow. Few studies have examined its impact on pathologist performance when embedded into clinical workflows. The identification of H. pylori on H&E stain is a tedious, imprecise task which might benefit from DL assistance. Here, we developed a DL assistant for diagnosing H. pylori in gastric biopsies and tested its impact on pathologist diagnostic accuracy and turnaround time. Methods and results: H&E-stained whole-slide images (WSIs) of 303 gastric biopsies with ground truth confirmation by immunohistochemistry formed the study dataset; 47 and 126 WSIs were used to train and optimize our DL assistant to detect H. pylori, respectively, and 130 were used in a clinical experiment in which 3 experienced GI pathologists reviewed the same test set with and without assistance. On the test set, the assistant achieved high performance, with a WSI-level area under the receiver operating characteristic curve (AUROC) of 0.965 (95% CI 0.934–0.987). On H. pylori-positive cases, assisted diagnoses were faster (β, the fixed effect for assistance = −0.557, p = 0.003) and much more accurate (OR = 13.37, p < 0.001) than unassisted diagnoses. However, assistance increased diagnostic uncertainty on H. pylori-negative cases, resulting in an overall decrease in assisted accuracy (OR = 0.435, p = 0.016) and a negligible impact on overall turnaround time (β for assistance = 0.010, p = 0.860). Conclusions: DL can assist pathologists with H. pylori diagnosis, but its integration into clinical workflows requires optimization to mitigate diagnostic uncertainty as a potential consequence of assistance.
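
For context, a WSI-level AUROC with a 95% CI like the one above is commonly obtained by bootstrapping the test slides; whether the authors used this exact procedure is an assumption. The snippet below is a generic sketch on synthetic per-slide labels and scores, not the study's code.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(2)
y = rng.integers(0, 2, 130)                          # per-slide H. pylori status (synthetic)
p = np.clip(y * 0.6 + rng.random(130) * 0.5, 0, 1)   # per-slide model scores (synthetic)

aucs = []
for _ in range(2000):
    idx = rng.integers(0, len(y), len(y))            # resample slides with replacement
    if len(np.unique(y[idx])) < 2:                   # skip one-class resamples
        continue
    aucs.append(roc_auc_score(y[idx], p[idx]))

lo, hi = np.percentile(aucs, [2.5, 97.5])            # percentile bootstrap CI
print(f"AUROC={roc_auc_score(y, p):.3f} (95% CI {lo:.3f}-{hi:.3f})")
```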


10.2196/17234 ◽  
2020 ◽  
Vol 22 (4) ◽  
pp. e17234 ◽  
Author(s):  
Bin Liang ◽  
Na Yang ◽  
Guosheng He ◽  
Peng Huang ◽  
Yong Yang

Background Cancer has become the second leading cause of death globally. Most cancer cases are due to genetic mutations, which affect metabolism and result in facial changes. Objective In this study, we aimed to identify the facial features of patients with cancer using the deep learning technique. Methods Images of faces of patients with cancer were collected to build the cancer face image data set. A face image data set of people without cancer was built by randomly selecting images from the publicly available MegaAge data set according to the sex and age distribution of the cancer face image data set. Each face image was preprocessed to obtain an upright centered face chip, following which the background was filtered out to exclude the effects of nonrelevant factors. A residual neural network was constructed to classify cancer and noncancer cases. Transfer learning, minibatches, few epochs, L2 regularization, and random dropout training strategies were used to prevent overfitting. Moreover, guided gradient-weighted class activation mapping was used to reveal the relevant features. Results A total of 8124 face images of patients with cancer (men: n=3851, 47.4%; women: n=4273, 52.6%) were collected from January 2018 to January 2019. The ages of the patients ranged from 1 year to 70 years (median age 52 years). The average faces of both male and female patients with cancer displayed more obvious facial adiposity than the average faces of people without cancer, which was supported by a landmark comparison. When testing the data set, the training process was terminated after 5 epochs. The area under the receiver operating characteristic curve was 0.94, and the accuracy rate was 0.82. The main relevant feature of cancer cases was facial skin, while the relevant features of noncancer cases were extracted from the complementary face region. Conclusions In this study, we built a face data set of patients with cancer and constructed a deep learning model to classify the faces of people with and those without cancer. We found that facial skin and adiposity were closely related to the presence of cancer.
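
The overfitting-control recipe listed above maps onto a standard training setup. The following is a hedged sketch, not the authors' code: a pretrained residual network (ResNet-50 here is an assumption; the paper specifies only "a residual neural network") with dropout before the final layer, L2 regularization via weight decay, small minibatches, and few epochs.

```python
import torch
import torch.nn as nn
from torchvision import models

weights = models.ResNet50_Weights.IMAGENET1K_V1        # transfer learning from ImageNet
model = models.resnet50(weights=weights)
model.fc = nn.Sequential(
    nn.Dropout(p=0.5),                                 # random dropout
    nn.Linear(model.fc.in_features, 2),                # cancer vs. noncancer
)

optimizer = torch.optim.SGD(
    model.parameters(), lr=1e-3, momentum=0.9, weight_decay=1e-4  # L2 penalty
)
criterion = nn.CrossEntropyLoss()

# One illustrative update per epoch on a synthetic minibatch (replace with a DataLoader).
images = torch.randn(8, 3, 224, 224)                   # minibatch of 8 face chips
targets = torch.randint(0, 2, (8,))
for epoch in range(5):                                 # few epochs; stopped at 5
    optimizer.zero_grad()
    loss = criterion(model(images), targets)
    loss.backward()
    optimizer.step()
```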


Author(s):  
Joon-myoung Kwon ◽  
Ye Rang Lee ◽  
Min-Seung Jung ◽  
Yoon-Ji Lee ◽  
Yong-Yeon Jo ◽  
...  

Abstract Background Sepsis is a life-threatening organ dysfunction and a major healthcare burden worldwide. Although sepsis is a medical emergency that requires immediate management, screening for the occurrence of sepsis is difficult. Herein, we propose a deep learning-based model (DLM) for screening sepsis using electrocardiography (ECG). Methods This retrospective cohort study included 46,017 patients who were admitted to two hospitals. A total of 1,548 and 639 patients had sepsis and septic shock, respectively. The DLM was developed using 73,727 ECGs from 18,142 patients, and internal validation was conducted using 7,774 ECGs from 7,774 patients. Furthermore, we conducted an external validation with 20,101 ECGs from 20,101 patients from another hospital to verify the applicability of the DLM across centers. Results During the internal and external validations, the area under the receiver operating characteristic curve (AUC) of the DLM using the 12-lead ECG was 0.901 (95% confidence interval (CI), 0.882–0.920) and 0.863 (95% CI, 0.846–0.879), respectively, for screening sepsis, and 0.906 (95% CI, 0.877–0.936) and 0.899 (95% CI, 0.872–0.925), respectively, for detecting septic shock. The AUC of the DLM for detecting sepsis using 6-lead and single-lead ECGs was 0.845–0.882. A sensitivity map revealed that the QRS complex and T waves were associated with sepsis. A subgroup analysis was conducted using ECGs from 4,609 patients who were admitted with an infectious disease, and the AUC of the DLM for predicting in-hospital mortality was 0.817 (95% CI, 0.793–0.840). There was a significant difference in the prediction score of the DLM according to the presence of infection in the validation dataset (0.277 vs. 0.574, p < 0.001), including severe acute respiratory syndrome coronavirus 2 (0.260 vs. 0.725, p = 0.018). Conclusions The DLM delivered reasonable performance for sepsis screening using 12-, 6-, and single-lead ECGs. The results suggest that sepsis can be screened using not only conventional ECG devices but also diverse lifestyle-type ECG machines employing the DLM, thereby preventing irreversible disease progression and mortality.
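
For illustration, the sketch below shows the input/output contract of such a model as a small 1-D convolutional network mapping a 12-lead ECG to a sepsis probability. The architecture, layer sizes, and the assumed 500 Hz, 10 s recordings are placeholders rather than the authors' DLM; restricting n_leads is one way the 6-lead and single-lead variants could be emulated.

```python
import torch
import torch.nn as nn

class ECGSepsisNet(nn.Module):
    """Illustrative 1-D CNN: (batch, leads, samples) -> sepsis probability."""

    def __init__(self, n_leads: int = 12):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_leads, 32, kernel_size=15, stride=2), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=15, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),      # global pooling over time
            nn.Flatten(),
            nn.Linear(64, 1),
        )

    def forward(self, ecg):
        return torch.sigmoid(self.net(ecg)).squeeze(1)   # per-record risk score

model = ECGSepsisNet(n_leads=12)
risk = model(torch.randn(4, 12, 5000))    # four assumed 10-s, 500-Hz recordings
print(risk.shape)                         # torch.Size([4])
```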


2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Yanping Wang ◽  
Sixuan Chen ◽  
Feng Shi ◽  
Xiaoqing Cheng ◽  
Qiang Xu ◽  
...  

Background. It is often tricky to differentiate cystic pituitary adenoma from Rathke cleft cyst by visual inspection because of their similar MRI presentations. We aimed to design an MR-based radiomics model to improve the differential diagnosis between them. Methods. Conventional diagnostic MRI data (T1-, T2-, and postcontrast T1-weighted MR images) were obtained from 215 pathologically confirmed patients (105 cases with cystic pituitary adenoma and 110 cases with Rathke cleft cyst) and were divided into training (n = 172) and test (n = 43) sets. MRI radiomics features were extracted from the imaging data, and semantic imaging features (n = 15) were visually estimated by two radiologists. Four classifiers were used to construct radiomics models through 5-fold cross-validation after feature selection with the least absolute shrinkage and selection operator (LASSO). An integrated model combining radiomics and semantic features was further constructed. The diagnostic performance was validated in the test set. Receiver operating characteristic (ROC) curves were used to evaluate and compare the performance of the models against the diagnostic performance of the radiologists. Results. In the test set, the combined radiomics and semantic model using the ANN classifier obtained the best classification performance, with an AUC of 0.848 (95% CI: 0.750–0.946), accuracy of 76.7% (95% CI: 64.1–89.4%), sensitivity of 73.9% (95% CI: 56.0–91.9%), and specificity of 80.0% (95% CI: 62.5–97.5%), and performed better than the multiparametric model (AUC = 0.792, 95% CI: 0.674–0.910) or the semantic model (AUC = 0.823, 95% CI: 0.705–0.941). The two radiologists had an accuracy of 69.8% and 74.4%, respectively, sensitivity of 69.6% and 73.9%, and specificity of 70.0% and 75.0%. Conclusions. The MR-based radiomics model was technically feasible and showed good diagnostic performance in the differential diagnosis between cystic pituitary adenoma and Rathke cleft cyst.
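
As a minimal sketch of the pipeline described above (standardization, LASSO feature selection, then a classifier evaluated with 5-fold cross-validation), the snippet below uses scikit-learn on a synthetic feature matrix; the MLP stands in for the ANN classifier, and all data and hyperparameters are placeholders.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LassoCV
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
X = rng.normal(size=(172, 400))    # 172 training cases x 400 radiomics features (synthetic)
y = (X[:, 0] + X[:, 1] + rng.normal(0, 0.5, 172) > 0).astype(int)  # synthetic labels

pipeline = make_pipeline(
    StandardScaler(),
    SelectFromModel(LassoCV(cv=5, max_iter=5000)),            # LASSO feature selection
    MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000),   # stand-in "ANN" classifier
)
aucs = cross_val_score(pipeline, X, y, cv=5, scoring="roc_auc")  # 5-fold cross-validation
print(f"5-fold CV AUC: {aucs.mean():.3f} +/- {aucs.std():.3f}")
```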



Author(s):  
Natália Alves ◽  
Megan Schuurmans ◽  
Geke Litjens ◽  
Joeran S. Bosma ◽  
John Hermans ◽  
...  

Early detection improves prognosis in pancreatic ductal adenocarcinoma (PDAC) but is challenging, as lesions are often small and poorly defined on contrast-enhanced computed tomography (CE-CT) scans. Deep learning can facilitate PDAC diagnosis; however, current models still fail to identify small (<2 cm) lesions. In this study, state-of-the-art deep learning models were used to develop an automatic framework for PDAC detection, focusing on small lesions. Additionally, the impact of integrating the surrounding anatomy was investigated. CE-CT scans from a cohort of 119 pathology-proven PDAC patients and a cohort of 123 patients without PDAC were used to train an nnUnet for automatic lesion detection and segmentation (nnUnet_T). Two additional nnUnets were trained to investigate the impact of anatomy integration: (1) segmenting the pancreas and tumor (nnUnet_TP), and (2) segmenting the pancreas, tumor, and multiple surrounding anatomical structures (nnUnet_MS). An external, publicly available test set was used to compare the performance of the three networks. The nnUnet_MS achieved the best performance, with an area under the receiver operating characteristic curve of 0.91 for the whole test set and 0.88 for tumors <2 cm, showing that state-of-the-art deep learning can detect small PDAC and benefits from anatomy information.
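
A common way to turn a segmentation network's output into the scan-level detection performance reported here is to reduce each voxelwise tumor probability map to a single score, for example its maximum, and compute an ROC curve over scans. The sketch below illustrates this on synthetic maps; the max-probability rule is an assumption, not necessarily the aggregation the authors used.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(4)

def scan_level_score(tumor_prob_map: np.ndarray) -> float:
    """Reduce a voxelwise tumor probability map to one detection score per scan."""
    return float(tumor_prob_map.max())

labels = np.array([0, 1] * 25)                    # 50 synthetic test scans
scores = []
for has_tumor in labels:
    prob_map = rng.random((64, 64, 32)) * 0.4     # background probabilities
    if has_tumor:
        prob_map[30:34, 30:34, 14:18] = 0.9       # small synthetic "lesion"
    scores.append(scan_level_score(prob_map))

print(f"scan-level AUROC: {roc_auc_score(labels, scores):.3f}")
```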

