Performance and limitation of machine learning algorithms for diabetic retinopathy screening: A meta-analysis (Preprint)

Author(s):  
Jo-Hsuan Wu ◽  
Tin-Yan Alvin Liu ◽  
Wan-Ting Hsu ◽  
Jennifer Hui-Chun Ho ◽  
Chien-Chang Lee
2020 ◽  

BACKGROUND Diabetic retinopathy (DR) is conventionally diagnosed by human experts, and its high prevalence warrants a more efficient screening method. Although machine learning (ML)-based automated DR diagnosis has gained attention due to the recent approval of IDx-DR, the performance of this tool has not been examined systematically, and the best ML technique for use in real-world settings has not been discussed. OBJECTIVE To systematically examine the overall diagnostic accuracy of ML in diagnosing DR of different categories based on color fundus photographs and to determine the state-of-the-art ML approach. METHODS PubMed and EMBASE were searched for published studies from inception to June 2020. Studies were screened for relevant outcomes, publication types, and data sufficiency, and a total of 60 (2.8%) out of 2128 studies were retained after study selection. Data extraction was performed by 2 authors according to PRISMA, and quality assessment was performed according to QUADAS-2. Diagnostic accuracy estimates were pooled using a bivariate random-effects model. The main outcomes included the diagnostic accuracy, sensitivity, and specificity of ML in diagnosing DR based on color fundus photographs, as well as the performance of the major types of ML algorithms. RESULTS The primary meta-analysis included 60 color fundus photograph studies (445,175 interpretations). Overall, ML demonstrated high accuracy in diagnosing DR of various categories, with pooled AUROCs ranging from 0.97 (95% CI: 0.96, 0.99) to 0.99 (95% CI: 0.98, 1.00). The performance of ML in detecting more-than-mild DR (mtmDR) was robust (Sen: 0.95; AUROC: 0.97), and subgroup analyses showed that this robust performance was not limited to benchmark datasets (Sen: 0.92; AUROC: 0.96) but generalized to images collected in clinical practice (Sen: 0.97; AUROC: 0.97). Neural networks were the most widely used method, and the subgroup analysis revealed a pooled AUROC of 0.98 (95% CI: 0.96, 0.99) for studies that used neural networks to diagnose mtmDR. CONCLUSIONS This meta-analysis demonstrated high diagnostic accuracy of ML algorithms in detecting diabetic retinopathy on color fundus photographs, suggesting that state-of-the-art, ML-based DR screening algorithms are likely ready for clinical applications. However, a significant portion of the earlier published studies had methodological flaws, such as a lack of external validation and the presence of spectrum bias. The results of these studies should be interpreted with caution.
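To make the pooling step above more concrete, the following is a minimal sketch of random-effects pooling of per-study sensitivity and specificity. It is a deliberate simplification: it pools each measure separately with a univariate DerSimonian-Laird estimator on the logit scale, whereas the abstract reports a bivariate random-effects model that also accounts for the correlation between sensitivity and specificity. The function names and the 2x2 counts are hypothetical and are not taken from the meta-analysis.

```python
import math

def inv_logit(x):
    """Map a log-odds value back to a proportion."""
    return 1.0 / (1.0 + math.exp(-x))

def pool_logit_dl(events, totals):
    """Univariate DerSimonian-Laird pooling of proportions on the logit scale.

    Returns (estimate, lower, upper) with an approximate 95% interval.
    This is NOT the bivariate model named in the abstract.
    """
    ys, vs = [], []
    for e, n in zip(events, totals):
        # A 0.5 continuity correction keeps the logit finite when a cell is 0.
        a, b = e + 0.5, (n - e) + 0.5
        ys.append(math.log(a / b))      # logit of the study proportion
        vs.append(1.0 / a + 1.0 / b)    # approximate within-study variance
    w = [1.0 / v for v in vs]
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sw
    # Cochran's Q and the method-of-moments between-study variance tau^2.
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, ys))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (len(ys) - 1)) / c) if c > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in vs]
    y_re = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return inv_logit(y_re), inv_logit(y_re - 1.96 * se), inv_logit(y_re + 1.96 * se)

# Hypothetical (TP, FP, FN, TN) counts for three studies; illustrative only.
studies = [(90, 12, 10, 188), (45, 5, 5, 145), (180, 30, 20, 470)]

# Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
sens = pool_logit_dl([tp for tp, fp, fn, tn in studies],
                     [tp + fn for tp, fp, fn, tn in studies])
spec = pool_logit_dl([tn for tp, fp, fn, tn in studies],
                     [tn + fp for tp, fp, fn, tn in studies])
print("pooled sensitivity (estimate, lower, upper):", sens)
print("pooled specificity (estimate, lower, upper):", spec)
```

Pooling on the logit scale keeps the back-transformed estimate and its interval inside (0, 1), which matters when per-study values sit close to 1, as they do here.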


2019 ◽  
Author(s):  
Sun Jae Moon ◽  
Jin Seub Hwang ◽  
Rajesh Kana ◽  
John Torous ◽  
Jung Won Kim

BACKGROUND In recent years, machine learning algorithms have been increasingly applied in biomedical fields. In particular, their application has been drawing more attention in the field of psychiatry, for instance, as diagnostic tests/tools for autism spectrum disorder. However, given their complexity and potential clinical implications, there is an ongoing need for further research on their accuracy. OBJECTIVE The current study aims to summarize the evidence for the accuracy of machine learning algorithms in diagnosing autism spectrum disorder (ASD) through systematic review and meta-analysis. METHODS The MEDLINE, Embase, CINAHL Complete (with OpenDissertations), PsycINFO, and IEEE Xplore Digital Library databases were searched on November 28th, 2018. Studies that used a machine learning algorithm partially or fully in classifying ASD from controls and provided accuracy measures were included in our analysis. A bivariate random-effects model was applied to the pooled data in the meta-analysis. Subgroup analysis was used to investigate and resolve the source of heterogeneity between studies. True-positive, false-positive, false-negative, and true-negative values from individual studies were used to calculate the pooled sensitivity and specificity values, draw SROC curves, and obtain the area under the curve (AUC) and partial AUC (pAUC). RESULTS A total of 43 studies were included in the final analysis, of which meta-analysis was performed on 40 studies (53 samples with 12,128 participants). A structural MRI subgroup meta-analysis (12 samples with 1,776 participants) showed a sensitivity of 0.83 (95% CI 0.76 to 0.89), a specificity of 0.84 (95% CI 0.74 to 0.91), and AUC/pAUC of 0.90/0.83. An fMRI/deep neural network (DNN) subgroup meta-analysis (five samples with 1,345 participants) showed a sensitivity of 0.69 (95% CI 0.62 to 0.75), a specificity of 0.66 (95% CI 0.61 to 0.70), and AUC/pAUC of 0.71/0.67. CONCLUSIONS Machine learning algorithms that used structural MRI features in the diagnosis of ASD were shown to have accuracy similar to that of currently used diagnostic tools.
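The AUC and partial AUC mentioned in the methods above can be illustrated with a short sketch. It assumes a summary curve is already available as (false-positive rate, sensitivity) points and integrates it with the trapezoidal rule. The partial AUC here is the area over a restricted false-positive-rate window divided by the window width, which is one common convention and an assumption on our part. The study's own SROC fitting procedure is not described in the abstract and is not reproduced here. The curve points and function names are hypothetical.

```python
def trapezoid_area(xs, ys):
    """Area under a piecewise-linear curve through (xs[i], ys[i])."""
    return sum((xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2.0
               for i in range(len(xs) - 1))

def interpolate(xs, ys, x):
    """Linear interpolation of the curve at x; xs must be strictly increasing."""
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            t = (x - xs[i]) / (xs[i + 1] - xs[i])
            return ys[i] + t * (ys[i + 1] - ys[i])
    raise ValueError("x lies outside the range of the curve")

def partial_auc(xs, ys, lo, hi):
    """Area over the false-positive-rate window [lo, hi], divided by its width."""
    inner = [(x, y) for x, y in zip(xs, ys) if lo < x < hi]
    px = [lo] + [x for x, _ in inner] + [hi]
    py = [interpolate(xs, ys, lo)] + [y for _, y in inner] + [interpolate(xs, ys, hi)]
    return trapezoid_area(px, py) / (hi - lo)

# Hypothetical summary curve points (false-positive rate, sensitivity).
fpr = [0.0, 0.1, 0.2, 0.4, 1.0]
tpr = [0.0, 0.6, 0.8, 0.9, 1.0]

print("AUC:", trapezoid_area(fpr, tpr))
# Restrict to an assumed window of false-positive rates seen in the studies.
print("pAUC over [0.1, 0.4]:", partial_auc(fpr, tpr, 0.1, 0.4))
```

Restricting the area to the window of false-positive rates actually observed avoids crediting the curve for regions where it is pure extrapolation, which is the usual motivation for reporting a partial AUC alongside the full AUC.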


2020 ◽  
Vol 2020 ◽  
pp. 1-11
Author(s):  
Aan Chu ◽  
David Squirrell ◽  
Andelka M. Phillips ◽  
Ehsan Vaghefi

This systematic review was performed to identify the specifics of an optimal diabetic retinopathy deep learning algorithm by identifying the best exemplar research studies in the field, whilst highlighting potential barriers to clinical implementation of such an algorithm. Searching five electronic databases (Embase, MEDLINE, Scopus, PubMed, and the Cochrane Library) returned 747 unique records on 20 December 2019. Predetermined inclusion and exclusion criteria were applied to the search results, yielding the 15 highest-quality publications. A manual search through the reference lists of relevant review articles found in the database search was conducted, yielding no additional records. A validation dataset of the trained deep learning algorithms was used for creating a set of optimal properties for an ideal diabetic retinopathy classification algorithm. Potential limitations to the clinical implementation of such systems were identified as lack of generalizability, limited screening scope, and data sovereignty issues. It is concluded that deep learning algorithms in the context of diabetic retinopathy screening have reported impressive results. Despite this, the potential sources of limitations in such systems must be evaluated carefully. An ideal deep learning algorithm should be clinic-, clinician-, and camera-agnostic; comply with local regulations for data sovereignty, storage, privacy, and reporting; and require minimal human input.


10.2196/14108 ◽  
2019 ◽  
Vol 6 (12) ◽  
pp. e14108 ◽  
Author(s):  
Sun Jae Moon ◽  
Jinseub Hwang ◽  
Rajesh Kana ◽  
John Torous ◽  
Jung Won Kim

Background In recent years, machine learning algorithms have been increasingly applied in biomedical fields. In particular, their application has been drawing more attention in the field of psychiatry, for instance, as diagnostic tests/tools for autism spectrum disorder (ASD). However, given their complexity and potential clinical implications, there is an ongoing need for further research on their accuracy. Objective This study aimed to perform a systematic review and meta-analysis to summarize the available evidence for the accuracy of machine learning algorithms in diagnosing ASD. Methods The following databases were searched on November 28, 2018: MEDLINE, EMBASE, CINAHL Complete (with Open Dissertations), PsycINFO, and Institute of Electrical and Electronics Engineers Xplore Digital Library. Studies that used a machine learning algorithm partially or fully for distinguishing individuals with ASD from control subjects and provided accuracy measures were included in our analysis. The bivariate random effects model was applied to the pooled data in a meta-analysis. A subgroup analysis was used to investigate and resolve the source of heterogeneity between studies. True-positive, false-positive, false-negative, and true-negative values from individual studies were used to calculate the pooled sensitivity and specificity values, draw Summary Receiver Operating Characteristics curves, and obtain the area under the curve (AUC) and partial AUC (pAUC). Results A total of 43 studies were included for the final analysis, of which a meta-analysis was performed on 40 studies (53 samples with 12,128 participants). A structural magnetic resonance imaging (sMRI) subgroup meta-analysis (12 samples with 1776 participants) showed a sensitivity of 0.83 (95% CI 0.76-0.89), a specificity of 0.84 (95% CI 0.74-0.91), and AUC/pAUC of 0.90/0.83. A functional magnetic resonance imaging/deep neural network subgroup meta-analysis (5 samples with 1345 participants) showed a sensitivity of 0.69 (95% CI 0.62-0.75), specificity of 0.66 (95% CI 0.61-0.70), and AUC/pAUC of 0.71/0.67. Conclusions The accuracy of machine learning algorithms for the diagnosis of ASD was considered acceptable, by a few accuracy measures, only when sMRI was used; however, given the many limitations indicated in our study, further well-designed studies are warranted to extend the potential use of machine learning algorithms to clinical settings. Trial Registration PROSPERO CRD42018117779; https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=117779


2018 ◽  
Author(s):  
Su Bin Lim ◽  
Swee Jin Tan ◽  
Wan-Teck Lim ◽  
Chwee Teck Lim

Abstract Background A massive number of transcriptome profiles exist in the form of microarray data, enabling reuse. The challenge is that they are processed with diverse platforms and preprocessing tools, requiring considerable time and informatics expertise for cross-dataset or cross-cancer analyses. A single, integrated data source consisting of thousands of samples, similar to TCGA, would facilitate data reuse for the discovery, analysis, and validation of biomarker-based clinical strategies. Findings We present 11 merged microarray-acquired datasets (MMDs) of major cancer types, curating 8,386 patient-derived tumor and tumor-free samples from 95 GEO datasets. MMD-derived patterns of genome-wide differential gene expression were highly concordant with those of matching TCGA cohorts. Using machine learning algorithms, we show that clinical models trained on all MMDs, except the breast MMD, can be directly applied to RNA-seq-acquired TCGA data with an average accuracy of 0.96 in classifying cancer. Machine learning-optimized MMDs further help to reveal the immune landscape of human cancers, which is critically needed in disease management and clinical interventions. Conclusions To facilitate large-scale meta-analysis, we generated a newly curated, unified, large-scale MMD across 11 cancer types. Besides TCGA, this single data source may serve as an excellent training or test set to apply, develop, and refine machine learning algorithms that can be used to better define the genomic landscape of human cancers.


2021 ◽  
Author(s):  
Yaltafit Abror Jeem ◽  
Refa Nabila ◽  
Dwi Ditha Emelia ◽  
Lutfan Lazuardi ◽  
Hari Kusnanto Josef

Abstract Background One strategy to address the increasing prevalence of T2DM is to identify prediabetes patients and administer interventions to them. Risk assessment tools help detect disease by enabling screening of high-risk groups. Machine learning is also used to support the diagnosis and identification of prediabetes. This review aims to determine the diagnostic test accuracy of various machine learning algorithms for calculating prediabetes risk. Methods This protocol was written in compliance with the Preferred Reporting Items for Systematic Review and Meta-Analysis for Protocols (PRISMA-P) statement. The databases that will be used include PubMed, ProQuest, and EBSCO, restricted to English-language articles published between January 1999 and May 2019. Identification of articles will be done independently by two reviewers through the titles, the abstracts, and then the full-text articles. Any disagreement will be resolved by consensus. The Newcastle-Ottawa Quality Assessment Scale will be used to assess study quality and risk of bias. Data extraction and content analysis will be performed systematically. Quantitative data will be visualized using a forest plot with 95% confidence intervals. The diagnostic test outcome will be described by the summary receiver operating characteristic curve. Data will be analyzed using the Review Manager 5.3 (RevMan 5.3) software package. Discussion We will obtain the diagnostic accuracy of various machine learning algorithms for prediabetes risk estimation using this proposed systematic review and meta-analysis. Systematic review registration: This protocol has been registered in the Prospective Registry of Systematic Review (PROSPERO) database. The registration number is CRD42021251242.
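As an illustration of the per-study estimates and 95% confidence intervals that a forest plot displays, the sketch below computes a sensitivity with a Wilson score interval for each study. The choice of the Wilson interval is an assumption made for this example; the protocol does not state which interval method will be used, and this code does not reproduce RevMan's calculations. The study labels and counts are hypothetical.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (default: about 95%)."""
    p = successes / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2.0 * n)) / denom
    half = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return centre - half, centre + half

# Hypothetical studies: (label, true positives, false negatives).
rows = [("Study A", 80, 20), ("Study B", 45, 5), ("Study C", 150, 50)]

# One text row per study, mirroring the rows of a forest plot.
for label, tp, fn in rows:
    n = tp + fn
    lower, upper = wilson_interval(tp, n)
    print(f"{label}: sensitivity {tp / n:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

Unlike the simple normal-approximation interval, the Wilson interval stays within [0, 1] and behaves sensibly for small studies or proportions near 0 or 1.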

