Fairness in Machine Learning: Against False Positive Rate Equality as a Measure of Fairness

2021 ◽  
pp. 1-30
Author(s):  
Robert Long

Abstract As machine learning informs increasingly consequential decisions, different metrics have been proposed for measuring algorithmic bias or unfairness. Two popular “fairness measures” are calibration and equality of false positive rate. Each measure seems intuitively important, but notably, it is usually impossible to satisfy both. For this reason, a large literature in machine learning speaks of a “fairness tradeoff” between these two measures. This framing assumes that both measures are, in fact, capturing something important. To date, philosophers have seldom examined this crucial assumption, or asked to what extent each measure actually tracks a normatively important property. This makes the inevitable statistical conflict between calibration and false positive rate equality an important topic for ethics. In this paper, I give an ethical framework for thinking about these measures and argue that, contrary to initial appearances, false positive rate equality is in fact morally irrelevant and does not measure fairness.
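The statistical conflict the paper builds on can be seen in a small numerical sketch (hypothetical counts, not from the paper): a risk score that is calibrated within both groups still produces unequal false positive rates whenever the groups' base rates differ.

```python
# Hypothetical two-group example: a score that is calibrated in each group
# (P(positive | score) equals the score) nevertheless yields very different
# false positive rates at a fixed threshold, because base rates differ.

# Each bin: (score, n_people, n_actual_positive). Calibration holds per bin.
group_a = [(0.8, 10, 8), (0.2, 90, 18)]   # base rate 26/100
group_b = [(0.8, 90, 72), (0.2, 10, 2)]   # base rate 74/100

def fpr(bins, threshold=0.5):
    """False positive rate: flagged negatives / all negatives."""
    false_pos = sum(n - pos for score, n, pos in bins if score >= threshold)
    negatives = sum(n - pos for _, n, pos in bins)
    return false_pos / negatives

def calibrated(bins):
    """Check P(positive | score) == score within every score bin."""
    return all(abs(pos / n - score) < 1e-9 for score, n, pos in bins)

assert calibrated(group_a) and calibrated(group_b)
print(round(fpr(group_a), 3))  # 2/74  ≈ 0.027
print(round(fpr(group_b), 3))  # 18/26 ≈ 0.692
```

Both groups see a perfectly calibrated score, yet one group's false positive rate is roughly 25 times the other's, which is exactly the tension the "fairness tradeoff" literature describes.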

2019 ◽  
Author(s):  
Rayees Rahman ◽  
Arad Kodesh ◽  
Stephen Z Levine ◽  
Sven Sandin ◽  
Abraham Reichenberg ◽  
...  

Abstract
Importance: Current approaches for early identification of individuals at high risk for autism spectrum disorder (ASD) in the general population are limited, and most ASD patients are not identified until after the age of 4. This is despite substantial evidence suggesting that early diagnosis and intervention improve developmental course and outcome.
Objective: Develop a machine learning (ML) method predicting the diagnosis of ASD in offspring in a general population sample, using parental electronic medical records (EMR) available before childbirth.
Design: Prognostic study of EMR data within a single Israeli health maintenance organization, for the parents of 1,397 ASD children (ICD-9/10) and 94,741 non-ASD children born between January 1st, 1997 and December 31st, 2008. The complete EMR record of the parents was used to develop various ML models to predict the risk of having a child with ASD.
Main outcomes and measures: Routinely available parental sociodemographic information, medical histories, and prescribed medications data until the offspring’s birth were used to generate features to train various machine learning algorithms, including multivariate logistic regression, artificial neural networks, and random forest. Prediction performance was evaluated with 10-fold cross-validation, by computing C statistics, sensitivity, specificity, accuracy, false positive rate, and precision (positive predictive value, PPV).
Results: All ML models tested had similar performance, achieving an average C statistic of 0.70, sensitivity of 28.63%, specificity of 98.62%, accuracy of 96.05%, false positive rate of 1.37%, and positive predictive value of 45.85% for predicting ASD in this dataset.
Conclusion and relevance: ML algorithms combined with EMR capture early life ASD risk. Such approaches may be able to enhance the ability for accurate and efficient early detection of ASD in large populations of children.
Key points
Question: Can autism risk in children be predicted using the pre-birth electronic medical record (EMR) of the parents?
Findings: In this population-based study that included 1,397 children with autism spectrum disorder (ASD) and 94,741 non-ASD children, we developed a machine learning classifier for predicting the likelihood of childhood diagnosis of ASD with an average C statistic of 0.70, sensitivity of 28.63%, specificity of 98.62%, accuracy of 96.05%, false positive rate of 1.37%, and positive predictive value of 45.85%.
Meaning: The results presented serve as a proof-of-principle of the potential utility of EMR for the identification of a large proportion of future children at high risk of ASD.
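All of the evaluation statistics reported above derive from a single binary confusion matrix. A minimal sketch of the computations, using hypothetical counts rather than the study's actual tallies:

```python
def binary_metrics(tp, fp, fn, tn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),            # true positive rate (recall)
        "specificity": tn / (tn + fp),            # true negative rate
        "accuracy":    (tp + tn) / (tp + fp + fn + tn),
        "fpr":         fp / (fp + tn),            # false positive rate
        "ppv":         tp / (tp + fp),            # precision
    }

# Hypothetical counts for illustration only.
m = binary_metrics(tp=40, fp=20, fn=60, tn=880)
assert abs(m["fpr"] - (1 - m["specificity"])) < 1e-12  # FPR = 1 - specificity
print({k: round(v, 3) for k, v in m.items()})
```

Note the identity FPR = 1 − specificity, which is why the paper's reported 98.62% specificity and 1.37% false positive rate are (up to rounding) two views of the same quantity.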


2012 ◽  
pp. 830-850
Author(s):  
Abhilash Alexander Miranda ◽  
Olivier Caelen ◽  
Gianluca Bontempi

This chapter presents a comprehensive scheme for automated detection of colorectal polyps in computed tomography colonography (CTC), with particular emphasis on robust learning algorithms that differentiate polyps from non-polyp shapes. The authors’ automated CTC scheme introduces two orientation-independent features which encode the shape characteristics that aid in classifying polyps and non-polyps with high accuracy, a low false positive rate, and low computational cost, making the scheme suitable for colorectal cancer screening initiatives. Experiments using state-of-the-art machine learning algorithms, viz. lazy learning, support vector machines, and naïve Bayes classifiers, reveal the robustness of the two features in detecting polyps at 100% sensitivity for polyps with diameter greater than 10 mm, while attaining low total false positive rates of 3.05, 3.47, and 0.71 per CTC dataset, respectively, at specificities above 99% when tested on 58 CTC datasets. The results were validated using colonoscopy reports provided by expert radiologists.


2020 ◽  
Vol 63 (1) ◽  
Author(s):  
Rayees Rahman ◽  
Arad Kodesh ◽  
Stephen Z. Levine ◽  
Sven Sandin ◽  
Abraham Reichenberg ◽  
...  

Abstract Background. Current approaches for early identification of individuals at high risk for autism spectrum disorder (ASD) in the general population are limited, and most ASD patients are not identified until after the age of 4. This is despite substantial evidence suggesting that early diagnosis and intervention improve developmental course and outcome. The aim of the current study was to test the ability of machine learning (ML) models applied to electronic medical records (EMRs) to predict ASD early in life, in a general population sample. Methods. We used EMR data from a single Israeli Health Maintenance Organization, including records for the parents of 1,397 ASD children (ICD-9/10) and 94,741 non-ASD children born between January 1st, 1997 and December 31st, 2008. Routinely available parental sociodemographic information, parental medical histories, and prescribed medications data were used to generate features to train various ML algorithms, including multivariate logistic regression, artificial neural networks, and random forest. Prediction performance was evaluated with 10-fold cross-validation by computing the area under the receiver operating characteristic curve (AUC; C-statistic), sensitivity, specificity, accuracy, false positive rate, and precision (positive predictive value [PPV]). Results. All ML models tested had similar performance. The average performance across all models had a C-statistic of 0.709, sensitivity of 29.93%, specificity of 98.18%, accuracy of 95.62%, false positive rate of 1.81%, and PPV of 43.35% for predicting ASD in this dataset. Conclusions. We conclude that ML algorithms combined with EMR capture early life ASD risk and reveal previously unknown features associated with ASD risk. Such approaches may be able to enhance the ability for accurate and efficient early detection of ASD in large populations of children.
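Performance here is estimated with 10-fold cross-validation on a heavily imbalanced label (about 1.5% positives). A minimal sketch of how stratified folds can be built for such data (an illustrative index-splitting routine, not the authors' code):

```python
def stratified_folds(labels, k=10):
    """Assign each sample index to one of k folds, keeping class proportions
    similar by dealing the indices of each class round-robin across folds."""
    folds = [[] for _ in range(k)]
    for cls in set(labels):
        idx = [i for i, y in enumerate(labels) if y == cls]
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds

# Toy imbalanced labels: 4 positives among 40 samples.
labels = ([1] * 4) + ([0] * 36)
folds = stratified_folds(labels, k=10)

# The folds form an exact partition of the indices...
assert sorted(i for f in folds for i in f) == list(range(40))
# ...and no fold hoards the rare positives (at most one each here).
assert all(sum(labels[i] for i in f) <= 1 for f in folds)
```

Stratification matters for rare outcomes like ASD: plain random folds could easily leave some test folds with no positive cases at all, making sensitivity and PPV undefined for that fold.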


2009 ◽  
Vol 53 (7) ◽  
pp. 2949-2954 ◽  
Author(s):  
Isabel Cuesta ◽  
Concha Bielza ◽  
Pedro Larrañaga ◽  
Manuel Cuenca-Estrella ◽  
Fernando Laguna ◽  
...  

ABSTRACT European Committee on Antimicrobial Susceptibility Testing (EUCAST) breakpoints classify Candida strains with a fluconazole MIC ≤ 2 mg/liter as susceptible, those with a fluconazole MIC of 4 mg/liter as representing intermediate susceptibility, and those with a fluconazole MIC > 4 mg/liter as resistant. Machine learning models are supported by statistical analyses that assess whether their results are statistically meaningful. The aim of this work was to use supervised classification algorithms to analyze the clinical data used to produce EUCAST fluconazole breakpoints. Five supervised classifiers (J48, Correlation and Regression Trees [CART], OneR, Naïve Bayes, and Simple Logistic) were used to analyze two cohorts of patients with oropharyngeal candidosis and candidemia. The target variable was the outcome of the infections, and the predictor variables consisted of values for the MIC or the proportion between the dose administered and the MIC of the isolate (dose/MIC). Statistical power was assessed by determining values for sensitivity and specificity, the false-positive rate, the area under the receiver operating characteristic (ROC) curve, and the Matthews correlation coefficient (MCC). CART obtained the best statistical power for a MIC > 4 mg/liter for detecting failures (sensitivity, 87%; false-positive rate, 8%; area under the ROC curve, 0.89; MCC index, 0.80). For dose/MIC determinations, the target was >75, with a sensitivity of 91%, a false-positive rate of 10%, an area under the ROC curve of 0.90, and an MCC index of 0.80. Other classifiers gave similar breakpoints with lower statistical power. EUCAST fluconazole breakpoints have been validated by means of machine learning methods. These computer tools must be incorporated in the process for developing breakpoints to avoid researcher bias, thus enhancing the statistical power of the model.
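The Matthews correlation coefficient used above summarizes all four confusion-matrix cells in a single value. A short sketch of the computation with hypothetical counts (the abstract reports the resulting statistics, not the raw counts behind them):

```python
from math import sqrt

def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient: +1 perfect, 0 chance-level, -1 inverted."""
    num = tp * tn - fp * fn
    den = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den if den else 0.0  # convention: 0 when any margin is empty

assert mcc(tp=10, fp=0, fn=0, tn=10) == 1.0    # perfect classifier
assert mcc(tp=0, fp=10, fn=10, tn=0) == -1.0   # perfectly inverted classifier
print(round(mcc(tp=87, fp=8, fn=13, tn=92), 2))  # ≈ 0.79, strong agreement
```

Unlike accuracy, MCC stays near zero for a classifier that merely predicts the majority class, which makes it a sensible choice for the imbalanced outcome data used in breakpoint validation.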


2021 ◽  
Vol 42 (Supplement_1) ◽  
Author(s):  
A Rosier ◽  
E Crespin ◽  
A Lazarus ◽  
G Laurent ◽  
A Menet ◽  
...  

Abstract Background: Implantable Loop Recorders (ILRs) are increasingly used and generate a high workload for timely adjudication of ECG recordings. In particular, the excessive false positive rate leads to a significant review burden. Purpose: A novel machine learning algorithm was developed to reclassify ILR episodes in order to decrease the false positive rate by 80% while maintaining 99% sensitivity. This study aims to evaluate the impact of this algorithm in reducing the number of abnormal episodes reported by Medtronic ILRs. Methods: Among 20 European centers, all Medtronic ILR patients were enrolled during the 2nd semester of 2020. Using a remote monitoring platform, every ILR-transmitted episode was collected and anonymised. For every ILR-detected episode with a transmitted ECG, the new algorithm reclassified it, applying the same labels as the ILR (asystole, brady, AT/AF, VT, artifact, normal). We measured the number of episodes identified as false positive and reclassified as normal by the algorithm, and their proportion among all episodes. Results: In 370 patients, ILRs recorded 3755 episodes, including 305 patient-triggered and 629 with no ECG transmitted. 2821 episodes were analyzed by the novel algorithm, which reclassified 1227 episodes as normal rhythm. These reclassified episodes accounted for 43% of analyzed episodes and 32.6% of all episodes recorded. Conclusion: A novel machine learning algorithm significantly reduces the quantity of episodes flagged as abnormal and typically reviewed by healthcare professionals. Funding acknowledgement: None. Figure 1: ILR episodes analysis.
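The reported workload-reduction proportions follow directly from the episode counts in the abstract; the percentages below are recomputed from those counts and may differ from the abstract's figures by rounding.

```python
recorded = 3755      # all episodes recorded by the ILRs
analyzed = 2821      # episodes with a transmitted ECG, seen by the algorithm
reclassified = 1227  # episodes the algorithm relabelled as normal rhythm

share_of_analyzed = reclassified / analyzed  # abstract reports 43%
share_of_recorded = reclassified / recorded  # abstract reports 32.6%

print(f"{share_of_analyzed:.1%} of analyzed episodes")  # ≈ 43.5%
print(f"{share_of_recorded:.1%} of recorded episodes")  # ≈ 32.7%
```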




Author(s):  
Prasanna Kannappan ◽  
Herbert G. Tanner ◽  
Arthur C. Trembanis ◽  
Justin H. Walker

A large volume of image data, on the order of thousands to millions of images, can be generated by robotic marine surveys aimed at assessing organism populations. Manual processing and annotation of individual images in such large datasets is not an attractive option. It would seem that computer vision and machine learning techniques can be used to automate this process, yet to date, available automated detection and counting tools for scallops do not work well with noisy low-resolution images and are bound to produce very high false positive rates. In this chapter, we hone a recently developed method for automated scallop detection and counting for the purpose of drastically reducing its false positive rate. In the process, we compare the performance of two customized false-positive filtering alternatives: histogram of gradients and weighted correlation template matching.
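The weighted correlation template matching used as a false-positive filter can be sketched as follows; the template, candidate patches, and weights below are invented for illustration and are not the chapter's implementation.

```python
from math import sqrt

def weighted_corr(template, patch, weights):
    """Weighted Pearson correlation between a template and a candidate patch.
    A high score suggests a true detection; a low score suggests a false positive."""
    wsum = sum(weights)
    mt = sum(w * t for w, t in zip(weights, template)) / wsum
    mp = sum(w * p for w, p in zip(weights, patch)) / wsum
    cov = sum(w * (t - mt) * (p - mp) for w, t, p in zip(weights, template, patch))
    vt = sum(w * (t - mt) ** 2 for w, t in zip(weights, template))
    vp = sum(w * (p - mp) ** 2 for w, p in zip(weights, patch))
    return cov / sqrt(vt * vp)

template = [0.1, 0.9, 0.9, 0.1]  # idealized bright-center intensity profile
weights  = [0.5, 1.0, 1.0, 0.5]  # emphasize the template's center pixels
hit      = [0.2, 1.0, 0.8, 0.1]  # candidate resembling the template
clutter  = [0.9, 0.1, 0.2, 0.8]  # candidate that should be filtered out

assert weighted_corr(template, hit, weights) > 0.9
assert weighted_corr(template, clutter, weights) < 0.0
```

A detector's raw candidates can then be kept only when their correlation with the template exceeds a tuned threshold, which is the filtering role the chapter assigns to this technique.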


Web usage and digitized information grow every day, and the amount of data generated grows with them. At the same time, security attacks pose many threats to networks, websites, and the Internet. Intrusion detection in a high-speed network is a genuinely hard task. A Hadoop implementation is used to address this challenge, namely detecting intrusions in a big-data environment in real time. To classify anomalous packet flows, machine learning approaches are used. Naïve Bayes performs classification over a vector of feature values drawn from some finite set. The decision tree is another supervised machine learning classifier, with a flowchart-like tree structure. The J48 and Naïve Bayes algorithms are implemented in the Hadoop MapReduce framework for parallel processing, using the corrected KDD Cup benchmark dataset records. The results obtained are a 89.9% true positive rate and 0.04% false positive rate for the Naïve Bayes algorithm, and a 98.06% true positive rate and 0.001% false positive rate for the decision tree algorithm.
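A minimal categorical Naïve Bayes classifier of the kind described can be sketched as follows; the packet features and labels are a toy invention, not the KDD Cup pipeline.

```python
from collections import Counter, defaultdict
from math import log

def train_nb(rows, labels, alpha=1.0):
    """Fit per-class feature-value counts with Laplace (add-alpha) smoothing."""
    classes = Counter(labels)
    counts = defaultdict(Counter)  # (class, feature_index) -> value counts
    for row, y in zip(rows, labels):
        for i, v in enumerate(row):
            counts[(y, i)][v] += 1
    return classes, counts, alpha

def predict_nb(model, row):
    """Pick the class maximizing log P(class) + sum of log P(value | class)."""
    classes, counts, alpha = model
    total = sum(classes.values())
    def score(y):
        s = log(classes[y] / total)
        for i, v in enumerate(row):
            vals = counts[(y, i)]
            # Smoothed estimate; vocabulary size approximated by seen values + 1.
            s += log((vals[v] + alpha) / (sum(vals.values()) + alpha * (len(vals) + 1)))
        return s
    return max(classes, key=score)

# Toy packet flows: (protocol, flag) -> normal / attack.
rows = [("tcp", "SYN"), ("tcp", "ACK"), ("udp", "ACK"),
        ("tcp", "SYN"), ("icmp", "SYN"), ("udp", "ACK")]
labels = ["attack", "normal", "normal", "attack", "attack", "normal"]
model = train_nb(rows, labels)
print(predict_nb(model, ("tcp", "SYN")))  # "attack"
print(predict_nb(model, ("udp", "ACK")))  # "normal"
```

Because both training and prediction decompose into independent per-record counting and scoring, this style of classifier maps naturally onto the MapReduce pattern the abstract describes.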


2021 ◽  
Vol 7 ◽  
pp. e640
Author(s):  
Saif Al-mashhadi ◽  
Mohammed Anbar ◽  
Iznan Hasbullah ◽  
Taief Alaa Alamiedy

Botnets can simultaneously control millions of Internet-connected devices to launch damaging cyber-attacks that pose significant threats to the Internet. In a botnet, bot-masters communicate with the command and control server using various communication protocols. One of the widely used communication protocols is the ‘Domain Name System’ (DNS) service, an essential Internet service. Bot-masters utilise Domain Generation Algorithms (DGA) and fast-flux techniques to avoid static blacklists and reverse engineering while remaining flexible. However, a botnet’s DNS communication generates anomalous DNS traffic throughout the botnet life cycle, and such an anomaly is considered an indicator of the presence of DNS-based botnets in the network. Although several approaches have been proposed to detect botnets through DNS traffic analysis, the problem remains challenging for several reasons, such as existing approaches not considering significant features and rules that contribute to the detection of DNS-based botnets. Therefore, this paper examines the abnormality of DNS traffic during the botnet lifecycle to extract significant enriched features. These features are further analysed using two machine learning algorithms, and the union of the two algorithms’ outputs forms a novel hybrid rule-based detection model. Two benchmark datasets are used to evaluate the performance of the proposed approach in terms of detection accuracy and false-positive rate. The experimental results show that the proposed approach has a 99.96% accuracy and a 1.6% false-positive rate, outperforming other state-of-the-art DNS-based botnet detection approaches.
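The hybrid model's core idea, taking the union of two detectors' rule outputs, can be sketched as follows. The entropy and length rules here are illustrative stand-ins for DGA-style indicators, not the paper's learned rules.

```python
from collections import Counter
from math import log2

def entropy(s):
    """Shannon entropy of a string's character distribution (bits per char);
    DGA-generated domain labels tend to score higher than dictionary words."""
    n = len(s)
    return -sum(c / n * log2(c / n) for c in Counter(s).values())

# Two illustrative rule sets, as if produced by two different algorithms.
rule_set_a = [lambda d: entropy(d.split(".")[0]) > 3.5]  # random-looking label
rule_set_b = [lambda d: len(d.split(".")[0]) > 20]       # unusually long label

def hybrid_detect(domain, rule_sets):
    """Flag the domain if ANY rule in ANY set fires (union of the outputs)."""
    return any(rule(domain) for rules in rule_sets for rule in rules)

assert hybrid_detect("xj9f2kq8vw3zr1mt7ybp4ls6.com", [rule_set_a, rule_set_b])
assert not hybrid_detect("example.com", [rule_set_a, rule_set_b])
```

Taking the union trades a slightly higher false-positive rate for better coverage: a domain missed by one algorithm's rules can still be caught by the other's.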

