Fairness in Machine Learning: Against False Positive Rate Equality as a Measure of Fairness

2021 ◽  
pp. 1-30
Author(s):  
Robert Long

Abstract As machine learning informs increasingly consequential decisions, different metrics have been proposed for measuring algorithmic bias or unfairness. Two popular “fairness measures” are calibration and equality of false positive rate. Each measure seems intuitively important, but notably, it is usually impossible to satisfy both. For this reason, a large literature in machine learning speaks of a “fairness tradeoff” between these two measures. This framing assumes that both measures are, in fact, capturing something important. To date, philosophers have seldom examined this crucial assumption, or asked to what extent each measure actually tracks a normatively important property. This makes the inevitable statistical conflict between calibration and false positive rate equality an important topic for ethics. In this paper, I give an ethical framework for thinking about these measures and argue that, contrary to initial appearances, false positive rate equality is in fact morally irrelevant and does not measure fairness.
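The statistical conflict the paper builds on can be seen in a small numerical sketch (hypothetical counts, not from the paper): a risk score that is calibrated within both groups still produces unequal false positive rates whenever the groups' base rates differ.

```python
# Hypothetical two-group example: a score that is calibrated in each group
# (P(positive | score) equals the score) nevertheless yields very different
# false positive rates at a fixed threshold, because base rates differ.

# Each bin: (score, n_people, n_actual_positive). Calibration holds per bin.
group_a = [(0.8, 10, 8), (0.2, 90, 18)]   # base rate 26/100
group_b = [(0.8, 90, 72), (0.2, 10, 2)]   # base rate 74/100

def fpr(bins, threshold=0.5):
    """False positive rate: flagged negatives / all negatives."""
    false_pos = sum(n - pos for score, n, pos in bins if score >= threshold)
    negatives = sum(n - pos for _, n, pos in bins)
    return false_pos / negatives

def calibrated(bins):
    """Check P(positive | score) == score within every score bin."""
    return all(abs(pos / n - score) < 1e-9 for score, n, pos in bins)

assert calibrated(group_a) and calibrated(group_b)
print(round(fpr(group_a), 3))  # 2/74  ≈ 0.027
print(round(fpr(group_b), 3))  # 18/26 ≈ 0.692
```

Both groups see a perfectly calibrated score, yet one group's false positive rate is roughly 25 times the other's, which is exactly the tension the "fairness tradeoff" literature describes.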

2019 ◽  
Author(s):  
Rayees Rahman ◽  
Arad Kodesh ◽  
Stephen Z Levine ◽  
Sven Sandin ◽  
Abraham Reichenberg ◽  
...  

Abstract
Importance: Current approaches for early identification of individuals at high risk for autism spectrum disorder (ASD) in the general population are limited, and most ASD patients are not identified until after the age of 4. This is despite substantial evidence suggesting that early diagnosis and intervention improve developmental course and outcome.
Objective: Develop a machine learning (ML) method predicting the diagnosis of ASD in offspring in a general population sample, using parental electronic medical records (EMR) available before childbirth.
Design: Prognostic study of EMR data within a single Israeli health maintenance organization, for the parents of 1,397 ASD children (ICD-9/10) and 94,741 non-ASD children born between January 1st, 1997 and December 31st, 2008. The complete EMR record of the parents was used to develop various ML models to predict the risk of having a child with ASD.
Main outcomes and measures: Routinely available parental sociodemographic information, medical histories, and prescribed medications data until the offspring’s birth were used to generate features to train various machine learning algorithms, including multivariate logistic regression, artificial neural networks, and random forest. Prediction performance was evaluated with 10-fold cross-validation, by computing C statistics, sensitivity, specificity, accuracy, false positive rate, and precision (positive predictive value, PPV).
Results: All ML models tested had similar performance, achieving an average C statistic of 0.70, sensitivity of 28.63%, specificity of 98.62%, accuracy of 96.05%, false positive rate of 1.37%, and positive predictive value of 45.85% for predicting ASD in this dataset.
Conclusion and relevance: ML algorithms combined with EMR capture early life ASD risk. Such approaches may be able to enhance the ability for accurate and efficient early detection of ASD in large populations of children.
Key points
Question: Can autism risk in children be predicted using the pre-birth electronic medical record (EMR) of the parents?
Findings: In this population-based study that included 1,397 children with autism spectrum disorder (ASD) and 94,741 non-ASD children, we developed a machine learning classifier for predicting the likelihood of childhood diagnosis of ASD with an average C statistic of 0.70, sensitivity of 28.63%, specificity of 98.62%, accuracy of 96.05%, false positive rate of 1.37%, and positive predictive value of 45.85%.
Meaning: The results presented serve as a proof-of-principle of the potential utility of EMR for the identification of a large proportion of future children at high risk of ASD.
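All of the evaluation statistics reported above derive from a single binary confusion matrix. A minimal sketch of the computations, using hypothetical counts rather than the study's actual tallies:

```python
def binary_metrics(tp, fp, fn, tn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),            # true positive rate (recall)
        "specificity": tn / (tn + fp),            # true negative rate
        "accuracy":    (tp + tn) / (tp + fp + fn + tn),
        "fpr":         fp / (fp + tn),            # false positive rate
        "ppv":         tp / (tp + fp),            # precision
    }

# Hypothetical counts for illustration only.
m = binary_metrics(tp=40, fp=20, fn=60, tn=880)
assert abs(m["fpr"] - (1 - m["specificity"])) < 1e-12  # FPR = 1 - specificity
print({k: round(v, 3) for k, v in m.items()})
```

Note the identity FPR = 1 − specificity, which is why the paper's reported 98.62% specificity and 1.37% false positive rate are (up to rounding) two views of the same quantity.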


2012 ◽  
pp. 830-850
Author(s):  
Abhilash Alexander Miranda ◽  
Olivier Caelen ◽  
Gianluca Bontempi

This chapter presents a comprehensive scheme for automated detection of colorectal polyps in computed tomography colonography (CTC), with particular emphasis on robust learning algorithms that differentiate polyps from non-polyp shapes. The authors’ automated CTC scheme introduces two orientation-independent features which encode the shape characteristics that aid in classifying polyps and non-polyps with high accuracy, a low false positive rate, and low computational cost, making the scheme suitable for colorectal cancer screening initiatives. Experiments using state-of-the-art machine learning algorithms, viz. lazy learning, support vector machines, and naïve Bayes classifiers, reveal the robustness of the two features in detecting polyps at 100% sensitivity for polyps with diameter greater than 10 mm, while attaining low total false positive rates of 3.05, 3.47, and 0.71 per CTC dataset, respectively, at specificities above 99% when tested on 58 CTC datasets. The results were validated using colonoscopy reports provided by expert radiologists.


2020 ◽  
Vol 63 (1) ◽  
Author(s):  
Rayees Rahman ◽  
Arad Kodesh ◽  
Stephen Z. Levine ◽  
Sven Sandin ◽  
Abraham Reichenberg ◽  
...  

Abstract Background. Current approaches for early identification of individuals at high risk for autism spectrum disorder (ASD) in the general population are limited, and most ASD patients are not identified until after the age of 4. This is despite substantial evidence suggesting that early diagnosis and intervention improve developmental course and outcome. The aim of the current study was to test the ability of machine learning (ML) models applied to electronic medical records (EMRs) to predict ASD early in life, in a general population sample. Methods. We used EMR data from a single Israeli Health Maintenance Organization, including records for the parents of 1,397 ASD children (ICD-9/10) and 94,741 non-ASD children born between January 1st, 1997 and December 31st, 2008. Routinely available parental sociodemographic information, parental medical histories, and prescribed medications data were used to generate features to train various ML algorithms, including multivariate logistic regression, artificial neural networks, and random forest. Prediction performance was evaluated with 10-fold cross-validation by computing the area under the receiver operating characteristic curve (AUC; C-statistic), sensitivity, specificity, accuracy, false positive rate, and precision (positive predictive value [PPV]). Results. All ML models tested had similar performance. The average performance across all models had a C-statistic of 0.709, sensitivity of 29.93%, specificity of 98.18%, accuracy of 95.62%, false positive rate of 1.81%, and PPV of 43.35% for predicting ASD in this dataset. Conclusions. We conclude that ML algorithms combined with EMR capture early life ASD risk and reveal previously unknown features associated with ASD risk. Such approaches may be able to enhance the ability for accurate and efficient early detection of ASD in large populations of children.
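Performance here is estimated with 10-fold cross-validation on a heavily imbalanced label (about 1.5% positives). A minimal sketch of how stratified folds can be built for such data (an illustrative index-splitting routine, not the authors' code):

```python
def stratified_folds(labels, k=10):
    """Assign each sample index to one of k folds, keeping class proportions
    similar by dealing the indices of each class round-robin across folds."""
    folds = [[] for _ in range(k)]
    for cls in set(labels):
        idx = [i for i, y in enumerate(labels) if y == cls]
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds

# Toy imbalanced labels: 4 positives among 40 samples.
labels = ([1] * 4) + ([0] * 36)
folds = stratified_folds(labels, k=10)

# The folds form an exact partition of the indices...
assert sorted(i for f in folds for i in f) == list(range(40))
# ...and no fold hoards the rare positives (at most one each here).
assert all(sum(labels[i] for i in f) <= 1 for f in folds)
```

Stratification matters for rare outcomes like ASD: plain random folds could easily leave some test folds with no positive cases at all, making sensitivity and PPV undefined for that fold.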


2009 ◽  
Vol 53 (7) ◽  
pp. 2949-2954 ◽  
Author(s):  
Isabel Cuesta ◽  
Concha Bielza ◽  
Pedro Larrañaga ◽  
Manuel Cuenca-Estrella ◽  
Fernando Laguna ◽  
...  

ABSTRACT European Committee on Antimicrobial Susceptibility Testing (EUCAST) breakpoints classify Candida strains with a fluconazole MIC ≤ 2 mg/liter as susceptible, those with a fluconazole MIC of 4 mg/liter as representing intermediate susceptibility, and those with a fluconazole MIC > 4 mg/liter as resistant. Machine learning models are supported by statistical analyses that assess whether their results are statistically meaningful. The aim of this work was to use supervised classification algorithms to analyze the clinical data used to produce EUCAST fluconazole breakpoints. Five supervised classifiers (J48, Correlation and Regression Trees [CART], OneR, Naïve Bayes, and Simple Logistic) were used to analyze two cohorts of patients with oropharyngeal candidosis and candidemia. The target variable was the outcome of the infections, and the predictor variables consisted of values for the MIC or the proportion between the dose administered and the MIC of the isolate (dose/MIC). Statistical power was assessed by determining values for sensitivity and specificity, the false-positive rate, the area under the receiver operating characteristic (ROC) curve, and the Matthews correlation coefficient (MCC). CART obtained the best statistical power for a MIC > 4 mg/liter for detecting failures (sensitivity, 87%; false-positive rate, 8%; area under the ROC curve, 0.89; MCC index, 0.80). For dose/MIC determinations, the target was >75, with a sensitivity of 91%, a false-positive rate of 10%, an area under the ROC curve of 0.90, and an MCC index of 0.80. Other classifiers gave similar breakpoints with lower statistical power. EUCAST fluconazole breakpoints have been validated by means of machine learning methods. These computer tools must be incorporated in the process for developing breakpoints to avoid researcher bias, thus enhancing the statistical power of the model.
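The Matthews correlation coefficient used above summarizes all four confusion-matrix cells in a single value. A short sketch of the computation with hypothetical counts (the abstract reports the resulting statistics, not the raw counts behind them):

```python
from math import sqrt

def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient: +1 perfect, 0 chance-level, -1 inverted."""
    num = tp * tn - fp * fn
    den = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den if den else 0.0  # convention: 0 when any margin is empty

assert mcc(tp=10, fp=0, fn=0, tn=10) == 1.0    # perfect classifier
assert mcc(tp=0, fp=10, fn=10, tn=0) == -1.0   # perfectly inverted classifier
print(round(mcc(tp=87, fp=8, fn=13, tn=92), 2))  # ≈ 0.79, strong agreement
```

Unlike accuracy, MCC stays near zero for a classifier that merely predicts the majority class, which makes it a sensible choice for the imbalanced outcome data used in breakpoint validation.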


2021 ◽  
Vol 42 (Supplement_1) ◽  
Author(s):  
A Rosier ◽  
E Crespin ◽  
A Lazarus ◽  
G Laurent ◽  
A Menet ◽  
...  

Abstract Background: Implantable Loop Recorders (ILRs) are increasingly used and generate a high workload for timely adjudication of ECG recordings. In particular, the excessive false positive rate leads to a significant review burden. Purpose: A novel machine learning algorithm was developed to reclassify ILR episodes in order to decrease the false positive rate by 80% while maintaining 99% sensitivity. This study aims to evaluate the impact of this algorithm in reducing the number of abnormal episodes reported by Medtronic ILRs. Methods: Among 20 European centers, all Medtronic ILR patients were enrolled during the 2nd semester of 2020. Using a remote monitoring platform, every ILR-transmitted episode was collected and anonymised. For every ILR-detected episode with a transmitted ECG, the new algorithm reclassified it, applying the same labels as the ILR (asystole, brady, AT/AF, VT, artifact, normal). We measured the number of episodes identified as false positive and reclassified as normal by the algorithm, and their proportion among all episodes. Results: In 370 patients, ILRs recorded 3755 episodes, including 305 patient-triggered and 629 with no ECG transmitted. 2821 episodes were analyzed by the novel algorithm, which reclassified 1227 episodes as normal rhythm. These reclassified episodes accounted for 43% of analyzed episodes and 32.6% of all episodes recorded. Conclusion: A novel machine learning algorithm significantly reduces the quantity of episodes flagged as abnormal and typically reviewed by healthcare professionals. Funding acknowledgement: None. Figure 1: ILR episodes analysis.
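The reported workload-reduction proportions follow directly from the episode counts in the abstract; the percentages below are recomputed from those counts and may differ from the abstract's figures by rounding.

```python
recorded = 3755      # all episodes recorded by the ILRs
analyzed = 2821      # episodes with a transmitted ECG, seen by the algorithm
reclassified = 1227  # episodes the algorithm relabelled as normal rhythm

share_of_analyzed = reclassified / analyzed  # abstract reports 43%
share_of_recorded = reclassified / recorded  # abstract reports 32.6%

print(f"{share_of_analyzed:.1%} of analyzed episodes")  # ≈ 43.5%
print(f"{share_of_recorded:.1%} of recorded episodes")  # ≈ 32.7%
```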




Author(s):  
Prasanna Kannappan ◽  
Herbert G. Tanner ◽  
Arthur C. Trembanis ◽  
Justin H. Walker

A large volume of image data, on the order of thousands to millions of images, can be generated by robotic marine surveys aimed at assessing organism populations. Manual processing and annotation of individual images in such large datasets is not an attractive option. It would seem that computer vision and machine learning techniques can be used to automate this process, yet to date, available automated detection and counting tools for scallops do not work well with noisy low-resolution images and are bound to produce very high false positive rates. In this chapter, we hone a recently developed method for automated scallop detection and counting for the purpose of drastically reducing its false positive rate. In the process, we compare the performance of two customized false-positive filtering alternatives: histogram of gradients and weighted correlation template matching.
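The weighted correlation template matching used as a false-positive filter can be sketched as follows; the template, candidate patches, and weights below are invented for illustration and are not the chapter's implementation.

```python
from math import sqrt

def weighted_corr(template, patch, weights):
    """Weighted Pearson correlation between a template and a candidate patch.
    A high score suggests a true detection; a low score suggests a false positive."""
    wsum = sum(weights)
    mt = sum(w * t for w, t in zip(weights, template)) / wsum
    mp = sum(w * p for w, p in zip(weights, patch)) / wsum
    cov = sum(w * (t - mt) * (p - mp) for w, t, p in zip(weights, template, patch))
    vt = sum(w * (t - mt) ** 2 for w, t in zip(weights, template))
    vp = sum(w * (p - mp) ** 2 for w, p in zip(weights, patch))
    return cov / sqrt(vt * vp)

template = [0.1, 0.9, 0.9, 0.1]  # idealized bright-center intensity profile
weights  = [0.5, 1.0, 1.0, 0.5]  # emphasize the template's center pixels
hit      = [0.2, 1.0, 0.8, 0.1]  # candidate resembling the template
clutter  = [0.9, 0.1, 0.2, 0.8]  # candidate that should be filtered out

assert weighted_corr(template, hit, weights) > 0.9
assert weighted_corr(template, clutter, weights) < 0.0
```

A detector's raw candidates can then be kept only when their correlation with the template exceeds a tuned threshold, which is the filtering role the chapter assigns to this technique.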


Web usage and digitized information grow every day, and the amount of data generated grows with them. At the same time, security attacks pose many threats to networks, websites, and the Internet. Intrusion detection in a high-speed network is a genuinely hard task. A Hadoop implementation is used to address this challenge, namely detecting intrusions in a big-data environment in real time. To classify anomalous packet flows, machine learning approaches are used. Naïve Bayes performs classification over a vector of feature values drawn from some finite set. The decision tree is another supervised machine learning classifier, with a flowchart-like tree structure. The J48 and Naïve Bayes algorithms are implemented in the Hadoop MapReduce framework for parallel processing, using the corrected KDD Cup benchmark dataset records. The results obtained are a 89.9% true positive rate and 0.04% false positive rate for the Naïve Bayes algorithm, and a 98.06% true positive rate and 0.001% false positive rate for the decision tree algorithm.
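A minimal categorical Naïve Bayes classifier of the kind described can be sketched as follows; the packet features and labels are a toy invention, not the KDD Cup pipeline.

```python
from collections import Counter, defaultdict
from math import log

def train_nb(rows, labels, alpha=1.0):
    """Fit per-class feature-value counts with Laplace (add-alpha) smoothing."""
    classes = Counter(labels)
    counts = defaultdict(Counter)  # (class, feature_index) -> value counts
    for row, y in zip(rows, labels):
        for i, v in enumerate(row):
            counts[(y, i)][v] += 1
    return classes, counts, alpha

def predict_nb(model, row):
    """Pick the class maximizing log P(class) + sum of log P(value | class)."""
    classes, counts, alpha = model
    total = sum(classes.values())
    def score(y):
        s = log(classes[y] / total)
        for i, v in enumerate(row):
            vals = counts[(y, i)]
            # Smoothed estimate; vocabulary size approximated by seen values + 1.
            s += log((vals[v] + alpha) / (sum(vals.values()) + alpha * (len(vals) + 1)))
        return s
    return max(classes, key=score)

# Toy packet flows: (protocol, flag) -> normal / attack.
rows = [("tcp", "SYN"), ("tcp", "ACK"), ("udp", "ACK"),
        ("tcp", "SYN"), ("icmp", "SYN"), ("udp", "ACK")]
labels = ["attack", "normal", "normal", "attack", "attack", "normal"]
model = train_nb(rows, labels)
print(predict_nb(model, ("tcp", "SYN")))  # "attack"
print(predict_nb(model, ("udp", "ACK")))  # "normal"
```

Because both training and prediction decompose into independent per-record counting and scoring, this style of classifier maps naturally onto the MapReduce pattern the abstract describes.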


2021 ◽  
Vol 7 ◽  
pp. e640
Author(s):  
Saif Al-mashhadi ◽  
Mohammed Anbar ◽  
Iznan Hasbullah ◽  
Taief Alaa Alamiedy

Botnets can simultaneously control millions of Internet-connected devices to launch damaging cyber-attacks that pose significant threats to the Internet. In a botnet, bot-masters communicate with the command and control server using various communication protocols. One of the widely used communication protocols is the ‘Domain Name System’ (DNS) service, an essential Internet service. Bot-masters utilise Domain Generation Algorithms (DGA) and fast-flux techniques to avoid static blacklists and reverse engineering while remaining flexible. However, a botnet’s DNS communication generates anomalous DNS traffic throughout the botnet life cycle, and such an anomaly is considered an indicator of the presence of DNS-based botnets in the network. Although several approaches have been proposed to detect botnets through DNS traffic analysis, the problem remains challenging for several reasons, such as existing approaches not considering significant features and rules that contribute to the detection of DNS-based botnets. Therefore, this paper examines the abnormality of DNS traffic during the botnet lifecycle to extract significant enriched features. These features are further analysed using two machine learning algorithms, and the union of the two algorithms’ outputs forms a novel hybrid rule-based detection model. Two benchmark datasets are used to evaluate the performance of the proposed approach in terms of detection accuracy and false-positive rate. The experimental results show that the proposed approach has a 99.96% accuracy and a 1.6% false-positive rate, outperforming other state-of-the-art DNS-based botnet detection approaches.
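The hybrid model's core idea, taking the union of two detectors' rule outputs, can be sketched as follows. The entropy and length rules here are illustrative stand-ins for DGA-style indicators, not the paper's learned rules.

```python
from collections import Counter
from math import log2

def entropy(s):
    """Shannon entropy of a string's character distribution (bits per char);
    DGA-generated domain labels tend to score higher than dictionary words."""
    n = len(s)
    return -sum(c / n * log2(c / n) for c in Counter(s).values())

# Two illustrative rule sets, as if produced by two different algorithms.
rule_set_a = [lambda d: entropy(d.split(".")[0]) > 3.5]  # random-looking label
rule_set_b = [lambda d: len(d.split(".")[0]) > 20]       # unusually long label

def hybrid_detect(domain, rule_sets):
    """Flag the domain if ANY rule in ANY set fires (union of the outputs)."""
    return any(rule(domain) for rules in rule_sets for rule in rules)

assert hybrid_detect("xj9f2kq8vw3zr1mt7ybp4ls6.com", [rule_set_a, rule_set_b])
assert not hybrid_detect("example.com", [rule_set_a, rule_set_b])
```

Taking the union trades a slightly higher false-positive rate for better coverage: a domain missed by one algorithm's rules can still be caught by the other's.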

