Reducing Pseudo-error Rate of Industrial Machine Vision Systems with Machine Learning Methods

Balázs Szűcs; Áron Ballagi

doi:10.14513/actatechjaur.v12.n4.511

Machine Learning for Automated Polyp Detection in Computed Tomography Colonography

Machine Learning ◽

10.4018/978-1-60960-818-7.ch407 ◽

2012 ◽

pp. 830-850

Author(s):

Abhilash Alexander Miranda ◽

Olivier Caelen ◽

Gianluca Bontempi

Keyword(s):

Machine Learning ◽

Computed Tomography ◽

False Positive ◽

False Positive Rate ◽

Learning Algorithms ◽

Colorectal Polyps ◽

Machine Learning Algorithms ◽

Computed Tomography Colonography ◽

Positive Rate ◽

Independent Features

This chapter presents a comprehensive scheme for automated detection of colorectal polyps in computed tomography colonography (CTC), with particular emphasis on robust learning algorithms that differentiate polyps from non-polyp shapes. The authors’ automated CTC scheme introduces two orientation-independent features that encode the shape characteristics that aid in classifying polyps and non-polyps with high accuracy, a low false positive rate, and low computational cost, making the scheme suitable for colorectal cancer screening initiatives. Experiments using state-of-the-art machine learning algorithms, viz. lazy learning, support vector machines, and naïve Bayes classifiers, reveal the robustness of the two features in detecting polyps at 100% sensitivity for polyps with diameter greater than 10 mm while attaining low total false positive rates of 3.05, 3.47 and 0.71 per CTC dataset, respectively, at specificities above 99% when tested on 58 CTC datasets. The results were validated using colonoscopy reports provided by expert radiologists.


Early Weed Detection Using Image Processing and Machine Learning Techniques in an Australian Chilli Farm

Agriculture ◽

10.3390/agriculture11050387 ◽

2021 ◽

Vol 11 (5) ◽

pp. 387

Author(s):

Nahina Islam ◽

Md Mamunur Rashid ◽

Santoso Wibowo ◽

Cheng-Yuan Xu ◽

Ahsan Morshed ◽

...

Keyword(s):

Machine Learning ◽

False Positive Rate ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Support Vector ◽

Weed Detection ◽

Learning Techniques ◽

Positive Rate ◽

Uav Images

This paper explores the potential of machine learning algorithms for weed and crop classification from UAV images. The identification of weeds in crops is a challenging task that has been addressed through orthomosaicing of images, feature extraction and labelling of images to train machine learning algorithms. In this paper, the performance of several machine learning algorithms, namely random forest (RF), support vector machine (SVM) and k-nearest neighbours (KNN), is analysed for detecting weeds in UAV images collected from a chilli crop field located in Australia. The evaluation metrics used in the comparison were accuracy, precision, recall, false positive rate and kappa coefficient. MATLAB is used for simulating the machine learning algorithms, and the achieved weed detection accuracies are 96% using RF, 94% using SVM and 63% using KNN. Based on this study, the RF and SVM algorithms are efficient and practical to use, and can be implemented easily for detecting weeds from UAV images.
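The evaluation metrics named in this abstract can all be derived from a binary confusion matrix. The following is a minimal Python sketch, not the authors' MATLAB code; the function name `binary_metrics` and the example counts are hypothetical and chosen only for illustration.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute common metrics from binary confusion-matrix counts.

    Assumes every denominator is non-zero: both classes are present,
    at least one sample is predicted positive, and chance agreement < 1.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # also called the true positive rate
    fpr = fp / (fp + tn)     # false positive rate
    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_chance = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total ** 2
    kappa = (accuracy - p_chance) / (1 - p_chance)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "false_positive_rate": fpr,
        "kappa": kappa,
    }


# Hypothetical counts, with "weed" treated as the positive class.
metrics = binary_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Reporting the false positive rate and kappa alongside accuracy is useful because accuracy alone can look high when one class dominates the data.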


Malicious Intrusion Detection Using Machine Learning Schemes

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.f8839.088619 ◽

2019 ◽

Vol 8 (6) ◽

pp. 4194-4198

Keyword(s):

Machine Learning ◽

Wireless Networks ◽

Intrusion Detection ◽

False Positive Rate ◽

Feature Selection Method ◽

Training Model ◽

True Positive Rate ◽

Machine Learning Algorithms ◽

Detection Mechanism ◽

Positive Rate

Wireless networks continuously face challenges in the field of information security. This has led to extensive research in the area of intrusion detection. Intrusion detection is performed mainly by signature-based detection and anomaly-based detection. Anomaly-based detection is based on the behavior of the network. One of the major challenges in this domain is to identify and detect the malicious node in wireless networks. The intrusion detection mechanism has to analyse the behavior of each node in the network by means of the several features that the node possesses. Intelligent schemes are needed in such a scenario. This paper takes a standard dataset for studying the features of the wireless node and reduces the features by applying the Correlation Attribute feature selection method. Machine learning algorithms are applied to obtain an effective training model, which is then applied to the testing dataset to validate the model. The accuracy of the model is determined by performance parameters such as true positive rate, false positive rate and ROC area. Neural network, bagging and the decision tree algorithm RepTree give promising results in comparison with other classification algorithms.


Tuning the False Positive Rate / False Negative Rate with Phishing Detection Models

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.a1002.1291s52019 ◽

2019 ◽

Vol 9 (1S5) ◽

pp. 7-13

Keyword(s):

Machine Learning ◽

Neural Networks ◽

False Positive Rate ◽

False Negative ◽

False Negative Rate ◽

Trade Off ◽

Detection Model ◽

Phishing Attacks ◽

Positive Rate ◽

Phishing Detection

Phishing attacks have risen by 209% in the last 10 years according to the Anti Phishing Working Group (APWG) statistics [19]. Machine learning is commonly used to detect phishing attacks. Researchers have traditionally judged phishing detection models with either accuracy or F1-scores; however, in this paper we argue that a single metric alone will never correlate to a successful deployment of a machine learning phishing detection model. This is because every machine learning model has an inherent trade-off between its False Positive Rate (FPR) and False Negative Rate (FNR). Tuning the trade-off is important since a higher or lower FPR/FNR will impact the user acceptance rate of any deployment of a phishing detection model. Models with a high FPR tend to block users from accessing legitimate webpages, whereas a model with a high FNR will allow users to inadvertently access phishing webpages. Either of these extremes may cause a user base to complain (due to blocked pages) or fall victim to phishing attacks. Depending on the security needs of a deployment (secure vs relaxed setting), phishing detection models should be tuned accordingly. In this paper, we demonstrate two effective techniques to tune the trade-off between FPR and FNR: varying the class distribution of the training data and adjusting the probabilistic prediction threshold. We demonstrate both techniques using a data set of 50,000 phishing and 50,000 legitimate sites, performing all experiments with three common machine learning algorithms: Random Forest, Logistic Regression, and Neural Networks. Using our techniques we are able to regulate a model’s FPR/FNR. We observed that among the three algorithms we used, Neural Networks performed best, resulting in a higher F1-score of 0.98 with corresponding FPR/FNR values of 0.0003 and 0.0198 respectively.
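The second technique mentioned above, adjusting the probabilistic prediction threshold, can be illustrated with a short Python sketch. This is not the authors' implementation: the helper `fpr_fnr`, the scores, and the labels are all hypothetical, and the scores stand in for the phishing probabilities that any trained classifier might output.

```python
def fpr_fnr(scores, labels, threshold):
    """Return (FPR, FNR) when scores >= threshold are flagged as phishing.

    labels use 1 for phishing and 0 for legitimate; both classes must occur.
    """
    pairs = list(zip(scores, labels))
    false_pos = sum(1 for s, y in pairs if s >= threshold and y == 0)
    false_neg = sum(1 for s, y in pairs if s < threshold and y == 1)
    negatives = sum(1 for y in labels if y == 0)
    positives = sum(1 for y in labels if y == 1)
    return false_pos / negatives, false_neg / positives


# Hypothetical predicted phishing probabilities and true labels.
scores = [0.05, 0.20, 0.40, 0.60, 0.55, 0.70, 0.90, 0.95]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

# Sweep the threshold to see how FPR and FNR move in opposite directions.
for threshold in (0.30, 0.50, 0.65):
    fpr, fnr = fpr_fnr(scores, labels, threshold)
    print(f"threshold={threshold:.2f}  FPR={fpr:.2f}  FNR={fnr:.2f}")
```

On this toy data, raising the threshold lowers the FPR while raising the FNR, which is the trade-off a deployment would tune toward a "secure" or "relaxed" setting.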


LC-MS Peak Assignment Based on Unanimous Selection by Six Machine Learning Algorithms

10.21203/rs.3.rs-845859/v1 ◽

2021 ◽

Author(s):

Hiroaki Ito ◽

Takashi Matsui ◽

Ryo Konno ◽

Makoto Itakura ◽

Yoshio Kodera

Keyword(s):

Machine Learning ◽

Learning Algorithm ◽

False Positive Rate ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Weak Signals ◽

Accuracy And Precision ◽

Peak Assignment ◽

Positive Rate ◽

Assignment Strategy

Recent mass spectrometry (MS)-based techniques enable deep proteome coverage with relative quantitative analysis, resulting in increased identification of very weak signals accompanied by increased data size of liquid chromatography (LC)–MS/MS spectra. However, identifying weak signals with a poorly performing assignment strategy results in imperfect quantification, with misidentification of peaks and ratio distortions. Manually annotating a large number of signals within a very large dataset is not a realistic approach. In this study, therefore, we utilized machine learning algorithms to successfully extract a higher number of peptide peaks with high accuracy and precision. Our strategy evaluated each peak using six different algorithms; peptide peaks identified by all six algorithms (i.e., unanimously selected) were subsequently assigned as true peaks, which resulted in a reduction in the false-positive rate. Hence, exact and highly quantitative peptide peaks were obtained, providing better performance than that obtained by applying the conventional criteria or using a single machine learning algorithm.
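The unanimous-selection rule described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `unanimous_selection` and the vote matrix are hypothetical, and each row stands in for the accept/reject decisions of one of six already-trained classifiers over the same four candidate peaks.

```python
def unanimous_selection(votes_by_model):
    """Return one bool per candidate: True only if every model accepted it.

    votes_by_model: list of equal-length lists of bools, one list per model.
    """
    return [all(candidate_votes) for candidate_votes in zip(*votes_by_model)]


# Hypothetical votes from six models (rows) on four candidate peaks (columns).
votes_by_model = [
    [True, True, False, True],
    [True, True, True, True],
    [True, False, False, True],
    [True, True, False, True],
    [True, True, True, True],
    [True, True, False, True],
]

accepted = unanimous_selection(votes_by_model)
print(accepted)  # [True, False, False, True]
```

Because a peak is kept only when all models agree, the accepted set is never larger than the set accepted by any single model, so false positives can only decrease, at the possible cost of rejecting some true peaks.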


Machine Learning for Automated Polyp Detection in Computed Tomography Colonography

Advances in Bioinformatics and Biomedical Engineering - Biomedical Image Analysis and Machine Learning Technologies ◽

10.4018/978-1-60566-956-4.ch003 ◽

2010 ◽

pp. 54-77

Author(s):

Abhilash Alexander Miranda ◽

Olivier Caelen ◽

Gianluca Bontempi

Keyword(s):

Machine Learning ◽

Computed Tomography ◽

False Positive ◽

False Positive Rate ◽

Learning Algorithms ◽

Colorectal Polyps ◽

Machine Learning Algorithms ◽

Computed Tomography Colonography ◽

Positive Rate ◽

Independent Features

This chapter presents a comprehensive scheme for automated detection of colorectal polyps in computed tomography colonography (CTC), with particular emphasis on robust learning algorithms that differentiate polyps from non-polyp shapes. The authors’ automated CTC scheme introduces two orientation-independent features that encode the shape characteristics that aid in classifying polyps and non-polyps with high accuracy, a low false positive rate, and low computational cost, making the scheme suitable for colorectal cancer screening initiatives. Experiments using state-of-the-art machine learning algorithms, viz. lazy learning, support vector machines, and naïve Bayes classifiers, reveal the robustness of the two features in detecting polyps at 100% sensitivity for polyps with diameter greater than 10 mm while attaining low total false positive rates of 3.05, 3.47 and 0.71 per CTC dataset, respectively, at specificities above 99% when tested on 58 CTC datasets. The results were validated using colonoscopy reports provided by expert radiologists.


Comparative Study of Various Machine Learning Algorithms for Prediction of Insomnia

Advances in Medical Technologies and Clinical Practice - Advanced Classification Techniques for Healthcare Analysis ◽

10.4018/978-1-5225-7796-6.ch011 ◽

2019 ◽

pp. 234-257 ◽

Cited By ~ 5

Author(s):

Ravinder Ahuja ◽

Vishal Vivek ◽

Manika Chandna ◽

Shivani Virmani ◽

Alisha Banga

Keyword(s):

Machine Learning ◽

Heart Diseases ◽

False Positive Rate ◽

Learning Algorithms ◽

True Positive Rate ◽

Machine Learning Algorithms ◽

Support Vector ◽

Mobility Problem ◽

Positive Rate ◽

F Measure

An early diagnosis of insomnia can prevent further medical problems such as anger issues, heart diseases, anxiety, depression, and hypertension. Fifteen machine learning algorithms have been applied, and 14 leading factors have been taken into consideration for predicting insomnia. Seven performance parameters (accuracy, kappa, true positive rate, false positive rate, precision, F-measure, and AUC) are used for evaluation, and the authors implemented the algorithms in Python. The support vector machine (SVM) gives the highest performance of all the algorithms, with an accuracy of 91.6%, an F-measure of 92.13, and a kappa of 0.83. Further, SVM is applied to another dataset of 100 patients, giving an accuracy of 92%. In addition, the variable importance for CART, C5.0, decision tree, random forest, adaptive boosting, and XGBoost is analysed. The analysis shows that insomnia primarily depends on three factors: vision problems, mobility problems, and sleep disorders. This chapter mainly examines the usage and effectiveness of machine learning algorithms in insomnia prediction.


Hadoop based Parallel Machine Learning Algorithms for Intrusion Detection System

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.a4443.119119 ◽

2019 ◽

Vol 9 (1) ◽

pp. 1152-1156

Keyword(s):

Machine Learning ◽

False Positive ◽

Naive Bayes ◽

False Positive Rate ◽

True Positive Rate ◽

Naïve Bayes ◽

Machine Learning Algorithms ◽

True Positive ◽

Positive Rate ◽

Bayes Algorithm

Web usage and digitized data are increasing every day, and the amount of data generated is increasing as well. At the same time, security attacks cause numerous security threats to networks, websites and the Internet. Intrusion detection in a high-speed network is a very hard task. A Hadoop implementation is used to address this challenge, that is, detecting intrusions in a big data environment in real time. Machine learning approaches are used to classify abnormal packet flows. Naïve Bayes performs classification on a vector of feature values drawn from some finite set. Decision tree is another machine learning classifier, which is also a supervised learning model; a decision tree is a flowchart-like tree structure. J48 and the Naïve Bayes algorithm are implemented in the Hadoop MapReduce framework for parallel processing using the KDDCup Data Corrected Benchmark dataset records. The results obtained are an 89.9% true positive rate and a 0.04% false positive rate for the Naïve Bayes algorithm, and a 98.06% true positive rate and a 0.001% false positive rate for the decision tree algorithm.


Hybrid rule-based botnet detection approach using machine learning for analysing DNS traffic

PeerJ Computer Science ◽

10.7717/peerj-cs.640 ◽

2021 ◽

Vol 7 ◽

pp. e640

Author(s):

Saif Al-mashhadi ◽

Mohammed Anbar ◽

Iznan Hasbullah ◽

Taief Alaa Alamiedy

Keyword(s):

Machine Learning ◽

False Positive ◽

False Positive Rate ◽

Communication Protocols ◽

Cyber Attacks ◽

Machine Learning Algorithms ◽

Detection Accuracy ◽

Botnet Detection ◽

Internet Service ◽

Positive Rate

Botnets can simultaneously control millions of Internet-connected devices to launch damaging cyber-attacks that pose significant threats to the Internet. In a botnet, bot-masters communicate with the command and control server using various communication protocols. One of the widely used communication protocols is the ‘Domain Name System’ (DNS) service, an essential Internet service. Bot-masters utilise Domain Generation Algorithms (DGA) and fast-flux techniques to avoid static blacklists and reverse engineering while remaining flexible. However, a botnet’s DNS communication generates anomalous DNS traffic throughout the botnet life cycle, and such anomalies are considered an indicator of the presence of DNS-based botnets in the network. Although several approaches have been proposed to detect botnets based on DNS traffic analysis, the problem still exists and remains challenging for several reasons, such as not considering significant features and rules that contribute to the detection of DNS-based botnets. Therefore, this paper examines the abnormality of DNS traffic during the botnet life cycle to extract significant enriched features. These features are further analysed using two machine learning algorithms. The union of the outputs of the two algorithms forms a novel hybrid rule detection model. Two benchmark datasets are used to evaluate the performance of the proposed approach in terms of detection accuracy and false-positive rate. The experimental results show that the proposed approach has a 99.96% accuracy and a 1.6% false-positive rate, outperforming other state-of-the-art DNS-based botnet detection approaches.


Landslide Susceptibility Assessment by Novel Hybrid Machine Learning Algorithms

Sustainability ◽

10.3390/su11164386 ◽

2019 ◽

Vol 11 (16) ◽

pp. 4386 ◽

Cited By ~ 45

Author(s):

Pham ◽

Shirzadi ◽

Shahabi ◽

Omidvar ◽

Singh ◽

...

Keyword(s):

Machine Learning ◽

Characteristic Curve ◽

False Positive Rate ◽

Spatial Prediction ◽

Machine Learning Algorithms ◽

Ensemble Model ◽

Power Prediction ◽

Landslide Occurrence ◽

Positive Rate ◽

Hybrid Machine

Landslides have multidimensional effects on the socioeconomic as well as environmental conditions of the impacted areas. The aim of this study is the spatial prediction of landslides using hybrid machine learning models, including bagging (BA), random subspace (RS) and rotation forest (RF) with alternating decision tree (ADTree) as the base classifier, in the northern part of the Pithoragarh district, Uttarakhand, Himalaya, India. To construct the database, ten conditioning factors and a total of 103 landslide locations with a ratio of 70/30 were used. The significant factors were determined by the chi-square attribute evaluation (CSEA) technique. The validity of the hybrid models was assessed by true positive rate (TP Rate), false positive rate (FP Rate), recall (sensitivity), precision, F-measure and area under the receiver operating characteristic curve (AUC). The results showed that land cover was the most important factor, while curvature had no effect on landslide occurrence in the study area and was removed from the modelling process. Additionally, the results indicated that although all ensemble models enhanced the predictive power of the ADTree classifier (AUCtraining = 0.859; AUCvalidation = 0.813), the RS ensemble model (AUCtraining = 0.883; AUCvalidation = 0.842) outperformed the RF (AUCtraining = 0.871; AUCvalidation = 0.840) and the BA (AUCtraining = 0.865; AUCvalidation = 0.836) ensemble models. The obtained results would be helpful for recognizing landslide-prone areas in the future, to better manage and decrease the damage and negative impacts on the environment.
