Directed adversarial sampling attacks on phishing detection

Hossein Shirazi; Bruhadeshwar Bezawada; Indrakshi Ray; Chuck Anderson

doi:10.3233/jcs-191411

Journal of Computer Security ◽

10.3233/jcs-191411 ◽

2021 ◽

Vol 29 (1) ◽

pp. 1-23

Author(s):

Hossein Shirazi ◽

Bruhadeshwar Bezawada ◽

Indrakshi Ray ◽

Chuck Anderson

Keyword(s):

Machine Learning ◽

Credit Card ◽

Personal Information ◽

Success Probability ◽

Training Dataset ◽

Sensitive Information ◽

Learning Approaches ◽

Adversarial Learning ◽

The Face ◽

Phishing Detection

Phishing websites trick honest users into believing that they are interacting with a legitimate website and capture sensitive information, such as user names, passwords, credit card numbers, and other personal information. Machine learning is a promising technique for distinguishing between phishing and legitimate websites. However, machine learning approaches are susceptible to adversarial learning attacks, in which a phishing sample can bypass the classifier. We investigate the robustness of machine learning-based phishing detection in the face of such attacks, and our experiments on publicly available datasets reveal that phishing detection mechanisms are vulnerable to them. We propose a practical approach to simulate these attacks by generating adversarial samples through direct feature manipulation. To increase a sample's probability of success, we describe a clustering approach that guides an attacker to select the phishing samples most likely to bypass the classifier by appearing as legitimate samples. We define the notion of a vulnerability level for each dataset, which measures the number of features that can be manipulated and the cost of such manipulation. Further, we cluster phishing samples and show that some clusters are more likely than others to exhibit higher vulnerability levels. This helps an adversary identify the best candidate phishing samples for generating adversarial samples at a lower cost. Our findings can be used to refine the dataset and to develop better learning models that compensate for the weak samples in the training dataset.
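
The abstract describes two ideas. The first is generating adversarial samples by direct feature manipulation. The second is measuring vulnerability by how many features change and at what cost. The sketch below illustrates both on a toy problem, under explicit assumptions. Features are binary. A hypothetical linear scoring model stands in for the trained classifier, and the per-feature costs and the set of mutable features are also hypothetical. The greedy strategy is an illustration chosen here and is not the authors' algorithm. The clustering step is not shown.

```python
from dataclasses import dataclass


@dataclass
class AttackResult:
    sample: list      # manipulated feature vector
    flipped: list     # indices of features that were changed, in order
    cost: float       # total (hypothetical) manipulation cost
    evaded: bool      # True if the toy classifier now says "legitimate"


def score(weights, bias, x):
    """Toy linear score; a value > 0 means 'phishing' (hypothetical model)."""
    return bias + sum(w * xi for w, xi in zip(weights, x))


def greedy_manipulation(x, weights, bias, costs, mutable):
    """Greedily flip mutable binary features until the sample looks legitimate.

    At each step, pick the unflipped mutable feature whose flip lowers the
    score the most per unit of cost. Stop when the score drops to <= 0 or
    when no remaining flip helps.
    """
    x = list(x)
    flipped, total = [], 0.0
    while score(weights, bias, x) > 0:
        best, best_gain = None, 0.0
        for i in mutable:
            if i in flipped:
                continue
            delta = weights[i] * ((1 - x[i]) - x[i])  # score change if flipped
            gain = -delta / costs[i]                  # reduction per unit cost
            if gain > best_gain:
                best, best_gain = i, gain
        if best is None:
            break  # no remaining flip lowers the score
        x[best] = 1 - x[best]
        flipped.append(best)
        total += costs[best]
    return AttackResult(x, flipped, total, score(weights, bias, x) <= 0)
```

Consider `greedy_manipulation([1, 1, 1, 0], [2.0, 1.5, 1.0, 0.5], -2.5, [5.0, 1.0, 1.0, 1.0], [1, 2, 3])`. This call treats feature 0 as immutable, flips features 1 and 2 at a total cost of 2.0, and evades the toy model. Aggregating `len(result.flipped)` and `result.cost` over many phishing samples gives a rough analogue of a per-dataset vulnerability level.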

Hybrid Machine Learning: A Tool to Detect Phishing Attacks in Communication Networks

ECTI Transactions on Computer and Information Technology (ECTI-CIT) ◽

10.37936/ecti-cit.2021153.240565 ◽

2021 ◽

Vol 15 (3) ◽

pp. 374-389

Author(s):

Ademola Philip Abidoye ◽

Boniface Kabaso

Keyword(s):

Machine Learning ◽

Communication Networks ◽

Credit Card ◽

Personal Information ◽

False Positive Rate ◽

False Negative ◽

False Negative Rate ◽

Machine Learning Techniques ◽

Sensitive Information ◽

Cyber Attack

Phishing is a cyber-attack that uses disguised email as a weapon and has been on the rise in recent times. An unsuspecting Internet user who clicks on a fraudulent link may fall victim and divulge personal information such as credit card PINs, login credentials, banking information, and other sensitive information. Attackers have many ways to trick victims into revealing their personal information. In this article, we select important phishing URL features that attackers can use to trick Internet users into taking the attacker's desired action. We use two machine learning techniques to classify our datasets accurately, and we compare the performance of related techniques with that of our scheme. The experimental results show that the approach is highly effective in detecting phishing URLs, attaining an accuracy of 97.8% with a 1.06% false positive rate, a 0.5% false negative rate, and an error rate of 0.3%. The proposed scheme performs better than the selected related work, which indicates that our approach can be used in real-time applications for detecting phishing URLs.
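
The abstract reports accuracy, false positive rate, false negative rate, and error rate but does not define them. The sketch below gives the conventional confusion-matrix definitions. It assumes that "phishing" is the positive class; that choice is an assumption, not something stated in the abstract. Under these definitions, accuracy and error rate sum to one.

```python
def detection_rates(tp, fp, tn, fn):
    """Conventional confusion-matrix rates, with 'phishing' as the positive class.

    tp: phishing URLs flagged as phishing   fp: legitimate URLs flagged as phishing
    tn: legitimate URLs passed              fn: phishing URLs missed
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "false_positive_rate": fp / (fp + tn),  # share of legitimate URLs misflagged
        "false_negative_rate": fn / (fn + tp),  # share of phishing URLs missed
        "error_rate": (fp + fn) / total,        # equals 1 - accuracy
    }
```

For example, `detection_rates(90, 5, 95, 10)` uses hypothetical counts. It yields an accuracy of 0.925, a false positive rate of 0.05, a false negative rate of 0.1, and an error rate of 0.075.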

Detecting phishing websites using machine learning technique

PLoS ONE ◽

10.1371/journal.pone.0258361 ◽

2021 ◽

Vol 16 (10) ◽

pp. e0258361

Author(s):

Ashit Kumar Dutta

Keyword(s):

Machine Learning ◽

Detection System ◽

Cyber Attacks ◽

Sensitive Information ◽

Learning Approaches ◽

Security Technologies ◽

Online Purchases ◽

Intelligent Technique ◽

Cloud Technologies ◽

Phishing Detection

In recent years, advancements in Internet and cloud technologies have led to a significant increase in electronic trading, in which consumers make online purchases and transactions. This growth has also led to unauthorized access to users' sensitive information and damage to enterprise resources. Phishing is one of the most familiar attacks; it tricks users into accessing malicious content in order to obtain their information. In terms of website interface and uniform resource locator (URL), most phishing webpages look identical to the actual webpages. Various strategies for detecting phishing websites, such as blacklist-based and heuristic approaches, have been suggested. However, due to inefficient security technologies, the number of victims has increased exponentially. The anonymous and uncontrollable framework of the Internet makes it more vulnerable to phishing attacks. Existing research shows that the performance of phishing detection systems is limited, so there is a demand for an intelligent technique to protect users from cyber-attacks. In this study, the author proposes a URL detection technique based on machine learning approaches, in which a recurrent neural network is employed to detect phishing URLs. The proposed method was evaluated on 7,900 malicious and 5,800 legitimate sites. The experimental results show that the proposed method performs better than recent approaches to malicious URL detection.
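
The abstract names a recurrent neural network but gives no architecture. The sketch below is a minimal character-level Elman RNN forward pass in pure Python. It only illustrates how a URL can be consumed one character at a time and reduced to a phishing probability. The weights are random and untrained, and training by backpropagation through time is omitted. The ASCII encoding, hidden size, and sigmoid output are all assumptions.

```python
import math
import random


def make_params(vocab_size=128, hidden=8, seed=0):
    """Randomly initialised (untrained) weights for a tiny Elman RNN."""
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "W_xh": matrix(hidden, vocab_size),   # input -> hidden
        "W_hh": matrix(hidden, hidden),       # hidden -> hidden (recurrence)
        "b_h": [0.0] * hidden,
        "w_out": [rng.uniform(-0.1, 0.1) for _ in range(hidden)],
        "b_out": 0.0,
    }


def encode(url, vocab_size):
    """Map characters to code points; characters outside the vocabulary map to 0."""
    return [ord(c) if ord(c) < vocab_size else 0 for c in url]


def phishing_score(url, p):
    """Run the RNN over the URL one character at a time and return P(phishing)."""
    hidden = len(p["b_h"])
    h = [0.0] * hidden
    for idx in encode(url, len(p["W_xh"][0])):
        # With a one-hot input, W_xh @ x is simply column idx of W_xh.
        h = [
            math.tanh(
                p["W_xh"][j][idx]
                + sum(p["W_hh"][j][k] * h[k] for k in range(hidden))
                + p["b_h"][j]
            )
            for j in range(hidden)
        ]
    logit = p["b_out"] + sum(w * hj for w, hj in zip(p["w_out"], h))
    return 1.0 / (1.0 + math.exp(-logit))
```

With untrained weights, the scores carry no meaning. The point of the sketch is the data flow: the hidden state `h` carries information from earlier characters to later ones.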

FedPARL: Client Activity and Resource-Oriented Lightweight Federated Learning Model for Resource-Constrained Heterogeneous IoT Environment

Frontiers in Communications and Networks ◽

10.3389/frcmn.2021.657653 ◽

2021 ◽

Vol 2 ◽

Author(s):

Ahmed Imteaj ◽

M. Hadi Amini

Keyword(s):

Machine Learning ◽

Resource Availability ◽

Resource Constraints ◽

Training Model ◽

Convergence Time ◽

Battery Life ◽

Sensitive Information ◽

Learning Approaches ◽

Resource Constrained ◽

Distributed Machine Learning

Federated Learning (FL) is a recently introduced distributed machine learning technique that allows available network clients to perform model training at the edge rather than sharing their data with a centralized server. Unlike conventional distributed machine learning approaches, the hallmark feature of FL is that local computation and model generation are performed on the client side, ultimately protecting sensitive information. Most existing FL approaches assume that each FL client has sufficient computational resources and can accomplish a given task without facing any resource-related issues. However, if we consider FL for a heterogeneous Internet of Things (IoT) environment, a major portion of the FL clients may face low resource availability (e.g., lower computational power, limited bandwidth, and limited battery life). Consequently, resource-constrained FL clients may respond very slowly or may be unable to execute the expected number of local iterations. Further, any FL client can inject an inappropriate model during a training phase, which can prolong convergence time and waste the resources of all network clients. In this paper, we propose a novel tri-layer FL scheme, Federated Proximal, Activity and Resource-Aware Lightweight model (FedPARL). FedPARL reduces model size by performing sample-based pruning, avoids misbehaving clients by examining their trust scores, and allows clients to perform partial amounts of work according to their resource availability. The pruning mechanism is particularly useful when dealing with resource-constrained FL-based IoT (FL-IoT) clients. In this scenario, the lightweight training model consumes fewer resources to reach a target convergence. We evaluate each interested client's resource availability before assigning a task, monitor their activities, and update their trust scores based on their previous performance.
To tackle system and statistical heterogeneities, we adapt a re-parameterization and generalization of the current state-of-the-art Federated Averaging (FedAvg) algorithm. The modified FedAvg algorithm allows clients to perform variable or partial amounts of work according to their resource constraints. We demonstrate that coupling pruning, resource and activity awareness, and re-parameterization of the FedAvg algorithm leads to more robust convergence of FL in IoT environments.
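
FedPARL's pruning, trust scoring, and resource checks are not reproduced here. The sketch below shows only the two standard building blocks the abstract says it re-parameterizes:

- a FedProx-style local update, in which a proximal term keeps each client close to the global model and the number of local steps may differ per client (partial work);
- a FedAvg-style server aggregation, weighted by local sample counts.

Filtering clients by a trust score is included as a hypothetical stand-in for FedPARL's mechanism. The tuple layout and the threshold value are assumptions.

```python
def local_update(w_global, grad_fn, lr, mu, steps):
    """FedProx-style local training.

    Takes gradient steps on the local loss plus the proximal term
    (mu/2)*||w - w_global||^2, whose gradient mu*(w - w_global) pulls the
    client back toward the global model. `steps` may differ per client,
    which models partial work by resource-constrained clients.
    """
    w = list(w_global)
    for _ in range(steps):
        g = grad_fn(w)
        w = [wi - lr * (gi + mu * (wi - wg)) for wi, gi, wg in zip(w, g, w_global)]
    return w


def aggregate(updates, trust_threshold=0.5):
    """FedAvg-style weighted average of client models.

    updates: list of (weights, num_samples, trust_score). Clients whose
    hypothetical trust score is below trust_threshold are skipped; the rest
    are weighted by their number of local samples.
    """
    accepted = [(w, n) for w, n, t in updates if t >= trust_threshold and n > 0]
    if not accepted:
        raise ValueError("no trusted client updates to aggregate")
    total = sum(n for _, n in accepted)
    dim = len(accepted[0][0])
    return [sum(w[i] * n for w, n in accepted) / total for i in range(dim)]
```

Setting `mu=0` recovers plain local gradient descent as in FedAvg. Larger values of `mu` keep heterogeneous clients from drifting far from the global model.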

IDS for Industrial Applications: A Federated Learning Approach with Active Personalization

Sensors ◽

10.3390/s21206743 ◽

2021 ◽

Vol 21 (20) ◽

pp. 6743

Author(s):

Vasiliki Kelli ◽

Vasileios Argyriou ◽

Thomas Lagkas ◽

George Fragulis ◽

Elisavet Grigoriou ◽

...

Keyword(s):

Machine Learning ◽

Active Learning ◽

Network Flow ◽

Detection System ◽

Human Life ◽

Industrial Sector ◽

Machine Learning Techniques ◽

Sensitive Information ◽

Learning Approaches ◽

Monitoring And Control

The Internet of Things (IoT) is a concept adopted in nearly every aspect of human life, leading to an explosive use of intelligent devices. Notably, such solutions are especially integrated in the industrial sector to allow the remote monitoring and control of critical infrastructure. This global integration of IoT solutions has expanded the attack surface of IoT-enabled infrastructures. Artificial intelligence and machine learning have demonstrated their ability to resolve issues that would otherwise have been impossible or difficult to address; thus, such solutions are closely associated with securing IoT. However, classical collaborative and distributed machine learning approaches are known to compromise sensitive information. In our paper, we demonstrate the creation of a network flow-based Intrusion Detection System (IDS) aimed at protecting critical infrastructures, built by pairing two machine learning techniques, namely federated learning and active learning. The former is used to train models privately in a federation, while the latter is a semi-supervised approach applied to adapt the global model to each participant's traffic. Experimental results indicate that global models perform significantly better for each participant when locally personalized with just a few active learning queries. Specifically, we demonstrate that the accuracy increase can reach 7.07% with only 10 queries.
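
The abstract does not say how the active learning queries are chosen. Uncertainty (least-confidence) sampling is one common strategy and is used here purely as an assumption. The sketch assumes the global model outputs a probability of "attack" for each unlabeled network flow. Under that assumption, each participant asks an analyst to label the flows whose predictions are closest to 0.5.

```python
def select_queries(probabilities, k):
    """Return indices of the k unlabeled flows the model is least sure about.

    probabilities[i] is the model's estimated P(attack) for flow i; the
    closer a value is to 0.5, the more uncertain the prediction.
    """
    ranked = sorted(range(len(probabilities)), key=lambda i: abs(probabilities[i] - 0.5))
    return ranked[:k]
```

With a budget such as the 10 queries mentioned in the abstract, a participant would call `select_queries(probs, 10)`, label those flows, and fine-tune its local copy of the global model on them.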

Detecting Phishing Website Using Machine Learning

International Journal of Advanced Research in Science, Communication and Technology ◽

10.48175/ijarsct-1082 ◽

2021 ◽

pp. 16-19

Author(s):

Aarti Chile ◽

Mrunal Jadhav ◽

Shital Thakare ◽

Prof. Yogita Chavan

Keyword(s):

Machine Learning ◽

Electronic Communication ◽

Personal Information ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Uniform Resource Locator ◽

Debit Card ◽

Maximum Accuracy ◽

Phishing Detection

Phishing is a fraudulent attempt to obtain sensitive and personal information, such as passwords, usernames, and bank details like credit/debit card numbers, by masquerading as a reliable organization in electronic communication. A phishing website appears the same as the legitimate website and directs users to a page where they enter their personal details on the fake website. Machine learning algorithms can improve the accuracy of this prediction. The proposed method uses uniform resource locator (URL) features: we identified features that phishing site URLs contain, and the proposed system employs those features to predict URL-based phishing websites with maximum accuracy.
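
The abstract says that URL features were identified but does not list them. The sketch below extracts a few lexical features that are commonly used in the phishing-detection literature; the specific selection here is an assumption. It uses only the Python standard library (`urllib.parse.urlparse` and `ipaddress.ip_address`). The subdomain count is naive: it treats the last two host labels as the registered domain.

```python
import ipaddress
from urllib.parse import urlparse


def url_features(url):
    """Extract a few illustrative lexical features from a URL."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    try:
        ipaddress.ip_address(host)  # raises ValueError for non-IP hosts
        uses_ip = 1
    except ValueError:
        uses_ip = 0
    return {
        "url_length": len(url),
        "uses_ip_address": uses_ip,
        "has_at_symbol": int("@" in url),
        "hyphen_in_host": int("-" in host),
        # Naive: assumes the last two labels form the registered domain.
        "subdomain_count": 0 if uses_ip else max(host.count(".") - 1, 0),
        "uses_https": int(parsed.scheme == "https"),
    }
```

Each URL's dictionary can be turned into a fixed-order feature vector and passed to any classifier.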

Detection of fraudulent credit card transactions: A comparative analysis of data sampling and classification techniques

Journal of Physics Conference Series ◽

10.1088/1742-6596/2161/1/012072 ◽

2022 ◽

Vol 2161 (1) ◽

pp. 012072

Author(s):

Konduri Praveen Mahesh ◽

Shaik Ashar Afrouz ◽

Anu Shaju Areeckal

Keyword(s):

Machine Learning ◽

Credit Card ◽

Research Problem ◽

Machine Learning Algorithms ◽

Support Vector ◽

Unbalanced Data ◽

Learning Approaches ◽

Data Sampling ◽

Sampled Data ◽

Under Sampling

Every year, fraudulent credit card transactions cause increasingly large losses of money. Recently, there has been a focus on using machine learning algorithms to identify fraudulent transactions. Fraud cases are very rare relative to non-fraudulent transactions, which creates skewed, or unbalanced, data that poses a challenge for training machine learning models. Public datasets for this research problem are scarce; the dataset used for this work is obtained from Kaggle. In this paper, we explore different sampling techniques, such as under-sampling, Synthetic Minority Oversampling Technique (SMOTE) and SMOTE-Tomek, to handle the unbalanced data. Classification models, such as k-Nearest Neighbour (KNN), logistic regression, random forest and Support Vector Machine (SVM), are trained on the sampled data to detect fraudulent credit card transactions. The performance of the various machine learning approaches is evaluated in terms of precision, recall and F1-score. The classification results obtained are promising and can be used for credit card fraud detection.
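
Two of the sampling techniques named above can be sketched in pure Python: random under-sampling of the majority class, and SMOTE. SMOTE creates synthetic minority samples by interpolating between a minority point and one of its k nearest minority neighbours. This is a minimal illustration, not the implementation used in the paper. It assumes numeric feature vectors and Euclidean distance. SMOTE-Tomek, which additionally removes Tomek links, is not shown.

```python
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate n_synthetic points by interpolating between minority samples
    and their k nearest minority neighbours (Euclidean distance)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [minority[j] for j in range(len(minority)) if j != i]
        neighbours = sorted(others, key=lambda p: dist2(x, p))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x to nn, in [0, 1)
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic


def random_undersample(majority, n_keep, seed=0):
    """Keep a random subset of n_keep majority-class samples."""
    return random.Random(seed).sample(majority, n_keep)
```

Each synthetic point lies on the segment between two real minority samples. As a result, SMOTE enlarges the minority region without simply duplicating rows, which is what plain random over-sampling would do.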

Detection of Phishing Websites using an Efficient Feature-Based Machine Learning Framework

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.c5909.029320 ◽

2020 ◽

Vol 9 (3) ◽

pp. 2857-2862

Keyword(s):

Machine Learning ◽

Personal Information ◽

Machine Learning Algorithms ◽

Sensitive Information ◽

Cyber Attack ◽

Learning Framework ◽

Internet Users ◽

User Data ◽

Feature Based ◽

Classification Prediction

Phishing is a socially engineered cyber-attack that tricks naive online users into revealing sensitive information such as user data, login credentials, social security numbers, and banking information. Attackers fool Internet users by posing as a legitimate webpage to retrieve personal information; they can also send emails posing as reputable companies or businesses. Phishing effectively exploits several vulnerabilities, and no single solution protects users from all of them. To overcome the drawbacks of existing anti-phishing techniques, we design a classification/prediction model based on heuristic features extracted from the website domain, URL, web protocol, and source code. The model combines existing solutions, such as blacklisting and whitelisting, heuristics, and visual similarity, to provide a higher level of security. We use the model with different machine learning algorithms, namely Logistic Regression, Decision Trees, K-Nearest Neighbours and Random Forests, and compare the results to find the most efficient machine learning framework.
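
One way to combine whitelisting, blacklisting, and a heuristic model is a simple cascade:

1. Trusted hosts are accepted.
2. Known-bad hosts are rejected.
3. Everything else is scored heuristically.

The sketch below assumes that ordering; the abstract does not specify how the components are combined. The `toy_heuristic_score` function is hypothetical and stands in for a trained classifier such as a random forest. Visual similarity checks are omitted.

```python
from urllib.parse import urlparse


def toy_heuristic_score(url):
    """Hypothetical stand-in for a trained model: fraction of red flags present."""
    parsed = urlparse(url)
    flags = [
        "@" in url,
        len(url) > 75,
        parsed.scheme != "https",
        "-" in (parsed.hostname or ""),
    ]
    return sum(flags) / len(flags)


def classify(url, whitelist, blacklist, heuristic_score, threshold=0.5):
    """Cascade: whitelist, then blacklist, then heuristic score."""
    host = (urlparse(url).hostname or "").lower()
    if host in whitelist:
        return "legitimate"
    if host in blacklist:
        return "phishing"
    return "phishing" if heuristic_score(url) >= threshold else "legitimate"
```

The list lookups are cheap and exact, so the (possibly expensive) model only handles hosts that neither list covers.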

Machine Learning Approaches for Credit Card Fraud Detection

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i2.9356 ◽

2018 ◽

Vol 7 (2) ◽

pp. 917

Author(s):

S Venkata Suryanarayana ◽

G N. Balaji ◽

G Venkateswara Rao

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Language Processing ◽

Credit Card ◽

Fraud Detection ◽

Machine Learning Techniques ◽

Learning Approaches ◽

Credit Card Fraud ◽

Public Data ◽

The Impact

With the extensive use of credit cards, fraud has become a major issue in the credit card business. It is hard to obtain figures on the impact of fraud, since companies and banks are reluctant to disclose the amount of losses due to fraud. At the same time, public data are scarce because of confidentiality concerns, leaving many questions about the best strategy unanswered. Another problem in estimating credit card fraud losses is that only the losses from detected frauds can be measured; the size of unreported or undetected fraud cannot be assessed. Fraud patterns are changing rapidly, so fraud detection needs to shift from a reactive to a proactive approach. In recent years, machine learning has gained a lot of popularity in image analysis, natural language processing and speech recognition. In this regard, implementing efficient fraud detection algorithms using machine learning techniques is key to reducing these losses and to assisting fraud investigators. In this paper, a logistic regression-based machine learning approach is used to detect credit card fraud. The results show that the logistic regression-based approach performs best, achieving the highest accuracy, and that it can be used effectively by fraud investigators.
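
The sketch below implements logistic regression trained by batch gradient descent in pure Python, to make concrete what a "logistic regression-based approach" computes. The learning rate, number of epochs, 0.5 decision threshold, and toy data are assumptions. The paper's features, preprocessing, and training procedure are not given in the abstract.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Minimise mean log-loss with batch gradient descent; returns (w, b)."""
    n_features, m = len(X[0]), len(X)
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            # Gradient of log-loss w.r.t. the logit is (prediction - label).
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def predict_fraud(w, b, x, threshold=0.5):
    """Return 1 (fraud) if the predicted probability reaches the threshold."""
    return int(sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) >= threshold)
```

On heavily imbalanced fraud data, the 0.5 threshold is often lowered, or the classes are re-weighted, so that rare fraud cases are not ignored.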

Informational and emotional elements in online support groups: a Bayesian approach to large-scale content analysis

Journal of the American Medical Informatics Association ◽

10.1093/jamia/ocv190 ◽

2016 ◽

Vol 23 (3) ◽

pp. 508-513 ◽

Cited By ~ 14

Author(s):

Ulrike Deetjen ◽

John A Powell

Keyword(s):

Machine Learning ◽

Support Groups ◽

Emotional Support ◽

Large Scale ◽

Training Dataset ◽

Online Support ◽

Online Support Groups ◽

Learning Approaches ◽

Physical Conditions ◽

Irritable Bowel

Objective: This research examines the extent to which informational and emotional elements are employed in online support forums for 14 purposively sampled chronic medical conditions, and the factors that influence whether posts are more informational or more emotional in nature.

Methods: Large-scale qualitative data were obtained from Dailystrength.org. Based on a hand-coded training dataset, all posts were classified as informational or emotional using a Bayesian classification algorithm to generalize the findings. Posts that could not be classified with a probability of at least 75% were excluded.

Results: The overall tendency toward emotional posts differs by condition: forums for mental health (depression, schizophrenia) and Alzheimer's disease contain more emotional posts, while informational posts relate more to nonterminal physical conditions (irritable bowel syndrome, diabetes, asthma). There is no gender difference across conditions, although prostate cancer forums are oriented toward informational support, whereas breast cancer forums feature more emotional support. Across diseases, the best predictors of emotional content are lower age and a higher number of overall posts by the support group member.

Discussion: The results are in line with previous empirical research and unify empirical findings from single- and two-condition research. Limitations include the analytical restriction to predefined categories (informational, emotional) imposed by the chosen machine learning approach.

Conclusion: Our findings provide an empirical foundation for building theory on informational versus emotional support across conditions, give practitioners insight into the role of online support groups for different patients, and show the usefulness of machine learning approaches for analyzing large-scale qualitative health data from online settings.
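
The methods describe a Bayesian classifier trained on hand-coded posts that excludes posts classified with less than 75% probability. The sketch below shows that exclusion rule on top of a multinomial naive Bayes classifier. Naive Bayes is an assumption, since the abstract does not name the specific Bayesian model. Tokenization is by whitespace, smoothing is add-one, and the toy training posts are hypothetical.

```python
import math
from collections import Counter


def train_nb(docs, labels):
    """Fit class priors and per-class word counts."""
    classes = sorted(set(labels))
    priors = {c: labels.count(c) / len(labels) for c in classes}
    counts = {c: Counter() for c in classes}
    for doc, c in zip(docs, labels):
        counts[c].update(doc.lower().split())
    vocab = set().union(*counts.values())
    return classes, priors, counts, vocab


def classify_post(text, model, min_prob=0.75):
    """Return (label, probability), or (None, probability) if below min_prob."""
    classes, priors, counts, vocab = model
    log_scores = {}
    for c in classes:
        total = sum(counts[c].values())
        score = math.log(priors[c])
        for word in text.lower().split():
            # Laplace (add-one) smoothing over the shared vocabulary.
            score += math.log((counts[c][word] + 1) / (total + len(vocab)))
        log_scores[c] = score
    # Convert log scores to normalised posterior probabilities.
    top = max(log_scores.values())
    exp_scores = {c: math.exp(s - top) for c, s in log_scores.items()}
    z = sum(exp_scores.values())
    best = max(exp_scores, key=exp_scores.get)
    prob = exp_scores[best] / z
    return (best, prob) if prob >= min_prob else (None, prob)
```

Posts returned with the label `None` correspond to those the study excluded for falling below the 75% threshold.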

Data mining tools: a case study for network intrusion detection

Multimedia Tools and Applications ◽

10.1007/s11042-020-09916-0 ◽

2020 ◽

Author(s):

Soodeh Hosseini ◽

Saman Rafiee Sardo

Keyword(s):

Machine Learning ◽

Data Mining ◽

Intrusion Detection ◽

Network Intrusion Detection ◽

Learning Approaches ◽

Learning Tools ◽

Network Intrusion ◽

The Face ◽

Level Of Knowledge ◽

Mining Tools

With the growth of data mining and machine learning approaches in recent years, many efforts have been made to generalize these fields so that researchers from any discipline can easily use them. One of the most important of these efforts is the development of data mining tools that hide complexity from researchers, so that they can achieve professional output at any level of expertise. This paper reviews and compares data mining and machine learning tools, including WEKA, KNIME, Keel, Orange, Azure, IBM SPSS Modeler, R and Scikit-Learn, to show the approach each tool takes to the complexities and problems of generalizing data mining and machine learning across different scenarios. In addition, for a more detailed review, this paper examines the challenge of network intrusion detection in two tools: KNIME, with a graphical interface, and Scikit-Learn, with a coding environment.