random undersampling Latest Research Papers

Exploring Ensemble-Based Class Imbalance Learners for Intrusion Detection in Industrial Control Networks

Big Data and Cognitive Computing ◽

10.3390/bdcc5040072 ◽

2021 ◽

Vol 5 (4) ◽

pp. 72

Author(s):

Maya Hilda Lestari Louk ◽

Bayu Adhi Tama

Keyword(s):

Intrusion Detection ◽

Class Imbalance ◽

Imbalanced Data ◽

Kappa Statistics ◽

Classifier Ensembles ◽

Industrial Control ◽

Control Networks ◽

Detection Systems ◽

Random Undersampling ◽

Industrial Control Networks

Classifier ensembles have been used in industrial cybersecurity for many years. However, their efficacy and reliability for intrusion detection systems remain open questions in current research, largely because of severe class imbalance in the data. This article addresses that gap in the literature by illustrating the benefits of ensemble-based models for identifying threats and attacks in a cyber-physical power grid. We provide a framework that compares nine cost-sensitive individual and ensemble models designed specifically for handling imbalanced data: cost-sensitive C4.5, roughly balanced bagging, random oversampling bagging, random undersampling bagging, synthetic minority oversampling bagging, random undersampling boosting, synthetic minority oversampling boosting, AdaC2, and EasyEnsemble. Each model’s performance is tested on a range of benchmark power system datasets using balanced accuracy, the Kappa statistic, and AUC. Our findings show that EasyEnsemble significantly outperformed its rivals across the board. Furthermore, undersampling and oversampling strategies were effective in boosting-based ensembles but not in bagging-based ensembles.
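As a reading aid, two of the metrics named in this abstract (balanced accuracy and the Kappa statistic) can be sketched in plain Python. This is a minimal illustration using the standard textbook definitions, not the authors' evaluation code; the function names and the toy labels are assumptions made for the example.

```python
from collections import Counter

def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls (standard definition)."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        hits = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)

def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement corrected for chance agreement."""
    n = len(y_true)
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n  # observed agreement
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    # Chance agreement from the marginal label frequencies.
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        return 1.0  # degenerate case: only one label on both sides
    return (p_o - p_e) / (1 - p_e)

# Toy example (hypothetical data): 95 normal records, 5 attacks,
# and a classifier that always predicts "normal".
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100
plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

On this toy data plain accuracy is 0.95, while balanced accuracy is 0.5 and kappa is 0.0, which is why such metrics are preferred over plain accuracy on imbalanced data.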

Prediction of Drug-Target Interaction Using Random Forest in Coronavirus Disease 2019 Case

Bioinformatics and Biomedical Research Journal ◽

10.11594/bbrj.04.01.01 ◽

2021 ◽

Vol 4 (1) ◽

pp. 1-7

Author(s):

Aulia Fadli ◽

Annisa Annisa ◽

Wisnu Ananta Kusuma

Keyword(s):

Random Forest ◽

Drug Target ◽

Imbalanced Data ◽

Drug Repurposing ◽

Extraction Process ◽

Random Forest Model ◽

Target Interaction ◽

Forest Model ◽

Random Undersampling ◽

Original Dataset

Coronavirus disease 2019 is an infectious disease that causes severe respiratory, digestive, and systemic infections and that caused a pandemic beginning in 2019. One focus of drug development against coronavirus disease 2019 is drug repurposing. This study uses random forest with a feature-based chemogenomics approach on drug-target interaction data for coronavirus disease 2019. Features are extracted from compounds and proteins using PubChem fingerprints and amino acid composition, respectively. Feature selection with XGBoost reduces the data dimension, and random undersampling is applied to address the class imbalance in the dataset. Under cross-validation, the random forest model achieved an average accuracy of 0.98, recall of 0.92, precision of 0.95, AUROC of 0.95, and F1 score of 0.93. When used to predict the original dataset (the dataset without random undersampling), the model achieved an accuracy of 0.99, recall of 0.93, precision of 0.94, AUROC of 0.99, and F-measure of 0.94.
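The amino acid composition feature mentioned in this abstract is commonly defined as the fraction of each of the 20 standard amino acids in a protein sequence. The sketch below follows that common definition; the function name, the handling of non-standard letters, and the toy sequence are assumptions, and the study's own extraction pipeline may differ.

```python
# One-letter codes of the 20 standard amino acids, in alphabetical order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def amino_acid_composition(sequence):
    """Return a 20-dimensional vector of amino acid fractions.

    Letters outside the 20 standard codes are skipped (an assumption
    made for this sketch, not something stated in the abstract).
    """
    counts = {aa: 0 for aa in AMINO_ACIDS}
    total = 0
    for ch in sequence.upper():
        if ch in counts:
            counts[ch] += 1
            total += 1
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]

# Toy sequence (hypothetical): two alanines, one cysteine, one aspartate.
features = amino_acid_composition("AACD")
```

Each protein then contributes a fixed-length vector that can be concatenated with a compound fingerprint before classification.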

Random Undersampling on Imbalance Time Series Data for Anomaly Detection

10.1145/3490725.3490748 ◽

2021 ◽

Author(s):

Mulyana Saripuddin ◽

Azizah Suliman ◽

Sera Syarmila Sameon ◽

Bo Norregaard Jorgensen

Keyword(s):

Time Series ◽

Anomaly Detection ◽

Time Series Data ◽

Series Data ◽

Random Undersampling

Classification of Electrocardiography Signals for User Authentication Based on Ensembles with Random Undersampling

2021 International Wireless Communications and Mobile Computing (IWCMC) ◽

10.1109/iwcmc51323.2021.9498788 ◽

2021 ◽

Author(s):

Silas L. Albuquerque ◽

Cristiano J. Miosso ◽

Adson Ferreira da Rocha ◽

Paulo de Lira Gondim

Keyword(s):

User Authentication ◽

Random Undersampling

Detecting web attacks using random undersampling and ensemble learners

Journal Of Big Data ◽

10.1186/s40537-021-00460-8 ◽

2021 ◽

Vol 8 (1) ◽

Author(s):

Richard Zuech ◽

John Hancock ◽

Taghi M. Khoshgoftaar

Keyword(s):

Operating Characteristic ◽

Performance Metrics ◽

Characteristic Curve ◽

Class Imbalance ◽

Classification Performance ◽

Web Attacks ◽

Random Undersampling ◽

Precision Recall Curve ◽

Research Questions ◽

Recall Curve

Class imbalance is an important consideration for cybersecurity and machine learning. We explore classification performance in detecting web attacks in the recent CSE-CIC-IDS2018 dataset. This study considers a total of eight random undersampling (RUS) ratios: no sampling, 999:1, 99:1, 95:5, 9:1, 3:1, 65:35, and 1:1. Additionally, seven different classifiers are employed: Decision Tree (DT), Random Forest (RF), CatBoost (CB), LightGBM (LGB), XGBoost (XGB), Naive Bayes (NB), and Logistic Regression (LR). For classification performance metrics, Area Under the Receiver Operating Characteristic Curve (AUC) and Area Under the Precision-Recall Curve (AUPRC) are both used to answer the following three research questions. The first question asks: “Are various random undersampling ratios statistically different from each other in detecting web attacks?” The second question asks: “Are different classifiers statistically different from each other in detecting web attacks?” Our third question asks: “Is the interaction between different classifiers and random undersampling ratios significant for detecting web attacks?” Based on our experiments, the answer to all three research questions is “Yes”. To the best of our knowledge, we are the first to apply random undersampling techniques to web attacks from the CSE-CIC-IDS2018 dataset while exploring various sampling ratios.
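To make the sampling ratios in this abstract concrete, here is a minimal sketch of random undersampling to a target majority:minority ratio. It keeps every minority (attack) instance and randomly discards majority instances. The function name, the tuple encoding of ratios, and the toy data are assumptions for illustration; the study's own implementation is not shown in the abstract.

```python
import random

# The ratios listed in the abstract, written as (majority, minority) parts.
# None stands for the "no sampling" baseline.
RUS_RATIOS = [None, (999, 1), (99, 1), (95, 5), (9, 1), (3, 1), (65, 35), (1, 1)]

def random_undersample(X, y, ratio, minority_label=1, seed=0):
    """Randomly drop majority rows until majority:minority is about `ratio`.

    All minority rows are kept. If the data is already more balanced than
    the requested ratio, nothing is removed.
    """
    if ratio is None:
        return list(X), list(y)
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == minority_label]
    majority = [i for i, label in enumerate(y) if label != minority_label]
    maj_parts, min_parts = ratio
    target = min(len(majority), round(len(minority) * maj_parts / min_parts))
    kept = sorted(minority + rng.sample(majority, target))
    return [X[i] for i in kept], [y[i] for i in kept]

# Toy data (hypothetical): 1000 normal records and 10 attacks.
X = list(range(1010))
y = [0] * 1000 + [1] * 10
X_9to1, y_9to1 = random_undersample(X, y, (9, 1))
```

With 10 attack rows, the 9:1 setting keeps 90 randomly chosen normal rows, and 1:1 keeps 10. Undersampling would normally be applied to the training split only, so that test data keeps its original class distribution.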

Investigating rarity in web attacks with ensemble learners

Journal Of Big Data ◽

10.1186/s40537-021-00462-6 ◽

2021 ◽

Vol 8 (1) ◽

Author(s):

Richard Zuech ◽

John Hancock ◽

Taghi M. Khoshgoftaar

Keyword(s):

Characteristic Curve ◽

Class Imbalance ◽

Poor Performance ◽

Classification Performance ◽

Brute Force ◽

Experimental Conditions ◽

Web Attacks ◽

Random Undersampling ◽

Positive Class ◽

Performance Area

Class rarity is a frequent challenge in cybersecurity. Rarity occurs when the positive (attack) class has only a small number of instances for machine learning classifiers to train on, which makes it difficult for the classifiers to learn the positive class and discriminate it from the negative class. To investigate rarity, we examine three individual web attacks in big data from the CSE-CIC-IDS2018 dataset: “Brute Force-Web”, “Brute Force-XSS”, and “SQL Injection”. These three web attacks are also severely imbalanced, so we evaluate whether random undersampling (RUS) treatments can improve classification performance for them. The following eight RUS ratios are evaluated: no sampling, 999:1, 99:1, 95:5, 9:1, 3:1, 65:35, and 1:1. To measure classification performance, Area Under the Receiver Operating Characteristic Curve (AUC) metrics are obtained for seven classifiers: Random Forest (RF), CatBoost (CB), LightGBM (LGB), XGBoost (XGB), Decision Tree (DT), Naive Bayes (NB), and Logistic Regression (LR). The first four are ensemble learners; the last three are single learners included for comparison. We find that applying random undersampling improves overall classification performance, as measured by AUC, in a statistically significant manner. Ensemble learners achieve the top AUC scores after massive undersampling is applied, but they break down and perform poorly (worse than NB and DT) when no sampling is applied under our unique and harsh experimental conditions of severe class imbalance and rarity.
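The AUC metric used in this abstract can be read as the probability that a randomly chosen positive (attack) instance receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below computes it by direct pairwise comparison; it is an illustrative, quadratic-time definition with assumed names and toy data, not the evaluation code used in the study.

```python
def roc_auc(y_true, scores, positive_label=1):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive_label]
    neg = [s for t, s in zip(y_true, scores) if t != positive_label]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy scores (hypothetical): two negatives followed by two positives.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

Here three of the four positive-negative pairs are ordered correctly, so the AUC is 0.75. With very few positive instances the number of pairs is small, so a handful of misranked attacks can move the score substantially.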

Using Random Undersampling Boosting Classifier to Estimate Mode Shift Response to Bus Local Network Expansion and Bus Rapid Transit Services

International Journal of Civil Engineering ◽

10.1007/s40999-021-00635-7 ◽

2021 ◽

Author(s):

Qing Li ◽

Ana Karina Ramirez Huerta ◽

Andrew C. Mao ◽

Fengxiang Qiao

Keyword(s):

Local Network ◽

Bus Rapid Transit ◽

Rapid Transit ◽

Mode Shift ◽

Network Expansion ◽

Random Undersampling ◽

Shift Response ◽

Transit Services

Resampling imbalanced data for network intrusion detection datasets

Journal Of Big Data ◽

10.1186/s40537-020-00390-x ◽

2021 ◽

Vol 8 (1) ◽

Author(s):

Sikha Bagui ◽

Kunqi Li

Keyword(s):

Machine Learning ◽

Intrusion Detection ◽

Imbalanced Data ◽

Network Intrusion Detection ◽

Resampling Methods ◽

Training Time ◽

Detection Systems ◽

Network Intrusion ◽

Random Undersampling ◽

Network Intrusion Detection Systems

Machine learning plays an increasingly significant role in building Network Intrusion Detection Systems. However, machine learning models trained on imbalanced cybersecurity data cannot effectively recognize the minority data, and hence the attacks. One way to address this issue is resampling, which adjusts the ratio between the classes to make the data more balanced. This research examines the influence of resampling on the performance of Artificial Neural Network multi-class classifiers. Five resampling methods were applied to the benchmark cybersecurity datasets KDD99, UNSW-NB15, UNSW-NB17 and UNSW-NB18: random undersampling, random oversampling, random undersampling combined with random oversampling, random undersampling with Synthetic Minority Oversampling Technique, and random undersampling with Adaptive Synthetic Sampling Method. Macro precision, macro recall, and macro F1-score were used to evaluate the results. The patterns found were as follows. First, oversampling increases the training time and undersampling decreases it. Second, if the data is extremely imbalanced, both oversampling and undersampling increase recall significantly. Third, if the data is not extremely imbalanced, resampling does not have much of an impact. Fourth, with resampling, mostly oversampling, more of the minority data (attacks) were detected.
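The macro-averaged metrics named in this abstract weight every class equally, so a rare attack class counts as much as the dominant normal class. A minimal sketch under the standard definitions follows; the function name, the zero-division convention, and the toy labels are assumptions for illustration.

```python
def macro_scores(y_true, y_pred):
    """Return (macro precision, macro recall, macro F1) over all classes.

    A class with no predicted or no true instances scores 0.0 for the
    affected metric (a convention assumed for this sketch).
    """
    classes = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    k = len(classes)
    return sum(precisions) / k, sum(recalls) / k, sum(f1s) / k

# Toy multi-class labels (hypothetical): class 0 dominates, class 2 is rare
# and is never predicted.
y_true = [0, 0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 0, 0, 0, 1, 0]
macro_p, macro_r, macro_f1 = macro_scores(y_true, y_pred)
```

In this toy case five of seven predictions are correct, yet macro recall is only 0.5 because the missed rare class contributes a recall of zero.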

Prediction of On-Street Parking Level of Service Based on Random Undersampling Decision Trees

IEEE Transactions on Intelligent Transportation Systems ◽

10.1109/tits.2021.3077985 ◽

2021 ◽

pp. 1-10

Author(s):

Ruben Fernandez Pozo ◽

Ana Belen Rodriguez Gonzalez ◽

Mark Richard Wilby ◽

Juan Jose Vinagre Diaz ◽

Miguel Viana Matesanz

Keyword(s):

Decision Trees ◽

Level Of Service ◽

Random Undersampling

Improvement of Random Undersampling to Avoid Excessive Removal of Points from a Given Area of the Majority Class

Computational Science – ICCS 2021 - Lecture Notes in Computer Science ◽

10.1007/978-3-030-77967-2_15 ◽

2021 ◽

pp. 172-186

Author(s):

Małgorzata Bach ◽

Aleksandra Werner

Keyword(s):

Random Undersampling
