An Assessment of Intrusion Detection using Machine Learning on Traffic Statistical Data

Author(s):  
Qianru Zhou ◽  
Rongzhen Li ◽  
Lei Xu ◽  
Hongyi Zhu ◽  
Wanli Liu

Detecting zero-day intrusions has long been a central goal of cybersecurity, and of intrusion detection in particular. Machine learning is widely regarded as a promising methodology for this problem; numerous models have been proposed, yet a practical solution has remained out of reach, largely because the available open datasets are out of date. In this paper, we propose a machine-learning approach to zero-day intrusion detection that uses flow-based statistical data generated by CICFlowMeter as the training dataset. The classification model is selected from the eight most popular classification models on the basis of their cross-validation results, in terms of precision, recall, F1 score, area under the curve (AUC), and time overhead. Finally, the proposed system is evaluated on the testing dataset. To assess the feasibility and efficiency of the tested models, the testing datasets are designed to contain novel types of intrusions, i.e., intrusions not seen during training. The normal data in the datasets are drawn from real-life traffic flows generated by daily use. Promising results are obtained, with accuracy approaching 100%, a false positive rate near 0%, and reasonable time overhead. We argue that, with properly selected flow-based statistical data, certain machine learning models, such as the MLP classifier, quadratic discriminant analysis, and the K-neighbors classifier, perform satisfactorily in detecting zero-day attacks.
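The model-selection step described above (comparing candidate classifiers by cross-validated precision, recall, F1, and AUC) can be sketched as follows. This is an illustrative sketch only: the data here is synthetic, whereas the paper uses CICFlowMeter flow records, and only three of the eight candidate models are shown.

```python
# Compare candidate classifiers by 5-fold cross-validation on flow-style
# features; synthetic data stands in for CICFlowMeter output.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_validate
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

X, y = make_classification(n_samples=600, n_features=20, random_state=0)

candidates = {
    "MLP": MLPClassifier(max_iter=1000, random_state=0),
    "QDA": QuadraticDiscriminantAnalysis(),
    "kNN": KNeighborsClassifier(),
}

scores = {}
for name, clf in candidates.items():
    cv = cross_validate(clf, X, y, cv=5,
                        scoring=["precision", "recall", "f1", "roc_auc"])
    scores[name] = {m: cv[f"test_{m}"].mean()
                    for m in ["precision", "recall", "f1", "roc_auc"]}

# Pick the model with the best mean F1 across folds.
best = max(scores, key=lambda n: scores[n]["f1"])
print(best, round(scores[best]["f1"], 3))
```

In the paper the same comparison additionally records time overhead before the final model is chosen.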

2021


Algorithms ◽  
2021 ◽  
Vol 14 (6) ◽  
pp. 187
Author(s):  
Aaron Barbosa ◽  
Elijah Pelofske ◽  
Georg Hahn ◽  
Hristo N. Djidjev

Quantum annealers, such as the device built by D-Wave Systems, Inc., offer a way to compute solutions of NP-hard problems that can be expressed in Ising or quadratic unconstrained binary optimization (QUBO) form. Although such solutions are typically of very high quality, problem instances are usually not solved to optimality, due to imperfections of the current generation of quantum annealers. In this contribution, we aim to understand some of the factors contributing to the hardness of a problem instance, and to use machine learning models to predict the accuracy of the D-Wave 2000Q annealer for solving specific problems. We focus on the maximum clique problem, a classic NP-hard problem with important applications in network analysis, bioinformatics, and computational chemistry. By training a machine learning classification model on basic problem characteristics, such as the number of edges in the graph, and on annealing parameters, such as the D-Wave's chain strength, we are able to rank features by their contribution to solution hardness, and we present a simple decision tree that predicts whether a problem will be solvable to optimality with the D-Wave 2000Q. We extend these results by training a machine learning regression model that predicts the clique size found by the D-Wave.
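The standard QUBO encoding of maximum clique mentioned above rewards selecting vertices and penalizes selecting any pair of non-adjacent vertices. The sketch below builds such a QUBO for a toy graph and solves it by brute force; the paper instead submits QUBOs of this form to a D-Wave 2000Q.

```python
# QUBO for maximum clique: H(x) = -sum_i x_i + P * sum_{(i,j) not in E} x_i x_j.
# Any set containing a non-edge pays more in penalty than it gains in reward,
# so the minimum-energy assignment selects a maximum clique.
from itertools import combinations, product

nodes = range(5)
edges = {(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)}  # triangle 0-1-2 plus a tail
P = 2  # penalty weight; must exceed 1 so no non-edge is ever worthwhile

Q = {(i, i): -1 for i in nodes}                    # linear reward per vertex
for i, j in combinations(nodes, 2):
    if (i, j) not in edges and (j, i) not in edges:
        Q[(i, j)] = P                              # quadratic penalty on non-edges

def energy(x):
    return sum(c * x[i] * x[j] for (i, j), c in Q.items())

# Brute-force minimum over all 2^5 assignments (an annealer replaces this step).
best = min(product([0, 1], repeat=len(nodes)), key=energy)
clique = [i for i, v in enumerate(best) if v]
print(clique)  # the triangle {0, 1, 2}
```

On real hardware the choice of P interacts with the chain strength discussed in the abstract, which is one of the annealing parameters the authors feed to their hardness predictor.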


2021 ◽  
Vol 13 (11) ◽  
pp. 6376
Author(s):  
Junseo Bae ◽  
Sang-Guk Yum ◽  
Ji-Myong Kim

Given their highly visible nature, transportation infrastructure construction projects are often exposed to numerous unexpected events compared to other types of construction projects. Despite the importance of predicting financial losses caused by risk, it is still difficult to determine which risk factors are generally critical and when these risks tend to occur, in the absence of benchmarkable references. Most existing methods are prediction-focused and project-type-specific, and they ignore the timing of risk. This study filled these knowledge gaps by developing a neural network-driven machine learning classification model that can categorize causes of financial losses according to insurance claim payout proportions and risk occurrence timing, drawing on 625 transportation infrastructure construction projects including bridges, roads, and tunnels. The developed network model showed acceptable classification accuracy of 74.1%, 69.4%, and 71.8% on the training, cross-validation, and test sets, respectively. This study is the first of its kind, providing benchmarkable classification references for economic damage trends in transportation infrastructure projects. The proposed holistic approach will help construction practitioners proactively consider the uncertainty of project management and the potential impact of natural hazards, together with risk occurrence timing trends. This study will also assist insurance companies in developing sustainable financial management plans for transportation infrastructure projects.


Computers ◽  
2021 ◽  
Vol 10 (6) ◽  
pp. 79
Author(s):  
Henry Clausen ◽  
Gudmund Grov ◽  
David Aspinall

Anomaly-based intrusion detection methods aim to combat the increasing rate of zero-day attacks; however, their success is currently restricted to the detection of high-volume attacks using aggregated traffic features. Recent evaluations show that current anomaly-based network intrusion detection methods fail to reliably detect remote access attacks. These are smaller in volume and often only stand out when compared to their surroundings. Current anomaly methods try to detect access attack events mainly as point anomalies and neglect the context in which they appear. We present and examine a contextual bidirectional anomaly model (CBAM) based on deep LSTM networks that is specifically designed to detect such attacks as contextual network anomalies. The model efficiently learns short-term sequential patterns in network flows as conditional event probabilities. Access attacks frequently break these patterns when exploiting vulnerabilities and can thus be detected as contextual anomalies. We evaluated CBAM on an assembly of three datasets that together provide representative network access attacks, real-life traffic over a long timespan, and traffic from a real-world red-team attack. We contend that this assembly is closer to a potential deployment environment than current NIDS benchmark datasets. We show that, by building a deep model, we are able to reduce the false positive rate to 0.16% (significantly below the operational range of other methods) while effectively detecting six out of seven access attacks. We further demonstrate that short-term flow structures remain stable over long periods of time, making CBAM robust against concept drift.
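The core idea of scoring events by their conditional probability given the preceding context can be illustrated without an LSTM. The toy sketch below uses a smoothed bigram model over invented flow-event symbols and a made-up threshold; it is not the authors' model, only a minimal picture of how rare transitions surface as contextual anomalies.

```python
# Learn conditional event probabilities P(cur | prev) from "normal" sequences;
# flag a sequence as anomalous if it contains a very unlikely transition.
from collections import Counter

normal_flows = [["SYN", "HTTP", "FIN"]] * 50 + [["SYN", "DNS", "FIN"]] * 50

bigrams, context = Counter(), Counter()
for flow in normal_flows:
    for prev, cur in zip(flow, flow[1:]):
        bigrams[(prev, cur)] += 1
        context[prev] += 1

vocab = len({e for f in normal_flows for e in f})

def transition_prob(prev, cur, alpha=0.01):
    # additive smoothing so unseen transitions get a small, nonzero probability
    return (bigrams[(prev, cur)] + alpha) / (context[prev] + alpha * vocab)

def is_anomalous(flow, threshold=0.05):
    return any(transition_prob(p, c) < threshold
               for p, c in zip(flow, flow[1:]))

print(is_anomalous(["SYN", "HTTP", "FIN"]))   # a normal pattern
print(is_anomalous(["SYN", "SSH", "SHELL"]))  # transitions never seen in training
```

CBAM replaces the bigram table with bidirectional LSTMs, which is what lets it model longer short-term structure than a single preceding event.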


BJGP Open ◽  
2018 ◽  
Vol 2 (2) ◽  
pp. bjgpopen18X101589 ◽  
Author(s):  
Emmanuel A Jammeh ◽  
Camille B Carroll ◽  
Stephen W Pearson ◽  
Javier Escudero ◽  
Athanasios Anastasiou ◽  
...  

Background: Up to half of patients with dementia may not receive a formal diagnosis, limiting access to appropriate services. It is hypothesised that it may be possible to identify undiagnosed dementia from a profile of symptoms recorded in routine clinical practice. Aim: The aim of this study is to develop a machine learning-based model that could be used in general practice to detect dementia from routinely collected NHS data. The model would be a useful tool for identifying people who may be living with dementia but have not been formally diagnosed. Design & setting: The study involved a case-control design and analysis of primary care data routinely collected over a 2-year period. Dementia diagnosed during the study period was compared with no diagnosis of dementia during the same period, using pseudonymised, routinely collected primary care clinical data. Method: Routinely collected Read-encoded data were obtained from 18 consenting GP surgeries across Devon, for 26 483 patients aged >65 years. The authors determined the Read codes assigned to patients that may contribute to dementia risk. These codes were used as features to train a machine learning classification model to identify patients who may have underlying dementia. Results: The model obtained sensitivity and specificity values of 84.47% and 86.67%, respectively. Conclusion: The results show that routinely collected primary care data may be used to identify undiagnosed dementia. The methodology is promising and, if successfully developed and deployed, may help to increase dementia diagnosis in primary care.
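For readers unfamiliar with the two metrics reported above, sensitivity and specificity are the per-class recall rates of a binary screen. The sketch below computes them from a confusion matrix; the counts are invented for illustration and are not the study's data.

```python
# Sensitivity = TP / (TP + FN): fraction of true dementia cases flagged.
# Specificity = TN / (TN + FP): fraction of controls correctly cleared.
def sensitivity_specificity(tp, fn, tn, fp):
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity

# Hypothetical confusion matrix for a case-control screen.
sens, spec = sensitivity_specificity(tp=87, fn=13, tn=174, fp=26)
print(f"sensitivity={sens:.2%} specificity={spec:.2%}")
```

In a case-finding setting like this one, sensitivity governs how many undiagnosed patients the tool can surface, while specificity limits the follow-up burden from false alarms.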


2019 ◽  
Author(s):  
Zied Hosni ◽  
Annalisa Riccardi ◽  
Stephanie Yerdelen ◽  
Alan R. G. Martin ◽  
Deborah Bowering ◽  
...  

Polymorphism is the capacity of a molecule to adopt different conformations or molecular packing arrangements in the solid state. This is a key property to control during pharmaceutical manufacturing, because it can affect a range of properties including stability and solubility. In this study, a novel approach based on machine learning classification methods is used to predict the likelihood of an organic compound crystallising in multiple forms. A training dataset of drug-like molecules was curated from the Cambridge Structural Database (CSD) and filtered against entries in the DrugBank database. The number of separate forms in the CSD for each molecule was recorded. A metaclassifier was trained on this dataset to predict the expected number of crystalline forms from the compound descriptors. The approach was then used to estimate the number of crystallographic forms for an external validation dataset. These results suggest this novel methodology can be used to predict the extent of polymorphism of new drugs or not-yet-screened molecules. This promising method complements expensive ab initio methods for crystal structure prediction and, as an integral part of experimental physical form screening, may identify systems with unexplored potential.


2020 ◽  
Vol 9 (10) ◽  
pp. 580 ◽  
Author(s):  
Maria Antonia Brovelli ◽  
Yaru Sun ◽  
Vasil Yordanov

Deforestation has diverse and profound consequences for the environment and for species. Its direct and indirect effects include climate change, biodiversity loss, soil erosion, floods, and landslides. For such a significant process, timely and continuous monitoring of forest dynamics is important, both to enforce existing policies and to develop new mitigation measures. The present work aimed to map and monitor forest change from 2000 to 2019, and to simulate the future forest development, of a rainforest region located in the state of Pará, Brazil. Land cover dynamics were mapped at five-year intervals with a supervised classification model deployed on the cloud processing platform Google Earth Engine. Besides reduced computational time, the service is coupled with a vast data catalogue providing access to global products, such as multispectral images from the Landsat 5, 7, and 8 and Sentinel-2 missions. Validation was performed through photointerpretation of high-resolution panchromatic images obtained from CBERS (China–Brazil Earth Resources Satellite). The results indicate that deforestation rates peaked during 2000–2006, decreased significantly and stabilised during 2006–2015, and then increased slightly until 2019. Based on the derived trends, forest dynamics were simulated for the period 2019–2028, estimating a decrease in the deforestation rate. These results demonstrate that the fusion of satellite observations, machine learning, and cloud processing benefits the analysis of forest dynamics and can inform the development of forest policies.


2013 ◽  
Vol 756-759 ◽  
pp. 3506-3510
Author(s):  
Qiu Chen Wang ◽  
Lei Wang ◽  
Ji Xiang

Traffic classification is a critical technology for network management and security monitoring. Traditional port-based and payload-based classification are no longer effective, because many applications use unpredictable port numbers and encrypt their packets. Researchers therefore apply machine learning (ML) techniques to identify traffic flows from statistical features. Looking back at related work, however, most ML-based classification algorithms achieve similar performance, and what really matters now is how to optimize these techniques. In this paper, we analyze two critical issues in ML classification (feature selection and the configuration of parameters) and present viable methods for optimizing the classification model. We also report an experimental evaluation assessing the performance improvements introduced by our optimizations; results on real-life datasets and network traffic show that the classification model achieves a significant accuracy improvement.
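The two optimization levers named above, feature selection and parameter configuration, are commonly tuned jointly. The sketch below shows one standard way to do that with a scikit-learn pipeline; it is an illustration of the idea on synthetic "flow statistics", not the paper's actual method or tooling.

```python
# Jointly search over a feature-selection budget and a classifier parameter,
# scoring each combination by 5-fold cross-validated accuracy.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=30, n_informative=6,
                           random_state=0)

pipe = Pipeline([
    ("select", SelectKBest(f_classif)),   # keep only discriminative features
    ("clf", DecisionTreeClassifier(random_state=0)),
])
grid = GridSearchCV(pipe, {
    "select__k": [5, 10, 20],             # feature-selection budget
    "clf__max_depth": [3, 5, None],       # classifier parameter
}, cv=5)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))
```

Fitting the selector inside the cross-validated pipeline, rather than once on all the data, avoids leaking test-fold information into the feature-selection step.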

