Traffic Refinery

Author(s):  
Francesco Bronzino ◽  
Paul Schmitt ◽  
Sara Ayoubi ◽  
Hyojoon Kim ◽  
Renata Teixeira ◽  
...  

Network management often relies on machine learning to make predictions about performance and security from network traffic. Often, the representation of the traffic is as important as the choice of the model. The features that the model relies on, and the representation of those features, ultimately determine model accuracy, as well as where and whether the model can be deployed in practice. Thus, the design and evaluation of these models requires understanding not only model accuracy but also the systems costs associated with deploying the model in an operational network. Towards this goal, this paper develops a new framework and system that enables a joint evaluation of both the conventional notions of machine learning performance (e.g., model accuracy) and the systems-level costs of different representations of network traffic. We highlight these two dimensions for two practical network management tasks, video streaming quality inference and malware detection, to demonstrate the importance of exploring different representations to find the appropriate operating point. We demonstrate the benefit of exploring a range of representations of network traffic and present Traffic Refinery, a proof-of-concept implementation that both monitors network traffic at 10 Gbps and transforms traffic in real time to produce a variety of feature representations for machine learning. Traffic Refinery both highlights this design space and makes it possible to explore different representations for learning, balancing systems costs related to feature extraction and model training against model accuracy.
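The trade-off the abstract describes can be made concrete with a small sketch: the same packet trace can be turned into a cheap, coarse representation or a richer, costlier one, and the extraction cost of each can be measured alongside it. This is not the actual Traffic Refinery code; the packet records, feature names, and cost measurement below are illustrative assumptions.

```python
# Minimal sketch of weighing a feature representation's systems cost
# against its richness. Toy data; not the Traffic Refinery implementation.
import time
from collections import defaultdict

packets = [  # (flow_id, timestamp_s, size_bytes) -- hypothetical toy trace
    ("flowA", 0.00, 1500), ("flowA", 0.01, 1500), ("flowB", 0.02, 60),
    ("flowA", 0.05, 400),  ("flowB", 0.06, 60),
]

def coarse_features(pkts):
    """Cheap representation: per-flow byte and packet counters."""
    feats = defaultdict(lambda: {"bytes": 0, "pkts": 0})
    for flow, _, size in pkts:
        feats[flow]["bytes"] += size
        feats[flow]["pkts"] += 1
    return dict(feats)

def fine_features(pkts, interval=0.02):
    """Richer (and costlier) representation: per-interval byte series."""
    feats = defaultdict(lambda: defaultdict(int))
    for flow, ts, size in pkts:
        feats[flow][int(ts / interval)] += size
    return {f: dict(bins) for f, bins in feats.items()}

for name, fn in [("coarse", coarse_features), ("fine", fine_features)]:
    t0 = time.perf_counter()
    feats = fn(packets)
    cost = time.perf_counter() - t0
    # state size is a crude stand-in for memory cost; cost is extraction time
    print(name, len(str(feats)), f"{cost:.6f}s")
```

In a real deployment the cost side would include per-packet processing budget at line rate and state held per flow, which is exactly the dimension the paper evaluates jointly with accuracy.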

Sensors ◽  
2021 ◽  
Vol 21 (14) ◽  
pp. 4677
Author(s):  
Razan M. AlZoman ◽  
Mohammed J. F. Alenazi

Smart city networks involve many applications that impose specific Quality of Service (QoS) requirements, thus representing a challenging scenario for network management. Solutions aiming to guarantee QoS support have not been deployed in large-scale networks. Traffic classification is a mechanism used to manage different aspects, including QoS requirements. However, conventional traffic classification methods, such as the port-based method, are inefficient because of their inability to handle dynamic port allocation and encryption. Traffic classification using machine learning has gained research interest as an alternative method to achieve high performance. In fact, machine learning embeds intelligence into network functions, thus improving network management. In this study, we apply machine learning algorithms to classify network traffic. We apply four supervised learning algorithms: support vector machine, random forest, k-nearest neighbors, and decision tree. We also apply a port-based method of traffic classification based on applications' well-known assigned port numbers. Then, we compare the results of this method to those obtained from the machine learning algorithms. The evaluation results indicate that the decision tree algorithm provides the highest average accuracy among the evaluated algorithms, at 99.18%. Moreover, network traffic classification using machine learning provides more accurate results and higher performance than the port-based method.
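The comparison of the four supervised classifiers can be reconstructed in a few lines with scikit-learn. The data below is synthetic (a stand-in for flow features such as packet sizes and durations), not the smart-city traces used in the study, so the resulting accuracies are only illustrative.

```python
# Toy comparison of the four supervised classifiers the study evaluates,
# on synthetic stand-in data rather than real traffic flows.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Synthetic features standing in for per-flow statistics; 3 traffic classes.
X, y = make_classification(n_samples=600, n_features=10, n_classes=3,
                           n_informative=6, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "SVM": SVC(),
    "Random Forest": RandomForestClassifier(random_state=0),
    "k-NN": KNeighborsClassifier(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
}
scores = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    scores[name] = accuracy_score(y_te, model.predict(X_te))
    print(f"{name}: {scores[name]:.3f}")
```

On real flow data the ranking can differ from the synthetic case; the study found the decision tree best on its dataset.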


2017 ◽  
Author(s):  
Sabrina Jaeger ◽  
Simone Fulle ◽  
Samo Turk

Inspired by natural language processing techniques, we here introduce Mol2vec, an unsupervised machine learning approach to learn vector representations of molecular substructures. Similarly to Word2vec models, where vectors of closely related words lie in close proximity in the vector space, Mol2vec learns vector representations of molecular substructures that point in similar directions for chemically related substructures. Compounds can finally be encoded as vectors by summing up the vectors of the individual substructures and, for instance, fed into supervised machine learning approaches to predict compound properties. The underlying substructure vector embeddings are obtained by training an unsupervised machine learning approach on a so-called corpus of compounds that consists of all available chemical matter. The resulting Mol2vec model is pre-trained once, yields dense vector representations, and overcomes drawbacks of common compound feature representations such as sparseness and bit collisions. The prediction capabilities are demonstrated on several compound property and bioactivity data sets and compared with results obtained for Morgan fingerprints as the reference compound representation. Mol2vec can be easily combined with ProtVec, which employs the same Word2vec concept on protein sequences, resulting in a proteochemometric approach that is alignment-independent and can thus also be easily used for proteins with low sequence similarities.
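The encoding step described above is just a vector sum over substructures. A minimal sketch, with made-up substructure identifiers and random stand-in embeddings (a real Mol2vec model trains these with Word2vec on a large compound corpus):

```python
# Mol2vec encodes a compound as the sum of its substructure vectors.
# The substructure IDs and embeddings here are hypothetical stand-ins.
import numpy as np

rng = np.random.default_rng(42)
dim = 8
# Pretend vocabulary of Morgan-style substructure identifiers.
vocab = ["sub_847", "sub_112", "sub_993", "sub_205"]
embeddings = {s: rng.normal(size=dim) for s in vocab}

def mol2vec(substructures, embeddings, dim=dim):
    """Compound vector = sum of known substructure vectors (unknowns skipped)."""
    vecs = [embeddings[s] for s in substructures if s in embeddings]
    return np.sum(vecs, axis=0) if vecs else np.zeros(dim)

compound = ["sub_847", "sub_112", "sub_112"]  # repeated substructures count twice
v = mol2vec(compound, embeddings)
print(v.shape)  # dense, fixed-length representation regardless of compound size
```

The fixed-length, dense output is what lets these vectors feed directly into standard supervised models, avoiding the sparseness and bit collisions of fingerprint representations.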


2021 ◽  
Vol 87 (2) ◽  
Author(s):  
Konrad Simon ◽  
Jörn Behrens

We introduce a new framework of numerical multiscale methods for advection-dominated problems motivated by climate sciences. Current numerical multiscale methods (MsFEM) work well on stationary elliptic problems but have difficulties when the model involves dominant lower-order terms. Our idea to overcome the associated difficulties is a semi-Lagrangian-based reconstruction of subgrid variability into a multiscale basis by solving many local inverse problems. Globally, the method looks like an Eulerian method with a multiscale stabilized basis. We show example runs in one and two dimensions and a comparison to standard methods to support our ideas, and discuss possible extensions to other types of Galerkin methods, higher dimensions, and nonlinear problems.
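The reconstruction the authors describe is built on semi-Lagrangian transport: trace characteristics backwards in time and interpolate at the departure points. A minimal 1-D sketch of that core idea (constant velocity, periodic grid, linear interpolation); the paper's multiscale basis machinery is far richer than this.

```python
# 1-D semi-Lagrangian advection sketch: follow characteristics back one
# time step and interpolate the field at the departure points.
import numpy as np

n, L, a, dt = 100, 1.0, 0.5, 0.004        # grid points, domain, speed, step
x = np.linspace(0.0, L, n, endpoint=False)
u = np.exp(-200 * (x - 0.25) ** 2)        # initial Gaussian bump, peak at 0.25

def semi_lagrangian_step(u, x, a, dt, L):
    """One step: departure points x - a*dt, then linear interpolation."""
    dx = L / len(x)
    xd = (x - a * dt) % L                  # departure points (periodic wrap)
    idx = (xd / dx).astype(int)            # left neighbour index
    w = xd / dx - idx                      # interpolation weight in [0, 1)
    return (1 - w) * u[idx] + w * u[(idx + 1) % len(x)]

for _ in range(500):                       # t = 2, so distance a*t = L: one period
    u = semi_lagrangian_step(u, x, a, dt, L)
# The bump returns to its starting position, smeared by interpolation error.
print(float(u.max()), int(np.argmax(u)))
```

Linear interpolation keeps the scheme unconditionally stable in the advective sense (no CFL restriction) at the price of numerical diffusion, visible here as the reduced peak after one full period.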


Sensors ◽  
2021 ◽  
Vol 21 (5) ◽  
pp. 1761
Author(s):  
Hanan Hindy ◽  
Robert Atkinson ◽  
Christos Tachtatzis ◽  
Ethan Bayne ◽  
Miroslav Bures ◽  
...  

Cyber-attacks continue to grow, both in terms of volume and sophistication. This is aided by an increase in available computational power, expanding attack surfaces, and advancements in the human understanding of how to make attacks undetectable. Unsurprisingly, machine learning is utilised to defend against these attacks. In many applications, the choice of features is more important than the choice of model. A range of studies have, with varying degrees of success, attempted to discriminate between benign traffic and well-known cyber-attacks. The features used in these studies are broadly similar and have demonstrated their effectiveness in situations where cyber-attacks do not imitate benign behaviour. To overcome this barrier, in this manuscript, we introduce new features based on a higher level of abstraction of network traffic. Specifically, we perform flow aggregation by grouping flows with similarities. This additional level of feature abstraction benefits from cumulative information, thus enabling the models to classify cyber-attacks that mimic benign traffic. The performance of the new features is evaluated using the benchmark CICIDS2017 dataset, and the results demonstrate their validity and effectiveness. This novel proposal will improve the detection accuracy of cyber-attacks and also build towards a new direction of feature extraction for complex attacks.
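A sketch of the flow-aggregation idea: instead of classifying single flows, group similar flows and derive cumulative features over each group. The grouping key (destination host) and the features below are illustrative assumptions, not the exact aggregation or CICIDS2017 feature set used in the paper.

```python
# Group flows by destination and compute cumulative features per group.
# Toy records; a real pipeline would parse these from packet captures.
from collections import defaultdict
from statistics import mean

flows = [  # (src, dst, pkt_count, duration_s) -- hypothetical flow records
    ("10.0.0.1", "93.184.216.34", 12, 0.8),
    ("10.0.0.2", "93.184.216.34", 11, 0.7),
    ("10.0.0.3", "93.184.216.34", 13, 0.9),
    ("10.0.0.1", "151.101.1.69",  40, 5.0),
]

def aggregate_by_dst(flows):
    """Per-destination cumulative features over the member flows."""
    groups = defaultdict(list)
    for src, dst, pkts, dur in flows:
        groups[dst].append((src, pkts, dur))
    features = {}
    for dst, members in groups.items():
        features[dst] = {
            "n_flows": len(members),
            "n_srcs": len({src for src, _, _ in members}),
            "mean_pkts": mean(p for _, p, _ in members),
            "total_duration": sum(d for _, _, d in members),
        }
    return features

agg = aggregate_by_dst(flows)
# Many short, similar flows from distinct sources towards one destination
# is exactly the pattern that single-flow features struggle to expose.
print(agg["93.184.216.34"])
```

Each flow in isolation can look benign; the group-level view accumulates evidence across flows, which is what lets a model flag attacks that mimic normal traffic.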


Electronics ◽  
2021 ◽  
Vol 10 (13) ◽  
pp. 1578
Author(s):  
Daniel Szostak ◽  
Adam Włodarczyk ◽  
Krzysztof Walkowiak

The rapid growth of network traffic drives the need for the development of new network technologies. Artificial intelligence provides suitable tools to improve currently used network optimization methods. In this paper, we propose a procedure for network traffic prediction. Based on the characteristics of optical networks (and other network technologies), we focus on the prediction of fixed bitrate levels, called traffic levels. We develop and evaluate two approaches based on different supervised machine learning (ML) methods: classification and regression. We examine four different ML models with various selected features. The tested datasets are based on real traffic patterns provided by the Seattle Internet Exchange Point (SIX). The obtained results are analyzed using a new quality metric, which allows researchers to find the best forecasting algorithm in terms of network resource usage and operational costs. Our research shows that regression provides better results than classification for all analyzed datasets. Additionally, the final choice of the most appropriate ML algorithm and model should depend on the network operator's expectations.
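The traffic-level framing can be sketched briefly: a regression model predicts a continuous bitrate that is then mapped to the smallest level able to carry it, while a classification model would emit the level directly. The level values and the naive last-value predictor below are illustrative assumptions, not the paper's ML models or its SIX-based datasets.

```python
# Discretizing continuous bitrate forecasts onto fixed traffic levels,
# and counting under- vs. over-provisioned predictions.
import numpy as np

levels = np.array([10, 20, 40, 80])  # hypothetical bitrate levels (Gbps)

def to_level(bitrate, levels):
    """Smallest level that can carry the predicted bitrate (else the max)."""
    for lv in levels:
        if bitrate <= lv:
            return lv
    return levels[-1]

traffic = np.array([12.0, 14.5, 19.0, 33.0, 37.5, 41.0])  # toy bitrate series

preds = traffic[:-1]                       # naive last-value forecast
actual = traffic[1:]
pred_levels = [to_level(p, levels) for p in preds]
true_levels = [to_level(a, levels) for a in actual]

under = sum(p < t for p, t in zip(pred_levels, true_levels))  # service risk
over = sum(p > t for p, t in zip(pred_levels, true_levels))   # wasted capacity
print(pred_levels, true_levels, under, over)
```

A quality metric in this setting naturally weighs the two error types asymmetrically, since under-provisioning a level threatens service while over-provisioning only wastes resources; this is the kind of trade-off the paper's metric captures.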

