PCIU: Hardware Implementations of an Efficient Packet Classification Algorithm with an Incremental Update Capability

International Journal of Reconfigurable Computing ◽

10.1155/2011/648483 ◽

2011 ◽

Vol 2011 ◽

pp. 1-21 ◽

Cited By ~ 5

Author(s):

O. Ahmed ◽

S. Areibi ◽

K. Chattha ◽

B. Kelly

Keyword(s):

State Of The Art ◽

Packet Classification ◽

Classification Algorithm ◽

Software Implementation ◽

Network Services ◽

Hardware Accelerator ◽

Memory Consumption ◽

Hardware Implementations ◽

Speed Up ◽

Incremental Update

Packet classification plays a crucial role in a number of network services, such as policy-based routing, firewalls, and traffic billing. However, classification can become a bottleneck in these applications if it is not implemented efficiently. In this paper, we propose PCIU, a novel classification algorithm that improves upon previously published work. PCIU provides lower preprocessing time, lower memory consumption, ease of incremental rule update, and reasonable classification time compared to state-of-the-art algorithms. The proposed algorithm was evaluated and compared to RFC and HiCut using several benchmarks. The results indicate that PCIU outperforms these algorithms in terms of speed, memory usage, incremental update capability, and preprocessing time. Furthermore, the algorithm was improved and made more accessible for a variety of applications through implementation in hardware. Two such implementations are detailed and discussed in this paper. The results indicate that a hardware/software codesign approach yields a PCIU solution that is slower but easier to optimize and improve within time constraints. A hardware accelerator based on an ESL approach using Handel-C, on the other hand, achieved a 31x speed-up over a pure software implementation running on a state-of-the-art Xeon processor.
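For readers unfamiliar with the problem this abstract addresses, the following is a minimal sketch of the naive baseline: a linear search over a prioritized rule list, with rule insertion and deletion standing in for "incremental update". It is not the PCIU algorithm, whose internals the abstract does not describe. The names `Rule`, `LinearClassifier`, the five-field packet tuple, and the "lower number means higher priority" convention are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Range = Tuple[int, int]                   # inclusive [low, high]
Packet = Tuple[int, int, int, int, int]   # src_ip, dst_ip, src_port, dst_port, protocol


@dataclass
class Rule:
    priority: int      # assumed convention: lower value wins
    src_ip: Range
    dst_ip: Range
    src_port: Range
    dst_port: Range
    protocol: Range
    action: str

    def matches(self, packet: Packet) -> bool:
        # A rule matches when every header field falls inside its range.
        fields = (self.src_ip, self.dst_ip, self.src_port,
                  self.dst_port, self.protocol)
        return all(lo <= value <= hi
                   for (lo, hi), value in zip(fields, packet))


class LinearClassifier:
    """Hypothetical baseline: O(n) lookup, trivial incremental updates."""

    def __init__(self) -> None:
        self.rules: List[Rule] = []

    def insert(self, rule: Rule) -> None:
        # No preprocessing beyond keeping the list ordered by priority.
        self.rules.append(rule)
        self.rules.sort(key=lambda r: r.priority)

    def delete(self, priority: int) -> None:
        self.rules = [r for r in self.rules if r.priority != priority]

    def classify(self, packet: Packet) -> Optional[str]:
        # Return the action of the highest-priority matching rule, if any.
        for rule in self.rules:
            if rule.matches(packet):
                return rule.action
        return None


ANY_IP: Range = (0, 2**32 - 1)
ANY_PORT: Range = (0, 65535)

clf = LinearClassifier()
clf.insert(Rule(2, ANY_IP, ANY_IP, ANY_PORT, ANY_PORT, (0, 255), "deny"))
clf.insert(Rule(1, ANY_IP, ANY_IP, ANY_PORT, (80, 80), (6, 6), "allow"))

web_packet: Packet = (0x0A000001, 0x0A000002, 12345, 80, 6)
print(clf.classify(web_packet))
```

Updates are cheap in this baseline, but lookup time grows linearly with the rule count. Balancing lookup speed, memory, preprocessing time, and update cost is the trade-off that the algorithms listed here aim to improve.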

Corrigendum to “PCIU: Hardware Implementations of an Efficient Packet Classification Algorithm with an Incremental Update Capability”

International Journal of Reconfigurable Computing ◽

10.1155/2018/9595171 ◽

2018 ◽

Vol 2018 ◽

pp. 1-1

Author(s):

O. Ahmed ◽

S. Areibi ◽

K. Chattha ◽

B. Kelly

Keyword(s):

Packet Classification ◽

Classification Algorithm ◽

Hardware Implementations ◽

Incremental Update

A Fast, Smart Packet Classification Algorithm Based on Decomposition

Journal of Control Science and Engineering ◽

10.1155/2020/8843471 ◽

2020 ◽

Vol 2020 ◽

pp. 1-11

Author(s):

Chuanhong Li ◽

Xuewen Zeng ◽

Lei Song ◽

Yan Jiang

Keyword(s):

Vital Role ◽

Packet Classification ◽

Classification Algorithm ◽

Set Partitioning ◽

Experimental Results ◽

Classification Algorithms ◽

Memory Consumption ◽

Memory Overhead ◽

Rule Sets ◽

Rule Set

Packet classification algorithms have been a focus of research in recent years because of the vital role they play in various services based on packet forwarding. However, as the number of rules in the rule set increases, both the preprocessing time and the memory consumption grow greatly. In this paper, we first model and analyze this issue in depth. Then, a fast, smart packet classification algorithm based on decomposition is proposed. Through boundary-based rule traversal and smart rule set partitioning, both the preprocessing time and memory consumption are reduced dramatically. Experimental results show that the preprocessing time of our method improves by up to 8.8 times compared with PCIU and by about 31.5 times on average compared with CutSplit for large rule sets. Meanwhile, the memory overhead is reduced by up to 40%, and by 27.5% on average, compared with PCIU.

An Impulse-C Hardware Accelerator for Packet Classification Based on Fine/Coarse Grain Optimization

International Journal of Reconfigurable Computing ◽

10.1155/2013/130765 ◽

2013 ◽

Vol 2013 ◽

pp. 1-23 ◽

Cited By ~ 1

Author(s):

O. Ahmed ◽

S. Areibi ◽

R. Collier ◽

G. Grewal

Keyword(s):

Poor Performance ◽

Electronic System ◽

General Purpose ◽

Packet Classification ◽

Optimization Techniques ◽

System Level ◽

Coarse Grain ◽

Hardware Accelerator ◽

General Purpose Processor ◽

Incremental Update

Current software-based packet classification algorithms exhibit relatively poor performance, prompting many researchers to concentrate on novel frameworks and architectures that employ both hardware and software components. The Packet Classification with Incremental Update (PCIU) algorithm (Ahmed et al., 2010) is a novel and efficient packet classification algorithm with a unique incremental update capability that demonstrated excellent results and was shown to be scalable for many different tasks and clients. While a pure software implementation can generate powerful results on a server machine, an embedded solution may be more desirable for some applications and clients. Embedded, specialized solutions based on hardware accelerators are typically much more efficient in speed, cost, and size than solutions implemented on general-purpose processor systems. This paper explores the design space of translating the PCIU algorithm into hardware by utilizing several optimization techniques, ranging from fine grain to coarse grain and parallel coarse grain approaches. The paper presents a detailed implementation of a hardware accelerator of the PCIU based on an Electronic System Level (ESL) approach. The results indicate that the hardware accelerator achieves, on average, a 27x speedup over a state-of-the-art Xeon processor.

Hardware Accelerators Targeting a Novel Group Based Packet Classification Algorithm

International Journal of Reconfigurable Computing ◽

10.1155/2013/681894 ◽

2013 ◽

Vol 2013 ◽

pp. 1-33 ◽

Cited By ~ 3

Author(s):

O. Ahmed ◽

S. Areibi ◽

G. Grewal

Keyword(s):

Packet Classification ◽

Classification Algorithm ◽

Hardware Accelerators ◽

Instruction Set ◽

Worst Case ◽

Hardware Implementations ◽

Fast Network ◽

Rule Set ◽

Application Specific ◽

Case Classification

Packet classification is a ubiquitous and key building block for many critical network devices. However, it remains one of the main bottlenecks faced when designing fast network devices. In this paper, we propose a novel Group Based Search packet classification Algorithm (GBSA) that is scalable, fast, and efficient. GBSA consumes an average of 0.4 megabytes of memory for a 10 k rule set. The worst-case classification time per packet is 2 microseconds, and the preprocessing speed is 3 M rules/second on a Xeon processor operating at 3.4 GHz. When compared with other state-of-the-art classification techniques, the results showed that GBSA outperforms the competition with respect to speed, memory usage, and processing time. Moreover, GBSA is amenable to implementation in hardware. Three different hardware implementations are also presented in this paper, including an Application Specific Instruction Set Processor (ASIP) implementation and two pure Register-Transfer Level (RTL) implementations based on Impulse-C and Handel-C flows, respectively. Speedups achieved with these hardware accelerators ranged from 9x to 18x compared with a pure software implementation running on a Xeon processor.

OLBVH: octree linear bounding volume hierarchy for volumetric meshes

The Visual Computer ◽

10.1007/s00371-020-01886-6 ◽

2020 ◽

Vol 36 (10-12) ◽

pp. 2327-2340 ◽

Cited By ~ 1

Author(s):

Daniel Ströter ◽

Johannes S. Mueller-Roemer ◽

André Stork ◽

Dieter W. Fellner

Keyword(s):

Data Structure ◽

State Of The Art ◽

Memory Consumption ◽

Bounding Volumes ◽

Bounding Volume ◽

Speed Up ◽

Slicing Method ◽

Testing Speed ◽

Bounding Volume Hierarchy ◽

Memory Efficient

We present a novel bounding volume hierarchy for GPU-accelerated direct volume rendering (DVR) as well as volumetric mesh slicing and inside-outside intersection testing. Our novel octree-based data structure is laid out linearly in memory using space filling Morton curves. As our new data structure results in tightly fitting bounding volumes, boundary markers can be associated with nodes in the hierarchy. These markers can be used to speed up all three use cases that we examine. In addition, our data structure is memory-efficient, reducing memory consumption by up to 75%. Tree depth and memory consumption can be controlled using a parameterized heuristic during construction. This allows for significantly shorter construction times compared to the state of the art. For GPU-accelerated DVR, we achieve performance gain of 8.4×–13×. For 3D printing, we present an efficient conservative slicing method that results in a 3×–25× speedup when using our data structure. Furthermore, we improve volumetric mesh intersection testing speed by 5×–52×.
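The "laid out linearly in memory using space filling Morton curves" idea can be illustrated with a small sketch. It shows only generic 3D Morton (Z-order) encoding by bit interleaving, not the OLBVH construction itself. The function names `morton3d_encode` and `morton3d_decode`, the 10-bits-per-axis default, and the x-lowest bit ordering are assumptions chosen for illustration.

```python
from typing import Tuple


def morton3d_encode(x: int, y: int, z: int, bits: int = 10) -> int:
    """Interleave the low `bits` bits of x, y, z into one Morton code.

    Bit i of x lands at position 3*i, of y at 3*i + 1, of z at 3*i + 2.
    """
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)
        code |= ((y >> i) & 1) << (3 * i + 1)
        code |= ((z >> i) & 1) << (3 * i + 2)
    return code


def morton3d_decode(code: int, bits: int = 10) -> Tuple[int, int, int]:
    """Inverse of morton3d_encode: recover (x, y, z) from a Morton code."""
    x = y = z = 0
    for i in range(bits):
        x |= ((code >> (3 * i)) & 1) << i
        y |= ((code >> (3 * i + 1)) & 1) << i
        z |= ((code >> (3 * i + 2)) & 1) << i
    return x, y, z


# Sorting grid cells by Morton code gives a linear order in which the eight
# children of any octree node occupy one contiguous run of codes.
cells = [(x, y, z) for z in range(2) for y in range(2) for x in range(2)]
ordered = sorted(cells, key=lambda c: morton3d_encode(*c))
print(ordered)
```

Because each group of three code bits selects one octant at one level, a prefix of the code identifies an octree node. That property is what makes a pointer-free linear layout possible in general.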

Review on biomass feedstocks, pyrolysis mechanism and physicochemical properties of biochar: State-of-the-art framework to speed up vision of circular bioeconomy

Journal of Cleaner Production ◽

10.1016/j.jclepro.2021.126645 ◽

2021 ◽

Vol 297 ◽

pp. 126645

Author(s):

Gajanan Sampatrao Ghodake ◽

Surendra Krushna Shinde ◽

Avinash Ashok Kadam ◽

Rijuta Ganesh Saratale ◽

Ganesh Dattatraya Saratale ◽

...

Keyword(s):

Physicochemical Properties ◽

State Of The Art ◽

Pyrolysis Mechanism ◽

Biomass Feedstocks ◽

Speed Up

MultilayerTuple: A General, Scalable and High-performance Packet Classification Algorithm for Software Defined Network System

2021 IFIP Networking Conference (IFIP Networking) ◽

10.23919/ifipnetworking52078.2021.9472824 ◽

2021 ◽

Author(s):

Chunyang Zhang ◽

Gaogang Xie

Keyword(s):

High Performance ◽

Packet Classification ◽

Classification Algorithm ◽

Network System ◽

Software Defined Network

A Speed-up K-Nearest Neighbor Classification Algorithm for Trojan Detection

Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering - Advanced Hybrid Information Processing ◽

10.1007/978-3-030-19086-6_24 ◽

2019 ◽

pp. 214-224

Author(s):

Tianshuang Li ◽

Xiang Ji ◽

Jingmei Li

Keyword(s):

Nearest Neighbor ◽

Classification Algorithm ◽

K Nearest Neighbor ◽

Nearest Neighbor Classification ◽

Trojan Detection ◽

Speed Up ◽

Neighbor Classification

Efficient packet classification algorithm based on entropy

Proceedings of the 6th ACM/IEEE Symposium on Architectures for Networking and Communications Systems - ANCS '10 ◽

10.1145/1872007.1872021 ◽

2010 ◽

Author(s):

Michal Kajan ◽

Jan Kořenek

Keyword(s):

Packet Classification ◽

Classification Algorithm

ConnectIt

Proceedings of the VLDB Endowment ◽

10.14778/3436905.3436923 ◽

2020 ◽

Vol 14 (4) ◽

pp. 653-667

Author(s):

Laxman Dhulipala ◽

Changwan Hong ◽

Julian Shun

Keyword(s):

Experimental Evaluation ◽

Comprehensive Evaluation ◽

State Of The Art ◽

Graph Connectivity ◽

Connected Components ◽

Sampling Strategies ◽

Spanning Forest ◽

Speed Up ◽

Minimum Spanning Forest ◽

Edge Sampling

Connected components is a fundamental kernel in graph applications. The fastest existing multicore algorithms for solving graph connectivity are based on some form of edge sampling and/or linking and compressing trees. However, many combinations of these design choices have been left unexplored. In this paper, we design the ConnectIt framework, which provides different sampling strategies as well as various tree linking and compression schemes. ConnectIt enables us to obtain several hundred new variants of connectivity algorithms, most of which extend to computing spanning forest. In addition to static graphs, we also extend ConnectIt to support mixes of insertions and connectivity queries in the concurrent setting. We present an experimental evaluation of ConnectIt on a 72-core machine, which we believe is the most comprehensive evaluation of parallel connectivity algorithms to date. Compared to a collection of state-of-the-art static multicore algorithms, we obtain an average speedup of 12.4x (2.36x average speedup over the fastest existing implementation for each graph). Using ConnectIt, we are able to compute connectivity on the largest publicly-available graph (with over 3.5 billion vertices and 128 billion edges) in under 10 seconds using a 72-core machine, providing a 3.1x speedup over the fastest existing connectivity result for this graph, in any computational setting. For our incremental algorithms, we show that our algorithms can ingest graph updates at up to several billion edges per second. To guide the user in selecting the best variants in ConnectIt for different situations, we provide a detailed analysis of the different strategies. Finally, we show how the techniques in ConnectIt can be used to speed up two important graph applications: approximate minimum spanning forest and SCAN clustering.
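The phrase "linking and compressing trees" refers to the union-find family of techniques. The following is a minimal sequential sketch of one such combination (link the larger root under the smaller, with full path compression). It is not the ConnectIt framework, which is concurrent and supports many variants plus sampling. The names `UnionFind` and `connected_components` are hypothetical.

```python
from typing import Iterable, List, Tuple


class UnionFind:
    """Sequential union-find: one assumed link rule, full path compression."""

    def __init__(self, n: int) -> None:
        self.parent: List[int] = list(range(n))

    def find(self, v: int) -> int:
        # First pass: walk up to the root of v's tree.
        root = v
        while self.parent[root] != root:
            root = self.parent[root]
        # Second pass (compression): point every visited vertex at the root.
        while self.parent[v] != root:
            next_v = self.parent[v]
            self.parent[v] = root
            v = next_v
        return root

    def union(self, u: int, v: int) -> None:
        # Linking rule: attach the larger root id under the smaller one.
        ru, rv = self.find(u), self.find(v)
        if ru == rv:
            return
        if ru < rv:
            self.parent[rv] = ru
        else:
            self.parent[ru] = rv


def connected_components(n: int, edges: Iterable[Tuple[int, int]]) -> List[int]:
    """Label each vertex with the smallest vertex id in its component."""
    uf = UnionFind(n)
    for u, v in edges:
        uf.union(u, v)
    return [uf.find(v) for v in range(n)]


labels = connected_components(6, [(0, 1), (1, 2), (3, 4)])
print(labels)
```

Edges can also be fed to `union` one at a time and interleaved with `find` queries, which is the sequential analogue of the mixed insertion and connectivity-query setting the abstract mentions.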
