Mapping a Guided Image Filter on the HARP Reconfigurable Architecture Using OpenCL

Algorithms ◽  
2019 ◽  
Vol 12 (8) ◽  
pp. 149
Author(s):  
Thomas Faict ◽  
Erik H. D’Hollander ◽  
Bart Goossens

Intel recently introduced the Heterogeneous Architecture Research Platform (HARP). In this platform, the Central Processing Unit and a Field-Programmable Gate Array are connected through a high-bandwidth, low-latency interconnect, and both share DRAM memory. For this platform, Open Computing Language (OpenCL), a High-Level Synthesis (HLS) language, is made available. By making use of HLS, a faster design cycle can be achieved compared to programming in a traditional hardware description language. This, however, comes at the cost of having less control over the hardware implementation. We investigate how OpenCL can be applied to implement a real-time guided image filter on the HARP platform. In a first phase, the performance-critical parameters of the OpenCL programming model are determined using several specialized benchmarks. In a second phase, the guided image filter algorithm is implemented using the insights gained in the first phase. Both a floating-point and a fixed-point implementation were developed for this algorithm, based on a sliding-window approach. This resulted in a maximum floating-point performance of 135 GFLOPS, a maximum fixed-point performance of 430 GOPS, and a throughput of HD color images at 74 frames per second.
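The guided filter at the core of this work admits a compact reference formulation built entirely from local box means; the sketch below is a minimal NumPy version of that textbook recurrence. It is illustrative only: the function names and the default `r` and `eps` values are assumptions, not the paper's OpenCL kernels.

```python
import numpy as np

def box_mean(img, r):
    """Mean over a (2r+1) x (2r+1) window, edge-padded, via 2-D cumulative sums."""
    n = 2 * r + 1
    pad = np.pad(img, r, mode="edge")
    c = np.cumsum(np.cumsum(pad, axis=0), axis=1)
    c = np.pad(c, ((1, 0), (1, 0)))              # prepend a zero row/column
    return (c[n:, n:] - c[n:, :-n] - c[:-n, n:] + c[:-n, :-n]) / (n * n)

def guided_filter(I, p, r=2, eps=1e-3):
    """Classic guided filter: q = a*I + b with a, b fitted per local window."""
    mean_I, mean_p = box_mean(I, r), box_mean(p, r)
    cov_Ip = box_mean(I * p, r) - mean_I * mean_p
    var_I = box_mean(I * I, r) - mean_I * mean_I
    a = cov_Ip / (var_I + eps)                   # linear coefficient per window
    b = mean_p - a * mean_I
    return box_mean(a, r) * I + box_mean(b, r)   # average overlapping windows
```

Because every stage is a box filter, the computation maps naturally onto the sliding-window structure the abstract mentions.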

2021 ◽  
Vol 119 ◽  
pp. 07002
Author(s):  
Youness Rtal ◽  
Abdelkader Hadjoudja

Graphics Processing Units (GPUs) are microprocessors dedicated to displaying and manipulating graphics data, and they now equip all modern graphics cards. In just a few years, these microprocessors have become potent tools for massively parallel computing. They serve in several fields, such as image processing, video and audio encoding and decoding, and the solution of physical systems with one or more unknowns. Their advantages are faster processing and lower energy consumption than the central processing unit (CPU). In this paper, we define and implement the Lagrange polynomial interpolation method on GPU and CPU to calculate the sodium density at different temperatures Ti, using the NVIDIA CUDA C parallel programming model, which can increase computational performance by harnessing the power of the GPU. The objective of this study is to compare the performance of the implementation of the Lagrange interpolation method on CPU and GPU processors and to deduce the efficiency of using GPUs for parallel computing.
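For reference, the Lagrange interpolation formula that the paper ports to CUDA can be stated in a few lines; the version below is a plain scalar Python sketch of the formula itself, not the paper's GPU kernel.

```python
def lagrange(xs, ys, x):
    """Evaluate the Lagrange polynomial through the points (xs[i], ys[i]) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)   # basis polynomial L_i(x)
        total += term
    return total
```

Since every evaluation point is independent, a GPU implementation would typically assign one evaluation point per CUDA thread, which is what makes the method a natural fit for massively parallel hardware.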


2016 ◽  
Vol 6 (1) ◽  
pp. 79-90
Author(s):  
Łukasz Syrocki ◽  
Grzegorz Pestka

Abstract A ready-to-use set of functions is provided to facilitate solving a generalized eigenvalue problem for symmetric matrices, in order to efficiently calculate eigenvalues and eigenvectors using Compute Unified Device Architecture (CUDA) technology from NVIDIA. An integral part of CUDA is a high-level programming environment that enables tracking code executed both on the Central Processing Unit and on the Graphics Processing Unit. The presented matrix structures allow for analysis of the advantages of using graphics processors in such calculations.
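The standard reduction behind such solvers, turning A v = w B v into an ordinary symmetric problem via the Cholesky factor of B, can be sketched on the CPU with NumPy. This is a reference sketch of the textbook reduction only; the paper's CUDA routines are not reproduced here.

```python
import numpy as np

def generalized_eigh(A, B):
    """Solve A v = w B v for symmetric A and symmetric positive-definite B
    by reducing to a standard symmetric eigenproblem."""
    L = np.linalg.cholesky(B)          # B = L @ L.T
    Linv = np.linalg.inv(L)
    C = Linv @ A @ Linv.T              # standard symmetric problem C y = w y
    w, y = np.linalg.eigh(C)
    v = Linv.T @ y                     # back-transform the eigenvectors
    return w, v
```

On a GPU, the matrix products and the symmetric eigensolver are the stages that benefit from parallel execution.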


2012 ◽  
Vol 2012 ◽  
pp. 1-14 ◽  
Author(s):  
Daniel Menard ◽  
Nicolas Herve ◽  
Olivier Sentieys ◽  
Hai-Nam Nguyen

Implementing signal processing applications in embedded systems generally requires the use of fixed-point arithmetic. The main problem slowing down the hardware implementation flow is the lack of high-level development tools to target these architectures from an algorithmic specification written with floating-point data types. In this paper, a new method is proposed to automatically implement a floating-point algorithm in an FPGA or an ASIC using fixed-point arithmetic. An iterative process over high-level synthesis and data word-length optimization improves both of these interdependent steps. Indeed, high-level synthesis requires knowledge of operator word-lengths to correctly execute its allocation, scheduling, and resource-binding steps; conversely, word-length optimization requires resource-binding and scheduling information to correctly group operations. To dramatically reduce optimization time compared to fixed-point simulation-based methods, accuracy is evaluated through an analytical method. Experiments on several signal processing algorithms show the efficiency of the proposed method: compared to classical methods, the average architecture area reduction is between 10% and 28%.
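Per operator, the floating-point-to-fixed-point conversion at the heart of this flow reduces to scaling values by a power of two and rescaling after multiplications. The helpers below are a minimal illustrative sketch (the names and word lengths are assumptions, not the paper's tool):

```python
def to_fixed(x, frac_bits):
    """Quantize a float to a fixed-point integer with `frac_bits` fractional bits."""
    return int(round(x * (1 << frac_bits)))

def to_float(x, frac_bits):
    """Convert a fixed-point integer back to a float."""
    return x / (1 << frac_bits)

def fixed_mul(a, b, frac_bits):
    """Multiply two fixed-point values; the raw product carries 2*frac_bits
    fractional bits, so shift right to restore the format (truncation,
    as in low-cost hardware)."""
    return (a * b) >> frac_bits
```

Choosing `frac_bits` per operator is exactly the word-length optimization the paper automates: more bits reduce quantization error but enlarge the operator, which is why word-length choices and synthesis decisions depend on each other.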


2021 ◽  
Vol 12 ◽  
Author(s):  
Sergio Gálvez ◽  
Federico Agostini ◽  
Javier Caselli ◽  
Pilar Hernandez ◽  
Gabriel Dorado

New High-Performance Computing architectures have recently been developed for commercial central processing units (CPUs). Yet, this has not improved the execution time of widely used bioinformatics applications, like BLAST+. This is due to a lack of optimization between the bases of the existing algorithms and the internals of the hardware, which prevents taking full advantage of the available CPU cores. To exploit the new architectures, algorithms must be revised and redesigned; usually, rewritten from scratch. BLVector adapts the high-level concepts of BLAST+ to x86 architectures with AVX-512, to harness their capabilities. A deep, comprehensive study has been carried out to optimize the approach, with a significant reduction in execution time. BLVector reduces the execution time of BLAST+ when aligning up to mid-size protein sequences (∼750 amino acids). The gain in real-world scenarios is 3.2-fold. When applied to longer proteins, BLVector consumes more time than BLAST+, but retrieves a much larger set of results. BLVector and BLAST+ are fine-tuned heuristics, so the relevant results returned by both are the same, although they behave differently, especially when performing alignments with low scores. Hence, they can be considered complementary bioinformatics tools.
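The lane-parallel scoring style that AVX-512 enables can be mimicked with NumPy's vectorized comparisons. The sketch below scores every alignment offset of a short query against a target in one vectorized step, using a toy +1/-1 match/mismatch score; it is an illustration of the SIMD idea, not BLVector's actual heuristic.

```python
import numpy as np

def match_scores(query, target):
    """Score every alignment offset of `query` against `target` at once
    (+1 per match, -1 per mismatch), mimicking lane-parallel SIMD scoring."""
    q = np.frombuffer(query.encode(), dtype=np.uint8)
    t = np.frombuffer(target.encode(), dtype=np.uint8)
    windows = np.lib.stride_tricks.sliding_window_view(t, len(q))
    return np.where(windows == q, 1, -1).sum(axis=1)
```

A hand-written AVX-512 kernel performs the same comparison across 64 byte lanes per instruction, which is where the speedup over scalar code comes from.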


Electronics ◽  
2020 ◽  
Vol 9 (8) ◽  
pp. 1239
Author(s):  
Yung-Hao Tung ◽  
Hung-Chuan Wei ◽  
Yen-Wu Ti ◽  
Yao-Tung Tsou ◽  
Neetesh Saxena ◽  
...  

Software-defined networking (SDN) is a new networking architecture with a centralized control mechanism. SDN has proven successful in improving not only network performance, but also security. However, centralized control in the SDN architecture introduces new security vulnerabilities. In particular, user-datagram-protocol (UDP) flooding attacks can be easily launched and cause serious packet-transmission delays, controller-performance loss, and even network shutdown. In response to applications in the Internet of Things (IoT) field, this study considers UDP flooding attacks in SDN and proposes two lightweight countermeasures. The first method sometimes sacrifices address-resolution-protocol (ARP) requests to achieve a high level of security. In the second method, although some packets must be sacrificed during an attack before the defense starts, detecting the network state prevents normal packets from being sacrificed. When blocking a network attack, traffic from the affected port is blocked directly, without affecting normal ports. The performance and security of the proposed methods were confirmed by means of extensive experiments. Compared with the situation where no defense, or a similar defense method, is implemented, after simulating a UDP flooding attack our proposed method performed better in terms of available bandwidth, central-processing-unit (CPU) consumption, and network delay time.
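A per-port rate-threshold detector of the general kind described, one that blocks only the offending port and leaves normal ports untouched, can be sketched as follows. The class, threshold, and window mechanics are hypothetical illustrations, not the paper's SDN flow rules.

```python
class PortFloodDetector:
    """Per-port packet-rate threshold: a port exceeding the limit within a
    time window is blocked; other ports keep forwarding normally."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.counts = {}      # packets seen per port in the current window
        self.blocked = set()  # ports shut down after exceeding the threshold

    def packet(self, port):
        if port in self.blocked:
            return "drop"
        self.counts[port] = self.counts.get(port, 0) + 1
        if self.counts[port] > self.threshold:
            self.blocked.add(port)   # block only the affected port
            return "drop"
        return "forward"

    def end_window(self):
        self.counts.clear()          # rate counters reset; blocks persist
```

In an SDN deployment, the controller would install a drop rule for the blocked port on the switch rather than dropping packets itself.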


2021 ◽  
Vol 15 (1) ◽  
pp. 20
Author(s):  
Robert Karamagi

Phishing has become the most convenient technique that hackers use nowadays to gain access to protected systems. This is because cybersecurity has evolved: even low-cost systems with minimal security investment now require quite advanced and sophisticated mechanisms to penetrate technically. Systems today are equipped with at least some level of security, imposed by security firms with a very high level of expertise in managing the common and well-known attacks, which decreases the possible technical attack surface. Nation-states or advanced persistent threats (APTs), organized crime, and black hats possess the finance and skills to penetrate many different systems. However, they are always in need of additional computing resources, such as central processing unit (CPU) and random-access memory (RAM) capacity, so they normally hack and hook computers into a botnet. This may allow them to perform dangerous distributed denial-of-service (DDoS) attacks and to run brute-force cracking algorithms, which are highly CPU-intensive. They may also use the zombie or drone systems they have hacked to hide their location on the net and gain anonymity, bouncing their traffic through those systems many times a minute. Phishing allows them to grow their pool of compromised systems and increase their power. For a normal hacker without the money to invest in sophisticated techniques, exploiting the human factor, the weakest link in security, comes in handy. Successfully manipulating a human into releasing the security that they set up makes the hacker's life very easy, because they do not have to break into the system with force; rather, the owner simply opens the door for them. The objective of this research is to review factors that enhance phishing and improve the probability of its success. We have discovered that hackers rely on triggering the emotional effects of their victims through their phishing attacks.
We have applied artificial intelligence to detect the emotion associated with a phrase or sentence. Our model achieved good accuracy, which could be improved with a larger dataset containing more emotional sentiments for various phrases and sentences. Our technique may be used to check for emotional manipulation in suspicious emails, to improve the confidence interval of suspected phishing emails.
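A crude keyword-lexicon baseline illustrates the idea of scoring a message's emotional content; the lexicon, keywords, and threshold below are invented for illustration, whereas the paper's approach is a trained model.

```python
import re

# Invented mini-lexicon: emotions phishers commonly try to trigger.
EMOTION_LEXICON = {
    "fear": {"suspended", "locked", "unauthorized", "warning"},
    "urgency": {"immediately", "now", "expires", "urgent"},
    "greed": {"winner", "prize", "free", "reward"},
}

def emotion_scores(text):
    """Count lexicon hits per emotion in a message."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return {emo: len(words & kws) for emo, kws in EMOTION_LEXICON.items()}

def looks_manipulative(text, min_hits=2):
    """Flag a message whose total emotional-keyword count reaches a threshold."""
    return sum(emotion_scores(text).values()) >= min_hits
```

A learned classifier generalizes beyond exact keywords, but even this baseline shows how emotional loading can be turned into a numeric signal for phishing triage.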


2008 ◽  
Vol 112 (1136) ◽  
pp. 599-607
Author(s):  
K. Takeda ◽  
S. J. Newman ◽  
J. Kenny ◽  
M. Zyskowski

Abstract The development of commodity flight simulation, in the form of PC game technology, continues to advance at a rapid pace. Indeed, the software industry is now being driven primarily by the requirements of gaming, digital media, and other entertainment applications. This has largely been due to the commoditisation of computer hardware, which is apparent when considering recent trends in central processing unit and graphics processor development. The flight simulation industry has benefited from this trend of hardware commoditisation, and will continue to do so for the foreseeable future. It is, however, yet to fully realise the potential for leveraging commodity-off-the-shelf (COTS) software. In this paper the opportunities presenting themselves for the next 25 years of flight simulation are discussed, as the requirements of the aviation and games software industries converge. A SWOT (strengths-weaknesses-opportunities-threats) analysis of the commodity flight simulation software industry is presented, including flight modelling, scenery generation, multiplayer technology, artificial intelligence, mission planning, and event handling. Issues such as data portability, economics, licensing, intellectual property, interoperability, developer extensibility, robustness, qualification, and maintainability are addressed. Microsoft Flight Simulator is used as a case study of how commodity flight simulation has been extended to include extensive programmatic access to its core engine. Examples are given of how the base platform of this application can be extended by third-party developers and of the power this extensibility model provides to the industry. This paper is presented to highlight particular technology trends in the commodity flight simulation industry, the fidelity that commodity flight simulations can provide, and to provide a high-level overview of the strengths and weaknesses thereof.


2017 ◽  
Vol 14 (1) ◽  
pp. 789-795
Author(s):  
V Saveetha ◽  
S Sophia

Parallel data clustering aims at using algorithms and methods to extract knowledge from vast databases in rational time using high-performance architectures. The computational challenge that cluster analysis faces due to the increasing capacity of data can be overcome by exploiting the power of these architectures. Recent developments in the parallel power of the Graphics Processing Unit enable low-cost, high-performance solutions for general-purpose applications. The Compute Unified Device Architecture programming model provides application programming interface methods to handle data proficiently on the Graphics Processing Unit for iterative clustering algorithms like K-Means. Existing Graphics Processing Unit based K-Means algorithms focus heavily on improving the speedup of the algorithms and fall short of handling the high time spent on transferring data between the Central Processing Unit and the Graphics Processing Unit. A competent K-Means algorithm is proposed in this paper to lessen the transfer time by introducing a novel approach to check the convergence of the algorithm and by utilizing pinned memory for direct access. This algorithm outperforms the other algorithms by maximizing parallelism and utilizing the memory features. The relative speedups and the validity measure for the proposed algorithm are higher when compared with K-Means on the Graphics Processing Unit and K-Means using a flag on the Graphics Processing Unit. Thus, the proposed approach shows that communication overhead can be reduced in K-Means clustering.
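The convergence idea, transferring a single "did any assignment change?" flag instead of the full label array each iteration, can be illustrated with a plain NumPy version of Lloyd's iteration. This is a CPU sketch with assumed names and an explicit initial-centers parameter; pinned-memory and kernel details from the paper are omitted.

```python
import numpy as np

def kmeans(X, k, init, max_iter=100):
    """Lloyd's K-Means where convergence is tracked by one boolean flag,
    the only value that would need to cross the CPU/GPU boundary."""
    centers = init.astype(float).copy()
    labels = np.full(len(X), -1)
    for _ in range(max_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = d.argmin(axis=1)              # nearest center per point
        changed = bool((new_labels != labels).any())
        labels = new_labels
        centers = np.stack([
            X[labels == j].mean(axis=0) if (labels == j).any() else centers[j]
            for j in range(k)
        ])
        if not changed:                            # single-flag convergence test
            break
    return centers, labels
```

On a GPU, checking this one flag avoids copying the whole assignment vector back to the host every iteration, which is the communication overhead the paper targets.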


2011 ◽  
Vol 2011 ◽  
pp. 1-15 ◽  
Author(s):  
Jeffrey Kingyens ◽  
J. Gregory Steffan

We propose a soft processor programming model and architecture inspired by graphics processing units (GPUs) that are well-matched to the strengths of FPGAs, namely, highly parallel and pipelinable computation. In particular, our soft processor architecture exploits multithreading, vector operations, and predication to supply a floating-point pipeline of 64 stages via hardware support for up to 256 concurrent thread contexts. The key new contributions of our architecture are mechanisms for managing threads and register files that maximize data-level and instruction-level parallelism while overcoming the challenges of port limitations of FPGA block memories as well as memory and pipeline latency. Through simulation of a system that (i) is programmable via NVIDIA's high-level Cg language, (ii) supports AMD's CTM r5xx GPU ISA, and (iii) is realizable on an XtremeData XD1000 FPGA-based accelerator system, we demonstrate the potential for such a system to achieve 100% utilization of a deeply pipelined floating-point datapath.

