processor array
Recently Published Documents


TOTAL DOCUMENTS: 326 (five years: 3)
H-INDEX: 18 (five years: 0)

2021 ◽ Vol 20 (5) ◽ pp. 1-31
Author(s): Michael Witterauf, Dominik Walter, Frank Hannig, Jürgen Teich

Tightly Coupled Processor Arrays (TCPAs), a class of massively parallel loop accelerators, allow applications to offload computationally expensive loops for improved performance and energy efficiency. To achieve these two goals, executing a loop on a TCPA requires an efficient generation of specific programs as well as other configuration data for each distinct combination of loop bounds and number of available processing elements (PEs). Since both these parameters are generally unknown at compile time—the number of available PEs due to dynamic resource management, and the loop bounds because they depend on the problem size—both the programs and configuration data must be generated at runtime. However, pure just-in-time compilation is impractical, because mapping a loop program onto a TCPA entails solving multiple NP-complete problems. As a solution, this article proposes a unique mixed static/dynamic approach called symbolic loop compilation. It is shown that at compile time, the NP-complete problems (modulo scheduling, register allocation, and routing) can still be solved to optimality in a symbolic way, resulting in a so-called symbolic configuration, a space-efficient intermediate representation parameterized in the loop bounds and number of PEs. This phase is called symbolic mapping. At runtime, for each requested accelerated execution of a loop program with given loop bounds and known number of available PEs, a concrete configuration, including PE programs and configuration data for all other components, is generated from the symbolic configuration according to these parameter values. This phase is called instantiation. We describe both phases in detail and show that instantiation runs in polynomial time, with its most complex step, program instantiation, not directly depending on the number of PEs and thus scaling to arbitrary sizes of TCPAs.
To validate the efficiency of this mixed static/dynamic compilation approach, we apply symbolic loop compilation to a set of real-world loop programs from several domains, measuring both compilation time and space requirements. Our experiments confirm that a symbolic configuration is a space-efficient representation suited for systems with little memory—in many cases, a symbolic configuration is smaller than even a single concrete configuration instantiated from it—and that the times for the runtime phase of program instantiation and configuration loading are negligible and, moreover, independent of the size of the available processor array. To give an example, instantiating a configuration for a matrix-matrix multiplication benchmark takes equally long for 4×4 and 32×32 PEs.
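The two-phase scheme described in the abstract can be illustrated with a minimal sketch (all names and the template format are invented for illustration; the actual TCPA toolchain is far more involved): a symbolic configuration is here a program template whose operands may be expressions over the loop bound N and PE count P, and instantiation merely substitutes the concrete values, which is why its cost tracks the template size rather than the array size.

```python
from dataclasses import dataclass

@dataclass
class SymbolicConfiguration:
    # Compile-time artifact: instructions whose operands may be symbolic
    # expressions over the loop bound N and the number of PEs P.
    template: list

def symbolic_mapping():
    # Stand-in for the expensive compile-time phase: the NP-complete steps
    # (modulo scheduling, register allocation, routing) are solved once,
    # leaving N and P as free parameters in the result.
    return SymbolicConfiguration(template=[
        ("set_trip_count", "N // P"),  # iterations assigned to each PE
        ("set_stride", "P"),           # step between assigned iterations
        ("halt", 0),
    ])

def instantiate(sym, N, P):
    # Runtime phase: substitute the now-known parameter values. A single
    # pass over the template, so the cost does not grow with the array size.
    env = {"N": N, "P": P}
    return [(op, eval(arg, {}, env) if isinstance(arg, str) else arg)
            for op, arg in sym.template]
```

Instantiating the same symbolic configuration for a 4×4 and a 32×32 array touches the same number of template entries, mirroring the scaling behavior the abstract reports.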


2021
Author(s): Hector Castillo-Elizalde, Yanan Liu, Laurie Bose, Walterio Mayol-Cuevas

2021
Author(s): Yanan Liu, Laurie Bose, Colin Greatwood, Jianing Chen, Rui Fan, ...

Author(s): Stephen J. Carey, Laurie Bose, Thomas Richardson, Walterio Mayol-Cuevas, Jianing Chen, ...

Sensor Review ◽ 2020 ◽ Vol 40 (4) ◽ pp. 521-528
Author(s): Ahmad Reza Danesh, Mehdi Habibi

Purpose
The purpose of this paper is to design a kernel convolution processor. High-speed image processing is a challenging task for real-time applications such as product quality control on manufacturing lines. Smart image sensors use an array of in-pixel processors to facilitate high-speed, real-time image processing. These sensors are usually used to perform the initial low-level bulk image filtering and enhancement.
Design/methodology/approach
In this paper, a convolution image processor is presented that uses pulse-width-modulated signals and regular nearest-neighbor interconnections. The presented processor not only handles kernels of arbitrary size but also accepts arbitrary positive or negative floating-point kernel coefficients.
Findings
The performance of the proposed architecture is evaluated on a Xilinx Virtex-7 field-programmable gate array platform. The peak signal-to-noise ratio metric is used to measure the computation error for different images, filters, and illuminations. Finally, the power consumption of the circuit under different operating conditions is presented.
Originality/value
The presented processor array can be used for high-speed kernel convolution image processing tasks, including arbitrary-size edge detection and sharpening functions, which require negative and fractional kernel values.
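As a software reference for the operation such an array accelerates, the following sketch (plain Python, hypothetical and unrelated to the paper's hardware implementation) performs same-size 2-D kernel filtering with zero padding, using a sharpening kernel whose negative and fractional coefficients are exactly the cases the proposed processor supports in hardware:

```python
def convolve2d(image, kernel):
    # Same-size 2-D filtering with zero padding at the borders
    # (correlation form; identical to convolution for symmetric kernels).
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for ky in range(kh):
                for kx in range(kw):
                    iy, ix = y + ky - ph, x + kx - pw
                    if 0 <= iy < h and 0 <= ix < w:  # zero padding
                        acc += kernel[ky][kx] * image[iy][ix]
            out[y][x] = acc
    return out

# Sharpening kernel with negative, fractional coefficients.
sharpen = [[ 0.0, -0.5,  0.0],
           [-0.5,  3.0, -0.5],
           [ 0.0, -0.5,  0.0]]
```

On a uniform image this kernel leaves interior pixels unchanged (the coefficients sum to 1), while the in-pixel hardware version performs the same multiply-accumulate per pixel in parallel across the array.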


2020 ◽ Vol 138 ◽ pp. 32-47
Author(s): Aaron Stillmaker, Brent Bohnenstiehl, Lucas Stillmaker, Bevan Baas

2019 ◽ Vol 28 (07) ◽ pp. 1950111
Author(s): Jigang Wu, Yalan Wu, Guiyuan Jiang, Siew Kei Lam

This paper investigates techniques for constructing a high-quality target processor array (a fault-free logical subarray) from a physical array with faulty processing elements (PEs), where a fixed number of spare PEs are pre-integrated to replace faulty ones when necessary. A reconfiguration algorithm is developed based on our proposed novel shifting operations, which efficiently select proper spare PEs to replace the faulty ones. The initial target array is then further refined by a carefully designed tabu search algorithm. We also consider the problem of constructing a fault-free subarray of a given size, rather than the original size, which is often required in energy-efficient MPSoC design. We propose two efficient heuristic algorithms that construct target arrays of given sizes by leveraging a sliding window on the physical array. Simulation results show that the improvements of the proposed algorithms over the state of the art are [Formula: see text] and [Formula: see text] in terms of congestion factor and distance factor, respectively, for the case in which all faulty PEs can be replaced using the spare ones. For the case of finding a [Formula: see text] target array on a [Formula: see text] host array, the proposed heuristic algorithm reduces running time by up to [Formula: see text] while the solution quality remains nearly unchanged, in comparison with the baseline algorithms.
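The sliding-window idea for building a target array of a given size can be sketched as follows (a simplified illustration under invented names, not the paper's algorithm: it merely slides an r × c window over the host array and picks the position enclosing the fewest faulty PEs, which the spares would then have to cover):

```python
def best_window(faulty, M, N, r, c):
    # Slide an r x c window over an M x N host array; return the top-left
    # corner enclosing the fewest faulty PEs, plus that fault count.
    # `faulty` is a set of (row, col) coordinates of faulty PEs.
    best_pos, best_count = None, r * c + 1
    for i in range(M - r + 1):
        for j in range(N - c + 1):
            count = sum(1 for fi, fj in faulty
                        if i <= fi < i + r and j <= fj < j + c)
            if count < best_count:
                best_pos, best_count = (i, j), count
    return best_pos, best_count
```

A real reconfiguration algorithm would additionally verify that the remaining faults inside the chosen window can actually be repaired by shifting in spare PEs, and then refine the result, as the paper does with tabu search.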


2019 ◽ Vol 2019 ◽ pp. 1-11
Author(s): Awos Kanan, Fayez Gebali, Atef Ibrahim, Kin Fun Li

Processor array architectures have been employed as accelerators to compute the similarity distances found in a variety of data mining algorithms. However, most of the architectures proposed in the existing literature are designed in an ad hoc manner without taking into consideration the size and dimensionality of the datasets. Furthermore, data dependencies have not been analyzed, and often only one design choice is considered for the scheduling and mapping of computational tasks. In this work, we present a systematic methodology to design scalable and area-efficient linear (1-D) processor arrays for the computation of similarity distance matrices. Six possible design options are obtained and analyzed in terms of area and time complexities. The obtained architectures provide the flexibility to choose the one that meets hardware constraints for a specific problem size. Comparisons with previously reported architectures demonstrate that one of the proposed architectures achieves a smaller area and area-delay product, in addition to being scalable to high-dimensional data.
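To make the mapping problem concrete, here is a small software emulation (hypothetical; the paper derives its six schedules formally from the dependence analysis) of one possible scheduling choice for a linear array: each of P PEs computes a cyclically assigned subset of rows of the pairwise squared-Euclidean distance matrix.

```python
def distance_matrix_linear(data, P):
    # Emulate a 1-D array of P PEs: PE k computes rows k, k+P, k+2P, ...
    # of the pairwise squared-Euclidean distance matrix. This cyclic
    # row-to-PE mapping is just one of several possible schedules.
    n = len(data)
    D = [[0.0] * n for _ in range(n)]
    for k in range(P):              # each PE in the linear array
        for i in range(k, n, P):    # rows assigned to PE k
            for j in range(n):
                D[i][j] = float(sum((a - b) ** 2
                                    for a, b in zip(data[i], data[j])))
    return D
```

Changing the row-to-PE assignment (block instead of cyclic) or the loop order yields alternative scheduling and mapping choices of the kind the methodology enumerates, each with different area and time trade-offs.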

