Efficient Evaluation of Power/Area/Latency Design Trade-Offs for Coarse-Grained Reconfigurable Processor Arrays

Dmitrij Kissler; Frank Hannig; Jürgen Teich

doi:10.1166/jolpe.2011.1114

Design Methodology and Trade-Offs Analysis for Parameterized Dynamically Reconfigurable Processor Arrays

2007 International Conference on Field Programmable Logic and Applications ◽

10.1109/fpl.2007.4380771 ◽

2007 ◽

Cited By ~ 4

Author(s):

Yohei Hasegawa ◽

Satoshi Tsutsumi ◽

Vasutan Tanbunheng ◽

Takuro Nakamura ◽

Takashi Nishimura ◽

...

Keyword(s):

Design Methodology ◽

Reconfigurable Processor ◽

Dynamically Reconfigurable ◽

Processor Arrays ◽

Trade Offs

Scalable Many-Domain Power Gating in Coarse-Grained Reconfigurable Processor Arrays

IEEE Embedded Systems Letters ◽

10.1109/les.2011.2124438 ◽

2011 ◽

Vol 3 (2) ◽

pp. 58-61 ◽

Cited By ~ 6

Author(s):

Dmitrij Kissler ◽

Daniel Gran ◽

Zoran Salcic ◽

Frank Hannig ◽

Jürgen Teich

Keyword(s):

Coarse Grained ◽

Power Gating ◽

Reconfigurable Processor ◽

Processor Arrays

Leakage power reduction for coarse-grained dynamically reconfigurable processor arrays using Dual Vt cells

2009 International Conference on Field-Programmable Technology ◽

10.1109/fpt.2009.5377641 ◽

2009 ◽

Cited By ~ 5

Author(s):

Kei'ichiro Hirai ◽

Masaru Kato ◽

Yoshiki Saito ◽

Hideharu Amano

Keyword(s):

Power Reduction ◽

Leakage Power ◽

Coarse Grained ◽

Reconfigurable Processor ◽

Dynamically Reconfigurable ◽

Processor Arrays

Leakage power reduction for coarse grained dynamically reconfigurable processor arrays with fine grained Power Gating technique

2008 International Conference on Field-Programmable Technology ◽

10.1109/fpt.2008.4762410 ◽

2008 ◽

Cited By ~ 12

Author(s):

Yoshiki Saito ◽

Tomoaki Shirai ◽

Takuro Nakamura ◽

Takashi Nishimura ◽

Yohei Hasegawa ◽

...

Keyword(s):

Power Reduction ◽

Leakage Power ◽

Coarse Grained ◽

Power Gating ◽

Reconfigurable Processor ◽

Dynamically Reconfigurable ◽

Fine Grained ◽

Processor Arrays

A data-flow graph generation algorithm for a coarse-grained reconfigurable processor

2009 IEEE 8th International Conference on ASIC ◽

10.1109/asicon.2009.5351548 ◽

2009 ◽

Author(s):

Chao Yang ◽

Shouyi Yin ◽

Leibo Liu ◽

Shaojun Wei

Keyword(s):

Data Flow ◽

Coarse Grained ◽

Data Flow Graph ◽

Flow Graph ◽

Generation Algorithm ◽

Reconfigurable Processor ◽

Graph Generation

You Only Traverse Twice: A YOTT Placement, Routing, and Timing Approach for CGRAs

ACM Transactions on Embedded Computing Systems ◽

10.1145/3477038 ◽

2021 ◽

Vol 20 (5s) ◽

pp. 1-25

Author(s):

Michael Canesche ◽

Westerley Carvalho ◽

Lucas Reis ◽

Matheus Oliveira ◽

Salles Magalhães ◽

...

Keyword(s):

Execution Time ◽

High Performance ◽

Coarse Grained ◽

Optimal Placement ◽

Greedy Heuristics ◽

High Quality ◽

Solution Quality ◽

Graph Traversal ◽

Trade Offs ◽

Graph Properties

Coarse-grained reconfigurable architecture (CGRA) mapping involves three main steps: placement, routing, and timing. The mapping is an NP-complete problem, and a common strategy is to decouple this process into its independent steps. This work focuses on the placement step, and its aim is to propose a technique that is both reasonably fast and leads to high-performance solutions. Furthermore, a near-optimal placement simplifies the following routing and timing steps. Exact solutions cannot find placements in a reasonable execution time as input designs increase in size. Heuristic solutions include meta-heuristics, such as Simulated Annealing (SA), and fast, straightforward greedy heuristics based on graph traversal. However, as these approaches are probabilistic and have a large design space, it is not easy to provide both run-time efficiency and good solution quality. We propose a graph traversal heuristic that provides the best of both: high-quality placements similar to SA and the execution time of graph traversal approaches. Our placement introduces novel ideas based on a “you only traverse twice” (YOTT) approach that performs a two-step graph traversal. The first traversal generates annotated data to guide the second step, which greedily performs the placement, node by node, aided by the annotated data and target architecture constraints. We introduce three new concepts to implement this technique: I/O and reconvergence annotation, degree matching, and look-ahead placement. Our analysis of this approach explores the placement execution time/quality trade-offs. We point out insights on how to analyze graph properties during dataflow mapping. Our results show that YOTT is 60.6×, 9.7×, and 2.3× faster than a high-quality SA, bounding box SA VPR, and multi-single traversal placements, respectively. Furthermore, YOTT reduces the average wire length and the maximal FIFO size (additional timing requirement on CGRAs) to avoid delay mismatches in fully pipelined architectures.
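The two-step traversal idea can be illustrated with a small sketch. This is not the YOTT algorithm from the paper: it omits I/O and reconvergence annotation, degree matching, and look-ahead, and only shows the general shape of "annotate in one pass, place greedily in a second pass". The function names (`annotate`, `place`, `wire_length`), the adjacency-dict graph format, the rectangular grid, and the Manhattan-distance cost are all assumptions made for illustration.

```python
from collections import deque
from itertools import product


def annotate(graph, root):
    """First pass (hypothetical): BFS from root, returning the visit order.

    Assumes an undirected, connected graph given as {node: set_of_neighbours}.
    """
    order, seen, queue = [], {root}, deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in sorted(graph[node]):  # sorted for deterministic output
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order


def place(graph, order, rows, cols):
    """Second pass (hypothetical): put each node on the free grid cell that is
    closest, in Manhattan distance, to its already-placed neighbours.

    Assumes rows * cols is at least the number of nodes.
    """
    free = set(product(range(rows), range(cols)))
    pos = {}
    for node in order:
        placed = [pos[n] for n in graph[node] if n in pos]

        def cost(cell):
            return sum(abs(cell[0] - r) + abs(cell[1] - c) for r, c in placed)

        best = min(sorted(free), key=cost)  # ties go to the smallest cell
        pos[node] = best
        free.remove(best)
    return pos


def wire_length(graph, pos):
    """Total Manhattan length over all edges, each edge counted once."""
    return sum(
        abs(pos[a][0] - pos[b][0]) + abs(pos[a][1] - pos[b][1])
        for a in graph for b in graph[a] if a < b
    )


# A diamond-shaped dataflow graph: "a" fans out to "b" and "c", which reconverge at "d".
diamond = {"a": {"b", "c"}, "b": {"a", "d"}, "c": {"a", "d"}, "d": {"b", "c"}}
order = annotate(diamond, "a")
positions = place(diamond, order, rows=2, cols=2)
print(order, positions, wire_length(diamond, positions))
```

Separating annotation from placement keeps the second pass a single linear sweep over the nodes, which is where a traversal-based placer gets its speed relative to iterative methods such as SA.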

An efficient implementation of Motion Compensation for AVS HD application based on a coarse-grained reconfigurable processor

2010 10th IEEE International Conference on Solid-State and Integrated Circuit Technology ◽

10.1109/icsict.2010.5667317 ◽

2010 ◽

Cited By ~ 1

Author(s):

Jing Zhao ◽

Li Zhou ◽

Qingdong Yu ◽

Jie Chen

Keyword(s):

Motion Compensation ◽

Efficient Implementation ◽

Coarse Grained ◽

Reconfigurable Processor

Fast ant colony optimization on reconfigurable processor arrays

Proceedings 15th International Parallel and Distributed Processing Symposium. IPDPS 2001 ◽

10.1109/ipdps.2001.925130 ◽

2005 ◽

Cited By ~ 2

Author(s):

D. Merkle ◽

M. Middendorf

Keyword(s):

Ant Colony Optimization ◽

Ant Colony ◽

Reconfigurable Processor ◽

Processor Arrays

Coarse-grained tight-binding models

Journal of Physics Condensed Matter ◽

10.1088/1361-648x/ac443f ◽

2021 ◽

Author(s):

Tianxiang Liu ◽

Li Mao ◽

Mats-Erik Pistol ◽

Craig Pryor

Keyword(s):

Quantum Well ◽

Computing Time ◽

Numerical Test ◽

Tight Binding ◽

Computation Time ◽

Coarse Grained ◽

Weakly Bound ◽

Trade Offs ◽

Full Calculation ◽

Binding Models

Calculating the electronic structure of systems involving very different length scales presents a challenge. Empirical atomistic descriptions such as pseudopotentials or tight-binding models allow one to calculate the effects of atomic placements, but the computational burden increases rapidly with the size of the system, limiting the ability to treat weakly bound extended electronic states. Here we propose a new method to connect atomistic and quasi-continuous models, thus speeding up tight-binding calculations for large systems. We divide a structure into blocks consisting of several unit cells, which we diagonalize individually. We then construct a tight-binding Hamiltonian for the full structure using a truncated basis for the blocks, ignoring states having large energy eigenvalues and retaining states with an energy close to the band edge energies. A numerical test using a GaAs/AlAs quantum well shows the computation time can be decreased to less than 5% of the full calculation with errors of less than 1%. We give data for the trade-offs between computing time and loss of accuracy. We also tested calculations of the density of states for a GaAs/AlAs quantum well and found a tenfold speedup without much loss in accuracy.
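The block-and-truncate construction described in this abstract can be written schematically as follows. The notation is an assumption made for illustration and is not taken from the paper: $H_{bb'}$ denotes the sub-block of the full tight-binding Hamiltonian coupling blocks $b$ and $b'$, $S_b$ is the set of retained states in block $b$, and the basis is assumed orthonormal so that no overlap matrix appears.

```latex
% Illustrative notation only; the symbols are assumptions, not the paper's.
\begin{align}
  H_{bb}\, u^{(b)}_n &= \varepsilon^{(b)}_n\, u^{(b)}_n
    && \text{(diagonalize each block } b \text{ individually)} \\
  U_b &= \bigl[\, u^{(b)}_n \,\bigr]_{n \in S_b}
    && \text{(keep only states with } \varepsilon^{(b)}_n \text{ near the band edges)} \\
  \tilde{H}_{bb'} &= U_b^{\dagger}\, H_{bb'}\, U_{b'}
    && \text{(project the full Hamiltonian block by block)} \\
  \tilde{H}\, c &= E\, c
    && \text{(solve the reduced eigenproblem)}
\end{align}
```

Under these assumptions the diagonal blocks reduce to $\tilde{H}_{bb} = \mathrm{diag}\bigl(\varepsilon^{(b)}_n\bigr)_{n \in S_b}$, and the reduced matrix has dimension $\sum_b |S_b|$ rather than the full number of orbitals, which is the source of the speedup the abstract reports.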

Extracting Coarse-Grained Pipelined Parallelism Out of Sequential Applications for Parallel Processor Arrays

Architecture of Computing Systems – ARCS 2009 - Lecture Notes in Computer Science ◽

10.1007/978-3-642-00454-4_4 ◽

2009 ◽

pp. 4-15 ◽

Cited By ~ 2

Author(s):

Dimitris Syrivelis ◽

Spyros Lalis

Keyword(s):

Coarse Grained ◽

Parallel Processor ◽

Processor Arrays
