scholarly journals Application-Specific SoC Design Using Core Mapping to 3D Mesh NoCs with Nonlinear Area Optimization and Simulated Annealing

Technologies ◽  
2020 ◽  
Vol 8 (1) ◽  
pp. 10
Author(s):  
Jan Moritz Joseph ◽  
Dominik Ermel ◽  
Lennart Bamberg ◽  
Alberto García-Oritz ◽  
Thilo Pionteck

Core mapping, in which a core graph is mapped to a network graph to minimize communication, is a common design problem for Systems-on-Chip interconnected by a Network-on-Chip. In conventional multiprocessors, this mapping is area-agnostic as the cores in the core graph are uniform and therefore iso-area. This changes for Systems-on-Chip because tasks are mapped to specific blocks and not general-purpose cores. Thus, the area of these specific cores is varying. This requires novel mapping methods. In this paper, we propose a an area-aware cost function for simulated annealing; Furthermore, we advocate the use of nonlinear models as the area is nonlinear: A semi-definite program (SDP) can be used as it is sufficiently fast and shows 20% better area than conventional linear models. Our cost function allows for up to 16.4% better area, 2% better communication (bandwidth times hop distance) and 13.8% better total bandwidth in the network in comparison to the standard approach that accounts for both the network communication and uses cores with varying areas as well.

2015 ◽  
Vol 57 (3) ◽  
Author(s):  
Lars Bauer ◽  
Jörg Henkel ◽  
Andreas Herkersdorf ◽  
Michael A. Kochte ◽  
Johannes M. Kühn ◽  
...  

AbstractAchieving system-level dependability is a demanding task. The manifold requirements and dependability threats can no longer be statically addressed at individual abstraction layers. Instead, all components of future multi-processor systems-on-chip (MPSoCs) have to contribute to this common goal in an adaptive manner.In this paper we target a generic heterogeneous MPSoC that combines general purpose processors along with dedicated application-specific hard-wired accelerators, fine-grained reconfigurable processors, and coarse-grained reconfigurable architectures. We present different


VLSI Design ◽  
2007 ◽  
Vol 2007 ◽  
pp. 1-15 ◽  
Author(s):  
Zvika Guz ◽  
Isask'har Walter ◽  
Evgeny Bolotin ◽  
Israel Cidon ◽  
Ran Ginosar ◽  
...  

Network-on-chip- (NoC-) based application-specific systems on chip, where information traffic is heterogeneous and delay requirements may largely vary, require individual capacity assignment for each link in the NoC. This is in contrast to the standard approach of on- and off-chip interconnection networks which employ uniform-capacity links. Therefore, the allocation of link capacities is an essential step in the automated design process of NoC-based systems. The algorithm should minimize the communication resource costs under Quality-of-Service timing constraints. This paper presents a novel analytical delay model for virtual channeled wormhole networks with nonuniform links and applies the analysis in devising an efficient capacity allocation algorithm which assigns link capacities such that packet delay requirements for each flow are satisfied.


2014 ◽  
Vol 2014 ◽  
pp. 1-9 ◽  
Author(s):  
Michele Amoretti

Networks on-chip (NoCs) provide enhanced performance, scalability, modularity, and design productivity as compared with previous communication architectures for VLSI systems on-chip (SoCs), such as buses and dedicated signal wires. Since the NoC design space is very large and high dimensional, evaluation methodologies rely heavily on analytical modeling and simulation. Unfortunately, there is no standard modeling framework. In this paper we illustrate how to design and evaluate NoCs by integrating the Discrete Event System Specification (DEVS) modeling framework and the simulation environment called DEUS. The advantage of such an approach is that both DEVS and DEUS support modularity—the former being a sound and complete modeling framework and the latter being an open, general-purpose platform, characterized by a steep learning curve and the possibility to simulate any system at any level of detail.


Author(s):  
Carsten Heinz ◽  
Jaco Hofmann ◽  
Jens Korinth ◽  
Lukas Sommer ◽  
Lukas Weber ◽  
...  

AbstractThe integration of FPGA-based accelerators into a complete heterogeneous system is a challenging task faced by many researchers and engineers, especially now that FPGAs enjoy increasing popularity as implementation platforms for efficient, application-specific accelerators for domains such as signal processing, machine learning and intelligent storage. To lighten the burden of system integration from the developers of accelerators, the open-source TaPaSCo framework presented in this work provides an automated toolflow for the construction of heterogeneous many-core architectures from custom processing elements, and a simple, uniform programming interface to utilize spatially distributed, parallel computation on FPGAs. TaPaSCo aims to increase the scalability and portability of FPGA designs through automated design space exploration, greatly simplifying the scaling of hardware designs and facilitating iterative growth and portability across FPGA devices and families. This work describes TaPaSCo with its primary design abstractions and shows how TaPaSCo addresses portability and extensibility of FPGA hardware designs for systems-on-chip. A study of successful projects using TaPaSCo shows its versatility and can serve as inspiration and reference for future users, with more details on the usage of TaPaSCo presented in an in-depth case study and a short overview of the workflow.


2003 ◽  
Vol 1 ◽  
pp. 171-175
Author(s):  
T. von Sydow ◽  
H. Blume ◽  
T. G. Noll

Abstract. Various reasons like technology progress, flexibility demands, shortened product cycle time and shortened time to market have brought up the possibility and necessity to integrate different architecture blocks on one heterogeneous System-on-Chip (SoC). Architecture blocks like programmable processor cores (DSP- and GPP-kernels), embedded FPGAs as well as dedicated macros will be integral parts of such a SoC. Especially programmable architecture blocks and associated optimization techniques are discussed in this contribution. Design space exploration and thus the choice which architecture blocks should be integrated in a SoC is a challenging task. Crucial to this exploration is the evaluation of the application domain characteristics and the costs caused by individual architecture blocks integrated on a SoC. An ATE-cost function has been applied to examine the performance of the aforementioned programmable architecture blocks. Therefore, representative discrete devices have been analyzed. Furthermore, several architecture dependent optimization steps and their effects on the cost ratios are presented.


2016 ◽  
Vol 2016 ◽  
pp. 1-11 ◽  
Author(s):  
Valery Sklyarov ◽  
Iouliia Skliarova ◽  
João Silva

Popcount computations are widely used in such areas as combinatorial search, data processing, statistical analysis, and bio- and chemical informatics. In many practical problems the size of initial data is very large and increase in throughput is important. The paper suggests two types of hardware accelerators that are (1) designed in FPGAs and (2) implemented in Zynq-7000 all programmable systems-on-chip with partitioning of algorithms that use popcounts between software of ARM Cortex-A9 processing system and advanced programmable logic. A three-level system architecture that includes a general-purpose computer, the problem-specific ARM, and reconfigurable hardware is then proposed. The results of experiments and comparisons with existing benchmarks demonstrate that although throughput of popcount computations is increased in FPGA-based designs interacting with general-purpose computers, communication overheads (in experiments with PCI express) are significant and actual advantages can be gained if not only popcount but also other types of relevant computations are implemented in hardware. The comparison of software/hardware designs for Zynq-7000 all programmable systems-on-chip with pure software implementations in the same Zynq-7000 devices demonstrates increase in performance by a factor ranging from 5 to 19 (taking into account all the involved communication overheads between the programmable logic and the processing systems).


Sign in / Sign up

Export Citation Format

Share Document