A Coarse-Grained Reconfigurable Architecture with Compilation for High Performance

2012, Vol 2012, pp. 1-17
Author(s): Lu Wan, Chen Dong, Deming Chen

We propose a fast data relay (FDR) mechanism to enhance existing CGRAs (coarse-grained reconfigurable architectures). FDR not only provides multicycle data transmission concurrently with computation but also converts resource-demanding global data accesses between processing elements into local data accesses to avoid communication congestion. We also propose supporting compiler techniques that efficiently exploit the FDR feature to achieve higher performance across a variety of applications. We compare our results on an FDR-based CGRA with two other works in this field: ADRES and RCP. Experimental results for various multimedia applications show that FDR combined with the new compiler delivers up to 29% and 21% higher performance than ADRES and RCP, respectively.

2014, Vol 57 (10), pp. 1-18
Author(s): LeiBo Liu, YanSheng Wang, ShouYi Yin, Min Zhu, Xing Wang, ...

2021, Vol 8 (1)
Author(s): Mehdi Srifi, Ahmed Oussous, Ayoub Ait Lahcen, Salma Mouline

Various recommender systems (RSs) have been developed in recent years, and many of them have concentrated on English content. Thus, most RSs in the literature have been compared on English content, while research on RSs for content in other languages, such as Arabic, remains limited. Researchers have largely neglected the field of Arabic RSs. Through this study, we aim to fill this research gap by leveraging recent advances in the English RS field. Our main goal is to investigate recent RSs in an Arabic context. To that end, we first selected five state-of-the-art RSs originally designed for English content and then empirically evaluated their performance on Arabic content. As a result of this work, we first built four publicly available large-scale Arabic datasets for recommendation purposes. Second, we provide various text preprocessing techniques for preparing the constructed datasets. Third, our investigation draws well-argued conclusions about the use of modern RSs in the Arabic context. The experimental results show that these systems achieve high performance when applied to Arabic content.
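The recommender-system abstract mentions text preprocessing for the Arabic datasets without listing the steps. As an illustration only, the sketch below implements a few normalization steps that are commonly applied to Arabic text: removing diacritics and the tatweel elongation character, and unifying alef, alef maqsura, and ta marbuta variants. Whether the authors used these exact steps is an assumption, and `normalize_arabic` and `tokenize` are hypothetical names.

```python
import re

# Arabic diacritics (tashkeel) U+064B..U+0652 plus superscript alef U+0670.
_DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")
_TATWEEL = "\u0640"  # elongation character, purely typographic

# Assumed letter-variant unification; the exact choices vary between studies.
_LETTER_MAP = str.maketrans({
    "\u0623": "\u0627",  # alef with hamza above -> bare alef
    "\u0625": "\u0627",  # alef with hamza below -> bare alef
    "\u0622": "\u0627",  # alef with madda -> bare alef
    "\u0649": "\u064A",  # alef maqsura -> ya
    "\u0629": "\u0647",  # ta marbuta -> ha
})


def normalize_arabic(text: str) -> str:
    """Strip diacritics and tatweel, unify letter variants, collapse whitespace."""
    text = _DIACRITICS.sub("", text)
    text = text.replace(_TATWEEL, "")
    text = text.translate(_LETTER_MAP)
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text: str) -> list[str]:
    """Return runs of Arabic letters (U+0621..U+064A) from the normalized text."""
    return re.findall(r"[\u0621-\u064A]+", normalize_arabic(text))
```

For example, `normalize_arabic` maps a fully vocalized word such as الكِتَابُ to الكتاب, so vocalized and unvocalized spellings become the same token. Letter unification can make distinct words collide, trading some precision for better matching on sparse, inconsistently spelled text.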


2014, Vol 971-973, pp. 714-717
Author(s): Xiang Shi, Zhe Xu, Qing Yi He, Ka Tian

Controlling a wheeled inverted pendulum is a good way to test various control theories. A control law is designed and applied to the wheeled inverted pendulum through a co-simulation of MATLAB and ADAMS. Then, using a custom-designed control system (hardware and software), sliding mode control is applied to the wheeled inverted pendulum, and the experimental results show a short settling time, small overshoot, and high performance.
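The pendulum abstract does not give the plant model or the control law. The sketch below assumes a hypothetical normalized model, θ'' = (g/l)·sin θ + u + d(t), where u is an angular-acceleration command and d(t) is a bounded disturbance the controller never sees, and applies a textbook sliding mode controller with sliding surface s = θ' + λθ and a saturation boundary layer. It is not the authors' controller and ignores the wheel dynamics of a real wheeled inverted pendulum.

```python
import math


def simulate_smc(theta0=0.2, omega0=0.0, g_over_l=9.81 / 0.5,
                 lam=2.0, k=5.0, phi=0.05, dt=1e-3, t_end=5.0):
    """Euler simulation of sliding mode control on theta'' = (g/l) sin(theta) + u + d(t).

    Hypothetical normalized model: u is an angular-acceleration command and
    d(t) is an unmodeled bounded disturbance that the controller does not use.
    """
    theta, omega = theta0, omega0
    steps = int(round(t_end / dt))
    for n in range(steps):
        t = n * dt
        s = omega + lam * theta                     # sliding surface s = theta' + lambda * theta
        sat = max(-1.0, min(1.0, s / phi))          # boundary layer replaces sign(s) to limit chattering
        # Equivalent control cancels the modeled dynamics; the switching term drives s to zero.
        u = -g_over_l * math.sin(theta) - lam * omega - k * sat
        d = 0.5 * math.sin(t)                       # |d| <= 0.5 < k keeps the surface attractive
        alpha = g_over_l * math.sin(theta) + u + d  # resulting angular acceleration
        theta, omega = theta + dt * omega, omega + dt * alpha
    return theta, omega


theta, omega = simulate_smc()
print(f"final angle {theta:.5f} rad, final rate {omega:.5f} rad/s")
```

Because the switching gain k exceeds the disturbance bound, s reaches the boundary layer within a fraction of a second, after which θ decays roughly like e^(-λt) down to a small residual set by φ·|d|/k. A wider boundary layer φ reduces chattering at the cost of a larger residual error.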


2009, Vol 156-158, pp. 199-204
Author(s): Hiroaki Kariyazaki, Tatsuhiko Aoki, Kouji Izunome, Koji Sueoka

Hybrid crystal orientation technology (HOT) substrates, composed of Si (100) and (110) surface orientations with their <110> directions parallel, are attracting considerable attention as a promising technology for high-performance bulk CMOS. Although HOT substrates are fabricated by wafer bonding of Si (110) and Si (100) surfaces, the atomic configuration of the interfacial structure is not clear. Furthermore, whether the interface can act as an effective gettering site for metal impurities has not been well studied. In this paper, we studied the interfacial structure and gettering efficiency of the atomically bonded interface by molecular simulations. The results indicate that the simulated atomic configuration and gettering efficiency of the bonded interface agree well with the experimental results.


2012, Vol 479-481, pp. 65-70
Author(s): Xiao Hui Zhang, Liu Qing, Mu Li

Based on alignment-template target detection, the paper designs a lane alignment template using a correlation matching method and combines it with a genetic algorithm for stochastic template matching and optimization to achieve lane detection. To address the real-time problem of genetic-algorithm-based lane detection, this paper uses the high-performance multi-core DSP chip TMS320C6474 as the core and, combined with high-speed RapidIO data transmission, implements hardware parallel processing of the lane detection algorithm. Over the RapidIO bus, the data transmission rate between DSPs can reach 3.125 Gbps, giving nearly delay-free transmission and thereby solving the problem of moving large amounts of data between processors at high speed. The experimental results show that both the computed lane lines and the running time on the parallel C6474 platform are better than those of a single DSP or a PC. In addition, the road detection is accurate, reliable, and robust.
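The lane-detection abstract combines correlation matching with a genetic algorithm but does not specify the template, the fitness function, or the GA operators. The sketch below assumes normalized cross-correlation (NCC) as the matching score and a simple elitist GA over (row, column) template offsets; `ncc`, `exhaustive_match`, and `ga_match` are hypothetical names, and the operators are generic choices rather than the paper's.

```python
import math
import random


def ncc(image, template, r, c):
    """Normalized cross-correlation of `template` with the image patch at (r, c)."""
    h, w = len(template), len(template[0])
    a = [image[r + i][c + j] for i in range(h) for j in range(w)]
    b = [template[i][j] for i in range(h) for j in range(w)]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den if den > 0 else 0.0


def exhaustive_match(image, template):
    """Reference search: score every valid offset and return the best one."""
    rows = len(image) - len(template) + 1
    cols = len(image[0]) - len(template[0]) + 1
    return max(((r, c) for r in range(rows) for c in range(cols)),
               key=lambda rc: ncc(image, template, *rc))


def ga_match(image, template, pop_size=30, generations=40, mutation=2, seed=0):
    """Search template offsets with a simple elitist genetic algorithm."""
    rng = random.Random(seed)
    max_r = len(image) - len(template)
    max_c = len(image[0]) - len(template[0])
    cache = {}

    def fitness(ind):
        # Memoize scores so each visited offset is correlated only once.
        if ind not in cache:
            cache[ind] = ncc(image, template, *ind)
        return cache[ind]

    def clamp(v, hi):
        return max(0, min(hi, v))

    pop = [(rng.randint(0, max_r), rng.randint(0, max_c)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 5]  # keep the best 20% unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            # Tournament selection: best of 3 random individuals, twice.
            p1, p2 = (max(rng.sample(pop, 3), key=fitness) for _ in range(2))
            # Crossover takes the row from one parent and the column from the other;
            # mutation then nudges each coordinate by a small random step.
            children.append((clamp(p1[0] + rng.randint(-mutation, mutation), max_r),
                             clamp(p2[1] + rng.randint(-mutation, mutation), max_c)))
        pop = elite + children
    return max(pop, key=fitness)
```

The GA scores only the offsets it visits, whereas `exhaustive_match` scores every offset; in exchange, the GA gives up the exhaustive scan's guarantee of finding the global maximum, which is why the reference search is kept for checking results.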


2021, Vol 20 (5s), pp. 1-25
Author(s): Michael Canesche, Westerley Carvalho, Lucas Reis, Matheus Oliveira, Salles Magalhães, ...

Coarse-grained reconfigurable architecture (CGRA) mapping involves three main steps: placement, routing, and timing. Mapping is an NP-complete problem, and a common strategy is to decouple the process into these independent steps. This work focuses on the placement step and aims to propose a technique that is both reasonably fast and leads to high-performance solutions. Furthermore, a near-optimal placement simplifies the subsequent routing and timing steps. Exact solutions cannot find placements in a reasonable execution time as input designs grow in size. Heuristic solutions include meta-heuristics, such as Simulated Annealing (SA), and fast, straightforward greedy heuristics based on graph traversal. However, because these approaches are probabilistic and face a large design space, it is not easy to achieve both run-time efficiency and good solution quality. We propose a graph traversal heuristic that provides the best of both: placements of a quality similar to SA with the execution time of graph traversal approaches. Our placement introduces novel ideas based on a “you only traverse twice” (YOTT) approach that performs a two-step graph traversal. The first traversal generates annotated data to guide the second step, which greedily performs the placement node by node, aided by the annotated data and the target architecture constraints. We introduce three new concepts to implement this technique: I/O and reconvergence annotation, degree matching, and look-ahead placement. Our analysis explores the placement execution time/quality trade-offs, and we offer insights on how to analyze graph properties during dataflow mapping. Our results show that YOTT is 60.6×, 9.7×, and 2.3× faster than a high-quality SA, the bounding-box SA of VPR, and multi-single traversal placements, respectively.
Furthermore, YOTT reduces the average wire length and the maximal FIFO size (additional timing requirement on CGRAs) to avoid delay mismatches in fully pipelined architectures.
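The YOTT abstract names its annotations (I/O and reconvergence annotation, degree matching, look-ahead placement) without defining them. The toy sketch below only mirrors the overall annotate-then-greedily-place structure: the first pass computes a topological order and ASAP levels of the dataflow graph, and the second pass places nodes one by one on a processing-element grid, minimizing Manhattan wire length to already-placed predecessors and using the level as a column hint. It is not the authors' algorithm; `annotate`, `place`, and `wire_length` are hypothetical names.

```python
from collections import deque


def annotate(nodes, edges):
    """Pass 1: topological order plus ASAP level (longest path from any input)."""
    preds = {n: [] for n in nodes}
    succs = {n: [] for n in nodes}
    for u, v in edges:
        preds[v].append(u)
        succs[u].append(v)
    indeg = {n: len(preds[n]) for n in nodes}
    queue = deque(n for n in nodes if indeg[n] == 0)
    order, level = [], {n: 0 for n in nodes}
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in succs[u]:
            level[v] = max(level[v], level[u] + 1)
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if len(order) != len(nodes):
        raise ValueError("dataflow graph must be acyclic")
    return order, level, preds


def place(nodes, edges, width, height):
    """Pass 2: greedily place nodes in topological order on a width x height grid."""
    order, level, preds = annotate(nodes, edges)
    free = {(x, y) for x in range(width) for y in range(height)}
    if len(nodes) > len(free):
        raise ValueError("not enough processing elements")
    pos = {}
    for n in order:
        def cost(cell):
            # All predecessors are already placed because we follow topological order.
            wire = sum(abs(cell[0] - pos[p][0]) + abs(cell[1] - pos[p][1]) for p in preds[n])
            # Ties broken by distance from the level-derived column, then by cell coordinates.
            return (wire, abs(cell[0] - min(level[n], width - 1)), cell)
        best = min(free, key=cost)
        pos[n] = best
        free.remove(best)
    return pos


def wire_length(pos, edges):
    """Total Manhattan distance over all dataflow edges."""
    return sum(abs(pos[u][0] - pos[v][0]) + abs(pos[u][1] - pos[v][1]) for u, v in edges)
```

On a 3 × 3 grid, the chain a → b → c is placed along the first row at (0, 0), (1, 0), and (2, 0), for a total wire length of 2.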

