A Coarse-Grained Reconfigurable Architecture with Compilation for High Performance

International Journal of Reconfigurable Computing ◽

10.1155/2012/163542 ◽

2012 ◽

Vol 2012 ◽

pp. 1-17 ◽

Cited By ~ 2

Author(s):

Lu Wan ◽

Chen Dong ◽

Deming Chen

Keyword(s):

Data Transmission ◽

High Performance ◽

Multimedia Applications ◽

Reconfigurable Architecture ◽

Experimental Results ◽

Coarse Grained ◽

Processing Element ◽

Local Data ◽

Compiler Techniques ◽

Global Data

We propose a fast data relay (FDR) mechanism to enhance existing coarse-grained reconfigurable architectures (CGRAs). FDR not only provides multicycle data transmission concurrently with computation but also converts resource-demanding inter-processing-element global data accesses into local data accesses to avoid communication congestion. We also propose supporting compiler techniques that efficiently utilize the FDR feature to achieve higher performance for a variety of applications. Our results on the FDR-based CGRA are compared with two other works in this field: ADRES and RCP. Experimental results for various multimedia applications show that FDR combined with the new compiler delivers up to 29% and 21% higher performance than ADRES and RCP, respectively.

Context Management Scheme Optimization of Coarse-Grained Reconfigurable Architecture for Multimedia Applications

IEEE Transactions on Very Large Scale Integration (VLSI) Systems ◽

10.1109/tvlsi.2017.2695493 ◽

2017 ◽

Vol 25 (8) ◽

pp. 2321-2331 ◽

Cited By ~ 3

Author(s):

Peng Cao ◽

Bo Liu ◽

Jinjiang Yang ◽

Jun Yang ◽

Meng Zhang ◽

...

Keyword(s):

Multimedia Applications ◽

Reconfigurable Architecture ◽

Coarse Grained ◽

Context Management ◽

Scheme Optimization ◽

Management Scheme

Row-based configuration mechanism for a 2-D processing element array in coarse-grained reconfigurable architecture

Science China Information Sciences ◽

10.1007/s11432-013-4973-8 ◽

2014 ◽

Vol 57 (10) ◽

pp. 1-18

Author(s):

LeiBo Liu ◽

YanSheng Wang ◽

ShouYi Yin ◽

Min Zhu ◽

Xing Wang ◽

...

Keyword(s):

Reconfigurable Architecture ◽

Coarse Grained ◽

Processing Element ◽

Element Array

Dynamic context management for low power coarse-grained reconfigurable architecture

Proceedings of the 19th ACM Great Lakes symposium on VLSI - GLSVLSI '09 ◽

10.1145/1531542.1531555 ◽

2009 ◽

Cited By ~ 10

Author(s):

Yoonjin Kim ◽

Rabi N. Mahapatra

Keyword(s):

Low Power ◽

Reconfigurable Architecture ◽

Coarse Grained ◽

Context Management ◽

Dynamic Context

Evaluation of recent advances in recommender systems on Arabic content

Journal Of Big Data ◽

10.1186/s40537-021-00420-2 ◽

2021 ◽

Vol 8 (1) ◽

Author(s):

Mehdi Srifi ◽

Ahmed Oussous ◽

Ayoub Ait Lahcen ◽

Salma Mouline

Keyword(s):

Recommender Systems ◽

High Performance ◽

Large Scale ◽

State Of The Art ◽

Experimental Results ◽

Recent Advances ◽

Research Gap ◽

Text Preprocessing

Various recommender systems (RSs) have been developed over recent years, and many of them have concentrated on English content. Thus, the majority of RSs from the literature were compared on English content. However, research on RSs applied to content in other languages, such as Arabic, is minimal, and researchers still neglect the field of Arabic RSs. Therefore, this study aims to fill this research gap by leveraging recent advances in the English RSs field. Our main goal is to investigate recent RSs in an Arabic context. To that end, we first selected five state-of-the-art RSs originally devoted to English content and then empirically evaluated their performance on Arabic content. As a result of this work, we first built four publicly available large-scale Arabic datasets for recommendation purposes. Second, we provide various text preprocessing techniques for preparing the constructed datasets. Third, our investigation derived well-argued conclusions about the usage of modern RSs in the Arabic context. The experimental results proved that these systems ensure high performance when applied to Arabic content.

Coarse-grained reconfigurable architecture for multiple application domains

Proceedings of the 2009 International Conference on Hybrid Information Technology - ICHIT '09 ◽

10.1145/1644993.1645095 ◽

2009 ◽

Cited By ~ 1

Author(s):

Manhwee Jo ◽

Ganghee Lee ◽

Kyungwook Chang ◽

Kyuseung Han ◽

Kiyoung Choi ◽

...

Keyword(s):

Reconfigurable Architecture ◽

Coarse Grained ◽

Multiple Application

Sliding Mode Control for Wheeled Inverted Pendulum

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.971-973.714 ◽

2014 ◽

Vol 971-973 ◽

pp. 714-717 ◽

Cited By ~ 4

Author(s):

Xiang Shi ◽

Zhe Xu ◽

Qing Yi He ◽

Ka Tian

Keyword(s):

Control System ◽

Sliding Mode Control ◽

High Performance ◽

Inverted Pendulum ◽

Sliding Mode ◽

Experimental Results ◽

Control Law ◽

Collaborative Simulation ◽

Mode Control ◽

Wheeled Inverted Pendulum

Controlling a wheeled inverted pendulum is a good way to test a wide range of control theories. A control law is designed and, based on collaborative simulation in MATLAB and ADAMS, used to control the wheeled inverted pendulum. Then, with custom-designed control-system hardware and software, sliding mode control is applied to the wheeled inverted pendulum, and the experimental results indicate a short adjustment time, small overshoot, and high performance.

Molecular Simulation on Interfacial Structure and Gettering Efficiency of Si (110)/(100) Directly Bonded Hybrid Crystal Orientation Substrates

Solid State Phenomena ◽

10.4028/www.scientific.net/ssp.156-158.199 ◽

2009 ◽

Vol 156-158 ◽

pp. 199-204

Author(s):

Hiroaki Kariyazaki ◽

Tatsuhiko Aoki ◽

Kouji Izunome ◽

Koji Sueoka

Keyword(s):

Molecular Simulation ◽

Wafer Bonding ◽

Crystal Orientation ◽

High Performance ◽

CMOS Technology ◽

Experimental Results ◽

Interfacial Structure ◽

Atomic Configuration ◽

Promising Technology ◽

Bonded Interface

Hybrid crystal orientation technology (HOT) substrates, comprised of Si (100) and (110) surface orientations paralleling each <110> direction, attract considerable attention as one of the promising technologies for high-performance bulk CMOS. Although HOT substrates are fabricated by wafer bonding of Si (110) and Si (100) surfaces, the atomic configuration of the interfacial structure is not clear. Furthermore, the possibility that the interface acts as an effective gettering source for impurity metals has not been well studied. In this paper, we studied the interfacial structure and gettering efficiency of the atomic bonded interface by molecular simulations. The results indicate that the simulated atomic configuration and gettering efficiency of the bonded interface agree well with the experimental results.

A Coarse Grained Reconfigurable Architecture for Variable Block Size Motion Estimation

2007 International Conference on Field-Programmable Technology ◽

10.1109/fpt.2007.4439235 ◽

2007 ◽

Cited By ~ 2

Author(s):

Ruchika Verma ◽

Ali Akoglu

Keyword(s):

Motion Estimation ◽

Block Size ◽

Reconfigurable Architecture ◽

Coarse Grained ◽

Variable Block

Lane Detection Algorithm Based on Genetic Algorithm and its Parallel Computing Realization

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.479-481.65 ◽

2012 ◽

Vol 479-481 ◽

pp. 65-70

Author(s):

Xiao Hui Zhang ◽

Liu Qing ◽

Mu Li

Keyword(s):

Genetic Algorithm ◽

Data Transmission ◽

High Speed ◽

High Performance ◽

Large Data ◽

Detection Algorithm ◽

Lane Detection ◽

The Road ◽

Time Problem ◽

High Speed Data

Based on target detection with an alignment template, the paper designs a lane alignment template using a correlation matching method and combines it with a genetic algorithm for stochastic template matching and optimization to realize lane detection. To solve the real-time problem of the genetic-algorithm-based lane detection algorithm, the paper uses the high-performance multi-core DSP chip TMS320C6474 as the core, combined with the high-speed data transmission technology of RapidIO, to realize hardware parallel processing of the lane detection algorithm. Over the RapidIO bus, the data transmission speed between DSPs can reach 3.125 Gbps, which essentially realizes transmission without delay and thereby solves the high-speed transmission of large data quantities between processors. The experimental results show that both the calculated lane lines and the running time on the parallel C6474 platform are better than those on a single DSP and on a PC. In addition, the road detection is accurate and reliable, and it has good robustness.

You Only Traverse Twice: A YOTT Placement, Routing, and Timing Approach for CGRAs

ACM Transactions on Embedded Computing Systems ◽

10.1145/3477038 ◽

2021 ◽

Vol 20 (5s) ◽

pp. 1-25

Author(s):

Michael Canesche ◽

Westerley Carvalho ◽

Lucas Reis ◽

Matheus Oliveira ◽

Salles Magalhães ◽

...

Keyword(s):

Execution Time ◽

High Performance ◽

Coarse Grained ◽

Optimal Placement ◽

Greedy Heuristics ◽

High Quality ◽

Solution Quality ◽

Graph Traversal ◽

Trade Offs ◽

Graph Properties

Coarse-grained reconfigurable architecture (CGRA) mapping involves three main steps: placement, routing, and timing. The mapping is an NP-complete problem, and a common strategy is to decouple this process into its independent steps. This work focuses on the placement step, and its aim is to propose a technique that is both reasonably fast and leads to high-performance solutions. Furthermore, a near-optimal placement simplifies the following routing and timing steps. Exact solutions cannot find placements in a reasonable execution time as input designs increase in size. Heuristic solutions include meta-heuristics, such as Simulated Annealing (SA), and fast and straightforward greedy heuristics based on graph traversal. However, as these approaches are probabilistic and have a large design space, it is not easy to provide both run-time efficiency and good solution quality. We propose a graph traversal heuristic that provides the best of both: high-quality placements similar to SA and the execution time of graph traversal approaches. Our placement introduces novel ideas based on a “you only traverse twice” (YOTT) approach that performs a two-step graph traversal. The first traversal generates annotated data to guide the second step, which greedily performs the placement, node by node, aided by the annotated data and target architecture constraints. We introduce three new concepts to implement this technique: I/O and reconvergence annotation, degree matching, and look-ahead placement. Our analysis of this approach explores the placement execution time/quality trade-offs. We point out insights on how to analyze graph properties during dataflow mapping. Our results show that YOTT is 60.6×, 9.7×, and 2.3× faster than a high-quality SA, bounding box SA VPR, and multi-single traversal placements, respectively. Furthermore, YOTT reduces the average wire length and the maximal FIFO size (additional timing requirement on CGRAs) to avoid delay mismatches in fully pipelined architectures.
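
The two-traversal idea in the abstract above can be illustrated with a minimal Python sketch. This is not the YOTT algorithm itself: the function names `annotate`, `place`, and `wire_length`, the rectangular grid of processing elements, and the Manhattan-distance cost are all assumptions made for illustration, and the paper's I/O and reconvergence annotation, degree matching, and look-ahead placement are not modeled. The sketch only shows a first pass that orders and annotates a dataflow graph, followed by a second pass that places nodes greedily, one at a time.

```python
from collections import deque


def annotate(nodes, edges):
    """First traversal (hypothetical): topological order plus neighbor lists.

    Assumes the dataflow graph is a DAG; nodes on a cycle would be skipped.
    """
    succ = {n: [] for n in nodes}
    pred = {n: [] for n in nodes}
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)
    indeg = {n: len(pred[n]) for n in nodes}
    queue = deque(n for n in nodes if indeg[n] == 0)
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for m in succ[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                queue.append(m)
    # Annotation used by the second pass: all graph neighbors of each node.
    neighbors = {n: succ[n] + pred[n] for n in nodes}
    return order, neighbors


def place(order, neighbors, rows, cols):
    """Second traversal (hypothetical): greedy placement on a rows x cols grid.

    Each node takes the free cell with the smallest total Manhattan distance
    to its already placed neighbors; ties go to the first free cell.
    """
    free = [(r, c) for r in range(rows) for c in range(cols)]
    pos = {}
    for n in order:
        placed = [pos[m] for m in neighbors[n] if m in pos]

        def cost(cell):
            return sum(abs(cell[0] - p[0]) + abs(cell[1] - p[1]) for p in placed)

        best = min(free, key=cost)
        free.remove(best)
        pos[n] = best
    return pos


def wire_length(edges, pos):
    """Total Manhattan wire length of a placement."""
    return sum(
        abs(pos[u][0] - pos[v][0]) + abs(pos[u][1] - pos[v][1]) for u, v in edges
    )
```

For example, calling `annotate(["a", "b", "c", "d"], [("a", "c"), ("b", "c"), ("c", "d")])` and passing the result to `place(order, neighbors, 2, 2)` assigns each of the four nodes a distinct cell of a 2×2 grid. A purely greedy pass like this one has no look-ahead, which is one of the gaps the abstract says YOTT's annotations are meant to address.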
