memory area
Recently Published Documents


TOTAL DOCUMENTS: 25 (FIVE YEARS: 3)

H-INDEX: 4 (FIVE YEARS: 0)

2021 ◽ Vol 18 (4) ◽ pp. 1-26
Author(s): Candace Walden ◽ Devesh Singh ◽ Meenatchi Jagasivamani ◽ Shang Li ◽ Luyi Kang ◽ ...

Many emerging non-volatile memories are compatible with CMOS logic, potentially enabling their integration into a CPU’s die. This article investigates such monolithically integrated CPU–main memory chips. We exploit non-volatile memories employing 3D crosspoint subarrays, such as resistive RAM (ReRAM), and integrate them over the CPU’s last-level cache (LLC). The regular structure of cache arrays enables co-design of the LLC and ReRAM main memory for area efficiency. We also develop a streamlined LLC/main memory interface that employs a single shared internal interconnect for both the cache and main memory arrays and uses a unified controller to service both LLC and main memory requests. We apply our monolithic design ideas to a many-core CPU by integrating 3D ReRAM over each core’s LLC slice. We find that co-design of the LLC and ReRAM saves 27% of the total LLC–main memory area at the expense of slight increases in delay and energy. The streamlined LLC/main memory interface saves an additional 12% in area. Our simulation results show that monolithic integration of CPU and main memory improves performance by 5.3× and 1.7× over HBM2 DRAM for several graph and streaming kernels, respectively. It also reduces the memory system’s energy by 6.0× and 1.7×, respectively. Moreover, we show that the area savings of co-design permit the CPU to have 23% more cores and main memory, and that streamlining the LLC/main memory interface incurs only a small 4% performance penalty.
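The abstract does not describe the controller's internals, but the "unified controller over a single shared interconnect" idea can be illustrated with a toy model. The sketch below is purely illustrative and hedged: all names (UnifiedController, Request, Target) are hypothetical and not taken from the paper; it only shows how one request path could route traffic to either the SRAM cache arrays or the ReRAM main-memory arrays.

```python
# Illustrative sketch only: a toy unified controller that services both LLC
# and ReRAM main-memory requests over one shared internal path, in the spirit
# of the streamlined interface described above. Names are hypothetical.

from dataclasses import dataclass
from enum import Enum, auto


class Target(Enum):
    LLC = auto()          # SRAM cache arrays beneath the 3D ReRAM layer
    MAIN_MEMORY = auto()  # 3D crosspoint ReRAM subarrays stacked above the LLC


@dataclass
class Request:
    addr: int
    is_write: bool


class UnifiedController:
    """Toy unified controller: one queue, one interconnect, two array types."""

    def __init__(self) -> None:
        self.queue: list[tuple[Target, Request]] = []

    def submit(self, req: Request, llc_hit: bool) -> None:
        # A hit is serviced from the cache arrays; a miss goes to the ReRAM
        # arrays over the same shared internal interconnect.
        target = Target.LLC if llc_hit else Target.MAIN_MEMORY
        self.queue.append((target, req))

    def drain(self) -> list[str]:
        # Requests share one path here, so cache and memory traffic are
        # serialized naively; a real design would arbitrate between them.
        log = []
        for target, req in self.queue:
            op = "WR" if req.is_write else "RD"
            log.append(f"{op} 0x{req.addr:08x} -> {target.name}")
        self.queue.clear()
        return log


if __name__ == "__main__":
    ctrl = UnifiedController()
    ctrl.submit(Request(addr=0x1000, is_write=False), llc_hit=True)
    ctrl.submit(Request(addr=0x2000, is_write=True), llc_hit=False)
    print("\n".join(ctrl.drain()))
```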


Electronics ◽ 2021 ◽ Vol 10 (1) ◽ pp. 94
Author(s): Antonio Rios-Navarro ◽ Daniel Gutierrez-Galan ◽ Juan Pedro Dominguez-Morales ◽ Enrique Piñero-Fuentes ◽ Lourdes Duran-Lopez ◽ ...

The use of deep learning solutions in different disciplines is increasing, and their algorithms are computationally expensive in most cases. For this reason, numerous hardware accelerators have appeared that compute their operations efficiently in parallel, achieving higher performance and lower latency. These algorithms need large amounts of data to feed each of their computing layers, which makes it necessary to efficiently handle the data transfers that feed and collect information to and from the accelerators. For the implementation of these accelerators, hybrid devices are widely used; they combine an embedded computer, where an operating system can run, with a field-programmable gate array (FPGA), where the accelerator can be deployed. In this work, we present a software API that organizes memory efficiently, avoiding the reallocation of data from one memory area to another; it improves on the native Linux driver with an 85% speed-up and reduces the frame computing time by 28% in a real application.
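The abstract does not give the API itself, so the following is only a hedged sketch of the general idea: map a DMA-capable accelerator buffer once and reuse the same mapping for every frame, rather than reallocating and copying data between memory areas per inference. The device node path, buffer size, and class name below are assumptions made for the example, not part of the published API.

```python
# Illustrative sketch, not the API from the paper: map a contiguous buffer
# once and reuse it for every frame, instead of reallocating/copying data
# between memory areas each time. Device path and size are assumptions.

import mmap
import os

BUFFER_BYTES = 4 * 1024 * 1024       # assumed reusable frame-buffer size
DEVICE_NODE = "/dev/example_accel"   # hypothetical accelerator device node


class ReusableAccelBuffer:
    """Map the accelerator's buffer once; reuse the mapping for every frame."""

    def __init__(self, path: str = DEVICE_NODE, size: int = BUFFER_BYTES):
        self.fd = os.open(path, os.O_RDWR | os.O_SYNC)
        # One mmap for the lifetime of the application: later frames write
        # into the same mapping, so no per-frame allocation or copy is needed.
        self.buf = mmap.mmap(self.fd, size)

    def write_frame(self, frame: bytes) -> None:
        self.buf.seek(0)
        self.buf.write(frame)        # CPU fills the shared buffer in place

    def read_result(self, nbytes: int) -> bytes:
        self.buf.seek(0)
        return self.buf.read(nbytes)  # results come back through the same mapping

    def close(self) -> None:
        self.buf.close()
        os.close(self.fd)
```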


2020 ◽ Vol 10 (20) ◽ pp. 7181
Author(s): Donghyun Lee ◽ Jeong-Sik Park ◽ Myoung-Wan Koo ◽ Ji-Hwan Kim

The performance of long short-term memory (LSTM) recurrent neural network (RNN)-based language models has improved on language model benchmarks. Although a recurrent layer has been widely used, previous studies showed that an LSTM RNN-based language model (LM) cannot overcome the limitation of the context length. To train LMs on longer sequences, attention mechanism-based models have recently been used. In this paper, we propose an LM using a neural Turing machine (NTM) architecture based on localized content-based addressing (LCA). The NTM architecture is one such attention-based model. However, the NTM encounters a problem with content-based addressing because all memory addresses need to be accessed to calculate cosine similarities. To address this problem, we propose an LCA method. The LCA method searches for the maximum of all cosine similarities generated from all memory addresses. Next, a specific memory area including the selected memory address is normalized with the softmax function. The LCA method is applied to the pre-trained NTM-based LM during the test stage. The proposed architecture is evaluated on the Penn Treebank and enwik8 LM tasks. The experimental results indicate that the proposed approach outperforms the previous NTM architecture.
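The localization step described above (take the cosine similarity of the key against every memory address, find the maximum, then softmax-normalize only the memory area around it) can be sketched in a few lines of NumPy. This is a minimal illustration of that description, not the authors' implementation; the window half-width is an assumed parameter.

```python
# Minimal NumPy sketch of localized content-based addressing (LCA) as
# described in the abstract: compute cosine similarity against all memory
# rows, locate the maximum, then softmax-normalize only a local window of
# rows around that maximum. The window half-width is an assumed parameter.

import numpy as np


def lca_addressing(memory: np.ndarray, key: np.ndarray, half_width: int = 4) -> np.ndarray:
    """memory: (N, D) matrix of memory rows; key: (D,) read/write key."""
    # Cosine similarity between the key and every memory address.
    eps = 1e-8
    sims = memory @ key / (np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + eps)

    # Localize: keep only the addresses around the best-matching one.
    center = int(np.argmax(sims))
    lo = max(0, center - half_width)
    hi = min(len(sims), center + half_width + 1)

    # Softmax over the local window; all other addresses get zero weight.
    weights = np.zeros_like(sims)
    local = np.exp(sims[lo:hi] - sims[lo:hi].max())
    weights[lo:hi] = local / local.sum()
    return weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    M = rng.standard_normal((128, 16))   # 128 memory addresses, 16-dim rows
    k = rng.standard_normal(16)
    w = lca_addressing(M, k)
    print("nonzero weights:", np.count_nonzero(w), "sum:", w.sum())
```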


2020 ◽ Vol 123 (1) ◽ pp. 3-11
Author(s): Cory R. Scherer ◽ John J. Skowronski

This article introduces the special issue on autobiographical memory for Psychological Reports. In this introduction, we attempt to provide context for the autobiographical memory area by highlighting the diversity of the areas of scholarship that contribute to it. We describe our perceptions of the contributions made by the various articles and reflect on how the scholarship presented in each article links to the scholarship presented in the other articles. We also use the scholarship presented in the articles to generate some additional ideas and directions that could be pursued in future theory and research.


2016 ◽ Vol 96 ◽ pp. 1172-1178
Author(s): Muhammad Nur Adilin Mohd Anuardi ◽ Hideyuki Shinohara ◽ Atsuko K. Yamazaki
