CLU

2021 ◽  
Vol 17 (2) ◽  
pp. 1-25
Author(s):  
Palash Das ◽  
Hemangee K. Kapoor

Convolutional/Deep Neural Networks (CNNs/DNNs) are rapidly growing workloads for emerging AI-based systems. The gap between processing speed and memory-access latency in multi-core systems limits the performance and energy efficiency of CNN/DNN tasks. This article aims to narrow this gap by providing a simple yet efficient near-memory accelerator-based system that expedites CNN inference. Towards this goal, we first design an efficient parallel algorithm to accelerate CNN/DNN tasks. The data is partitioned across the multiple memory channels (vaults) to assist the execution of the parallel algorithm. Second, we design a hardware unit, the convolutional logic unit (CLU), which implements the parallel algorithm and works in three phases for layer-wise processing of data. Last, to harness the benefits of near-memory processing (NMP), we integrate homogeneous CLUs on the logic layer of a 3D memory, specifically the Hybrid Memory Cube (HMC). Together, these yield a high-performing and energy-efficient system for CNNs/DNNs. The proposed system achieves substantial performance gains and energy reduction compared to multi-core CPU- and GPU-based systems, with a minimal area overhead of 2.37%.
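The vault-wise data partitioning described above can be illustrated with a toy 1-D convolution. This is a hypothetical sketch, not the paper's CLU design: the vault count, halo handling, and function names are all assumptions. Each vault receives a contiguous input slice plus a (k-1)-element halo so its partial outputs can be computed independently and then concatenated.

```python
def conv1d(x, w):
    # direct 1-D convolution (valid mode) as the reference result
    k = len(w)
    return [sum(x[i + j] * w[j] for j in range(k)) for i in range(len(x) - k + 1)]

def vault_partitioned_conv1d(x, w, n_vaults):
    # split the output rows across vaults; each vault reads its input slice
    # plus a (k-1)-element halo, computes its partial outputs independently,
    # and the partials are concatenated in vault order
    k = len(w)
    out_len = len(x) - k + 1
    partials, start = [], 0
    for v in range(n_vaults):
        cnt = out_len // n_vaults + (1 if v < out_len % n_vaults else 0)
        partials.extend(conv1d(x[start:start + cnt + k - 1], w))
        start += cnt
    return partials
```

Because the halo duplicates only k-1 elements per vault boundary, the partitioned result matches the monolithic convolution exactly while each vault's work stays local to its own data.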

2021 ◽  
Vol 26 (6) ◽  
pp. 1-20
Author(s):  
Naebeom Park ◽  
Sungju Ryu ◽  
Jaeha Kung ◽  
Jae-Joon Kim

This article discusses a high-performance near-memory neural network (NN) accelerator architecture utilizing the logic die in three-dimensional (3D) High Bandwidth Memory (HBM)-like memory. As most previously reported 3D-memory-based near-memory NN accelerator designs used Hybrid Memory Cube (HMC) memory, we first focus on identifying the key differences between HBM and HMC in terms of near-memory NN accelerator design. One major difference between the two 3D memories is that HBM has centralized through-silicon-via (TSV) channels, while HMC has TSV channels distributed across separate vaults. Based on this observation, we introduce the Round-Robin Data Fetching and Groupwise Broadcast schemes, which exploit the centralized TSV channels to improve the data-feeding rate for the processing elements. Using synthesized designs in a 28-nm CMOS technology, we evaluate the performance and energy consumption of the proposed architectures with various dataflow models. Experimental results show that the proposed schemes reduce runtime by 16.4–39.3% on average and energy consumption by 2.1–5.1% on average compared to conventional data-fetching schemes.
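The two fetching ideas named above can be sketched abstractly. This is a behavioral illustration under assumed simplifications (queues stand in for TSV channels, a dict stands in for a broadcast bus), not the paper's hardware design:

```python
def round_robin_fetch(channel_queues):
    # cycle over the centralized TSV channels, pulling at most one word per
    # channel per round, so no single channel monopolizes the data feed
    fetched = []
    while any(channel_queues):
        for q in channel_queues:
            if q:
                fetched.append(q.pop(0))
    return fetched

def groupwise_broadcast(word, pe_groups, group_id):
    # deliver one fetched word to every processing element in a group,
    # amortizing a single channel access across the whole group
    return {pe: word for pe in pe_groups[group_id]}
```

In this toy model, three channels holding [1, 4], [2, 5], [3] drain in the order 1, 2, 3, 4, 5 — each round visits every non-empty channel once before revisiting any of them.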


Author(s):  
Jörg-Tobias Kuhn ◽  
Elena Ise ◽  
Julia Raddatz ◽  
Christin Schwenk ◽  
Christian Dobel

Abstract. Objective: Deficits in basic numerical skills, calculation, and working memory have been found in children with developmental dyscalculia (DD) as well as children with attention-deficit/hyperactivity disorder (ADHD). This paper investigates cognitive profiles of children with DD and/or ADHD symptoms (AS) in a double dissociation design to obtain a better understanding of the comorbidity of DD and ADHD. Method: Children with DD-only (N = 33), AS-only (N = 16), comorbid DD+AS (N = 20), and typically developing controls (TD, N = 40) were assessed on measures of basic numerical processing, calculation, working memory, processing speed, and neurocognitive measures of attention. Results: Children with DD (DD, DD+AS) showed deficits in all basic numerical skills, calculation, working memory, and sustained attention. Children with AS (AS, DD+AS) displayed more selective difficulties in dot enumeration, subtraction, verbal working memory, and processing speed. Also, they generally performed more poorly in neurocognitive measures of attention, especially alertness. Children with DD+AS mostly showed an additive combination of the deficits associated with DD-only and AS-only, except for subtraction tasks, in which they were less impaired than expected. Conclusions: DD and AS appear to be related to largely distinct patterns of cognitive deficits, which are present in combination in children with DD+AS.



2019 ◽  
Vol 11 (8) ◽  
pp. 906 ◽  
Author(s):  
Zongyong Cui ◽  
Cui Tang ◽  
Zongjie Cao ◽  
Nengyuan Liu

Automatic target recognition (ATR) can obtain important information for target surveillance from Synthetic Aperture Radar (SAR) images. Thus, a direct automatic target recognition (D-ATR) method, based on a deep neural network (DNN), is proposed in this paper. To recognize targets in large-scene SAR images, traditional SAR ATR methods comprise four major steps: detection, discrimination, feature extraction, and classification. However, the recognition performance is sensitive to each step, as the processing result of each step affects the following one. Meanwhile, these processes are independent, which means that there is still room for processing-speed improvement. The proposed D-ATR method can integrate these steps into a whole system and directly recognize targets in large-scene SAR images, by encapsulating all of the computation in a single deep convolutional neural network (DCNN). Before the DCNN, a fast sliding method is proposed to partition the large image into sub-images, to avoid information loss when resizing the input images, and to avoid a target being divided into several parts. After the DCNN, non-maximum suppression between sub-images (NMSS) is performed on the results of the sub-images, to obtain an accurate result for the large-scene SAR image. Experiments on the MSTAR dataset and large-scene SAR images (with resolution 1478 × 1784) show that the proposed method can obtain high accuracy and fast processing speed, and outperforms other methods, such as CFAR+SVM, Region-based CNN, and YOLOv2.
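The NMSS step named above can be sketched as plain greedy non-maximum suppression over detections gathered from all sub-images. This is a minimal illustration assuming detections are already mapped to global image coordinates; the box format and IoU threshold are assumptions, not the paper's exact procedure:

```python
def iou(a, b):
    # intersection-over-union of two boxes (x0, y0, x1, y1, score)
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def nms_between_subimages(dets, thr=0.5):
    # dets: detections from all overlapping sub-images, in global image
    # coordinates; greedily keep the highest-scoring box and drop any
    # lower-scoring box that overlaps a kept one above the IoU threshold
    kept = []
    for d in sorted(dets, key=lambda d: d[4], reverse=True):
        if all(iou(d, k) < thr for k in kept):
            kept.append(d)
    return kept
```

Because adjacent sub-images overlap, the same target is often detected twice near a tile boundary; NMSS collapses those duplicates into a single detection.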


2017 ◽  
Vol 2017 ◽  
pp. 1-17 ◽  
Author(s):  
Lambros Messinis ◽  
Grigorios Nasios ◽  
Mary H. Kosmidis ◽  
Petros Zampakis ◽  
Sonia Malefaki ◽  
...  

Cognitive impairment is frequently encountered in multiple sclerosis (MS), affecting 40–65% of individuals, irrespective of disease duration and severity of physical disability. In the present multicenter randomized controlled trial, fifty-eight clinically stable relapsing-remitting MS (RRMS) patients with mild to moderate cognitive impairment and relatively low disability status were randomized to receive either computer-assisted (RehaCom) functional cognitive training with an emphasis on episodic memory, information processing speed/attention, and executive functions for 10 weeks (IG; n=32) or standard clinical care (CG; n=26). Outcome measures included a flexible comprehensive neuropsychological battery of tests sensitive to MS patient deficits and feedback regarding personal benefit gained from the intervention on four verbal questions. Only the IG group showed significant improvements in verbal and visuospatial episodic memory, processing speed/attention, and executive functioning from pre- to post-assessment. Moreover, the improvement obtained on attention was retained over 6 months, providing evidence of the long-term benefits of this intervention. Group by time interactions revealed significant improvements in composite cognitive domain scores in the IG relative to the demographically and clinically matched CG for verbal episodic memory, processing speed, verbal fluency, and attention. Treated patients rated the intervention positively and were more confident about their cognitive abilities following treatment.


Author(s):  
Peter K. Koo ◽  
Matt Ploenzke

Abstract. Despite deep neural networks (DNNs) having found great success at improving performance on various prediction tasks in computational genomics, it remains difficult to understand why they make any given prediction. In genomics, the main approaches to interpret a high-performing DNN are to visualize learned representations via weight visualizations and attribution methods. While these methods can be informative, each has strong limitations. For instance, attribution methods only uncover the independent contribution of single nucleotide variants in a given sequence. Here we discuss and argue for global importance analysis which can quantify population-level importance of putative features and their interactions learned by a DNN. We highlight recent work that has benefited from this interpretability approach and then discuss connections between global importance analysis and causality.
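The core move of global importance analysis — embedding a putative feature into a population of background sequences and averaging the change in model output — can be sketched in a few lines. This is a hedged illustration: the function signature, sequence length, and the toy motif-counting "model" are assumptions standing in for a trained DNN, not the authors' implementation:

```python
import random

def global_importance(model, motif, n=200, length=50, seed=0):
    # embed a putative motif at a fixed position in many random background
    # sequences and average the change in model output; averaging over the
    # population isolates the motif's effect from any single background
    rng = random.Random(seed)
    pos, deltas = length // 2, []
    for _ in range(n):
        bg = ''.join(rng.choice('ACGT') for _ in range(length))
        embedded = bg[:pos] + motif + bg[pos + len(motif):]
        deltas.append(model(embedded) - model(bg))
    return sum(deltas) / n

# toy stand-in for a trained DNN: scores a sequence by GATA occurrences
toy_model = lambda seq: float(seq.count('GATA'))
```

For the toy model, embedding "GATA" adds roughly one occurrence per sequence, so the population-level importance comes out near 1.0; unlike single-sequence attribution, the score does not depend on any one background.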


Author(s):  
Bicky Shakya ◽  
Xiaolin Xu ◽  
Mark Tehranipoor ◽  
Domenic Forte

Logic locking has recently been proposed as a solution for protecting gate-level semiconductor intellectual property (IP). However, numerous attacks have been mounted on this technique, which either compromise the locking key or restore the original circuit functionality. SAT attacks leverage golden IC information to rule out all incorrect key classes, while bypass and removal attacks exploit the limited output corruptibility and/or structural traces of SAT-resistant locking schemes. In this paper, we propose a new lightweight locking technique: CAS-Lock (cascaded locking), which nullifies both SAT and bypass attacks, while simultaneously maintaining nontrivial output corruptibility. This property of CAS-Lock is in stark contrast to the well-accepted notion that there is an inherent trade-off between output corruptibility and SAT resistance. We theoretically and experimentally validate the SAT resistance of CAS-Lock, and show that it reduces the attack to brute-force, regardless of its construction. Further, we evaluate its resistance to recently proposed approximate SAT attacks (i.e., AppSAT). We also propose a modified version of CAS-Lock (mirrored CAS-Lock or M-CAS) to protect against removal attacks. M-CAS allows a trade-off evaluation between removal attack and SAT attack resiliency, while incurring minimal area overhead. We also show how M-CAS parameters such as the implemented Boolean function and selected key can be tuned by the designer so that a desired level of protection against all known attacks can be achieved.
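CAS-Lock's cascaded AND/OR structure is hard to capture in a few lines, but the basic logic-locking contract it builds on can be sketched with simple XOR key gates. The toy circuit function and key below are illustrative assumptions, not CAS-Lock itself:

```python
SECRET_KEY = 0b1010  # hypothetical 4-bit key fixed at design time

def original_circuit(x):
    # toy 4-bit nonlinear combinational function standing in for the
    # protected gate-level IP
    return (5 * x + 3) & 0xF

def locked_circuit(x, key):
    # XOR key gates on the input wires: the correct key cancels the secret
    # mask and leaves the inputs unmodified; any wrong key perturbs the
    # inputs and corrupts the output (nontrivial output corruptibility)
    return original_circuit(x ^ key ^ SECRET_KEY)
```

The contract is: applying the correct key restores the original function exactly, while a wrong key produces observable errors. Schemes like CAS-Lock additionally shape *how many* wrong-key input patterns are corrupted, which is what governs resistance to SAT and bypass attacks.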


2020 ◽  
Vol 35 (6) ◽  
pp. 1030-1030
Author(s):  
Petranovich C ◽  
Wilson K ◽  
Gill D ◽  
Morrison L ◽  
Hart B ◽  
...  

Abstract Objective This study assessed the convergent validity of the NIHTB-CB in a sample of children and adolescents with CCM-1 and non-affected relatives. Method Twenty-two participants with CCM-1 and 7 non-affected relatives completed the NIHTB-CB and traditional neuropsychological measures. The following domains were assessed: memory (NIHTB-CB Picture Sequence Memory and Child and Adolescent Memory Profile Screening Index), word reading (NIHTB-CB Oral Reading and Wide Range Achievement Test-4th Word Reading [WRAT-4 WR]), processing speed (NIHTB-CB Pattern Comparison and Symbol Digit Modalities Test), and attention/working memory (NIHTB-CB List Sorting and Digit Span). Results The non-affected group scored higher than the CCM-1 group on WRAT-4 WR (t = 2.68, p = .02) and NIHTB-CB Oral Reading (t = 2.18, p = .05). The groups did not differ on the other measures (p > .05). Pearson's correlations ranged from .45 for memory to .81 for word reading, demonstrating adequate construct validity for memory, processing speed, and attention/working memory, and good to very good validity for word reading. The NIHTB-CB was more likely to identify participants as impaired for memory (17.2% vs. 6.9%) and processing speed (62.1% vs. 3.4%). The traditional attention/working memory measure was more likely to identify participants as impaired (27.6% vs. 3.4%). Impairment rates were similar for the word reading measures. Conclusions Of the domains considered, convergent validity was best established for word reading. Although correlations were adequate, rates of impairment differed for memory, processing speed, and attention/working memory, suggesting that caution is warranted when comparing the NIHTB-CB to traditional measures in these areas.


Electronics ◽  
2019 ◽  
Vol 8 (10) ◽  
pp. 1096
Author(s):  
Chae Eun Rhee ◽  
Seung-Won Park ◽  
Jungwoo Choi ◽  
Hyunmin Jung ◽  
Hyuk-Jae Lee

Recently, dramatic improvements in memory performance have been in high demand for data-demanding application services such as deep learning, big data, and immersive video. To this end, throughput-oriented memories such as high bandwidth memory (HBM) and the hybrid memory cube (HMC) have been introduced to provide high bandwidth, and various research efforts have been conducted toward their effective use. Among them, near-memory processing (NMP) is a concept that improves bandwidth utilization and power consumption by placing computation logic near the memory. In an NMP-enabled system, a processor hierarchy consisting of hosts and NMPs is formed based on distance from the main memory. In this paper, an evaluation tool is proposed to obtain the optimal design decision considering the power-time trade-off in the processor hierarchy. Every time the operating conditions and constraints change, the task-level offloading decision is made dynamically. For a realistic NMP-enabled system environment, the relationship among the HBM, host, and NMP should be carefully considered: hosts and NMPs are almost hidden from each other, and communication between them is extremely limited. In the simulation results, popular benchmarks and a machine learning application are used to demonstrate power-time trade-offs depending on the application and system conditions.
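The dynamic task-level offloading decision described above can be sketched as a weighted cost comparison. This is a hypothetical simplification: the task representation, cost estimates, and weights are all assumptions, not the paper's evaluation tool:

```python
def choose_target(task, w_time=0.5, w_power=0.5):
    # task carries estimated (time, power) cost pairs for running on the
    # host versus on the NMP logic; a weighted sum encodes the current
    # power-time trade-off, and the cheaper side wins the offload decision
    cost = lambda time, power: w_time * time + w_power * power
    return 'nmp' if cost(*task['nmp']) < cost(*task['host']) else 'host'
```

With this model, a memory-bound task whose host time is dominated by DRAM stalls tends to land on the NMP side, while a compute-bound task stays on the host; shifting the weights (e.g., a tighter power budget raising w_power) can flip individual decisions, which is exactly the sensitivity such a tool is meant to expose.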

