Adaptive Computation Reuse for Energy-Efficient Training of Deep Neural Networks

2021 ◽  
Vol 20 (6) ◽  
pp. 1-24
Author(s):  
Jason Servais ◽  
Ehsan Atoofian

In recent years, Deep Neural Networks (DNNs) have been deployed into a diverse set of applications from voice recognition to scene generation mostly due to their high-accuracy. DNNs are known to be computationally intensive applications, requiring a significant power budget. There have been a large number of investigations into energy-efficiency of DNNs. However, most of them primarily focused on inference while training of DNNs has received little attention. This work proposes an adaptive technique to identify and avoid redundant computations during the training of DNNs. Elements of activations exhibit a high degree of similarity, causing inputs and outputs of layers of neural networks to perform redundant computations. Based on this observation, we propose Adaptive Computation Reuse for Tensor Cores (ACRTC) where results of previous arithmetic operations are used to avoid redundant computations. ACRTC is an architectural technique, which enables accelerators to take advantage of similarity in input operands and speedup the training process while also increasing energy-efficiency. ACRTC dynamically adjusts the strength of computation reuse based on the tolerance of precision relaxation in different training phases. Over a wide range of neural network topologies, ACRTC accelerates training by 33% and saves energy by 32% with negligible impact on accuracy.

Electronics ◽  
2021 ◽  
Vol 10 (3) ◽  
pp. 230
Author(s):  
Jaechan Cho ◽  
Yongchul Jung ◽  
Seongjoo Lee ◽  
Yunho Jung

Binary neural networks (BNNs) have attracted significant interest for the implementation of deep neural networks (DNNs) on resource-constrained edge devices, and various BNN accelerator architectures have been proposed to achieve higher efficiency. BNN accelerators can be divided into two categories: streaming and layer accelerators. Although streaming accelerators designed for a specific BNN network topology provide high throughput, they are infeasible for various sensor applications in edge AI because of their complexity and inflexibility. In contrast, layer accelerators with reasonable resources can support various network topologies, but they operate with the same parallelism for all the layers of the BNN, which degrades throughput performance at certain layers. To overcome this problem, we propose a BNN accelerator with adaptive parallelism that offers high throughput performance in all layers. The proposed accelerator analyzes target layer parameters and operates with optimal parallelism using reasonable resources. In addition, this architecture is able to fully compute all types of BNN layers thanks to its reconfigurability, and it can achieve a higher area–speed efficiency than existing accelerators. In performance evaluation using state-of-the-art BNN topologies, the designed BNN accelerator achieved an area–speed efficiency 9.69 times higher than previous FPGA implementations and 24% higher than existing VLSI implementations for BNNs.


2007 ◽  
Vol 189 (11) ◽  
pp. 4161-4167 ◽  
Author(s):  
Mark D. Farrar ◽  
Karen M. Howson ◽  
Richard A. Bojar ◽  
David West ◽  
James C. Towler ◽  
...  

ABSTRACT Cutaneous propionibacteria are important commensals of human skin and are implicated in a wide range of opportunistic infections. Propionibacterium acnes is also associated with inflammatory acne vulgaris. Bacteriophage PA6 is the first phage of P. acnes to be sequenced and demonstrates a high degree of similarity to many mycobacteriophages both morphologically and genetically. PA6 possesses an icosahedreal head and long noncontractile tail characteristic of the Siphoviridae. The overall genome organization of PA6 resembled that of the temperate mycobacteriophages, although the genome was much smaller, 29,739 bp (48 predicted genes), compared to, for example, 50,550 bp (86 predicted genes) for the Bxb1 genome. PA6 infected only P. acnes and produced clear plaques with turbid centers, but it lacked any obvious genes for lysogeny. The host range of PA6 was restricted to P. acnes, but the phage was able to infect and lyse all P. acnes isolates tested. Sequencing of the PA6 genome makes an important contribution to the study of phage evolution and propionibacterial genetics.


2018 ◽  
Author(s):  
Christopher McComb

The design of a system commits a significant portion of the final cost of that system. Many computational approaches have been developed to assist designers in the analysis (e.g., computational fluid dynamics) and synthesis (e.g., topology optimization) of engineered systems. However, many of these approaches are computationally intensive, taking significant time to complete an analysis and even longer to iteratively synthesize a solution. The current work proposes a methodology for rapidly evaluating and syn- thesizing engineered systems through the use of deep neural networks. The proposed methodology is applied to the analysis and synthesis of offshore structures such as oil platforms. These structures are constructed in a ma- rine environment and are typically designed to achieve specific dynamics in response to a known spectrum of ocean waves. Results show that deep learning can be used to accurately and rapidly synthesize and analyze off- shore structure.


2022 ◽  
Vol 18 (2) ◽  
pp. 1-25
Author(s):  
Saransh Gupta ◽  
Mohsen Imani ◽  
Joonseop Sim ◽  
Andrew Huang ◽  
Fan Wu ◽  
...  

Stochastic computing (SC) reduces the complexity of computation by representing numbers with long streams of independent bits. However, increasing performance in SC comes with either an increase in area or a loss in accuracy. Processing in memory (PIM) computes data in-place while having high memory density and supporting bit-parallel operations with low energy consumption. In this article, we propose COSMO, an architecture for co mputing with s tochastic numbers in me mo ry, which enables SC in memory. The proposed architecture is general and can be used for a wide range of applications. It is a highly dense and parallel architecture that supports most SC encodings and operations in memory. It maximizes the performance and energy efficiency of SC by introducing several innovations: (i) in-memory parallel stochastic number generation, (ii) efficient implication-based logic in memory, (iii) novel memory bit line segmenting, (iv) a new memory-compatible SC addition operation, and (v) enabling flexible block allocation. To show the generality and efficiency of our stochastic architecture, we implement image processing, deep neural networks (DNNs), and hyperdimensional (HD) computing on the proposed hardware. Our evaluations show that running DNN inference on COSMO is 141× faster and 80× more energy efficient as compared to GPU.


Author(s):  
Alistair Baretto ◽  
Noel Pudussery ◽  
Veerasai Subramaniam ◽  
Amroz Siddiqui

The rapid growth that has taken place in Computer Vision has been instrumental in driving the advancement of Image processing techniques and drawing inferences from them. Combined with the enormous capabilities that Deep Neural networks bring to the table, computers can be efficiently trained to automate the tasks and yield accurate and robust results quickly thus optimizing the process. Technological growth has enabled us to bring such computationally intensive tasks to lighter and lower-end mobile devices thus opening up a wide range of possibilities. WebRTC-the open-source web standard enables us to send multimedia-based data from peer to peer paving the way for Real-time Communication over the Web. With this project, we aim to build on one such opportunity that can enable us to perform custom object detection through an android based application installed on our mobile phones. Therefore, our problem statement is to be able to capture real-time feeds, perform custom object detection, generate inference results, and appropriately send intruder alerts when needed. To implement this, we propose a mobile-based over-the-cloud solution that can capitalize on the enormous and encouraging features of the YOLO algorithm and incorporate the functionalities of OpenCV’s DNN module for providing us with fast and correct inferences. Coupled with a good and intuitive UI, we can ensure ease of use of our application.


Author(s):  
Ulas Isildak ◽  
Alessandro Stella ◽  
Matteo Fumagalli

1AbstractBalancing selection is an important adaptive mechanism underpinning a wide range of phenotypes. Despite its relevance, the detection of recent balancing selection from genomic data is challenging as its signatures are qualitatively similar to those left by ongoing positive selection. In this study we developed and implemented two deep neural networks and tested their performance to predict loci under recent selection, either due to balancing selection or incomplete sweep, from population genomic data. Specifically, we generated forward-intime simulations to train and test an artificial neural network (ANN) and a convolutional neural network (CNN). ANN received as input multiple summary statistics calculated on the locus of interest, while CNN was applied directly on the matrix of haplotypes. We found that both architectures have high accuracy to identify loci under recent selection. CNN generally outperformed ANN to distinguish between signals of balancing selection and incomplete sweep and was less affected by incorrect training data. We deployed both trained networks on neutral genomic regions in European populations and demonstrated a lower false positive rate for CNN than ANN. We finally deployed CNN within the MEFV gene region and identified several common variants predicted to be under incomplete sweep in a European population. Notably, two of these variants are functional changes and could modulate susceptibility to Familial Mediterranean Fever, possibly as a consequence of past adaptation to pathogens. In conclusion, deep neural networks were able to characterise signals of selection on intermediate-frequency variants, an analysis currently inaccessible by commonly used strategies.


2021 ◽  
Author(s):  
Morgan Frearson

<div><div><div><p>The use of deep learning for human identification and object detection is becoming ever more prevalent in the surveillance industry. These systems have been trained to identify human body’s or faces with a high degree of accuracy. However, there have been successful attempts to fool these systems with different techniques called adversarial attacks. This paper presents an adversarial attack using infrared light on facial recognition systems. The relevance of this research is to exploit the physical downfalls of deep neural networks. This demonstration of weakness within these systems are in hopes that this research will be used in the future to improve the training models for object recognition. A research outline on infrared light and facial recognition are presented within this paper. A detailed analyzation of the current design phase and future steps of the of the project are presented including initial testing of the device. Any challenges are explored and evaluated such that the deliverables of the project remain consistent to its timeline. The project specifications may be subject to change overtime based on the outcomes of testing stages.</p></div></div></div>


Author(s):  
Jiakai Wang

Although deep neural networks (DNNs) have already made fairly high achievements and a very wide range of impact, their vulnerability attracts lots of interest of researchers towards related studies about artificial intelligence (AI) safety and robustness this year. A series of works reveals that the current DNNs are always misled by elaborately designed adversarial examples. And unfortunately, this peculiarity also affects real-world AI applications and places them at potential risk. we are more interested in physical attacks due to their implementability in the real world. The study of physical attacks can effectively promote the application of AI techniques, which is of great significance to the security development of AI.


2020 ◽  
pp. 107754632092914
Author(s):  
Mohammed Alabsi ◽  
Yabin Liao ◽  
Ala-Addin Nabulsi

Deep learning has seen tremendous growth over the past decade. It has set new performance limits for a wide range of applications, including computer vision, speech recognition, and machinery health monitoring. With the abundance of instrumentation data and the availability of high computational power, deep learning continues to prove itself as an efficient tool for the extraction of micropatterns from machinery big data repositories. This study presents a comparative study for feature extraction capabilities using stacked autoencoders considering the use of expert domain knowledge. Case Western Reserve University bearing dataset was used for the study, and a classifier was trained and tested to extract and visualize features from 12 different failure classes. Based on the raw data preprocessing, four different deep neural network structures were studied. Results indicated that integrating domain knowledge with deep learning techniques improved feature extraction capabilities and reduced the deep neural networks size and computational requirements without the need for exhaustive deep neural networks architecture tuning and modification.


2019 ◽  
Vol 28 (06) ◽  
pp. 1960008 ◽  
Author(s):  
Grega Vrbančič ◽  
Iztok Fister ◽  
Vili Podgorelec

Over the past years, the application of deep neural networks in a wide range of areas is noticeably increasing. While many state-of-the-art deep neural networks are providing the performance comparable or in some cases even superior to humans, major challenges such as parameter settings for learning deep neural networks and construction of deep learning architectures still exist. The implications of those challenges have a significant impact on how a deep neural network is going to perform on a specific task. With the proposed method, presented in this paper, we are addressing the problem of parameter setting for a deep neural network utilizing swarm intelligence algorithms. In our experiments, we applied the proposed method variants to the classification task for distinguishing between phishing and legitimate websites. The performance of the proposed method is evaluated and compared against four different phishing datasets, two of which we prepared on our own. The results, obtained from the conducted empirical experiments, have proven the proposed approach to be very promising. By utilizing the proposed swarm intelligence based methods, we were able to statistically significantly improve the predictive performance when compared to the manually tuned deep neural network. In general, the improvement of classification accuracy ranges from 2.5% to 3.8%, while the improvement of F1-score reached even 24% on one of the datasets.


Sign in / Sign up

Export Citation Format

Share Document