Layer-Centric Memory Reuse and Data Migration for Extreme-Scale Deep Learning on Many-Core Architectures

Hai Jin; Bo Liu; Wenbin Jiang; Yang Ma; Xuanhua Shi; Bingsheng He; Shaofeng Zhao

doi:10.1145/3243904

ACM Transactions on Architecture and Code Optimization, 2018, Vol 15 (3), pp. 1-26. doi:10.1145/3243904. Cited by ~3.

Author(s): Hai Jin; Bo Liu; Wenbin Jiang; Yang Ma; Xuanhua Shi; ...

Keyword(s): Deep Learning; Data Migration; Extreme Scale; Many Core; Memory Reuse

Exploring Data Migration for Future Deep-Memory Many-Core Systems

2016 IEEE International Conference on Cluster Computing (CLUSTER), 2016. doi:10.1109/cluster.2016.42. Cited by ~5.

Author(s): Swann Perarnau; Judicael A. Zounmevo; Balazs Gerofi; Kamil Iskra; Pete Beckman

Keyword(s): Data Migration; Many Core

Hyperspectral Image Classification Using Parallel Autoencoding Diabolo Networks on Multi-Core and Many-Core Architectures

Electronics, 2018, Vol 7 (12), pp. 411. doi:10.3390/electronics7120411. Cited by ~4.

Author(s): Emanuele Torti; Alessandro Fontanella; Antonio Plaza; Javier Plaza; Francesco Leporati

Keyword(s): Deep Learning; Hyperspectral Imaging; Hyperspectral Image; Unsupervised Classification; Hyperspectral Data; Machine Learning Techniques; High Data; Learning Techniques; Speed Up; Many Core

One of the most important tasks in hyperspectral imaging is classifying the pixels of a scene to produce thematic maps. This problem is typically solved with machine learning techniques. In particular, deep learning algorithms have emerged in recent years as a suitable methodology for classifying hyperspectral data. Moreover, the high dimensionality of hyperspectral data, together with the increasing availability of unlabeled samples, makes deep learning an appealing approach to processing and interpreting those data. However, the limited number of labeled samples often complicates the use of supervised techniques, which normally require a large number of labeled samples to guarantee suitable precision. This hurdle can be overcome by resorting to unsupervised classification algorithms. In particular, autoencoders can analyze a hyperspectral image using only unlabeled data. However, the high data dimensionality leads to prohibitive training times. In this regard, it is important to note that the operations involved in autoencoder training are intrinsically parallel. Therefore, in this paper we present an approach that exploits multi-core and many-core devices to achieve efficient autoencoder training in hyperspectral imaging applications. Specifically, we present new OpenMP and CUDA frameworks for autoencoder training. The results show that the CUDA framework provides a speed-up of about two orders of magnitude compared with an optimized serial processing chain.

Research on Parallel Acceleration for Deep Learning Inference Based on Many-Core ARM Platform

Communications in Computer and Information Science - Advanced Computer Architecture, 2018, pp. 30-41. doi:10.1007/978-981-13-2423-9_3.

Author(s): Keqian Zhu; Jingfei Jiang

Keyword(s): Deep Learning; Many Core; Parallel Acceleration

Implementation and performance of Barnes-Hut N-body algorithm on extreme-scale heterogeneous many-core architectures

The International Journal of High Performance Computing Applications, 2020, Vol 34 (6), pp. 615-628. doi:10.1177/1094342020943652.

Author(s): Masaki Iwasawa; Daisuke Namekata; Ryo Sakamoto; Takashi Nakamura; Yasuyuki Kimura; ...

Keyword(s): Large Scale; High Efficiency; Simulation Code; Network Bandwidth; Large Numbers; Sunway Taihulight; And Performance; Extreme Scale; Many Core; New Algorithms

In this paper, we report the implementation and measured performance of our extreme-scale whole planetary ring simulation code on Sunway TaihuLight and two PEZY-SC2 systems: Shoubu System B and Gyoukou. The numerical algorithm is the parallel Barnes-Hut tree algorithm, which has been used in many large-scale astrophysical particle-based simulations. Our implementation is based on our FDPS framework. However, the extremely large numbers of cores of the systems used (10 M on TaihuLight and 16 M on Gyoukou) and their relatively poor memory and network bandwidth pose new challenges. We describe the new algorithms introduced to achieve high efficiency on machines with low memory bandwidth. The measured performance is 47.9 PF, 10.6 PF, and 1.01 PF on TaihuLight, Gyoukou, and Shoubu System B, respectively (efficiency 40%, 23.5%, and 35.5%). The current code was developed for the simulation of planetary rings, but most of the new algorithms are useful for other simulations and are now available in the FDPS framework.

Deep Learning Optimization for Many-Core Virtual Platforms

Parallel Architectures, Algorithms and Programming - Communications in Computer and Information Science, 2021, pp. 22-33. doi:10.1007/978-981-16-0010-4_3.

Author(s): Hengyu Cai; Chengming Ning; Qilong Zheng

Keyword(s): Deep Learning; Many Core

Automatic Generation of High-Order Finite-Difference Code with Temporal Blocking for Extreme-Scale Many-Core Systems

2018 IEEE/ACM 4th International Workshop on Extreme Scale Programming Models and Middleware (ESPM2), 2018. doi:10.1109/espm2.2018.00008. Cited by ~3.

Author(s): Hideyuki Tanaka; Youhei Ishihara; Ryo Sakamoto; Takashi Nakamura; Yasuyuki Kimura; ...

Keyword(s): Finite Difference; Automatic Generation; High Order; Extreme Scale; Many Core; High Order Finite Difference

Many-Core Acceleration of a Discrete Ordinates Transport Mini-App at Extreme Scale

Lecture Notes in Computer Science - High Performance Computing, 2016, pp. 429-448. doi:10.1007/978-3-319-41321-1_22. Cited by ~5.

Author(s): Tom Deakin; Simon McIntosh-Smith; Wayne Gaudin

Keyword(s): Discrete Ordinates; Extreme Scale; Many Core

Deep Learning

2009. doi:10.1017/cbo9780511780295. Cited by ~94.

Author(s): Stellan Ohlsson

Keyword(s): Deep Learning

Artificial intelligence: the future of orthodontics?

Revue d'Orthopédie Dento-Faciale, 2019, Vol 53 (3), pp. 281-294. doi:10.1051/odf/2019026.

Author(s): Jean-Michel Foucart; Augustin Chavanne; Jérôme Bourriau

Keyword(s): Big Data; Deep Learning; Artificial Intelligence; Set Up

Artificial Intelligence (AI) is expected to contribute to medicine in many ways. In orthodontics, several automated solutions have been available for some years in X-ray imaging (automated cephalometric analysis, automated airway analysis) or for a few months (automatic analysis of digital models, automated set-up; CS Model +, Carestream Dental™). The objective of this two-part study is to evaluate the reliability of automated model analysis, in terms of both digitization and segmentation. Comparing the model analysis results obtained automatically with those obtained by several orthodontists demonstrates the reliability of the automatic analysis: the measurement error ultimately ranges between 0.08 and 1.04 mm, which is not significant and is comparable to the inter-observer measurement errors reported in the literature. These results open new perspectives on the contribution of AI to orthodontics, which, based on deep learning and big data, should make it possible in the medium term to move toward a more preventive and more predictive orthodontics.

Effectiveness of Artificial Intelligence Using Deep Learning for Detecting Gastric Cancer in Endoscopic Images

2018. doi:10.1055/s-0038-1637183.

Author(s): T Hirasawa; K Aoyama; J Fujisaki; T Tada

Keyword(s): Artificial Intelligence; Gastric Cancer; Deep Learning; Endoscopic Images
