A Partition Based Gradient Compression Algorithm for Distributed Training in AIoT

Sensors ◽  
2021 ◽  
Vol 21 (6) ◽  
pp. 1943
Author(s):  
Bingjun Guo ◽  
Yazhi Liu ◽  
Chunyang Zhang

Running Deep Neural Networks (DNNs) on distributed Internet of Things (IoT) nodes is a promising way to enhance the performance of IoT systems. However, because the computing and communication resources of IoT nodes are limited, the communication efficiency of distributed DNN training is a pressing problem. In this paper, an adaptive compression strategy based on gradient partitioning is proposed to reduce the high communication overhead between nodes during distributed training. First, a neural network is trained to predict the gradient distribution of its parameters. According to the characteristics of this distribution, the gradient is divided into a key region and a sparse region. Then, guided by the information entropy of the gradient distribution, a suitable threshold is selected within each partition, and only gradient values greater than the threshold are transmitted and updated, reducing traffic and improving distributed training efficiency. By exploiting gradient sparsity, the strategy achieves a maximum compression ratio of 37.1 times, improving training efficiency to a certain extent.
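The thresholding step described in the abstract can be sketched as follows. This is a minimal illustration that keeps a fixed fraction of the largest-magnitude gradient entries; the `keep_ratio` knob is a hypothetical stand-in for the paper's entropy-derived, per-partition threshold:

```python
import numpy as np

def sparsify_gradient(grad, keep_ratio=0.05):
    """Keep only the largest-magnitude gradient entries; zero the rest.

    `keep_ratio` is an illustrative knob standing in for the paper's
    entropy-based threshold selection, which is not specified here.
    """
    flat = np.abs(grad).ravel()
    k = max(1, int(keep_ratio * flat.size))
    # Threshold = magnitude of the k-th largest entry.
    threshold = np.partition(flat, -k)[-k]
    mask = np.abs(grad) >= threshold
    # Only the surviving values (and their indices) would be transmitted.
    sparse = np.where(mask, grad, 0.0)
    return sparse, mask

grad = np.array([0.01, -0.8, 0.02, 0.5, -0.03, 0.9])
sparse, mask = sparsify_gradient(grad, keep_ratio=0.5)
# Only the three largest-magnitude values survive: -0.8, 0.5, 0.9
```

In a real system, only the nonzero values and their positions are sent over the network, which is where the compression ratio comes from.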

Author(s):  
Hao Yu ◽  
Sen Yang ◽  
Shenghuo Zhu

In distributed training of deep neural networks, parallel mini-batch SGD is widely used to speed up training with multiple workers. It uses multiple workers to sample local stochastic gradients in parallel, aggregates all gradients in a single server to obtain the average, and updates each worker's local model with an SGD step using the averaged gradient. Ideally, parallel mini-batch SGD can achieve a linear speed-up in training time (with respect to the number of workers) compared with SGD on a single worker. In practice, however, such linear scalability is significantly limited by the growing demand for gradient communication as more workers are involved. Model averaging, which periodically averages the individual models trained on parallel workers, is another common practice for distributed training of deep neural networks since (Zinkevich et al. 2010; McDonald, Hall, and Mann 2010). Compared with parallel mini-batch SGD, the communication overhead of model averaging is significantly reduced. A large body of experimental work has verified that model averaging can still achieve a good training speed-up as long as the averaging interval is carefully controlled. However, it has remained theoretically unexplained why such a simple heuristic works so well. This paper provides a thorough and rigorous theoretical study of why model averaging can work as well as parallel mini-batch SGD with significantly less communication overhead.
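The model-averaging scheme the abstract analyzes can be sketched on a toy problem. This is an illustrative simulation, not the paper's setting: each worker runs noisy SGD on a simple quadratic, and the local models are averaged every `avg_interval` steps instead of communicating every step:

```python
import numpy as np

rng = np.random.default_rng(0)

def local_sgd(num_workers=4, steps=100, avg_interval=10, lr=0.1):
    # Each worker minimises f(x) = 0.5 * x^2 with noisy gradients g = x + noise.
    x = np.full(num_workers, 10.0)        # per-worker local models
    for t in range(1, steps + 1):
        noise = rng.normal(0.0, 0.1, size=num_workers)
        x -= lr * (x + noise)             # local SGD step on each worker
        if t % avg_interval == 0:
            x[:] = x.mean()               # periodic model averaging
    return x.mean()

final = local_sgd()   # converges close to the optimum x* = 0
```

Communication happens only once every `avg_interval` steps, a 10x reduction here versus averaging gradients at every step, which is the trade-off whose theoretical justification the paper supplies.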


2021 ◽  
Vol 13 (1) ◽  
Author(s):  
Tiago Pereira ◽  
Maryam Abbasi ◽  
Bernardete Ribeiro ◽  
Joel P. Arrais

Abstract: In this work, we explore the potential of deep learning to streamline the process of identifying new potential drugs through the computational generation of molecules with interesting biological properties. Two deep neural networks compose our targeted generation framework: the Generator, which is trained to learn the building rules of valid molecules employing SMILES string notation, and the Predictor, which evaluates the newly generated compounds by predicting their affinity for the desired target. Then, the Generator is optimized through Reinforcement Learning to produce molecules with bespoke properties. The innovation of this approach is the exploratory strategy applied during the reinforcement training process, which seeks to add novelty to the generated compounds. This training strategy employs two Generators interchangeably to sample new SMILES: the initially trained model, which remains fixed, and a copy of it that is updated during training to uncover the most promising molecules. The evolution of the reward assigned by the Predictor determines how often each one is employed to select the next token of the molecule. This strategy establishes a compromise between the need to acquire more information about the chemical space and the need to exploit the experience gained so far when sampling new molecules. To demonstrate the effectiveness of the method, the Generator is trained to design molecules with an optimized partition coefficient and also high inhibitory power against the adenosine A2A and κ-opioid receptors. The results reveal that the model can effectively steer the newly generated molecules in the wanted direction. More importantly, it was possible to find promising sets of unique and diverse molecules, which was the main purpose of the newly implemented strategy.
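The reward-driven alternation between the two Generators could be sketched like this. Everything here is hypothetical scaffolding: `frozen_gen` and `updated_gen` stand in for the two trained models, and the clamp values are illustrative, not from the paper:

```python
import random

random.seed(0)

def sample_smiles(frozen_gen, updated_gen, reward_history, length=20):
    """Alternate two token generators when sampling a SMILES string.

    The better the recent Predictor rewards, the more often the updated
    (exploiting) generator is chosen over the frozen (exploring) one.
    `frozen_gen` / `updated_gen` are hypothetical callables that each
    return one SMILES token.
    """
    window = reward_history[-10:]
    recent = sum(window) / max(1, len(window))
    p_updated = min(0.9, max(0.1, recent))   # keep a floor of exploration
    tokens = []
    for _ in range(length):
        gen = updated_gen if random.random() < p_updated else frozen_gen
        tokens.append(gen())
    return "".join(tokens)

# Toy usage: dummy "generators" that emit a single fixed token each.
smiles = sample_smiles(lambda: "C", lambda: "c", [0.8] * 10)
```

The point of keeping a nonzero probability for the frozen model is exactly the compromise the abstract describes: it preserves exploration of the chemical space even as rewards improve.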


2014 ◽  
Vol 701-702 ◽  
pp. 957-960
Author(s):  
Feng Xie

Equipment maintenance in large marine ships may rely on the Internet of Things to provide instant monitoring of equipment status. The volume of sensing data is huge because the amount of equipment is large, so decreasing the communication overhead of uploading sensing data is critical for efficient and timely monitoring. In this paper, we propose several coding algorithms that use data context, modeled by normal forms derived from our observations. The resulting improvement in communication efficiency is justified by formal analysis and rigorous proof. We also propose several network planning policies that further improve communication efficiency through data context and cluster-head deployment.
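One simple form of context-aware coding for slowly varying sensor streams is delta encoding. This is a generic sketch of the idea, not the paper's normal-form-based algorithms: transmit the first reading plus successive differences, which are small and compress well:

```python
def delta_encode(readings):
    """Encode a sensor stream as its first value plus successive differences.
    Slowly varying equipment readings yield many small deltas, which take
    fewer bits than the raw values."""
    return [readings[0]] + [b - a for a, b in zip(readings, readings[1:])]

def delta_decode(encoded):
    """Invert delta_encode by running-summing the differences."""
    out = [encoded[0]]
    for d in encoded[1:]:
        out.append(out[-1] + d)
    return out

# Temperature readings in tenths of a degree; slowly varying -> small deltas.
temps = [200, 201, 201, 203, 202]
deltas = delta_encode(temps)   # [200, 1, 0, 2, -1]
```

The small deltas can then be entropy-coded with short codewords, which is where the reduction in upload traffic comes from.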


2020 ◽  
Vol 20 (11) ◽  
pp. 6603-6608 ◽  
Author(s):  
Sung-Tae Lee ◽  
Suhwan Lim ◽  
Jong-Ho Bae ◽  
Dongseok Kwon ◽  
Hyeong-Su Kim ◽  
...  

Deep learning delivers state-of-the-art results in various machine learning tasks, but for applications that require real-time inference, the high computational cost of deep neural networks becomes an efficiency bottleneck. To overcome this cost, spiking neural networks (SNNs) have been proposed. Herein, we propose a hardware implementation of an SNN with gated Schottky diodes as synaptic devices. In addition, we apply L1 regularization for connection pruning of deep spiking neural networks using gated Schottky diodes as synaptic devices. Applying L1 regularization eliminates the need for a re-training procedure because it prunes the weights based on the cost function. The compressed hardware-based SNN is energy efficient while achieving a classification accuracy of 97.85%, comparable to the 98.13% of the software deep neural network (DNN).
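The pruning step can be sketched as magnitude thresholding after L1-regularised training. This is a minimal illustration; the threshold value is an assumption, not taken from the paper:

```python
import numpy as np

def prune_weights(weights, threshold=0.05):
    """Zero out weights whose magnitude is below `threshold`.

    Because the L1 penalty in the cost function already drives
    unimportant weights toward zero during training, the pruned
    network needs no re-training pass. Threshold is illustrative.
    """
    mask = np.abs(weights) >= threshold
    sparsity = 1.0 - mask.mean()       # fraction of weights removed
    return weights * mask, sparsity

w = np.array([0.4, -0.01, 0.0, 0.2, 0.03, -0.6])
pruned, sparsity = prune_weights(w)
# Half of these toy weights fall below the threshold and are pruned.
```

In the hardware setting, pruned connections correspond to synaptic devices that need not be programmed or driven, which is where the energy saving comes from.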


2009 ◽  
Vol 18 (06) ◽  
pp. 853-881 ◽  
Author(s):  
TODOR GANCHEV

In the present contribution we propose an integral training procedure for Locally Recurrent Probabilistic Neural Networks (LR PNNs). Specifically, the adjustment of the smoothing factor "sigma" in the pattern layer of the LR PNN and the training of the recurrent-layer weights are integrated in an automatic process that iteratively estimates all adjustable parameters of the LR PNN from the available training data. Furthermore, in contrast to the original LR PNN, whose recurrent layer was trained to provide optimum separation among the classes on the training dataset while striving to keep a balance between the learning rates for all classes, here the training strategy is oriented directly toward optimizing the overall classification accuracy. More precisely, the new training strategy directly targets maximizing the posterior probabilities for the target class and minimizing the posterior probabilities estimated for the non-target classes. The new fitness function requires fewer computations per evaluation, and therefore the overall computational demands for training the recurrent-layer weights are reduced. The performance of the integrated training procedure is illustrated on three different speech processing tasks: emotion recognition, speaker identification and speaker verification.
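A fitness function in the spirit described, rewarding target-class posteriors and penalising non-target mass, might look like the following. The exact form used in the paper may differ; this is only a sketch of the stated objective:

```python
import numpy as np

def fitness(posteriors, target_idx):
    """Score a batch of posterior vectors: reward the probability mass
    assigned to the target class, penalise the mass on all other
    classes. Shape of `posteriors`: (num_patterns, num_classes)."""
    target = posteriors[:, target_idx]
    non_target = posteriors.sum(axis=1) - target
    return np.mean(target - non_target)

# Three patterns, three classes; class 0 is the target class.
p = np.array([[0.7, 0.2, 0.1],
              [0.6, 0.3, 0.1],
              [0.8, 0.1, 0.1]])
score = fitness(p, 0)
```

Each evaluation is a single pass of vectorised sums, consistent with the abstract's point that the new fitness function is cheap to compute.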


2020 ◽  
Author(s):  
Hojin Jang ◽  
Devin McCormack ◽  
Frank Tong

Abstract: Deep neural networks (DNNs) can accurately recognize objects in clear viewing conditions, leading to claims that they have attained or surpassed human-level performance. However, standard DNNs are severely impaired at recognizing objects in visual noise, whereas human vision remains robust. We developed a noise-training procedure, generating noisy images of objects with low signal-to-noise ratio, to investigate whether DNNs can acquire robustness that better matches human vision. After noise training, DNNs outperformed human observers while exhibiting more similar patterns of performance, and provided a better model for predicting human recognition thresholds on an image-by-image basis. Noise training also improved DNN recognition of vehicles in noisy weather. Layer-specific analyses revealed that the contaminating effects of noise were dampened, rather than amplified, across successive stages of the noise-trained network, with greater benefit at higher levels of the network. Our findings indicate that DNNs can learn noise-robust representations that better approximate human visual processing.
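Generating training images at a controlled signal-to-noise ratio, as the noise-training procedure requires, can be sketched as follows. This is a generic additive-Gaussian formulation; the paper's exact noise model may differ:

```python
import numpy as np

rng = np.random.default_rng(0)

def add_noise_at_snr(image, snr_db):
    """Add white Gaussian noise so the result has roughly the requested
    signal-to-noise ratio in decibels (SNR = signal power / noise power)."""
    signal_power = np.mean(image ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=image.shape)
    return image + noise

img = rng.random((32, 32))                   # stand-in for an object image
noisy = add_noise_at_snr(img, snr_db=0.0)    # 0 dB: equal signal and noise power
```

Training on images sampled across a range of low `snr_db` values is the kind of curriculum the abstract's noise-training procedure describes.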


2018 ◽  
Author(s):  
Titus Josef Brinker ◽  
Achim Hekler ◽  
Christof von Kalle

BACKGROUND In recent months, multiple publications have demonstrated the use of convolutional neural networks (CNNs) to classify images of skin cancer as precisely as dermatologists. These CNNs failed to outperform the winner of the International Symposium on Biomedical Imaging (ISBI) 2016 challenge in terms of average precision, however, so the technical progress represented by these studies is limited. In addition, the available reports are difficult to reproduce, due to incomplete descriptions of training procedures and the use of proprietary image databases. These factors prevent the comparison of various CNN classifiers on equal terms. OBJECTIVE To demonstrate the training of an image-classifier CNN that outperforms the winner of the ISBI 2016 challenge by using open-source images exclusively. METHODS A detailed description of the training procedure is reported, and the images and test sets used are disclosed fully, to ensure the reproducibility of our work. RESULTS Our CNN classifier outperforms all recent attempts to classify the original ISBI 2016 challenge test data (full set of 379 test images), with an average precision of 0.709 (vs. 0.637 for the ISBI winner) and with an area under the receiver operating characteristic curve of 0.85. CONCLUSIONS This work illustrates the potential for improving skin cancer classification with enhanced training procedures for CNNs, while avoiding the use of costly equipment or proprietary image data.
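The area-under-the-ROC-curve metric reported above can be computed directly from its rank-based definition. The labels and scores below are made up for illustration; they are not the study's data:

```python
import numpy as np

def roc_auc(labels, scores):
    """ROC AUC via the Mann-Whitney formulation: the probability that a
    randomly chosen positive example is scored above a randomly chosen
    negative one (ties counted as half)."""
    labels = np.asarray(labels)
    scores = np.asarray(scores)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

# Toy example: 3 malignant (1) and 3 benign (0) cases with classifier scores.
y = [1, 1, 0, 0, 1, 0]
s = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
auc = roc_auc(y, s)   # 8 of 9 positive/negative pairs are ranked correctly
```

An AUC of 0.85, as reported, means the classifier ranks a random malignant image above a random benign one 85% of the time.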

