Exploiting parallel computers to reduce neural network training time of real applications

Author(s):  
Jim Torresen ◽  
Shin-ichiro Mori ◽  
Hiroshi Nakashima ◽  
Shinji Tomita ◽  
Olav Landsverk
2017 ◽  
Vol 109 (1) ◽  
pp. 29-38 ◽  
Author(s):  
Valentin Deyringer ◽  
Alexander Fraser ◽  
Helmut Schmid ◽  
Tsuyoshi Okita

Abstract: Neural networks are prevalent in today's NLP research. Despite their success on different tasks, training times are relatively long. We use Hogwild! to counteract this and show that it is a suitable method to speed up the training of neural networks of different architectures and complexity. For POS tagging and translation we report considerable training speedups, especially for the latter. We show that Hogwild! can be an important tool for training complex NLP architectures.
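The core idea of Hogwild! is lock-free asynchronous SGD: several threads read and write a shared parameter vector without any synchronization, tolerating occasional lost updates. A minimal sketch on a synthetic least-squares problem is shown below; the problem sizes, learning rate, and step counts are illustrative assumptions, not values from the paper.

```python
import threading
import numpy as np

# Synthetic linear-regression data (illustrative, not the paper's tasks).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
true_w = rng.normal(size=10)
y = X @ true_w

w = np.zeros(10)  # shared parameters, updated by all threads without locks

def worker(n_steps=2000, lr=0.01):
    local_rng = np.random.default_rng(threading.get_ident() % 2**32)
    for _ in range(n_steps):
        i = int(local_rng.integers(0, X.shape[0]))
        grad = (X[i] @ w - y[i]) * X[i]  # per-sample least-squares gradient
        w[:] -= lr * grad  # racy in-place update: the Hogwild! idea

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

loss = float(np.mean((X @ w - y) ** 2))
```

Despite the unsynchronized writes, the shared vector converges close to the solution, which is what makes the method attractive when per-update locking would dominate training time.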


2020 ◽  
Vol 2 (1) ◽  
pp. 29-36
Author(s):  
M. I. Zghoba ◽  
Yu. I. Hrytsiuk

The peculiarities of neural network training for forecasting taxi passenger demand using graphics processing units are considered, which made it possible to speed up the training procedure for different sets of input data, hardware configurations, and computing power. Taxi services are becoming accessible to an ever wider range of people. The most important task for any transportation company and taxi driver is to minimize the waiting time for new orders and the distance from drivers to passengers when an order is received. Understanding and assessing geographical passenger demand, which depends on many factors, is crucial to achieving this goal. This paper describes an example of neural network training for predicting taxi passenger demand and shows the importance of a large input dataset for the accuracy of the neural network. Since training a neural network is a lengthy process, parallel training was used to speed it up. The neural network for forecasting taxi passenger demand was trained on different hardware configurations: one CPU, one GPU, and two GPUs. The training times for one epoch were compared across these configurations, and the impact of each configuration on training time was analyzed. The network was trained on a dataset containing 4.5 million trips within one city. The results show that training with GPU accelerators does not necessarily improve training time; the training time depends on many factors, such as input dataset size, the splitting of the dataset into smaller subsets, and the hardware and power characteristics.
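The point about splitting the dataset into smaller subsets can be illustrated with a small sketch (not the paper's code): the same epoch of work produces the same result under any batch split, but the wall time per epoch changes with the split and with the hardware it runs on. All sizes here are illustrative assumptions.

```python
import time
import numpy as np

def epoch(X, batch_size):
    # One pass over X, accumulating X.T @ X batch by batch as a stand-in
    # for the linear algebra of one training epoch.
    dim = X.shape[1]
    W = np.zeros((dim, dim), dtype=X.dtype)
    start = time.perf_counter()
    for i in range(0, X.shape[0], batch_size):
        b = X[i:i + batch_size]
        W += b.T @ b
    return W, time.perf_counter() - start

X = np.random.default_rng(0).normal(size=(2048, 128)).astype(np.float32)
W_small, t_small = epoch(X, batch_size=32)   # many small steps
W_large, t_large = epoch(X, batch_size=512)  # few large steps
```

The two runs compute the same matrix (up to floating-point accumulation order), so any timing difference comes purely from the batch split and the hardware, which mirrors the abstract's finding that throughput depends on how the dataset is partitioned, not only on the accelerator used.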


Author(s):  
Yasufumi Sakai ◽  
Yutaka Tamiya

Abstract: Recent advances in deep neural networks have achieved higher accuracy with more complex models. Nevertheless, they require much longer training times. To reduce training time, training methods using quantized weights, activations, and gradients have been proposed. Neural network calculation in integer formats improves the energy efficiency of hardware for deep learning models, so training methods for deep neural networks in fixed point formats have been proposed. However, the narrow data representation range of the fixed point format degrades neural network accuracy. In this work, we propose a new fixed point format named shifted dynamic fixed point (S-DFP) to prevent accuracy degradation in quantized neural network training. S-DFP can change the data representation range of the dynamic fixed point format by adding a bias to the exponent. We evaluated the effectiveness of S-DFP for quantized neural network training on the ImageNet task using ResNet-34, ResNet-50, ResNet-101 and ResNet-152. For example, the accuracy of quantized ResNet-152 improved from 76.6% with conventional 8-bit DFP to 77.6% with 8-bit S-DFP.
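A hedged sketch of the underlying mechanism: in dynamic fixed point (DFP), a tensor shares one exponent chosen from its largest magnitude; the shifted variant adds a bias to that exponent, moving the representable range up at the cost of a coarser step. The function below is an illustration of this idea, not the paper's implementation.

```python
import numpy as np

def quantize_dfp(x, bits=8, exp_bias=0):
    # Shared exponent chosen so the largest magnitude fits; exp_bias = 0
    # gives plain DFP, a nonzero exp_bias gives the shifted (S-DFP) range.
    exp = int(np.ceil(np.log2(np.max(np.abs(x))))) + exp_bias
    step = 2.0 ** (exp - (bits - 1))  # value of one least-significant bit
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return np.clip(np.round(x / step), lo, hi) * step

x = np.linspace(-1.0, 1.0, 101)
xq = quantize_dfp(x, bits=8)              # plain 8-bit DFP
xs = quantize_dfp(x, bits=8, exp_bias=1)  # 8-bit S-DFP, range shifted up
```

With `exp_bias=0` the top of the range clips (1.0 maps to 127/128), while `exp_bias=1` doubles the representable range so 1.0 is exact; this trade-off between range and resolution is what the exponent bias controls.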


Author(s):  
Hesam Karim ◽  
Sharareh R. Niakan ◽  
Reza Safdari

Heart disease is the leading cause of death in many countries. The artificial neural network (ANN) technique can be used to predict or classify whether patients have heart disease. There are different training algorithms for ANNs. We compared eight neural network training algorithms for the classification of heart disease data from the UCI repository containing 303 samples. Performance measures for each algorithm, including training speed, number of epochs, accuracy, and mean square error (MSE), were obtained and analyzed. Our results showed that the training time for gradient descent algorithms was longer than for the other training algorithms (8-10 seconds). In contrast, quasi-Newton algorithms were faster than the others (≤0 second). The MSE for all algorithms was between 0.117 and 0.228. While there was a significant association between training algorithm and training time (p<0.05), the number of neurons in the hidden layer had no significant effect on the MSE or accuracy of the models (p>0.05). Based on our findings, for developing an ANN classification model for heart disease, it is best to use quasi-Newton training algorithms because of their speed and accuracy.
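The speed gap the abstract reports is characteristic of first-order versus second-order optimizers: quasi-Newton methods use curvature information and typically need far fewer iterations. The sketch below contrasts plain gradient descent with a full Newton step (the method that quasi-Newton algorithms approximate) on a synthetic logistic-regression problem; the data, targets, and thresholds are illustrative assumptions, not the UCI heart dataset.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
w_true = rng.normal(size=5)
y = (X @ w_true > 0).astype(float)  # synthetic separable labels

def loss_grad_hess(w, reg=1e-3):
    z = np.clip(X @ w, -30.0, 30.0)        # clip to avoid overflow in exp
    p = 1.0 / (1.0 + np.exp(-z))
    loss = -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    g = X.T @ (p - y) / len(y) + reg * w
    H = (X * (p * (1 - p))[:, None]).T @ X / len(y) + reg * np.eye(5)
    return loss, g, H

def iters_to(target, newton, lr=0.5, max_iter=5000):
    # Count iterations until the loss drops below `target`.
    w = np.zeros(5)
    for k in range(1, max_iter + 1):
        loss, g, H = loss_grad_hess(w)
        if loss < target:
            return k
        w = w - np.linalg.solve(H, g) if newton else w - lr * g
    return max_iter

gd_iters = iters_to(0.2, newton=False)
newton_iters = iters_to(0.2, newton=True)
```

The Newton-type update reaches the loss target in far fewer iterations than gradient descent, mirroring the per-epoch behavior behind the abstract's timing comparison (each second-order iteration is more expensive, but there are many fewer of them).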


2014 ◽  
pp. 9-19
Author(s):  
V. Turchenko ◽  
C. Triki ◽  
Lucio Grandinetti ◽  
Anatoly Sachenko

The main difficulty in using neural networks to improve the measurement accuracy of physical quantities (for example, temperature, humidity, pressure) in data acquisition systems is the insufficient volume of input data for training the predicting neural network during the initial exploitation period of the sensors. The authors have previously proposed a technique for increasing the data volume available for training the predicting neural network using the integration of historical data method. In this paper we propose an enhanced integration of historical data method, with simulation results on mathematical models of sensor drift using single-layer and multi-layer perceptrons. We also consider a parallelization technique for the enhanced method in order to decrease its running time. A modified coarse-grained parallel algorithm with dynamic mapping onto the processors of a parallel computing system, using neural network training time as the mapping criterion, is considered. Experiments showed that the modified parallel algorithm is more efficient than the basic parallel algorithm with dynamic mapping, which does not use any mapping criterion.
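Using training time as a mapping criterion can be sketched as a greedy longest-first scheduler: each training task goes to the currently least-loaded processor, longest estimated time first. This is an illustrative reconstruction of the general idea, not the paper's algorithm; the task times below are made up.

```python
import heapq

def map_with_criterion(training_times, n_procs):
    # Longest task first, each to the least-loaded processor (LPT heuristic).
    heap = [(0.0, p) for p in range(n_procs)]  # (current load, processor id)
    heapq.heapify(heap)
    assignment = {p: [] for p in range(n_procs)}
    for t in sorted(training_times, reverse=True):
        load, p = heapq.heappop(heap)
        assignment[p].append(t)
        heapq.heappush(heap, (load + t, p))
    return assignment

def makespan(assignment):
    return max(sum(ts) for ts in assignment.values())

def round_robin(training_times, n_procs):
    # Baseline without any cost criterion: tasks dealt out in order.
    assignment = {p: [] for p in range(n_procs)}
    for i, t in enumerate(training_times):
        assignment[i % n_procs].append(t)
    return assignment

times = [7, 5, 4, 3, 2, 1]  # hypothetical per-task training times
lpt_span = makespan(map_with_criterion(times, 2))
rr_span = makespan(round_robin(times, 2))
```

On this example the criterion-based mapping finishes in 11 time units against 13 for the criterion-free round-robin baseline, matching the abstract's finding that a cost-aware mapping beats dynamic mapping without a criterion.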


Author(s):  
Frédéric Pinel ◽  
Jian-xiong Yin ◽  
Christian Hundt ◽  
Emmanuel Kieffer ◽  
Sébastien Varrette ◽  
...  
