Exploiting parallel computers to reduce neural network training time of real applications

Author(s):  
Jim Torresen ◽  
Shin-ichiro Mori ◽  
Hiroshi Nakashima ◽  
Shinji Tomita ◽  
Olav Landsverk
2017 ◽  
Vol 109 (1) ◽  
pp. 29-38 ◽  
Author(s):  
Valentin Deyringer ◽  
Alexander Fraser ◽  
Helmut Schmid ◽  
Tsuyoshi Okita

Abstract: Neural networks are prevalent in today's NLP research. Despite their success on different tasks, training times are relatively long. We use Hogwild! to counteract this and show that it is a suitable method to speed up the training of neural networks of different architectures and complexity. For POS tagging and translation we report considerable training speedups, especially for the latter. We show that Hogwild! can be an important tool for training complex NLP architectures.
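The core idea of Hogwild! is lock-free asynchronous SGD: several threads read and write a shared parameter vector without any synchronization, tolerating occasional lost updates. A minimal sketch on a synthetic least-squares problem is shown below; the problem sizes, learning rate, and step counts are illustrative assumptions, not values from the paper.

```python
import threading
import numpy as np

# Synthetic linear-regression data (illustrative, not the paper's tasks).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
true_w = rng.normal(size=10)
y = X @ true_w

w = np.zeros(10)  # shared parameters, updated by all threads without locks

def worker(n_steps=2000, lr=0.01):
    local_rng = np.random.default_rng(threading.get_ident() % 2**32)
    for _ in range(n_steps):
        i = int(local_rng.integers(0, X.shape[0]))
        grad = (X[i] @ w - y[i]) * X[i]  # per-sample least-squares gradient
        w[:] -= lr * grad  # racy in-place update: the Hogwild! idea

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

loss = float(np.mean((X @ w - y) ** 2))
```

Despite the unsynchronized writes, the shared vector converges close to the solution, which is what makes the method attractive when per-update locking would dominate training time.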


2020 ◽  
Vol 2 (1) ◽  
pp. 29-36
Author(s):  
M. I. Zghoba ◽  
Yu. I. Hrytsiuk

The peculiarities of neural network training for forecasting taxi passenger demand using graphics processing units are considered, which made it possible to speed up the training procedure for different sets of input data, hardware configurations, and computing power. Taxi services are becoming accessible to an ever wider range of people. The most important task for any transportation company and taxi driver is to minimize the waiting time for new orders and the distance from drivers to passengers when an order is received. Understanding and assessing geographical passenger demand, which depends on many factors, is crucial to achieving this goal. This paper describes an example of neural network training for predicting taxi passenger demand and shows the importance of a large input dataset for the accuracy of the neural network. Since training a neural network is a lengthy process, parallel training was used to speed it up. The neural network for forecasting taxi passenger demand was trained on different hardware configurations: one CPU, one GPU, and two GPUs. The training times for one epoch were compared across these configurations, and the impact of each configuration on training time was analyzed. The network was trained on a dataset containing 4.5 million trips within one city. The results show that training with GPU accelerators does not necessarily improve training time; the training time depends on many factors, such as input dataset size, the splitting of the dataset into smaller subsets, and the hardware and power characteristics.
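The point about splitting the dataset into smaller subsets can be illustrated with a small sketch (not the paper's code): the same epoch of work produces the same result under any batch split, but the wall time per epoch changes with the split and with the hardware it runs on. All sizes here are illustrative assumptions.

```python
import time
import numpy as np

def epoch(X, batch_size):
    # One pass over X, accumulating X.T @ X batch by batch as a stand-in
    # for the linear algebra of one training epoch.
    dim = X.shape[1]
    W = np.zeros((dim, dim), dtype=X.dtype)
    start = time.perf_counter()
    for i in range(0, X.shape[0], batch_size):
        b = X[i:i + batch_size]
        W += b.T @ b
    return W, time.perf_counter() - start

X = np.random.default_rng(0).normal(size=(2048, 128)).astype(np.float32)
W_small, t_small = epoch(X, batch_size=32)   # many small steps
W_large, t_large = epoch(X, batch_size=512)  # few large steps
```

The two runs compute the same matrix (up to floating-point accumulation order), so any timing difference comes purely from the batch split and the hardware, which mirrors the abstract's finding that throughput depends on how the dataset is partitioned, not only on the accelerator used.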


Author(s):  
Yasufumi Sakai ◽  
Yutaka Tamiya

Abstract: Recent advances in deep neural networks have achieved higher accuracy with more complex models. Nevertheless, they require much longer training times. To reduce training time, training methods using quantized weights, activations, and gradients have been proposed. Neural network calculation in integer formats improves the energy efficiency of hardware for deep learning models, so training methods for deep neural networks in fixed point formats have been proposed. However, the narrow data representation range of the fixed point format degrades neural network accuracy. In this work, we propose a new fixed point format named shifted dynamic fixed point (S-DFP) to prevent accuracy degradation in quantized neural network training. S-DFP can change the data representation range of the dynamic fixed point format by adding a bias to the exponent. We evaluated the effectiveness of S-DFP for quantized neural network training on the ImageNet task using ResNet-34, ResNet-50, ResNet-101 and ResNet-152. For example, the accuracy of quantized ResNet-152 improved from 76.6% with conventional 8-bit DFP to 77.6% with 8-bit S-DFP.
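A hedged sketch of the underlying mechanism: in dynamic fixed point (DFP), a tensor shares one exponent chosen from its largest magnitude; the shifted variant adds a bias to that exponent, moving the representable range up at the cost of a coarser step. The function below is an illustration of this idea, not the paper's implementation.

```python
import numpy as np

def quantize_dfp(x, bits=8, exp_bias=0):
    # Shared exponent chosen so the largest magnitude fits; exp_bias = 0
    # gives plain DFP, a nonzero exp_bias gives the shifted (S-DFP) range.
    exp = int(np.ceil(np.log2(np.max(np.abs(x))))) + exp_bias
    step = 2.0 ** (exp - (bits - 1))  # value of one least-significant bit
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return np.clip(np.round(x / step), lo, hi) * step

x = np.linspace(-1.0, 1.0, 101)
xq = quantize_dfp(x, bits=8)              # plain 8-bit DFP
xs = quantize_dfp(x, bits=8, exp_bias=1)  # 8-bit S-DFP, range shifted up
```

With `exp_bias=0` the top of the range clips (1.0 maps to 127/128), while `exp_bias=1` doubles the representable range so 1.0 is exact; this trade-off between range and resolution is what the exponent bias controls.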


Author(s):  
Hesam Karim ◽  
Sharareh R. Niakan ◽  
Reza Safdari

Heart disease is the leading cause of death in many countries. The artificial neural network (ANN) technique can be used to predict or classify whether patients have heart disease. There are different training algorithms for ANNs. We compared eight neural network training algorithms for the classification of heart disease data from the UCI repository containing 303 samples. Performance measures for each algorithm, including training speed, number of epochs, accuracy, and mean square error (MSE), were obtained and analyzed. Our results showed that the training time for gradient descent algorithms was longer than for the other training algorithms (8-10 seconds). In contrast, quasi-Newton algorithms were faster than the others (≤0 second). The MSE for all algorithms was between 0.117 and 0.228. While there was a significant association between training algorithm and training time (p<0.05), the number of neurons in the hidden layer had no significant effect on the MSE or accuracy of the models (p>0.05). Based on our findings, for developing an ANN classification model for heart disease, it is best to use quasi-Newton training algorithms because of their speed and accuracy.
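The speed gap the abstract reports is characteristic of first-order versus second-order optimizers: quasi-Newton methods use curvature information and typically need far fewer iterations. The sketch below contrasts plain gradient descent with a full Newton step (the method that quasi-Newton algorithms approximate) on a synthetic logistic-regression problem; the data, targets, and thresholds are illustrative assumptions, not the UCI heart dataset.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
w_true = rng.normal(size=5)
y = (X @ w_true > 0).astype(float)  # synthetic separable labels

def loss_grad_hess(w, reg=1e-3):
    z = np.clip(X @ w, -30.0, 30.0)        # clip to avoid overflow in exp
    p = 1.0 / (1.0 + np.exp(-z))
    loss = -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    g = X.T @ (p - y) / len(y) + reg * w
    H = (X * (p * (1 - p))[:, None]).T @ X / len(y) + reg * np.eye(5)
    return loss, g, H

def iters_to(target, newton, lr=0.5, max_iter=5000):
    # Count iterations until the loss drops below `target`.
    w = np.zeros(5)
    for k in range(1, max_iter + 1):
        loss, g, H = loss_grad_hess(w)
        if loss < target:
            return k
        w = w - np.linalg.solve(H, g) if newton else w - lr * g
    return max_iter

gd_iters = iters_to(0.2, newton=False)
newton_iters = iters_to(0.2, newton=True)
```

The Newton-type update reaches the loss target in far fewer iterations than gradient descent, mirroring the per-epoch behavior behind the abstract's timing comparison (each second-order iteration is more expensive, but there are many fewer of them).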


2014 ◽  
pp. 9-19
Author(s):  
V. Turchenko ◽  
C. Triki ◽  
Lucio Grandinetti ◽  
Anatoly Sachenko

The main difficulty in using neural networks to improve the measurement accuracy of physical quantities (for example, temperature, humidity, pressure) in data acquisition systems is the insufficient volume of input data for training the predicting neural network during the initial exploitation period of the sensors. The authors have previously proposed a technique for increasing the data volume available for training the predicting neural network using the integration of historical data method. In this paper we propose an enhanced integration of historical data method, with simulation results on mathematical models of sensor drift using single-layer and multi-layer perceptrons. We also consider a parallelization technique for the enhanced method in order to decrease its running time. A modified coarse-grained parallel algorithm with dynamic mapping onto the processors of a parallel computing system, using neural network training time as the mapping criterion, is considered. Experiments showed that the modified parallel algorithm is more efficient than the basic parallel algorithm with dynamic mapping, which does not use any mapping criterion.
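Using training time as a mapping criterion can be sketched as a greedy longest-first scheduler: each training task goes to the currently least-loaded processor, longest estimated time first. This is an illustrative reconstruction of the general idea, not the paper's algorithm; the task times below are made up.

```python
import heapq

def map_with_criterion(training_times, n_procs):
    # Longest task first, each to the least-loaded processor (LPT heuristic).
    heap = [(0.0, p) for p in range(n_procs)]  # (current load, processor id)
    heapq.heapify(heap)
    assignment = {p: [] for p in range(n_procs)}
    for t in sorted(training_times, reverse=True):
        load, p = heapq.heappop(heap)
        assignment[p].append(t)
        heapq.heappush(heap, (load + t, p))
    return assignment

def makespan(assignment):
    return max(sum(ts) for ts in assignment.values())

def round_robin(training_times, n_procs):
    # Baseline without any cost criterion: tasks dealt out in order.
    assignment = {p: [] for p in range(n_procs)}
    for i, t in enumerate(training_times):
        assignment[i % n_procs].append(t)
    return assignment

times = [7, 5, 4, 3, 2, 1]  # hypothetical per-task training times
lpt_span = makespan(map_with_criterion(times, 2))
rr_span = makespan(round_robin(times, 2))
```

On this example the criterion-based mapping finishes in 11 time units against 13 for the criterion-free round-robin baseline, matching the abstract's finding that a cost-aware mapping beats dynamic mapping without a criterion.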


Author(s):  
Frédéric Pinel ◽  
Jian-xiong Yin ◽  
Christian Hundt ◽  
Emmanuel Kieffer ◽  
Sébastien Varrette ◽  
...  
