Layer-wise learning based stochastic gradient descent method for the optimization of deep convolutional neural network

2019, Vol 37 (4), pp. 5641-5654
Author(s): Qinghe Zheng, Xinyu Tian, Nan Jiang, Mingqiang Yang

2018
Author(s): Kazunori D Yamada

ABSTRACT: In the deep learning era, stochastic gradient descent is the most common method used for optimizing neural network parameters. Among the various mathematical optimization methods, gradient descent is the most basic. Adjusting the learning rate is necessary for quick convergence, and with plain gradient descent this is normally done manually. Many optimizers have been developed to control the learning rate and increase convergence speed; generally, they adjust the learning rate automatically in response to learning status. These optimizers were gradually improved by incorporating the effective aspects of earlier methods. In this study, we developed a new optimizer: YamAdam. Our optimizer is based on Adam, which utilizes the first and second moments of previous gradients. In addition to the moment estimation system, we incorporated an advantageous part of AdaDelta, namely a unit correction system, into YamAdam. According to benchmark tests on some common datasets, our optimizer showed similar or faster convergence compared to existing methods. YamAdam is an option as an alternative optimizer for deep learning.
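The abstract does not give YamAdam's exact update rule, but the idea it describes — Adam's first- and second-moment estimates combined with an AdaDelta-style unit correction — can be sketched as follows. All hyperparameter names and values here (`beta1`, `beta2`, `eps`, and the running squared-update accumulator `u`) are illustrative assumptions, not the published formula.

```python
import numpy as np

def adam_with_unit_correction(grad_fn, theta, steps=5000,
                              beta1=0.9, beta2=0.999, eps=1e-6):
    """Adam-style moment estimates whose step is scaled by an
    AdaDelta-like running RMS of past updates (the 'unit correction'),
    so the update carries the same units as the parameters.
    Illustrative sketch only, not the published YamAdam rule."""
    m = np.zeros_like(theta)  # first moment: running mean of gradients
    v = np.zeros_like(theta)  # second moment: running mean of squared gradients
    u = np.zeros_like(theta)  # running mean of squared updates (AdaDelta-style)
    for t in range(1, steps + 1):
        g = grad_fn(theta)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # bias correction, as in Adam
        v_hat = v / (1 - beta2 ** t)
        step = np.sqrt(u + eps) / np.sqrt(v_hat + eps) * m_hat
        theta = theta - step
        u = beta2 * u + (1 - beta2) * step * step
    return theta

# Minimize f(x) = (x - 3)^2, whose gradient is 2(x - 3).
x = adam_with_unit_correction(lambda th: 2 * (th - 3.0), np.array([0.0]))
```

Note that, unlike Adam's fixed global learning rate, the `sqrt(u + eps)` factor plays the role of a self-tuned step size, which is the appeal of AdaDelta's unit correction.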


2021, Vol 2021, pp. 1-11
Author(s): Jinhuan Duan, Xianxian Li, Shiqi Gao, Zili Zhong, Jinyan Wang

With the vigorous development of artificial intelligence technology, various engineering applications have been implemented one after another. The gradient descent method plays an important role in solving various optimization problems due to its simple structure, good stability, and easy implementation. However, in multinode machine learning systems, gradients usually need to be shared, which can cause privacy leakage, because attackers can infer training data from the gradient information. In this paper, to prevent gradient leakage while keeping the accuracy of the model, we propose the super stochastic gradient descent approach, which updates parameters by concealing the modulus length of each gradient vector and converting it into a unit vector. Furthermore, we analyze the security of the super stochastic gradient descent approach and demonstrate that our algorithm can defend against attacks on the gradient. Experimental results show that our approach is clearly superior to prevalent gradient descent approaches in terms of accuracy, robustness, and adaptability to large-scale batches. Interestingly, our algorithm can also resist model poisoning attacks to a certain extent.
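The core mechanism described above — sharing only the direction of the gradient while concealing its modulus length — can be illustrated with a minimal sketch. The function name and learning rate are hypothetical, and the paper's full protocol involves more than this single normalization step.

```python
import numpy as np

def unit_gradient_step(theta, grad, lr=0.1):
    """One 'unit vector' update: the gradient's modulus length is
    concealed by normalizing the gradient before it is shared or
    applied. Minimal sketch of the idea only."""
    norm = np.linalg.norm(grad)
    if norm == 0.0:
        return theta  # zero gradient: nothing to reveal or apply
    return theta - lr * (grad / norm)

# Whatever the raw gradient's scale, only its direction is used.
theta = unit_gradient_step(np.zeros(2), np.array([3.0, 4.0]), lr=0.1)
```

An attacker observing the shared vector sees only a point on the unit sphere, not the gradient magnitude that inversion attacks typically exploit.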


Sensors, 2020, Vol 20 (9), pp. 2510
Author(s): Nam D. Vo, Minsung Hong, Jason J. Jung

Previous recommendation systems applied the matrix factorization collaborative filtering (MFCF) technique only to single domains. Due to data sparsity, this approach has difficulty overcoming the cold-start problem. Thus, in this study, we focus on discovering latent features from domains to understand the relationships between domains (called domain coherence). This approach uses potential knowledge of the source domain to improve the quality of the target domain recommendation. In this paper, we consider applying MFCF to multiple domains. Mainly, by adopting the implicit stochastic gradient descent algorithm to optimize the objective function for prediction, multiple matrices from different domains are consolidated inside the cross-domain recommendation system (CDRS). Additionally, we design a conceptual framework for CDRS, which applies to different industrial scenarios for recommenders across domains. Moreover, an experiment is devised to validate the proposed method. Using a real-world dataset gathered from Amazon Food and MovieLens, experimental results show that the proposed method improves computation time and MSE on a utility matrix by 15.2% and 19.7%, respectively, over other methods. Notably, a much lower convergence value of the loss function was obtained in the experiment. Furthermore, a critical analysis of the results shows that there is a dynamic balance between prediction accuracy and computational complexity.
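As a rough illustration of the underlying building block, a plain single-domain matrix-factorization SGD loop (not the paper's cross-domain consolidation or its implicit-SGD variant) might look like the following; all names and hyperparameters are assumptions for the sketch.

```python
import numpy as np

def mf_sgd(R, mask, k=2, lr=0.02, reg=0.02, epochs=500, seed=0):
    """Single-domain matrix-factorization SGD: fit R ~ P @ Q.T using
    only the observed entries (mask == 1). Illustrative sketch; the
    paper consolidates matrices from several domains in one objective."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    P = 0.1 * rng.standard_normal((n_users, k))   # user latent factors
    Q = 0.1 * rng.standard_normal((n_items, k))   # item latent factors
    users, items = np.nonzero(mask)
    for _ in range(epochs):
        for u, i in zip(users, items):
            err = R[u, i] - P[u] @ Q[i]             # prediction error
            P[u] += lr * (err * Q[i] - reg * P[u])  # regularized SGD step
            Q[i] += lr * (err * P[u] - reg * Q[i])
    return P, Q

R = np.array([[5.0, 3.0], [4.0, 1.0]])  # tiny toy utility matrix
P, Q = mf_sgd(R, np.ones_like(R))
mse = np.mean((R - P @ Q.T) ** 2)  # reconstruction error on observed entries
```

In the cross-domain setting, latent factor matrices like `P` would be shared or aligned across the source and target domains rather than trained on one utility matrix alone.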


2021, Vol 7 (3), pp. 420
Author(s): Budi Nugroho, Eva Yulia Puspaningrum, M. Syahrul Munir

This research concerns the classification of Covid-19 pneumonia (pneumonia caused by the SARS-CoV-2 coronavirus) from chest X-ray images using a machine learning approach. The classification determines whether a person's lungs show Covid-19 pneumonia, ordinary pneumonia, or a normal/healthy condition. To achieve better classification performance, an optimization process is often applied during training. Many techniques are used for this optimization, including the Root-Mean-Square Propagation (RMSprop) and Stochastic Gradient Descent (SGD) algorithms. In this study, both methods were tested to measure their performance on Covid-19 pneumonia classification. The classification method uses a Convolutional Neural Network (CNN) with 5 convolutional layers with filter sizes of 16, 32, 64, 128, and 256. Training uses 3,900 images consisting of 1,300 Covid-19 pneumonia images, 1,300 pneumonia images, and 1,300 normal images, while validation uses 450 images and testing uses 225 images. Based on the experiments, the RMSprop optimization algorithm achieved 87.99% accuracy, 0.88 precision, 0.86 recall, and 0.87 F1 score, while the SGD optimization algorithm achieved 66.22% accuracy, 0.69 precision, 0.64 recall, and 0.67 F1 score. These results provide the important finding that the RMSprop optimization algorithm performs far better than SGD on Covid-19 pneumonia classification.
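The two optimizers compared in the study differ only in their update rules. A minimal NumPy sketch (hypothetical parameter values, not the study's CNN training setup) shows why RMSprop's per-parameter scaling can help on badly conditioned problems:

```python
import numpy as np

def rmsprop_step(theta, grad, cache, lr=1e-3, decay=0.9, eps=1e-8):
    """RMSprop: divide the step by a running RMS of recent gradients,
    giving each parameter its own effective learning rate."""
    cache = decay * cache + (1 - decay) * grad ** 2
    return theta - lr * grad / (np.sqrt(cache) + eps), cache

def sgd_step(theta, grad, lr=1e-3):
    """Plain SGD: one fixed learning rate for every parameter."""
    return theta - lr * grad

# Badly scaled quadratic: loss = 0.5 * (100 * x1^2 + 0.01 * x2^2).
scales = np.array([100.0, 0.01])
x_sgd = np.array([1.0, 1.0])
x_rms = np.array([1.0, 1.0])
cache = np.zeros(2)
for _ in range(1500):
    x_sgd = sgd_step(x_sgd, scales * x_sgd)
    x_rms, cache = rmsprop_step(x_rms, scales * x_rms, cache)
loss_sgd = 0.5 * np.sum(scales * x_sgd ** 2)
loss_rms = 0.5 * np.sum(scales * x_rms ** 2)
```

With one fixed learning rate, SGD must move slowly enough for the steep direction, so the shallow direction barely progresses; RMSprop normalizes both, which is consistent with the large accuracy gap the study reports.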


Author(s): Mamta Bisht, Richa Gupta

Every writer's handwriting exhibits variation, skew, and slant, which makes recognising handwritten documents a challenging task. This article presents a study of the methods available in the literature for Devanagari handwritten character recognition and implements one using a convolutional neural network (CNN). The available methods are studied on different parameters, and a tabular comparison is presented which shows the superiority of the CNN model on the character recognition task. The proposed CNN model achieves well-acceptable accuracy using dropout and the stochastic gradient descent (SGD) optimizer.
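Of the two ingredients credited above, dropout is the less standard one to implement by hand. A minimal sketch of inverted dropout follows; the function name and drop probability are illustrative, not from the article.

```python
import numpy as np

def dropout_forward(x, p_drop=0.5, train=True, rng=None):
    """Inverted dropout: randomly zero activations during training and
    rescale the survivors so the expected activation is unchanged;
    acts as the identity at inference time."""
    if not train or p_drop == 0.0:
        return x
    rng = np.random.default_rng(0) if rng is None else rng
    keep = rng.random(x.shape) >= p_drop          # Bernoulli keep mask
    return x * keep / (1.0 - p_drop)              # rescale survivors

x = np.ones(1000)
train_out = dropout_forward(x, p_drop=0.5, train=True)
test_out = dropout_forward(x, train=False)
```

Because the rescaling happens at training time, no change to the forward pass is needed at inference, which is why dropout combines cleanly with a plain SGD training loop.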


2018, Vol 10 (03), pp. 1850004
Author(s): Grant Sheen

Wireless recording and real-time classification of brain waves are essential steps towards future wearable devices to assist Alzheimer's patients in conveying their thoughts. This work is concerned with efficient computation of a dimension-reduced neural network (NN) model on Alzheimer's patient data recorded by a wireless headset. Because wireless recording uses far fewer sensors than the electrodes of a traditional wired cap, and an Alzheimer's patient has a shorter attention span than a healthy person, the data is much more restrictive than is typical in neural robotics and mind-controlled games. To overcome this challenge, an alternating minimization (AM) method is developed for network training. AM minimizes a nonsmooth and nonconvex objective function one variable at a time while fixing the rest. The sub-problem for each variable is piecewise convex with a finite number of minima. The overall iterative AM method is descending and free of the step size (learning rate) required by the standard gradient descent method. The proposed model, trained by the AM method, significantly outperforms the standard NN model trained by stochastic gradient descent in classifying four daily thoughts, reaching accuracies around 90% for an Alzheimer's patient. Curved decision boundaries of the proposed model with multiple hidden neurons are derived analytically to establish the nonlinear nature of the classification.
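The AM scheme described above — exact minimization in one variable at a time, with no step size — can be illustrated on a toy smooth objective with closed-form coordinate minimizers. The paper's NN sub-problems are piecewise convex with finitely many minima; everything below is a simplified illustration.

```python
def alternating_minimization(argmin_1d, x0, sweeps=50):
    """Cycle through the variables, exactly minimizing the objective in
    one coordinate while fixing the rest. Each sweep is descending and
    no step size (learning rate) is needed."""
    x = list(x0)
    for _ in range(sweeps):
        for i in range(len(x)):
            x[i] = argmin_1d(i, x)  # exact 1-D minimizer in coordinate i
    return x

# Toy objective f(a, b) = (a - b)^2 + (a - 1)^2 + b^2.
# Setting each partial derivative to zero gives closed-form minimizers:
#   over a (b fixed): a = (b + 1) / 2;  over b (a fixed): b = a / 2.
def argmin_1d(i, x):
    a, b = x
    return (b + 1) / 2 if i == 0 else a / 2

sol = alternating_minimization(argmin_1d, [0.0, 0.0])  # approaches (2/3, 1/3)
```

Each coordinate update can only decrease the objective, so the iteration is monotonically descending, mirroring the descent property the paper establishes for its piecewise-convex sub-problems.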

