Application Of L1-Regularization Approach In QSAR Problem. Linear Regression And Artificial Neural Networks

2019 ◽  
Vol 14 (2) ◽  
pp. 79-90
Author(s):  
M.I. Berdnyk ◽  
A.B. Zakharov ◽  
V.V. Ivanov

One of the primary tasks of analytical chemistry and QSAR/QSPR research is to build prognostic regression equations from sets of descriptors. One of the most important problems here is reducing the number of descriptors in the initial set, which is usually far too large. In the current investigation we propose reducing the descriptor set with the least absolute shrinkage and selection operator (LASSO) approach. The reduced descriptor sets were then used in calculations with the following QSAR/QSPR methods: ordinary least squares (OLS) regression, least absolute deviation (LAD) regression, and artificial neural networks (ANN). In contrast to these methods, the principal component regression (PCR) and partial least squares (PLS) approaches can produce solutions containing numerous descriptors. In this article we compare the viability of these two descriptor-handling strategies for predicting the chemical and physical properties of molecules. The results show that there are tasks for which the PCR and PLS approaches can fail to produce accurate regression equations, whereas the OLS and LAD methods, which use a small number of descriptors, can provide viable solutions for the same cases. The small descriptor sets selected with LASSO can also be used in ANN to obtain models with even better internal validation characteristics.
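The select-then-refit idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the synthetic descriptor matrix, the penalty `alpha=0.3`, the absence of an intercept, and the function names `soft_threshold` and `lasso_coordinate_descent` are all assumptions made for illustration. LASSO is solved here by plain cyclic coordinate descent, and the OLS refit on the reduced set reuses the same routine with a zero penalty.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; exactly 0.0 inside the dead zone."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_sweeps=200):
    """Minimise (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # residual = y - Xw, and w starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            # Correlation of column j with the residual, with w[j]'s own
            # contribution added back in.
            rho = sum(X[i][j] * (residual[i] + w[j] * X[i][j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= delta * X[i][j]
                w[j] = new_wj
    return w


# Hypothetical data: 8 descriptors, of which only columns 0 and 3 matter.
rng = random.Random(0)
n_samples, n_descriptors = 60, 8
X = [[rng.gauss(0.0, 1.0) for _ in range(n_descriptors)] for _ in range(n_samples)]
y = [3.0 * row[0] - 2.0 * row[3] + rng.gauss(0.0, 0.1) for row in X]

# Step 1: LASSO picks the descriptors whose coefficients stay nonzero.
lasso_w = lasso_coordinate_descent(X, y, alpha=0.3)
selected = [j for j, wj in enumerate(lasso_w) if wj != 0.0]

# Step 2: refit on the reduced set without a penalty (alpha=0 gives OLS).
X_reduced = [[row[j] for j in selected] for row in X]
ols_w = lasso_coordinate_descent(X_reduced, y, alpha=0.0)

print("selected descriptors:", selected)
print("OLS coefficients on the reduced set:", [round(c, 3) for c in ols_w])
```

The refit step matters because the L1 penalty biases the surviving coefficients toward zero. Fitting an unpenalised model on the selected descriptors removes that bias while keeping the reduced set.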

1994 ◽  
Vol 48 (1) ◽  
pp. 21-26
Author(s):  
M. Marjoniemi

In this article artificial neural networks (ANNs) are applied for multivariate calibration using spectroscopic data and for generation of quantitative estimates of the concentrations of a component (chromium) in solutions. Neural networks are capable of handling nonlinear relationships. Absorbance is nonlinearly dependent on concentration, especially in the case of wide concentration ranges and multicomponent solutions. In addition to the aforementioned reasons, nonlinearities are also caused by aging and by differences in pH and in the temperatures of the chromium-tanning solutions to be modeled. The sigmoid output function was used in the hidden layer to perform nonlinear fitting. The results are compared with the results obtained with principal component regression (PCR) and partial least-squares regression (PLS) methods.
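As a rough illustration of nonlinear calibration with a sigmoid hidden layer, the sketch below trains a very small network to map absorbance back to concentration. Everything specific in it is an assumption rather than a detail from the article: the saturating response curve `1 - exp(-2c)`, the three hidden units, the learning rate, and the function names are all hypothetical stand-ins.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(a, params):
    """Return the network output and hidden activations for absorbance a."""
    w1, b1, w2, b2 = params
    hidden = [sigmoid(w * a + b) for w, b in zip(w1, b1)]
    return sum(v * h for v, h in zip(w2, hidden)) + b2, hidden


def mean_squared_error(data, params):
    return sum((predict(a, params)[0] - c) ** 2 for a, c in data) / len(data)


def train(data, n_hidden=3, lr=0.2, epochs=3000, seed=0):
    """Full-batch gradient descent on 0.5 * MSE; sigmoid hidden, linear output."""
    rng = random.Random(seed)
    w1 = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]
    b1 = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]
    w2 = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]
    b2 = 0.0
    n = len(data)
    for _ in range(epochs):
        g_w1 = [0.0] * n_hidden
        g_b1 = [0.0] * n_hidden
        g_w2 = [0.0] * n_hidden
        g_b2 = 0.0
        for a, c in data:
            out, hidden = predict(a, (w1, b1, w2, b2))
            err = (out - c) / n  # derivative of the loss w.r.t. the output
            g_b2 += err
            for k in range(n_hidden):
                g_w2[k] += err * hidden[k]
                # Backpropagate through the sigmoid: s'(z) = s(z) * (1 - s(z))
                back = err * w2[k] * hidden[k] * (1.0 - hidden[k])
                g_w1[k] += back * a
                g_b1[k] += back
        for k in range(n_hidden):
            w1[k] -= lr * g_w1[k]
            b1[k] -= lr * g_b1[k]
            w2[k] -= lr * g_w2[k]
        b2 -= lr * g_b2
    return w1, b1, w2, b2


# Hypothetical calibration set: absorbance saturates as concentration grows.
concentrations = [i / 20 for i in range(21)]
data = [(1.0 - math.exp(-2.0 * c), c) for c in concentrations]

untrained = train(data, epochs=0)  # same seed, so same starting weights
trained = train(data)
initial_mse = mean_squared_error(data, untrained)
final_mse = mean_squared_error(data, trained)
print("MSE before training:", initial_mse)
print("MSE after training:", final_mse)
```

The hidden sigmoid units are what let the model bend with the curved absorbance-concentration relationship. With a linear hidden layer the network would collapse to an ordinary straight-line calibration.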

