Partial least square and k-nearest neighbor algorithms for improved 3D quantitative spectral data-activity relationship consensus modeling of acute toxicity

Iva B. Stoyanova-Slavova; Svetoslav H. Slavov; Bruce Pearce; Dan A. Buzatu; Richard D. Beger; Jon G. Wilkes

doi:10.1002/etc.2534

Quantitative structure−pharmacokinetic parameters relationships (QSPKR) analysis of antimicrobial agents in humans using simulated annealing k‐nearest‐neighbor and partial least‐square analysis methods

This paper was presented in part at the Annual Meeting of the American Association of Pharmaceutical Scientists in Toronto in 2003.

Journal of Pharmaceutical Sciences ◽

10.1002/jps.20117 ◽

2004 ◽

Vol 93 (10) ◽

pp. 2535-2544 ◽

Cited By ~ 24

Author(s):

Chee Ng ◽

Yunde Xiao ◽

Wendy Putnam ◽

Bert Lum ◽

Alexander Tropsha

Keyword(s):

Simulated Annealing ◽

Antimicrobial Agents ◽

Nearest Neighbor ◽

American Association ◽

Partial Least Square ◽

Least Square ◽

Pharmacokinetic Parameters ◽

Quantitative Structure ◽

K Nearest Neighbor ◽

Partial Least Square Analysis

Comparison of Approaches to Photometric Redshift Estimation of Quasars

Proceedings of the International Astronomical Union ◽

10.1017/s1743921315009989 ◽

2015 ◽

Vol 11 (S319) ◽

pp. 146-146

Author(s):

Yang Tu ◽

Yan-Xia Zhang ◽

Yong-Heng Zhao ◽

Hai-Jun Tian

Keyword(s):

Gradient Descent ◽

Nearest Neighbor ◽

Partial Least Square ◽

Least Square ◽

Partial Least Square Regression ◽

Stochastic Gradient Descent ◽

K Nearest Neighbor ◽

Photometric Redshift ◽

K Nearest Neighbor Algorithm ◽

Selection Operator

We examine several approaches to photometric redshift estimation of quasars, including KNN (K-nearest neighbor algorithm), Lasso (Least Absolute Shrinkage and Selection Operator), PLS (Partial Least Square regression), ridge regression, SGD (Stochastic Gradient Descent) and Extra-Trees.

DEVELOPMENT OF A ROBUST QSAR MODEL OF ANGIOTENSIN RECEPTOR REVEALS A K NEAREST NEIGHBOR APPLICABLE TO DIVERSE SCAFFOLDS

INDIAN DRUGS ◽

10.53879/id.54.06.10947 ◽

2017 ◽

Vol 54 (06) ◽

pp. 30-36

Author(s):

M. C Sharma ◽

D. V. Kohli

Keyword(s):

Biological Activity ◽

Nearest Neighbor ◽

3D Qsar ◽

Qsar Model ◽

Partial Least Square ◽

Least Square ◽

Positive Contribution ◽

K Nearest Neighbor ◽

Qsar Studies ◽

2D Qsar

Quantitative structure–activity relationship (QSAR) studies were performed on quinazolinone analogues to predict antihypertensive activity. The best 2D-QSAR model, with r2 = 0.8118 and pred_r2 = 0.7428, was developed by the stepwise partial least square method. k-nearest neighbor molecular field analysis was used to construct the best 3D-QSAR model, which showed good correlative and predictive capability (q2 = 0.7388 and pred_r2 = 0.6983). The 2D-QSAR results indicate a positive contribution of the SssOE index and SsCH3 count to biological activity. The results also showed that electronegative groups are necessary for activity and that halogen, bulky, and less bulky groups on the quinazolinone nucleus enhanced biological activity. The information provided by the 2D- and 3D-QSAR models may lead to a better understanding of the structural requirements of substituted quinazolinone derivatives and aid in designing novel potent antihypertensive molecules.

A comparative study of different imputation methods for daily rainfall data in east-coast Peninsular Malaysia

Bulletin of Electrical Engineering and Informatics ◽

10.11591/eei.v9i2.2090 ◽

2020 ◽

Vol 9 (2) ◽

Author(s):

Siti Mariana Che Mat Nor ◽

Shazlyn Milleana Shaharudin ◽

Shuhaida Ismail ◽

Nurul Hila Zainuddin ◽

Mou Leong Tan

Keyword(s):

Missing Values ◽

Nearest Neighbor ◽

East Coast ◽

Daily Rainfall ◽

Absolute Error ◽

Peninsular Malaysia ◽

Partial Least Square ◽

Least Square ◽

Rainfall Data ◽

Efficiency Coefficient

Rainfall data are the most significant values in hydrology and climatology modelling. However, the datasets are prone to missing values due to various issues. This study aims to impute missing rainfall values using various imputation methods: Replace by Mean, Nearest Neighbor, Random Forest (RF), Non-linear Iterative Partial Least-Square (NIPALS) and Markov Chain Monte Carlo (MCMC). Daily rainfall datasets from 48 rainfall stations across east-coast Peninsular Malaysia were used in this study. The datasets were then fed into a Multiple Linear Regression (MLR) model. The performance of the abovementioned methods was evaluated using Root Mean Square Error (RMSE), Mean Absolute Error (MAE) and the Nash-Sutcliffe Efficiency Coefficient (CE). The experimental results showed that RF coupled with MLR (RF-MLR) was more suitable than the other approaches for imputing the missing data in east-coast Peninsular Malaysia.
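The three evaluation measures named in this abstract can be sketched in a few lines of Python. This is a minimal illustration using the standard textbook definitions, not the authors' code; the function names and the toy `observed` and `imputed` values are assumptions made up for the example.

```python
import math

def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def mae(observed, predicted):
    """Mean absolute error between observed and predicted values."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)

def nash_sutcliffe(observed, predicted):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / variance

# Hypothetical daily rainfall values (mm) and their imputed counterparts.
observed = [0.0, 2.0, 4.0, 6.0]
imputed = [1.0, 2.0, 3.0, 6.0]

print(rmse(observed, imputed))
print(mae(observed, imputed))
print(nash_sutcliffe(observed, imputed))
```

Lower RMSE and MAE and a CE closer to 1 indicate that the imputed series tracks the observed series more closely.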

Method for predicting lignocellulose components in jute by transformed FT-NIR spectroscopic data and chemometrics

Nordic Pulp & Paper Research Journal ◽

10.1515/npprj-2018-0018 ◽

2019 ◽

Vol 34 (1) ◽

pp. 1-9 ◽

Cited By ~ 2

Author(s):

M. Nashir Uddin ◽

Sohan Ahmed ◽

Swapan Kumer Ray ◽

M. Saiful Islam ◽

Ariful Hai Quadery ◽

...

Keyword(s):

Chemical Composition ◽

Spectral Data ◽

Spectroscopic Data ◽

Partial Least Square ◽

Least Square ◽

Partial Least Square Regression ◽

Chemical Compositions ◽

Nondestructive Technique ◽

Jute Fibers ◽

Calibration Techniques

In this investigation, a nondestructive technique was developed for determining the chemical composition of jute fiber by chemometric modeling with pretreated FT-NIR spectroscopic data. The chemical composition of the jute fibers determined by the wet chemical method was 58 to 61.80 % α-cellulose, 13.0 to 21.90 % lignin, 9.89 to 16.8 % pentosan and 79.02 to 88.33 % holocellulose. FT-NIR spectral data in the range 9000–4000 cm⁻¹ were collected for all jute samples. The spectral data were pretreated with second order derivatives (SOD), standard normal variate (SNV), or both together before calibration. Two chemometric calibration techniques, partial least square regression (PLSR) and artificial neural network (ANN), were assessed for predicting the chemical composition of jute fibers. The results show that the prediction efficiency (R²) of ANN varies from 72–99 % for the calibration, validation and test datasets. With PLSR, however, the R² values are much higher and more consistent than those from ANN. For α-cellulose, lignin, pentosan and holocellulose, the R² values hover around 95–99 %. A non-destructive, simple and cost-effective method is therefore proposed for determining the chemical composition of jute from pretreated FT-NIR spectral data and chemometric calibration techniques.
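As a rough illustration of the two pretreatments mentioned here, the sketch below applies standard normal variate (SNV) scaling and a plain three-point second difference to a single spectrum. The abstract does not say how the second derivative was computed (a Savitzky-Golay filter is common in practice), so the finite-difference form, the function names, and the toy `spectrum` values are all assumptions.

```python
from statistics import mean, stdev

def snv(spectrum):
    """Standard normal variate: centre and scale one spectrum by its own mean and std."""
    m = mean(spectrum)
    s = stdev(spectrum)  # sample standard deviation (n - 1 denominator)
    return [(value - m) / s for value in spectrum]

def second_difference(spectrum):
    """Three-point second difference, a simple stand-in for a second-order derivative."""
    return [
        spectrum[i - 1] - 2 * spectrum[i] + spectrum[i + 1]
        for i in range(1, len(spectrum) - 1)
    ]

# Hypothetical absorbance values at five evenly spaced wavenumbers.
spectrum = [1.0, 2.0, 4.0, 7.0, 11.0]

corrected = snv(spectrum)
curvature = second_difference(spectrum)
print(corrected)
print(curvature)
```

SNV removes per-sample offset and scale differences, while the second difference suppresses baseline drift; the pretreated values would then go into the PLSR or ANN calibration.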

WSN based Improved Bayesian Algorithm Combined with Enhanced Least-Squares Algorithm for Target Localizing and Tracking

March 2019 - IRO Journal on Sustainable Wireless Systems ◽

10.36548/jsws.2020.2.001 ◽

2020 ◽

Vol 2 (2) ◽

pp. 59-67

Author(s):

Dr. Wang Haoxiang ◽

Dr. Smys S.

Keyword(s):

Kalman Filter ◽

Expectation Maximization ◽

Nearest Neighbor ◽

Joint Probability ◽

Target Prediction ◽

Least Square ◽

K Nearest Neighbor ◽

Bayesian Algorithm ◽

Individual Measurement ◽

Localization And Tracking

In wireless sensor networks (WSNs), target localization and tracking are widely implemented with traditional tracking algorithms such as the classical least-square (CLS) algorithm, the extended Kalman filter (EKF) and the Bayesian algorithm. For localizing and tracking moving targets in a WSN, this paper proposes an improved Bayesian algorithm that incorporates the principles of the least-square algorithm. The improved Bayesian algorithm forms a matrix of range joint probabilities and uses the predicted target location to obtain a sub-range probability set. The range joint probability matrix is updated automatically while the WSN testbed is in its dormant state. The range probability matrix is then used to calculate and normalize the weight of every individual measurement. Lastly, the target prediction position and its correction value are calculated with the weighted least-square algorithm. The positioning accuracy of the proposed algorithm is improved when compared to variational Bayes expectation maximization (VBEM), dual-factor enhanced VBAKF (EVBAKF), variational Bayesian adaptive Kalman filtering (VBAKF), the fingerprint Kalman filter (FKF), the position Kalman filter (PKF), the weighted K-nearest neighbor (WKNN) and the EKF algorithms with the values of 0.5%, 7%, 14%, 19%, 33% and 35% respectively. In addition, the proposed algorithm reduces the computation burden by over 80% compared to the Bayesian algorithm.
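The last two steps described above, normalizing per-measurement weights and solving a weighted least-squares problem for the target position, can be sketched as follows. This is only a generic illustration: the linearized range equations, the anchor layout, and the raw weights are assumptions, and the paper's probability-matrix weighting is not reproduced here.

```python
import math

def normalize(raw_weights):
    """Scale weights so that they sum to one."""
    total = sum(raw_weights)
    return [w / total for w in raw_weights]

def wls_position(anchors, ranges, weights):
    """Weighted least-squares 2D position from range measurements.

    The first anchor is used as the reference to linearize the range
    equations, so `weights` holds one value per remaining anchor.
    """
    (x0, y0), d0 = anchors[0], ranges[0]
    k0 = x0 * x0 + y0 * y0 - d0 * d0
    a11 = a12 = a22 = b1 = b2 = 0.0
    for (xi, yi), di, wi in zip(anchors[1:], ranges[1:], weights):
        ax, ay = 2 * (xi - x0), 2 * (yi - y0)
        rhs = (xi * xi + yi * yi - di * di) - k0
        # Accumulate the weighted normal equations (A^T W A) p = A^T W b.
        a11 += wi * ax * ax
        a12 += wi * ax * ay
        a22 += wi * ay * ay
        b1 += wi * ax * rhs
        b2 += wi * ay * rhs
    det = a11 * a22 - a12 * a12
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)

# Hypothetical sensor nodes and noise-free ranges to a target at (3, 4).
anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
target = (3.0, 4.0)
ranges = [math.dist(a, target) for a in anchors]
weights = normalize([3.0, 2.0, 1.0])

estimate = wls_position(anchors, ranges, weights)
print(estimate)
```

With noisy ranges, measurements that carry larger weights pull the estimate more strongly, which is the role the normalized weights play in the abstract's description.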

National spectral data and learning algorithms for potentially toxic elements modelling in forest soil horizons

10.31219/osf.io/2jusw ◽

2020 ◽

Author(s):

Asa Gholizadeh ◽

Mohammadmehdi Saberioon ◽

Eyal Ben Dor ◽

Raphael A. Viscarra Rossel ◽

Lubos Boruvka

Keyword(s):

Czech Republic ◽

Spectral Data ◽

Forest Soils ◽

Toxic Elements ◽

Potentially Toxic Elements ◽

Partial Least Square ◽

Least Square ◽

Support Vector ◽

The Czech Republic ◽

Organic Horizons

Forest ecosystems are among the main parts of the biosphere; however, they are endangered by the significant elevation and harmful effects of air and soil pollutants, including potentially toxic elements (PTEs). The concentration of PTEs in forest soils varies not only laterally but also vertically with depth. Forest surface organic horizons are of particular interest in forest ecosystem monitoring due to their role as stable adsorbents of deposited atmospheric substances. Therefore, the main purpose of this study was to conduct rapid examinations of forest soil PTEs (Cr, Cu, Pb, Zn, and Al) in forest organic horizons, testing the capability of VIS–NIR spectroscopy coupled with machine learning (ML) techniques (partial least square regression (PLSR), support vector machine regression (SVMR), and random forest (RF)) and a fully connected neural network (FNN), a deep learning (DL) approach. One thousand and eighty forested sites across the Czech Republic were investigated at two soil layers, defining the fragmented (F) and humus (H) organic horizons (2160 samples in total). PTEs, as well as total Fe and SOC as auxiliary data, were conventionally and spectrally determined and modelled in the combined organic horizons (F + H) and in each individual horizon using the ML and DL algorithms. Results indicated that the concentration of all PTEs was higher in the H horizon than in the F horizon. The spectral reflectance of samples tended to decrease with increased PTE concentration. Strongly significant positive correlations between all PTEs and total Fe were obtained in all horizons, and they were higher in the H and F + H horizons than in the F horizon. The highest correlations of PTEs with the spectra were at 460–590 nm, which is mostly linked to the presence of Fe-oxide. These results show the importance of Fe for spectral prediction of PTEs. Cr and Al were the most accurately predicted elements, regardless of the applied learning technique. SVMR provided the best results in assessing the H horizon (e.g., R² = 0.88 and root mean square error (RMSE) = 3.01 mg/kg, and R² = 0.82 and RMSE = 1682.25 mg/kg for Cr and Al, respectively); however, FNN predicted the combined F + H horizons the best (R² = 0.89 and RMSE = 2.95 mg/kg, and R² = 0.86 and RMSE = 1593.64 mg/kg for Cr and Al, respectively) due to the larger number of samples. In the F horizon, almost no parameters were predicted adequately. This study shows that, given the availability of larger sample sizes, FNN can be a more promising technique than ML methods for assessing Cr and Al concentrations based on national spectral data in the forests of the Czech Republic.

An RSSD-Based Fingerprint Positioning Method for Detection of an Unknown Radio Transmitter Using WLS and Factor Graph

Mathematical Problems in Engineering ◽

10.1155/2018/8052415 ◽

2018 ◽

Vol 2018 ◽

pp. 1-11

Author(s):

Liyang Zhang ◽

Taihang Du ◽

Chundong Jiang

Keyword(s):

Nearest Neighbor ◽

Access Point ◽

Reference Points ◽

Least Square ◽

Factor Graph ◽

K Nearest Neighbor ◽

Positioning Accuracy ◽

Radio Transmitter ◽

Error Weight ◽

The Impact

Accurate detection of an unknown radio transmitter (URT) has become a challenging problem because the transmitter's parameters are unknown. A method based on the received signal strength difference (RSSD) fingerprint positioning technique and a factor graph (FG) has been successfully developed to localize a URT. However, the RSSD-based FG model is not accurate enough to express the relationship between the RSSD and the corresponding location coordinates, since the RSSD variances of reference points differ in practice. This paper proposes an enhanced RSSD-based FG algorithm using weighted least square (WLS) to effectively reduce the impact of RSSD measurement variance differences on positioning accuracy. Using the stochastic RSSD errors between the measured and estimated values at the selected reference points, we build an error weight matrix to establish a new WLSFG model. The positioning process of the proposed RSSD-WLSFG algorithm is then derived with the sum-product principle. In addition, the paper explores the effects of different access point (AP) numbers and grid distances on positioning accuracy. The simulation results show that the proposed algorithm obtains the best positioning performance compared with the conventional RSSD-based K nearest neighbor (RSSD-KNN) and RSSD-FG algorithms for different AP numbers and grid distances.
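For orientation, the RSSD-KNN baseline mentioned in this abstract can be sketched as below. The code is an illustrative assumption rather than the paper's implementation: it takes RSS differences against the first access point (which cancels an unknown transmit power), ranks the reference points by Euclidean distance in RSSD space, and averages the coordinates of the K closest ones. The fingerprint values are made up.

```python
import math

def rssd(rss):
    """RSS differences relative to the first access point (cancels transmit power)."""
    return [value - rss[0] for value in rss[1:]]

def rssd_knn_position(fingerprints, measured_rss, k=3):
    """Average the coordinates of the k reference points closest in RSSD space."""
    target = rssd(measured_rss)
    ranked = sorted(fingerprints, key=lambda fp: math.dist(rssd(fp[1]), target))
    nearest = ranked[:k]
    x = sum(point[0] for point, _ in nearest) / len(nearest)
    y = sum(point[1] for point, _ in nearest) / len(nearest)
    return (x, y)

# Hypothetical fingerprint database: (coordinates, RSS in dBm at three access points).
fingerprints = [
    ((0.0, 0.0), [-40.0, -50.0, -60.0]),
    ((0.0, 2.0), [-42.0, -36.0, -61.0]),
    ((2.0, 0.0), [-45.0, -55.0, -52.0]),
    ((2.0, 2.0), [-50.0, -52.0, -50.0]),
]

# Same location as the first reference point, but transmitting 10 dB stronger.
measured = [-30.0, -40.0, -50.0]

print(rssd_knn_position(fingerprints, measured, k=1))
print(rssd_knn_position(fingerprints, measured, k=2))
```

Because only differences between access points are compared, the 10 dB power offset in `measured` does not affect which reference points are selected.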

Robust Wavelength Selection Using Filter-Wrapper Method and Input Scaling on Near Infrared Spectral Data

Sensors ◽

10.3390/s20175001 ◽

2020 ◽

Vol 20 (17) ◽

pp. 5001 ◽

Cited By ~ 1

Author(s):

Divo Dharma Silalahi ◽

Habshah Midi ◽

Jayanthi Arasan ◽

Mohd Shafie Mustafa ◽

Jean-Pierre Caliman

Keyword(s):

Monte Carlo ◽

Spectral Data ◽

Near Infrared ◽

Selection Procedure ◽

Partial Least Square ◽

Least Square ◽

Partial Least Square Regression ◽

Tolerance Interval ◽

Wavelength Selection ◽

Wrapper Method

The extraction of relevant wavelengths from a large dataset of Near Infrared Spectroscopy (NIRS) is a significant challenge in vibrational spectroscopy research. Nonetheless, this process improves chemical interpretability by emphasizing the chemical entities related to the chemical parameters of samples. Given the complexity of the dataset, irrelevant wavelengths may still be included in the multivariate calibration. This makes the computational process unnecessarily complex and decreases the accuracy and robustness of the model. In multivariate analysis, Partial Least Square Regression (PLSR) is a method commonly used to build a predictive model from NIR spectral data. However, in the PLSR method and common commercial chemometrics software, there is no standard wavelength selection procedure applied to screen out the irrelevant wavelengths. In this study, a new robust wavelength selection procedure called the modified VIP-MCUVE (mod-VIP-MCUVE), using the Filter-Wrapper method and an input scaling strategy, is introduced. The proposed method combines the modified Variable Importance in Projection (VIP) and the modified Monte Carlo Uninformative Variable Elimination (MCUVE) to calculate the scale matrix of the input variable. The modified VIP uses the orthogonal components of Partial Least Square (PLS) to investigate the informative variables in the model by applying the amount of variation in both X and y {SSX, SSY} simultaneously. The modified MCUVE uses a robust reliability coefficient and a robust tolerance interval in the selection procedure. To evaluate the superiority of the proposed method, the classical VIP, MCUVE, and autoscaling procedure in classical PLSR were also included in the evaluation. Using artificial data with Monte Carlo simulation and NIR spectral data of oil palm (Elaeis guineensis Jacq.) fruit mesocarp, the study shows that the proposed method offers advantages to improve model interpretability, to be computationally extensive, and to produce better model accuracy.

Continuous wide spectrum odor sensing for electronic nose system

Sensor Review ◽

10.1108/sr-04-2017-0067 ◽

2018 ◽

Vol 38 (2) ◽

pp. 223-230

Author(s):

Wenli Zhang ◽

Fengchun Tian ◽

An Song ◽

Zhenzhen Zhao ◽

Youwen Hu ◽

...

Keyword(s):

Support Vector Machine ◽

Nearest Neighbor ◽

Wide Spectrum ◽

Least Square ◽

Support Vector ◽

Detection Accuracy ◽

K Nearest Neighbor ◽

Sensing Element ◽

Content Type ◽

System Errors

Purpose: This paper proposes a wide-spectrum odor sensing system for the electronic nose (e-nose), based on a comprehensive analysis of the merits and drawbacks of current e-nose systems.

Design/methodology/approach: Wide spectral light is used as the sensing medium in the e-nose system based on continuous wide spectrum (CWS) odor sensing, and the sensing response of each sensing element is the change in light intensity distribution.

Findings: Experimental results not only verify the feasibility and effectiveness of the proposed system but also show the effectiveness of the least square support vector machine (LSSVM) in eliminating system errors.

Practical implications: A theoretical model of the system was constructed, and experimental tests were carried out using NO2 and SO2. System errors in the test data were eliminated using the LSSVM, and the preprocessed data were classified by Euclidean distance to centroids (EDC), k-nearest neighbor (KNN), support vector machine (SVM) and LSSVM, respectively.

Originality/value: The system not only has the advantages of current e-nose systems but also expands the sensing array by means of the light source and the spectrometer, whose wide spectrum and high resolution improve detection accuracy and enable real-time detection.
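Two of the classifiers named in this abstract, Euclidean distance to centroids (EDC) and k-nearest neighbor (KNN), are simple enough to sketch directly. The version below is a generic illustration under assumed inputs: the two-value feature vectors and the `train` set are invented, and nothing here reflects the paper's actual sensor responses or preprocessing.

```python
import math
from collections import Counter, defaultdict

def edc_predict(train, sample):
    """Assign the label whose class centroid is closest to the sample."""
    groups = defaultdict(list)
    for features, label in train:
        groups[label].append(features)
    centroids = {
        label: [sum(column) / len(rows) for column in zip(*rows)]
        for label, rows in groups.items()
    }
    return min(centroids, key=lambda label: math.dist(centroids[label], sample))

def knn_predict(train, sample, k=3):
    """Assign the majority label among the k training points nearest to the sample."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], sample))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical preprocessed responses for the two test gases.
train = [
    ([0.10, 0.20], "NO2"), ([0.20, 0.10], "NO2"), ([0.15, 0.15], "NO2"),
    ([0.90, 0.80], "SO2"), ([0.80, 0.90], "SO2"), ([0.85, 0.85], "SO2"),
]

print(edc_predict(train, [0.2, 0.2]))
print(knn_predict(train, [0.7, 0.9], k=3))
```

EDC compares a sample against one summary point per class, whereas KNN votes over individual training samples, so KNN can follow irregular class shapes at the cost of storing the whole training set.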
