Use of advanced statistical learning methods and principal component analysis in quantitative structure–genotoxicity relationship study of amines

2011 ◽  
Vol 76 (4) ◽  
pp. 243-264 ◽  
Author(s):  
Yueying Ren ◽  
Baowei Zhao ◽  
Xiaojun Yao

This paper highlights the use of advanced nonlinear modeling and subset selection techniques to build a reliable, predictive model for the genotoxicity of amines. The essentials of a reliable model were all considered carefully. Chemicals were represented by a large number of CODESSA descriptors. The full sample was divided into a training set and a test set by principal component analysis (PCA). Six descriptors selected by the best multi-linear regression (BMLR) method in the CODESSA program were used as inputs to build nonlinear models with advanced statistical learning methods such as support vector machine (SVM) and projection pursuit regression (PPR). The models were validated in three ways: internal cross-validation (CV), a test set, and an independent validation set. The analysis shows that nonlinear models produced better results than linear models, with the PPR model performing best, in the order PPR > SVM > linear SVM ≥ BMLR. In addition, the relationships between the descriptors and the mutagenic behavior of the compounds are discussed.
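The abstract does not say how PCA was used to divide the sample, so the following is only a minimal sketch of one common approach, not the authors' procedure: rank compounds by their score on the first principal component and send every k-th compound to the test set, so that both sets span the same descriptor space. The function names, the `every` parameter, and the toy data are hypothetical.

```python
import random

def first_pc_scores(X, iters=200):
    """Score each row of X on the first principal component (power iteration)."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - means[j] for j in range(d)] for row in X]
    # Sample covariance matrix (d x d).
    C = [[sum(r[a] * r[b] for r in Xc) / (n - 1) for b in range(d)]
         for a in range(d)]
    v = [1.0] * d  # assumed start vector; fails only if orthogonal to the first PC
    for _ in range(iters):
        w = [sum(C[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0:
            break
        v = [x / norm for x in w]
    return [sum(r[j] * v[j] for j in range(d)) for r in Xc]

def pca_split(X, every=4):
    """Return (train_idx, test_idx): every `every`-th sample along PC1 is a test sample."""
    scores = first_pc_scores(X)
    order = sorted(range(len(X)), key=lambda i: scores[i])
    test = order[every // 2::every]
    test_set = set(test)
    train = [i for i in order if i not in test_set]
    return train, test

# Toy descriptor matrix (20 hypothetical compounds, 3 descriptors each).
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(20)]
train_idx, test_idx = pca_split(X, every=4)
```

Picking test samples at regular intervals along the ranked scores keeps the test set inside the range covered by the training set, which is the usual motivation for a PCA-guided split.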

Energies ◽  
2019 ◽  
Vol 12 (1) ◽  
pp. 196 ◽  
Author(s):  
Lihui Zhang ◽  
Riletu Ge ◽  
Jianxue Chai

China’s energy consumption issues are closely associated with global climate issues, and the scale of energy consumption, peak energy consumption, and consumption investment are all the focus of national attention. To forecast China’s energy consumption accurately, this article first selected GDP, population, industrial structure and energy consumption structure, energy intensity, total imports and exports, fixed asset investment, energy efficiency, urbanization, the level of consumption, and fixed investment in the energy industry as a preliminary set of factors. Second, we corrected the traditional principal component analysis (PCA) algorithm from the perspective of eliminating “bad points”, judging a “bad point” sample based on signal reconstruction ideas. On this basis, we put forward a robust principal component analysis (RPCA) algorithm and chose the first five principal components as the main factors affecting energy consumption, including GDP, population, industrial structure and energy consumption structure, and urbanization. Then, we applied the Tabu search (TS) algorithm to the least squares support vector machine (LSSVM) optimized by the particle swarm optimization (PSO) algorithm to forecast China’s energy consumption. We collected data from 1996 to 2010 as the training set and from 2010 to 2016 as the test set. For comparison, the sample data was also input into the LSSVM algorithm and the PSO-LSSVM algorithm. We used statistical indicators including the coefficient of determination (R²), the root mean square error (RMSE), and the mean radial error (MRE) to compare the training results of the three forecasting models, which demonstrated that the proposed TS-PSO-LSSVM forecasting model had higher prediction accuracy, better generalization ability, and higher training speed. Finally, the TS-PSO-LSSVM forecasting model was applied to forecast the energy consumption of China from 2017 to 2030.
According to the predictions, China’s energy consumption increases gradually from 2017 to 2030 and will break through 6000 million tons in 2030. However, the growth rate gradually slows, and China’s energy consumption economy will shift to a state of diminishing returns around 2026, which suggests that China should put more emphasis on the field of energy investment.
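Two of the indicators named above have standard textbook definitions, and a short sketch may make the model comparison concrete. The code below computes R² and RMSE only; MRE is left out because the abstract does not define it. The function names and the numbers are illustrative assumptions, not values from the paper.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical observations and forecasts (arbitrary units, not the paper's data).
observed = [1.0, 2.0, 3.0, 4.0]
forecast = [2.0, 3.0, 4.0, 5.0]

error = rmse(observed, forecast)
fit = r_squared(observed, forecast)
```

A lower RMSE and an R² closer to 1 indicate a better fit, which is how forecasting models such as LSSVM, PSO-LSSVM, and TS-PSO-LSSVM would typically be ranked on a shared test set.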


Author(s):  
Hongjuan Yao ◽  
Xiaoqiang Zhao ◽  
Wei Li ◽  
Yongyong Hui

Batch processes generally have time-varying dynamic characteristics, which cause low fault detection rates and high false alarm rates, so effective monitoring of batch processes is both necessary and urgent. This paper proposes a fault detection strategy for dynamic batch processes based on global enhanced multiple neighborhoods preserving embedding. First, the angle neighbor is defined and selected to compensate for the limited ability of the distance neighbor alone to express the spatial similarity of samples, and the time neighbor is introduced to describe the time correlations between samples. Together, these three types of neighbors characterize the similarity of samples in both time and space. Second, considering the minimum reconstruction error and the order information of the three types of neighbors, an enhanced objective function is constructed to prevent the loss of order information when neighborhood preserving embedding (NPE) calculates the reconstruction weights. Furthermore, the enhanced objective function is combined with a global objective function to extract both global and local features, describe process dynamics, and visualize process data in a low-dimensional space. Finally, a monitoring index based on support vector data description is constructed to eliminate the adverse effects of non-Gaussian data on monitoring performance. The advantages of the proposed method over principal component analysis, neighborhood preserving embedding, dynamic principal component analysis, and time NPE are demonstrated on a numerical example and a penicillin fermentation process simulation.
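The abstract does not give the exact definitions of the three neighbor types, so the following is a rough sketch under stated assumptions: distance neighbors are the k samples closest in Euclidean distance, angle neighbors are the k samples with the highest cosine similarity, and time neighbors are the k samples closest in sampling order. The function name, `k`, and the toy data are hypothetical.

```python
import math

def three_neighbor_sets(X, i, k=2):
    """Return (distance, angle, time) neighbor indices of sample i.

    Assumed definitions, not taken from the paper:
      distance: k smallest Euclidean distances to X[i]
      angle:    k largest cosine similarities with X[i]
      time:     k closest sample indices (rows are in time order)
    """
    def dist(a, b):
        return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

    def cosine(a, b):
        na = math.sqrt(sum(p * p for p in a))
        nb = math.sqrt(sum(q * q for q in b))
        if na == 0 or nb == 0:
            return 0.0
        return sum(p * q for p, q in zip(a, b)) / (na * nb)

    others = [j for j in range(len(X)) if j != i]
    distance_nb = sorted(others, key=lambda j: dist(X[i], X[j]))[:k]
    angle_nb = sorted(others, key=lambda j: -cosine(X[i], X[j]))[:k]
    time_nb = sorted(others, key=lambda j: abs(j - i))[:k]
    return distance_nb, angle_nb, time_nb

# Toy time-ordered samples: sample 4 is far from sample 0 but points the same way.
X = [[1.0, 0.0], [2.0, 0.0], [0.0, 1.0], [1.1, 0.1], [10.0, 0.0]]
distance_nb, angle_nb, time_nb = three_neighbor_sets(X, i=0, k=2)
```

In this toy data, sample 4 is excluded from the distance neighbors of sample 0 yet included in its angle neighbors, which illustrates why distance alone can miss part of the spatial similarity.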


Entropy ◽  
2018 ◽  
Vol 20 (9) ◽  
pp. 701 ◽  
Author(s):  
Beige Ye ◽  
Taorong Qiu ◽  
Xiaoming Bai ◽  
Ping Liu

Electroencephalography (EEG) signals collected in driving fatigue state recognition research are nonlinear, and the recognition accuracy of EEG-based driving fatigue recognition methods is still unsatisfactory. To address this, the paper proposes a driving fatigue recognition method based on sample entropy (SE) and kernel principal component analysis (KPCA), which combines the high recognition accuracy of sample entropy with the strengths of KPCA in dimensionality reduction for nonlinear principal components and its strong nonlinear processing capability. Using a support vector machine (SVM) classifier, the proposed method (called SE_KPCA) is tested on the EEG data and compared with methods based on fuzzy entropy (FE), combination entropy (CE), and the three entropies (SE, FE, and CE) merged with KPCA. Experimental results show that the method is effective.
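For readers unfamiliar with sample entropy, the sketch below follows the usual definition: count pairs of length-m templates that match within a tolerance r, count how many still match at length m + 1, and take the negative log of the ratio. The parameter defaults, the use of an absolute tolerance (rather than a fraction of the signal's standard deviation), and the test signals are assumptions, not settings from the paper.

```python
import math
import random

def sample_entropy(x, m=2, r=0.2):
    """Sample entropy of sequence x with template length m and absolute tolerance r."""
    n = len(x)

    def count_matches(length):
        # Use the same n - m start positions for both template lengths.
        templates = [x[i:i + length] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):  # j > i excludes self-matches
                # Chebyshev distance between the two templates.
                d = max(abs(a - b) for a, b in zip(templates[i], templates[j]))
                if d <= r:
                    count += 1
        return count

    b = count_matches(m)      # matches at length m
    a = count_matches(m + 1)  # matches that persist at length m + 1
    if a == 0 or b == 0:
        return float("inf")   # undefined ratio; reported as infinite irregularity
    return -math.log(a / b)

# A perfectly periodic signal versus a seeded pseudo-random one.
periodic = [0.0, 1.0] * 10
rng = random.Random(0)
noisy = [rng.random() for _ in range(50)]

se_periodic = sample_entropy(periodic)
se_noisy = sample_entropy(noisy)
```

Regular signals give values near zero and irregular signals give larger values, which is what makes the measure usable as an EEG feature before dimensionality reduction and classification.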

