Prognostic outcome prediction by semi-supervised least squares classification

Author(s):  
Mingguang Shi ◽  
Zhou Sheng ◽  
Hao Tang

Abstract: Although great progress has been made in prognostic outcome prediction, small sample sizes remain an obstacle to obtaining accurate and robust classifiers. We propose Rescaled linear square Regression based Least Squares Learning (RRLSL), a jointly developed semi-supervised feature selection method and classifier, for predicting the prognostic outcome of cancer patients. RRLSL uses least squares regression to identify scale factors and then ranks the features across the available types of molecular data. Unlabeled molecular data are used together with the labeled data to build a similarity graph. RRLSL introduces a constraint with kernel functions to bridge the gap between label information and geometry information from messenger RNA and microRNA expression profiling. Importantly, the semi-supervised classifier is obtained by least squares learning with L2 regularization. RRLSL improved prognostic outcome prediction and successfully discriminated between recurrent and non-recurrent patients. It also improved accuracy and the Area Under the Precision-Recall Curve (AUPRC) compared with the baseline semi-supervised methods. RRLSL is available as a stand-alone software package (https://github.com/ShiMGLab/RRLSL).
A short abstract: We propose Rescaled linear square Regression based Least Squares Learning (RRLSL), a jointly developed semi-supervised feature selection method and classifier, for predicting the prognostic outcome of cancer patients. RRLSL uses least squares regression to identify the scale factors that rank the features across the available types of molecular data. RRLSL introduces a constraint with kernel functions to bridge the gap between label information and geometry information from messenger RNA and microRNA expression profiling. Importantly, the semi-supervised classifier is obtained by least squares learning with L2 regularization. RRLSL improved prognostic outcome prediction and successfully discriminated between recurrent and non-recurrent patients.
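The combination of a similarity graph, kernel functions and L2-regularized least squares described in this abstract can be illustrated with a generic Laplacian-regularized least squares classifier. The sketch below is not the RRLSL implementation: the RBF kernel, the dense similarity graph, the function names and the parameters `lam`, `mu` and `sigma` are all assumptions chosen for illustration, and the data are synthetic.

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    # Pairwise Gaussian kernel between the rows of A and the rows of B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def fit_laprls(X, y, labeled, lam=1e-2, mu=1e-1, sigma=1.0):
    """Graph-regularized least squares (simplified form, assumed setup).

    X: (n, d) labeled and unlabeled samples; y: (n,) labels in {-1, +1},
    ignored wherever `labeled` is False.
    """
    n = X.shape[0]
    K = rbf_kernel(X, X, sigma)
    W = K.copy()
    np.fill_diagonal(W, 0.0)              # similarity graph without self-loops
    L = np.diag(W.sum(axis=1)) - W        # graph Laplacian
    J = np.diag(labeled.astype(float))    # selects the labeled samples
    Y = np.where(labeled, y, 0.0)
    # Stationarity condition of: squared loss on labeled points
    # + lam * RKHS norm (L2 term) + mu * graph smoothness penalty.
    return np.linalg.solve(J @ K + lam * np.eye(n) + mu * L @ K, Y)

def predict_laprls(alpha, X_train, X_new, sigma=1.0):
    return np.sign(rbf_kernel(X_new, X_train, sigma) @ alpha)

# Synthetic demo: two clusters, one labeled sample per class.
rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal([-2, 0], 0.3, (20, 2)),
                    rng.normal([2, 0], 0.3, (20, 2))])
y_true = np.r_[-np.ones(20), np.ones(20)]
labeled = np.zeros(40, dtype=bool)
labeled[[0, 20]] = True
alpha = fit_laprls(X_demo, y_true, labeled)
y_pred = predict_laprls(alpha, X_demo, X_demo)
accuracy = float((y_pred == y_true).mean())
```

The unlabeled samples enter only through the Laplacian term, which pushes the decision function to vary smoothly over the graph.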

2014 ◽  
Vol 1051 ◽  
pp. 1023-1027
Author(s):  
Xiao Min Yang ◽  
Bin Yu Yan ◽  
Zong Rui Yang

Commingled production is employed in the petroleum industry to enhance oil recovery and reduce costs. It is of great importance to monitor the production of each oil well in an oilfield. More and more oilfields now use chromatographic fingerprints to estimate single-zone production allocation. To ensure the efficiency and effectiveness of commingled well production, the productivity contribution of every single layer must be known. Kernel partial least squares (KPLS) is a promising regression method for tackling nonlinear systems because it can efficiently compute regression coefficients in high-dimensional feature spaces by means of nonlinear kernel functions. Unlike other nonlinear partial least squares (PLS) techniques, KPLS does not entail any nonlinear optimization procedure and has a complexity similar to that of linear PLS. Using crude oil chromatographic fingerprints, an algorithm for predicting productivity contribution based on KPLS is proposed. The validity of the method is verified by artificial laboratory experiments. The maximum absolute error between the predicted and actual proportions is less than 10%. The model can also be applied to other wells that are similar to those used in the experiment. The experimental results show that the prediction model is feasible.


2020 ◽  
Vol 17 (1) ◽  
pp. 87-94
Author(s):  
Ibrahim A. Naguib ◽  
Fatma F. Abdallah ◽  
Aml A. Emam ◽  
Eglal A. Abdelaleem

Quantitative determination of pyridostigmine bromide in the presence of its two related substances, impurity A and impurity B, was used as a case study for the comparison. Introduction: Novel manipulations of the well-known classical least squares multivariate calibration model are explained in detail as a comparative analytical study. In addition to the plain classical least squares model, two preprocessing steps were tried: prior to modeling with classical least squares, first derivatization and orthogonal projection to latent structures were applied, producing two novel manipulations of the classical least squares-based model. The spectral residual augmented classical least squares model is also included in the comparison. Methods: A 3-factor, 4-level design was implemented to construct a training set of 16 mixtures with different concentrations of the studied components. To investigate the predictive ability of the studied models, a test set of 9 mixtures was constructed. Results: The key performance indicator of this comparative study was the root mean square error of prediction for the independent test set mixtures. It was 1.367 for classical least squares with no preprocessing, 1.352 with first derivative data, 0.2100 with orthogonal projection to latent structures preprocessing, and 0.2747 for spectral residual augmented classical least squares. Conclusion: Coupling the classical least squares model with orthogonal projection to latent structures preprocessing significantly improved its predictive ability.
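For orientation, plain classical least squares calibration and the root mean square error of prediction can be sketched as follows. This is a generic illustration assuming the usual linear model A = C K (absorbance equals concentrations times pure-component spectra); the function names and the synthetic, noise-free data are hypothetical, and the random concentrations do not reproduce the study's experimental design or its preprocessing steps.

```python
import numpy as np

def cls_fit(C_train, A_train):
    # Estimate pure-component spectra K from A = C K by least squares.
    K, *_ = np.linalg.lstsq(C_train, A_train, rcond=None)
    return K  # shape: (components, wavelengths)

def cls_predict(K, A_new):
    # For each new spectrum a, solve K^T c = a for the concentrations c.
    C_hat_T, *_ = np.linalg.lstsq(K.T, A_new.T, rcond=None)
    return C_hat_T.T

def rmsep(C_true, C_pred):
    # Root mean square error of prediction over all test concentrations.
    return float(np.sqrt(np.mean((C_true - C_pred) ** 2)))

# Synthetic demo: 3 components, 50 wavelengths, 16 training and 9 test mixtures.
rng = np.random.default_rng(1)
K_true = rng.uniform(0.1, 1.0, (3, 50))
C_train = rng.uniform(1.0, 10.0, (16, 3))
C_test = rng.uniform(1.0, 10.0, (9, 3))
K_hat = cls_fit(C_train, C_train @ K_true)
C_pred = cls_predict(K_hat, C_test @ K_true)
error = rmsep(C_test, C_pred)
```

With real spectra the error is driven by noise and unmodeled components, which is where preprocessing or residual augmentation would come in.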


2013 ◽  
Vol 694-697 ◽  
pp. 2545-2549 ◽  
Author(s):  
Qian Wen Cheng ◽  
Lu Ben Zhang ◽  
Hong Hua Chen

A key question studied by many scholars in surveying and mapping is how to obtain the normal height from the geodetic height H measured by GPS. Although the commonly used fitting methods have solved many problems, they all treat the unknown parameters as non-random variables. Estimating the best values according to the traditional least squares principle considers either the trend or the randomness alone, which is theoretically incomplete and has limitations in practice. A method is therefore needed that considers the trend and also takes the randomness into account. This method is called least squares collocation.
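The idea of estimating a deterministic trend and a random signal together can be sketched with the textbook collocation formulas. The sketch assumes a planar trend, a Gaussian signal covariance function and uncorrelated noise; the function names, the covariance parameters `c0` and `d0`, the noise variance and the synthetic data are all illustrative assumptions, not values from the paper.

```python
import numpy as np

def gaussian_cov(P, Q, c0, d0):
    # Assumed signal covariance c0 * exp(-(d / d0)^2) between point sets.
    d2 = ((P[:, None, :] - Q[None, :, :]) ** 2).sum(axis=2)
    return c0 * np.exp(-d2 / d0 ** 2)

def collocation(P, zeta, P_new, c0=1e-4, d0=5.0, noise_var=1e-6):
    """P: (n, 2) control point coordinates; zeta: (n,) height anomalies
    (geodetic height minus normal height); P_new: (m, 2) points to predict."""
    A = np.column_stack([np.ones(len(P)), P])        # trend a0 + a1*x + a2*y
    C = gaussian_cov(P, P, c0, d0) + noise_var * np.eye(len(P))
    Ci_A = np.linalg.solve(C, A)
    Ci_l = np.linalg.solve(C, zeta)
    x_hat = np.linalg.solve(A.T @ Ci_A, A.T @ Ci_l)  # trend parameters
    w = np.linalg.solve(C, zeta - A @ x_hat)         # weighted residuals
    s_new = gaussian_cov(P_new, P, c0, d0) @ w       # predicted random signal
    A_new = np.column_stack([np.ones(len(P_new)), P_new])
    return A_new @ x_hat + s_new, x_hat

# Synthetic demo: anomalies that follow a pure planar trend.
rng = np.random.default_rng(2)
P_demo = rng.uniform(0.0, 10.0, (12, 2))
zeta_demo = 0.5 + 0.01 * P_demo[:, 0] - 0.02 * P_demo[:, 1]
P_new = np.array([[2.0, 3.0], [5.0, 5.0], [8.0, 1.0]])
zeta_new, x_hat = collocation(P_demo, zeta_demo, P_new)
```

The normal height at a new point would then be its GPS geodetic height minus the predicted anomaly.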


2012 ◽  
Vol 591-593 ◽  
pp. 850-853
Author(s):  
Huai Xing Wen ◽  
Yong Tao Yang

The A/D acquisition module of the drawing die meter collects contour data from the die hole, and the contour curve is plotted in Matlab. According to the structural characteristics of the die hole curve, an initial cut-off point for each part of the contour is determined and then iteratively optimized to find the best cut-off point. The least squares method is used for piecewise linear fitting, and fitting optimization yields the function for each part of the curve; finally, the pass parameters of the drawing die are calculated. Compared with the standard die, the errors in the obtained parameters are relatively small, which supports the correctness of the algorithm. A complete algorithm flow for the pass parameters is also designed; it can quickly and accurately measure the hole parameters of wire drawing dies.
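The search for a cut-off point combined with piecewise linear least squares fitting can be sketched for the simplest case of two segments. This is a hypothetical illustration written in Python rather than Matlab: it uses an exhaustive search over candidate cut-off indices instead of the paper's iterative optimization, and the function names and the synthetic contour are assumptions.

```python
import numpy as np

def fit_line(x, y):
    # Least squares straight line and its sum of squared errors.
    slope, intercept = np.polyfit(x, y, 1)
    sse = float(np.sum((slope * x + intercept - y) ** 2))
    return slope, intercept, sse

def best_breakpoint(x, y, min_pts=3):
    """Try every admissible cut-off index k and keep the split whose two
    line fits (x[:k] and x[k:]) give the smallest total squared error."""
    best = None
    for k in range(min_pts, len(x) - min_pts + 1):
        left = fit_line(x[:k], y[:k])
        right = fit_line(x[k:], y[k:])
        total = left[2] + right[2]
        if best is None or total < best[0]:
            best = (total, k, left[:2], right[:2])
    return best

# Synthetic two-segment contour with a slope change at x = 5.
x_demo = np.linspace(0.0, 10.0, 21)
y_demo = np.where(x_demo < 5.0, 2.0 * x_demo, 10.0 - 0.5 * (x_demo - 5.0))
total_sse, k_best, left_line, right_line = best_breakpoint(x_demo, y_demo)
```

Die parameters such as cone angles could then be derived from the fitted slopes; a real contour with more sections would need one cut-off point per junction.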


2013 ◽  
Vol 278-280 ◽  
pp. 1323-1326
Author(s):  
Yan Hua Yu ◽  
Li Xia Song ◽  
Kun Lun Zhang

Fuzzy linear regression has been extensively studied since its inception, marked by the work of Tanaka et al. in 1982. As one of the main estimation methods, the fuzzy least squares approach is appealing because it corresponds, to some extent, to well-known statistical regression analysis. In this article, a restricted least squares method is proposed to fit fuzzy linear models with crisp inputs and symmetric fuzzy output. The paper puts forward a fuzzy linear regression model based on structured elements. The model has precise input data and fuzzy output data. The regression coefficients and the fuzzy degree function are determined by the least squares method, and the degree of fit between the observed and forecast values is studied.


Transport ◽  
2011 ◽  
Vol 26 (2) ◽  
pp. 197-203 ◽  
Author(s):  
Yanrong Hu ◽  
Chong Wu ◽  
Hongjiu Liu

A support vector machine is a machine learning method based on statistical learning theory and structural risk minimization. It compares favorably with earlier methods because it can handle practical problems involving small samples, high dimensionality, nonlinearity and local minima. The article applies the theory and method of support vector machine (SVM) regression and establishes a regression model based on the least squares support vector machine (LS-SVM). By predicting passenger flow on the Hangzhou highway in 2000–2008, the paper shows that the LS-SVM regression model has much higher prediction accuracy and reliability, and can therefore effectively predict passenger flow on the highway.
Summary (translated from Lithuanian): The support vector machine (SVM) is a computational method based on statistical theory and structural risk minimization. Compared with other methods, SVM is more reliable because it can solve real problems under a variety of conditions. The study uses the regression theory of the SVM method and builds a regression model based on the least squares support vector machine (LS-SVM). The authors forecast passenger flow on the Hangzhou (China) highway in 2000–2008. The results show that the LS-SVM regression model is very accurate and reliable and can therefore be applied effectively to forecasting passenger flows on highways.
Summary (translated from Russian): The support vector machine (SVM) is a set of similar supervised learning algorithms used for classification and regression analysis. SVM belongs to the family of linear classifiers. The main idea of SVM is to map the original vectors into a higher-dimensional space and to find a maximum-margin separating hyperplane in that space. The algorithm works on the assumption that the larger the difference or distance between the parallel hyperplanes, the smaller the average classifier error. Compared with other methods, SVM is more reliable and can solve problems under various conditions. The study used the SVM method and regression analysis, and then built a regression model based on the support vector machine with a quadratic loss function (LS-SVM). The authors forecast passenger flow on the Hangzhou (China) highway in 2000–2008. The results show that the LS-SVM regression model is reliable and can be applied to forecasting passenger flows on other highways.
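LS-SVM regression replaces the quadratic program of a standard SVM with a single linear system, which the following minimal sketch illustrates. It assumes an RBF kernel and uses a synthetic sine curve in place of the passenger flow series, which is not given here; the function names and the values of `gamma` and `sigma` are illustrative assumptions.

```python
import numpy as np

def rbf(A, B, sigma):
    # Pairwise Gaussian kernel between the rows of A and the rows of B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def lssvm_fit(X, y, gamma=100.0, sigma=1.0):
    """Solve the LS-SVM system
        [0   1^T          ] [b    ]   [0]
        [1   K + I / gamma] [alpha] = [y]
    for the bias b and the dual weights alpha."""
    n = len(y)
    M = np.zeros((n + 1, n + 1))
    M[0, 1:] = 1.0
    M[1:, 0] = 1.0
    M[1:, 1:] = rbf(X, X, sigma) + np.eye(n) / gamma
    sol = np.linalg.solve(M, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]

def lssvm_predict(X_train, b, alpha, X_new, sigma=1.0):
    # f(x) = sum_i alpha_i * k(x, x_i) + b
    return rbf(X_new, X_train, sigma) @ alpha + b

# Synthetic demo: fit one period of a sine curve.
X_demo = np.linspace(0.0, 2.0 * np.pi, 30).reshape(-1, 1)
y_demo = np.sin(X_demo).ravel()
b, alpha = lssvm_fit(X_demo, y_demo, gamma=1000.0, sigma=1.0)
y_fit = lssvm_predict(X_demo, b, alpha, X_demo, sigma=1.0)
max_err = float(np.max(np.abs(y_fit - y_demo)))
```

A larger `gamma` fits the training data more closely, while `sigma` controls how smooth the fitted curve is; in practice both would be chosen by cross-validation.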


Author(s):  
Bo Wang ◽  
Chen Sun ◽  
Keming Zhang ◽  
Jubing Chen

Abstract: Abnormal data in displacement measurements, a representative type of outlier, inevitably occur in full-field optical metrology and significantly affect subsequent evaluation, especially when the strain field is calculated by differentiating the displacement. In this study, an outlier removal method is proposed that can recognize and remove abnormal data in an optically measured displacement field. An iterative critical factor least squares (CFLS) algorithm is developed, which uses the distance between the data points and the least squares plane to identify the outliers. A successive boundary point algorithm is proposed to divide the measurement domain and thereby improve the applicability and effectiveness of the CFLS algorithm. The feasibility and precision of the proposed method are discussed in detail through simulations and experiments. The results show that the outliers are reliably recognized and that the precision of the strain estimation is greatly improved by these methods.
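A simplified reading of the iterative plane-based outlier test is sketched below. It is not the authors' CFLS algorithm: the threshold rule (a critical factor times the RMS residual), the use of the vertical residual as the distance to the plane, the function names and the synthetic displacement field are all assumptions, and the domain subdivision step is omitted.

```python
import numpy as np

def fit_plane(x, y, u):
    # Least squares plane u = a*x + b*y + c.
    A = np.column_stack([x, y, np.ones_like(x)])
    coef, *_ = np.linalg.lstsq(A, u, rcond=None)
    return coef

def remove_outliers_plane(x, y, u, factor=3.0, max_iter=20):
    """Refit the plane repeatedly, each time dropping points whose residual
    exceeds `factor` times the RMS residual of the points still kept."""
    keep = np.ones(u.shape, dtype=bool)
    for _ in range(max_iter):
        a, b, c = fit_plane(x[keep], y[keep], u[keep])
        resid = u - (a * x + b * y + c)
        rms = np.sqrt(np.mean(resid[keep] ** 2))
        new_keep = keep & (np.abs(resid) <= factor * rms)
        if new_keep.sum() == keep.sum():
            break                      # no further points rejected
        keep = new_keep
    return keep, fit_plane(x[keep], y[keep], u[keep])

# Synthetic demo: planar displacement field with noise and 5 gross outliers.
rng = np.random.default_rng(0)
gx, gy = np.meshgrid(np.arange(20.0), np.arange(20.0))
x_demo, y_demo = gx.ravel(), gy.ravel()
u_demo = 0.01 * x_demo + 0.02 * y_demo + 0.5 + rng.normal(0.0, 1e-3, x_demo.size)
bad = np.array([3, 57, 120, 250, 399])
u_demo[bad] += 1.0
keep, coef = remove_outliers_plane(x_demo, y_demo, u_demo)
```

A single plane only suits a locally linear displacement field, which is presumably why the abstract pairs the test with a division of the measurement domain.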

