New Application of Life Rank Algorithm: A Case Study

2021 ◽  
Vol 23 (06) ◽  
pp. 438-447
Author(s):  
Neha Sharma ◽  
Dr. Rashi Agarwal ◽  
Dr. Narendra Kohli ◽  
Dr. Shubha Jain

The past few years have seen the emergence of learning-to-rank (LTR) in the field of machine learning. In information retrieval, datasets are very large, and training a learning-to-rank model on them is a costly and time-consuming process. High-dimensional data contains irrelevant and redundant features, which leads to overfitting. Dimensionality reduction methods are used to manage this issue and fall into two categories: feature selection and feature extraction. There is extensive research on learning-to-rank algorithms themselves, but not on dimensionality reduction approaches for LTR, despite their importance. Feature selection techniques designed for classification are typically used directly for ranking, and to the best of our understanding, feature extraction techniques have not been explored much for ranking problems to date. We make an effort to fill this void and explore feature extraction in the context of LTR problems. The LifeRank algorithm is a linear feature extraction algorithm for ranking; its performance has previously been analyzed only with RankSVM and linear regression, not with other learning-to-rank algorithms. In this work, we therefore study the effect of applying the LifeRank algorithm to other LTR algorithms, specifically RankNet and RankBoost, and analyze the performance of several LTR algorithms on the LETOR dataset before and after feature extraction.
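
The abstract describes a two-stage pipeline: a linear feature-extraction step applied to LETOR feature vectors, followed by a pairwise LTR algorithm trained on the projected features. Below is a minimal Python sketch of that pipeline, using plain PCA as a stand-in for the LifeRank projection (the actual LifeRank transform is learned from a ranking objective and is not reproduced here) and a RankNet-style pairwise logistic model as the ranker; the data is synthetic.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 46))      # 200 documents with LETOR-style feature vectors
    y = rng.integers(0, 3, size=200)    # graded relevance labels (0, 1, 2)

    # Stage 1 -- linear feature extraction: project onto the top-k principal
    # directions (a stand-in for the learned LifeRank projection).
    k = 10
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ Vt[:k].T                   # reduced (200, 10) representation

    # Stage 2 -- RankNet-style pairwise training: logistic loss on score differences.
    w = np.zeros(k)
    pairs = [(i, j) for i in range(len(y)) for j in range(len(y)) if y[i] > y[j]]
    lr = 0.01
    for _ in range(20):
        for i, j in pairs:
            diff = Z[i] - Z[j]
            p = 1.0 / (1.0 + np.exp(-(w @ diff)))  # P(doc i ranked above doc j)
            w += lr * (1.0 - p) * diff             # gradient step on the pairwise log-loss

    scores = Z @ w                      # final ranking scores in the reduced space
    print("top 5 documents:", np.argsort(-scores)[:5])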

2011 ◽  
Vol 08 (02) ◽  
pp. 161-169
Author(s):  
E. SIVASANKAR ◽  
H. SRIDHAR ◽  
V. BALAKRISHNAN ◽  
K. ASHWIN ◽  
R. S. RAJESH

Data mining methods are used to extract useful information from voluminous data. The data to be mined may have a large number of dimensions, so the mining process can take a lot of time; in general, the computation time is an exponential function of the number of dimensions. It is in this context that we use dimensionality reduction techniques to speed up the decision-making process. Dimensionality reduction techniques can be categorized as feature selection and feature extraction techniques, and in this paper we compare the two categories. Feature selection has been implemented using the Information Gain and Goodman–Kruskal measures, while Principal Component Analysis has been used for feature extraction. To compare the accuracy of the methods, we have also implemented a classifier using a back-propagation neural network. In general, we find that feature extraction methods are more accurate than feature selection methods in the framework of credit risk analysis.
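
A hedged sketch of this comparison using scikit-learn stand-ins: SelectKBest with mutual information approximates the information-gain ranking (the Goodman–Kruskal measure has no stock implementation and is omitted), PCA performs the feature extraction, and MLPClassifier plays the role of the back-propagation network. The synthetic data merely mimics a credit-risk table.

    from sklearn.datasets import make_classification
    from sklearn.decomposition import PCA
    from sklearn.feature_selection import SelectKBest, mutual_info_classif
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier

    # Synthetic stand-in for a credit-risk dataset: 30 features, 8 informative.
    X, y = make_classification(n_samples=1000, n_features=30, n_informative=8,
                               random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    for name, reducer in [("feature selection (mutual info)",
                           SelectKBest(mutual_info_classif, k=8)),
                          ("feature extraction (PCA)", PCA(n_components=8))]:
        Z_tr = reducer.fit_transform(X_tr, y_tr)
        Z_te = reducer.transform(X_te)
        # Back-propagation classifier trained on the reduced features.
        clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000,
                            random_state=0).fit(Z_tr, y_tr)
        print(f"{name}: accuracy = {clf.score(Z_te, y_te):.3f}")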


2020 ◽  
Vol 1 (2) ◽  
pp. 56-70 ◽  
Author(s):  
Rizgar Zebari ◽  
Adnan Abdulazeez ◽  
Diyar Zeebaree ◽  
Dilovan Zebari ◽  
Jwan Saeed

Due to sharp increases in data dimensions, every data mining or machine learning (ML) task requires more efficient techniques to get the desired results. In recent years, therefore, researchers have proposed and developed many methods and techniques to reduce the high dimensionality of data while attaining the required accuracy. To improve the accuracy of learned features and to decrease training time, dimensionality reduction (DR) is used as a pre-processing step that can eliminate irrelevant data, noise, and redundant features. DR is performed with two main families of methods: feature selection (FS) and feature extraction (FE). FS is an important method because data is generated continuously at an ever-increasing rate; it mitigates serious dimensionality problems by decreasing redundancy effectively, eliminating irrelevant data, and improving the comprehensibility of results. FE, in turn, addresses the problem of finding the most distinctive, informative, and reduced set of features to improve the efficiency of both the processing and the storage of data. This paper offers a comprehensive review of FS and FE within the scope of DR. The details of each surveyed paper, such as the algorithms/approaches used, datasets, classifiers, and achieved results, are comprehensively analyzed and summarized. In addition, a systematic discussion of all the reviewed methods highlights authors' trends, identifies the methods that most significantly reduce computational time, and selects the most accurate classifiers. Finally, the different types of both method families are discussed and their findings analyzed.
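
As a minimal illustration of DR as the pre-processing step the survey describes, the sketch below chains a feature-selection stage and a feature-extraction stage ahead of a classifier. The scorers, component counts, and synthetic data are illustrative assumptions, not choices drawn from any reviewed paper.

    from sklearn.datasets import make_classification
    from sklearn.decomposition import TruncatedSVD
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import Pipeline

    # 100-dimensional synthetic data, of which only 10 features are informative.
    X, y = make_classification(n_samples=500, n_features=100, n_informative=10,
                               random_state=1)

    pipe = Pipeline([
        ("fs", SelectKBest(f_classif, k=40)),       # FS: keep the 40 top-scoring features
        ("fe", TruncatedSVD(n_components=10)),      # FE: compress them to 10 components
        ("clf", LogisticRegression(max_iter=1000)), # learn on the reduced representation
    ])
    pipe.fit(X, y)
    print("training accuracy:", pipe.score(X, y))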


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Joshua T. Vogelstein ◽  
Eric W. Bridgeford ◽  
Minh Tang ◽  
Da Zheng ◽  
Christopher Douville ◽  
...  

To solve key biomedical problems, experimentalists now routinely measure millions or billions of features (dimensions) per sample, with the hope that data science techniques will be able to build accurate data-driven inferences. Because sample sizes are typically orders of magnitude smaller than the dimensionality of these data, valid inferences require finding a low-dimensional representation that preserves the discriminating information (e.g., whether the individual suffers from a particular disease). There is a lack of interpretable supervised dimensionality reduction methods that scale to millions of dimensions with strong statistical theoretical guarantees. We introduce an approach to extending principal components analysis by incorporating class-conditional moment estimates into the low-dimensional projection. The simplest version, Linear Optimal Low-Rank Projection, incorporates the class-conditional means. We prove, and substantiate with both synthetic and real data benchmarks, that Linear Optimal Low-Rank Projection and its generalizations lead to improved data representations for subsequent classification, while maintaining computational efficiency and scalability. Using multiple brain imaging datasets consisting of more than 150 million features, and several genomics datasets with more than 500,000 features, Linear Optimal Low-Rank Projection outperforms other scalable linear dimensionality reduction techniques in terms of accuracy, while only requiring a few minutes on a standard desktop computer.
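
A hedged sketch of the core idea: augment the leading principal directions with the class-conditional mean difference before projecting, so that the low-dimensional embedding retains the direction that separates the classes. This is a simplification for illustration, not the authors' implementation, and the dimensions and data below are synthetic.

    import numpy as np

    rng = np.random.default_rng(0)
    n, d, k = 300, 1000, 10                 # samples, ambient dimension, target rank
    y = rng.integers(0, 2, size=n)          # two-class labels
    # High-dimensional data where only the first coordinate separates the classes.
    X = rng.normal(size=(n, d)) + 2.0 * y[:, None] * (np.arange(d) == 0)

    # Class-conditional moment: the difference of class means.
    delta = X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0)

    # Leading principal directions of the centered data.
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)

    # Stack the mean-difference direction on top of the top principal
    # directions, then orthonormalize so the projection is well conditioned.
    A = np.vstack([delta, Vt[:k - 1]])
    Q, _ = np.linalg.qr(A.T)                # (d, k) orthonormal basis
    Z = X @ Q                               # supervised low-rank embedding
    print(Z.shape)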


2019 ◽  
Vol 2019 ◽  
pp. 1-19
Author(s):  
Yaolong Li ◽  
Hongru Li ◽  
Bing Wang ◽  
He Yu ◽  
Weiguo Wang

Degradation features are crucial for assessing performance degradation and predicting the remaining useful life of rolling bearings. Numerous degradation features have been proposed, and many researchers have applied dimensionality reduction methods to reduce the redundancy of those features; however, these efforts have not considered the properties and similarity of the features themselves. In this paper, we present a simple way to reduce dimensionality by classifying features according to their trends, into two subdivisions: uptrends and downtrends. Within each subdivision there is visible trend similarity, and we introduce two indexes to measure it. By selecting a representative feature for each subdivision, the multiple features can be reduced in dimensionality. Through comparison, root mean square and sample entropy emerge as good representatives of the uptrend and downtrend features, respectively. This method gives an alternative way to reduce the dimensionality of rolling bearings' degradation features.
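
A minimal sketch of the grouping idea: score each degradation feature's monotonic trend against time (Spearman's rho is used here as one plausible trend measure; the paper's two similarity indexes are not reproduced), split the features into uptrend and downtrend subdivisions, and keep one representative per subdivision. The feature series are synthetic stand-ins.

    import numpy as np
    from scipy.stats import spearmanr

    t = np.arange(500)                       # operating time index
    rng = np.random.default_rng(0)
    features = {
        "rms":            0.010 * t + rng.normal(scale=1.0, size=t.size),
        "kurtosis":       0.008 * t + rng.normal(scale=2.0, size=t.size),
        "sample_entropy": -0.010 * t + rng.normal(scale=1.0, size=t.size),
    }

    # Classify each feature by the sign of its trend against time.
    uptrend, downtrend = {}, {}
    for name, series in features.items():
        rho, _ = spearmanr(t, series)
        (uptrend if rho > 0 else downtrend)[name] = abs(rho)

    # Representative = the feature with the strongest trend in each subdivision.
    print("uptrend representative:  ", max(uptrend, key=uptrend.get))
    print("downtrend representative:", max(downtrend, key=downtrend.get))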


2019 ◽  
Vol 8 (2) ◽  
pp. 4800-4807

Recently, engineers have been concentrating on designing effective prediction models for student admission rates in order to support the educational growth of the nation. Predicting student admission to higher education is a challenging task for any educational organization: admissions are in a highly visible state of crisis, the admission rate is a major risk to educational institutions worldwide, and it affects the economic, social, academic, cultural, and financial standing of the nation. The admission rate also depends on the admission procedures and policies of the institution and on the feedback given by all stakeholders of the educational sector, which makes forecasting admissions a major task for protecting the profit and wealth of the organization. This paper analyzes the performance of student admission prediction using machine learning dimensionality reduction algorithms. The Admission Predict dataset from the Kaggle machine learning repository is used for the prediction analysis, and its features are reduced by feature reduction methods. The prediction of the chance of admit proceeds in four steps. First, the correlations between the dataset attributes are computed and depicted as a histogram. Second, the most highly correlated features, which contribute directly to predicting the chance of admit, are identified. Third, the Admission Predict dataset is subjected to dimensionality reduction methods: principal component analysis (PCA), Sparse PCA, Incremental PCA, Kernel PCA, and Mini Batch Sparse PCA. Fourth, each reduced dataset is used to compute and compare the mean squared error (MSE), mean absolute error (MAE), and R2 score of each method. The implementation is done in Python in the Anaconda Spyder integrated development environment. Experimental results show that CGPA, GRE score, and TOEFL score are the most highly correlated features for predicting the chance of admit, and that Incremental PCA achieves the most effective prediction, with a minimum MSE of 0.09, an MAE of 0.24, and a reasonable R2 score of 0.26.
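
A hedged sketch of the comparison loop in steps three and four: each PCA variant reduces the feature set, a linear regressor predicts the chance of admit, and MSE, MAE, and R2 are reported. Synthetic regression data stands in for the Kaggle Admission Predict CSV; swapping in the real file would reproduce the study's setup.

    from sklearn.datasets import make_regression
    from sklearn.decomposition import (PCA, SparsePCA, IncrementalPCA,
                                       KernelPCA, MiniBatchSparsePCA)
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import (mean_absolute_error, mean_squared_error,
                                 r2_score)
    from sklearn.model_selection import train_test_split

    # Synthetic stand-in with 7 features, like the Admission Predict dataset.
    X, y = make_regression(n_samples=400, n_features=7, noise=10.0,
                           random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    variants = {
        "PCA": PCA(n_components=3),
        "Sparse PCA": SparsePCA(n_components=3, random_state=0),
        "Incremental PCA": IncrementalPCA(n_components=3),
        "Kernel PCA": KernelPCA(n_components=3, kernel="rbf"),
        "Mini Batch Sparse PCA": MiniBatchSparsePCA(n_components=3,
                                                    random_state=0),
    }
    for name, dr in variants.items():
        Z_tr, Z_te = dr.fit_transform(X_tr), dr.transform(X_te)
        pred = LinearRegression().fit(Z_tr, y_tr).predict(Z_te)
        print(f"{name:22s} MSE={mean_squared_error(y_te, pred):8.2f} "
              f"MAE={mean_absolute_error(y_te, pred):6.2f} "
              f"R2={r2_score(y_te, pred):5.2f}")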


2019 ◽  
Author(s):  
Cody N. Heiser ◽  
Ken S. Lau

High-dimensional data, such as those generated using single-cell RNA sequencing, present challenges in interpretation and visualization. Numerical and computational methods for dimensionality reduction allow for low-dimensional representation of genome-scale expression data for downstream clustering, trajectory reconstruction, and biological interpretation. However, a comprehensive and quantitative evaluation of the performance of these techniques has not been established. We present an unbiased framework that defines metrics of global and local structure preservation in dimensionality reduction transformations. Using discrete and continuous scRNA-seq datasets, we find that input cell distribution and method parameters largely determine how well eleven published dimensionality reduction methods preserve global, local, and organizational data structure. Code available at github.com/KenLauLab/DR-structure-preservation allows for rapid evaluation of further datasets and methods.
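
A minimal sketch of the kind of structure-preservation metrics such a framework formalizes: a global score (rank correlation of pairwise cell-cell distances before and after embedding) and a local score (k-nearest-neighbor overlap). These are generic stand-ins, not the paper's exact metrics; the repository linked above contains the real implementation.

    import numpy as np
    from scipy.spatial.distance import pdist
    from scipy.stats import spearmanr
    from sklearn.decomposition import PCA
    from sklearn.neighbors import NearestNeighbors

    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 2000))           # cells x genes, synthetic
    Z = PCA(n_components=10).fit_transform(X)  # any DR method could go here

    # Global: do pairwise cell-cell distances keep their ordering?
    rho, _ = spearmanr(pdist(X), pdist(Z))
    print(f"global distance correlation: {rho:.3f}")

    # Local: how many of each cell's k nearest neighbors survive the embedding?
    k = 15
    nn_X = NearestNeighbors(n_neighbors=k).fit(X).kneighbors(X, return_distance=False)
    nn_Z = NearestNeighbors(n_neighbors=k).fit(Z).kneighbors(Z, return_distance=False)
    overlap = np.mean([len(set(a) & set(b)) / k for a, b in zip(nn_X, nn_Z)])
    print(f"mean {k}-NN neighborhood overlap: {overlap:.3f}")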

