Reducing correlation of random forest–based learning‐to‐rank algorithms using subsample size

Purpose: Incorporating users’ behavior patterns can help the ranking process. Various click models (CMs) have been introduced to model users’ sophisticated search-time behavior, and the building blocks they most commonly use are the triple of attractiveness, examination and satisfaction. Inspired by this fact, and considering the psychological definitions of these concepts, this paper proposes a novel learning-to-rank algorithm that redefines them. The attractiveness and examination factors are calculated by the random forest algorithm from a limited subset of information retrieval (IR) features, and are then combined to predict the satisfaction factor, which is taken as the relevance level.

Design/methodology/approach: The attractiveness and examination factors of a given document are usually considered to be its perceived relevance and the fast scan of its snippet, respectively. Here, they are instead regarded as the click count and the investigation rate, respectively. The satisfaction of a document is assumed to be the same as its relevance level for a given query. This idea is supported by the strong attractiveness-satisfaction and examination-satisfaction correlations. Using the random forest algorithm, the attractiveness and examination factors are calculated from a very limited set of the primitive features of query-document pairs. These factors are then aggregated with the ordered weighted averaging operator to estimate the satisfaction.

Findings: Experimental results on the MSLR-WEB10K and WCL2R data sets show the superiority of this algorithm over state-of-the-art ranking algorithms in terms of the P@n and NDCG criteria. The improvement is more noticeable for top-ranked items, which users review more often.

Originality/value: This paper proposes a novel learning-to-rank algorithm based on a redefinition of the major building blocks of CMs: attractiveness, examination and satisfaction. It proposes a method that uses a very limited number of selected IR features to estimate the attractiveness and examination factors, and then combines these factors to predict the satisfaction, which is regarded as the relevance level of a document with respect to a given query.
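The aggregation step named in the abstract can be illustrated with a minimal sketch of the ordered weighted averaging (OWA) operator: the inputs are sorted in descending order and the weights are applied by rank position, not by which factor produced the value. The function name `owa`, the example scores and the weight vector below are assumptions made for illustration; the abstract does not state the weights the authors used or how they were chosen.

```python
from typing import Sequence


def owa(values: Sequence[float], weights: Sequence[float]) -> float:
    """Ordered weighted average: weights attach to rank positions, not to inputs."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    # Sort descending so that weights[0] applies to the largest value.
    ordered = sorted(values, reverse=True)
    return sum(w * v for w, v in zip(weights, ordered))


# Hypothetical per-document scores, e.g. as produced by two random forest models.
attractiveness = 0.8
examination = 0.4

# Hypothetical weight vector that emphasizes the larger of the two factors.
satisfaction = owa([attractiveness, examination], weights=[0.7, 0.3])
print(round(satisfaction, 2))
```

Because the inputs are reordered before weighting, swapping the two scores gives the same result. The weight vector `[1, 0]` reduces the operator to the maximum, `[0, 1]` to the minimum, and `[0.5, 0.5]` to the arithmetic mean.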

An empirical comparison of random forest-based and other learning-to-rank algorithms

Pattern Analysis and Applications ◽

10.1007/s10044-019-00856-6 ◽

2019 ◽

Vol 23 (3) ◽

pp. 1133-1155

Author(s):

Muhammad Ibrahim

Keyword(s):

Random Forest ◽

Learning To Rank ◽

Empirical Comparison

Scalability and Performance of Random Forest based Learning-to-Rank for Information Retrieval

ACM SIGIR Forum ◽

10.1145/3130332.3130346 ◽

2017 ◽

Vol 51 (1) ◽

pp. 73-74 ◽

Cited By ~ 2

Author(s):

Muhammad Ibrahim

Keyword(s):

Information Retrieval ◽

Random Forest ◽

Learning To Rank ◽

And Performance

Comparing Pointwise and Listwise Objective Functions for Random-Forest-Based Learning-to-Rank

ACM Transactions on Information Systems ◽

10.1145/2866571 ◽

2016 ◽

Vol 34 (4) ◽

pp. 1-38 ◽

Cited By ~ 11

Author(s):

Muhammad Ibrahim ◽

Mark Carman

Keyword(s):

Random Forest ◽

Learning To Rank ◽

Objective Functions

Implementation of data mining as a support of business application strategy

Journal of Applied Information, Communication and Technology ◽

10.33555/ejaict.v5i1.49 ◽

2018 ◽

Vol 5 (1) ◽

pp. 47-55

Author(s):

Florensia Unggul Damayanti

Keyword(s):

Data Mining ◽

Random Forest ◽

Business Strategy ◽

Input Parameter ◽

Data Mining Algorithm ◽

Complex Data ◽

Business Decision ◽

Marketing Department ◽

Business Application ◽

Complex Data Sets

Data mining helps industries make intelligent decisions on complex problems. Data mining algorithms can be applied to data in order to forecast, identify patterns, derive rules and recommendations, analyze sequences in complex data sets and retrieve fresh insights. Advances in technology and the variety of available data mining techniques give industries the opportunity to explore their data, gain valuable information from it and use that information to support business decision making. This paper implements classification data mining to retrieve knowledge from customer databases in support of the marketing department as it plans a strategy for predicting the plan premium. The dataset is decomposed through conceptual analysis to identify data characteristics that can be used as input parameters of the data mining model. Business decisions and applications are characterized by processing step, processing characteristic and processing outcome (Seng, J.L., Chen T.C. 2010). This paper sets up data mining experiments based on the J48 and Random Forest classifiers and sheds light on the comparative performance of J48 and Random Forest on a dataset from the insurance industry. The experimental results cover the classification accuracy and efficiency of J48 and Random Forest, and also identify the attributes that are most useful for predicting the plan premium in the context of strategic planning to support business strategy.

Database-Driven Modeling based on Variable Selection using Random Forest and Its Application for Linear Air Fuel Ratio Sensor Output Prediction

IEEJ Transactions on Electronics Information and Systems ◽

10.1541/ieejeiss.139.850 ◽

2019 ◽

Vol 139 (8) ◽

pp. 850-857

Author(s):

Hiromu Imaji ◽

Takuya Kinoshita ◽

Toru Yamamoto ◽

Keisuke Ito ◽

Masahiro Yoshida ◽

...

Keyword(s):

Random Forest ◽

Variable Selection ◽

Sensor Output ◽

Fuel Ratio

Multiple fault diagnosis for hydraulic systems using Nearest-centroid-with-DBA and Random-Forest-based-time-series-classification

2020 39th Chinese Control Conference (CCC) ◽

10.23919/ccc50068.2020.9189401 ◽

2020 ◽

Author(s):

Zhijie Peng ◽

Ke Zhang ◽

Yi Chai

Keyword(s):

Time Series ◽

Fault Diagnosis ◽

Random Forest ◽

Time Series Classification ◽

Hydraulic Systems ◽

Multiple Fault ◽

Multiple Fault Diagnosis

Research on Prediction Method of Finish Rolling Power Consumption of Multi-Specific Strip Steel Based on Random Forest Optimization Model

2020 39th Chinese Control Conference (CCC) ◽

10.23919/ccc50068.2020.9188937 ◽

2020 ◽

Author(s):

XIAO Xiong ◽

DENG Daoming ◽

XIAO Yuxiong ◽

GUO Qiang ◽

ZHANG Yongjun

Keyword(s):

Random Forest ◽

Power Consumption ◽

Optimization Model ◽

Prediction Method ◽

Strip Steel ◽

Finish Rolling

Random Forest: A Review

International Journal of Advanced Research in Computer Science and Software Engineering ◽

10.23956/ijarcsse/v7i1/01113 ◽

2017 ◽

Vol 7 (1) ◽

pp. 251-257 ◽

Cited By ~ 28

Author(s):

Eesha Goel ◽

Er. Abhilasha

Keyword(s):

Random Forest

Random Forest Refinement of Pairwise Potentials for Protein-ligand Decoy Detection

10.26434/chemrxiv.8047820.v1 ◽

2019 ◽

Cited By ~ 1

Author(s):

Jun Pei ◽

Zheng Zheng ◽

Hyunji Kim ◽

Lin Song ◽

Sarah Walworth ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Probability Function ◽

Pair Potential ◽

Scoring Function ◽

Stable Structure ◽

Scoring Functions ◽

Atom Pair ◽

Data Set ◽

Atom Pairs

An accurate scoring function is expected to correctly select the most stable structure from a set of pose candidates. One can hypothesize that a scoring function’s ability to identify the most stable structure might be improved by emphasizing the most relevant atom pairwise interactions. However, it is hard to evaluate the relative importance of each atom pair using traditional means. With the introduction of machine learning methods, it has become possible to determine the relative importance of each atom pair present in a scoring function. In this work, we use the Random Forest (RF) method to refine a pair potential developed by our laboratory (GARF) by identifying relevant atom pairs that optimize the performance of the potential on our given task. Our goal is to construct a machine learning (ML) model that can accurately differentiate the native ligand binding pose from candidate poses using a potential refined by RF optimization. We successfully constructed RF models on an unbalanced data set with the ‘comparison’ concept, and the resultant RF models were tested on CASF-2013. In a comparison of the performance of our RF models against 29 scoring functions, we found that our models outperformed the other scoring functions in predicting the native pose. In addition, we used two artificially designed potential models to assess the importance of the GARF potential in the RF models: (1) a scrambled probability function set, which was obtained by mixing up atom pairs and probability functions in GARF, and (2) a uniform probability function set, which shares the same peak positions with GARF but has fixed peak heights. The accuracy comparison of RF models based on the scrambled, uniform and original GARF potentials clearly showed that the peak positions in the GARF potential are important while the well depths are not.
