Aggregated Recommendation through Random Forests

2014 ◽  
Vol 2014 ◽  
pp. 1-11 ◽  
Author(s):  
Heng-Ru Zhang ◽  
Fan Min ◽  
Xu He

Aggregated recommendation refers to the process of suggesting one kind of item to a group of users. Compared to user-oriented or item-oriented approaches, it is more general and, therefore, more appropriate for cold-start recommendation. In this paper, we propose a random forest approach to create aggregated recommender systems. The approach is used to predict the rating of a group of users for a kind of item. In the preprocessing stage, we merge user, item, and rating information to construct an aggregated decision table, where rating information serves as the decision attribute. We also model the data conversion process corresponding to the new-user, new-item, and both-new problems. In the training stage, a forest is built for the aggregated training set, where each leaf is assigned a distribution of discrete ratings. In the testing stage, we present four prediction approaches to compute evaluation values based on the distribution of each tree. Experimental results on the well-known MovieLens dataset show that the aggregated approach maintains an acceptable level of accuracy.
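In the testing stage described above, each tree contributes a discrete rating distribution rather than a single vote. A minimal sketch of two plausible prediction strategies over such distributions follows; the helper names and the simple averaging scheme are illustrative assumptions, not the paper's four approaches:

```python
from collections import Counter

def aggregate_distributions(tree_distributions):
    """Average the discrete rating distributions returned by each tree.

    tree_distributions: list of dicts mapping rating -> probability.
    """
    totals = Counter()
    for dist in tree_distributions:
        for rating, p in dist.items():
            totals[rating] += p
    n = len(tree_distributions)
    return {rating: p / n for rating, p in totals.items()}

def predict_expected(dist):
    """Predict the expectation of the averaged distribution."""
    return sum(rating * p for rating, p in dist.items())

def predict_mode(dist):
    """Predict the single most probable rating."""
    return max(dist, key=dist.get)

# Two trees report different rating distributions (ratings on a 1-5 scale).
trees = [{4: 0.7, 5: 0.3}, {3: 0.2, 4: 0.8}]
avg = aggregate_distributions(trees)
print(predict_mode(avg))  # 4
print(predict_expected(avg))
```

The expectation-based prediction yields a fractional evaluation value, while the mode keeps predictions on the discrete rating scale.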

2019 ◽  
Author(s):  
Maya Ramchandran ◽  
Prasad Patil ◽  
Giovanni Parmigiani

Multi-study learning uses multiple training studies, separately trains classifiers on individual studies, and then forms ensembles with weights rewarding members with better cross-study prediction ability. This article considers novel weighting approaches for constructing tree-based ensemble learners in this setting. Using Random Forests as the single-study learner, we compare weighting each whole forest to form the ensemble against extracting the individual trees trained by each Random Forest and weighting them directly. We consider weighting approaches that reward cross-study replicability within the training set. We find that incorporating multiple layers of ensembling in the training process increases the robustness of the resulting predictor. Furthermore, we explore how the ensembling weights relate to the internal structure of the trees, shedding light on which features determine the relationship between the Random Forests algorithm and the true outcome model. Finally, we apply our approach to genomic datasets and show that our method improves upon the basic multi-study learning paradigm.
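A minimal sketch of the cross-study weighting idea, assuming weights proportional to each learner's mean accuracy on the *other* training studies (the helper names and toy threshold learners are hypothetical stand-ins for study-specific forests):

```python
def cross_study_weights(learners, studies):
    """Weight each learner by its mean accuracy on all other studies.

    learners: callables x -> label; studies: list of (X, y) pairs,
    where learners[i] was trained on studies[i].
    """
    weights = []
    for i, learner in enumerate(learners):
        accs = []
        for j, (X, y) in enumerate(studies):
            if j == i:
                continue  # reward cross-study, not within-study, accuracy
            correct = sum(learner(x) == label for x, label in zip(X, y))
            accs.append(correct / len(y))
        weights.append(sum(accs) / len(accs))
    total = sum(weights)
    return [w / total for w in weights]

def weighted_vote(learners, weights, x):
    """Combine learner predictions with the replicability weights."""
    scores = {}
    for learner, w in zip(learners, weights):
        label = learner(x)
        scores[label] = scores.get(label, 0.0) + w
    return max(scores, key=scores.get)

# Toy learners standing in for forests trained on two separate studies.
learners = [lambda x: int(x > 0.5), lambda x: int(x > 0.9)]
studies = [([0.2, 0.8], [0, 1]), ([0.3, 0.7], [0, 1])]
w = cross_study_weights(learners, studies)
print(weighted_vote(learners, w, 0.8))  # 1
```

The same weighting could be applied at the level of individual extracted trees instead of whole forests, which is the comparison the article explores.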


2017 ◽  
Author(s):  
Carlos J Corrada Bravo ◽  
Rafael Álvarez Berríos ◽  
T. Mitchell Aide

We developed a web-based, cloud-hosted system that allows users to archive, listen to, visualize, and annotate recordings. The system also provides tools to convert these annotations into datasets that can be used to train a computer to detect the presence or absence of a species. The algorithm used by the system was selected after comparing the accuracy and efficiency of three variants of a template-based classification algorithm. The algorithm computes a similarity vector by comparing a template of a species call with time increments across the spectrogram. Statistical features are extracted from this vector and used as input to a Random Forest classifier that predicts the presence or absence of the species in the recording. The fastest algorithm variant had the highest average accuracy and specificity; therefore, it was implemented in the ARBIMON web-based system.
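The template-matching step can be sketched as follows. This is a simplified illustration, assuming negative mean absolute difference as the similarity score and a small set of statistical features; the actual template comparison used by the system may differ:

```python
import statistics

def similarity_vector(spectrogram, template):
    """Slide a template along the time axis and score each position.

    Both inputs are 2-D lists indexed [frequency][time]; the score is the
    negative mean absolute difference (higher = more similar).
    """
    t_len = len(template[0])
    scores = []
    for start in range(len(spectrogram[0]) - t_len + 1):
        diff = 0.0
        for f in range(len(template)):
            for t in range(t_len):
                diff += abs(spectrogram[f][start + t] - template[f][t])
        scores.append(-diff / (len(template) * t_len))
    return scores

def vector_features(scores):
    """Statistical features fed to the Random Forest classifier."""
    return {"max": max(scores), "mean": statistics.mean(scores),
            "stdev": statistics.pstdev(scores)}

# One-band toy spectrogram with a strong match at the center time step.
spec = [[0, 1, 9, 1, 0]]
tmpl = [[9]]
sims = similarity_vector(spec, tmpl)
print(vector_features(sims))
```

The peak of the similarity vector marks the candidate call location, and the summary statistics give the classifier a fixed-length feature set regardless of recording duration.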


2016 ◽  
Vol 31 (2) ◽  
pp. 581-599 ◽  
Author(s):  
David Ahijevych ◽  
James O. Pinto ◽  
John K. Williams ◽  
Matthias Steiner

A data mining and statistical learning method known as a random forest (RF) is employed to generate 2-h forecasts of the likelihood of mesoscale convective system initiation (MCS-I). The RF technique uses an ensemble of decision trees to relate a set of predictors [in this case radar reflectivity, satellite imagery, and numerical weather prediction (NWP) model diagnostics] to a predictand (in this case MCS-I). The RF showed a remarkable ability to detect MCS-I events: over 99% of the 550 observed MCS-I events were detected to within 50 km. However, this high detection rate came with a tendency to issue false alarms, either because of premature warning of an MCS-I event or because forecast likelihoods remained elevated well after an MCS-I event occurred. The skill of the RF forecasts was found to increase with the number of trees and the fraction of positive events used in the training set. The skill of the RF was also highly dependent on the types of predictor fields included in the training set and was notably better when a more recent training period was used. The RF offers advantages over high-resolution NWP because it can be run in a fraction of the time and can account for nonlinearly varying biases in the model data. In addition, as part of the training process, the RF ranks the importance of each predictor, which can be used to assess the utility of new datasets in the prediction of MCS-I.
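The predictor ranking mentioned above comes from the RF's internal importance measure. As a rough, model-agnostic illustration of the same idea, ranking predictors by how much skill drops when each one is scrambled, here is a permutation-importance sketch; the helper and the toy threshold predictor are illustrative, not from the study:

```python
import random

def permutation_importance(predict, X, y, n_features, seed=0):
    """Rank predictors by the accuracy drop when each column is shuffled.

    predict: callable taking one feature row and returning a label.
    """
    rng = random.Random(seed)

    def accuracy(rows):
        return sum(predict(r) == label for r, label in zip(rows, y)) / len(y)

    base = accuracy(X)
    drops = []
    for j in range(n_features):
        col = [row[j] for row in X]
        rng.shuffle(col)  # destroy the column's relationship to the labels
        shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
        drops.append(base - accuracy(shuffled))
    return drops

# Feature 0 fully determines the label; feature 1 is pure noise.
X = [[0, 5], [1, 7], [0, 2], [1, 9], [0, 1], [1, 3]]
y = [0, 1, 0, 1, 0, 1]
drops = permutation_importance(lambda r: r[0], X, y, 2)
print(drops)
```

Shuffling the informative predictor costs accuracy, while shuffling the noise predictor costs nothing, which is exactly the signal used to assess the utility of new predictor fields.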


2021 ◽  
Vol 5 (CHI PLAY) ◽  
pp. 1-29
Author(s):  
Alessandro Canossa ◽  
Dmitry Salimov ◽  
Ahmad Azadvar ◽  
Casper Harteveld ◽  
Georgios Yannakakis

Is it possible to detect toxicity in games just by observing in-game behavior? If so, what are the behavioral factors that will help machine learning discover the unknown relationship between gameplay and toxic behavior? In this initial study, we examine whether it is possible to predict toxicity in the MOBA game For Honor by observing the in-game behavior of players that have been labeled as toxic (i.e., players that have been sanctioned by Ubisoft community managers). We test our hypothesis of detecting toxicity through gameplay on a dataset of almost 1,800 sanctioned players, comparing these sanctioned players with unsanctioned players. Sanctioned players are characterized by their toxic action type (offensive behavior vs. unfair advantage) and degree of severity (warned vs. banned). Our findings, based on supervised learning with random forests, suggest that it is not only possible to behaviorally distinguish sanctioned from unsanctioned players based on selected features of gameplay; it is also possible to predict both the sanction severity (warned vs. banned) and the sanction type (offensive behavior vs. unfair advantage). In particular, all random forest models predict toxicity, its severity, and its type with an accuracy of at least 82%, on average, on unseen players. This research shows that observing in-game behavior can support the work of community managers in moderating and possibly containing the burden of toxic behavior.


2018 ◽  
Vol 7 (2.21) ◽  
pp. 339 ◽  
Author(s):  
K Ulaga Priya ◽  
S Pushpa ◽  
K Kalaivani ◽  
A Sartiha

In the banking industry, loan processing involves the tedious task of identifying customers who are likely to default. Manual screening of applicants may fail to flag loans that turn bad in the future. Banks possess huge volumes of behavioral data from which they are unable to make judgments about likely loan defaulters. Machine learning techniques, using both supervised and unsupervised learning, can support this analytical processing. A data model for predicting default customers using the Random Forest technique has been proposed. The model is evaluated on a training set, and based on its performance parameters, final prediction is done on the test set. The results indicate that the Random Forest technique can help banks predict loan defaulters with high accuracy.


Sensors ◽  
2020 ◽  
Vol 20 (18) ◽  
pp. 5318
Author(s):  
Dongnian Li ◽  
Changming Li ◽  
Chengjun Chen ◽  
Zhengxu Zhao

Locating and identifying the components mounted on a printed circuit board (PCB) based on machine vision is an important and challenging problem for automated PCB inspection and automated PCB recycling. In this paper, we propose a PCB semantic segmentation method based on depth images that segments and recognizes components in the PCB through pixel classification. The image training set for the PCB was automatically synthesized with graphic rendering. Based on a series of concentric circles centered at the given depth pixel, we extracted depth difference features from the depth images in the training set to train a random forest pixel classifier. Using this classifier, we then performed semantic segmentation of the PCB to segment and recognize its components. Experiments on both synthetic and real test sets were conducted to verify the effectiveness of the proposed method. The experimental results demonstrate that our method can segment and recognize most of the components from a real depth image of the PCB. Our method is immune to illumination changes and can be implemented in parallel on a GPU.
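The per-pixel features can be sketched as depth differences sampled on concentric circles around the query pixel. The details below (eight samples per circle, a fixed background depth for out-of-bounds samples) are assumptions for illustration, not the paper's exact recipe:

```python
import math

def depth_difference_features(depth, x, y, radii, samples_per_circle=8):
    """Depth differences between a pixel and points on concentric circles.

    depth: 2-D list indexed [row][col]; out-of-bounds samples fall back
    to a large background depth so board edges remain informative.
    """
    background = 1e4
    h, w = len(depth), len(depth[0])
    center = depth[y][x]
    features = []
    for r in radii:
        for k in range(samples_per_circle):
            angle = 2 * math.pi * k / samples_per_circle
            sx = x + int(round(r * math.cos(angle)))
            sy = y + int(round(r * math.sin(angle)))
            if 0 <= sy < h and 0 <= sx < w:
                sample = depth[sy][sx]
            else:
                sample = background
            features.append(sample - center)
    return features

# 5x5 depth map with a raised component (smaller depth) at the center.
depth = [[10] * 5 for _ in range(5)]
depth[2][2] = 7
feats = depth_difference_features(depth, 2, 2, radii=[1, 2])
print(len(feats))  # 16 features: 8 samples on each of 2 circles
```

Each pixel's fixed-length feature vector then becomes one training example for the random forest pixel classifier, labeled with the component class rendered at that pixel.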


1992 ◽  
Vol 1 (1) ◽  
pp. 35-52 ◽  
Author(s):  
Tomasz Łuczak ◽  
Boris Pittel

A forest ℱ(n, M) chosen uniformly from the family of all labelled unrooted forests with n vertices and M edges is studied. We show that, like the Erdős–Rényi random graph G(n, M), the random forest exhibits three modes of asymptotic behaviour: subcritical, nearcritical and supercritical, with the phase transition at the point M = n/2. For each of the phases, we determine the limit distribution of the size of the k-th largest component of ℱ(n, M). The similarity to the random graph is far from being complete. For instance, in the supercritical phase, the giant tree in ℱ(n, M) grows roughly two times slower than the largest component of G(n, M), and the second largest tree in ℱ(n, M) is of the order n⅔ for every M = n/2 + s, provided that s³n⁻² → ∞ and s = o(n), while its counterpart in G(n, M) is of the order n²s⁻² log(s³n⁻²) ≪ n⅔.


Author(s):  
Nabilah Alias ◽  
Cik Feresa Mohd Foozy ◽  
Sofia Najwa Ramli ◽  
Naqliyah Zainuddin

Nowadays, social media platforms (e.g., YouTube and Facebook) connect people and let them interact by posting comments or videos. Comments are a part of a website's content that can attract spammers spreading phishing, malware, or advertising. Because malicious users can spread malware or phishing through comments, this work proposes a technique for detecting spam-comment features on video-sharing sites. The first phase of the methodology is dataset collection; for this experiment, a dataset from the UCI Machine Learning Repository is used. The next phase is the development of the framework and experimentation. The dataset is pre-processed using tokenization and lemmatization. After that, the features for detecting spam are selected, and classification experiments are performed using six classifiers: Random Tree, Random Forest, Naïve Bayes, KStar, Decision Table, and Decision Stump. The results show that the highest accuracy achieved is 90.57% and the lowest is 58.86%.
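A rough sketch of the pre-processing and feature-extraction phases follows. The regex tokenizer, the crude suffix-stripping stand-in for lemmatization, and the particular features are illustrative assumptions, not the paper's exact pipeline:

```python
import re

def tokenize(comment):
    """Lowercase a comment and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", comment.lower())

def lemmatize(token):
    """A crude suffix-stripping stand-in for real lemmatization."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def extract_features(comment):
    """Features a spam-comment classifier could be trained on."""
    tokens = [lemmatize(t) for t in tokenize(comment)]
    return {
        "n_tokens": len(tokens),
        "has_url": int("http" in comment or "www" in comment),
        "check_out": int("check" in tokens and "out" in tokens),
    }

print(extract_features("Check out my channel http://spam.example!!!"))
```

Feature dictionaries like these, computed for every labeled comment, form the training table handed to the six classifiers compared in the study.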

