Birds of a Feather Surf Together: Using Clustering Methods to Improve Navigation Prediction from Internet Log Files

Author(s):  
Martin Halvey ◽  
Mark T. Keane ◽  
Barry Smyth

The internet has become the easiest way to obtain information, and millions of users search it to find what they need. The continuous growth in web pages, and in users' interest in searching for information on various topics, increases the complexity of recommendation. User behavior is extracted by applying web mining techniques to web server logs. The main aim of this research study is to identify users' navigation patterns from log files. The web mining process has three major steps: pre-processing the data, classifying patterns, and discovering users. Recently, researchers have classified web page articles before recommending the requested page to users. However, the categories are often too large, or manual labor is needed for the classification tasks. Some existing clustering methods suffer from high time complexity, or compute iteratively from their initial parameters, which leads to inadequate results. To address these issues, a web page recommendation method is developed that initializes the margin parameters of the classification technique, considering both effectiveness and efficiency. This work initializes the margin parameters of Random Forest (RF) using the Firefly Algorithm (FFA) to reduce processing time. These margin parameters are used to process a large volume of user interest data, which provides better recommendations than existing techniques. The experimental results show that the RF-FFA method achieved 41.89% accuracy and recall when compared with other heuristic algorithms.


2014 ◽  
Vol 37 (1) ◽  
pp. 125-139 ◽  
Author(s):  
Urszula Kuzelewska

Humans make decisions very often during both professional and leisure activities. This is particularly evident when surfing the Internet: selecting web sites to explore, choosing the needed information from search engine results, or deciding which product to buy in an on-line store. Recommender systems are electronic applications that aim to support humans in this decision-making process. They are widely used in many applications: adaptive WWW servers, e-learning, music and video preferences, internet stores, etc. In on-line solutions, such as e-shops or libraries, the aim of recommendations is to show customers the products they are likely to be interested in. The input data are shopping basket archives, product ratings, or server log files. The article presents a recommender system that helps users select an interesting product. The system analyses other customers' ratings of the products. It uses clustering methods to find similarities among the users and proposes techniques to identify users' profiles. The system was implemented in the Apache Mahout environment and tested on a movie database. The selected similarity measures are based on Euclidean distance, cosine, the correlation coefficient, and the log-likelihood function.
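
Three of the similarity measures named above can be illustrated with a minimal sketch. This is plain Python rather than the Apache Mahout implementation used in the article. The function names and the two rating vectors are hypothetical, and each user is assumed to have rated the same items in the same order.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two rating vectors (0 means identical)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between two rating vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def pearson_correlation(a, b):
    """Pearson correlation coefficient; 0.0 if either vector has no variance."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

# Hypothetical ratings of the same four movies by two users.
alice = [5, 3, 4, 4]
bob = [3, 1, 2, 2]

print(euclidean_distance(alice, bob))   # 4.0
print(cosine_similarity(alice, bob))
print(pearson_correlation(alice, bob))  # 1.0
```

In this toy case bob rates every movie two points lower than alice, so the correlation coefficient treats the two users as perfectly similar while the Euclidean distance does not. This difference is one reason a recommender might compare several measures.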


2018 ◽  
Vol 18 (13) ◽  
pp. 1110-1122 ◽  
Author(s):  
Juan F. Morales ◽  
Lucas N. Alberca ◽  
Sara Chuguransky ◽  
Mauricio E. Di Ianni ◽  
Alan Talevi ◽  
...  

Much attention has been paid in the last decade to molecular predictors of promiscuity, including molecular weight, log P, molecular complexity, acidity constant and molecular topology, with correlations between promiscuity and those descriptors seemingly being context-dependent. It has been observed that certain therapeutic categories (e.g. mood disorder therapies) tend to include multi-target agents (i.e. selective non-selectivity). Numerous QSAR models based on topological descriptors suggest that the topology of a given drug could be used to infer its therapeutic applications. Here, we have used descriptive statistics to explore the distribution of molecular topology descriptors and other promiscuity predictors across different therapeutic categories. Working with the publicly available ChEMBL database and 14 molecular descriptors, both hierarchical and non-hierarchical clustering methods were applied to the descriptors' mean values for the therapeutic categories after refinement of the database (770 drugs grouped into 34 therapeutic categories). In addition, another publicly available database (repoDB) was used to retrieve clinically approved drug repositioning examples that could be classified into the therapeutic categories considered by the aforementioned clusters (111 cases), and the correspondence between the two studies was evaluated. Interestingly, a 3-cluster hierarchical clustering scheme based on only 14 molecular descriptors linked to promiscuity seems to explain up to 82.9% of the approved cases of drug repurposing retrieved from repoDB. Therapeutic categories seem to display distinctive molecular patterns, which could be used as a basis for drug screening and drug design campaigns, and to unveil drug repurposing opportunities between particular therapeutic categories.
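
The idea of hierarchically clustering per-category mean descriptor vectors and cutting the hierarchy at three clusters can be sketched as follows. This is an illustration only: the abstract does not state the linkage criterion or distance used, so average linkage with Euclidean distance is an assumption. The six category names and their two-value descriptor vectors are made up, whereas the study used 34 categories and 14 descriptors.

```python
import math

def agglomerate(points, n_clusters):
    """Average-linkage agglomerative clustering (assumed linkage, see lead-in).

    points: dict mapping a category name to its mean descriptor vector.
    Returns a list of clusters, each a list of category names.
    """
    clusters = [[name] for name in points]

    def linkage(c1, c2):
        # Mean pairwise Euclidean distance between members of two clusters.
        total = sum(math.dist(points[a], points[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters (i < j) and merge j into i.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Hypothetical mean descriptor values for six made-up therapeutic categories.
category_means = {
    "A": (0.0, 0.0), "B": (0.0, 1.0),
    "C": (10.0, 10.0), "D": (10.0, 11.0),
    "E": (20.0, 0.0), "F": (21.0, 0.0),
}

print(agglomerate(category_means, 3))
```

Under such a scheme, a repositioning case would count as "explained" when the original and the new therapeutic category fall in the same cluster.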


2021 ◽  
Vol 13 (11) ◽  
pp. 2125
Author(s):  
Bardia Yousefi ◽  
Clemente Ibarra-Castanedo ◽  
Martin Chamberland ◽  
Xavier P. V. Maldague ◽  
Georges Beaudoin

Clustering methods show considerable influence on many recent algorithms and play an important role in hyperspectral data analysis. Here, we challenge clustering for mineral identification using two different strategies in the hyperspectral long wave infrared (LWIR, 7.7–11.8 μm). To that end, we compare two algorithms for mineral identification on a unique dataset. The first algorithm applies spectral comparison techniques to all the pixel-spectra and creates RGB false color composites (FCC). A color-based clustering is then used to group the regions (called FCC-clustering). The second algorithm clusters all the pixel-spectra directly. Then, the first rank of non-negative matrix factorization (NMF) extracts the representative of each cluster, which is compared with the JPL/NASA spectral library. The resulting comparison values serve as features that are converted into RGB-FCC (called clustering-rank1-NMF). We applied K-means as the clustering approach, which can be replaced by any other similar clustering approach. The results of the clustering-rank1-NMF algorithm indicate significant computational efficiency (more than 20 times faster than the previous approach) and promising performance for mineral identification, with average accuracies of up to 75.8% and 84.8% for the FCC-clustering and clustering-rank1-NMF algorithms (using spectral angle mapper (SAM)), respectively. Furthermore, several spectral comparison techniques are also used for both algorithms, such as adaptive matched subspace detector (AMSD), orthogonal subspace projection (OSP) algorithm, principal component analysis (PCA), local matched filter (PLMF), SAM, and normalized cross correlation (NCC), and most of them show a similar range in accuracy. However, SAM and NCC are preferred due to their computational simplicity. Our algorithms strive to identify eleven different mineral grains (biotite, diopside, epidote, goethite, kyanite, scheelite, smithsonite, tourmaline, pyrope, olivine, and quartz).
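
The computational simplicity of SAM can be seen in a minimal sketch of the standard spectral angle definition, the arccosine of the normalized dot product of two spectra. This is not the authors' code. The three-band reference spectra are made-up numbers, not values from the JPL/NASA library, and `best_match` is a hypothetical helper name.

```python
import math

def spectral_angle(pixel, reference):
    """Spectral angle (radians) between a pixel spectrum and a reference spectrum."""
    dot = sum(p * r for p, r in zip(pixel, reference))
    norm = math.sqrt(sum(p * p for p in pixel)) * math.sqrt(sum(r * r for r in reference))
    if norm == 0:
        raise ValueError("spectra must not be all zeros")
    # Clamp to [-1, 1] to guard against floating-point rounding before acos.
    cos_angle = max(-1.0, min(1.0, dot / norm))
    return math.acos(cos_angle)

def best_match(pixel, library):
    """Name of the library spectrum with the smallest spectral angle to the pixel."""
    return min(library, key=lambda name: spectral_angle(pixel, library[name]))

# Hypothetical three-band reference spectra (illustrative values only).
library = {
    "quartz": [0.2, 0.8, 0.4],
    "olivine": [0.7, 0.3, 0.5],
}

# A pixel whose spectrum is a dimmer copy of the made-up quartz spectrum.
pixel = [0.1, 0.4, 0.2]
print(best_match(pixel, library))  # quartz
```

Because the angle depends only on the direction of the spectrum vector, a uniformly brighter or dimmer pixel of the same material still yields an angle close to zero.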

