Birds of a Feather Surf Together: Using Clustering Methods to Improve Navigation Prediction from Internet Log Files

Web Page Recommendation using Random Forest with Fire Fly Algorithm in Web Mining

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.b4442.029320 ◽

2020 ◽

Vol 9 (3) ◽

pp. 499-505

Keyword(s):

Web Mining ◽

Heuristic Algorithms ◽

Research Work ◽

Web Pages ◽

Clustering Methods ◽

Web Page ◽

Continuous Growth ◽

Log Files ◽

Speed Up ◽

The Web

The internet has become the easiest way to obtain information, and millions of users search it every day. The continuous growth of web pages, together with users' interest in searching for more information on various topics, increases the complexity of recommendation. User behavior is extracted by applying web mining techniques to web server logs. The main aim of this research study is to identify the navigation patterns of users from the log files. The web mining process has three major steps: data pre-processing, pattern classification, and user discovery. In recent work, researchers classify web page articles before recommending the requested page to users. However, the categories are often too large, or manual labor is needed for the classification tasks. Some existing clustering methods suffer from high time complexity, or depend on their initial parameters and require iterative computation that leads to insufficient results. To address these issues, a web page recommendation method is developed that initializes the margin parameters of the classification technique and considers both effectiveness and efficiency. This research work initializes the margin parameters of the Random Forest (RF) by using the FireFly Algorithm (FFA) to reduce processing time. A large volume of user interest data is processed with these margin parameters, which provides better recommendations than existing techniques. The experimental results show that the RF-FFA method achieved 41.89% accuracy and recall values when compared with other heuristic algorithms.
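The abstract does not spell out how the firefly search is coupled to the Random Forest, so the following is only a minimal sketch of a generic firefly-style minimizer. The function `firefly_minimize`, the toy objective `toy_error` (a stand-in for a classifier's validation error) and all parameter values are assumptions made for illustration, not the authors' method.

```python
import math
import random


def firefly_minimize(objective, bounds, n_fireflies=15, n_iters=50,
                     alpha=0.2, beta0=1.0, gamma=1.0, seed=0):
    """Generic firefly search: dimmer fireflies move toward brighter ones.

    Brightness is the negative cost, so a lower objective value is brighter.
    """
    rng = random.Random(seed)
    swarm = [[rng.uniform(lo, hi) for lo, hi in bounds]
             for _ in range(n_fireflies)]
    cost = [objective(x) for x in swarm]
    best_i = min(range(n_fireflies), key=cost.__getitem__)
    best_x, best_cost = list(swarm[best_i]), cost[best_i]

    for _ in range(n_iters):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if cost[j] < cost[i]:  # firefly j is brighter than i
                    r2 = sum((a - b) ** 2
                             for a, b in zip(swarm[i], swarm[j]))
                    # Attractiveness decays with squared distance.
                    beta = beta0 * math.exp(-gamma * r2)
                    for d, (lo, hi) in enumerate(bounds):
                        step = (beta * (swarm[j][d] - swarm[i][d])
                                + alpha * (rng.random() - 0.5))
                        # Keep the candidate inside the search bounds.
                        swarm[i][d] = min(hi, max(lo, swarm[i][d] + step))
                    cost[i] = objective(swarm[i])
                    if cost[i] < best_cost:
                        best_x, best_cost = list(swarm[i]), cost[i]
    return best_x, best_cost


def toy_error(params):
    """Hypothetical stand-in for a validation error; minimum at (1, -2)."""
    x, y = params
    return (x - 1.0) ** 2 + (y + 2.0) ** 2


search_bounds = [(-5.0, 5.0), (-5.0, 5.0)]
best_params, best_error = firefly_minimize(toy_error, search_bounds)
```

In a setting like the one described, `toy_error` would presumably be replaced by a function that trains or scores the classifier for a given parameter vector; that coupling is not described in the abstract and is left open here.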

Clustering Algorithms in Hybrid Recommender System on MovieLens Data

Studies in Logic, Grammar and Rhetoric ◽

10.2478/slgr-2014-0021 ◽

2014 ◽

Vol 37 (1) ◽

pp. 125-139 ◽

Cited By ~ 17

Author(s):

Urszula Kuzelewska

Keyword(s):

Recommender System ◽

Web Sites ◽

Clustering Algorithms ◽

Leisure Activities ◽

Similarity Measures ◽

Clustering Methods ◽

Log Files ◽

E Learning ◽

On Line ◽

Hybrid Recommender

Humans make decisions very often during professional as well as leisure activities. This is particularly evident when surfing the Internet: selecting web sites to explore, choosing the needed information in search engine results, or deciding which product to buy in an on-line store. Recommender systems are electronic applications whose aim is to support humans in this decision-making process. They are widely used in many applications: adaptive WWW servers, e-learning, music and video preferences, internet stores, etc. In on-line solutions, such as e-shops or libraries, the aim of recommendations is to show customers the products in which they are probably interested. The input data are shopping basket archives, product ratings, or server log files.

The article presents a recommender system that helps users to select an interesting product. The system analyses data from other customers' ratings of the products. It uses clustering methods to find similarities among users and proposed techniques to identify user profiles. The system was implemented in the Apache Mahout environment and tested on a movie database. The selected similarity measures are based on Euclidean distance, cosine similarity, the correlation coefficient, and the log-likelihood function.
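Three of the similarity measures named above can be sketched in a few lines. This is a plain-Python illustration and not the Apache Mahout implementation; the two rating vectors `user_a` and `user_b` are made-up values for movies that both users are assumed to have rated.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two rating vectors (0 = identical)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)


def pearson_correlation(a, b):
    """Correlation coefficient of two rating vectors of equal length."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0  # convention chosen here for constant vectors
    return cov / math.sqrt(var_a * var_b)


# Hypothetical ratings of four co-rated movies on a 1-5 scale.
user_a = [5, 3, 4, 4]
user_b = [3, 1, 2, 2]

distance = euclidean_distance(user_a, user_b)
cosine = cosine_similarity(user_a, user_b)
correlation = pearson_correlation(user_a, user_b)
```

In this example `user_b` rates every movie exactly two points lower than `user_a`, so the correlation coefficient treats the two users as perfectly similar while the Euclidean distance does not. Differences of this kind are one reason a system might compare several measures.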

Clustering Methods for Italian Residential Real Estate Market

10.15396/eres2005_287 ◽

2005 ◽

Keyword(s):

Real Estate ◽

Real Estate Market ◽

Clustering Methods ◽

Residential Real Estate

A Review on Analyzing Various Log Files Using Hadoop

Journal of Environmental Science Computer Science and Engineering & Technology ◽

10.24214/jecet.b.7.2.13337 ◽

2018 ◽

Vol 7 (2) ◽

Keyword(s):

Log Files

Survey of Clustering Methods for Large Scale Dataset

International Journal of Computer Sciences and Engineering ◽

10.26438/ijcse/v7i5.13381344 ◽

2019 ◽

Vol 7 (5) ◽

pp. 1338-1344

Author(s):

Anupama Jawale ◽

Ganesh Magar

Keyword(s):

Large Scale ◽

Clustering Methods ◽

Large Scale Dataset

Evaluating CSCL log files by social network analysis

10.3115/1150240.1150294 ◽

1999 ◽

Cited By ~ 25

Author(s):

Kari Nurmela ◽

Erno Lehtinen ◽

Tuire Palonen

Keyword(s):

Social Network ◽

Social Network Analysis ◽

Network Analysis ◽

Log Files

Hierarchical and Non-Hierarchical Linear and Non-Linear Clustering Methods to Shakespeare Authorship Question

SSRN Electronic Journal ◽

10.2139/ssrn.2989022 ◽

2015 ◽

Author(s):

Refat Aljumily

Keyword(s):

Clustering Methods ◽

Non Linear

Molecular Topology and Other Promiscuity Determinants as Predictors of Therapeutic Class - A Theoretical Framework to Guide Drug Repositioning?

Current Topics in Medicinal Chemistry ◽

10.2174/1568026618666180801091642 ◽

2018 ◽

Vol 18 (13) ◽

pp. 1110-1122 ◽

Cited By ~ 2

Author(s):

Juan F. Morales ◽

Lucas N. Alberca ◽

Sara Chuguransky ◽

Mauricio E. Di Ianni ◽

Alan Talevi ◽

...

Keyword(s):

Molecular Descriptors ◽

Drug Repositioning ◽

Drug Repurposing ◽

Topological Descriptors ◽

Log P ◽

Acidity Constant ◽

Molecular Topology ◽

Clustering Methods ◽

Mean Values ◽

Qsar Models

Much attention has been paid in the last decade to molecular predictors of promiscuity, including molecular weight, log P, molecular complexity, acidity constant and molecular topology, with correlations between promiscuity and those descriptors seemingly being context-dependent. It has been observed that certain therapeutic categories (e.g. mood disorder therapies) display a tendency to include multi-target agents (i.e. selective non-selectivity). Numerous QSAR models based on topological descriptors suggest that the topology of a given drug could be used to infer its therapeutic applications. Here, we have used descriptive statistics to explore the distribution of molecular topology descriptors and other promiscuity predictors across different therapeutic categories. Working with the publicly available ChEMBL database and 14 molecular descriptors, both hierarchical and non-hierarchical clustering methods were applied to the mean descriptor values of the therapeutic categories after refinement of the database (770 drugs grouped into 34 therapeutic categories). In addition, another publicly available database (repoDB) was used to retrieve clinically approved drug repositioning examples that could be classified into the therapeutic categories considered by the aforementioned clusters (111 cases), and the correspondence between the two studies was evaluated. Interestingly, a 3-cluster hierarchical clustering scheme based on only 14 molecular descriptors linked to promiscuity seems to explain up to 82.9% of the approved cases of drug repurposing retrieved from repoDB. Therapeutic categories seem to display distinctive molecular patterns, which could be used as a basis for drug screening and drug design campaigns, and to unveil drug repurposing opportunities between particular therapeutic categories.

k-Means, Ward and Probabilistic Distance-Based Clustering Methods with Contiguity Constraint

Journal of Classification ◽

10.1007/s00357-020-09370-5 ◽

2020 ◽

Author(s):

Andrzej Młodak

Keyword(s):

Clustering Methods ◽

Probabilistic Distance

Unsupervised Identification of Targeted Spectra Applying Rank1-NMF and FCC Algorithms in Long-Wave Hyperspectral Infrared Imagery

Remote Sensing ◽

10.3390/rs13112125 ◽

2021 ◽

Vol 13 (11) ◽

pp. 2125

Author(s):

Bardia Yousefi ◽

Clemente Ibarra-Castanedo ◽

Martin Chamberland ◽

Xavier P. V. Maldague ◽

Georges Beaudoin

Keyword(s):

Matched Filter ◽

Principal Component ◽

Hyperspectral Data ◽

Clustering Methods ◽

Spectral Angle Mapper ◽

Long Wave ◽

Clustering Approach ◽

Spectral Comparison ◽

Computational Simplicity ◽

Mineral Identification

Clustering methods unequivocally show considerable influence on many recent algorithms and play an important role in hyperspectral data analysis. Here, we challenge clustering for mineral identification using two different strategies in hyperspectral long-wave infrared (LWIR, 7.7–11.8 μm). To that end, we compare two algorithms that perform mineral identification on a unique dataset. The first algorithm uses spectral comparison techniques for all the pixel-spectra and creates RGB false color composites (FCC). Then, a color-based clustering is used to group the regions (called FCC-clustering). The second algorithm clusters all the pixel-spectra to directly group the spectra. Then, the first rank of non-negative matrix factorization (NMF) extracts the representative of each cluster and compares the results with the spectral library of JPL/NASA. These techniques give the comparison values as features, which are converted into RGB-FCC as the results (called clustering-rank1-NMF). We applied K-means as the clustering approach, which can be replaced by any other similar clustering approach. The results of the clustering-rank1-NMF algorithm indicate significant computational efficiency (more than 20 times faster than the previous approach) and promising performance for mineral identification, with up to 75.8% and 84.8% average accuracies for the FCC-clustering and clustering-rank1-NMF algorithms (using spectral angle mapper (SAM)), respectively. Furthermore, several spectral comparison techniques are also used for both algorithms, such as adaptive matched subspace detector (AMSD), orthogonal subspace projection (OSP) algorithm, principal component analysis (PCA), local matched filter (PLMF), SAM, and normalized cross correlation (NCC), and most of them show a similar range in accuracy. However, SAM and NCC are preferred due to their computational simplicity.
Our algorithms strive to identify eleven different mineral grains (biotite, diopside, epidote, goethite, kyanite, scheelite, smithsonite, tourmaline, pyrope, olivine, and quartz).
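The computational simplicity of SAM and NCC can be seen in a short sketch. The code below assumes the usual definitions (the angle between two spectra, and the zero-lag correlation of mean-centered spectra); the helper `best_match`, the library entries `mineral_a` and `mineral_b`, and all numeric values are synthetic placeholders, not data from the JPL/NASA library or from the paper.

```python
import math


def spectral_angle(pixel, reference):
    """Spectral angle mapper: angle in radians between two spectra."""
    dot = sum(p * r for p, r in zip(pixel, reference))
    norm_p = math.sqrt(sum(p * p for p in pixel))
    norm_r = math.sqrt(sum(r * r for r in reference))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_angle = max(-1.0, min(1.0, dot / (norm_p * norm_r)))
    return math.acos(cos_angle)


def normalized_cross_correlation(pixel, reference):
    """Zero-lag NCC: correlation of the mean-centered spectra."""
    n = len(pixel)
    mean_p = sum(pixel) / n
    mean_r = sum(reference) / n
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(pixel, reference))
    var_p = sum((p - mean_p) ** 2 for p in pixel)
    var_r = sum((r - mean_r) ** 2 for r in reference)
    return cov / math.sqrt(var_p * var_r)


def best_match(pixel, library):
    """Return (name, angle) of the library spectrum with the smallest angle."""
    name = min(library, key=lambda k: spectral_angle(pixel, library[k]))
    return name, spectral_angle(pixel, library[name])


# Synthetic four-band spectra used only to exercise the functions.
library = {
    "mineral_a": [0.1, 0.2, 0.3, 0.4],
    "mineral_b": [0.4, 0.3, 0.2, 0.1],
}
pixel = [0.21, 0.39, 0.62, 0.80]

match_name, match_angle = best_match(pixel, library)
```

Because the spectral angle depends only on the direction of the spectrum, a pixel that is a brighter or darker copy of a reference still matches it closely, which is a common reason for using SAM under varying illumination.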
