Effective Filtering of Query Results on Updated User Behavioral Profiles in Web Mining

The Scientific World JOURNAL ◽

10.1155/2015/829126 ◽

2015 ◽

Vol 2015 ◽

pp. 1-8

Author(s):

S. Sadesh ◽

R. C. Suganthe

Keyword(s):

Data Mining ◽

Information Search ◽

Regime Switching ◽

Web Mining ◽

User Behavior ◽

Research Work ◽

User Profile ◽

Query Result ◽

User Query ◽

Filtering Efficiency

The Web, with its tremendous volume of information, retrieves results for user-related queries. With the rapid growth of web page recommendation, results retrieved using data mining techniques did not achieve a high filtering rate because the relationships between user profiles and queries were not analyzed extensively. At the same time, existing user-profile-based prediction in web data mining is not exhaustive enough to produce a high rate of personalized results. To improve the query result rate under the dynamics of user behavior over time, the Hamilton Filtered Regime Switching User Query Probability (HFRS-UQP) framework is proposed. The HFRS-UQP framework is split into two processes: filtering and switching. The data-mining-based filtering in our research work uses the Hamilton Filtering framework to filter user results based on personalized information on automatically updated profiles through the search engine. The maximized result is fetched, that is, filtered with respect to user behavior profiles. The switching process performs accurate filtering of updated profiles using regime switching. Regime updating on profile changes (i.e., switches) in the HFRS-UQP framework identifies the second- and higher-order associations of query results with the updated profiles. Experiments are conducted on factors such as personalized information search retrieval rate, filtering efficiency, and precision ratio.
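The Hamilton filter named above is a standard recursion for Markov regime-switching models: predict the regime probabilities through a transition matrix, then reweight them by how well each regime explains the new observation. The sketch below is a generic Hamilton filter, not the HFRS-UQP implementation; the Gaussian per-regime observation model, the regime means, and the idea of feeding it a per-session behavioral score are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple


def gaussian_pdf(x: float, mean: float, std: float) -> float:
    """Density of a normal distribution (assumed observation model per regime)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def hamilton_filter(
    obs: Sequence[float],
    means: Sequence[float],
    stds: Sequence[float],
    trans: Sequence[Sequence[float]],
    init: Sequence[float],
) -> Tuple[List[List[float]], float]:
    """Return filtered probabilities P(s_t = j | y_1..y_t) and the log-likelihood.

    trans[i][j] = P(s_t = j | s_{t-1} = i); init is the distribution of s_0.
    """
    k = len(init)
    prob = list(init)
    filtered: List[List[float]] = []
    loglik = 0.0
    for y in obs:
        # Prediction step: push last filtered probabilities through the transitions
        pred = [sum(prob[i] * trans[i][j] for i in range(k)) for j in range(k)]
        # Update step: weight each regime by the likelihood of the new observation
        joint = [pred[j] * gaussian_pdf(y, means[j], stds[j]) for j in range(k)]
        total = sum(joint)
        loglik += math.log(total)
        prob = [p / total for p in joint]
        filtered.append(prob)
    return filtered, loglik
```

For example, with a hypothetical "stable profile" regime (mean 0.2) and "shifted profile" regime (mean 0.8), the observations `[0.2, 0.25, 0.8, 0.85]` move the filtered probability from the first regime to the second; a jump like that is one way a profile "switch" could be flagged.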

Web Mining in Thematic Search Engines

Encyclopedia of Data Warehousing and Mining, Second Edition ◽

10.4018/978-1-60566-010-3.ch318 ◽

2011 ◽

pp. 2080-2084

Author(s):

Massimiliano Caramia ◽

Giovanni Felici

Keyword(s):

Data Mining ◽

Genetic Algorithm ◽

Search Engines ◽

Web Mining ◽

Research Work ◽

Relevant Information ◽

Additional Information ◽

Internet Users ◽

User Query ◽

Search Tool

In the present chapter we report on some extensions of the work presented in the first edition of the Encyclopedia of Data Mining. In Caramia and Felici (2005) we described a method based on clustering and a heuristic search method (based on a genetic algorithm) to extract pages with relevant information for a specific user query in a thematic search engine. Starting from these results, we extended the research work to address some issues related to the semantic aspects of the search, focusing on the keywords that are used to establish the similarity among the pages that result from the query. Complete details on this method, omitted here for brevity, can be found in Caramia and Felici (2006). Search engine technologies remain a strong research topic, as new problems and new demands from the market and the users arise. The shift from quantity (maintaining and indexing large databases of web pages and quickly selecting pages matching some criterion) to quality (identifying pages of high quality for the user), already highlighted in Caramia and Felici (2005), has not been interrupted; it has gained further energy, motivated by the natural evolution of internet users, who are more selective in their choice of search tool and willing to pay the price of providing extra feedback to the system and waiting longer to have their queries better matched. In this framework, several authors have considered the use of data mining and optimization techniques, often referred to as web mining (for a recent bibliography on this topic see, e.g., Getoor, Senator, Domingos, and Faloutsos, 2003 and Zaïane, Srivastava, Spiliopoulou, and Masand, 2002).
The work described in this chapter is based on clustering techniques to identify, in the set of pages resulting from a simple query, subsets that are homogeneous with respect to a vectorization based on context or profile; then, a number of small and potentially good subsets of pages are constructed by extracting from each cluster the pages with the highest scores. Operating on these subsets with a genetic algorithm, a subset with a good overall score and high internal dissimilarity is identified. A related problem is then considered: the selection of a subset of pages that are compliant with the search keywords but that also share a large subset of words different from the search keywords. This characteristic represents a sort of semantic connection among these pages that may help to spot particular aspects of the information present in them. This task is accomplished by constructing a special graph whose maximum-weight clique and k-densest subgraph should represent the page subsets with the desired properties. In the following we summarize the main background topics and provide a concise description of the methods. Interested readers may find additional information in Caramia and Felici (2004), Caramia and Felici (2005), and Caramia and Felici (2006).
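The graph-based page selection described above can be illustrated with a small exhaustive search. The sketch assumes that an edge between two pages is weighted by the number of words they share outside the search keywords (one plausible reading of the text) and finds the k pages with the largest total edge weight, i.e. a weighted k-densest subgraph. Brute force over all k-subsets is only feasible for tiny inputs and is not the chapter's method; the page word sets and function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def shared_word_graph(
    pages: Dict[str, Set[str]], query: Set[str]
) -> Dict[Tuple[str, str], int]:
    """Edge weight = number of non-query words two pages have in common."""
    edges: Dict[Tuple[str, str], int] = {}
    for a, b in combinations(sorted(pages), 2):
        weight = len((pages[a] & pages[b]) - query)
        if weight > 0:
            edges[(a, b)] = weight  # keys are sorted pairs (a < b)
    return edges


def k_densest_subgraph(
    nodes: List[str], edges: Dict[Tuple[str, str], int], k: int
) -> Tuple[Tuple[str, ...], int]:
    """Exhaustively find the k-node subset with maximum total edge weight."""
    best: Tuple[str, ...] = ()
    best_weight = -1
    for subset in combinations(sorted(nodes), k):
        # combinations of a sorted tuple yield sorted pairs, matching the edge keys
        weight = sum(edges.get(pair, 0) for pair in combinations(subset, 2))
        if weight > best_weight:
            best, best_weight = subset, weight
    return best, best_weight
```

With four invented result pages for the query "jaguar", the three car-related pages share "car" and "engine" beyond the search keyword, so they form the densest 3-subset.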

Semantic Representation of a Geo-Social User Profile for a Personalised Information Retrieval

Journal of Information & Knowledge Management ◽

10.1142/s0219649221500441 ◽

2021 ◽

pp. 2150044

Author(s):

Tahar Rafa ◽

Samir Kechid

Keyword(s):

Information Retrieval ◽

Information Search ◽

Semantic Representation ◽

User Profile ◽

Search Process ◽

Search System ◽

User Interactions ◽

User Interests ◽

Situational Contexts ◽

User Query

User-centred information retrieval needs to introduce semantics into user modelling for a meaningful representation of user interests. A semantic representation of user interests helps to improve the identification of the user's future cognitive needs. In this paper, we present a semantic-based approach to personalised information retrieval. This approach is based on the design and exploitation of a user profile that represents the user and his interests. In this user profile, we combine an ontological semantics derived from the WordNet ontology with a personal semantics derived from the user's interactions with the search system and from the social and situational contexts of his previous searches. The personal semantics treats the co-occurrence relations between relevant components of the user profile as semantic links. The user profile is used to improve two important phases of the information search process: (i) expansion of the initial user query and (ii) adaptation of the search results to the user's interests.
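The personal-semantics idea of treating co-occurrence between profile components as semantic links, then using those links to expand the initial query, can be illustrated with a small sketch. Here each past search session contributes a set of profile terms, co-occurrence counts act as link weights, and the query is extended with the most strongly linked terms. The session representation, the `top_n` cutoff, and the function names are assumptions; the WordNet-based ontological semantics is not modelled.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List, Set


def cooccurrence_links(sessions: Iterable[Set[str]]) -> Counter:
    """Count how often two profile terms appear in the same past search session."""
    links: Counter = Counter()
    for terms in sessions:
        for a, b in combinations(sorted(terms), 2):
            links[(a, b)] += 1
    return links


def expand_query(query: List[str], links: Counter, top_n: int = 2) -> List[str]:
    """Append the profile terms most strongly linked to any query term."""
    q = set(query)
    scores: Counter = Counter()
    for (a, b), count in links.items():
        if a in q and b not in q:
            scores[b] += count
        elif b in q and a not in q:
            scores[a] += count
    return query + [term for term, _ in scores.most_common(top_n)]
```

For a user whose past "jaguar" searches mostly co-occurred with "cat", the query `["jaguar"]` would be expanded towards the animal sense rather than the car brand.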

Improving Webpage Access Predictions Based on Sequence Prediction and PageRank Algorithm

Interdisciplinary Journal of Information Knowledge and Management ◽

10.28945/4176 ◽

2019 ◽

Vol 14 ◽

pp. 027-044 ◽

Cited By ~ 1

Author(s):

Da Thon Nguyen ◽

Hanh T Tan ◽

Duy Hoang Pham

Keyword(s):

Web Mining ◽

User Behavior ◽

User Profile ◽

Experimental Results ◽

Prediction Algorithm ◽

Future Research ◽

Pagerank Algorithm ◽

Product Recommendation ◽

Redundant Data ◽

The Web

Aim/Purpose: In this article, we provide a better solution to Webpage access prediction. In particular, our core proposed approach increases accuracy and efficiency by reducing the sequence space through the integration of PageRank into CPT+. Background: The problem of predicting the next page on a website has become significant because of the continuous growth of the Internet in terms of content volume and number of users. Webpage prediction is complex because multiple kinds of information must be considered, such as the webpage name, the contents of the webpage, the user profile, the time between webpage visits, differences among users, and the time spent on a page or on each part of the page. Therefore, webpage access prediction draws substantial effort from the web mining research community, both to obtain valuable information and to improve user experience. Methodology: CPT+ is a complex prediction algorithm that offers dramatically more accurate predictions than other state-of-the-art models. Integrating the importance of each page on a website (i.e., its PageRank), with regard to its associations with other pages, into the CPT+ model can improve the performance of the existing model. Contribution: In this paper, we propose an approach that reduces the prediction space while improving accuracy by combining the CPT+ and PageRank algorithms. Experimental results on several real datasets indicate that the space is reduced by between 15% and 30%. As a result, the run-time is shorter. Furthermore, the prediction accuracy is improved. This makes it convenient for researchers to continue using CPT+ to predict Webpage access. Findings: Our experimental results indicate that the PageRank algorithm is a good solution for improving CPT+ prediction. Approximately 15% to 30% of redundant data is removed from the datasets while the accuracy improves.
Recommendations for Practitioners: The results of the article could be used in developing relevant applications such as Webpage and product recommendation systems. Recommendation for Researchers: The paper provides a prediction model that integrates the CPT+ and PageRank algorithms to tackle the problems of complexity and accuracy. The model has been tested against several real datasets to show its performance. Impact on Society: Given an improved model for predicting Webpage access, applicable in fields such as e-learning, product recommendation, link prediction, and user behavior prediction, society can enjoy a better experience and a more efficient environment while surfing the Web. Future Research: We intend to further improve the accuracy of webpage access prediction by combining CPT+ with other algorithms.
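PageRank can be computed by power iteration over the page-to-page transitions observed in access sessions. The sketch below does that and then drops low-ranked pages from the sessions, as one hypothetical way to shrink the sequence space before it is given to a predictor such as CPT+. The `min_rank` threshold rule, the session-to-graph conversion, and the function names are assumptions; CPT+ itself is not shown.

```python
from typing import Dict, List


def sessions_to_links(sessions: List[List[str]]) -> Dict[str, List[str]]:
    """Build a directed link list from consecutive page visits."""
    links: Dict[str, List[str]] = {}
    for session in sessions:
        for a, b in zip(session, session[1:]):
            targets = links.setdefault(a, [])
            if b not in targets:
                targets.append(b)
    return links


def pagerank(links: Dict[str, List[str]], damping: float = 0.85, iters: int = 100) -> Dict[str, float]:
    """Plain power-iteration PageRank; dangling pages spread their rank uniformly."""
    nodes = sorted(set(links) | {v for outs in links.values() for v in outs})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1.0 - damping) / n for v in nodes}  # random-jump share
        for v in nodes:
            outs = links.get(v, [])
            targets = outs if outs else nodes  # dangling page: link to every page
            share = damping * rank[v] / len(targets)
            for w in targets:
                new[w] += share
        rank = new
    return rank


def prune_sessions(sessions: List[List[str]], rank: Dict[str, float], min_rank: float) -> List[List[str]]:
    """Remove pages whose PageRank falls below a (hypothetical) threshold."""
    return [[p for p in s if rank.get(p, 0.0) >= min_rank] for s in sessions]
```

Because every page's rank is redistributed each iteration, the ranks keep summing to 1, which makes a fixed threshold comparable across sites of similar size.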

A Comprehensive Guideline for Bengali Sentiment Annotation

ACM Transactions on Asian and Low-Resource Language Information Processing ◽

10.1145/3474363 ◽

2022 ◽

Vol 21 (2) ◽

pp. 1-19

Author(s):

Md. Saddam Hossain Mukta ◽

Md. Adnanul Islam ◽

Faisal Ahamed Khan ◽

Afjal Hossain ◽

Shuvanon Razik ◽

...

Keyword(s):

Data Mining ◽

Information Retrieval ◽

Sentiment Analysis ◽

Computational Linguistics ◽

Language Processing ◽

Web Mining ◽

English Language ◽

Research Work ◽

Bengali Language

Sentiment Analysis (SA) is a Natural Language Processing (NLP) and Information Extraction (IE) task that primarily aims to obtain the writer's feelings, expressed as positive or negative, by analyzing a large number of documents. SA is also widely studied in the fields of data mining, web mining, text mining, and information retrieval. The fundamental task in sentiment analysis is to classify the polarity of a given content as Positive, Negative, or Neutral. Although extensive research has been conducted in this area of computational linguistics, most of it has been carried out in the context of the English language. However, Bengali sentiment expression has varying degrees of sentiment labels, which can be plausibly distinct from those of English. Therefore, it is important to develop and carry out sentiment assessment of the Bengali language properly. In sentiment analysis, the predictive potential of automatic modeling depends entirely on the quality of dataset annotation. Bengali sentiment annotation is a challenging task due to the diversified structures (syntax) of the language and its different degrees of innate sentiment (i.e., weakly and strongly positive/negative sentiments). Thus, in this article, we propose a novel and precise guideline for researchers, linguistic experts, and referees to annotate Bengali sentences immaculately, with a view to efficiently building effective datasets for automatic sentiment prediction.

A Personalized Web Based E-Learning Recommendation System To Enhance and User Learning Experience

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.f9991.059120 ◽

2020 ◽

Vol 9 (1) ◽

pp. 1186-1195

Keyword(s):

Data Mining ◽

Genetic Algorithm ◽

Clustering Algorithm ◽

Recommendation System ◽

User Behavior ◽

Search Algorithm ◽

Learning Experience ◽

Research Work ◽

Web Based ◽

E Learning

The key aim of data mining techniques is to help the user by reducing the effort of exploring the data, recovering patterns, and implementing applications that help to find knowledge-specific content, make decisions, and make predictions. This research work develops a recommendation system that uses the merits of data mining algorithms to design web-based e-learning recommendation systems. The model aims to understand the user behavior and content requirements of the learner. This purpose is served by obtaining information from the data source and producing suggestions of suitable content for the learner. The concepts of web content mining and web usage mining have been combined to perform the required work. The technique uses a genetic algorithm and the k-means clustering algorithm to design the presented model: the k-means clustering algorithm is used to track user behavior, and the genetic algorithm is used as a search algorithm to find the necessary resources in the database. Finally, the presented system is implemented and its performance is measured. The estimated results demonstrate that the presented model enhances the accuracy of recommendations and also speeds up the computations. A related performance calculation has also been provided to justify this conclusion. The obtained results demonstrate that this technique is acceptable for new-generation application designs.
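The use of k-means to group learners by behavior can be sketched as follows. The two behavioral features (weekly study hours and quizzes attempted), the data, and the simple initialization from the first k points are illustrative assumptions, not details from the paper; the genetic search over the resource database is not shown.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, iters: int = 20) -> Tuple[List[Point], List[int]]:
    """Lloyd's k-means; centroids start at the first k points (simplistic init)."""
    centroids: List[Point] = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each learner joins the nearest centroid
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members
        new_centroids: List[Point] = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, labels
```

In practice a randomized or k-means++ initialization is usually preferred; the deterministic start here only keeps the example reproducible.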

A Review on Data Mining Techniques Towards Water Sustainability Issues

Recent Advances in Computer Science and Communications ◽

10.2174/2213275912666190809114839 ◽

2020 ◽

Vol 13 (5) ◽

pp. 818-826

Author(s):

Ranjan Kumar Panda ◽

A. Sai Sabitha ◽

Vikas Deep

Keyword(s):

Data Mining ◽

Natural Resources ◽

Environmental Sustainability ◽

Water Distribution ◽

Research Work ◽

Distribution Networks ◽

Environmental Data ◽

World Population ◽

Water Sustainability ◽

Rapid Urbanization

Sustainability is defined as the practice of protecting natural resources for future use without harming nature. Sustainable development includes the environmental, social, political, and economic issues that human beings face for their existence. Water is the most vital resource for living beings on this earth. Natural resources are being exploited as the world population increases, and a shortfall of these resources may threaten humanity in the future. Water sustainability is a part of environmental sustainability. The water crisis is increasing gradually in many places of the world due to agricultural and industrial usage and rapid urbanization. Data mining tools and techniques provide a powerful methodology for understanding water sustainability issues using rich environmental data, and they also help in building models for possible optimization and reengineering. In this research work, a review of the usage of supervised or unsupervised learning algorithms for water sustainability issues such as water quality assessment, wastewater collection systems, and water consumption is presented. Advanced technologies have also helped to resolve major water sustainability issues. Some major data mining optimization algorithms used in piped water distribution networks are compared.

User profile correlation-based similarity (UPCSim) algorithm in movie recommendation system

Journal Of Big Data ◽

10.1186/s40537-021-00425-x ◽

2021 ◽

Vol 8 (1) ◽

Author(s):

Triyanna Widiyaningtyas ◽

Indriana Hidayah ◽

Teguh B. Adji

Keyword(s):

Collaborative Filtering ◽

Recommendation System ◽

User Behavior ◽

Correlation Coefficients ◽

User Profile ◽

Profile Data ◽

Similarity Algorithm ◽

Previous Algorithm ◽

Movie Recommendation ◽

Recommendation Accuracy

Collaborative filtering is one of the most widely used recommendation system approaches. One issue in collaborative filtering is how to use a similarity algorithm to increase the accuracy of the recommendation system. Most recently, a similarity algorithm that combines the user rating value and the user behavior value has been proposed. The user behavior value is obtained from the user score probability in assessing the genre data. The problem with that algorithm is that it only considers genre data for capturing the user behavior value. Therefore, this study proposes a new similarity algorithm, called User Profile Correlation-based Similarity (UPCSim), that examines the genre data and the user profile data, namely age, gender, occupation, and location. All the user profile data are used to find the weights of the similarities of the user rating value and the user behavior value. The weights of both similarities are obtained by calculating the correlation coefficients between the user profile data and the user rating or behavior values. An experiment shows that the UPCSim algorithm outperforms the previous algorithm in recommendation accuracy, reducing MAE by 1.64% and RMSE by 1.4%.
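The abstract describes weighting a rating-based similarity and a behavior-based similarity by correlation coefficients. The sketch below shows the Pearson correlation coefficient and one hypothetical way to turn two correlation magnitudes into normalized weights for combining the two similarities; the normalization rule and the function names are assumptions, not the published UPCSim formula.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def combined_similarity(
    sim_rating: float, sim_behavior: float, corr_rating: float, corr_behavior: float
) -> float:
    """Weighted mix of two similarities, weights proportional to |correlation| (assumed rule)."""
    w_rating, w_behavior = abs(corr_rating), abs(corr_behavior)
    total = w_rating + w_behavior
    if total == 0:
        return 0.5 * (sim_rating + sim_behavior)  # no signal: fall back to an equal mix
    return (w_rating * sim_rating + w_behavior * sim_behavior) / total
```

Here `corr_rating` and `corr_behavior` would be correlations between some profile attribute (for instance, age) and the rating or behavior values; a profile attribute that tracks ratings more closely shifts the weight toward the rating similarity.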

Application of Data Pre-Processing Method in Web Mining

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.687-691.1592 ◽

2014 ◽

Vol 687-691 ◽

pp. 1592-1595

Author(s):

Yun Peng Duan ◽

Chun Xi Zhao ◽

Ying Shi

Keyword(s):

Data Mining ◽

Web Mining ◽

Early Stage ◽

Data Preprocessing ◽

Web Technology ◽

Processing Method ◽

Web Log Mining ◽

Web Log ◽

User Access ◽

Log Mining

With the wide application of the WWW and the emergence of Web technology, data mining research has entered a new stage. Web log mining applies the ideas of data mining to analyze and process server logs. Focusing on the early stage of data mining, this paper puts forward a log data preprocessing method whose purpose is to divide server logs into multiple unique user access sequences, and it provides an algorithm for doing so.
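Dividing a server log into per-user access sequences is commonly done by cleaning out non-page requests, grouping hits by user, and starting a new session after a period of inactivity. The sketch below follows that common recipe; identifying users by client IP, the 30-minute timeout, the static-file suffix list, and the simplified log record format are all assumptions, not details from the paper.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Tuple

# Hypothetical simplified log record: (client IP, ISO-8601 timestamp, requested URL)
LogEntry = Tuple[str, str, str]

# Assumption: requests for embedded resources are not page views
STATIC_SUFFIXES = (".css", ".js", ".png", ".jpg", ".gif", ".ico")


def sessionize(entries: Iterable[LogEntry], timeout_minutes: int = 30) -> List[Tuple[str, List[str]]]:
    """Split a server log into per-user access sequences (sessions)."""
    by_user: Dict[str, List[Tuple[datetime, str]]] = {}
    for ip, ts, url in entries:
        if url.lower().endswith(STATIC_SUFFIXES):
            continue  # data cleaning: drop non-page requests
        by_user.setdefault(ip, []).append((datetime.fromisoformat(ts), url))

    gap = timedelta(minutes=timeout_minutes)
    sessions: List[Tuple[str, List[str]]] = []
    for ip in sorted(by_user):
        hits = sorted(by_user[ip])  # chronological order for this user
        current = [hits[0][1]]
        for (prev_t, _), (t, url) in zip(hits, hits[1:]):
            if t - prev_t > gap:  # idle too long: close the session
                sessions.append((ip, current))
                current = []
            current.append(url)
        sessions.append((ip, current))
    return sessions
```

Real logs usually need a stronger user key (for example IP plus user agent, or a cookie) because many users can share one address behind a proxy.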

An Experimental Study of Spammer Detection on Chinese Microblogs

International Journal of Software Engineering and Knowledge Engineering ◽

10.1142/s021819402040029x ◽

2020 ◽

Vol 30 (11n12) ◽

pp. 1759-1777

Author(s):

Jialing Liang ◽

Peiquan Jin ◽

Lin Mu ◽

Jie Zhao

Keyword(s):

Machine Learning ◽

Social Media ◽

User Behavior ◽

Real Data ◽

User Profile ◽

Data Set ◽

Sina Weibo ◽

Factors Affecting ◽

The Government ◽

Hot Event

With the development of Web 2.0, social media such as Twitter and Sina Weibo have become essential platforms for disseminating hot events. At the same time, due to the free policy of microblogging services, users can post user-generated content freely on microblogging platforms. Accordingly, more and more hot events on microblogging platforms have been labeled as spammers. Spammers not only hurt the healthy development of social media but also introduce many economic and social problems. Therefore, the government and enterprises must distinguish whether a hot event on a microblogging platform is a spammer or a naturally developing event. In this paper, we focus on the hot event list on Sina Weibo and collect the relevant microblogs of each hot event to study spammer detection methods. Notably, we develop an integral feature set consisting of user profile, user behavior, and user relationships to reflect various factors affecting the detection of spammers. Then, we employ typical machine learning methods to conduct extensive experiments on detecting spammers. We use a real data set crawled from the most prominent Chinese microblogging platform, Sina Weibo, and evaluate the performance of 10 machine learning models with five sampling methods. The results in terms of various metrics show that the Random Forest model and the over-sampling method achieve the best accuracy in detecting spammers and non-spammers.
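Random over-sampling is one common way to rebalance a data set in which spammers form the minority class before training a classifier such as Random Forest. The abstract does not say which five sampling methods were compared, so the sketch below is a generic random over-sampler; the feature columns in the example are hypothetical.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple


def random_oversample(
    X: Sequence[List[float]], y: Sequence[str], seed: int = 0
) -> Tuple[List[List[float]], List[str]]:
    """Duplicate randomly chosen rows of smaller classes until all classes match the largest."""
    rng = random.Random(seed)  # seeded so the resampling is reproducible
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = [list(row) for row in X], list(y)
    for label, count in counts.items():
        members = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - count):
            i = rng.choice(members)
            X_out.append(list(X[i]))  # copy so the new row is independent
            y_out.append(label)
    return X_out, y_out
```

Over-sampling should be applied only to the training split; resampling before the train/test split would leak duplicated rows into evaluation.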

INTEGRATING DATA MINING TO A PROCESS DESIGN USING THE ROBUST BAYESIAN APPROACH

International Journal of Reliability Quality and Safety Engineering ◽

10.1142/s0218539308003155 ◽

2008 ◽

Vol 15 (05) ◽

pp. 441-464 ◽

Cited By ~ 3

Author(s):

HEAJIN JEONG ◽

SUHILL SONG ◽

SANGMUN SHIN ◽

BYUNG RAE CHO

Keyword(s):

Data Mining ◽

Process Design ◽

Research Work ◽

Design Procedure ◽

Design Tool ◽

Raw Data ◽

Correlation Based Feature Selection ◽

Vital Component ◽

Process Database ◽

Noise Factors

Although process design optimization issues have received considerable attention from researchers for several decades, and a number of methodologies for modeling and optimizing processes have been developed, there is still ample room for improvement. Few research works have considered incorporating raw data from a manufacturing process database into the process design. However, cumulative raw data can be a vital component in optimizing processes. To address this, we propose a new process design procedure called robust-Bayesian data mining (RBDM). First, we show how data mining techniques and a correlation-based feature selection (CBFS) method can be applied effectively to the selection of significant factors. Second, we show how RBDM can be incorporated into robust design. Third, we present how the proposed RBDM estimates process parameters by considering the robustness of the estimated parameters while incorporating noise factors. Finally, we present numerical examples to illustrate the efficiency of the proposed RBDM as a design tool for optimizing manufacturing processes.
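The abstract does not give the CBFS formula. A widely used form of correlation-based feature selection scores a subset S of k factors with Hall's merit, Merit(S) = k * mean|r_cf| / sqrt(k + k(k-1) * mean|r_ff|), which rewards factors correlated with the response and penalizes factors correlated with each other. The sketch below assumes that heuristic, absolute Pearson correlations, and greedy forward search; it is not necessarily the variant used in RBDM.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; 0.0 for a constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def cfs_merit(features: List[Sequence[float]], target: Sequence[float], subset: Sequence[int]) -> float:
    """Hall's merit of a feature subset (absolute correlations assumed)."""
    k = len(subset)
    r_cf = sum(abs(pearson(features[i], target)) for i in subset) / k
    pairs = list(combinations(subset, 2))
    r_ff = sum(abs(pearson(features[i], features[j])) for i, j in pairs) / len(pairs) if pairs else 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def forward_select(features: List[Sequence[float]], target: Sequence[float]) -> Tuple[List[int], float]:
    """Greedily add the factor that most improves the merit; stop when none does."""
    selected: List[int] = []
    best = 0.0
    remaining = list(range(len(features)))
    while remaining:
        merit, i = max((cfs_merit(features, target, selected + [j]), j) for j in remaining)
        if merit <= best:
            break
        selected.append(i)
        remaining.remove(i)
        best = merit
    return selected, best
```

With one factor that tracks the response exactly and one that is uncorrelated with it, the search keeps only the first, since adding the second lowers the merit from 1 to about 0.71.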
