Evaluation of Machine Learning Algorithms in Predicting the Next SQL Query from the Future

Venkata Vamsikrishna Meduri; Kanchan Chowdhury; Mohamed Sarwat

doi:10.1145/3442338

ACM Transactions on Database Systems ◽

10.1145/3442338 ◽

2021 ◽

Vol 46 (1) ◽

pp. 1-46

Author(s):

Venkata Vamsikrishna Meduri ◽

Kanchan Chowdhury ◽

Mohamed Sarwat

Keyword(s):

Machine Learning ◽

Recommender Systems ◽

Machine Learning Algorithms ◽

Q Learning ◽

User Query ◽

Query Logs ◽

Markov Decision ◽

Real World Datasets ◽

Exhaustive Study ◽

Sql Query

Predicting the next SQL query from a user, given her sequence of queries up to the current timestep in an ongoing interaction session with the database, can enable speculative query processing and increase interactivity. While existing machine learning (ML)-based approaches use recommender systems to suggest relevant queries to a user, there has been no exhaustive study on applying temporal predictors to predict the next user-issued query. In this work, we experimentally compare ML algorithms in predicting the immediate next query in an interaction workload, given the current user query or the sequence of queries in a user session thus far. As a part of this, we propose the adaptation of two powerful temporal predictors: (a) Recurrent Neural Networks (RNNs) and (b) a Reinforcement Learning approach called Q-Learning that uses Markov Decision Processes. We represent each query as a comprehensive set of fragment embeddings that captures not only the SQL operators, attributes, and relations but also the arithmetic comparison operators and constants that occur in the query. Our experiments on two real-world datasets show the effectiveness of temporal predictors against the baseline recommender systems in predicting the structural fragments in a query w.r.t. both quality and time. Besides showing that RNNs can be used to synthesize novel queries, we find that exact Q-Learning outperforms RNNs despite predicting the next query entirely from the historical query logs.
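The fragment-based query representation described in this abstract can be pictured with a small sketch. It is an illustration only, not the authors' embedding: the vocabulary `VOCAB`, the table and column names, and the helper functions `fragment_vector` and `fragment_f1` are all hypothetical, and the F1-style overlap score is just one plausible way to compare a predicted query with the query the user actually issued.

```python
import re

# Hypothetical, hand-picked fragment vocabulary (assumption); a real system
# would derive it from the database schema and the query log.
VOCAB = ["SELECT", "FROM", "WHERE", "GROUP BY", "orders", "customers",
         "price", "region", ">", "="]


def fragment_vector(sql, vocab=VOCAB):
    """Return a 0/1 list marking which vocabulary fragments occur in sql."""
    text = " ".join(sql.split())  # collapse runs of whitespace
    vec = []
    for frag in vocab:
        if frag[0].isalpha():
            # Keywords and identifiers: whole-word, case-insensitive match.
            pattern = r"\b" + re.escape(frag) + r"\b"
            hit = re.search(pattern, text, flags=re.IGNORECASE) is not None
        else:
            # Comparison operators: plain substring match.
            hit = frag in text
        vec.append(1 if hit else 0)
    return vec


def fragment_f1(predicted, actual):
    """F1 overlap between two 0/1 fragment vectors of equal length."""
    tp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 1)
    if tp == 0:
        return 0.0
    precision = tp / sum(predicted)
    recall = tp / sum(actual)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    q_pred = fragment_vector("SELECT price FROM orders WHERE price > 100")
    q_true = fragment_vector("select region from orders where region = 'EU'")
    print(q_pred, q_true, fragment_f1(q_pred, q_true))
```

A predictor that works on such vectors can be scored on how many structural fragments it gets right even when it does not reproduce the next query exactly.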

The use of machine learning algorithms in recommender systems: A systematic review

Expert Systems with Applications ◽

10.1016/j.eswa.2017.12.020 ◽

2018 ◽

Vol 97 ◽

pp. 205-227 ◽

Cited By ~ 137

Author(s):

Ivens Portugal ◽

Paulo Alencar ◽

Donald Cowan

Keyword(s):

Machine Learning ◽

Systematic Review ◽

Recommender Systems ◽

Learning Algorithms ◽

Machine Learning Algorithms

Statistical Comparison of Classifiers Applied to the Interferential Tear Film Lipid Layer Automatic Classification

Computational and Mathematical Methods in Medicine ◽

10.1155/2012/207315 ◽

2012 ◽

Vol 2012 ◽

pp. 1-10 ◽

Cited By ~ 15

Author(s):

B. Remeseiro ◽

M. Penas ◽

A. Mosquera ◽

J. Novo ◽

M. G. Penedo ◽

...

Keyword(s):

Machine Learning ◽

Healthy Subjects ◽

Feature Vector ◽

Tear Film ◽

Region Of Interest ◽

Machine Learning Algorithms ◽

Lipid Layer ◽

Statistical Comparison ◽

Texture Pattern ◽

Exhaustive Study

The tear film lipid layer is heterogeneous among the population. Its classification depends on its thickness and can be done using the interference pattern categories proposed by Guillon. The interference phenomena can be characterised as a colour texture pattern, which can be automatically classified into one of these categories. From a photograph of the eye, a region of interest is detected and its low-level features are extracted, generating a feature vector that describes it, which is finally classified into one of the target categories. This paper presents an exhaustive study of the problem at hand using different texture analysis methods in three colour spaces and different machine learning algorithms. All these methods and classifiers have been tested on a dataset composed of 105 images from healthy subjects and the results have been statistically analysed. As a result, the manual process done by experts can be automated with the benefits of being faster and unaffected by subjective factors, with maximum accuracy over 95%.

Influence of GUJarati STEmmeR in Supervised Learning of Web Page Categorization

International Journal of Intelligent Systems and Applications ◽

10.5815/ijisa.2021.03.03 ◽

2021 ◽

Vol 13 (3) ◽

pp. 23-34

Author(s):

Chandrakant D. Patel ◽

Jayesh M. Patel

Keyword(s):

Machine Learning ◽

Information Retrieval ◽

Research Work ◽

Research Problem ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Web Page ◽

User Query ◽

On Line ◽

Gujarati Language

With the large quantity of information offered online, it is essential to retrieve correct information for a user query. A large amount of data is available in digital form in multiple languages. Various approaches aim to increase the effectiveness of online information retrieval, but the standard approach to retrieving information for a user query is to search the documents in the corpus word by word for the given query. This approach is very time-intensive and may miss several related documents that are equally important. To avoid these issues, stemming has been used extensively in numerous Information Retrieval Systems (IRS) to improve retrieval accuracy across languages. This paper examines the problem of stemming in Web Page Categorization for the Gujarati language, deriving the stem words using the GUJSTER algorithm [1]. The GUJSTER algorithm is based on morphological rules that are used to derive the root or stem word from inflected words of the same class. In particular, we consider the influence of the extracted stem or root word on the integrity of web page classification using supervised machine learning algorithms. This research work focuses on the analysis of Web Page Categorization (WPC) for the Gujarati language and concentrates on verifying the influence of a stemming algorithm in a WPC application for the Gujarati language, with improved accuracy between 63% and 98% using supervised machine learning models with a standard split of 80% for training and 20% for testing.

Optimization Techniques to Solve Travelling Salesman Problem Using Machine Learning Algorithms

International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2022.39822 ◽

2022 ◽

Vol 10 (1) ◽

pp. 274-279

Author(s):

Prince Nathan S

Keyword(s):

Machine Learning ◽

Ant Colony Optimization ◽

Travelling Salesman Problem ◽

Learning Algorithms ◽

Optimization Techniques ◽

Machine Learning Algorithms ◽

Ant Colony ◽

Lasso Regression ◽

Q Learning ◽

The Right

Abstract: The Travelling Salesman Problem (TSP) is a very popular problem in the world of computer programming. It deals with the optimization of algorithms in an ever-changing scenario that becomes more and more complex as the number of variables increases. The existing solutions to this problem are optimal only for a small and definite number of cases. They cannot take into account the various factors that arise when the problem is solved in the real world, where things change continuously. There is a need to adapt to these changes and find optimized solutions as the application runs. The ability to adapt to any kind of data, whether static or ever-changing, and to understand and solve it is a quality shown by Machine Learning algorithms. As Machine Learning advances, there has been a good amount of research on how to solve NP-hard problems using Machine Learning. This report is a survey of which types of machine learning algorithms can be used to solve the TSP. Different approaches, such as Ant Colony Optimization and Q-learning, are explored and compared. Ant Colony Optimization uses the concept of ants following pheromone levels, which let them know where the most food is. This is widely used for TSP problems, where the path with the most pheromone is chosen. Q-Learning uses the concept of rewarding an agent for taking the right action in the state it is in and compounding those rewards. It is very much based on the exploitation concept, where the agent keeps learning on its own to maximize its own reward. This can be used for the TSP, where an agent is rewarded for finding a short path and rewarded more if the chosen path is the shortest. Keywords: LINEAR REGRESSION, LASSO REGRESSION, RIDGE REGRESSION, DECISION TREE REGRESSOR, MACHINE LEARNING, HYPERPARAMETER TUNING, DATA ANALYSIS
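The Q-learning idea sketched in this abstract, rewarding an agent more for shorter paths, can be made concrete with a minimal tabular example. Everything below is an assumption for illustration and is not taken from the surveyed report: the four-city distance matrix `DIST`, the reward (the negative of the distance travelled), the state encoding (current city plus the set of visited cities), and the hyperparameter values.

```python
import itertools
import random

# Hypothetical symmetric distance matrix for four cities (assumption).
DIST = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 3],
    [10, 4, 3, 0],
]
N = len(DIST)


def tour_length(tour):
    """Length of a closed tour that returns to its first city."""
    return sum(DIST[a][b] for a, b in zip(tour, tour[1:] + tour[:1]))


def optimal_length():
    """Brute-force optimum, feasible only for tiny instances."""
    return min(tour_length([0] + list(p))
               for p in itertools.permutations(range(1, N)))


def train(episodes=3000, alpha=0.5, gamma=1.0, epsilon=0.3, seed=0):
    """Tabular Q-learning; the state is (current city, visited set)."""
    rng = random.Random(seed)
    q = {}  # (city, visited, next_city) -> estimated return
    for _ in range(episodes):
        city, visited = 0, frozenset([0])
        while len(visited) < N:
            choices = [c for c in range(N) if c not in visited]
            if rng.random() < epsilon:
                nxt = rng.choice(choices)  # explore
            else:  # exploit the current estimates
                nxt = max(choices, key=lambda c: q.get((city, visited, c), 0.0))
            new_visited = visited | {nxt}
            reward = -DIST[city][nxt]  # shorter step, larger reward
            remaining = [c for c in range(N) if c not in new_visited]
            if remaining:
                future = max(q.get((nxt, new_visited, c), 0.0)
                             for c in remaining)
            else:
                reward -= DIST[nxt][0]  # pay for closing the tour
                future = 0.0
            key = (city, visited, nxt)
            old = q.get(key, 0.0)
            q[key] = old + alpha * (reward + gamma * future - old)
            city, visited = nxt, new_visited
    return q


def greedy_tour(q):
    """Follow the highest-valued action from the start city."""
    city, visited, tour = 0, frozenset([0]), [0]
    while len(visited) < N:
        choices = [c for c in range(N) if c not in visited]
        nxt = max(choices, key=lambda c: q.get((city, visited, c), 0.0))
        visited = visited | {nxt}
        tour.append(nxt)
        city = nxt
    return tour


if __name__ == "__main__":
    tour = greedy_tour(train())
    print(tour, tour_length(tour), optimal_length())
```

The table grows with the number of (city, visited set) pairs, so this exact formulation only suits very small instances; larger problems would need function approximation or a different state encoding.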

Recommender System: Towards Classification of Human Intentions in E-Shopping Using Machine Learning

Journal of Computational and Theoretical Nanoscience ◽

10.1166/jctn.2019.8513 ◽

2019 ◽

Vol 16 (10) ◽

pp. 4280-4285

Author(s):

Babaljeet Kaur ◽

Richa Sharma ◽

Shalli Rani ◽

Deepali Gupta

Keyword(s):

Machine Learning ◽

Recommender Systems ◽

Recommender System ◽

Machine Learning Algorithms ◽

Application Domain ◽

Random Forest Algorithm ◽

Manual Search ◽

Online Shoppers ◽

New Item

Recommender systems were introduced in the mid-1990s to assist users in choosing the right product from the innumerable choices available. The basic concept of a recommender system is to suggest a new item or product to users in place of a manual search, because a user who wants to buy a new item is often unsure which item will suit him better and meet the intended requirements. From Google News to Netflix and from Instagram to LinkedIn, recommender systems have spread their roots in almost every application domain possible. Nowadays, many recommender systems are available for every field. In this paper, an overview of recommender systems, recommender approaches, application areas, and the challenges of recommender systems is given. Further, we conduct an experiment on online shoppers’ intention to predict the behavior of shoppers using machine learning algorithms. Based on the results, it is observed that the Random Forest algorithm performs the best, with a 93% ROC value.

A User-Based Recommendation with a Scalable Machine Learning Tool

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v5i5.pp1153-1157 ◽

2015 ◽

Vol 5 (5) ◽

pp. 1153 ◽

Cited By ~ 2

Author(s):

Ch. Veena ◽

B. Vijaya Babu

Keyword(s):

Machine Learning ◽

Collaborative Filtering ◽

Open Source ◽

Recommender Systems ◽

Recommender System ◽

Historical Data ◽

Machine Learning Algorithms ◽

Frame Work ◽

Machine Learning Tool ◽

Selection Of

Recommender systems have proven to be a valuable way to recommend information items such as books, videos, and songs to online users. Collaborative filtering methods make all predictions from historical data. In this paper we introduce Apache Mahout, which is open source and provides a rich set of components for constructing a customized recommender system from a selection of machine learning algorithms [12]. This paper also focuses on addressing challenges in collaborative filtering such as scalability and data sparsity. To deal with scalability problems, we use a distributed framework such as Hadoop. We then present a customized user-based recommender system.
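User-based collaborative filtering, as mentioned in this abstract, predicts a user's rating for an unseen item from the ratings of similar users. The sketch below shows the idea in plain Python. It does not use or imitate the Apache Mahout API, and the ratings table, the user and item names, and the choice of cosine similarity over co-rated items are assumptions made for illustration.

```python
from math import sqrt

# Hypothetical historical ratings: user -> {item: rating} (assumption).
RATINGS = {
    "alice": {"book_a": 5.0, "book_b": 3.0, "song_c": 4.0},
    "bob": {"book_a": 4.0, "book_b": 2.0, "song_c": 5.0, "video_d": 4.0},
    "carol": {"book_a": 1.0, "book_b": 5.0, "video_d": 2.0},
}


def cosine(u, v):
    """Cosine similarity computed over the items both users have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(u[i] ** 2 for i in common))
    norm_v = sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)


def predict(user, item, ratings=RATINGS):
    """Similarity-weighted average of other users' ratings for item.

    Returns None when no other user has rated the item.
    """
    num = 0.0
    den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        sim = cosine(ratings[user], their)
        num += sim * their[item]
        den += abs(sim)
    return num / den if den else None


if __name__ == "__main__":
    print(predict("alice", "video_d"))
```

Every prediction compares the target user against all other users, which is the scalability pressure that motivates distributed frameworks for this kind of recommender.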

Auto-CaseRec: Automatically Selecting and Optimizing Recommendation-Systems Algorithms

10.31219/osf.io/4znmd ◽

2020 ◽

Author(s):

Srijan Gupta ◽

Joeran Beel

Keyword(s):

Machine Learning ◽

Recommender Systems ◽

Recommender System ◽

Parameter Tuning ◽

Machine Learning Algorithms ◽

Algorithm Selection ◽

Automated Algorithm ◽

Recommendation Algorithms ◽

Automated Machine Learning ◽

Human Effort

The advances in the field of Automated Machine Learning (AutoML) have greatly reduced human effort in selecting and optimizing machine learning algorithms. These advances, however, have not yet widely made it to Recommender-Systems libraries. We introduce Auto-CaseRec, a Python framework based on the CaseRec recommender-system library. Auto-CaseRec provides automated algorithm selection and parameter tuning for recommendation algorithms. An initial evaluation of Auto-CaseRec against the baselines shows an average 13.88% improvement in RMSE for the Movielens100K dataset and an average 17.95% improvement in RMSE for the Last.fm dataset.
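For readers unfamiliar with the metric, the sketch below shows how RMSE is computed and how automated algorithm selection can be reduced to picking the candidate with the lowest validation RMSE. It does not use the Auto-CaseRec or CaseRec APIs. The function names and the toy data are hypothetical, and the relative-improvement formula is an assumption about how such percentages are commonly computed, not a statement of how the figures in the abstract were derived.

```python
from math import sqrt


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def relative_improvement(baseline_rmse, candidate_rmse):
    """Percentage reduction in RMSE relative to the baseline (assumed formula)."""
    return 100.0 * (baseline_rmse - candidate_rmse) / baseline_rmse


def select_best(candidates, y_true):
    """Pick the candidate with the lowest RMSE.

    candidates maps an algorithm name to its predictions on held-out data.
    """
    scores = {name: rmse(y_true, preds) for name, preds in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores[best]


if __name__ == "__main__":
    truth = [3.0, 4.0, 5.0]
    preds = {
        "algo_a": [3.0, 4.0, 5.0],  # hypothetical algorithm outputs
        "algo_b": [4.0, 5.0, 6.0],
    }
    print(select_best(preds, truth))
    print(relative_improvement(1.0, 0.8))
```

Lower RMSE is better, so a positive relative improvement means the candidate's error is smaller than the baseline's.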

Unifying recommendation and active learning for information filtering and recommender systems

10.31234/osf.io/jqa83 ◽

2020 ◽

Author(s):

Scott Cheng-Hsin Yang ◽

Chirag Rank ◽

Jake Alden Whritner ◽

Olfa Nasraoui ◽

Patrick Shafto

Keyword(s):

Machine Learning ◽

Active Learning ◽

Recommender Systems ◽

Learning Algorithms ◽

Information Filtering ◽

Relevant Information ◽

Machine Learning Algorithms ◽

Negative Consequences ◽

Parameterized Model ◽

Recommendation Accuracy

The enormous scale of the available information and products on the Internet has necessitated the development of algorithms that intermediate between options and human users. These AI/machine learning algorithms attempt to provide the user with relevant information. In doing so, the algorithms may incur potential negative consequences stemming from the need to select items about which they are uncertain to increase predictive accuracy versus the need to select items about which they are certain to increase recommendation accuracy. This tension between predicting relevant recommendations to the users and learning about the user's interests can be considered an instantiation of the well-known exploration-exploitation tradeoff in the context of information filtering and recommender systems. Building from existing machine learning algorithms, we introduce a parameterized model that unifies and interpolates between recommending relevant information and active learning. We present three experiments investigating the unified model. Specifically, we illustrate the tradeoffs of optimizing prediction and recommendation within a tightly controlled concept-learning paradigm, show the conditions under which a broad parameter range can optimize for both, and identify the effects of human variability on algorithm performance. Thus, combining methods and models from cognitive science and computer science, we quantify implications of tradeoffs between recommendation accuracy and learning about preferences of human users, demonstrating the value of experimental approaches to understanding real-world human-machine feedback loops.
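The tension described in this abstract, between recommending items the system is confident about and querying items it is uncertain about, can be pictured with a toy scoring rule. This is not the authors' parameterized model. The mixing weight `lam`, the use of binary entropy as the uncertainty measure, and the item names are all hypothetical choices made only to show what interpolating between the two goals can look like.

```python
from math import log2


def binary_entropy(p):
    """Uncertainty, in bits, of a yes/no outcome with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * log2(p) + (1.0 - p) * log2(1.0 - p))


def pick_item(relevance, lam):
    """Choose the item that maximizes a blend of relevance and uncertainty.

    relevance maps each item to the estimated probability the user likes it.
    lam = 0 is pure recommendation (highest estimated relevance);
    lam = 1 is pure active learning (highest uncertainty).
    """
    def score(item):
        p = relevance[item]
        return (1.0 - lam) * p + lam * binary_entropy(p)

    return max(relevance, key=score)


if __name__ == "__main__":
    estimates = {"sure_hit": 0.95, "coin_flip": 0.5, "sure_miss": 0.05}
    for lam in (0.0, 0.5, 1.0):
        print(lam, pick_item(estimates, lam))
```

Sweeping `lam` between 0 and 1 moves the choice from the item most likely to please the user toward the item whose feedback would be most informative.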

wCM based hybrid pre-processing algorithm for class imbalanced dataset

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-210624 ◽

2021 ◽

pp. 1-16

Author(s):

Deepika Singh ◽

Anju Saha ◽

Anjana Gosain

Keyword(s):

Machine Learning ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Processing Algorithm ◽

Imbalanced Dataset ◽

Imbalanced Datasets ◽

Intrinsic Factors ◽

Class Noise ◽

Real World Datasets ◽

Small Disjuncts

Imbalanced dataset classification is challenging because of the severely skewed class distribution. The traditional machine learning algorithms show degraded performance for these skewed datasets. However, there are additional characteristics of a classification dataset that are not only challenging for the traditional machine learning algorithms but also increase the difficulty when constructing a model for imbalanced datasets. Data complexity metrics identify these intrinsic characteristics, which cause substantial deterioration of the learning algorithms’ performance. Though many research efforts have been made to deal with class noise, none of them focused on imbalanced datasets coupled with other intrinsic factors. This paper presents a novel hybrid pre-processing algorithm focusing on treating the class-label noise in the imbalanced dataset, which suffers from other intrinsic factors such as class overlapping, non-linear class boundaries, small disjuncts, and borderline examples. This algorithm uses the wCM complexity metric (proposed for imbalanced dataset) to identify noisy, borderline, and other difficult instances of the dataset and then intelligently handles these instances. Experiments on synthetic datasets and real-world datasets with different levels of imbalance, noise, small disjuncts, class overlapping, and borderline examples are conducted to check the effectiveness of the proposed algorithm. The experimental results show that the proposed algorithm offers an interesting alternative to popular state-of-the-art pre-processing algorithms for effectively handling imbalanced datasets along with noise and other difficulties.

Machine Learning Algorithms for building Recommender Systems

2019 International Conference on Intelligent Computing and Control Systems (ICCS) ◽

10.1109/iccs45141.2019.9065538 ◽

2019 ◽

Author(s):

Richa Sharma ◽

Shalli Rani ◽

Sarvesh Tanwar

Keyword(s):

Machine Learning ◽

Recommender Systems ◽

Learning Algorithms ◽

Machine Learning Algorithms
