Evaluation of Machine Learning Algorithms in Predicting the Next SQL Query from the Future

2021 ◽  
Vol 46 (1) ◽  
pp. 1-46
Author(s):  
Venkata Vamsikrishna Meduri ◽  
Kanchan Chowdhury ◽  
Mohamed Sarwat

Prediction of the next SQL query from the user, given her sequence of queries until the current timestep, during an ongoing interaction session of the user with the database, can help in speculative query processing and increased interactivity. While existing machine learning (ML)-based approaches use recommender systems to suggest relevant queries to a user, there has been no exhaustive study on applying temporal predictors to predict the next user-issued query. In this work, we experimentally compare ML algorithms in predicting the immediate next future query in an interaction workload, given the current user query or the sequence of queries in a user session thus far. As a part of this, we propose the adaptation of two powerful temporal predictors: (a) Recurrent Neural Networks (RNNs) and (b) a Reinforcement Learning approach called Q-Learning that uses Markov Decision Processes. We represent each query as a comprehensive set of fragment embeddings that captures not only the SQL operators, attributes, and relations but also the arithmetic comparison operators and constants that occur in the query. Our experiments on two real-world datasets show the effectiveness of temporal predictors against the baseline recommender systems in predicting the structural fragments in a query w.r.t. both quality and time. Besides showing that RNNs can be used to synthesize novel queries, we find that exact Q-Learning outperforms RNNs despite predicting the next query entirely from the historical query logs.
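
As a rough illustration of predicting the next query purely from historical logs, the sketch below counts which query most often followed the current one in past sessions. It is a simple frequency baseline, not the RNN or Q-Learning predictors described in the abstract; the fragment sets, the sample sessions, and the function names `fit_transitions` and `predict_next` are all hypothetical.

```python
from collections import Counter, defaultdict

def fit_transitions(sessions):
    """Count how often each query is immediately followed by another one."""
    table = defaultdict(Counter)
    for session in sessions:
        for current, following in zip(session, session[1:]):
            table[current][following] += 1
    return table

def predict_next(table, current):
    """Return the most frequent follower of `current`, or None if unseen."""
    followers = table.get(current)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Hypothetical queries, each reduced to a set of structural fragments.
q_scan = frozenset({"SELECT", "orders"})
q_filter = frozenset({"SELECT", "orders", "WHERE", "price", ">"})
q_group = frozenset({"SELECT", "orders", "GROUP BY", "region"})

# Hypothetical session logs: each session is an ordered list of queries.
sessions = [
    [q_scan, q_filter, q_group],
    [q_scan, q_filter],
    [q_scan, q_group],
]

table = fit_transitions(sessions)
predicted = predict_next(table, q_scan)
```

Here `predicted` is the fragment set that followed `q_scan` most often in the sample logs. A predictor of this kind can only return queries it has already seen, which is the limitation the abstract contrasts with the ability of RNNs to synthesize novel queries.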

2012 ◽  
Vol 2012 ◽  
pp. 1-10 ◽  
Author(s):  
B. Remeseiro ◽  
M. Penas ◽  
A. Mosquera ◽  
J. Novo ◽  
M. G. Penedo ◽  
...  

The tear film lipid layer is heterogeneous among the population. Its classification depends on its thickness and can be done using the interference pattern categories proposed by Guillon. The interference phenomena can be characterised as a colour texture pattern, which can be automatically classified into one of these categories. From a photograph of the eye, a region of interest is detected and its low-level features are extracted, generating a feature vector that describes it, which is finally classified into one of the target categories. This paper presents an exhaustive study of the problem at hand using different texture analysis methods in three colour spaces and different machine learning algorithms. All these methods and classifiers have been tested on a dataset composed of 105 images from healthy subjects and the results have been statistically analysed. As a result, the manual process done by experts can be automated with the benefits of being faster and unaffected by subjective factors, with maximum accuracy over 95%.


2021 ◽  
Vol 13 (3) ◽  
pp. 23-34
Author(s):  
Chandrakant D. Patel ◽  
Jayesh M. Patel

With the large quantity of information available online, it is essential to retrieve correct information for a user query. A large amount of data is available in digital form in multiple languages. Various approaches aim to increase the effectiveness of online information retrieval, but the standard approach to retrieving information for a user query is to search the documents in the corpus word by word for the given query. This approach is very time-consuming and may miss several related documents that are equally important. To avoid these issues, stemming has been used extensively in numerous Information Retrieval Systems (IRS) to increase the retrieval accuracy for all languages. This paper addresses the problem of stemming with Web Page Categorization for the Gujarati language, deriving stem words using the GUJSTER algorithm [1]. The GUJSTER algorithm is based on morphological rules and is used to derive the root or stem word from inflected words of the same class. In particular, we consider the influence of the extracted stem or root word in order to check the integrity of web page classification using supervised machine learning algorithms. This research focuses on the analysis of Web Page Categorization (WPC) for the Gujarati language and concentrates on verifying the influence of a stemming algorithm in a WPC application for Gujarati, with improved accuracy between 63% and 98% through supervised machine learning models, using a standard split of 80% for training and 20% for testing.


Author(s):  
Prince Nathan S

Abstract: The Travelling Salesman Problem (TSP) is a very popular problem in the world of computer programming. It deals with the optimization of algorithms in an ever-changing scenario, becoming more and more complex as the number of variables increases. The solutions that exist for this problem are optimal only for a small and definite number of cases. They cannot take into consideration the various factors involved when the problem is solved for the real world, where things change continuously. There is a need to adapt to these changes and find optimized solutions as the application runs. The ability to adapt to any kind of data, whether static or ever-changing, and to understand and solve it, is a quality shown by Machine Learning algorithms. As advances in Machine Learning take place, there has been a good amount of research on how to solve NP-hard problems using Machine Learning. This report is a survey to understand what types of machine learning algorithms can be used to solve TSP. Different approaches such as Ant Colony Optimization and Q-learning are explored and compared. Ant Colony Optimization uses the concept of ants following pheromone levels, which let them know where the most food is. This is widely used for TSP, where the path with the most pheromone is chosen. Q-Learning uses the concept of rewarding an agent when it takes the right action for the state it is in and compounding those rewards. This is very much based on the concept of exploitation, where the agent keeps learning on its own to maximize its own reward. This can be used for TSP, where an agent is rewarded for finding a short path and rewarded more if the chosen path is the shortest. Keywords: LINEAR REGRESSION, LASSO REGRESSION, RIDGE REGRESSION, DECISION TREE REGRESSOR, MACHINE LEARNING, HYPERPARAMETER TUNING, DATA ANALYSIS
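
The reward idea described above can be sketched as tabular Q-learning on a tiny TSP instance. This is a minimal sketch under stated assumptions, not the survey's implementation: the 4-city distance matrix, the hyperparameter values, and the function names are hypothetical, and the reward is assumed to be the negative distance of each step so that shorter tours accumulate more reward.

```python
import random

# Hypothetical symmetric distance matrix for 4 cities (illustrative values).
DIST = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 3],
    [10, 4, 3, 0],
]

def tour_length(tour):
    """Length of a closed tour that returns to its first city."""
    return sum(DIST[a][b] for a, b in zip(tour, tour[1:] + tour[:1]))

def q_learning_tsp(episodes=2000, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    """Learn Q-values with epsilon-greedy exploration, then roll out greedily."""
    rng = random.Random(seed)
    n = len(DIST)
    q = {}  # (current_city, visited_set, next_city) -> estimated value

    def best_next(city, visited):
        options = [c for c in range(n) if c not in visited]
        return max(options, key=lambda c: q.get((city, visited, c), 0.0))

    for _ in range(episodes):
        city, visited = 0, frozenset([0])
        while len(visited) < n:
            options = [c for c in range(n) if c not in visited]
            if rng.random() < epsilon:
                nxt = rng.choice(options)       # explore a random city
            else:
                nxt = best_next(city, visited)  # exploit current estimates
            new_visited = visited | {nxt}
            reward = -DIST[city][nxt]           # shorter step, larger reward
            if len(new_visited) == n:
                reward -= DIST[nxt][0]          # cost of returning to the start
                future = 0.0
            else:
                future = max(q.get((nxt, new_visited, c), 0.0)
                             for c in range(n) if c not in new_visited)
            key = (city, visited, nxt)
            old = q.get(key, 0.0)
            # Standard Q-learning update toward reward plus discounted future.
            q[key] = old + alpha * (reward + gamma * future - old)
            city, visited = nxt, new_visited

    # Greedy rollout using the learned values.
    tour, visited = [0], frozenset([0])
    while len(visited) < n:
        nxt = best_next(tour[-1], visited)
        tour.append(nxt)
        visited = visited | {nxt}
    return tour

tour = q_learning_tsp()
length = tour_length(tour)
```

The state here includes the set of visited cities, which keeps the sketch correct but grows exponentially with the number of cities, so it is only practical for very small instances.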


2019 ◽  
Vol 16 (10) ◽  
pp. 4280-4285
Author(s):  
Babaljeet Kaur ◽  
Richa Sharma ◽  
Shalli Rani ◽  
Deepali Gupta

Recommender systems were introduced in the mid-1990s to assist users in choosing a suitable product from the innumerable choices available. The basic concept of a recommender system is to suggest a new item or product to users in place of a manual search, because a user who wants to buy a new item is often unsure which item will suit them better and meet the intended requirements. From Google News to Netflix and from Instagram to LinkedIn, recommender systems have spread their roots into almost every application domain possible. Nowadays, many recommender systems are available for every field. In this paper, an overview of recommender systems, recommender approaches, application areas, and the challenges of recommender systems is given. Further, we conduct an experiment on online shoppers' intention to predict the behavior of shoppers using machine learning algorithms. Based on the results, it is observed that the Random Forest algorithm performs the best with a 93% ROC value.


Author(s):  
Ch. Veena ◽  
B. Vijaya Babu

Recommender Systems have proven to be a valuable way to recommend information items such as books, videos, and songs to online users. Collaborative filtering methods are used to make all predictions from historical data. In this paper we introduce Apache Mahout, which is open source and provides a rich set of components to construct a customized recommender system from a selection of machine learning algorithms.[12] This paper also focuses on addressing the challenges in collaborative filtering such as scalability and data sparsity. To deal with scalability problems, we use a distributed framework like Hadoop. We then present a customized user-based recommender system.
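
To make the user-based approach concrete, the sketch below predicts a missing rating as a similarity-weighted average of other users' ratings. It is written in plain Python rather than with Apache Mahout, and the ratings, user names, and function names are hypothetical; cosine similarity over co-rated items is one common choice, assumed here for illustration.

```python
from math import sqrt

# Hypothetical historical ratings: user -> {item: rating}.
ratings = {
    "alice": {"book_a": 5.0, "book_b": 3.0, "book_c": 4.0},
    "bob": {"book_a": 4.0, "book_b": 2.0, "book_c": 5.0, "book_d": 4.0},
    "carol": {"book_a": 1.0, "book_b": 5.0, "book_d": 2.0},
}

def cosine_similarity(u, v):
    """Cosine similarity computed over the items both users have rated."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = sqrt(sum(u[i] ** 2 for i in shared))
    norm_v = sqrt(sum(v[i] ** 2 for i in shared))
    return dot / (norm_u * norm_v)

def predict(user, item):
    """Similarity-weighted average of other users' ratings for `item`."""
    weighted_sum = 0.0
    weight_total = 0.0
    for other, their_ratings in ratings.items():
        if other == user or item not in their_ratings:
            continue
        sim = cosine_similarity(ratings[user], their_ratings)
        weighted_sum += sim * their_ratings[item]
        weight_total += abs(sim)
    if weight_total == 0.0:
        return None  # no neighbour has rated this item
    return weighted_sum / weight_total

estimate = predict("alice", "book_d")
```

Because every prediction compares the target user against all other users, the cost grows with the size of the user base, which is the scalability concern that motivates a distributed framework in the abstract.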


2020 ◽  
Author(s):  
Srijan Gupta ◽  
Joeran Beel

The advances in the field of Automated Machine Learning (AutoML) have greatly reduced human effort in selecting and optimizing machine learning algorithms. These advances, however, have not yet widely made it to Recommender-Systems libraries. We introduce Auto-CaseRec, a Python framework based on the CaseRec recommender-system library. Auto-CaseRec provides automated algorithm selection and parameter tuning for recommendation algorithms. An initial evaluation of Auto-CaseRec against the baselines shows an average 13.88% improvement in RMSE for the Movielens100K dataset and an average 17.95% improvement in RMSE for the Last.fm dataset.
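
For readers unfamiliar with the metric, the sketch below computes RMSE and a relative improvement over a baseline. The abstract does not state how its percentages were derived, so the formula in `relative_improvement` is an assumption (reduction in RMSE as a share of the baseline RMSE), and both function names and the sample ratings are hypothetical.

```python
from math import sqrt

def rmse(actual, predicted):
    """Root mean squared error between two equal-length rating lists."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return sqrt(squared / len(actual))

def relative_improvement(baseline_rmse, tuned_rmse):
    """Assumed definition: percentage reduction in RMSE versus the baseline."""
    return 100.0 * (baseline_rmse - tuned_rmse) / baseline_rmse

# Hypothetical ratings, used only to exercise the functions.
error = rmse([1.0, 2.0, 3.0], [2.0, 2.0, 5.0])
gain = relative_improvement(1.0, 0.8)
```

Lower RMSE is better, so a positive value from `relative_improvement` means the tuned model has a smaller error than the baseline.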


2020 ◽  
Author(s):  
Scott Cheng-Hsin Yang ◽  
Chirag Rank ◽  
Jake Alden Whritner ◽  
Olfa Nasraoui ◽  
Patrick Shafto

The enormous scale of the available information and products on the Internet has necessitated the development of algorithms that intermediate between options and human users. These AI/machine learning algorithms attempt to provide the user with relevant information. In doing so, the algorithms may incur potential negative consequences stemming from the need to select items about which they are uncertain to increase predictive accuracy versus the need to select items about which they are certain to increase recommendation accuracy. This tension between predicting relevant recommendations to the users and learning about the user's interests can be considered an instantiation of the well-known exploration-exploitation tradeoff in the context of information filtering and recommender systems. Building from existing machine learning algorithms, we introduce a parameterized model that unifies and interpolates between recommending relevant information and active learning. We present three experiments investigating the unified model. Specifically, we illustrate the tradeoffs of optimizing prediction and recommendation within a tightly controlled concept-learning paradigm, show the conditions under which a broad parameter range can optimize for both, and identify the effects of human variability on algorithm performance. Thus, combining methods and models from cognitive science and computer science, we quantify implications of tradeoffs between recommendation accuracy and learning about preferences of human users, demonstrating the value of experimental approaches to understanding real-world human-machine feedback loops.


2021 ◽  
pp. 1-16
Author(s):  
Deepika Singh ◽  
Anju Saha ◽  
Anjana Gosain

Imbalanced dataset classification is challenging because of the severely skewed class distribution. The traditional machine learning algorithms show degraded performance for these skewed datasets. However, there are additional characteristics of a classification dataset that are not only challenging for the traditional machine learning algorithms but also increase the difficulty when constructing a model for imbalanced datasets. Data complexity metrics identify these intrinsic characteristics, which cause substantial deterioration of the learning algorithms’ performance. Though many research efforts have been made to deal with class noise, none of them focused on imbalanced datasets coupled with other intrinsic factors. This paper presents a novel hybrid pre-processing algorithm focusing on treating the class-label noise in the imbalanced dataset, which suffers from other intrinsic factors such as class overlapping, non-linear class boundaries, small disjuncts, and borderline examples. This algorithm uses the wCM complexity metric (proposed for imbalanced dataset) to identify noisy, borderline, and other difficult instances of the dataset and then intelligently handles these instances. Experiments on synthetic datasets and real-world datasets with different levels of imbalance, noise, small disjuncts, class overlapping, and borderline examples are conducted to check the effectiveness of the proposed algorithm. The experimental results show that the proposed algorithm offers an interesting alternative to popular state-of-the-art pre-processing algorithms for effectively handling imbalanced datasets along with noise and other difficulties.

