Knowledge-Preserving Certain Answers for SQL-like Queries

Author(s):  
Etienne Toussaint ◽  
Paolo Guagliardo ◽  
Leonid Libkin

Answering queries over incomplete data is based on finding answers that are certainly true, independently of how missing values are interpreted. This informal description has given rise to several different mathematical definitions of certainty. To unify them, a framework based on "explanations", or extra information about incomplete data, was recently proposed. It partly succeeded in justifying query answering methods for relational databases under set semantics, but had two major limitations. First, it was firmly tied to the set data model and to a fixed way of comparing incomplete databases with respect to their information content. These assumptions fail for real-life database queries in languages such as SQL, which use bag semantics instead. Second, it was restricted to queries that only manipulate data, while in practice most analytical SQL queries invent new values, typically via arithmetic operations and aggregation. To extend our understanding of the notion of certainty to queries in SQL-like languages, we consider incomplete databases whose information content may be enriched by additional knowledge. The knowledge order among them is derived from their semantics, rather than being fixed a priori. The resulting framework allows us to capture and justify existing notions of certainty, and to extend these concepts to other data models and query languages. As natural applications, we provide for the first time a well-founded definition of certain answers for the relational bag data model and for value-inventing queries on incomplete databases, addressing the key shortcomings of previous approaches.

Author(s):  
Stefan Esser ◽  
Dirk Fahland

Abstract Process event data is usually stored either in a sequential process event log or in a relational database. While the sequential, single-dimensional nature of event logs aids querying for (sub)sequences of events based on temporal relations such as “directly/eventually-follows,” it does not support querying multi-dimensional event data of multiple related entities. Relational databases allow storing multi-dimensional event data, but existing query languages do not support querying for sequences or paths of events in terms of temporal relations. In this paper, we propose a general data model for multi-dimensional event data based on labeled property graphs that allows storing structural and temporal relations in a single, integrated graph-based data structure in a systematic way. We provide semantics for all concepts of our data model, and generic queries for modeling event data over multiple entities that interact synchronously and asynchronously. The queries allow for efficiently converting large real-life event data sets into our data model, and we provide 5 converted data sets for further research. We show that typical and advanced queries for retrieving and aggregating such multi-dimensional event data can be formulated and executed efficiently in the existing query language Cypher, giving rise to several new research questions. Specifically, aggregation queries on our data model enable process mining over multiple inter-related entities using off-the-shelf technology.
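The core idea of per-entity "directly-follows" relations can be sketched in a few lines. The following Python snippet uses hypothetical event data and is not the paper's converter or its Cypher queries; it derives directly-follows (df) edges separately for each entity an event is correlated to:

```python
from collections import defaultdict

# Hypothetical events: (event_id, activity, timestamp, {entity_type: entity_id})
events = [
    ("e1", "Create Order", 1, {"Order": "o1"}),
    ("e2", "Pick Item",    2, {"Order": "o1", "Item": "i1"}),
    ("e3", "Pick Item",    3, {"Order": "o1", "Item": "i2"}),
    ("e4", "Ship Item",    4, {"Item": "i1"}),
]

# Group each entity's events in timestamp order.
timeline = defaultdict(list)
for eid, _, ts, ents in events:
    for etype, ent in ents.items():
        timeline[(etype, ent)].append((ts, eid))

# A df edge connects consecutive events of the SAME entity, so one event
# (e.g. e2) can sit on several entity-specific paths at once.
df_edges = []
for entity, evs in sorted(timeline.items()):
    evs.sort()
    for (_, a), (_, b) in zip(evs, evs[1:]):
        df_edges.append((a, b, entity))

print(df_edges)
```

In a labeled property graph these triples would become relationships carrying the entity as a property, which is what makes multi-entity temporal querying possible in one structure.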


Author(s):  
Artem Chebotko ◽  
Shiyong Lu

Relational technology has proven to be very useful for scalable Semantic Web data management. Numerous researchers have proposed to use RDBMSs to store and query voluminous RDF data using SQL and RDF query languages. This chapter studies how RDF queries with so-called well-designed graph patterns and nested optional patterns can be efficiently evaluated in an RDBMS. The authors propose to extend relational algebra with a novel relational operator, the nested optional join (NOJ), which is more efficient than the left outer join in processing nested optional patterns of well-designed graph patterns. They design three efficient algorithms to implement the new operator in relational databases: (1) the nested-loops NOJ algorithm, NL-NOJ; (2) the sort-merge NOJ algorithm, SM-NOJ; and (3) the simple hash NOJ algorithm, SH-NOJ. Using a real-life RDF dataset, the authors demonstrate the efficiency of their algorithms by comparing them with the corresponding left outer join implementations, and explore the effect of join selectivity on the performance of these algorithms.
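For contrast with the chapter's NOJ operator, the baseline it is compared against, a plain nested-loops left outer join, can be sketched as follows (hypothetical in-memory relations; not the chapter's implementation):

```python
# Nested-loops left outer join: every left tuple survives; tuples with no
# matching right tuple are padded with None (SQL's NULL), which is what
# OPTIONAL graph patterns compile to in a relational setting.
def nl_left_outer_join(left, right, on):
    out = []
    for l in left:
        matched = False
        for r in right:
            if all(l.get(k) == r.get(k) for k in on):
                out.append({**r, **l})
                matched = True
        if not matched:
            pad = {k: None for k in right[0] if k not in l}
            out.append({**l, **pad})
    return out

# Hypothetical data: a person without an email still appears in the result.
people = [{"id": 1, "name": "ann"}, {"id": 2, "name": "bo"}]
emails = [{"id": 1, "email": "ann@example.org"}]
print(nl_left_outer_join(people, emails, on=["id"]))
```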


Author(s):  
Marco Console ◽  
Paolo Guagliardo ◽  
Leonid Libkin

Querying incomplete data is an important task both in data management and in many AI applications that use query rewriting to take advantage of relational database technology. Usually one looks for answers that are certain, i.e., true in every possible world represented by an incomplete database. For positive queries, expressed either in positive relational algebra or as unions of conjunctive queries, finding such answers can be done efficiently when databases and query answers are sets. Real-life databases, however, use bag rather than set semantics. For bags, instead of saying that a tuple is certainly in the answer, we have more detailed information: namely, the range of the number of occurrences of the tuple in query answers. We show that the behavior of positive queries is different under bag semantics: finding the minimum number of occurrences can still be done efficiently, but for the maximum it becomes intractable. We use these results to investigate approximation schemes for computing certain answers to arbitrary first-order queries that have been proposed for set semantics. One of them cannot be adapted to bags, as it relies on the intractable maxima of occurrences, but another scheme deals only with minima, and we show how to adapt it to bag semantics without losing efficiency.
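The bag-semantics notion of certainty described above can be illustrated by brute force. The tractability results concern real algorithms; this toy sketch with hypothetical possible worlds only shows the definition, i.e., what the multiplicity range of a tuple is:

```python
from collections import Counter

# Hypothetical possible worlds of a unary relation R (bags of values),
# obtained by interpreting a single null as either 1 or 2.
worlds = [
    Counter({1: 2, 2: 1}),   # null -> 1
    Counter({1: 1, 2: 2}),   # null -> 2
]

def query(world):
    # A trivial positive query (identity), kept simple on purpose.
    return world

def certain_multiplicity_range(t, worlds):
    # Under bag semantics the certain information about tuple t is the
    # range of its multiplicities across the answers in all worlds.
    counts = [query(w)[t] for w in worlds]
    return min(counts), max(counts)

print(certain_multiplicity_range(1, worlds))  # tuple 1 occurs 1 to 2 times
```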


2021 ◽  
Vol 22 (4) ◽  
pp. 1-52
Author(s):  
Marcelo Arenas ◽  
Pablo Barceló ◽  
Mikaël Monet

We study the complexity of various fundamental counting problems that arise in the context of incomplete databases, i.e., relational databases that can contain unknown values in the form of labeled nulls. Specifically, we assume that the domains of these unknown values are finite and, for a Boolean query q, we consider the following two problems: given as input an incomplete database D, (a) return the number of completions of D that satisfy q; or (b) return the number of valuations of the nulls of D yielding a completion that satisfies q. We obtain dichotomies between #P-hardness and polynomial-time computability for these problems when q is a self-join-free conjunctive query, and study the impact on the complexity of the following two restrictions: (1) every null occurs at most once in D (what are called Codd tables); and (2) the domain of each null is the same. Roughly speaking, we show that counting completions is much harder than counting valuations: for instance, while the latter is always in #P, we prove that the former is not in #P under some widely believed complexity-theoretic assumption. Moreover, we find that both (1) and (2) can reduce the complexity of our problems. We also study the approximability of these problems and show that, while counting valuations always has a fully polynomial-time randomized approximation scheme (FPRAS), in most cases counting completions does not. Finally, we consider more expressive query languages and situate our problems with respect to known complexity classes.
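The difference between the two counting problems can be seen on a toy instance. The following brute-force sketch (exponential in the number of nulls, with a hypothetical database and a trivial Boolean query; not the paper's algorithms) counts satisfying valuations and distinct satisfying completions:

```python
from itertools import product

# Incomplete database under set semantics: two tuples, each a labeled null,
# with the same finite domain {a, b}.
D = ["n1", "n2"]
domain = {"n1": ["a", "b"], "n2": ["a", "b"]}

def q(db):
    # Boolean query: does the completion contain the constant "a"?
    return "a" in db

nulls = sorted(domain)
valuations = 0
completions = set()
for vals in product(*(domain[n] for n in nulls)):
    v = dict(zip(nulls, vals))
    completed = frozenset(v[x] for x in D)  # set semantics merges duplicates
    if q(completed):
        valuations += 1
        completions.add(completed)

# Valuations (a,a), (a,b), (b,a) all satisfy q, but (a,b) and (b,a) yield
# the same completion {a, b}: 3 valuations vs. 2 completions.
print(valuations, len(completions))
```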


Author(s):  
Giovanni Amendola ◽  
Leonid Libkin

When a dataset is not fully specified and can represent many possible worlds, one commonly answers queries by computing certain answers to them. A natural way of defining certainty is to say that an answer is certain if it is consistent with the query answers in all possible worlds and is furthermore the most informative answer with this property. However, the existence and complexity of such answers is not yet well understood, even for relational databases. Thus in applications one tends to use different notions, essentially the intersection of query answers in possible worlds. However, the justification of such notions has long been questioned. This leads to two problems: are certain answers based on informativeness feasible in applications, and can a clean justification be provided for intersection-based notions? Our goal is to answer both. For the former, we show that such answers may not exist, or may be very large, even in simple cases of querying incomplete data. For the latter, we add the concept of explanations to the notion of informativeness: an explanation shows not only that one object is more informative than another, but also why this is so. This leads to a modified notion of certainty: explainable certain answers. We present a general framework for reasoning about them, and show that for open and closed world relational databases, they are precisely the common intersection-based notions of certainty.
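The intersection-based notion of certainty discussed above fits in a few lines; a minimal sketch with hypothetical possible worlds:

```python
# Hypothetical possible worlds of a binary relation, differing in how a
# null value for Bob's subject was interpreted.
worlds = [
    {("alice", "math"), ("bob", "cs")},
    {("alice", "math"), ("bob", "bio")},
]

def query(world):
    # A trivial query (identity); any monotone query works the same way.
    return world

def certain_answers(query, worlds):
    # A tuple is a certain answer iff it is in the answer in EVERY world.
    return set.intersection(*[query(w) for w in worlds])

print(certain_answers(query, worlds))  # only ("alice", "math") is certain
```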


2020 ◽  
Vol 08 (01) ◽  
pp. 133-151
Author(s):  
Munqath Alattar ◽  
Attila Sali

In general, there are two main approaches to handle the missing data values problem in SQL tables. One is to ignore or remove any record with some missing data values. The other approach is to fill or impute the missing data with new values [A. Farhangfar, L. A. Kurgan and W. Pedrycz, A novel framework for imputation of missing values in databases, IEEE Trans. Syst. Man Cybern. A, Syst. Hum. 37(5) (2007) 692–709]. In this paper, the second method is considered. Possible worlds, possible and certain keys, and weak and strong functional dependencies were introduced in Refs. 4 and 2 [H. Köhler, U. Leck, S. Link and X. Zhou, Possible and certain keys for SQL, VLDB J. 25(4) (2016) 571–596; M. Levene and G. Loizou, Axiomatisation of functional dependencies in incomplete relations, Theor. Comput. Sci. 206(1) (1998) 283–300]. We introduced the intermediate concept of strongly possible worlds in a preceding paper, which are obtained by filling missing data values with values already existing in the table. Using strongly possible worlds, strongly possible keys and strongly possible functional dependencies (spFDs) were introduced in Refs. 5 and 1 [M. Alattar and A. Sali, Keys in relational databases with nulls and bounded domains, in ADBIS 2019: Advances in Databases and Information Systems, Lecture Notes in Computer Science, Vol. 11695 (Springer, Cham, 2019), pp. 33–50; Functional dependencies in incomplete databases with limited domains, in FoiKS 2020: Foundations of Information and Knowledge Systems, Lecture Notes in Computer Science, Vol. 12012 (Springer, Cham, 2020), pp. 1–21]. In this paper, some axioms and rules for strongly possible functional dependencies are provided. These axioms and rules form the basis for a possible axiomatization of spFDs.
For that, we analyze which weak/strong functional dependency and certain functional dependency axioms remain sound for strongly possible functional dependencies, and for the axioms that are not sound, we give appropriate modifications for soundness.
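The notion of a strongly possible world, completing a table using only values that already occur in the corresponding attribute, can be sketched as follows (hypothetical table; not the authors' implementation):

```python
from itertools import product

NULL = None
table = [
    ("a1", "b1"),
    ("a2", NULL),
    (NULL, "b2"),
]

def strongly_possible_worlds(table):
    ncols = len(table[0])
    # Active domain of each column: the non-null values already in the table.
    domains = [{row[c] for row in table if row[c] is not NULL}
               for c in range(ncols)]
    # Positions of the nulls to be filled.
    cells = [(r, c) for r, row in enumerate(table)
             for c, v in enumerate(row) if v is NULL]
    # Every combination of active-domain values yields one strongly
    # possible world.
    for vals in product(*(sorted(domains[c]) for _, c in cells)):
        world = [list(row) for row in table]
        for (r, c), v in zip(cells, vals):
            world[r][c] = v
        yield [tuple(row) for row in world]

worlds = list(strongly_possible_worlds(table))
# Column 1 has active domain {a1, a2} and column 2 has {b1, b2}, so the
# two nulls give 2 x 2 = 4 strongly possible worlds.
print(len(worlds))
```

An spFD then holds if the corresponding functional dependency holds in at least one such world.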


2020 ◽  
Vol 6 (1) ◽  
Author(s):  
Spyridoula Vazou ◽  
Collin A. Webster ◽  
Gregory Stewart ◽  
Priscila Candal ◽  
Cate A. Egan ◽  
...  

Abstract Background/Objective Movement integration (MI) involves infusing physical activity into normal classroom time. A wide range of MI interventions have succeeded in increasing children’s participation in physical activity. However, no previous research has attempted to unpack the various MI intervention approaches. Therefore, this study aimed to systematically review, qualitatively analyze, and develop a typology of MI interventions conducted in primary/elementary school settings. Subjects/Methods Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines were followed to identify published MI interventions. Irrelevant records were removed first by title, then by abstract, and finally by full text, resulting in 72 studies being retained for qualitative analysis. A deductive approach, using previous MI research as an a priori analytic framework, was used alongside inductive techniques to analyze the data. Results Four types of MI interventions were identified and labeled based on their design: student-driven, teacher-driven, researcher-teacher collaboration, and researcher-driven. Each type was further refined based on the MI strategies (movement breaks, active lessons, other: opening activity, transitions, reward, awareness), the level of intrapersonal and institutional support (training, resources), and the delivery (dose, intensity, type, fidelity). Nearly half of the interventions were researcher-driven, which may undermine the sustainability of MI as a routine practice by teachers in schools. An imbalance is evident among the MI strategies, with transitions, opening and awareness activities, and rewards having received only limited study. Delivery should be further examined, with a strong focus on reporting fidelity. Conclusions Distinct approaches are most often employed to promote the use of MI, and these approaches may lack a minimum standard for reporting MI intervention details. This typology may be useful for better understanding and studying MI interventions and for effectively translating the evidence into practice in real-life settings.


Mathematics ◽  
2021 ◽  
Vol 9 (7) ◽  
pp. 786
Author(s):  
Yenny Villuendas-Rey ◽  
Eley Barroso-Cubas ◽  
Oscar Camacho-Nieto ◽  
Cornelio Yáñez-Márquez

Swarm intelligence has emerged as an active field for solving numerous machine-learning tasks. In this paper, we address the problem of clustering data with missing values, where the patterns are described by mixed (or hybrid) features. We introduce a generic modification to three swarm intelligence algorithms (Artificial Bee Colony, Firefly Algorithm, and Novel Bat Algorithm). We experimentally obtain adequate values of the parameters for these three modified algorithms, with the purpose of applying them to the clustering task. We also provide an unbiased comparison among several metaheuristic-based clustering algorithms, concluding that the clusters obtained by our proposals are highly representative of the “natural structure” of the data.


1997 ◽  
Vol 08 (03) ◽  
pp. 301-315 ◽  
Author(s):  
Marcel J. Nijman ◽  
Hilbert J. Kappen

A Radial Basis Boltzmann Machine (RBBM) is a specialized Boltzmann Machine architecture that combines feed-forward mapping with probability estimation in the input space, and for which very efficient learning rules exist. The hidden representation of the network displays symmetry breaking as a function of the noise in the dynamics. Thus, generalization can be studied as a function of the noise in the neuron dynamics instead of as a function of the number of hidden units. We show that the RBBM can be seen as an elegant alternative to k-nearest neighbor, leading to comparable performance without the need to store all data. We show that the RBBM has good classification performance compared to the MLP. The main advantage of the RBBM is that, simultaneously with the input-output mapping, a model of the input space is obtained, which can be used for learning with missing values. We derive learning rules for the case of incomplete data, and show that they perform better on incomplete data than the traditional learning rules on a 'repaired' data set.


Algorithms ◽  
2021 ◽  
Vol 14 (3) ◽  
pp. 85
Author(s):  
Andreas Rauh ◽  
Julia Kersten

Continuous-time linear systems with uncertain parameters are widely used for modeling real-life processes. The uncertain parameters, contained in the system and input matrices, can be constant or time-varying. In the latter case, they may represent state dependencies of these matrices. Assuming bounded uncertainties, interval methods become applicable for a verified reachability analysis, for the feasibility analysis of feedback controllers, or for the design of robust set-valued state estimators. The evaluation of these system models becomes computationally efficient after a transformation into a cooperative state-space representation, in which the dynamics satisfy certain monotonicity properties with respect to the initial conditions. To obtain such representations, similarity transformations are required that are not trivial to find for sufficiently wide a priori bounds of the uncertain parameters. This paper deals with the derivation and algorithmic comparison of two different transformation techniques, whose applicability to processes with constant and time-varying parameters has to be distinguished. An interval-based reachability analysis of the states of a simple electric step-down converter concludes the paper.
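The monotonicity property that makes cooperative representations attractive can be illustrated with a toy sketch. This uses a simple Euler scheme on a hypothetical Metzler matrix and is not a verified interval method such as those studied in the paper:

```python
# For a cooperative linear system x' = A x, the off-diagonal entries of A
# are nonnegative (A is a Metzler matrix), so the flow is monotone in the
# initial condition: two point trajectories started at the lower and upper
# interval bounds enclose all solutions started in between.
A = [[-2.0, 1.0],
     [0.5, -3.0]]  # hypothetical Metzler matrix: off-diagonal entries >= 0

def euler_step(x, A, h):
    return [x[i] + h * sum(A[i][j] * x[j] for j in range(len(x)))
            for i in range(len(x))]

def enclose(x_lo, x_hi, A, h, steps):
    # With h small enough that I + h*A is nonnegative, each Euler step is a
    # monotone map, so the componentwise ordering x_lo <= x_hi is preserved.
    for _ in range(steps):
        x_lo = euler_step(x_lo, A, h)
        x_hi = euler_step(x_hi, A, h)
    return x_lo, x_hi

lo, hi = enclose([0.9, 0.4], [1.1, 0.6], A, h=0.01, steps=100)
print(lo, hi)
```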

