The Complexity of Counting Problems Over Incomplete Databases

2021 ◽  
Vol 22 (4) ◽  
pp. 1-52
Author(s):  
Marcelo Arenas ◽  
Pablo Barceló ◽  
Mikaël Monet

We study the complexity of various fundamental counting problems that arise in the context of incomplete databases, i.e., relational databases that can contain unknown values in the form of labeled nulls. Specifically, we assume that the domains of these unknown values are finite and, for a Boolean query q, we consider the following two problems: Given as input an incomplete database D, (a) return the number of completions of D that satisfy q; or (b) return the number of valuations of the nulls of D yielding a completion that satisfies q. We obtain dichotomies between #P-hardness and polynomial-time computability for these problems when q is a self-join–free conjunctive query and study the impact on the complexity of the following two restrictions: (1) every null occurs at most once in D (what is called Codd tables); and (2) the domain of each null is the same. Roughly speaking, we show that counting completions is much harder than counting valuations: For instance, while the latter is always in #P, we prove that the former is not in #P under some widely believed theoretical complexity assumption. Moreover, we find that both (1) and (2) can reduce the complexity of our problems. We also study the approximability of these problems and show that, while counting valuations always has a fully polynomial-time randomized approximation scheme (FPRAS), in most cases counting completions does not. Finally, we consider more expressive query languages and situate our problems with respect to known complexity classes.
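
To make the difference between the two counts concrete, here is a minimal brute-force sketch in Python. It is only an illustration, not the authors' method: the function name, the convention of writing nulls as strings starting with "?", and the tiny example database are all assumptions made for this sketch, and the enumeration is exponential in the number of nulls.

```python
from itertools import product


def count_valuations_and_completions(facts, domains, query):
    """Brute-force both counts for a tiny incomplete database.

    facts:   list of (relation, tuple) pairs; nulls are written as
             strings such as "?x" (a convention assumed for this sketch)
    domains: dict mapping each null to its finite domain
    query:   function from a completion (frozenset of facts) to bool
    """
    nulls = sorted(domains)
    satisfying_valuations = 0
    satisfying_completions = set()
    # Enumerate every valuation, i.e. every assignment of domain values to nulls.
    for values in product(*(domains[n] for n in nulls)):
        valuation = dict(zip(nulls, values))
        # Apply the valuation; a completion is a set, so duplicate facts collapse.
        completion = frozenset(
            (rel, tuple(valuation.get(v, v) for v in tup)) for rel, tup in facts
        )
        if query(completion):
            satisfying_valuations += 1
            satisfying_completions.add(completion)
    return satisfying_valuations, len(satisfying_completions)


# Hypothetical example: facts R(?x) and R(?y), both nulls ranging over {1, 2}.
facts = [("R", ("?x",)), ("R", ("?y",))]
domains = {"?x": [1, 2], "?y": [1, 2]}


def q(db):
    # Boolean query: "R contains the value 1".
    return ("R", (1,)) in db


n_valuations, n_completions = count_valuations_and_completions(facts, domains, q)
print(n_valuations, n_completions)
```

In this example the valuations (1, 2) and (2, 1) both yield the completion {R(1), R(2)}, so the number of satisfying valuations exceeds the number of satisfying completions. Several valuations collapsing to one completion is what separates problem (a) from problem (b).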

Author(s):  
Katrin Casel ◽  
Henning Fernau ◽  
Serge Gaspers ◽  
Benjamin Gras ◽  
Markus L. Schmid

In the smallest grammar problem, we are given a word w and we want to compute a preferably small context-free grammar G for the singleton language {w} (where the size of a grammar is the sum of the sizes of its rules, and the size of a rule is measured by the length of its right side). It is known that, for unbounded alphabets, the decision variant of this problem is NP-hard and the optimisation variant does not allow a polynomial-time approximation scheme, unless P = NP. We settle the long-standing open problem whether these hardness results also hold for the more realistic case of a constant-size alphabet. More precisely, it is shown that the smallest grammar problem remains NP-complete (and its optimisation version is APX-hard), even if the alphabet is fixed and has size at least 17. The corresponding reduction is robust in the sense that it also works for an alternative size measure of grammars that is commonly used in the literature (i.e., a size measure also taking the number of rules into account), and it also allows us to conclude that even computing the number of rules required by a smallest grammar is a hard problem. On the other hand, if the number of nonterminals (or, equivalently, the number of rules) is bounded by a constant, then the smallest grammar problem can be solved in polynomial time, which is shown by encoding it as a problem on graphs with interval structure. However, treating the number of rules as a parameter (in terms of parameterised complexity) yields W[1]-hardness. Furthermore, we present an $\mathcal{O}(3^{|w|})$ exact exponential-time algorithm, based on dynamic programming. These three main questions are also investigated for 1-level grammars, i.e., grammars for which only the start rule contains nonterminals on the right side, thus investigating the impact of the “hierarchical depth” of grammars on the complexity of the smallest grammar problem. In this regard, we obtain similar, but slightly stronger, results for 1-level grammars.
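
The size measure used in this abstract (sum of the lengths of all right sides) can be illustrated with a short Python sketch. The dictionary representation, the function names, and the example word are assumptions made for illustration only. The sketch evaluates one given grammar and does not search for a smallest one.

```python
def grammar_size(rules):
    """Size of a grammar: the sum of the lengths of all right sides."""
    return sum(len(rhs) for rhs in rules.values())


def derive(rules, symbol):
    """Expand a symbol into the unique word it derives.

    Assumes one rule per nonterminal and no cyclic rules, so the
    grammar generates exactly one word.
    """
    if symbol not in rules:
        return symbol  # terminal symbol
    return "".join(derive(rules, s) for s in rules[symbol])


# Hypothetical grammar for the word w = "abababab".
rules = {
    "S": ["A", "A"],
    "A": ["B", "B"],
    "B": ["a", "b"],
}

word = derive(rules, "S")
size = grammar_size(rules)
print(word, size, len(rules))
```

Here the grammar has three rules and size 6, whereas the trivial grammar with the single rule S → abababab has size 8.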


Author(s):  
Etienne Toussaint ◽  
Paolo Guagliardo ◽  
Leonid Libkin

Answering queries over incomplete data is based on finding answers that are certainly true, independently of how missing values are interpreted. This informal description has given rise to several different mathematical definitions of certainty. To unify them, a framework based on "explanations", or extra information about incomplete data, was recently proposed. It partly succeeded in justifying query answering methods for relational databases under set semantics, but had two major limitations. First, it was firmly tied to the set data model, and a fixed way of comparing incomplete databases with respect to their information content. These assumptions fail for real-life database queries in languages such as SQL that use bag semantics instead. Second, it was restricted to queries that only manipulate data, while in practice most analytical SQL queries invent new values, typically via arithmetic operations and aggregation. To leverage our understanding of the notion of certainty for queries in SQL-like languages, we consider incomplete databases whose information content may be enriched by additional knowledge. The knowledge order among them is derived from their semantics, rather than being fixed a priori. The resulting framework allows us to capture and justify existing notions of certainty, and extend these concepts to other data models and query languages. As natural applications, we provide for the first time a well-founded definition of certain answers for the relational bag data model and for value-inventing queries on incomplete databases, addressing the key shortcomings of previous approaches.


2021 ◽  
Vol 6 ◽  
Author(s):  
Suzanne W. Dietrich ◽  
Don Goelman ◽  
Jennifer Broatch ◽  
Sharon Crook ◽  
Becky Ball ◽  
...  

The goal of the Databases for Many Majors project is to engage a broad audience in understanding fundamental database concepts using visualizations with color and visual cues to present these topics to students across many disciplines. There are three visualizations: introducing relational databases, querying, and design. A unique feature of these learning tools is the ability for instructors in diverse disciplines to customize the content of the visualization’s example data, supporting text, and formative assessment questions to promote relevance to their students. This paper presents a study on the impact of the customized introduction to relational databases visualization on both conceptual learning and attitudes towards databases. The assessment was performed in three different courses across two universities. The evaluation shows that learning outcomes are met with any visualization, which appears to be counter to expectations. However, students using a visualization customized to the course context had more positive attitudes and beliefs towards the usefulness of databases than the control group.


2021 ◽  
Vol 0 (0) ◽  
pp. 0
Author(s):  
Shuen Guo ◽  
Zhichao Geng ◽  
Jinjiang Yuan

In this paper, we study the single-machine Pareto-scheduling of jobs with multiple weighting vectors for minimizing the total weighted late works. Each weighting vector has its corresponding weighted late work. The goal of the problem is to find the Pareto-frontier for the weighted late works of the multiple weighting vectors. When the number of weighting vectors is arbitrary, it is implied in the literature that the problem is unary NP-hard. We therefore focus on the case in which the number of weighting vectors is a constant. For this problem, we present a dynamic programming algorithm running in pseudo-polynomial time and a fully polynomial-time approximation scheme (FPTAS).


2018 ◽  
Vol 6 (11) ◽  
pp. 254-265
Author(s):  
Damitha D Karunaratna

Relational databases are typically created to fulfil the information requirements of a community of users who generally belong to a single organization. Data stored in these databases is typically accessed by using Structured Query Languages or through customized interfaces. With the popularity of the World Wide Web and the availability of a large number of relational databases for public access, there is a need for users to retrieve data from these databases by using text-based queries, possibly by using terms that they are familiar with. However, the inherent limitations of the Structured Query Languages used to create and access data in relational databases do not allow users to access data by using text-based queries. Also, the terms used in queries must be limited to those used during the construction of the databases. This paper proposes an architecture to generate ontologies over relational databases and shows how they could be enhanced semantically by using available domain-specific or top-level ontologies so that the data managed by the databases can be accessed by using text-based queries. The feasibility of the proposed architecture was demonstrated by building a prototype system over a sample MySQL database.

