Enumeration algorithm
In computer science, an enumeration algorithm is an algorithm that enumerates the answers to a computational problem. Formally, such an algorithm applies to problems that take an input and produce a list of solutions, similarly to function problems. For each input, the enumeration algorithm must produce the list of all solutions, without duplicates, and then halt. The performance of an enumeration algorithm is measured in terms of the time required to produce the solutions, either in terms of the total time required to produce all solutions, or in terms of the maximal delay between two consecutive solutions and in terms of a preprocessing time, counted as the time before outputting the first solution. This complexity can be expressed in terms of the size of the input, the size of each individual output, or the total size of the set of all outputs, similarly to what is done with output-sensitive algorithms.
Formal definitions
[edit]An enumeration problem is defined as a relation over strings of an arbitrary alphabet :
An algorithm solves if for every input the algorithm produces the (possibly infinite) sequence such that has no duplicate and if and only if . The algorithm should halt if the sequence is finite.
Common complexity classes
[edit]Enumeration problems have been studied in the context of computational complexity theory, and several complexity classes have been introduced for such problems.
A very general such class is EnumP,[1] the class of problems for which the correctness of a possible output can be checked in polynomial time in the input and output. Formally, for such a problem, there must exist an algorithm A which takes as input the problem input x, the candidate output y, and solves the decision problem of whether y is a correct output for the input x, in polynomial time in x and y. For instance, this class contains all problems that amount to enumerating the witnesses of a problem in the class NP.
Other classes that have been defined include the following. In the case of problems that are also in EnumP, these problems are ordered from least to most specific:
- Output polynomial, the class of problems whose complete output can be computed in polynomial time.
- Incremental polynomial time, the class of problems where, for all i, the i-th output can be produced in polynomial time in the input size and in the number i.
- Polynomial delay, the class of problems where the delay between two consecutive outputs is polynomial in the input (and independent from the output).
- Strongly polynomial delay, the class of problems where the delay before each output is polynomial in the size of this specific output (and independent from the input or from the other outputs). The preprocessing is generally assumed to be polynomial.
- Constant delay, the class of problems where the delay before each output is constant, i.e., independent from the input and output. The preprocessing phase is generally assumed to be polynomial in the input.
Common techniques
[edit]- Backtracking: The simplest way to enumerate all solutions is by systematically exploring the space of possible results (partitioning it at each successive step).[2] However, performing this may not give good guarantees on the delay, i.e., a backtracking algorithm may spend a long time exploring parts of the space of possible results that do not give rise to a full solution.
- Flashlight search: This technique improves on backtracking by exploring the space of all possible solutions but solving at each step the problem of whether the current partial solution can be extended to a partial solution.[1] If the answer is no, then the algorithm can immediately backtrack and avoid wasting time, which makes it easier to show guarantees on the delay between any two complete solutions. In particular, this technique applies well to self-reducible problems.
- Closure under set operations: If we wish to enumerate the disjoint union of two sets, then we can solve the problem by enumerating the first set and then the second set. If the union is non disjoint but the sets can be enumerated in sorted order, then the enumeration can be performed in parallel on both sets while eliminating duplicates on the fly. If the union is not disjoint and both sets are not sorted then duplicates can be eliminated at the expense of a higher memory usage, e.g., using a hash table. Likewise, the cartesian product of two sets can be enumerated efficiently by enumerating one set and joining each result with all results obtained when enumerating the second step.
Examples of enumeration problems
[edit]- The vertex enumeration problem, where we are given a polytope described as a system of linear inequalities and we must enumerate the vertices of the polytope.
- Enumerating the minimal transversals of a hypergraph. This problem is related to monotone dualization and is connected to many applications in database theory and graph theory.[3]
- Enumerating the answers to a database query, for instance a conjunctive query or a query expressed in monadic second-order. There have been characterizations in database theory of which conjunctive queries could be enumerated with linear preprocessing and constant delay.[4]
- The problem of enumerating maximal cliques in an input graph, e.g., with the Bron–Kerbosch algorithm
- Listing all elements of structures such as matroids and greedoids
- Several problems on graphs, e.g., enumerating independent sets, paths, cuts, etc.
- Enumerating the satisfying assignments of representations of Boolean functions, e.g., a Boolean formula written in conjunctive normal form or disjunctive normal form, a binary decision diagram such as an OBDD, or a Boolean circuit in restricted classes studied in knowledge compilation, e.g., NNF.[5]
Connection to computability theory
[edit]The notion of enumeration algorithms is also used in the field of computability theory to define some high complexity classes such as RE, the class of all recursively enumerable problems. This is the class of sets for which there exist an enumeration algorithm that will produce all elements of the set: the algorithm may run forever if the set is infinite, but each solution must be produced by the algorithm after a finite time.
References
[edit]- ^ a b Strozecki, Yann; Mary, Arnaud (2019). "Efficient Enumeration of Solutions Produced by Closure Operations". Discrete Mathematics & Theoretical Computer Science. 21 (3). doi:10.23638/DMTCS-21-3-22.
- ^ Read, Ronald C.; Tarjan, Robert E. (1975). "Bounds on Backtrack Algorithms for Listing Cycles, Paths, and Spanning Trees". Networks. 5 (3): 237–252. doi:10.1002/net.1975.5.3.237.
- ^ Hagen, Matthias (2008). Algorithmic and Computational Complexity Issues of MONET. Göttingen: Cuvillier. ISBN 9783736928268.
- ^ Bagan, Guillaume; Durand, Arnaud; Grandjean, Etienne (2007). Duparc, Jacques; Henzinger, Thomas A. (eds.). "On Acyclic Conjunctive Queries and Constant Delay Enumeration". Computer Science Logic. Lecture Notes in Computer Science. 4646. Springer Berlin Heidelberg: 208–222. doi:10.1007/978-3-540-74915-8_18. ISBN 9783540749158.
- ^ Marquis, P.; Darwiche, A. (2002). "A Knowledge Compilation Map". Journal of Artificial Intelligence Research. 17: 229–264. arXiv:1106.1819. doi:10.1613/jair.989. S2CID 9919794.