Inference rule
From Dyna
ricboelttroc cadarzelal c4toloolor An inference rule specifies how to compute the value of an item (specifically, a theorem) from the values of other items.
Contents |
A simple case
A simple inference rule is
area = length*width.
The item on the left-hand side (area) is known as the consequent, while the expression on the right-hand side (length*width) is known as the antecedent.
Variables in inference rules
Typically, an inference rule contains variables. In this case it represents not just a single equation, but an equational schema that describes how to compute the values of many different items. For example,
area(R) = length(R)*width(R).
defines how to compute all items of the form area(R). It implies infinitely
many statements such as
area(this_rectangle) = length(this_rectangle)*width(this_rectangle).
area(that_rectangle) = length(that_rectangle)*width(that_rectangle).
area(room("kitchen",myhouse)) = length(room("kitchen",myhouse))
* width(room("kitchen",myhouse)).
Thus, the single antecedent of area(this_rectangle) is the expression
length(this_rectangle)*width(this_rectangle).
Accumulation in inference rules
Most inference rules involve accumulation of many values. For example, the += accumulation operator in
dot_product += vector1(I) * vector2(I).
defines dot_product to be a sum,
The above rule can be generalized to define the dot product of any two vectors:
dot_product(Vector1,Vector2) += element(Vector1,I) * element(Vector2,I).
implies infinitely many statements such as the following, which defines
the dot product of vectors named vect(22) and vect(17), if any:
In both cases, the summation is over antecedents that vary in their vector index I. This is because I is not already specified by the consequent item (dot_product or dot_product(Vector1,Vector2)).
The latter inference rule can be read as follows: "For all terms Vector1, Vector2, and I, compute element(Vector1,I)*element(Vector2,I) and add it to dot_product(Vector1,Vector2) using +=."
Other accumulation operators
Other accumulation operators besides += can be used. For example, whereas += sums over numeric-valued antecedents, max= maximizes over them.
Suppose the value of edge_weight(U,V) is the weight of the edge from vertex U to vertex V in a weighted graph. The following use of max= finds the maximum weight of any edge in the graph,
maxweight max= edge_weight(U,V). % or edge_weight(_,_); see variable
whereas the following finds the maximum weights of any edge leaving vertex U or entering vertex V:
maxout(U) max= edge_weight(U,V). maxin(V) max= edge_weight(U,V).
In all cases, the maximization is over variables that appear only in the antecedent expression, just as for summation.
NOTE - Because of certain issues with the implementation, there is currently no min= accumulation operator. To do a minimization over positive numbers, please negate all the numbers and maximize.
Similarly, |= and &= respectively take the disjunction and conjunction of boolean-valued antecedents. Suppose we are given an unweighted
graph, where edge(U,V) is true if and only if there is an edge from vertex U to vertex V. The following derives the transitive closure of the graph:
path(U,V) |= edge(U,V). path(U,V) |= path(U,W) & edge(W,V).
In this case, a particular theorem such as path(a,b) will have value true if either edge(a,b) has value true,
or there is some W such that path(a,W) and edge(W,b) are both true. All the antecedents from both rules are combined disjunctively because the accumulation operator is |=.
(If &= were used instead of |=, then path(a,b) would be true if edge(a,b) had value true, and for all W, path(a,W) and path(W,b) were both true
So |= and &= respectively accomplish existential and universal quantification over W.)
Notice that Dyna uses | and & as the boolean operators. This avoids the connotations of || or &&, which in C guarantee an order-dependent short-circuit evaluation. In Dyna, the evaluation order is up to the compiler.
NOTE - There is currently no &= accumulation operator. There is also no | operator.
Formal discussion
Form of inference rules
The general form of an inference rule is
consequent accumulation_operator antecedent
where the consequent is an item (known as a "theorem" by virtue of appearing in this context) that may contain variables as subterms, and the antecedent is an expression that may contain variables as subterms. (See pattern.)
A restriction on variables
In the present version of Dyna, variables in the consequent must also appear in the antecedent. (The same restriction appears in Datalog, the logical query language for relational databases.)
For example, the following rules are not allowed:
foo(X,Y) += bar(X). word(epsilon,I,I) = true. rewrite(X,X/Y,Y) = true. append([X|List1],List2,[X|List]) |= append(List1, List2, List).
Why not? Because the present version only supports forward-chaining, and only
on variable-free terms (ground terms). If it finds a value for bar(a),
it is unable to update all items of the form foo(a,Y).
Fortunately, such rules can often be eliminated by applying an "unfolding" transformation to the program. But the transformation fails to halt if the rules can produce items with arbitrarily deep variables, as in Prolog's difference lists.
A restriction on accumulation operators
If path is an item type, then all rules with a consequent of the form path(...) must use the same accumulation operator.
Hence, the choice of accumulation operator is really a property of the path type, even though it is written syntactically as if it is part of the inference rules. For purposes of discussion, let us call this operator op=.
The meaning of inference rules
To define the value of a particular item γ (such as path(a,b)), we use the op operation to
aggregate the values of all expressions α (such as path(a,c)*edge(c,b)) such that γ op= α matches some inference rule.
In practice (because Dyna finds the least fixpoint of a set of equations), only a finite number of these expressions α will have non-default values
. The value of γ is defined to be
, where d is the default value for items of γ's type, as specified by an :- item(..., d) declaration.
Default values are significant
NOTE - nothing in this section is implemented.
This "in practice" definition ignores the infinitely many α expressions that have the default value. Thus, it works only if defaults are specified appropriately (roughly speaking, so that the inference rule equations are consistent when all items have default values).
For example, foo(X) max= bar(X,Y)+1 maximizes for each foo(X) over infinitely many bar items that have the default value (as well as 0 or more bar items for which a non-default value has been derived). For consistency, the default value of foo(X) should be the result of maximizing infinitely many bar(X,Y)+1 expressions with the default value:
that is, it should be 1 plus the default value of bar. If max= is replaced by += here, then the default value of foo(X) should be the result of summing over those expressions; hence there is no consistent (finite) default value for foo unless the default value for bar is -1.
A less artificial example is given by the program for arc-consistency in constraint programming:
:- item(in_domain, bool, false). % axiom - user may assert some true values :- item(consistent, bool, true). % axiom - user may assert some false values :- item(variable, bool, false). % what are the variables of the system? :- item(possible, bool, false). % what values are arc-consistent for each variable? :- item(support, bool, false). % why are those values arc-consistent?
variable(Var) |= in_domain(Var:Val). possible(Var:Val) &= in_domain(Var:Val). possible(Var:Val) &= support(Var:Val, Var2) whenever variable(Var2). support(Var:Val, Var2) |= consistent(Var:Val, Var2:Val2) & possible(Var2:Val2).
Note that all the equations in the above program are consistent when all items have default values. In this case, possible defaults to false even
though it is accumulated with &=, whose identity element is true.
This is because one of the conjuncts is in_domain, which
is initially false. If the user asserts that conjunct to be true, then possible may become true as well (at least temporarily, until some consistent axioms are asserted false).
- This example highights an error in the previous section: it shows that defaults are not used as accumulator identities. If they were, the declaration that
possibledefaults tofalsewould mean that the value of apossibleitem would be fixed atfalsebecause it would be defined as an accumulationfalse & ... & .... No, it is clear that identity elements are associated with the operator&. What defaults are used for is to specify values of unasserted axioms (this contributes to the semantics) and underived theorems (the compiler should be able to figure this out for itself, but can't because it's uncomputable, so we ask the user to do it and merely check). Hardcoded axioms likefoo=3are treated as assertions that override the defaults, as are axiom-like (antecedentless) rules such as bar(a)+=4).
- The compiler will eventually check that the defaults are consistent and issue a warning if not. This may get a little tricky to formulate once we allow lines like
foo(0):=5andconstit(epsilon,I,I)=1.
- The final line in the above program means that if
possible(Var2:Val2)becomestrue, then infinitely many derived itemssupport(_:_,Var2)must be updated totrue, much as if there were variables in the consequent that didn't appear in the antecedent. This is because it is possible to get a non-default antecedent expression when just one of its subexpressions has non-default value. Things are much easier if we rephrase the program so that consistent starts out false and must be asserted true, as in the standard presentation of arc consistency.
Special accumulation operators
NOTE - nothing in this section is implemented
In general, an accumulation operator should be an associative, commutative binary operator. In other words, it can be regarded as acting on a set of antecedent values.
It is possible to ignore this condition: then the order of accumulation will affect the result. At present Dyna may not formally guarantee any particular accumulation order, so you'd better know what you're doing if you use non-commutative accumulation operators.
The = operator
One unusual accumulation operator is =, which was used at the start of this page. It only works on sets of size 1. If asked to aggregate two or more values, it will throw a runtime exception.
The = operator is useful for rules where no accumulation is possible. For example,
area = length*width. output(Node) = sigmoid_function(input(Node)). % for a neural network
In these cases, all the antecedent variables also appear in the consequent,
so a given consequent item will have at most one antecedent (provided that
there are no other rules that contribute to defining area and output). There is no need to pick an accumulation operator, and
it would be confusing to arbitrarily select one like +=. That is
why Dyna provides =.
The = operator is also useful for rules where no accumulation is expected
in practice. For example, in a deterministic finite-state automaton (DFA),
path(State,[]) = 1. path(State,[Symbol|Rest]) = transition(State,Newstate,Symbol)*path(State,Rest).
implies that path(q,[a,b,c]) is the weight of the path from state q that reads the string [a,b,c]. The above rules using = are appropriate because in a DFA, this path is unique if it exists. If the rules are applied to a non-deterministic automaton, they will trigger a runtime
exception when they attempt to aggregate multiple paths. In that case, one would have to replace = with an ordinary accumulation operator such as += or max=.
There may only be one derived antecedent per consequent, at any time (explain). However, it is fine for this antecedent's value to change as often as it likes.
Where = is the accumulation operator, the default value of the consequent item should equal the default value of the antecedent expression.
- This is clearly the right course when the consequent binds all variables in the antecedent. It's probably also best when there are free variables in the antecedent, although an alternative in this case is to allow the user to choose any default value, or to use a special "undefined" value as the default.
In some cases, the compiler can prove at compile time that no accumulation is possible, in which case it can avoid the runtime check. This is the case if all the antecedent variables appear in the consequent, or are known through declarations to be determined by the variables in the consequent. (In particular, Dyna may eventually allow Mercury-style determinism declarations, which can be used to state and enforce the DFA requirement: for each State, Symbol pair there is at most one Newstate such that transition(State,Newstate,Symbol) has non-default value.)
Warning: The = accumulation operator is distinct from the term-unification operator = that may appear in expressions.
The := operator
A useful non-associative accumulation operator is assignment, :=.
Recall that foo(X) += bar(X,Y) says to add values of bar(X,Y) to a running total foo(X), and foo(X) max= bar(X,Y) says to maximize values of bar(X,Y) into a running maximum foo(X). In much the same way,
foo(X) := bar(X,Y).
says to use values of bar(X,Y) to replace a running value foo(X). In other words, foo(X) holds the value that was
most recently accumulated into it. This corresponds to a binary operator op
such that a op b is defined to be b.
The usual warnings about non-associative accumulation operators apply (see above).
Nonetheless, we suspect that := can be useful.
- A subtlety: if
foo(a)accumulates values 3,8,5 (giving it value 5), and the antecedent with value 8 later changes its value to 9, doesfoo(a)become 9 or does it stay 5? We need to study use cases.
- There should perhaps be a complementary operator that uses the value of the first rather than the last antecedent found. This is useful for algorithms like union-find and Prim's algorithm. However, it can be handled anyway with
foo(X) = bar(X,Y) if !derived(foo(X)).
Importing accumulation operators from C++
Dyna will eventually have the ability to import new accumulation operators from C++ for
a given type, using some kind of :- accumulator(...) declaration.
If foo is a binary function or operator that has already been imported, then we will perhaps create a default definition for foo= as well. But it may be more efficient to implement foo= directly.
- The direct implementation has the following advantages. When carrying out
a foo= bto updatea, it is faster to updateain place than to computea foo bin a temporary and then copy it intoa. Dyna often also needs to know whetherahas changed as a result of this update, and thefoo=implementation can provide information about that. Finally, the implementor ought to provide some other information to the Dyna compiler: e.g., whetherfoodistributes over other operators, and the result (if any) of x + y + y + y + ... (given x and y), which is needed for a compile-time check on the consistency of default values declared by:- item(...).
Conditionals
It is often necessary to write conditional expressions in inference rules.
Ordinary conditional expressions
The simplest conditional rules use C-style conditional expressions, of the form boolean-expression ? then-expression : else-expression.
The following rule sums up the absolute values of the weights of all edges from vertex U:
total_abs_out(U) += edge(U,V) > 0 ? edge(U,V) : -edge(U,V).
Semi-expressions using if
Often, however, there is no "else" clause. In that case you can write a
"semi-expression" for the antecedent. The following rule sums up the weights of
all positively-weighted edges from vertex U:
total_pos_out(U) += edge(U,V) if edge(U,V) > 0.
And this rule sums up the weights of all edges from vertex U to a blue vertex:
total_out_to_blue(U) += edge(U,V) if blue(V).
This defines total_out_to_blue(U) as the summation
. In other words, the if clause restricts the set of terms V to sum over. Note that the keyword if is always followed by a boolean-valued expression.
Another nice example:
factorial(N) *= I if between(1,I,N). % where between(...) is a foreign axiom
Semi-expressions differ from conditional expressions
The above total_out_to_blue rule is very similar to
total_out_to_blue(U) += blue(V) ? edge(U,V) : 0.
since 0 is the identity element for +=. However, it would work even if += did not have an identity element. There is also a subtle difference.
Consider the following set of asserted axioms:
edge(a,b) = 3. edge(a,c) = 4. edge(b,c) = 5. blue(b) = true. blue(c) = false.
Under the ?: rule, total_out_to_blue(a) would have
two non-default antecedents
blue(b) ? edge(a,b) : 0 (with value 3) blue(c) ? edge(a,c) : 0 (with value 0)
whereas under the if rule, it would have the single non-default antecedent
edge(a,b) if blue(b) (with value 3)
Thus, although total_out_to_blue(a) has value 3 in both cases, the C++ antecedents() method (see chart object) will return fewer antecedents if the if rule is used.
The difference also affects which items are considered to have been derived (give reference). In the if case, total_out_to_blue(b) is not regarded as derived, because it has no antecedents.
Semi-expressions don't always have values
Not only is edge(a,c) if blue(c) not an antecedent of foo(a), but it has no value.
In general, an if-expression has no value at all if its test evaluates to false. Thus, it is not properly an expression at all, but rather a "semi-expression"
that may or may not have a value.
For simplicity, however, the C++ interface still regards these semi-expressions as expressions. So the antecedents of the double item total_out_to_blue(a), such as edge(a,b) if blue(b), are C++ terms of type expr_double. It is possible to ask the chart for their double values in the usual way (see chart object). Antecedents are in fact guaranteed to have values. An arbitrary semi-expression such as edge(a,c) if blue(c) also has type expr_double, of course, but might have
no value in the chart (not even a default value), in which case asking for its value will throw a runtime exception.
Quantification in semi-expressions (whenever, if ever)
The following case can lead to confusion:
foo(X) += bar(X) if edge(X,Y) != 0.
% some axioms edge(a,b) = 3. edge(a,c) = 4. bar(a) = 5.
Given the axioms above, what is the value of foo(a)?
The best answer is probably 10, since += always sums over
variables that appear in the antecedent pattern but not the consequent pattern,
such as Y. The antecedents of foo(a) are
bar(a) if edge(a,b) != 0 (with value 5) bar(a) if edge(a,c) != 0 (also with value 5)
(If you're still not convinced, notice that foo(a) would certainly be 10 if the rule had been written as foo(X) += edge(X,Y) != 0 ? bar(X) : 0.)
However, people often seem to think that foo(a) should
be 5 in this case. To avoid confusion, Dyna requires you to word the
rule as follows:
foo(X) += bar(X) whenever edge(X,Y) != 0.
This can be read as "increment foo(X) by bar(X) whenever you discover
an edge edge(X,Y) with nonzero value." Thus, foo(a) is 10.
The whenever keyword must be used instead of if whenever there are variables like Y that appear only in the conditional test. It emphasizes the universal quantification over these variables. Here's another example that puts universal quantification to good use:
out_degree(U) += 1 whenever edge(U,V) != 0. % counts the number of nonzero edges leaving U
But what if you really want existential rather than universal quantification? For instance, what if you really wanted foo(a) to be 5 in the previous example? This case
comes up often in practice. In future, Dyna may allow you to write it conveniently as
foo(X) += bar(X) if ever edge(X,Y) != 0. % existential quantification over Y
However, this construction is not strictly necessary. It is just syntactic sugar for
:- item(temp, bool, false). foo(X) += bar(X) if temp(X). temp(X) |= edge(X,Y) != 0.
- Querying the antecedents of
foo(a)may expose this syntactic sugar, as it's not clear what the antecedents should be otherwise.
- We considered writing
for eachandif some, instead ofwheneverandif ever. Please let us know if you think that would be nicer.
- In addition to
if ever(which implicitly introducestemp(X) |= ...), we could addif always(which implicitly introducestemp(X) &= ...). Likeif ever, and unlikewhenever, this only adds the consequent once.
- While we're adding new operators, we could also have ones for convenience that negate the condition:
ifvs.unless,if alwaysvs.if never(or equivalentlyunless ever),if ever(or equivalentlyunless never) vs.unless always.
- While we're adding new operators, we could also have ones for convenience that negate the condition:
In cases like the above, is also often useful to use the ? operator (although be aware
that it can carry a little extra overhead):
out_degree(U) += 1 whenever ?edge(U,V). % counts the number of edges leaving U, even of value 0
