Three important strengths of Cyc inferencing are:
- The inferencing code is modular and stable.
- The state of the search is maintained, so that a search which suspends due to resouce constraints can resume where it left off.
- Search is completely parameterized, so that various kinds of search may be performed.
The first and second items should be self-explanatory, but the third demands elaboration. Cyc-10 does heuristic search by default, but depth-first search is also implemented (and is used for forward inference, since forward inference requires traversal of the whole search tree anyway). Certain applications also take advantage of parameterized search, doing mostly heuristic search but substituting in a special method for expanding nodes, or identifying goal nodes, etc. Below we discuss the heuristics Cyc-10 search uses and then we cover the various search parameters.
Cyc-10 uses a number of heuristic rules to decide which leaf node to expand next. Some of these are purely syntactic heuristics:
- Favor nodes with fewer literals to satisfy (as compared with all other unexpanded nodes). This heuristic helps to steer the search toward branches that are likely to bottom out soon.
- Favor nodes that have fewer free variables (as compared with all other unexpanded nodes). Like the preceding heuristic, this heuristic helps to steer the search toward branches that are likely to bottom out soon.
- Very weakly favor nodes that are higher up in the search tree . This heuristic helps to avoid going too far down into the search without ever following up on other branches.
- Strongly favor nodes with no free variables left at all . If there are no free variables left, there's a good chance we'll be able to determine the truth value of the formula just by doing a KB lookup.
- Weakly disfavor nodes which include negative literals (that is, anything of the form (not P)). Because the KB consists primarily of positive assertions, it is easier to find bindings for positive literals than for negative literals.
Other heuristics are semantic:
- Disfavor nodes which might be part of a unification cycle . Basically, try to avoid going in circles.
- Disfavor nodes which are less likely to be satisfied by KB lookup . This is done by adding a measure of disfavor to a node for each of its goal literals that mention a predicate and a constant, but the constant has no index for the predicate in the KB.
These heuristic rules act in concert, according to a linear summation rule. One way to think of this is as a chorus of agents, each of which corresponds to one heuristic and who are each looking to see if their heuristic applies. As each candidate node is presented to the chorus, the agents shriek more or less loudly, according to how strongly they disfavor the node. The sum of the volumes of the shrieks is compared with the sums of other nodes, and the open node with the lowest sum is expanded next.
In many cases, the formula at a node will contain more than one literal. In the example developed above, for instance, we saw a node whose formula to satisfy was
(and (isa ?x Agent) (owns ?x ?y))
If we choose to expand this node, which of these two literals should we expand first? Should we look for elements of Agent, or should we look for owns pairs?
The solution used in Cyc-10 is to require that each HL module which applies to a literal must be able to provide a heuristic estimate for how expensive it would be to apply that module to the literal. These heuristics are linearly summed, and the literal which has the lowest heuristic cost is favored most strongly.
The state of an inference search is provided via a generic search structure whose slots maintain the methods used to implement the specific search method. These slots are filled by functions with the given arguments and results:
too-deep-func : node depth-cut -> [boolean]
The function in this slot is used to test if the search node is beyond the depth cutoff.
add-node-func : node leaves -> leaves
This slot contains the function used to add a new search node to the leaves of the search.
no-leaves-p-func : leaves -> [boolean]
The function in this slot checks to see if the current search has no leaves (in which case the search has been exhaustively completed).
next-node-func : leaves -> node, leaves
This slot contains the function which chooses, from among the search leaves, the next search node to expand.
goal-p-func : node -> [boolean]
This function in this slot is used to determine if the search node satisfies the search goal.
add-goal-func : node goals -> goals
If the search node is a goal node, the function in this slot adds it to the list of goals.
options-func : node -> [list of options]
This slot contains the function used to determine which options to use in expanding the search node.
expand-func : node option -> [list of nodes]
This is the function used to expand a search node into a list of new 'child' search nodes to add to the search.
Depth-first and heuristic search are done by the same basic function. This is possible because when the search is initialized, the slots above get filled with the correct functions to implement the kind of search desired.
Under Cyc-10, each search node is a structure with the following slots:
- search : the search this node is part of
- parent : this node's parent node
- children : A list of pointers to this node's children
- depth : the depth of this node in the search tree
- options : current ways left to expand the node
- state : a datastructure that contains the semantics of the node
The search node semantic state includes the following information:
- formula : the CycL formula to satisfy, which represents the intermediate state of the inference search at this node.
- inference supports : the assertions and HL modules used to transform our parent's formula into our formula. The inference supports of this node and all its ancestors together constitute the complete inference path down to this search node.
- variable bindings : a mapping between variables in the parent node and what they are bound to in this node.