This lesson will discuss the efficiency aspects of inference in Cyc. Earlier we discussed the logical aspects of inference: the steps we use to perform a logical deduction and how we chain them together. We talked about the incompleteness issues that arise from the large space of potential proofs we can construct in this fashion, and about some strategies that would not work in our system; we then introduced heuristic search as the mechanism we actually use to efficiently identify successful proofs. This talk will now concentrate on the efficiency aspects of heuristic search.
One of the most interesting techniques we use for performing inference is that we do not treat inference as a single mechanism that must handle the whole space of possibilities. Instead, we treat inference as one of potentially hundreds or thousands of special cases, each of which has an efficient mechanism to identify and solve it. In our system, inference is modular in the sense that we have hundreds of independent modules that we call inference modules, or HL modules, to perform this reasoning -- each of which is specially tuned to handle one particular kind of problem.
So in the example on the slide, we have a node which has to do with proving what types of things Hamlet is an instance of; and out of all of the hundreds of potential reasoning modules, one is specially designed to handle questions of the form (#$isa <TERM> <VARIABLE>). In our system, we have a special-purpose reasoning structure for performing this sort of type-reasoning that can efficiently generate the approximately 25 to 30 bindings for ?WHAT and use that to generate the 25 to 30 child nodes that should come under this node.
By carving up the space of potential proofs into a large number of special cases, and by keeping the system flexible enough that you can always add yet another special case, you can identify the space of interesting problems you want to solve and design HL modules optimized for exactly those problems. The tight inner loop of our inference mechanism -- the loop that constructs these search trees and generates child nodes -- effectively contains a very efficient expert system that performs a kind of meta-reasoning: "What inference options and modules do I have available to solve the problem at this particular node?" In this fashion it can quickly discard the irrelevant mechanisms for performing inference and bring the relevant ones to bear.
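As a rough sketch of this modular dispatch, the toy Python below pairs each "HL module" with a cheap applicability test, and the expansion loop plays the role of the internal expert system that selects relevant modules for a node. All names and the miniature knowledge base are invented for illustration; this is not Cyc's actual code or API.

```python
# Hypothetical sketch of HL-module dispatch; the module registry, matcher,
# and miniature knowledge base are invented for illustration only.

def is_variable(term):
    return isinstance(term, str) and term.startswith("?")

def matches_isa_term_var(problem):
    # Recognizes problems of the form (#$isa <TERM> <VARIABLE>).
    return (len(problem) == 3 and problem[0] == "#$isa"
            and not is_variable(problem[1]) and is_variable(problem[2]))

# Toy type index standing in for the specialized type-reasoning structure.
ISA_FACTS = {
    "#$Hamlet-Play": ["#$Play", "#$Tragedy", "#$LiteraryWork"],
}

def removal_isa(problem):
    # Enumerate bindings for the variable straight from the type index.
    _, term, var = problem
    return [{var: collection} for collection in ISA_FACTS.get(term, [])]

# Each HL module pairs an applicability test with a specialized solver.
HL_MODULES = [
    (matches_isa_term_var, removal_isa),
]

def expand(problem):
    # The meta-reasoning step in the inner loop: pick out the modules
    # relevant to this node and let them generate its child nodes.
    children = []
    for matches, solve in HL_MODULES:
        if matches(problem):
            children.extend(solve(problem))
    return children

print(expand(("#$isa", "#$Hamlet-Play", "?WHAT")))
# → [{'?WHAT': '#$Play'}, {'?WHAT': '#$Tragedy'}, {'?WHAT': '#$LiteraryWork'}]
```

Under this design, each additional special case is just another (test, solver) pair appended to the registry, which is what makes the scheme easy to extend.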
In an earlier talk we described the difference between inference steps that strictly simplify a problem and inference steps that transform it into one of the same or greater complexity. We noted the tension between the two: one uses facts in the system to simplify a proof, hopefully down toward true (a goal node indicating that you now have a successful proof), while the other applies general-purpose rules to produce a completely new and novel way of trying to prove your problem.
In our system, these can be thought of as the two large classes of mechanisms for performing proofs. Because they are so different, it serves our system well to distinguish between them and to try each in an optimal fashion. Inference in our system can be considered bimodal in that we have these two large modes to consider, which we call "removal" and "transformation". The names reflect the fact that inference steps which apply rules tend to transform the problem into a potentially completely different problem, whereas the application of a fact, because it adds nothing new or conditional to the proof, strictly removes one of the steps you are trying to prove, and thus strictly decreases the complexity of the proof.
The transformation steps provide the large fan-out, which comes from stitching together large numbers of the common-sense rules that quantify over things in the world. They are the steps that produce the large space of potential novel, interesting ways to approach and solve a problem; the facts in the system, by contrast, are what determine whether a given approach succeeds or fails, by bottoming the proof out in facts that are known about the world.
You can think of the interplay between these two as analogous to standing at the top of a mountain -- say Pikes Peak, in the middle of the Rocky Mountains -- knowing that somewhere deep in a valley in the Rockies there is a pot of gold you have to search out and find. The application of a rule is analogous to being transported from the top of one mountain to the top of another, potentially a great distance away, maybe even several states away. So applying rules can bring you quite far afield from the place where you were originally trying to do the proof.
Given that you have identified a promising peak to start at, the application of facts is analogous to quickly and efficiently pounding out an exhaustive search of all the valleys right around that one mountain top. Because removal strictly simplifies the problem, you know there is a limit to the amount of work you have to do -- there are only so many valleys around a single mountain top.
Thus there is a trade-off between introducing more mountains to look at and efficiently searching the valleys around a particular mountain as quickly as possible. In our system, then, we have two large sets of heuristics corresponding to these two qualitatively different approaches to solving a problem: generating new, complex proofs to examine, and then efficiently trying to complete those proofs using facts known to the system.
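A minimal sketch of the two modes, with invented facts, rules, and representation (not Cyc's internals): a removal step deletes a subgoal that is already a known fact, strictly shrinking the problem, while a transformation step swaps a subgoal for a rule's antecedents, producing a new and possibly harder problem.

```python
# Illustrative sketch of the two inference modes; the facts, rules, and
# goal representation here are invented for this example.

FACTS = {("mortal", "socrates"), ("human", "socrates")}

# Each rule pairs a consequent pattern with its antecedent subgoals;
# "?"-prefixed strings are variables.
RULES = [
    (("mortal", "?X"), [("human", "?X")]),
]

def unify(pattern, goal):
    # Minimal one-way matcher: bind pattern variables to goal terms.
    if len(pattern) != len(goal):
        return None
    binding = {}
    for p, g in zip(pattern, goal):
        if p.startswith("?"):
            if binding.setdefault(p, g) != g:
                return None
        elif p != g:
            return None
    return binding

def substitute(pattern, binding):
    return tuple(binding.get(t, t) for t in pattern)

def removal_steps(goals):
    # Removal: drop a subgoal that is directly a known fact.
    # Each child is strictly smaller than its parent.
    return [goals[:i] + goals[i+1:]
            for i, g in enumerate(goals) if g in FACTS]

def transformation_steps(goals):
    # Transformation: replace a subgoal with a rule's antecedents,
    # yielding a new (possibly larger) problem.
    children = []
    for i, g in enumerate(goals):
        for consequent, antecedents in RULES:
            binding = unify(consequent, g)
            if binding is not None:
                body = [substitute(a, binding) for a in antecedents]
                children.append(goals[:i] + body + goals[i+1:])
    return children

print(removal_steps([("mortal", "socrates")]))        # → [[]]
print(transformation_steps([("mortal", "socrates")])) # → [[('human', 'socrates')]]
```

An empty goal list means the proof is complete; note that removal can only ever shrink the list, while transformation may leave it the same size or grow it.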
Inference in our system is heuristic. The key thing to remember about heuristics in our system is that they affect efficiency, not correctness. All they do is provide an ordering on the possible proofs you are examining, hopefully so that the proofs likely to succeed come earlier than those less likely to. If you are asking an exhaustive query, one that searches the entire search space, the ordering doesn't really matter because you intend to search all of it. So the benefits of heuristics only really show up when you impose resource constraints that allow the inference to halt early. Hopefully you have then hit the good proofs by the time you halt, and ignored most of the less promising ones.
Let’s explore the difference between the transformation heuristics and the removal heuristics. The purpose of the transformation heuristics is to order the possible proofs based on the rules involved in them, preferring proofs that use more coherent sets of rules -- rules that work together in a more promising fashion. The purpose of the removal heuristics is to generate answers as efficiently as possible and, as a corollary, to notice dead ends as early as possible and head away from them. This way you can prune off entire sections of the proof space.
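One way to see the "efficiency, not correctness" point is with a hypothetical best-first search sketch (not Cyc's actual search code): the heuristic score only orders the frontier, so with an unbounded budget any ordering finds the same answers, but once a node budget is imposed the ordering decides whether you find an answer at all.

```python
import heapq

def best_first(start, children, is_goal, score, budget):
    """Expand at most `budget` nodes, always taking the lowest score first."""
    frontier = [(score(start), start)]
    expanded = 0
    while frontier and expanded < budget:
        _, node = heapq.heappop(frontier)
        expanded += 1
        if is_goal(node):
            return node
        for child in children(node):
            heapq.heappush(frontier, (score(child), child))
    return None  # budget exhausted before any proof was found

# Toy proof space: node n has children 2n and 2n+1; the "proof" is node 7.
kids = lambda n: [2 * n, 2 * n + 1]

# An informative heuristic reaches the goal within a budget of 4 expansions;
# a poor ordering exhausts the same budget before reaching it.
print(best_first(1, kids, lambda n: n == 7, lambda n: abs(n - 7), budget=4))  # → 7
print(best_first(1, kids, lambda n: n == 7, lambda n: n, budget=4))           # → None
```

With a large enough budget both orderings find node 7; the heuristic never changes which proofs exist, only how soon the search encounters them.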
Inference is Modular
- Internal expert system does meta-reasoning
- HL modules
Inference is Bimodal
- Removal (facts)
- Transformation (rules)
Inference is Heuristic
- affect efficiency, not correctness