We described earlier how inference can be viewed as a sequence of logical deductions chained together. At each point along the way, you might often have different ways that you could now perform a new deduction. So, in effect, from what you start with there is a branching set of possibilities for how you can prove it. And that branching set can spread out in all kinds of different, novel ways.
For example, you might want to try to prove “Who does Hamlet know and love?” based on information about his family relationships. Or you might want to try to prove it based on knowledge of his patriotism, or of his king -- information like this. So you’re going to find that there isn’t just a single path you’re pursuing; you’ll end up with this forking, branching set of possibilities. Therefore it’s very natural to view inference in our system as a kind of search problem over something much like a search tree.
You start out at the top of the tree, the root, with the branches as steps away going down. You can imagine the top of the tree as being the query that you actually asked. Each step down to child nodes in this tree can be viewed as one potential logical deduction that would take you from trying to prove the original query into trying to prove a different query below using this logical deductive step. The fan out of possibilities can be viewed as this branching tree, getting bushier and bushier and deeper and deeper.
So, for purposes of explaining this process, pretend that you magically have available to you something that will tell you all of the things that you could try to prove in order to answer the query. Each of the things you could try ends up being one of the child steps to a child node. Later we’ll talk about how we deal with actually finding useful paths. Now we’ll talk about how to decide amongst them.
Imagine that each node in this tree represents something that you’re trying to prove. Each link from a parent node higher up to a child node below represents one logical deduction that you did to go from the query above to the query below. So, associated with each link are the formula you used to get there and the Most General Unifier you used (the recipe for matching the formula above with that rule to produce the formula below).
Since each dashed line represents one potential path, you can see that a path from where you start down through the tree (shown above with a solid pink line) represents one unique proof approach that you’re attempting. You want this proof to end successfully, which in this metaphor means finding a child node in the tree where you have nothing left to prove. This would be a leaf node where the thing to prove is ‘true’ -- like the ‘base case’ of a recursion. We refer to this kind of node in the tree as a ‘Goal Node.’ This is where you have found one successful end to the proof -- there’s nothing left to prove here. When you find one, the path all the way up represents one unique, justified reason for one set of answers for the variables in your query. You can identify that answer by gathering up the formulas used in each deductive step and looking at the Most General Unifiers -- asking what each variable finally got bound to and passing that information up.
Note that if there are no free variables, then the question you’re trying to answer is simply “Can I prove this true or not?” In this situation a goal node means “yes, I was able to prove it true.” Along the path to the goal node, I may have introduced some variables such that the top formula would be true if I could find some individual with some particular property. And then later I might find an individual who actually has that property.
So it could be that you introduced variables along the way, and the actual fact of who those individuals were doesn’t show up in the bindings* at the top -- but it would show up in the justification. So if there are intermediate steps along the way in your proof, these show up as intermediate nodes along the path.
*Bindings are values that have been assigned to variables in the unification process.
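As a minimal illustration of bindings (a toy unifier written for this explanation, not Cyc’s actual implementation), variables here are strings beginning with “?”, and a successful match returns the Most General Unifier as a dictionary of bindings:

```python
# Toy unifier: variables are strings starting with "?"; compound formulas
# are tuples. unify returns the bindings (the Most General Unifier) that
# make the two patterns match, or None if they cannot match.

def unify(a, b, bindings=None):
    """Return a dict of variable bindings that makes a and b equal, or None."""
    if bindings is None:
        bindings = {}
    if isinstance(a, str) and a.startswith("?"):
        return bind(a, b, bindings)
    if isinstance(b, str) and b.startswith("?"):
        return bind(b, a, bindings)
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            bindings = unify(x, y, bindings)
            if bindings is None:
                return None
        return bindings
    return bindings if a == b else None

def bind(var, value, bindings):
    """Bind var to value, checking consistency with any existing binding."""
    if var in bindings:
        return unify(bindings[var], value, bindings)
    new = dict(bindings)
    new[var] = value
    return new

# Matching a query against a fact yields the binding that would be
# passed up the proof path as part of the justification.
print(unify(("knows", "Hamlet", "?x"), ("knows", "Hamlet", "Horatio")))
# → {'?x': 'Horatio'}
```

The predicate and constant names are invented for the example; the point is only that each deductive step contributes bindings like these, and gathering them up along a successful path yields the answers for the query’s variables.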
Now the problem is that you have a big tree of possibilities and you often have cases where this proof could just go on forever. Let me now describe some potential situations for proofs that go on forever and how we deal with them.
In any sufficiently interesting or complex logical system, there is going to be an arbitrarily large number of potential proofs that you can make. Some of them are arbitrarily long, and you may not know whether you’re ever going to find an end to a given proof. This is the essence of two of the fundamental problems of computer science, often referred to as the “halting problem” and the “incompleteness of logic.” Gödel proved in the 1930s that any sufficiently complicated logical system is inherently incomplete: there are going to be statements that you just cannot logically prove. His argument for that is related to the other problem, the halting problem.
The halting problem refers to the fact that for certain algorithms, you cannot look at the algorithm plus its starting data and convince yourself that the algorithm will ever end in an answer. It may run and run, and you don’t know whether, if you just give it another minute or so, it might actually end in an answer or just run forever. When we talk about Cyc, we’re not talking about a system with only ten axioms and five facts to start with; we’re talking about millions of facts and tens of thousands of rules that can chain together in arbitrarily complicated and interesting ways, so the space of potential proofs is infinite. And you’re dealing with a tree which, logically, is infinitely large for many interesting queries. Due to this, you will run into some inherent incompleteness issues; for example, you cannot simply say “let’s just manifest this entire tree” or “let’s look at every possible proof and gather up all the answers” because there just aren’t enough seconds in the history of the universe for this.
You will never be able to convince yourself, because of logic and the halting problem, that you’re done yet. Maybe if you work a little harder, you’ll get even more and more. So we immediately ran into the problems of incompleteness in our system. The algorithm I’m about to describe is expressly for the purpose of dealing with the problem of incompleteness in our system.
We run into incompleteness because the search tree that we’re describing here is just too large. So our approach is to only search portions of the tree. There are well-known strategies from Artificial Intelligence for how one addresses search problems like this. One strategy is to search the tree in a “depth-first” fashion.
A depth-first search starts at the top of the tree and goes as deeply as possible down some path, expanding nodes as you go, until you can go no further: either you reach a goal (success) or a dead end -- a node where you are unable to produce new children (you don’t have enough information to keep going deeper). An example of a dead end would be a node where you could continue if you just knew whether Nelson Mandela had a PhD in physics, but this fact is not in the system. So the system can’t prove anything beyond that point. The system is not omniscient.
Let’s walk through a depth-first search and traversal of this tree. You start at the top node and go as deeply as possible:
1) Start at the highest node
2) Go as deeply as possible down one path
3) When you run into a dead end, back up to the last node that you turned away from. If there is a path there that you haven’t tried, go down it. In this case, we have to go up several nodes before we arrive at a node with another option. Follow this option until you reach a dead end or a goal.
4) This path leads to another dead-end, so go back up a node and try the other branch.
5) This path leads to a goal. In other words, this final node is a case in which you have to prove that #$true is true. So you have one answer. Keep searching for other answers by going up a couple more nodes and then down a path you haven’t tried.
6) Another dead-end! So go back up a couple of nodes and try another path.
Continue going as deeply as possible on every path, bounded only by goals and dead-ends.
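The traversal above can be sketched in a few lines of Python. Everything here is hypothetical scaffolding for illustration: the tree is a toy stand-in for a real proof tree, "true" marks a goal node, and a node with no children is a dead end.

```python
# Illustrative sketch of the depth-first traversal described above.
# Each node maps to its child subgoals; "true" is a goal node (nothing
# left to prove), and a node with no children is a dead end.

def depth_first(tree, node, path=()):
    """Yield every path from node down to a goal node ('true')."""
    path = path + (node,)
    if node == "true":                     # goal: one successful proof
        yield path
        return
    for child in tree.get(node, ()):       # no children -> dead end
        yield from depth_first(tree, child, path)

# A toy proof tree: the query branches into subgoals; one branch bottoms
# out in "true" (a successful proof), the others dead-end.
tree = {
    "query": ["subgoal-A", "subgoal-B"],
    "subgoal-A": ["dead-end-1"],
    "subgoal-B": ["dead-end-2", "true"],
}
for proof in depth_first(tree, "query"):
    print(" -> ".join(proof))
# → query -> subgoal-B -> true
```

Note that this sketch has exactly the weakness of blind depth-first search: on a tree with an infinitely deep branch, the recursion would simply never return -- the rabbit hole.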
The advantage of depth-first search is that it is an algorithmically efficient way to search a tree: it limits the amount of space that you have to keep for remembering the things you haven’t looked at yet. All you have to remember is the path back up and, for each of the steps along that path, the sibling nodes you haven’t explored yet. The amount you have to remember is proportional to the average branching factor of the tree times the depth of the tree. So it scales roughly with the depth of the tree, and depth is a rather scalable thing -- not very explosive.
The disadvantage with depth-first search is that once you get started down some path, you will trace it all the way to the bitter end -- and we’ve got some infinite bitter ends in our system. So it can be like Alice in Wonderland going down the rabbit hole -- there’s no end to this rabbit hole.
Let me give you an Alice-in-Wonderland-like proof from our history.
Several years ago we had a system which wanted to demonstrate two Cyc systems collaborating to answer something (we referred to this as “Cyc-ic Friends”). We had two Cycs. One had a bunch of political knowledge; one had a bunch of geographic knowledge. If you asked some geo-political question like “Who are all of the elected leaders of countries north of the equator?” neither Cyc alone had enough information to actually answer this question. But when they collaborated with each other, the geographic Cyc would try to answer until it got stuck on a political thing. Then it would say, “I know you know some political stuff -- I’ll ask you. If you can answer this question, I’ll take your answer and splice it in and continue.” So if they can collaborate, then the two of them together can answer something that neither one alone could answer.
In this particular situation, we asked “Who are all of the elected leaders of countries north of the equator?” At the time it came up with some of the obvious answers quickly: Bill Clinton, John Major, etc. And then it sort of ground away for a while. I interrupted it to see what it was actually attempting to prove at one point.
It was way down the rabbit hole on one proof, trying to prove that #$NelsonMandela was an instance of #$LawnFurniture. “What?!!” It turns out that it had gone really deep down some potential path and had clearly reached something logically nonsensical, so this path was doomed to fail. But it still had lots of ways available that it thought it could use to keep trying. So this is an example of the halting problem: it thought it still had some mechanism available -- if I just work a little bit harder, I might actually be able to prove that he is lawn furniture. It turns out that at that point we were missing some constraints about the disjointness between #$BiologicalArtifacts and #$Furniture -- something like that.
This highlights another aspect of why our system is so large. At each step along the way it had made one logical deduction, conditioned on some default-true rule -- a rule which, locally, is almost always true. Notice that Cyc did not produce a nonsensical answer, although it was doing some reasoning that appeared nonsensical.
In this example, the steps were something like the following:
Certain objects are usually found only north of the equator. Certain of those objects are geographic regions of particular types. Certain geographic regions are typically outdoor regions. Certain objects would be found in an outdoor region. And certain objects, like lawn furniture, are typically found outdoors….
Each of these steps along the way involves a default-true rule which in isolation is almost always true -- there are some exceptions. But when you start chaining a number of these default-true rules together, the farther afield you get, the more likely you are to run into some exceptional condition. So, at some point along the way we crossed the line and were attempting to prove that a person is an artifact.
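A quick back-of-the-envelope illustration of why chaining default rules degrades. The 95% reliability figure is invented purely for illustration, not a Cyc statistic: if each default-true rule holds in 95% of cases, the chance an n-step chain avoids every exception is 0.95^n, which shrinks exponentially.

```python
# Illustrative arithmetic only -- the 95% figure is made up.
# If each default-true rule holds with probability p, a chain of n such
# rules avoids every exception with probability p ** n.
p = 0.95
for n in (1, 5, 10, 20):
    print(f"{n:2d} steps: {p ** n:.3f}")
# → 1 step 0.950, 5 steps 0.774, 10 steps 0.599, 20 steps 0.358
```

Even with highly reliable individual rules, a twenty-step chain is more likely than not to have crossed some exceptional condition -- like the one that led from “north of the equator” to “lawn furniture.”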
This gets us into another trade-off. At every point along the way in doing these proofs, you’re often faced with the halting problem -- should I just try and do a little more work to try and prove this? Or should I actually try and do a meta-proof by looking at the entire proof and asking if it is nonsensical. Can I prove that I shouldn’t even be bothering to try and do this proof here? You can spend your effort by either trying to do more of the proof or trying to prove if the proof is going to be fruitful at all.
You might not be able to prove, for example, that Nelson Mandela is not a piece of lawn furniture -- maybe you’re missing some constraint. Maybe he is. Maybe certain people are robots and certain robots are robotic lawn furniture, right? You’re missing some key constraint. You might not be able to prove that this whole line of reasoning is nonsensical. So you’re facing this trade-off: you only have so many resources available. I can spend some of them cranking out more of this tree, or I can spend some of them trying to prune off large sections of the tree -- saying don’t bother, this is never going to come up with an answer.
This is analogous to doing a geometry proof. You might be forced to try to do a constructive proof for a while. Then you run into problems and say, “Wait a minute.” Now I’m going to use a different mode -- something like proof by contradiction. I’ll assume that the thing I’m trying to prove is not true and see if I can run into some cheap contradiction. If I do, then my original assumption must not be true. Therefore, that’s a proof.
So it’s a lot like deciding when to switch modes. Can I prove this is true? Or should I switch into a different mode where I try to prove that this whole thing could not possibly be true, and save myself a bunch of future work by pruning it off here? Because we’re dealing with actual computer hardware, a finite machine, you have to decide how you’re going to spend your resources. It’s a difficult trade-off between trying to do more and trying to prove that you shouldn’t be trying to do more. Work harder or work smarter? You could spend all day analyzing your job, too, and never do your job. You can’t do all of one and not any of the other.
The previous examples demonstrate that in our system, for virtually all interesting proofs, you will have rabbit holes. You really will have a way to generate infinitely many and infinitely deep proofs. So you can’t just use blind depth-first search. We have these Alice-in-Wonderland holes for almost everything.
Another strategy for searching is a breadth-first search. This is where you search layer by layer. First you try to do all of the zero-step proofs, then you try to do all of the one-step proofs, then all of the two-step proofs, etc. So the advantage of breadth-first search is that you’re guaranteed to get this Ockham’s Razor benefit where you’ll get all of the simplest proofs before you get anything that’s strictly more complicated. If there is an n-step proof, you’ll look at it before you even look at any (n+1)-step proofs. You get the simplest ones early.
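As a sketch (again over a toy tree, not Cyc’s actual machinery), breadth-first search keeps a first-in-first-out queue of unexplored paths, so shorter proofs are always found before longer ones:

```python
from collections import deque

# Layer-by-layer search: any n-step proof is found before any
# (n+1)-step proof, because the frontier is a FIFO queue.

def breadth_first(tree, root):
    """Yield goal paths in order of proof length (shortest first)."""
    frontier = deque([(root,)])            # "the stuff I haven't looked at yet"
    while frontier:
        path = frontier.popleft()
        node = path[-1]
        if node == "true":                 # goal node
            yield path
            continue
        for child in tree.get(node, ()):   # no children -> dead end
            frontier.append(path + (child,))

# Toy tree with a one-step proof and a three-step proof: breadth-first
# finds the simpler one first.
tree = {
    "query": ["deep-branch", "true"],
    "deep-branch": ["deeper"],
    "deeper": ["true"],
}
for proof in breadth_first(tree, "query"):
    print(" -> ".join(proof))
# → query -> true
# → query -> deep-branch -> deeper -> true
```

The `frontier` queue is exactly the storage problem discussed next: on a bushy tree it holds whole layers at once, and its size grows with the branching factor raised to the power of the depth.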
The disadvantage of breadth-first search is that just as we’ve got these huge deep trees, we also have huge bushy trees. We have steps like the one on the right of the slide where we could have thousands, tens of thousands, of child nodes. For example, you might be doing a proof and you come to a point where if you could just find a person, some random person, then you might be able to do one more step of inference. At this point I could use any person to answer this. But let’s say that you have available to you every current living person represented. You could potentially have a six billion fan-out factor right there. There are even worse situations. For example, if all you need is an integer -- there is an infinite number of integers. So there are cases where you can’t even generate all of the child nodes because they are infinite.
Another disadvantage of breadth-first searching is the amount of space you have to use to store “the stuff I haven’t looked at yet” is truly explosive. By the time I ever look at the last node on the second layer of the tree on this slide, I may have to have stored the entire third layer because I will have generated it from all of the others in the second layer -- all of their children. So, if the third layer is explosively large, I would have to store all of them before I could even look at them. The amount of space you use to store is proportional to the average branching factor to the power of the depth of the tree. So that grows exponentially and explosively -- especially when the average branching factor is huge.
With a breadth-first search, the deeper you go into the tree, you have enormous space requirements for storing that which you haven’t looked at yet. There just isn’t enough memory in a computer to store an infinitely large third step -- the space of possible solutions is far too large.
So we run into the problem where two of the traditional algorithms for search, depth-first and breadth-first, are going to run into problems with a large system like ours. If you notice the ellipses on the left and right of the chart, you’ll see that if you used either algorithm, you’d never get to the goal-node in the middle. In the depth-first case, you’d get stuck down the rabbit hole. In the breadth-first case, you’d get lost generating endless nodes on the first step.
In our system, because we can’t generate all of the possible nodes -- the incompleteness of it -- we have to address searching in a different way. The search strategy we use attempts to identify the most promising path below, expanding nodes in order of which ones look most promising at the time. If we have a good heuristic -- one that recognizes that the node on the upper right is going to generate an enormous number of children, or that the node on the lower left looks like it leads down a rabbit hole -- then we ought to be able to home in on that goal node before we look at other, less promising ones. This is called a heuristic search. You search the tree in order of heuristic quality, thereby pushing off the parts that might go on infinitely. The devil is in the details here. In this case, the detail is: you had better have a good heuristic. At each point you’d better be able to look at a node and know whether it looks like it’s headed toward success. So when we go down the solid purple path, each node should get a higher and higher heuristic estimate -- looks good, looks better, looks really good -- and eventually hit the goal, where the heuristic should say “obviously these are the best nodes.” So, we have explosive trees, and we work on them in the order that looks most promising.
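A minimal sketch of that idea, using a priority queue and a made-up scoring function (lower score = more promising; designing a good real heuristic is the hard part, and this toy score is pure assumption):

```python
import heapq

# Best-first (heuristic) search: always expand whichever frontier node
# the heuristic currently scores as most promising.

def best_first(tree, root, score):
    """Return the first goal path found, expanding best-scored nodes first."""
    frontier = [(score(root), (root,))]    # (heuristic score, path)
    while frontier:
        _, path = heapq.heappop(frontier)
        node = path[-1]
        if node == "true":                 # goal node
            return path
        for child in tree.get(node, ()):
            heapq.heappush(frontier, (score(child), path + (child,)))
    return None

# Hypothetical scores: the rabbit-hole branch scores badly, so it is
# never expanded even though it is infinitely deep.
scores = {"query": 0, "promising": 1, "rabbit-hole": 9, "true": 0}
tree = {
    "query": ["rabbit-hole", "promising"],
    "promising": ["true"],
    "rabbit-hole": ["rabbit-hole"],        # an infinite branch
}
print(best_first(tree, "query", scores.get))
# → ('query', 'promising', 'true')
```

Note what the toy example shows: depth-first search would disappear down the `rabbit-hole` branch forever, and breadth-first would waste space on it, but with a decent heuristic the search reaches the goal without ever expanding the bad branch.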
- The Search Tree
- Incompleteness of Logic
- The Halting Problem
- Infinite Possibilities
- Work harder or work smarter?
- Infinite fan-out
- Infinite space required to store possible solutions
This concludes the section on the Incompleteness of Searching.