### Canonicalization, resolution, and inference search in CycL

The core CycL inference engine algorithm is based on treating the inference problem as a search through proof-space for a proof which will provide satisfying bindings to a particular query.

Each step in the search represents a single inference step -- a single supporting formula in the eventual proof. The root nodes of the search indicate the starting points, or initial queries, which are to be proved. A particular goal leaf node in the search occurs when we have nothing left to prove down that path from root to leaf. At this point, the path upward from the goal node to its root node constitutes a proof that a particular set of formulas together prove that a particular set of terms satisfy the requested bindings of the query.

Each node in the search stores "the query so far" as its state, along with the formula used to infer this state from its parent's state. The mechanisms used to transform the parent state to the child node state are modus ponens and modus tollens via the resolution principle.

This document describes in more detail how the resolution principle is used in inference in CycL and how this is related to the canonicalization of assertions and queries.

### Conjunctive Normal Form Assertions

Rules and gafs in the system get canonicalized into conjunctive normal form (CNF) clauses.

Example 1:

(implies (and (p ?x) (q ?x) (not (r ?x))) (s ?x))

is canonicalized into CNF as follows:

(and (or (not (p ?x)) (not (q ?x)) (r ?x) (s ?x)))

Each disjunct in the outer conjunct is called a 'clause'. Note that in this case, there is a single clause. Since the set of all the assertions in the entire knowledge base are considered implicitly conjoined, we typically treat the resulting CNF as a set of clauses to add to this already very large knowledge base. Moreover, since each clause is a single disjunct of a set of literals, we can very tersely represent a clause as a list of two items: a set of negated literals and a set of the non-negated (positive) literals. Finally, since variable names are arbitrary, we canonically rename them to simplify the detection of identical clauses. Thus, the final terse representation of the single CNF clause given above is :

(((p ?var0)(q ?var0)) ((r ?var0)(s ?var0)))

Example 2:

(and (p a)(q a)(not (r a))(t a a))

is canonicalized into the four CNF clauses:

(() ((p a))) (() ((q a))) (((r a)) ()) (() ((t a a)))

Here's a BNF-style grammar for CNF clauses in CycL:

<CNF-clause> ::= ( <neg-lits> <pos-lits> ) <neg-lits> ::= list of <atomic-formula> <pos-lits> ::= list of <atomic-formula> <atomic-formula> ::= ( <predicate> . <args> ) <args> ::= list of <term>

At this point, <predicate> and <term> are defined as in the "Syntax of CycL" documentation page.

### Disjunctive Normal Form Queries

Queries to the system get canonicalized into disjunctive normal form (DNF), which is the logical dual of conjunctive normal form.

Example 1:

(and (s ?z) (t ?z ?z))

is canonicalized into DNF as follows:

(or (and (s ?z) (t ?z ?z)))

By analogy with the CNF case, each conjunct in the outer disjunct is also called a 'clause'. Note that in this case, there is a single clause. We typically treat queries as a set of DNF clauses to try to satisfy. Moreover, since each clause is a single conjunct of a set of literals, we can again very tersely represent a clause as a list of two items: a set of negated literals and a set of the non-negated (positive) literals. Finally, variables are canonically renamed as in the CNF case. Thus, the final terse representation of the single DNF clause given above is :

(() ((s ?var0)(t ?var0 ?var0)))

Example 2:

(or (not (r ?foo)) (s ?foo))

is canonicalized into the two DNF clauses:

(((r ?var0)) ()) (() ((s ?var0)))

Here's a BNF-style grammar for DNF clauses in CycL:

<DNF-clause> ::= ( <neg-lits> <pos-lits> ) <neg-lits> ::= list of <atomic-formula> <pos-lits> ::= list of <atomic-formula> <atomic-formula> ::= ( <predicate> . <args> ) <args> ::= list of <term>

Again, <predicate> and <term> are defined as in the "Syntax of CycL" documentation page.

### Resolution Inference in CycL

For now, let's consider queries which result in a single DNF clause. When a user asks a query, CycL performs a heuristic search through proof-space for a sequence of inference steps which will resolve the DNF query into "Box", which is the null DNF clause:

(()())

The root of the search is the DNF resulting from the user's query. Each step in the proof search maintains the DNF which is the current proof state of the search down that path. The DNF in each of these intermediate states in the search path results from resolving a single literal from its parent's DNF clause with a single literal from the CNF clause which was used to transform its parent state into this state.

The CNF clause used in one step of inference, from a parent node to one of its children in the search, comes from one of two places:

(a) the CNF clause of an assertion in the KB

(b) the equivalent CNF clause provided by an HL inference module.

There are two possible cases:

(1) Resolving a pos-lit in the DNF

For simplicity, A-F and X-Z represent ground literals here:

parent DNF: ((A B C) (D E F)) resolvent literal: E inference CNF: ((X Y)(E Z))

== Resolves into ==>

child DNF: ((A B C Z) (D X Y F))

(2) Resolving a neg-lit in the DNF

parent DNF: ((A B C) (D E F)) resolvent literal: B inference CNF: ((X B)(Y Z))

== Resolves into ==>

child DNF: ((A Y Z C) (D E F X))

In both cases, the actual resolution step proceeds as follows:

(1) Assuming that the parent DNF and inference CNF clauses are variable disjoint, identify a most general unifier of the resolvent literal in the parent DNF and the resolvent literal in the inference CNF.

(2) Remove the resolvent literals from both the parent DNF and the inference CNF to produce parent-DNF' and inference-CNF'.

(3) Apply the most general unifier to the parent-DNF' and inference-CNF' to produce parent-DNF'' and inference-CNF''. (Any duplicate resulting literals are removed.)

(4) Produce the child-DNF as follows:

child-DNF-neg-lits = neg-lits(parent-DNF'') Union pos-lits(inference-CNF'') child-DNF-pos-lits = pos-lits(parent-DNF'') Union neg-lits(inference-CNF'') child-DNF = make-DNF(child-DNF-neg-lits, child-DNF-pos-lits)

This algorithm is typically referred to as the "resolution principle".

### One complete inference

Here's a complete inference involving the above examples:

Query:

(and (s ?z) (t ?z ?z))

is canonicalized into the single DNF clause

(() ((s ?var0)(t ?var0 ?var0)))

This is the root node (level 0) of our search.<h4

parent (level 0) DNF: (() ((s ?var0)(t ?var0 ?var0))) resolvent literal: (t ?var0 ?var0) inference CNF: (() ((t a a))) Most general unifier: ((?var0 . a)) child (level 1) DNF: (() ((s a)))

#### Inference Step 2:

parent (level 1) DNF: (() ((s a))) resolvent literal: (s a) inference CNF: (((p ?var0)(q ?var0)) ((r ?var0)(s ?var0))) Most general unifier: ((?var0 . a)) child (level 2) DNF: (((r a)) ((p a)(q a)))

#### Inference Step 3:

parent (level 2) DNF: (((r a)) ((p a)(q a))) resolvent literal: (p a) inference CNF: (() ((p a))) Most general unifier: NO-OP child (level 3) DNF: (((r a)) ((q a)))

#### Inference Step 4:

parent (level 3) DNF: (((r a)) ((q a))) resolvent literal: (r a) inference CNF: (((r a)) ()) Most general unifier: NO-OP child (level 4) DNF: (() ((q a)))

#### Inference Step 5:

parent (level 4) DNF: (() ((q a))) resolvent literal: (q a) inference CNF: (() ((q a))) Most general unifier: NO-OP child (level 5) DNF: (() ()) = Box!

The level 5 child node has bottomed-out in a successful proof. The top-level binding set of ((?var0 . a)) can be computed as a trivial composition of the most general unifiers along the path to the successful goal node. The 5 inference CNFs used in the path enable a 5-step proof of

(and (s a) (t a a))

In general, each choice of resolvent literal and inference CNF was merely one of many possible options for transforming a search state from level N to level N+1. This is where heuristic search kicks in. CycL keeps track of all the non-goal (non-box) leaf nodes which it has yet to expand, and heuristically determines which node looks most promising to expand next, and which inference method looks like the most promising way to expand that node.

Search proceeds until either:

(a) a resource contraint is met, like

-time cutoff has been met

-number of successful answers cutoff has been met

(b) there are no more non-goal leaf nodes to expand.

Two other effects of resource constraints involve determining which non-goal leaf nodes to not expand:

(a) leaf nodes deeper than the depth cutoff are not further expanded.

(b) leaf nodes which have more than the backchain depth number of non-gaf inference CNFs along their inference path are not further expanded.

### Generalization to multi-clause DNF queries

Let's now consider multi-clause DNF queries such as:

(or (not (r ?bar)) (s ?bar))

This is canonicalized into two DNF clauses:

(((r ?var0)) ()) (() ((s ?var0)))

There are two ways to pursue a search for bindings for such disjuncts:

(1) Independent Disjunct Satisfaction

Bindings for any single one of the DNF clauses will of course satisfy the entire query. Therefore, each clause can in effect be pursued as independent queries. CycL currently supports this kind of simple, independent disjunctive search.

(2) Dependent Disjunct Satisfaction (proof by cases)

The other possible search strategy for disjunctive DNFs is to perform a proof-by cases approach. This is needed when no particular disjunct can be independently proven, but the disjunction as a whole can be proven from the knowledge base.

First note that a set of DNF clauses D1 ... Dn can be rewritten as:

(implies (and (not D1) ... (not Dn-1)) Dn)

.. where each DNF clause D is a conjunction of m literals, (and LIT1 ... LITm).

We are asking for proofs of bindings that will satisfy this implication.

CycL does not currently have support for this kind of query implemented in FI-ASK. However, CycL does already provide support for this kind of query using FI-PROVE. Ultimately, FI-ASK will be extended to support this kind of query by dispatching to FI-PROVE to handle this kind of proof-by-cases reasoning.