April 23, 2015

# The Internal Representation of Assertions

In previous versions of Cyc, formulas are stored and reasoned with in the same form in which they appear in the KB browser, e.g.

(implies       (and          (isa ?afp AdultFemalePerson)          (residesInRegion ?afp Guam))       (and          (acquaintedWith Zippy ?afp)          (likesAsFriend Zippy ?afp)))

In Cyc-10, formulas asserted to the KB are stored internally, and reasoned with, in conjunctive normal form (CNF). When converted to CNF, a formula gets rewritten as a conjunction of disjunctions of negated and non-negated literals. So, for example, the formula above would be written in CNF as:

(and       (or          (not (isa ?afp AdultFemalePerson))          (not (residesInRegion ?afp Guam))          (acquaintedWith Zippy ?afp))       (or          (not (isa ?afp AdultFemalePerson))          (not (residesInRegion ?afp Guam))          (likesAsFriend Zippy ?afp)))

Each of the conjuncts would become a separate assertion.

Converting to CNF is part of the job of the Cyc-10 canonicalizer. The canonicalizer turns CycL formulas into canonical form, so that that they can be added to the KB as assertions, looked up in the KB, etc. Some of the other things the canonicalizer does are outlined below.

In Cyc-10, as well as in earlier versions of Cyc, universal quantification is handled trivially by leaving universally quantified variables unbound, while existential quantification is handled through the use ofSkolem functions. Thus an assertion which is originally entered as:

(forAll ?A       (implies          (isa ?A Animal)          (thereExists ?M             (and                (mother ?A ?M)                (isa ?M FemaleAnimal)))))

will be converted to something like:

(implies         (isa ?A Animal)        (and            (mother ?A (SkolemFunctionFn (?A) ?M))           (isa (SkolemFunctionFn (?A) ?M) FemaleAnimal)))

and then further converted to CNF:

(and       (or          (not (isa ?A Animal))          (mother ?A (SkolemFunctionFn (?A) ?M)))       (or          (not (isa ?A Animal))          (isa (SkolemFunctionFn (?A) ?M) FemaleAnimal)))

Skolem functions are handled exactly like all other functions (except that Cyc creates and names the function term for you). You are free to rename the function term, or do anything else with it that you might do with a function such as MotherOf. In fact, in Cyc-10 the Skolem function itself is reified, so that the formula above becomes the following, and would be added to the KB as 6 separate assertions:

(and       (or          (isa SKF-8675309 SkolemFunction))       (or          (arity SKF-8675309 1))       (or          (arg1Isa SKF-8675309 Animal))       (or          (resultIsa SKF-8675309 FemaleAnimal))       (or          (not (isa ?U Animal))          (mother ?U (SKF-8675309 ?U)))       (or          (not (isa ?U Animal))          (isa (SKF-8675309 ?U) FemaleAnimal)))

Another task the Cyc-10 canonicalizer takes care of is the ordering of literals in assertions. The advantage of a canonical literal order is that it simplifies KB lookup; only a test for structural equality is required to determine that 2 formulas are the same.

The major advantage to using CNF as an internal representation is that it greatly simplifies the conceptual scheme used in inferencing, because all axioms have a uniform structure. For example, using CNF as a canonical form means that you get modus tollens inferencing “for free”. When you add

P and Q => R

to the system, it gets canonicalized to the CNF form

not(P) or not(Q) or R

Note that both of the following would be canonicalized to the same CNF form:

P and not(R) => not(Q)    Q and not(R) => not(P)

Since the internal CNF can be used to attempt to prove any of three literals of which it is constituted, it’s as if all three forms of the rule were virtually present in the KB.

There is only one potential downside to using CNF, which is that certain types of assertions which can be expressed quite compactly in conditional form become somewhat unwieldy when converted to CNF. Specifically, an assertion of the form:

(implies       (or  P1 P2 P3 ... Pm)       (and Q1 Q2 Q3 ... Qn))

will result in a CNF with m times n conjuncts. However, we do not regard this as a significant problem, since asserting formulas of this form constitutes bad KE style, and thus is unlikely to occur very often. In particular, a knowledge enterer who writes such a formula is probably attempting to use the P1 … Pm literals as an exhaustive list of cases in which the consequent should hold, when he or she should have been shooting for a single meaningful generalization.