Mistakes to Avoid

9.1. Skipping the Predicate

9.1.1. What is "skipping the predicate" and why is it bad?

"Skipping the predicate" is using a functional (NAT) representation without any underlying semantic predicate representation. It's bad because it (a) restricts inference capability to terms that have a particular syntactic form, even if the semantics should allow the inference, (b) forces ontologists to overuse forward rules, and (c) causes ontologists to write rules that are more complicated than necessary, and thus have poor representational modularity as well as suboptimal inference efficiency.

NAT functions should almost always be viewed as constructors for underlying predicate vocabulary. Therefore, the defining rules for the NAT should usually be forward, and conclude to the predicate representation. Then, other rules (which may be forward or backward as appropriate) define the predicate representation.

When "skipping the predicate" occurs, ontologists write rules going straight from the NAT representation past any intermediate predicate representation. Thus, they almost always have to make their rules forward in order to conclude correct behavior from a new #$termOfUnit, even if the final result need not be concluded forward. Also, they have to duplicate the NAT quantification in every rule.

The reason "skipping the predicate" is easy to do is that #$termOfUnit can be used as the (very weak) underlying predicate representation. In other words, the relationship between a NAT and its args is implicit in the very representation of the NAT, and rules that quantify into NATs exploit this syntactic relationship.

Most rules should be defining and depending on semantic relationships rather than syntactic ones. Therefore, it is important that the underlying relationship hinted at syntactically in a NAT be made semantically explicit.

Here's a perfect example from KB history:

Originally the NAT function

  #$SaintFn : #$Person -> #$Saint

was created to denote things like

    (SaintFn ThomasAquinas)

However, no explicit relationship was defined between

    (SaintFn ThomasAquinas)

and #$ThomasAquinas! There was only the implicit one: that the NAT is constructed from two terms, its ARG0 and ARG1.

There were some rules defining #$SaintFn, and all of them were more complicated than necessary because the representation "skipped the predicate". One example:

  (implies
    (termOfUnit ?SAINT (SaintFn ?PERSON))
    (startsAfterEndingOf ?SAINT ?PERSON))

One drawback of this representation is that if we reified

  #$SaintThomasAquinas

we would have no way to conclude that he starts after the ending of #$ThomasAquinas, because the CycL term #$SaintThomasAquinas does not match the syntactic pattern (SaintFn ?PERSON) in the faulty rule.

Clearly, there is a deeper semantic relationship between Thomas Aquinas and St. Thomas Aquinas than "arg 1 of NAT":

  #$saintlyIncarnation : #$Person x #$Saint

With this, the only forward defining rule needed for #$SaintFn is:

  (implies
    (termOfUnit ?SAINT (SaintFn ?PERSON))
    (saintlyIncarnation ?PERSON ?SAINT))

or, equivalently and more simply:

  (saintlyIncarnation ?PERSON (SaintFn ?PERSON))

Now, the rules that used to define #$SaintFn are used to define #$saintlyIncarnation instead, in a much simpler and more efficient form. For example, we can now unassert

  (implies
    (termOfUnit ?SAINT (SaintFn ?PERSON))
    (startsAfterEndingOf ?SAINT ?PERSON))

and replace it with

  (genlInverse saintlyIncarnation startsAfterEndingOf)

Reifiable functions should be considered constructors of objects which participate in relationships. In almost every case, there should be an underlying predicate for which the function is defining uniform uses. The underlying relationship that #$SaintFn was implicitly defining was

  "relation between a person and the associated saint of that person"

That relation is at least as important as #$SaintFn, and deserves to be semantically explicit rather than syntactic and implicitly hidden inside #$termOfUnit.
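
The contrast between the two rule styles can be sketched in Python. This is only a toy model under stated assumptions: NATs are modeled as tuples, reified constants as strings, and the helper names (syntactic_rule, defining_rule, semantic_rule) are hypothetical illustrations, not part of Cyc.

```python
# Toy model: a NAT is a tuple (function, arg); a reified constant is a string.
# All names here are illustrative assumptions; none of this is Cyc's API.

THOMAS = "ThomasAquinas"
SAINT_NAT = ("SaintFn", THOMAS)      # stands for (SaintFn ThomasAquinas)
SAINT_CONST = "SaintThomasAquinas"   # a separately reified constant


def syntactic_rule(terms):
    """Rule that skips the predicate: fires only on terms shaped like (SaintFn ?PERSON)."""
    return {
        ("startsAfterEndingOf", term, term[1])
        for term in terms
        if isinstance(term, tuple) and term[0] == "SaintFn"
    }


def defining_rule(terms):
    """Forward defining rule: (saintlyIncarnation ?PERSON (SaintFn ?PERSON))."""
    return {
        ("saintlyIncarnation", term[1], term)
        for term in terms
        if isinstance(term, tuple) and term[0] == "SaintFn"
    }


def semantic_rule(facts):
    """Models (genlInverse saintlyIncarnation startsAfterEndingOf)."""
    return {
        ("startsAfterEndingOf", saint, person)
        for (pred, person, saint) in facts
        if pred == "saintlyIncarnation"
    }


terms = [SAINT_NAT, SAINT_CONST]

# Skipping the predicate: only the NAT matches the syntactic pattern.
syntactic_conclusions = syntactic_rule(terms)

# Predicate-based: the NAT gets its fact from the defining rule, and the
# reified constant is covered by one directly asserted fact.
facts = defining_rule(terms) | {("saintlyIncarnation", THOMAS, SAINT_CONST)}
semantic_conclusions = semantic_rule(facts)
```

In this sketch the syntactic rule never reaches the reified constant, while the predicate-based rule covers both the NAT and the constant without knowing how either term was built.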

The arguments to any NAT should always have some underlying predicate relationship to the NAT itself. In the case of a high-arity NAT, there may be many predicates being skipped, not just one. For example, consider the binary function #$BorderBetweenFn. We could use the ternary predicate #$bordersOnRegion to relate the two regions to their border, or we could use the binary predicate #$borderOf to relate the border to each region it borders. Sometimes you will need a different predicate to relate each argument to the NAT; sometimes you can use the same one. Often you will find that the underlying semantic relationship between the NAT and one of its arguments is #$resultIsaArg or #$resultGenlArg.

9.1.2. How to avoid skipping the predicate

CycL is more strongly predicate-based than function-based. Hence, you should always have a predicate-based representation; a function-based representation is optional. The best way to avoid skipping the predicate is to think in terms of predicates, and that way you'll only create functions if their corresponding predicates are already defined.

However, if you come across a case where the predicate has been skipped, you can usually use the following set of instructions to clean it up.

Let's call the instance of #$Function-Denotational FUNC. You need to find or create a corresponding predicate, which we'll call PRED.

  1. First, make sure that there is no existing predicate which can fill the role of PRED.
  2. If there is not, then create PRED, make it an instance of #$FunctionalPredicate, and make the appropriate #$argFormat #$SingleEntry assertion on PRED.
  3. Assert the proper arg constraints on PRED.
  4. Assert the atomic defining rule for the function, with forward direction. For example, if N (the arity of FUNC) is 1, the defining rule would be:
      (PRED (FUNC . ?ARGS) . ?ARGS)
  5. Thereafter, never use a NAT formed with FUNC in a rule. Use PRED instead.
  6. It is also useful to assert a #$functionCorrespondingPredicate-Canonical GAF linking FUNC and PRED. See #$functionCorrespondingPredicate-Canonical for more details.
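
As a rough illustration of step 4, the following Python sketch spells out the dotted template (PRED (FUNC . ?ARGS) . ?ARGS) for a fixed arity. The function name, the ?ARG1, ?ARG2, ... variable names, and treating the arity as a parameter are all assumptions made for this example; nothing here is a Cyc tool.

```python
def defining_rule(pred: str, func: str, arity: int) -> str:
    """Expand (PRED (FUNC . ?ARGS) . ?ARGS) into an explicit rule for a given arity.

    Hypothetical helper for illustration only; variable names are made up.
    """
    if arity < 1:
        raise ValueError("arity must be at least 1")
    # Build "?ARG1 ?ARG2 ..." once and reuse it inside and outside the NAT.
    args = " ".join(f"?ARG{i}" for i in range(1, arity + 1))
    return f"({pred} ({func} {args}) {args})"


unary = defining_rule("PRED", "FUNC", 1)
binary = defining_rule("PRED", "FUNC", 2)
```

The sketch follows the argument order of the template in step 4. If the predicate found in step 1 orders its arguments differently, as #$saintlyIncarnation does, the generated rule would need to be adjusted by hand.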

9.1.3. Examples of skipping the predicate, and how to repair each one

Here are some more real examples from KB history in which Cycorp ontological engineers skipped the predicate, or identified a case where the predicate was being skipped.

9.1.3.a. #$VaccineAgainstMicroorganismFn skipping #$vaccineEffectiveMicroorgType

Example:

  (implies
    (and
      (termOfUnit ??VACC (VaccineAgainstMicroorganismFn ?MICTYPE))
      (isa ?MIC ?MICTYPE))
    (hasAttributes ?MIC Pathogenic))

Using #$termOfUnit in this fashion makes what is in essence a semantic rule depend on a happenstance syntactic form. This is exactly why the predicate shouldn't be "skipped".

Instead, we should have #$vaccineEffectiveMicroorgType, which provides a semantic relationship between ?VACC and ?MICTYPE.

Then we would have the forward rule:

  ;; define NAT in terms of predicate
  (implies
    (termOfUnit ?VACC (VaccineAgainstMicroorganismFn ?MICTYPE))
    (vaccineEffectiveMicroorgType ?MICTYPE ?VACC))

Then, the above rule can be reformulated as this semantic rule:

  ;; ??VACC naming convention since the variable is a don't care
  (implies
    (and
      (vaccineEffectiveMicroorgType ?MICTYPE ??VACC)
      (isa ?MIC ?MICTYPE))
    (hasAttributes ?MIC Pathogenic))

Even better, we should have

  ;; this could justifiably be forward
  (implies
    (vaccineEffectiveMicroorgType ?MICTYPE ??VACC)
    (relationAllInstance hasAttributes ?MICTYPE Pathogenic))

and leverage the HL modules for relationAllInstance for backward inference.

9.1.3.b. #$TravelingByMeansOfFn skipping #$transportMeansOfType

Example:

The following query

  (tourTransportTypes EHbT-Tour 
    (TravelingByMeansOfFn ?TBT))

relying on the supporting rule in TourAndVacationPackageItinerariesMt

  (implies 
    (and 
      (tourHighlights EHbT-Tour ?EVENT-TYPE) 
      (genls ?EVENT-TYPE ?TRAVELINGBYMEANSOFFN) 
      (termOfUnit ?TRAVELINGBYMEANSOFFN 
        (TravelingByMeansOfFn ?MEANS)))
    (tourTransportTypes EHbT-Tour ?TRAVELINGBYMEANSOFFN))

is an example of skipping the predicate for #$TravelingByMeansOfFn.

We should have #$transportMeansOfType, with one forward rule:

   (implies
     (termOfUnit ?NAT (TravelingByMeansOfFn ?VEHICLE-TYPE))
     (transportMeansOfType ?VEHICLE-TYPE ?NAT))

All 6 rules involving

  (termOfUnit ?NAT (TravelingByMeansOfFn ?VEHICLE-TYPE))

should be rewritten in terms of #$transportMeansOfType.

The query could then be

 (thereExists ?TRANSPORT-TYPE
  (and
   (tourTransportTypes EHbT-Tour ?TRANSPORT-TYPE)
   (transportMeansOfType ?VEHICLE-TYPE ?TRANSPORT-TYPE)))

9.1.3.c. #$BodyPartFn skipping #$anatomicalParts

Example:

The function #$BodyPartFn in the KB is insufficiently defined unless it is hooked up to a predicate. In this case, the predicate #$anatomicalParts exists, and we can either use that predicate, or create a more precise spec-pred of #$anatomicalParts.

Don't skip the predicate

A rule like this would be skipping the predicate:

  (implies
    (isa ?GIRAFFE Giraffe)
    (hasVisibleSurfacePatternType 
      (BodyPartFn ?GIRAFFE Neck-AnimalBodyPart)
      SpottedPattern))

Or, after canonicalization,

  (implies
    (and
      (isa ?GIRAFFE Giraffe)
      (termOfUnit ?BODYPARTFN (BodyPartFn ?GIRAFFE Neck-AnimalBodyPart)))
    (hasVisibleSurfacePatternType ?BODYPARTFN SpottedPattern))

So, the right way to write this rule, instead of linking the giraffe with its neck via the syntactic relationship #$termOfUnit, is to create a new predicate to use as the underlying semantic relationship. Let's call it #$uniqueBodyParts, and rewrite the rule as follows:

  (implies
    (and
      (isa ?GIRAFFE Giraffe)
      (isa ?NECK Neck-AnimalBodyPart)
      (uniqueBodyParts ?GIRAFFE ?NECK))
    (hasVisibleSurfacePatternType ?NECK SpottedPattern))

Multiple functions can use the same predicate

The suggested fix for the "Skipping the Predicate" malady does not require a 1-1 relationship between denotational functions and their underlying predicates.

There just should not be a 1-0 relationship for any of them.

So several functions could relate to the same predicate, like #$anatomicalParts. Hence we do not need to create #$uniqueBodyParts just for this reason, and in fact our rule would be even better and more general if we used #$anatomicalParts instead of #$uniqueBodyParts.

  (implies
    (and
      (isa ?GIRAFFE Giraffe)
      (isa ?NECK Neck-AnimalBodyPart)
      (anatomicalParts ?GIRAFFE ?NECK))
    (hasVisibleSurfacePatternType ?NECK SpottedPattern))

This would apply to, for example, two-necked giraffes.

Appendix: An Object-Oriented Programming Analogy

For those familiar with object-oriented programming, the following analogy may help you understand why skipping the predicate is undesirable.

CycL terms are like objects, and NATs are like specialized constructors for objects. When you're designing an object hierarchy, it's important to have the methods inherit from the class, not from the particular way it was constructed. Skipping the predicate is like putting all the methods inside the constructor. The methods should apply to any instance of a class, regardless of the method by which that object was constructed.
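
The analogy can be made concrete with a small Python sketch. The class and method names below are hypothetical and chosen only to mirror the #$SaintFn example; they are not a model of how Cyc is implemented.

```python
class Saint:
    """Toy class: behavior lives on the class, not in any one constructor."""

    def __init__(self, name, person):
        self.name = name
        # Plays the role of the explicit underlying predicate link.
        self.person = person

    @classmethod
    def from_person(cls, person):
        """Specialized constructor, analogous to a NAT function such as SaintFn."""
        return cls(f"Saint {person}", person)

    def starts_after_ending_of(self):
        """Available on every instance, however it was constructed."""
        return self.person


# One instance built through the specialized constructor, one built directly.
via_constructor = Saint.from_person("Thomas Aquinas")
built_directly = Saint("St. Thomas Aquinas", "Thomas Aquinas")
```

Both objects answer starts_after_ending_of() in the same way, because the method is defined on the class rather than tucked inside from_person. Skipping the predicate corresponds to the design where only objects created through from_person would get that behavior.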

9.2. "Don't-Care" Variables

9.2.1. What are don't-care variables?

A special naming convention for CycL variables is the "??" prefix. Prepending two question marks instead of just one indicates to Cyc that the variable is a "don't-care" variable. "Don't-care" variables are variables in rules for which you don't care about the bindings. Here's an example usage of a don't-care variable.

  (implies
    (singular ?WORD ??STRING)
    (posForms ?WORD SimpleNoun))

This rule says that if a word has a singular form, no matter what the singular form is, the word is going to be a SimpleNoun.

9.2.2. Why are don't-care variables useful?

Consider a rule with a typo in one of its variable names:

  (implies
    (objectHasColor ?OBJECT CrimsonColor)
    (objectHasColor ?OBJETC RedColor))

Because of the typo in the second variable, this is logically equivalent to

    (or
      (forAll ?OBJECT
        (not
          (objectHasColor ?OBJECT CrimsonColor)))
      (forAll ?OBJETC
        (objectHasColor ?OBJETC RedColor)))

So, either every object is non-crimson, or every object is red. Although implausible, this is semantically well-formed. Typos in variables can lead to outrageous inferences. To help ontologists track these down, there are diagnostics (accessible via the Diagnose button in the Cyc Browser) that can be applied to rules to identify variables that only appear once in a rule. However, occasionally you want to use a variable exactly once in a rule, as in the original example:

  (implies
    (singular ?WORD ??STRING)
    (posForms ?WORD SimpleNoun))

where the variable ??STRING is only mentioned once. If you explicitly label such variables as don't-care, as in this example, the diagnostics will assume you knew what you were doing and will not flag them.
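
The kind of check such a diagnostic performs can be approximated in a few lines of Python. This is a minimal sketch, not the actual Cyc Browser diagnostic: the function name, the regular expression used to recognize variables, and the choice to treat a rule as a plain string are all assumptions made for illustration.

```python
import re
from collections import Counter

# Assumed variable syntax: one or two question marks followed by a name.
VARIABLE = re.compile(r"\?{1,2}[A-Za-z][A-Za-z0-9-]*")


def suspicious_singletons(rule: str) -> list:
    """Return variables that occur exactly once and are not marked with '??'."""
    counts = Counter(VARIABLE.findall(rule))
    return sorted(
        var for var, n in counts.items()
        if n == 1 and not var.startswith("??")
    )


typo_rule = """
(implies
  (objectHasColor ?OBJECT CrimsonColor)
  (objectHasColor ?OBJETC RedColor))
"""

dont_care_rule = """
(implies
  (singular ?WORD ??STRING)
  (posForms ?WORD SimpleNoun))
"""

flagged_typo = suspicious_singletons(typo_rule)
flagged_dont_care = suspicious_singletons(dont_care_rule)
```

In this sketch the rule with the typo has both ?OBJECT and ?OBJETC flagged, since each occurs only once, while the rule using ??STRING produces no warnings.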