The IKB build is probably the most complex KB update-and-rebuild operation currently undertaken at Cycorp. Essentially it involves taking a slice of a continuously-revised Knowledge Base, and running a series of cleanup operations to insure that there are no seams showing from the numerous editings that occur when the IKB is extracted. It cannot be emphasized enough to those working on IKB related projects:
The (isa ?X IKBConstant) tag determines, and is the only thing that determines, what does or doesn't get included in the IKB. If you want a constant you are creating to be included in the IKB, it must be tagged in this way, using either IKBConstant itself, or a #$CycKBSubsetCollection that is a known spec of IKBConstant.
The implications of this should be borne in mind: in particular, there are two that are critical. First, if a constant is an IKBConstant, but some term that is referenced in a key piece of its definitional info (isa or genls, say, in the case of a collection) isn't, then it will go into the IKB missing that piece of definitional info. Therefore, you need to make sure that all of the terms needed to define your constant are also IKBConstants. I emphasize this particularly because it is the single biggest source of questions concerning whether or not a constant should be released. If questions of this nature arise, consult your project manager.
The other key point to bear in mind is that the microtheory in which an IKBConstant is defined must also be an IKBConstant; otherwise the term itself will go in, but with most of its assertions lost because they don't have an mt-of-assertion in the IKB. Again, if this fact gives rise to questions about what should or shouldn't be released, you should consult your project manager.
#$PublicConstant is the collection of CycL constants that are included in the part of the Cyc ontology that is released to the public. Many of constants in the so-called upper-ontology -- which comprises the most general concepts represented in Cyc -- are public constants. middle- and lower-ontology constants can be public as well, if deemed appropriate for public consumption. While all constants in the Cyc Knowledge Base should in principle be well-defined, nicely-commented, and adequately axiomatized, these virtues are particularly important in the case of public constants. Let's briefly consider each of these three virtues in turn.
A constant CONST is well-defined, in the present sense, when the "definitional" assertions on it are accurate (given the intended meaning CONST) and complete. The definitional assertions on CONST include (but are not limited to) most of those displayed when one brings up CONST in the KB Browser and selects the "Definitional Info" option in the Index. Thus included are atomic assertions in which CONST is the first argument (arg1) to a #$DefinitionalPredicate (or, in some cases, a #$SometimesDefinitionalPredicate or a strictly #$PossibleDefinitionalPredicate), such as #$isa, #$genls, #$arg1Isa, #$arg2Genl, #$arg3Format, #$genlPreds, #$genlInverse, #$negationPreds,#$negationInverse, #$interArgIsa1-2, #$disjointWith, #$collectionIntersection, or #$salientAssertions.
A good comment on a constant is clear, accurate, and as precise as it reasonably can be. It should include -- as appropriate -- examples, qualifications, borderline cases, exceptions, and so on. It should be grammatically perfect, and free of undefined acronyms with which the average non-Cyclist user cannot safely be assumed to be familiar.
A constant is adequately axiomatized for present purposes when, at a minimum, the content of its comment is completely represented in rules or other assertions. Ideally, all of these comment-explicating assertions should consist entirely of public constants, as assertions that contain any non-public constants will not be part of the public release. Also, ideally it should not contain any instances of #$ProprietaryConstant, as no #$ProprietyaryConstant can be a release constant of any kind. Axioms that are of central importance to the meaning of a constant and would not otherwise appear in the Arg1 index for the constant in the KB Browser, should be made its #$salientAssertions.
The predicates #$quotedCollection and #$quotedArgument are used in certain situations where one wants to use CycL to assert something about a CycL expression itself, or about a class of CycL expressions. Cyclists have found it useful to reify various collections whose instances are CycL expressions, and various predicates that relate CycL expressions to things of one sort or another. Such collections and predicates enable us to represent (and make inferences with) many key facts about the CycL language and the Cyc system as assertions in the Knowledge Base, rather than consigning them strictly to separate company documentation (which tends to become stale) and the minds of Cyclists (who can forget, misremember, or fail to understand things). For example, we have the quoted-collection #$PublicConstant, which is the collection of all the CycL constants included in the part of the Knowledge Base released to the public; and we have the predicate #$myCreator, which has a quoted-argument-place, and relates a given CycL constant to the Cyclist who created it.
A potential stumbling-block to the effective use of these kinds of collections and predicates it that there is currently no general mechanism in place for denoting particular CycL expressions within the CycL language itself. As with any language, to use a given CycL term is, usually, to refer to the term's (usual) denotation. But what if one wants to refer to the term itself? In natural languages like ordinary written English, of course, one can employ quotation marks (or italics) for this purpose: by enclosing an English word or phrase within a pair of quotes (or italicizing it) we indicate our intention to refer to that word or phrase itself (as opposed to whatever the word/phrase might normally denote or represent). There is nothing in CycL, however, analogous to direct quotation. (This is the case with most formal languages.) Hence, there is no direct way to say, in CycL, that a particular CycL expression is an instance of a given collection, or that a given predicate holds of the expression. #$quotedCollection and #$quotedArgument, however, enable us to make such assertions, albeit in an indirect way.
- A unary predicate that can take any instance of #$Collection as argument. (#$quotedCollection COLLECTION) means
- the instances of COLLECTION are CycL expressions and
- an #$isa statement in which COLLECTION appears as the second argument (or "arg2") -- i.e. a CycL sentence of the form (#$isa TERM COLLECTION) -- means that the CycL term TERM itself (as opposed to what TERM denotes) is an instance of COLLECTION.
Thus, (#$isa #$IndianOcean #$PublicConstant) means that the CycL constant '#$IndianOcean' is an instance of the collection #$PublicConstant; it does not mean the Indian Ocean (which is an ocean and not a constant) is a public constant.
- A binary predicate that takes a #$Relation (such as a #$Predicate or #$Function-Denotational) and a #$PositiveInteger as arguments. (#$quotedArgument RELATION N) means that RELATION's Nth argument-place is quoted. That is, a ground atomic formula (or GAF) built from RELATION and having a CycL term TERM in its Nth argument-place means that TERM itself (as opposed to what TERM denotes) stands in RELATION with the rest of the GAF's arguments. Thus, (#$myCreator #$Thing #$Lenat) means that the CycL constant '#$Thing' was reified by Doug Lenat; it does not mean that Lenat created the collection of all things.
A notable consequence of the meaning of #$quotedCollection as described above is that a GAF based on #$isa that has a quoted-collection as its second argument is interpreted somewhat differently than a similar GAF having a non-quoted-collection in that position. Further, since #$genls derives its meaning (via its rule macro expansion) from #$isa, #$genls statements involving quoted-collections also mean something slightly different than they otherwise do. This is especially true when one of the collections is quoted and the other is not. Suppose QUOTED-COL is a quoted-collection and NON-QUOTED-COL is a collection that is not quoted. Then (#$genls QUOTED-COL NON-QUOTED-COL) means that if a CycL term TERM is an instance of QUOTED-COL, then (not TERM itself, but) the thing denoted by TERM is an instance of NON-QUOTED-COL. (#$genls #$CycLExpression #$Thing) is an example such an assertion in the Knowledge Base. (A sentence of the form (#$genls NON-QUOTED-COL QUOTED-COL), though rarely asserted in the Knowledge Base, means that if a thing THING is an instance of NON-QUOTED-COL then any CycL term that denotes THING is an instance of QUOTED-COL.)
Similarly, certain #$genlPreds statements involving predicates with quoted argument-places have somewhat skewed meanings. Suppose QUOTED-PRED is a predicate whose Nth argument-place (only) is quoted and NON-QUOTED-PRED is a predicate with no quoted argument-places. Then (#$genlPreds QUOTED-PRED NON-QUOTED-PRED) means that if QUOTED-PRED holds of a given sequence SEQ of argument-values, then NON-QUOTED-PRED holds (not of SEQ itself, but) of the sequence obtained by replacing the Nth item in SEQ (which will be some CycL term) with what it denotes. If QUOTED-PRED has multiple quoted argument-places a completely analogous, though more complicated, replacement is required. (#$genlPreds #$equalSymbols #$equals) is an example of such an assertion in the Knowledge Base.
It should be noted that the sorts of meaning-shifts associated with #$quotedCollection and #$quotedArgument described above raise certain difficulties for the interpretation of CycL, only some of which can be explained here. Consider the facts, described above, that the meaning of a sentence of the form (#$isa X COLLECTION) varies depending on whether or not COLLECTION is quoted, and that of (PRED ARG1...ARGN) varies depending on whether any of PRED's argument-places are quoted. The question arises as to how to reconcile these shifts in sentence-meaning with the intended meanings of such sentences' constituent expressions. Let us define a quoted-occurrence of a CycL term TERM as any occurrence of TERM as either (i) the arg1 of an #$isa sentence whose arg2 is a quoted-collection or (ii) the argN of a sentence built with a predicate whose Nth argument-place is quoted. A simple and initially attractive account would be to say that quoted-occurrences of TERM denote TERM itself (as opposed to whatever TERM normally denotes). Thus, (#$isa #$IndianOcean #$PublicConstant) means what it does (see above) precisely because '#$IndianOcean' denotes itself (the term '#$IndianOcean') in that context, while the other constituents of the sentence -- in particular '#$isa' -- mean exactly the same thing they do in any other context. And the same account serves equally well in explaining the meanings of sentences built from predicates with quoted argument-places. On this approach, the source of such sentences' skewed meaning is traced soley to a meaning-shift associated with quoted-occurrences of terms.
But this approach, while providing a reasonable account of GAFs containing quoted-occurrences of terms, falters in the case of rules. In a rule, a single bound variable can occupy both "quoted" and "non-quoted" positions simultaneously. Take the above #$genls example and consider its rule-macro expansion:
(#$implies (#$isa ?VAR QUOTED-COL) (#$isa ?VAR NON-QUOTED-COL).
The second, non-quoted occurrence of '?VAR' in this rule is ordinary and unproblematic. But what about the first, "quoted" occurrence? If we extend the present approach to include variables, and say that as a quoted-occurrence it denotes itself, it follows that this occurrence of '?VAR' is functioning as a constant (that denotes a certain CycL variable) instead of as a genuine variable, and that, consequently, the rule says something very different from what it is intended to. Conversely, if we refuse to extend our approach to variables, and view both occurrences of '?VAR' as genuine occurrences of the same variable, the rule still would not say what it was intended to, but rather the (quite possibly false) claim that any instance of the quoted-collection QUOTED-COL (i.e. some CycL term) is an instance of the non-quoted collection NON-QUOTED-COL.
An alternate approach, which traced the meaning-shifts associated with #$isa sentences to a systematic ambiguity in the meaning of the predicate '#$isa' itself (instead of to the term occupying its first argument-place) might avoid this difficulty with rules and variables. But that approach would raise new difficulties of its own, one being that it seems incapable of being extended to a plausible account of the meaning-shifts involving #$genlPreds sentences. (Space limitations prevent a more thorough pursual of this alternate approach here.)
A more radical alternative -- which would obviate the need for predicates like #$quotedCollection and #$quotedArgument and the meaning-shifts they entail -- would be to introduce a general method into the CycL language such that, given any CycL term TERM, another term can be found or constructed that unambiguously denotes TERM. Such an alternative would amount to having something like direct quotation in CycL. It has been suggested that this could be done via the introduction of a "quotation function" on CycL expressions, but that is a dubious proposal. For there is not and cannot be a general function from things to their names, even if the named things are restricted to being linguistic expressions themselves, due to the simple fact that a single thing can have multiple names (Cyc's so-called unique name assumption notwithstanding). A more promising approach along the same lines would be to introduce a new class of special CycL terms expressly intended to denote other CycL terms, and then assign the former terms their denotations by some rules or principles stated in the meta-language of CycL instead of trying to do this via a function defined in CycL itself. Perhaps these special terms should (or must) themselves be precluded from being denoted by CycL terms, but that would seem to be a minor limitation. Implementing this latter approach, however, would require giving a considerably more explicit description of the meta-language of CycL than has hitherto been done.