April 20, 2015

Documenting the KB


Every constant in Cyc should be well-documented. While the term “documentation” refers to any English text in the Knowledge Base used to explain the intended meaning of a CycL constant, ontological engineers are only required to give every constant a comment, which is the most common form of documentation in the KB. Whenever a constant is created, it is important to attach a clear and illuminating #$comment to it. While it may appear to the Cyclist creating the constant that the name of the constant gives all the significant information, future Cyclists may not agree. By reading the comment, a Cyclist should be able to understand both the meaning of a CycL assertion containing the constant (assuming she understands the meaning of all of the other constants in the assertion) and how to construct future assertions using this new constant.

More detailed guidelines on writing comments are available in the next section, “A Style Guide for Writing #$comments.” It may also be useful to look at the comment on #$comment.

When writing a comment, be sure to mention what type of thing is denoted by the constant (i.e. individual, collection, function, or predicate). Then use clear and grammatically correct prose to attempt to explain what counts as a proper use of the constant.

Let’s look at some examples of well-written comments:

4.1.1. Collection: StuffType

comment: "The collection of all collections that are stuff-like in at least one respect. A collection COL is stuff-like just in case there is some sense of 'part' according to which every part of an instance of COL is itself an instance of COL.

More precisely, for a collection to be a StuffType it is sufficient that there be some spec-pred of #$parts, PARTPRED, such that if (isa OBJECT1 COL) and (PARTPRED OBJECT1 OBJECT2), then (isa OBJECT2 COL). (See CyclistNotes for more detail). Here are two examples. Consider #$Breathing. Take an instance of that, say, a ten minute long period in which I am breathing. Imagine some two minute snippet of that, one of its #$timeSlices (a spec-pred of #$parts). That, too, is an instance of #$Breathing. So #$Breathing is a #$StuffType, since all #$timeSlices of an instance of #$Breathing are also instances of #$Breathing. Consider #$Water. Take any instance of #$Water -- say the water in the Pacific Ocean. Now take any portion of that water -- say a handful that I scoop up near Honolulu, one of its #$physicalPortions (a spec-pred of #$parts). That handful is itself an instance of #$Water. Hence #$Water is a #$StuffType, in virtue of the fact that all #$physicalPortions of all instances of #$Water are themselves instances of #$Water. Other examples are: #$AbstractInformationalThing, which is stuff-like with respect to #$subInformation; #$CharacterString, which is stuff-like with respect to #$subCharacterStrings; and #$List, which is stuff-like with respect to #$sublists. These examples are somewhat exceptional -- most #$StuffTypes are like the examples of #$Breathing and #$Water. Before using #$StuffType read the cyclistNotes. See #$ObjectType, for the contrasting notion of being object-like."

StuffType is a particularly technical and complicated concept in the Cyc KB, one not easily understood by new Cyclists; it is likewise a very important concept in the knowledge base, hence requiring a particularly perspicuous comment. It is a virtue, then, that this comment is so very clear, accurately defining the concept in a manner that even someone unfamiliar with CycL will understand.

There are several desirable features of this comment:

  1. The author of the comment told us that, for collections to be StuffTypes, they must be stuff-like in at least one respect. Often, comment writers take for granted that other Cyclists know whether, in order for something to meet a criterion outlined in a comment, some part must fulfill the criterion in some respect, the whole must fulfill the criterion in every respect, or some gradation in between will suffice.
  2. The author clearly defined “stuff-like” both in Cyc-terms (“there is some spec-pred of #$parts, PARTPRED, such that if (isa OBJECT1 COL) and (PARTPRED OBJECT1 OBJECT2), then (isa OBJECT2 COL)”) and in English.
  3. The author gives us two positive examples and carefully explains how they meet the criteria for being instances of #$StuffType (again both in English and in Cyc-terms).
  4. The author gives additional examples and briefly explains them. These examples are particularly useful because, while they do not meet the criteria in the paradigmatic way in which #$Water and #$Breathing do, they are in fact examples. Their differences from the prototypical cases demonstrate how broad the usage of the constant is permitted to be.
  5. The author points to two additional sources of information that anyone intending to use this constant might want to examine: the cyclistNotes and the “contrasting notion”, #$ObjectType.
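
The sufficiency condition quoted in item 2 can also be illustrated outside of Cyc. The following is a minimal Python sketch, not Cyc code: it assumes a finite toy knowledge base in which isa facts and the pairs of a single part predicate are stored as plain sets of tuples. The function name is_stuff_like and the names Table001 and TableLeg001 are made up for illustration; the water examples follow the comment above.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]


def is_stuff_like(col: str, isa: Set[Pair], part_pairs: Set[Pair]) -> bool:
    """Check the quoted condition over a finite set of facts.

    isa holds (object, collection) pairs; part_pairs holds (whole, part)
    pairs for ONE part predicate (the PARTPRED of the comment).
    Returns True when every recorded part of every instance of col is
    itself an instance of col.
    """
    return all(
        (part, col) in isa
        for whole, part in part_pairs
        if (whole, col) in isa
    )


# Toy facts (hypothetical names, not real KB constants).
isa = {
    ("PacificOceanWater", "Water"),
    ("HandfulNearHonolulu", "Water"),
    ("Table001", "Table"),
    ("TableLeg001", "TableLeg"),
}
physical_portions = {("PacificOceanWater", "HandfulNearHonolulu")}
physical_parts = {("Table001", "TableLeg001")}

print(is_stuff_like("Water", isa, physical_portions))
print(is_stuff_like("Table", isa, physical_parts))
```

Under these assumptions the water collection passes the check and the table collection fails it, because a table leg is not itself a table. Note that the check is vacuously true for a collection whose instances have no recorded parts, so it only reflects the facts supplied to it, whereas the comment's condition ranges over every part of every instance.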

Also examine:

4.1.2. Predicate: inputsCommitted.

comment: "The predicate #$inputsCommitted is used when some input to a particular event is incorporated into some output of that event, but remains recognizable rather than being destroyed. (#$inputsCommitted EVENT OBJECT) means that OBJECT exists before EVENT and continues to exist afterwards, and as a result of EVENT, OBJECT becomes incorporated into something created during EVENT. Once incorporated into the output of EVENT, OBJECT can't be independently incorporated in any other creation event. For example, bricks that are used to build a wall continue to exist as bricks once the wall has been built. While a part of the wall, a brick cannot be used as an independent input in another creation event. (See also #$outputsCreated.) Note: there is a grey area between #$inputsCommitted and #$inputsDestroyed (q.v.); the less possible it is to take apart the relevant outputs of EVENT and get OBJECT back as an independent thing, the more likely it is that the relationship between EVENT and OBJECT should be #$inputsDestroyed, rather than #$inputsCommitted."

Again, the author gives a clear explanation of the proper use of the predicate, both in plain English and by translating a CycL expression into English. Furthermore, she identifies a problem in using this constant: it is sometimes very difficult to tell whether it is more proper to use #$inputsCommitted or #$inputsDestroyed. Here she gives us a rough guideline to follow should we be in the “grey area” between the two constants.

Neither of these comments uses negative examples, but negative examples can be very useful as well, because they help Cyclists determine just how closely the criteria have to be met before a constant should be used.

Poorly written comments lead to a great deal of confusion. If a comment does not constrain a constant’s use effectively, that constant might become something of a junkyard of unrelated uses. #$MentalObject had to be killed, for instance, because just such a morass of assertions and spec-collections made this a useless and ill-defined constant. Alternatively, should a comment be too confusing, Cyclists, rather than trying to guess at its appropriate use, will ignore the constant entirely. Worse, some Cyclist might create a constant which performs an identical function without realizing it. Thus some critical assertions would be made on one constant and other assertions on another, while both sets of assertions should be on the same constant.

The following comments from the Cyc KB are unilluminating or bad for various reasons. These comments will be changed in the knowledge base, making the constants themselves more usable.

Avoid making similar mistakes when writing comments:

4.1.3. Example Set 1: coworkers, competitors, and hasAssistants

4.1.3.a. Predicate: coworkers.

comment: “Entries are the persons who work with ARG1 as members of a TaskGroup.”

4.1.3.b. Predicate: competitors

comment: “(competitors AGENT1 AGENT2) means that AGENT1 and AGENT2 are competitors.”

4.1.3.c. Predicate: hasAssistants.

comment: “Entries are the persons who are U’s assistants”.

These examples are instances where it seems as if the authors of the comments believed the use to be so obvious that no further explanation is necessary. While it may even be the case that the usage is obvious to most people, even in these cases clearer comments should be written.
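
One shortcoming shared by all three comments is that none of them states what type of thing the constant denotes. That particular omission could be flagged mechanically. The following is a rough Python sketch of such a check, written only for illustration: the function name mentions_constant_type is hypothetical, and nothing here is part of Cyc's tools. It simply looks for one of the four type words named earlier (individual, collection, function, predicate).

```python
import re

# The four kinds of constant a comment is expected to name.
CONSTANT_TYPES = ("individual", "collection", "function", "predicate")

# Match any of the type words, optionally pluralized with "s".
_TYPE_PATTERN = re.compile(
    r"\b(?:" + "|".join(CONSTANT_TYPES) + r")s?\b", re.IGNORECASE
)


def mentions_constant_type(comment: str) -> bool:
    """Return True if the comment text contains one of the type words."""
    return _TYPE_PATTERN.search(comment) is not None


comments = {
    "StuffType": "The collection of all collections that are stuff-like "
                 "in at least one respect.",
    "coworkers": "Entries are the persons who work with ARG1 as members "
                 "of a TaskGroup.",
    "competitors": "(competitors AGENT1 AGENT2) means that AGENT1 and "
                   "AGENT2 are competitors.",
    "hasAssistants": "Entries are the persons who are U's assistants",
}

for name, text in comments.items():
    status = "ok" if mentions_constant_type(text) else "no type mentioned"
    print(f"{name}: {status}")
```

This is only a heuristic. A comment can use a word such as “function” in another sense and pass the check, and a comment that passes can still be unilluminating in every other respect, so a check like this can supplement a careful reading but not replace it.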

4.1.4. Example 2: Connection-Configuration

4.1.4.a. Collection: Connection-Configuration.

comment: "This is the parent unit for all the connection units. A connection is used to describe a relation that one cannot (or does not wish to) represent as a binary predicate or if you want to say something very specific about that specific relation. In general, if a binary predicate is available then the connection unit is not required. eg. The time relations are expressed in terms of binary relations and hence don't come under this. (Though in some abstract sense they are also connections.)"

This is a case where the prose of the comment, together with its lack of examples, so obscures the use of the constant that the collection has become fairly useless.

4.1.5. Example 3: ConstructionArtifact and FixedStructure

4.1.5.a. Collection: ConstructionArtifact.

comment: "A collection of artificial tangible objects. Each element of #$ConstructionArtifact is a structure designed and built by humans. This collection includes buildings and parts of buildings, as well as things like dams, railroad lines, and roads. Examples: the #$RomanColiseum, the #$ArcDeTriomphe, #$HooverDam, the #$WorldTradeCenter, #$HollywoodBowl. For further information, see #$FixedStructure, an important subset."

4.1.5.b. Collection: FixedStructure.

comment: "A collection of artifacts. Each element of #$FixedStructure is a humanly-constructed, freestanding object that exists in a fixed location; e.g., buildings, pyramids, the Great Wall of China, dams, elevated roadways, canals, etc. Such structures may have parts which are also elements of #$FixedStructure (e.g. bridge pilings) and parts which are not freestanding (e.g., the span of a bridge, or a room in a building)."

It is unclear what the distinction between these two constants might be. The comment on #$ConstructionArtifact states that #$FixedStructure is “an important subset” of #$ConstructionArtifact, but neither comment explains what additional criteria elements of #$FixedStructure must meet. At first it appears that parts of structures can count as construction artifacts and perhaps only full structures can count as fixed structures. But this assumption is falsified by the observation that bridge pilings can be fixed structures. It may be the case that elements of #$ConstructionArtifact needn’t be freestanding the way that elements of #$FixedStructure must be. Or, alternatively, perhaps elements of #$ConstructionArtifact need not be in fixed locations, so that a space station or a camping trailer might be an element of #$ConstructionArtifact, but not of #$FixedStructure. But these are merely guesses, and neither is explicitly stated in the comments. Furthermore, all of the examples appear to be equally plausible as instances of both collections. When creating a constant that may easily be confused with another, be sure to state the distinguishing characteristics explicitly.

4.1.6. Example 4: FriedrichNietzsche

4.1.6.a. Constant: FriedrichNietzsche.

comment: "The German rhetorical stylist, philosopher of ethics and essayist who is famous for his development of #$Nietzschean-Superman, or UberMensch, a person who sets his/her own moral standards and is not dependent on the approval or approbation of his social group. Despite his remarkable misanthropy and misogyny, Nietzsche never directed his bile against any ethnic grouping in fact, it was a distaste for racism in general that caused him to end his friendship with the German composer Richard Wagner. Nonetheless, his moral standpoint was open to misinterpretation, and his philosophies on the superman mentality were later adopted and perverted by the #$GermanNaziParty.".

In addition to its grammatical errors, this comment contains quite a bit of editorializing on the part of the author. It is considered inappropriate to make controversial claims in a constant’s comment without expressly stating the position is a disputed one. Inside jokes and cheeky remarks are also to be avoided when writing comments. While a particular constant may not be part of the public ontology currently, it may be made public eventually, and the comments on public constants must be acceptable to people other than Cyclists. Any notes that are only of use to fellow Cyclists should be included in #$cyclistNotes.

4.2. A Style Guide for Writing #$comments

The predicate #$comment is used to attach a string of explanatory text to a CYC constant. Whenever a CYC constant is created, a #$comment assertion should be written for it. This page gives some guidelines for the content and style of text in comments. Clarity is the most important stylistic criterion, because we want readers to understand the intended meaning and proper usage of our constants.

4.2.1. GENERAL GUIDELINES for Composing Comments

  1. Mention at the outset the type of constant you’re describing (e.g., collection, predicate, function). Guidelines for explaining specific types of constant are given below.
  2. Explain the meaning of the constant in clear and economical prose, correctly punctuated and spelled. Use complete sentences, except perhaps for the very first one, which can be a dictionary-like phrase such as “A collection of events.”
  3. Be precise in your use of CYC constant names within #$comment text:
    1. In #$comment text, CYC constant names should be preceded by the “#$” prefix, so that links from your comment to those constants can be generated in the html browser. Misspelled names will not link. Note that CYC’s code can handle a leading or ending parenthesis, and a trailing comma, period, semicolon, colon, question mark, exclamation point, hyphen, or plurals formed with “s” and “es.” It cannot, however, handle plurals formed by removing “y” and adding “ies”; for example, ‘#$Microtheories’ (from #$Microtheory) would not be recognized.
    2. Don’t confuse the names of CYC constants (e.g., #$Animal) with their English homonyms (e.g., ‘animal’). The name of a CYC collection (e.g., #$Person, #$Speaking) is not the same thing as an English noun. Don’t say, for example, “#$Person is the collection of those #$Animals which….” Instead, say “#$Person is the collection of those animals which…;” or, alternatively, “#$Person is a specialization of #$Animal.” Although an experienced Cyclist could recognize that ‘Flipper is a #$Dolphin’ “means” that Flipper is an element of the CYC collection #$Dolphin, rather than that he is the same type of thing as that collection, the sentence is potentially confusing. Such casual “insider” usages exist in the KB, but we try to avoid adding similar violations in new work.
  4. To help explain the meaning of a constant, give some example instances of collections and some typical uses of a predicate or a function. It is permissible to use made-up constant names for examples, with the following proviso: If a made-up constant represents a concept that CYC should eventually know about, write its name preceded by “#$” (e.g., #$SpaceStation). If it is only a random example, such as Dog001 or KarensLeftEar, omit the ‘#$’ prefix.
  5. Mention and perhaps illustrate any special restrictions on the meaning or use of the constant.
  6. Use a Note at the end of the #$comment text for any extra clarifications or details relevant for a general user. A Note may be used to present exceptions, borderline examples, one or more “footnotes,” or pointers to related constants that a reader might want to examine (especially if those are not displayed already