April 20, 2015

Naming Conventions for CycL Constants

When reading CycL or sorting through CycL constants, it is useful to be able to garner as much information as possible about the kind of thing a given constant denotes just by looking at that constant itself. To facilitate this, Cyclists have adopted various conventions for giving names to constants. By the “name” of a CycL constant, what we really mean is the string of characters that make up the constant, minus the initial ‘#$’.

The name of a CycL constant is subject to the following rules. Some of these rules are strictly enforced and inviolable: Cyc will not even permit a Cyclist to introduce a constant that violates them. Other rules are simply strong suggestions, and, while Cyclists are encouraged to follow them, they are technically violable. First, we’ll examine the conventions that the Cyc system enforces.

  • All CycL constant names must be at least 2 characters long, not including the prefix. All CycL constants begin with ‘#$’, but this prefix is supplied automatically by the Cyc system in most cases, making it unnecessary to include when entering a constant into the ‘Create Constant’ window of the ‘Search’ window, for example.
  • Constant names can contain upper- and lower-case letters, digits, colons, underscores, and dashes. The function valid-constant-name-p can be used to check that a string is a valid constant name.
  • While the names of constants are case-sensitive, creating multiple constants that differ only in character case is not allowed.
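
The enforced rules above can be approximated outside of Cyc. The following is a minimal Python sketch, not Cyc's own valid-constant-name-p: the function names are hypothetical, and the allowed character set (letters, digits, colon, underscore, dash) is an assumption inferred from the rules and examples on this page.

```python
import re

# Assumed character set: letters, digits, colon, underscore, dash.
# This mirrors the rules stated above; it is not taken from Cyc's source.
_ALLOWED = re.compile(r"[A-Za-z0-9:_-]+")


def strip_prefix(text):
    """Drop a leading '#$' if present, leaving the bare constant name."""
    return text[2:] if text.startswith("#$") else text


def looks_like_valid_name(text):
    """True if the bare name has at least 2 characters, all of them allowed."""
    name = strip_prefix(text)
    return len(name) >= 2 and _ALLOWED.fullmatch(name) is not None


def case_clash(new_name, existing_names):
    """Return an existing name that differs from new_name only in case, else None."""
    target = strip_prefix(new_name)
    for existing in existing_names:
        other = strip_prefix(existing)
        if other != target and other.lower() == target.lower():
            return existing
    return None
```

For example, looks_like_valid_name("#$SportsCar") holds, while a one-character name or a name containing a space is rejected, and case_clash("#$sportscar", ["#$SportsCar"]) reports the conflicting constant.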

The rest of the conventions, while not enforced by the Cyc system, are important to follow.

  • All Cyc constant names should be composed of one or more meaningful “words” in sequence, with no breaks except for dashes or underscores (e.g., #$isa and #$SportsCar). A sequence of numeric characters may count as a “word” (e.g., #$FrontOfficeOf123Corp). With the exception for predicate names noted in the next item, each (non-numeric) “word” in a sequence must begin with a capital letter. Abbreviations for common English words, such as ‘Reln’ for ‘Relation’, are also acceptable. An acronym or the non-acronymic initials of a multi-word name (such as IBT for “information bearing thing”) may count as a “word”, but all its characters will be the same case (e.g., lower case if the acronym begins the name of a predicate constant; otherwise upper case).
  • When creating constants with more than one word in the name, each new word begins with a capital letter, unless it is the first word in the name of a predicate or it is a numeric string.
  • Hyphens are used to set off parts of names which restrict or refine the meaning of the name, as in #$Fruit-TheWord or #$Horse-Domesticated.
  • All things being equal, it is best to give similar constants names which are similar, thus making them easier to find and identify in the Knowledge Base.
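
The capitalization and hyphen conventions in this list can be checked mechanically, at least roughly. The Python sketch below is a heuristic under stated assumptions: the helper names are hypothetical, the word-splitting pattern is one possible reading of what counts as a “word”, and nothing here can judge whether a word is meaningful.

```python
import re

# A "word" is an acronym run, a capitalized or lower-case word, or a digit run.
_WORD = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+")


def split_qualifier(name):
    """Split 'Horse-Domesticated' into ('Horse', 'Domesticated').

    The qualifier is None when the name contains no hyphen.
    """
    bare = name[2:] if name.startswith("#$") else name
    base, sep, qualifier = bare.partition("-")
    return base, (qualifier if sep else None)


def words(part):
    """Break a camel-case name part into its component words."""
    return _WORD.findall(part)


def follows_capitalization(name, is_predicate=False):
    """Check that each non-numeric word is capitalized.

    The first word of a predicate name must instead start in lower case.
    """
    base, qualifier = split_qualifier(name)
    parts = words(base) + (words(qualifier) if qualifier else [])
    for index, word in enumerate(parts):
        if word.isdigit():
            continue  # numeric strings are exempt from the capital-letter rule
        if index == 0 and is_predicate:
            if not word[0].islower():
                return False
        elif not word[0].isupper():
            return False
    return True
```

With this sketch, words("FrontOfficeOf123Corp") yields the words Front, Office, Of, 123, and Corp, and split_qualifier("#$Fruit-TheWord") separates the base name from its restricting qualifier.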

As Cyc’s natural language capability improves, and new lexical lookup utilities are added, it becomes easier to look up constants by any of the strings known to refer to them, rather than by their constant name. For example, if you type “FBI” into the “Search” window of the browser, it offers #$FederalBureauOfInvestigation as a disambiguation. Hence, naming constants is only one piece of the work; doing thorough lexification is also very important.

When naming a constant, it’s important to assign a name that distinguishes the denoted concept or object from other concepts or objects it might get confused with. So “Bow” would be a terrible name for a constant. Instead, names like “Bow-BoatPart”, “BowTheWeapon”, “Bowing-BodyMovement” should be used, depending on the underlying concept or object denoted.

Sometimes it is possible to take this principle of specificity in names to an extreme, and attempt to embody the whole meaning of the constant in its name. This is discouraged. For example, one might be tempted to give the constant #$physicalParts the name “distinctIdentifiablePhysicalParts”, but it is better to leave the name a bit terser, since it isn’t easily confused with some other concept, and put the additional information in the constant documentation. When names of constants become overly long, they become cumbersome to include in assertions and the risk of making undetected typos is much greater.

It’s important to remember that the names we assign to constants mean nothing to Cyc. It doesn’t matter whether the color green is represented by #$Green, #$GreenColor, #$Verde, #$Gruen, or #$EMRG. It’s also very important to note that, while it’s likely that one can determine the denotation of a CycL constant by looking at its name, one cannot always make such a determination. It is best to read the comment and the axioms on a constant to determine its denotation. For instance, does #$Turkey refer to the country or the bird? The naming convention on countries currently does not require a hyphen followed by a disambiguation, so it is likely that this is the country and the bird would be represented by #$Turkey-TheBird. It is important to read the documentation before making any assumptions, just in case.

A person can ascertain a constant’s meaning by looking at the assertions in the Knowledge Base that use that constant. For example, from the following CycL sentences (assuming we know what is denoted by Color, Okra, Grass37, etc.), it is easy to tell what the denotatum of the hypothetical CycL constant #$EMRG is:

   (isa EMRG Color)
   (uniformColorOfObject Grass37 EMRG)
   (relationAllInstance uniformColorOfObject Okra EMRG)
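
As a rough illustration of reading a constant's meaning off its assertions, the Python sketch below collects each use of a constant together with the predicate and argument position involved. It is not part of Cyc: the function names are hypothetical, and the parser handles only flat, non-nested sentences like the three above.

```python
def parse_flat(sentence):
    """Split a flat sentence such as '(isa EMRG Color)' into its terms."""
    text = sentence.strip()
    if not (text.startswith("(") and text.endswith(")")):
        raise ValueError("expected a parenthesized sentence: " + sentence)
    return text[1:-1].split()


def uses_of(constant, sentences):
    """Return (predicate, argument position) pairs for each use of constant."""
    found = []
    for sentence in sentences:
        terms = parse_flat(sentence)
        # terms[0] is the predicate; arguments are numbered from 1.
        for position, term in enumerate(terms[1:], start=1):
            if term == constant:
                found.append((terms[0], position))
    return found


KB = [
    "(isa EMRG Color)",
    "(uniformColorOfObject Grass37 EMRG)",
    "(relationAllInstance uniformColorOfObject Okra EMRG)",
]
```

Calling uses_of("EMRG", KB) lists the three places the constant appears, which is the same evidence a human reader would use to conclude that it denotes a color.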

For convenience, we choose names for CycL constants that will indicate to human users what the constant is intended to mean, for example #$RedColor. But an evocative name such as #$LittleRedHairedGirlLikedByCharlieBrown does not have any meaning in CycL without being related to the terms #$FemaleChild, #$hairColor, #$RedHairColor, #$CharlieBrown, and #$likesAsFriend, for example.

The names of constants denoting collections, functions, and individual objects all begin with a capital letter, while the names of all predicates begin with a lower case letter. Thus, the CycL constant #$physicalAttractiveness names a predicate, while #$PhysicalAttractiveness would denote a collection.
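
This first-letter convention is simple enough to express as a one-line test. The Python sketch below uses a hypothetical helper name and only reports what the name suggests; as noted earlier, the documentation on a constant remains the authority on what it denotes.

```python
def kind_hint(name):
    """Guess from the first letter whether a constant name is a predicate."""
    bare = name[2:] if name.startswith("#$") else name
    if bare[:1].islower():
        return "predicate"
    return "collection, function, or individual"
```

Here kind_hint("#$physicalAttractiveness") returns "predicate", while kind_hint("#$PhysicalAttractiveness") returns the non-predicate hint.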

Constants representing people are usually named with both the first and last name, in that order. ‘#$AudreyHepburn’ is conventional, while ‘#$Hepburn’ and ‘#$HepburnAudrey’ are not. There are a few exceptions: the constant names for Cyclists frequently violate this convention. Likewise, if a person is most commonly known by only one name (Colette), or has a pseudonym more common than the person’s real name (Captain Kangaroo), then it is likely easier on Cyclists and users alike to name the constant with the more common name (‘#$Colette’ or ‘#$CaptainKangaroo’ instead of ‘#$Sidonie-GabrielleColette’ or ‘#$BobKeeshan’).

Because of the way information works are represented, it is possible to represent both the work itself and particular copies or performances of that work. Therefore, it is preferable to distinguish between “Hair-TheMusical” and “Hair-MusicalPerformance” and “Hair-MusicalPerformance006”. Whole types of information works are usually designated with the suffix “-CW”, as in “#$Novel-CW”. The CW stands for “conceptual work”. Computer programs should have the suffix “-TheProgram”. Please see the Handbook section on Conceptual Works for more information.

If two people have the same name, it is useful to distinguish between the two constants by following the actual names with a hyphen and some very notable distinguishing characteristic, such as #$HenryMiller-Author. As “Henry Miller” is a common name, distinguishing here is probably a good idea.

There are also conventions regarding how CycL terms are referred to in the Knowledge Base. Whenever the word “formula” is used when we talk about CycL, we mean a relation applied to some arguments, enclosed in parentheses. Thus

        (isa Lassie Dog)
        (JuvenileFn Dog)

are formulae, but #$True and #$Lassie are not.

The word “sentence” is the more specific term used to denote a relation (either a predicate or a truth function) applied to some arguments. Sentences are syntactically well-formed and have a truth-value when they are semantically well-formed as well.

The entire CycL grammar ontology has been recently revised to be a more accurate reflection of the grammar. Each CycL grammar collection has been given more appropriate code support for membership (in the form of defnIff hooks).

All CycL grammar collections should begin with either CycL, EL, HL, or SubL in their names. For example, ‘#$CycLAssertion’ and ‘#$ELSentence-Assertable’ are correct, whereas ‘#$Assertion’ and ‘#$Formula’ are not.
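
The prefix convention for grammar collections can be checked with a simple string test. The Python sketch below uses hypothetical names and verifies only the prefix; it cannot tell whether a constant actually is a grammar collection.

```python
# Prefixes listed in the convention above; matching is case-sensitive.
GRAMMAR_PREFIXES = ("CycL", "EL", "HL", "SubL")


def has_grammar_prefix(name):
    """True if the bare constant name starts with one of the grammar prefixes."""
    bare = name[2:] if name.startswith("#$") else name
    return bare.startswith(GRAMMAR_PREFIXES)
```

Under this check, ‘#$CycLAssertion’ and ‘#$ELSentence-Assertable’ pass, whereas ‘#$Assertion’ and ‘#$Formula’ do not.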