We will now discuss various other errors in knowledge representation.
#$equiv is Cyc’s logical connective for bidirectional implication. It exists mainly for completeness and because the canonicalizer uses it. (The canonicalizer’s raison d’être is to soundly translate equivalent epistemological-level formulae into a single heuristic-level construct, which avoids redundantly adding a formula that is simply a rephrasing of another.) It’s best to avoid this connective during regular KR work, because it’s easy to make errors with it.
Refer to the rule given on the slide. Do you see the problem with this rule? It’s certainly true that all of one’s parents’ brothers are one’s uncles, but not every uncle is the brother of a parent: some uncles are uncles by marriage, having married a sister of one of the parents. Because #$equiv asserts the implication in both directions, the rule wrongly claims the converse as well.
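The asymmetry can be checked on a toy family. The following is a minimal Python sketch, not CycL and not the rule from the slide; every name and relation in it is hypothetical. It shows that the forward direction of the rule holds while the converse, which an equivalence would also assert, fails.

```python
# Toy family facts; all names are hypothetical.
parents = {"Pat": {"Mom", "Dad"}}
brothers = {"Mom": {"Bob"}, "Dad": set()}
sisters = {"Mom": set(), "Dad": {"Sue"}}
spouse = {"Sue": "Al"}  # Al married Dad's sister Sue


def parents_brothers(person):
    """Everyone who is a brother of one of person's parents."""
    return {b for p in parents[person] for b in brothers[p]}


def uncles(person):
    """Brothers of parents plus husbands of parents' sisters."""
    by_marriage = {
        spouse[s]
        for p in parents[person]
        for s in sisters[p]
        if s in spouse
    }
    return parents_brothers(person) | by_marriage


# Forward direction: every parent's brother is an uncle.
forward_holds = parents_brothers("Pat") <= uncles("Pat")
# Converse direction (what a bidirectional rule adds): every uncle
# is a parent's brother. This fails because of the uncle by marriage.
converse_holds = uncles("Pat") <= parents_brothers("Pat")

print(forward_holds, converse_holds)
```

In this sketch only the one-directional implication is safe to assert, which is the reason for preferring a plain implication over #$equiv here.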
Another common knowledge representation problem is the over-definition of collections. If you are considering introducing a new collection for which there are only a few things you want to say, and the collection can already be defined in terms of other, already established vocabulary, it’s best not to introduce it and instead just refer to the concept by writing the expanded formula. However, if there are on the order of 10 or more assertions you’d like to make that use that concept, go ahead and introduce the collection even if it’s already definable. For these reasons, it probably would not make sense to introduce the collection #$WhiteCat; however, there is probably sufficient justification for introducing the collection #$BlackCat.
Certain types of knowledge may be characterized by the grouping together of a large number of objects, about which the same kinds of things can be stated. In cases where these properties can be functionally determined, it’s a good idea to take advantage of the regularity and avoid having to create and populate each term by hand, because doing so is an error-prone process.
The metric units of measure are one such knowledge area. As an example, what needs to be stated about Kilogram? It’s a unit of measure. It measures Mass, but that could be determined by the fact that Kilogram is derived from Gram, and Grams measure Mass. A Kilogram is 1000 times larger than a Gram, but that can be determined by the fact that it uses the “kilo” prefix.
We can avoid having to create all the metric units of measure by hand by creating a few basic units of measure such as #$Gram, #$Meter, #$Hertz, and introducing this class of functions: #$MetricUnitPrefix. Functions of this class are unary, so all instances take one argument. An example is shown at the bottom of the slide: the function #$Kilo. When applied to a single argument which is a #$UnitOfMeasureNoPrefix such as #$Meter, it produces an instance of #$UnitOfMeasureWithPrefix. This example takes one meter and makes one thousand of them. The 1 in the formula comes from the 1 in front of “km” in natural language.
By functionally generating these new units of measure, we can write rules that completely and correctly enforce the proper constraints. The first rule on the slide states that #$Kilo applied to any unit of measure will generate a new one that is 1000 times greater than the base unit.
The second rule says that any metric prefix applied to a base unit will yield a unit of measure that measures the same quantity as the base unit. From these two rules, we get our definitional assertions about (#$Kilo #$Meter). [Note for the interested: #$natArgument and #$natFunction are #$EvaluatableFunctions that can be used in the left-hand side of rules to extract items from a NAT expression. You can’t make assertions using #$EvaluatableFunctions, but supporting code exists that will prove these statements true: (#$natFunction (#$Kilo #$Meter) #$Kilo) and (#$natArgument (#$Kilo #$Meter) 1 #$Meter).]
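The idea behind the two rules can be sketched outside Cyc. The Python below is an illustration under stated assumptions, not Cyc's implementation: the class names, the PREFIX_FACTOR table, and the helper functions are all hypothetical. A prefixed unit is stored only as a prefix plus a base unit, and its properties are derived from those two parts rather than entered by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BaseUnit:
    name: str
    measures: str  # the quantity measured, e.g. "Mass"


@dataclass(frozen=True)
class PrefixedUnit:
    prefix: str     # plays the role of the function in the NAT
    base: BaseUnit  # plays the role of the NAT's first argument


# Hypothetical table: one entry per prefix, not one per unit.
PREFIX_FACTOR = {"Kilo": 1000, "Mega": 1_000_000}


def apply_prefix(prefix, base):
    """Build a prefixed unit functionally instead of by hand."""
    if prefix not in PREFIX_FACTOR:
        raise ValueError(f"unknown prefix: {prefix}")
    return PrefixedUnit(prefix, base)


def factor(unit):
    """Analogue of the first rule: size relative to the base unit."""
    return PREFIX_FACTOR[unit.prefix]


def measures(unit):
    """Analogue of the second rule: same quantity as the base unit."""
    return unit.base.measures


gram = BaseUnit("Gram", "Mass")
kilogram = apply_prefix("Kilo", gram)
print(factor(kilogram), measures(kilogram))
```

Nothing was stated about the kilogram directly: its multiplier comes from the prefix and its measured quantity from the base unit, so adding a new base unit or a new prefix extends every combination at once.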
Another common knowledge representation error is to create high-arity predicates. People with a relational database background are prone to making this sort of error. The major problem with a high-arity predicate that collects all of the details of a situation together is that it becomes difficult to state facts, or to reason, when you have only partial information. For instance, the inventor and time of invention are really independent properties of an artifact and therefore should be represented as separate concepts.
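The effect of partial information can be shown with a small Python sketch. It is only an illustration: the predicate names, the artifact, and the inventor are hypothetical, and the fact store is a plain set rather than anything resembling a Cyc knowledge base. Each property is asserted as its own binary fact, so knowing the inventor does not require knowing the date.

```python
# Each fact is an independent (predicate, subject, value) triple.
facts = set()


def assert_fact(pred, subj, value):
    """Record one binary fact."""
    facts.add((pred, subj, value))


def query(pred, subj):
    """Return every known value of pred for subj."""
    return {v for (p, s, v) in facts if p == pred and s == subj}


# We know who invented the artifact but not when. With separate
# binary predicates that is simply one assertion; a single
# three-place predicate would need a placeholder for the date.
assert_fact("inventor", "WidgetX", "Alice")

print(query("inventor", "WidgetX"))
print(query("inventionDate", "WidgetX"))

# The date can be added later without touching the first fact.
assert_fact("inventionDate", "WidgetX", 1999)
print(query("inventionDate", "WidgetX"))
```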
Refer to the slide for another example of a gratuitous higher-arity predicate. In this case, the designer was hiding information about position-played-on-the-team in the order of the arguments. Since each position is a rich object, about which many other things could be stated, and because we’ll want to be able to say that Aikman was quarterback even if we don’t know who all the other players were, it’s better to reify the positions and state each position held independently.
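A rough Python sketch of the reified alternative follows. The team name, the second position, and the function names are hypothetical, and the structure is an assumption made for illustration rather than the representation shown on the slide. Positions are objects in their own right, and each position held is stated independently.

```python
# Positions are first-class objects that can carry their own facts.
position_info = {
    "Quarterback": {"side": "offense"},
    "Kicker": {"side": "special teams"},
}

# Each position held is one independent (team, player, position) fact.
roster = set()


def assert_position(team, player, position):
    """State that player holds position on team."""
    if position not in position_info:
        raise ValueError(f"unknown position: {position}")
    roster.add((team, player, position))


def players_at(team, position):
    """Return the known players holding position on team."""
    return {pl for (t, pl, pos) in roster if t == team and pos == position}


# Only one player is known; no other argument slots must be filled.
assert_position("TeamA", "Aikman", "Quarterback")

print(players_at("TeamA", "Quarterback"))
print(players_at("TeamA", "Kicker"))
print(position_info["Quarterback"]["side"])
```

Because the position is named explicitly instead of being implied by argument order, the meaning of each fact is visible in the fact itself.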
This concludes the final lesson on Errors in Representing Knowledge.