In this FAQ, we will briefly address the following: “What differentiates the Cyc ontology?” We will discuss its size, the expressiveness of its contents, its context-sensitivity, and its track record in a variety of domains.
First, the Cyc knowledge base (KB) is composed of some 25 million assertions. When you combine this with the generality of the knowledge and the efficient inference engines that can leverage this knowledge to generate new conclusions, Cyc easily has trillions of pieces of usable knowledge. When you add to this the knowledge that Cyc possesses by accessing external databases (akin to how you might say you know the phone numbers stored in your phone's contact list, even if you could not recite them from memory), the size of Cyc’s KB clearly differentiates it from other AI platforms.
Second, the Cyc ontology is expressed in a higher-order logic. While the details of this are important and interesting to logicians, the upshot can be clearly seen in contrast with triplestores (RDF stores). Triplestores are so called because every statement they store has three parts: a subject, an object, and a relation between them. Triples are often represented graphically, as two nodes connected by a directed, labeled arrow. This is useful for saying things like the following:
Casey works as an engineer.
The triplestore can then relate the object <Casey> to the object <engineer> by the <works as a> relation. However, English sentences, and the propositions that they represent, are often much more complicated than these two-place relations can handle. Consider:
Casey believes that Lara had coffee with breakfast.
The latter part, “Lara had coffee with breakfast”, is amenable to a triplestore, but nesting that sentence inside a “Casey believes” component is going to be very complicated to represent in such a framework. On the other hand, Cyc’s language, CycL, is expressive enough that you can say anything in CycL that you can say in English. In the present context, the differentiator is that CycL allows for arbitrarily high-arity relations. Instead of being stuck with relations between two objects, you can relate arbitrarily many things. Take another example:
Wearing the red shirt rather than a green one caused the bull to charge rather than ignore him.
In CycL you can express this with something like:
(#$causes-Contrastive <WearingARedShirt> <WearingAGreenShirt> <BullCharges> <BullDoesNotCharge>)
In that case, we have a relation with four relata. Such expressivity allows for flexible representations that do not have to simplify and lose information when knowledge is encoded in Cyc. For a more in-depth discussion, see our Technology Overview.
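The contrast can be made concrete with a small sketch. The following is Python, not CycL, and it does not use any Cyc API; it only models expressions as nested tuples. The names `Triple`, `arity`, `depth`, and `reify_as_triples`, and the predicates `worksAs`, `believes`, and `hadWithBreakfast`, are hypothetical and chosen for illustration only. The reification scheme shown is a simplified stand-in for what a triplestore user might do, not a description of any particular RDF tool.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """A subject-predicate-object statement, as a triplestore would hold it."""
    subject: str
    predicate: str
    obj: str


def arity(expr: tuple) -> int:
    """Number of arguments in an expression, not counting the relation itself."""
    return len(expr) - 1


def depth(term) -> int:
    """Nesting depth: 0 for an atom, 1 for a flat expression, 2 for one level of nesting."""
    if isinstance(term, str):
        return 0
    return 1 + max((depth(arg) for arg in term[1:]), default=0)


def reify_as_triples(expr: tuple, node_id: str) -> list:
    """Flatten a flat n-ary expression into triples through an intermediate node.

    Assumes every argument is an atom (a string); nested arguments would each
    need an intermediate node of their own.
    """
    relation, *args = expr
    triples = [Triple(node_id, "relation", relation)]
    for position, arg in enumerate(args, start=1):
        triples.append(Triple(node_id, f"arg{position}", arg))
    return triples


# "Casey works as an engineer" fits in a single triple.
works_as = Triple("Casey", "worksAs", "Engineer")

# The four-place contrastive relation from the text, as one expression.
contrastive = (
    "causes-Contrastive",
    "WearingARedShirt",
    "WearingAGreenShirt",
    "BullCharges",
    "BullDoesNotCharge",
)

# "Casey believes that Lara had coffee with breakfast": a sentence as an argument.
belief = ("believes", "Casey", ("hadWithBreakfast", "Lara", "Coffee"))

if __name__ == "__main__":
    print(arity(contrastive))  # number of relata in the contrastive expression
    print(depth(belief))       # the belief expression nests one sentence inside another
    for triple in reify_as_triples(contrastive, "_:c1"):
        print(triple)
```

In this sketch the four-place statement stays a single expression, whereas flattening it to triples needs an extra intermediate node plus one triple per argument, and the fact that the four relata belong together is only recoverable by following that node.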
Third, the Cyc knowledge base utilizes something called “microtheories” to contextualize knowledge. For instance, we could assert in #$TheSimpsonsMt (the microtheory for knowledge about the Simpsons) that Bart is a male fourth-grader. But in another context, #$RealWorldDataMt (the microtheory for knowledge about the real world), we can assert that Bart is a cartoon character. This means that even though Cyc knows that cartoon characters cannot be real persons, we can put ourselves in the context of the cartoon when appropriate. This contextualization is not just useful for fictional contexts. Consider:
- There are many different legal contexts: e.g., the law requires driving on the right side of the road in the United States but on the left in England.
- Newtonian and quantum physics are inconsistent with each other, but it is often very useful to act as if one or the other is the right model to use.
- Microtheories are very useful for personal belief contexts: we can build a microtheory that contains all and only the beliefs held by a given agent to see what would be reasonable for that agent to conclude.
This approach to contextual knowledge has allowed Cycorp to build a massive knowledge base without having to enforce global consistency: we only need to maintain consistency within each context.
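The idea of per-context consistency can be illustrated with a toy model. The following Python sketch is not how Cyc implements microtheories; it simply keeps a separate set of facts for each named context and rejects an assertion only when it contradicts something in the same context. The class `ContextualKB` and its methods are hypothetical, and the sketch ignores any relationships between contexts.

```python
from collections import defaultdict


class ContextualKB:
    """Toy knowledge base that checks consistency per context, never globally."""

    def __init__(self):
        # Maps a context name to a set of (fact, polarity) pairs,
        # where polarity is True for "holds" and False for "does not hold".
        self._facts = defaultdict(set)

    def assert_in(self, context: str, fact: tuple, negated: bool = False) -> None:
        """Record a fact (or its negation) in one context.

        Raises ValueError if the opposite polarity is already asserted
        in that same context. Other contexts are never consulted.
        """
        polarity = not negated
        if (fact, not polarity) in self._facts[context]:
            raise ValueError(f"{fact} contradicts an assertion already in {context}")
        self._facts[context].add((fact, polarity))

    def holds(self, context: str, fact: tuple):
        """Return True or False if the context says so, or None if it is silent."""
        if (fact, True) in self._facts[context]:
            return True
        if (fact, False) in self._facts[context]:
            return False
        return None


kb = ContextualKB()
bart_is_person = ("isa", "Bart", "Person")

# Inside the cartoon's context, Bart is a person.
kb.assert_in("TheSimpsonsMt", bart_is_person)

# In the real-world context, Bart is a cartoon character and not a person.
# This does not conflict with the assertion above, because it lives elsewhere.
kb.assert_in("RealWorldDataMt", ("isa", "Bart", "CartoonCharacter"))
kb.assert_in("RealWorldDataMt", bart_is_person, negated=True)
```

Asking `kb.holds("TheSimpsonsMt", bart_is_person)` and `kb.holds("RealWorldDataMt", bart_is_person)` gives different answers, while trying to assert the negation inside `TheSimpsonsMt` itself raises an error. That is the sense in which consistency is maintained locally in this sketch.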
Lastly, Cycorp has a distinguished history of working with a wide variety of clients, ranging from government defense agencies to private companies in the health, energy, and financial sectors. This means that Cyc does not have siloed information that is only relevant to a particular domain or sub-domain. Rather, this knowledge is asserted generally, and has been proven in applications ranging from taxes to chemistry, from engineering to natural language understanding.