|
The Cyc Knowledge Base™

The Cyc knowledge base (KB) is a formalized representation of a vast quantity of fundamental human knowledge: facts, rules of thumb, and heuristics for reasoning about the objects and events of everyday life. The medium of representation is the formal language CycL, described below. The KB consists of terms, which constitute the vocabulary of CycL, and assertions, which relate those terms. These assertions include both simple ground assertions and rules. Cyc is not a frame-based system: the Cyc team thinks of the KB instead as a sea of assertions, with each assertion being no more "about" one of the terms involved than another.

The Cyc KB is divided into many (currently thousands of) "microtheories", each of which is essentially a bundle of assertions that share a common set of assumptions. A microtheory may be focused on a particular domain of knowledge, a particular level of detail, a particular interval in time, and so on. The microtheory mechanism allows Cyc to independently maintain assertions which are prima facie contradictory, and it enhances the performance of the Cyc system by focusing the inferencing process.

At the present time, the Cyc KB contains nearly two hundred thousand terms and several dozen hand-entered assertions about or involving each term. New assertions are continually added to the KB by human knowledge enterers. Additionally, term-denoting functions allow for the automatic creation of millions of non-atomic terms, such as (LiquidFn Nitrogen), and Cyc itself adds a vast number of assertions to the KB as a product of the inferencing process.
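The relationship between terms, assertions, microtheories, and non-atomic terms can be illustrated with a small Python sketch. This is a toy model only, not Cyc's API or CycL itself: the class `ToyKB`, its methods, the microtheory names (`FictionMt`, `RealWorldMt`, `ChemistryMt`), and the terms `Dracula`, `Vampire`, and `Liquid` are all hypothetical names chosen for illustration. Only `(LiquidFn Nitrogen)` comes from the text above; it is modeled here as a nested tuple.

```python
from collections import defaultdict


class ToyKB:
    """Toy assertion store partitioned by microtheory (not Cyc's real API)."""

    def __init__(self):
        # Maps a microtheory name to the set of assertions made in it.
        self._by_mt = defaultdict(set)

    def assert_in(self, mt, assertion):
        """Record an assertion (a tuple of terms) inside one microtheory."""
        self._by_mt[mt].add(assertion)

    def ask(self, mt, pattern):
        """Return assertions in `mt` matching `pattern`; None is a wildcard."""
        matches = [
            a for a in self._by_mt.get(mt, ())
            if len(a) == len(pattern)
            and all(p is None or p == x for p, x in zip(pattern, a))
        ]
        return sorted(matches, key=repr)


kb = ToyKB()
# Two assertions that contradict each other at first sight, kept apart by context.
kb.assert_in("FictionMt", ("isa", "Dracula", "Vampire"))
kb.assert_in("RealWorldMt", ("not", ("isa", "Dracula", "Vampire")))
# A non-atomic term built by a term-denoting function, as in (LiquidFn Nitrogen).
liquid_nitrogen = ("LiquidFn", "Nitrogen")
kb.assert_in("ChemistryMt", ("isa", liquid_nitrogen, "Liquid"))

print(kb.ask("FictionMt", ("isa", "Dracula", None)))
print(kb.ask("RealWorldMt", ("isa", "Dracula", None)))
```

Because each query names a microtheory, the two Dracula assertions never meet in the same lookup, and a query only scans the assertions of the context it asks about. That is a rough analogue of how partitioning can both isolate apparent contradictions and narrow the search.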
Natural-language processing

Consider the following pair of sentences:
Here are a couple more examples; these involve pronoun disambiguation:
Cyc's NL capabilities form the foundation for applications in knowledge-enhanced searching of captioned information, and for user-friendly interfaces to other applications, including the database integration application. Future directions for Cyc-NL will include:
Other potential applications are myriad. For more information, see the more detailed description of the Cyc NL subsystem.
Semantic Integration Bus™

Cyc treats each database record as if it were an implicit assertion in the knowledge base. These implicit assertions are then available during inference. Similarly, text fields can be read using the natural language processor to see if they contain any useful implicit assertions. Sometimes the assertions describe what the text is "about". Cyc can use this information to locate and report information resources which the user may employ to answer a particular query.
![]()
In the above diagram, information stored in a database or on the web is made available to the inference engine as virtual assertions. These sets of virtual assertions are managed by heuristic-level (HL) modules. For example, suppose the inference engine "broadcasts" a query on the bus. An HL module recognizes that the request asks for an assertion which maps into its virtual knowledge space. The HL module intercepts the request, communicates with the database, web site, or other knowledge source, and returns bindings to the inference engine. Inference then continues, combining information from multiple sources.
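The broadcast-and-intercept flow described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the actual Cyc implementation: the classes `HLModule` and `Bus`, the predicates `worksInCity` and `cityInCountry`, the `?var` variable convention, and the sample rows are all hypothetical. An in-memory SQLite table stands in for the database, and a plain list stands in for a web source.

```python
import sqlite3


def is_var(term):
    """Variables are strings starting with '?' (a convention of this sketch)."""
    return isinstance(term, str) and term.startswith("?")


class HLModule:
    """Hypothetical HL module: exposes rows of one source as virtual assertions."""

    def __init__(self, predicate, fetch_rows):
        self.predicate = predicate
        self.fetch_rows = fetch_rows  # callable returning an iterable of tuples

    def handles(self, query):
        return query[0] == self.predicate

    def answer(self, query):
        """Match the query arguments against each row; return variable bindings."""
        args = query[1:]
        results = []
        for row in self.fetch_rows():
            if len(row) != len(args):
                continue
            binding = {}
            for arg, value in zip(args, row):
                if is_var(arg):
                    # A repeated variable must bind to the same value each time.
                    if binding.setdefault(arg, value) != value:
                        break
                elif arg != value:
                    break
            else:
                results.append(binding)
        return results


class Bus:
    """Broadcasts a query to every registered module that recognizes it."""

    def __init__(self):
        self.modules = []

    def register(self, module):
        self.modules.append(module)

    def broadcast(self, query):
        results = []
        for module in self.modules:
            if module.handles(query):
                results.extend(module.answer(query))
        return results


# Source 1: a database table (in-memory SQLite with made-up sample rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, city TEXT)")
conn.executemany("INSERT INTO employees VALUES (?, ?)",
                 [("Alice", "Austin"), ("Bob", "Paris")])
# Source 2: a stand-in for a web resource.
web_rows = [("Austin", "USA"), ("Paris", "France")]

bus = Bus()
bus.register(HLModule(
    "worksInCity",
    lambda: conn.execute(
        "SELECT name, city FROM employees ORDER BY name").fetchall()))
bus.register(HLModule("cityInCountry", lambda: web_rows))

# Inference step: chain bindings from the database into a query on the web source.
combined = [
    (b["?who"], c["?country"])
    for b in bus.broadcast(("worksInCity", "?who", "?city"))
    for c in bus.broadcast(("cityInCountry", b["?city"], "?country"))
]
print(combined)
```

The caller never addresses a source directly. It poses a query in terms of predicates, and whichever module claims that predicate supplies the bindings, which is what allows answers from the table and the web stand-in to be combined in a single chain.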
The most commonly used tool, our HTML browser, allows the user to view the KB as hypertext. HTML pages describing Cyc terms are generated on the fly by the Cyc system. Each page describes a Cyc term by showing all the assertions in which it is involved, organized according to a standard schema. Every occurrence of a Cyc term is an HTML link to a dynamically generated HTML page describing that term, so it is easy to navigate the KB by following a network of relationships. The HTML browser also includes facilities for searching and editing the KB and for posing queries to the inference engine. Other HTML interface tools include:
|