The Cyc TPTP Challenge Problem Set

Purpose

In the history of the Cyc project, Cyc's knowledge base and inference engine have evolved in a direction far different from most other theorem provers. Eventually they ceased even speaking the same language; Cyc started speaking higher-order logic and was thus rendered incompatible. An unfortunate consequence of this is that up until now there has been no corpus accessible by both Cyc and the theorem proving community.

This set of challenge problems is that missing link.

Structure

The Cyc TPTP Challenge Problem Set consists of a background theory and many conjectures, expressed in the TPTP format.

The background axioms for the Cyc TPTP Challenge Problem Set are a first-order translation of a subset of the OpenCyc KB plus some additional synthetic content created algorithmically, especially for this problem set. The autogenerated synthetic content is representative of various structures in the ResearchCyc KB, and allows for arbitrary scaling. For details on the first-order translation process, see First-Orderized ResearchCyc: Expressivity and Efficiency in a Common-Sense Ontology by Deepak Ramachandran, P. Reagan, K. Goolsbey.

The conjectures were also generated algorithmically, according to criteria that are explained below. They are large-scale common sense inferences, designed to be representative of a commonly occurring subset of the queries Cyc's inference engine is asked in practice.

The Cyc TPTP Challenge Problem Set consists of two main sections: the Scaling Challenge Problem Set and the Elaboration Challenge Problem Set. Both problem sets test a theorem prover's ability to efficiently perform the types of inference that have proven most useful in the past and present of the Cyc project. The Scaling Challenge Problem Set also tests a theorem prover's ability to find the correct inference path from a vast ocean of distracting chaff. The Elaboration Challenge Problem Set tests all that, plus a theorem prover's ability to efficiently handle small changes to the background theory.

The Scaling Challenge Problem Set

 The Scaling Challenge Problem Set consists of 300 problems that test a theorem prover's ability to perform large-scale common sense reasoning. It consists of 6 segments, and each segment contains 50 problems. All of them are provable. These 50 problems contain a small sampling of inference types representative of the queries Cyc's inference engine is most often asked in practice. Each segment consists of the exact same 50 problems -- the only difference is that each segment contains a larger amount of distracting chaff. The Scaling Challenge Problem Set was first released as part of TPTP v3.4.0 in the CSR (Common Sense Reasoning) domain. The problem numbers are CSR025+S through CSR074+S, where S is the segment number.

Finding the needle in larger and larger haystacks

Here is a breakdown of the 6 segments:

Segment Segment name Segment description Axioms per problem
1 Focused Each problem contains only the exact axioms needed to prove that individual conjecture. ~80
2 Tiny The static background theory shared by all 50 problems contains only the axioms needed to prove the union of all 50 conjectures. ~1100
3 Small As above, with some distracting "chaff" axioms added. The distracting chaff is selected from the OpenCyc KB based on similarity to the axioms that are actually required. ~8,000
4 Medium As above, but with more specially-selected distracting chaff added. ~44,000
5 Large As above, but with many randomly selected chaff axioms added. ~540,000
6 Huge Includes the entire OpenCyc KB as a background theory. This is a superset of the above. ~3,340,000

Representative inference types

Each segment contains the same 50 problems. These 50 problems are broken down into 10 types, with 5 problems of each type. The problem types are designed to be emblematic of common types of inference performed often by Cyc's inference engine.

  1. lookup
  2. genls
  3. isa
  4. disjointWith
  5. genlPreds / genlInverse + lookup
  6. relationAllInstance
  7. relationAllExists
  8. transitivity / transitiveViaArg
  9. symmetry
  10. arg-type

These problem types are ordered and contiguous; the first 5 problems in a segment will be lookup problems, the next 5 will be genls problems, and so on. For example, the relationAllExists problems will be problems CSR055+S through CSR059+S, for each segment S.

Note that most of these problems are "easy" to prove by the standards of the ATP community, at least when there is no chaff. Like NP-complete problems, checking a known solution is indeed easy, and the challenge comes from finding the known solution in a sea of distraction. This is the motivation for providing a corpus with increasing amounts of chaff.

None of the problems in this problem set rely on equality reasoning. This is because Cyc's inference engine makes the Unique Names Assumption by default, and default reasoning is outside the scope of this problem set.

Maintaining state across problems

Each segment shares a background theory stored in a single axiom file. Each problem in a given segment, with the exception of segment 1, consists of a single include directive (the background theory) and a single conjecture. Hence, ATP systems which can maintain state across problems will gain a 50x speedup by loading the background theory once, bookmarking that state, and posing multiple conjectures against the pre-loaded background theory. The meta-information that the background theory is static with respect to the segment is contained in a TPTP "pragma" implemented as a defined comment as specified in the TPTP BNF. An example of the relevant section of the TPTP file is as follows:

%$problem_series(cyc_scaling_1,[CSR025+1,CSR026+1,CSR027+1,CSR028+1,CSR029+1,CSR030+1,CSR031+1,CSR032+1,CSR033+1,CSR034+1,CSR035+1,CSR036+1,CSR037+1,CSR038+1,CSR039+1,CSR040+1,CSR041+1,CSR042+1,CSR043+1,CSR044+1,CSR045+1,CSR046+1,CSR047+1,CSR048+1,CSR049+1,CSR050+1,CSR051+1,CSR052+1,CSR053+1,CSR054+1,CSR055+1,CSR056+1,CSR057+1,CSR058+1,CSR059+1,CSR060+1,CSR061+1,CSR062+1,CSR063+1,CSR064+1,CSR065+1,CSR066+1,CSR067+1,CSR068+1,CSR069+1,CSR070+1,CSR071+1,CSR072+1,CSR073+1,CSR074+1])
%$static(cyc_scaling_1,include('Axioms/CSR002+0.ax'))
include('Axioms/CSR002+0.ax').

The first defined comment, beginning with %$problem_series, declares a problem series cyc_scaling_1, segment 1 of the Cyc Scaling Challenge Problem Set, which consists of the stated list of TPTP problems.

The second defined comment, beginning with %$static, declares that the stated directive, in this case an include directive, is static with respect to the stated problem series. In particular, this means that the include directive for the axiom file Axioms/CSR002+0.ax is static (i.e. will not change) with respect to each problem in the cyc_scaling_1 problem series.

The third directive is the actual include directive. A clever theorem prover could pay attention to the preceding defined comments and handle this include specially, since it has the meta-knowledge that it is static with respect to this problem series.

The Elaboration Challenge Problem Set

The Elaboration Challenge Problem Set consists of 300 problems with about 3,280,000 axioms each. The Elaboration Challenge Problem Set is designed to be more challenging than the Scaling Challenge Problem Set and to be even more representative of the problems Cyc's inference engine typically faces. Developers are advised to tackle the Scaling Challenge Problem Set before the Elaboration Challenge Problem Set. The Elaboration Challenge Problem Set tests everything in the Scaling Challenge Problem Set, and also tests a system's elaboration tolerance.

A formalism is elaboration tolerant to the extent that it is convenient to modify a set of facts expressed in the formalism to take into account new phenomena or changed circumstances. -John McCarthy[1]

The Elaboration Challenge Problem Set has not yet been released as part of the TPTP, but is available for download.

The Cyc knowledge base is dynamic, not static. It has a persistent axiom store that can be edited in real time -- axioms and terms can be created or deleted without need for any global recompilation. The Elaboration Challenge Problem Set simulates this process. To perform optimally on this corpus, a system will maintain persistent state and be elaboration tolerant to changes to that state.

Each TPTP problem in the Elaboration Challenge Problem Set consists of three parts:

  1. The static KB
  2. The dynamic KB
  3. The conjecture

Static KB

Every problem in the Elaboration Challenge Problem Set shares the same static KB. It is the first axiom file included, and it has the same pragmas as it does in the Scaling Challenge Problem Set (see above). It contains about 94% of the translated OpenCyc KB, including the autogenerated synthetic content. The static KB is very similar to the background theory in segment 6 of the Scaling Challenge Problem Set. The only differences are that 6% of the axioms are not present and the order is different.

Dynamic KB

Each problem in the Elaboration Challenge Problem Set includes a subset of the dynamic KB. The dynamic KB is chunked into many include files, each containing about 4500 axioms, some of which are duplications of axioms in the static KB and some of which are not. Some of these dynamic KB chunks will be included for a given problem and some will not. The deltas are small between adjacent problems in the series, so a clever theorem prover could take advantage of this by caching most of the dynamic KB and only adding or deleting the contents of those include files that differ between the current problem and the previous problem. This simulates a series of queries interspersed with a series of theory revisions.

Conjectures

The conjectures are of the same 10 inference types as in the Scaling Challenge Problem Set, with some additions such as reflexivity. The distribution is roughly equal between the inference types, but unlike the Scaling Challenge Problem Set the order of each problem is randomized. In addition, conjectures are often repeated at different points in the series; the same conjecture with a different subset of the dynamic KB. Often a conjecture will be provable with one subset of the dynamic KB and will not be provable with a different subset of the dynamic KB; the repetition of conjectures tests such minimal pairs.

Contradictions

There are contradictions in the Cyc knowledge base. Cyc has safeguards in place (such as WFF-checking and tightly resource-bounded inference upon theory revision) to avoid many types of obvious contradictions, but a few unintentional contradictions still slip through. With any knowledge base so large, especially one that is revised dynamically and asynchronously by multiple users, global consistency is an unrealistic expectation. This opens the possibility of theorem provers trivially solving all these challenge problems by finding one contradiction in the static KB, caching it, and appealing to reductio ad absurdum. While this is technically a solution to the problem, it in no way contributes to the advancement of research in large-scale common-sense inference. The existence of contradictions in large-scale common-sense knowledge bases has forced Cyc's inference engine to make a different design decision -- to studiously steer the search away from such contradictions and to isolate their effect on unrelated inferences as much as possible.

Cyc's inference engine isolates contradictions in three ways:

  1. Microtheories
  2. Avoiding arbitrary reductio ad absurdum (RAA) proofs
  3. Higher-order meta-reasoning about what inference rules to apply or forbid in certain situations

Microtheories

Microtheories are translated into TPTP, albeit with a loss of expressivity. Despite this, this method of avoiding contradictions does carry over to the challenge problem sets.

RAA proofs

Cyc's inference engine can perform RAA proofs, but it does not perform arbitrary RAA proofs. To perform well in a huge theory, theorem provers must necessarily do set-of-support reasoning to avoid being distracted by irrelevant background knowledge. If set-of-support is taken seriously, then most potential RAA proofs will not be encountered in the inference search.

Higher-order meta-reasoning

Cyc's inference engine performs several kinds of higher-order meta-reasoning, some of which help avoid contradictions. An important example of this is that Cyc performs WFF-checking on intermediate steps of the inference and disallows an inference step if it would result in an ill-formed result. This is part of how Cyc handles partial functions (in specific, arg-type constraints on functions). Since all functions are total in FOL, this may be a source of contradictions in these challenge problem sets, because partial functions in Cyc are translated as total functions in FOL.

Advisement

Given the limitations of FOL and the imperfect translation between Cyc and TPTP, theorem prover authors are advised to solve these challenge problems in a backward, query-focused way, and to take set-of-support seriously. This will help avoid irrelevant contradictions in the background theory whenever possible.

Future Work

Cycorp intends to create a similar but richer set of challenge problems for the ResearchCyc KB. ResearchCyc contains considerably more knowledge than OpenCyc, and its knowledge is much more entangled and varied. ResearchCyc also has a much wider variety of rules, allowing for much longer chains of common sense reasoning. The ResearchCyc Challenge Problem Set will be even more challenging and even more representative of the common sense inferences faced by Cyc.

Download

Download it now!

You can also download the Scaling Challenge Problem Set (but not yet the Elaboration Challenge Problem Set) as part of the TPTP.

The Cyc TPTP Challenge Problem Set is available under the following license:

  • Copyright Information Cyc(R) Knowledge Base Copyright(C) 1995-2008 Cycorp, Inc., Austin, TX, USA. All rights reserved.
  • OpenCyc Knowledge Base Copyright(C) 2001-2008 Cycorp, Inc., Austin, TX, USA. All rights reserved.

Other copyrights may be found in various files. The OpenCyc Knowledge Base The OpenCyc Knowledge Base consists of code, written in the declarative language CycL, that represents or supports the representation of facts and rules pertaining to consensus reality. OpenCyc is licensed using the Apache License, Version 2, whose text can be found at http://www.apache.org/licenses/LICENSE-2.0.html. The OpenCyc CycL code base is the "Work" referred to in the Apache license. The terms of this license equally apply to renamings and other logically equivalent reformulations of the Knowledge Base (or portions thereof) in any natural or formal language.

How to cite this documentation

Here is a suggested citation for this document.

Pace Reagan Smith, G. Sutcliffe, K. Goolsbey, R.C. Kahlert. (© 2007). The Cyc TPTP Challenge Problem Set. OpenCyc.org.
Retrieved [current date] from