the Cyc NL subsystem
The Cyc-NL system can be described in terms of its three primary
components, although in truth there are many other utilities that
contribute to its success. The three main components are the lexicon,
the syntactic parser, and the semantic interpreter.
the lexicon
The lexicon is the backbone of the NL system. It contains syntactic and
semantic information about English words. Each word is represented as a
Cyc constant. For example, the constant #$Light-TheWord is used to
represent the English word "light". Assertions in the lexicon specify
that #$Light-TheWord has noun, verb, adjective, and adverb forms
(as in "a bright light", "light a fire", "a light meal", and "touching
someone lightly", respectively). Further lexical assertions specify
which syntactic patterns the various forms of "light" can appear in (for
example, "light" can be a transitive verb, as in "he lit a fire"; it can
also appear with certain prepositions, as in "the whole house was lit
up"). Most importantly, the lexicon is where links between English words
and Cyc constants are stored. The noun "light", for example, has
denotation links to two Cyc constants: #$LightEnergy and
#$LightingDevice. The other parts of speech of #$Light-TheWord have
denotation links to Cyc constants as well.
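
As a rough illustration of the kind of information a lexical entry carries, the sketch below models #$Light-TheWord as a small Python record. The class name LexicalEntry and its field names are hypothetical and are not part of Cyc. Only the noun denotations are stated in the text, so the other forms are left empty rather than guessed.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class LexicalEntry:
    """Hypothetical record for one word constant (not Cyc's real schema)."""

    constant: str
    # Maps a part of speech to the Cyc constants that form can denote.
    denotations: dict[str, list[str]] = field(default_factory=dict)

    def parts_of_speech(self) -> list[str]:
        """Return the parts of speech recorded for this word, sorted."""
        return sorted(self.denotations)


# The noun links come from the text; the verb, adjective, and adverb
# denotations are not given there, so they are left empty.
LIGHT = LexicalEntry(
    constant="#$Light-TheWord",
    denotations={
        "noun": ["#$LightEnergy", "#$LightingDevice"],
        "verb": [],
        "adjective": [],
        "adverb": [],
    },
)

print(LIGHT.parts_of_speech())
print(LIGHT.denotations["noun"])
```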
When Cyc-NL processes an input sentence, it first checks the lexicon to
assign possible parts of speech to words in the string. The lexicon
(along with our generative morphology component) would assign candidate
parts of speech to each word of the following input string:

the man saw the light with the telescope

Notice that many of the words are ambiguous as to part of speech. It is
the job of the syntactic parser to decide which part-of-speech
assignments are appropriate, and to build a structure from the sentence
which can be passed along to the semantic component for interpretation.
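
The lookup step can be sketched as a dictionary from word strings to sets of candidate tags, using the example sentence analyzed in this section. This is a minimal sketch: TOY_LEXICON and both function names are hypothetical, and the tag sets are illustrative assumptions rather than the contents of the Cyc lexicon (the adverb form of "light" is "lightly", so it is not listed for the string "light").

```python
# Toy lexicon: tag sets are illustrative assumptions, not Cyc's data.
TOY_LEXICON = {
    "the": {"Determiner"},
    "man": {"SimpleNoun", "Verb"},
    "saw": {"SimpleNoun", "Verb"},
    "light": {"SimpleNoun", "Verb", "Adjective"},
    "with": {"Preposition"},
    "telescope": {"SimpleNoun"},
}


def assign_parts_of_speech(sentence):
    """Return (word, sorted candidate tags) for each whitespace token."""
    return [
        (word, sorted(TOY_LEXICON.get(word, set())))
        for word in sentence.lower().split()
    ]


def ambiguous_words(sentence):
    """Return words with more than one candidate tag, without repeats."""
    seen = []
    for word, tags in assign_parts_of_speech(sentence):
        if len(tags) > 1 and word not in seen:
            seen.append(word)
    return seen


SENTENCE = "the man saw the light with the telescope"
for word, tags in assign_parts_of_speech(SENTENCE):
    print(f"{word:10} {', '.join(tags)}")
print("ambiguous:", ambiguous_words(SENTENCE))
```

Words missing from the toy lexicon simply receive an empty tag list here; a real system would fall back on morphology or other resources.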
the syntactic parser
The syntactic parser utilizes a phrase-structure grammar loosely based
on Government and Binding principles. Using a number of context-free
rules, the parser builds tree-structures, bottom-up, over the input
string. The parser outputs all trees allowed by the rule system, so
multiple parses are possible in cases of syntactic ambiguity.
In the case of the sentence above, the parser generates two tree
structures:
{:SENTENCE
  {:NP
    {:DETP {#$Determiner [the]}}
    {:N-BAR {#$SimpleNoun [man]}}}
  {:VP
    {#$Verb [saw]}
    {:NP {:DETP {#$Determiner [the]}}
         {:N-BAR {#$SimpleNoun [light]}}}
    {:PP {#$Preposition [with]}
         {:NP {:DETP {#$Determiner [the]}}
              {:N-BAR {#$SimpleNoun [telescope]}}}}}}

{:SENTENCE
  {:NP
    {:DETP {#$Determiner [the]}}
    {:N-BAR {#$SimpleNoun [man]}}}
  {:VP
    {#$Verb [saw]}
    {:NP {:DETP {#$Determiner [the]}}
         {:N-BAR
           {:N-BAR {#$SimpleNoun [light]}}
           {:PP {#$Preposition [with]}
                {:NP {:DETP {#$Determiner [the]}}
                     {:N-BAR {#$SimpleNoun [telescope]}}}}}}}}
In the first tree, the prepositional phrase "with the telescope" attaches
to the verb phrase, corresponding to the interpretation "the man used a
telescope to see the light". In the second tree, the prepositional
phrase attaches to the noun phrase, corresponding to the interpretation
"the man saw the light which had a telescope". These structures are then
passed to the semantic component, where they are translated into CycL,
and spurious parses are discarded.
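
To make the bottom-up, all-parses behavior concrete, here is a small chart parser sketch. It is not Cyc's parser: the eight rules in GRAMMAR are assumptions chosen to mirror the node labels in the trees above, and part-of-speech ambiguity is left out (each word gets a single tag in TAGS) to keep the example short. Trees are returned as nested tuples.

```python
from itertools import combinations, product

# Assumed toy grammar mirroring the node labels used in this section.
GRAMMAR = [
    ("SENTENCE", ("NP", "VP")),
    ("NP", ("DETP", "N-BAR")),
    ("DETP", ("Determiner",)),
    ("N-BAR", ("SimpleNoun",)),
    ("N-BAR", ("N-BAR", "PP")),
    ("VP", ("Verb", "NP", "PP")),  # PP attached to the verb phrase
    ("VP", ("Verb", "NP")),
    ("PP", ("Preposition", "NP")),
]

# One tag per word: lexical ambiguity is omitted in this sketch.
TAGS = {
    "the": "Determiner", "man": "SimpleNoun", "saw": "Verb",
    "light": "SimpleNoun", "with": "Preposition", "telescope": "SimpleNoun",
}


def parse(words):
    """Return every SENTENCE tree over a non-empty list of words."""
    n = len(words)
    chart = {}  # chart[(i, j)][symbol] -> trees covering words[i:j]
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            cell = {}
            if length == 1:
                tag = TAGS[words[i]]
                cell[tag] = [(tag, words[i])]
            # Rules with two or more children combine strictly smaller
            # spans, which earlier iterations have already filled in.
            for lhs, rhs in GRAMMAR:
                if len(rhs) < 2:
                    continue
                for cuts in combinations(range(i + 1, j), len(rhs) - 1):
                    bounds = (i, *cuts, j)
                    options = [
                        chart[(a, b)].get(sym, [])
                        for sym, a, b in zip(rhs, bounds, bounds[1:])
                    ]
                    for children in product(*options):
                        cell.setdefault(lhs, []).append((lhs, *children))
            # Unary rules here only rewrite word tags, so one pass suffices.
            for lhs, rhs in GRAMMAR:
                if len(rhs) == 1:
                    for child in cell.get(rhs[0], []):
                        cell.setdefault(lhs, []).append((lhs, child))
            chart[(i, j)] = cell
    return chart[(0, n)].get("SENTENCE", [])


trees = parse("the man saw the light with the telescope".split())
print(len(trees))
```

With this grammar the example sentence yields two trees: one in which the VP node has three children (verb, NP, PP) and one in which the PP sits inside the object NP's N-BAR.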
the semantic interpreter
Cyc-NL's semantic component transforms syntactic parses into CycL
formulas. The output of the semantic component is "pure" CycL: a parsed
sentence can immediately be asserted into the KB, for example, or a
parsed question can be presented to the SQL generator in order to pose a
database query.
Cyc's semantic interpreter incorporates principles of Montague
semantics. Semantic structures are built up piece-by-piece and combined
into larger structures. For each syntactic rule, there is a
corresponding semantic procedure which applies. Cyc-NL's clausal semantics is
basically "verb-driven". Verbs are stored in the lexicon with
"templates" for their translation into CycL. For example, the template
for "believe" when followed by a that-clause might look like this:
(#$believes :SUBJECT :CLAUSE). In translating a sentence like "Mary
believes that the blue hat is pretty", we retrieve the appropriate
template for "believe", then build up the interpretations of the
arguments which will fill the :SUBJECT and :CLAUSE slots.
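
The slot-filling step for the "believe" template can be sketched as follows. The functions fill_template and to_cycl are hypothetical helpers, and the angle-bracket strings are placeholders standing in for the interpretations of the arguments, since the text does not give their actual CycL translations.

```python
def fill_template(template, bindings):
    """Recursively replace :KEYWORD slots in a nested-tuple template."""
    if isinstance(template, tuple):
        return tuple(fill_template(part, bindings) for part in template)
    if isinstance(template, str) and template.startswith(":"):
        return bindings[template]
    return template


def to_cycl(form):
    """Render a nested tuple as a parenthesized, space-separated string."""
    if isinstance(form, tuple):
        return "(" + " ".join(to_cycl(part) for part in form) + ")"
    return form


# Template from the text for "believe" followed by a that-clause.
BELIEVE_THAT = ("#$believes", ":SUBJECT", ":CLAUSE")

# Placeholders only: the real interpretations would be CycL terms.
bindings = {
    ":SUBJECT": "<Mary>",
    ":CLAUSE": "<the blue hat is pretty>",
}

print(to_cycl(fill_template(BELIEVE_THAT, bindings)))
```

Because fill_template recurses into nested tuples, a binding may itself be a filled template, which is one way to build larger structures from smaller pieces.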
Cyc-NL's semantic component makes use of knowledge in the KB at
virtually every level of the interpretation process. In the
example "the man saw the light with the telescope", the semantic
component would consult the KB to find out whether telescopes are
typically used as instruments in seeing, and whether lights are the
kinds of things that usually have telescopes. Based on the results of
asking the KB, the semantic component would reject the second parse as
invalid, and produce a CycL translation of the first parse.
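
The filtering step described above can be sketched with a toy stand-in for the KB. Everything here is an assumption made for illustration: the predicate names usedAsInstrumentIn and typicallyHasPart, the single fact in TOY_KB, and the reading labels are hypothetical and are not real Cyc assertions or API calls.

```python
# Toy stand-in for the KB: one hypothetical fact, not a Cyc assertion.
TOY_KB = {
    ("usedAsInstrumentIn", "telescope", "seeing"),
}


def kb_holds(fact):
    """Return True if the toy KB contains the given fact tuple."""
    return fact in TOY_KB


# Each reading is paired with the fact it needs in order to be plausible.
READINGS = {
    "PP attaches to verb phrase": ("usedAsInstrumentIn", "telescope", "seeing"),
    "PP attaches to noun phrase": ("typicallyHasPart", "light", "telescope"),
}


def plausible_readings(readings):
    """Keep only the readings whose required fact holds in the toy KB."""
    return [name for name, fact in readings.items() if kb_holds(fact)]


print(plausible_readings(READINGS))
```

A real KB query would involve inference rather than exact set membership; the sketch only shows where in the pipeline the commonsense check sits.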
Using commonsense knowledge to guide the interpretation process allows
us to deal with the ever-present problem of ambiguity in natural
language without having to rely solely on statistical techniques.