
the Cyc NL subsystem

The Cyc-NL system can be described in terms of its three primary components, although in truth there are many other utilities that contribute to its success. The three main components are the lexicon, the syntactic parser, and the semantic interpreter.

the lexicon

The lexicon is the backbone of the NL system. It contains syntactic and semantic information about English words. Each word is represented as a Cyc constant. For example, the constant #$Light-TheWord is used to represent the English word "light". Assertions in the lexicon specify that #$Light-TheWord has noun, verb, adjective, and adverb forms (as in "a bright light", "light a fire", "a light meal", and "touching someone lightly", respectively). Further lexical assertions specify which syntactic patterns the various forms of "light" can appear in (for example, "light" can be a transitive verb, as in "he lit a fire"; it can also appear with certain prepositions, as in "the whole house was lit up"). Most importantly, the lexicon is where links between English words and Cyc constants are stored. The noun "light", for example, has denotation links to two Cyc constants: #$LightEnergy and #$LightingDevice. The other parts of speech of #$Light-TheWord have denotation links to Cyc constants as well.
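The kind of information held for #$Light-TheWord can be pictured as a small record per word. The following is only a minimal sketch in Python, not Cyc's actual representation: the class name, field names, and helper functions are hypothetical, and only the constants and example phrases quoted above are taken from the text.

```python
from dataclasses import dataclass, field


@dataclass
class LexicalEntry:
    """One word constant such as #$Light-TheWord (field names are hypothetical)."""
    constant: str
    # part of speech -> example usage quoted in the text
    forms: dict = field(default_factory=dict)
    # part of speech -> Cyc constants the word can denote in that part of speech
    denotations: dict = field(default_factory=dict)


LEXICON = {
    "light": LexicalEntry(
        constant="#$Light-TheWord",
        forms={
            "noun": "a bright light",
            "verb": "light a fire",
            "adjective": "a light meal",
            "adverb": "touching someone lightly",
        },
        # Only the noun denotations are named in the text, so the
        # other parts of speech are deliberately left empty here.
        denotations={"noun": ["#$LightEnergy", "#$LightingDevice"]},
    ),
}


def parts_of_speech(word):
    """Return the parts of speech recorded for a word, or [] if it is unknown."""
    entry = LEXICON.get(word)
    return sorted(entry.forms) if entry else []


def denotations(word, pos):
    """Return the Cyc constants linked to a word in the given part of speech."""
    entry = LEXICON.get(word)
    return entry.denotations.get(pos, []) if entry else []
```

Calling denotations("light", "noun") on this toy table returns the two constants named above, which is the word-to-constant link that the later stages depend on.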

When Cyc-NL processes an input sentence, it first checks the lexicon to assign possible parts of speech to the words in the string. For an input string such as "the man saw the light with the telescope", the lexicon (along with our generative morphology component) assigns each word every part of speech it can take.

Notice that many of the words are ambiguous as to part of speech. It is the job of the syntactic parser to decide which part-of-speech assignments are appropriate, and to build a structure from the sentence which can be passed along to the semantic component for interpretation.
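As a rough illustration of this lookup step, the sketch below tags each word of the example sentence with every candidate part of speech. The table contents are illustrative assumptions rather than the output of Cyc's lexicon, the function names are hypothetical, and no morphology is modeled; the category names are the ones that appear in the parse trees.

```python
# Candidate parts of speech per word. These sets are assumptions made for
# illustration; the real lexicon and morphology component are far richer.
POS_TABLE = {
    "the": ["#$Determiner"],
    "man": ["#$SimpleNoun", "#$Verb"],
    "saw": ["#$SimpleNoun", "#$Verb"],
    "light": ["#$SimpleNoun", "#$Verb"],
    "with": ["#$Preposition"],
    "telescope": ["#$SimpleNoun"],
}


def assign_parts_of_speech(sentence):
    """Return (word, candidate parts of speech) pairs, one per token."""
    return [(word, POS_TABLE.get(word, [])) for word in sentence.lower().split()]


def ambiguous_words(sentence):
    """Return the words with more than one candidate, in order of first use."""
    seen = []
    for word, tags in assign_parts_of_speech(sentence):
        if len(tags) > 1 and word not in seen:
            seen.append(word)
    return seen


if __name__ == "__main__":
    for word, tags in assign_parts_of_speech("the man saw the light with the telescope"):
        print(word, tags)
```

Nothing is resolved at this stage: the ambiguous words are simply handed on with all of their candidates, and the parser decides which ones fit.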

the syntactic parser

The syntactic parser utilizes a phrase-structure grammar loosely based on Government and Binding principles. Using a number of context-free rules, the parser builds tree-structures, bottom-up, over the input string. The parser outputs all trees allowed by the rule system, so multiple parses are possible in cases of syntactic ambiguity.

For the sentence "the man saw the light with the telescope", the parser generates two tree structures:

{:SENTENCE
  {:NP 
    {:DETP  {#$Determiner  [the]}}
    {:N-BAR {#$SimpleNoun  [man]}}}
  {:VP 
    {#$Verb  [saw]}
    {:NP {:DETP {#$Determiner  [the]}}
         {:N-BAR  {#$SimpleNoun  [light]}}}
    {:PP {#$Preposition  [with]}
         {:NP {:DETP {#$Determiner  [the]}}
              {:N-BAR {#$SimpleNoun  [telescope]}}}}}}
  
{:SENTENCE
  {:NP
    {:DETP  {#$Determiner  [the]}}
    {:N-BAR {#$SimpleNoun  [man]}}}
  {:VP
    {#$Verb  [saw]}
    {:NP {:DETP {#$Determiner  [the]}}
         {:N-BAR
                {:N-BAR {#$SimpleNoun  [light]}}
                {:PP {#$Preposition  [with]}
                     {:NP {:DETP {#$Determiner  [the]}}
                          {:N-BAR {#$SimpleNoun  [telescope]}}}}}}}}

In the first tree, the prepositional phrase "with the telescope" attaches to the verb phrase, corresponding to the interpretation "the man used a telescope to see the light". In the second tree, the prepositional phrase attaches to the noun phrase, corresponding to the interpretation "the man saw the light which had a telescope". These structures are then passed to the semantic component, where they are translated into CycL and spurious parses are discarded.
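The bottom-up, all-parses behavior described in this section can be sketched with a small chart parser. This is not Cyc's parser: the rule set below is an assumption reverse-engineered from the two trees, the word tags are illustrative, and the sketch assumes that unary rules only rewrite lexical categories, so a single unary pass per span is enough.

```python
from collections import defaultdict
from itertools import product

# Illustrative word tags (assumed); "saw" and "light" are left ambiguous.
TAGS = {
    "the": ["#$Determiner"], "man": ["#$SimpleNoun"],
    "saw": ["#$SimpleNoun", "#$Verb"], "light": ["#$SimpleNoun", "#$Verb"],
    "with": ["#$Preposition"], "telescope": ["#$SimpleNoun"],
}

# Context-free rules inferred from the two trees (an assumption).
RULES = [
    (":SENTENCE", (":NP", ":VP")),
    (":NP", (":DETP", ":N-BAR")),
    (":DETP", ("#$Determiner",)),
    (":N-BAR", ("#$SimpleNoun",)),
    (":N-BAR", (":N-BAR", ":PP")),
    (":VP", ("#$Verb", ":NP")),
    (":VP", ("#$Verb", ":NP", ":PP")),
    (":PP", ("#$Preposition", ":NP")),
]


def splits(i, j, parts):
    """Yield every way to cut [i, j) into `parts` non-empty consecutive spans."""
    if parts == 1:
        yield [(i, j)]
        return
    for k in range(i + 1, j - parts + 2):
        for rest in splits(k, j, parts - 1):
            yield [(i, k)] + rest


def parse(words, goal=":SENTENCE"):
    """Fill the chart from short spans to long ones and return all goal trees."""
    n = len(words)
    chart = defaultdict(lambda: defaultdict(list))  # (i, j) -> symbol -> trees
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            cell = chart[(i, j)]
            if length == 1:  # lexical lookup for single words
                for tag in TAGS.get(words[i], []):
                    cell[tag].append((tag, words[i]))
            for lhs, rhs in RULES:  # combine already-built smaller spans
                if len(rhs) < 2:
                    continue
                for cut in splits(i, j, len(rhs)):
                    options = [chart[span][sym] for span, sym in zip(cut, rhs)]
                    for children in product(*options):
                        cell[lhs].append((lhs,) + children)
            for lhs, rhs in RULES:  # single pass of unary rules
                if len(rhs) == 1:
                    for child in list(cell[rhs[0]]):
                        cell[lhs].append((lhs, child))
    return chart[(0, n)][goal]


def show(tree):
    """Render a tree in the bracketed notation used above, on one line."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return "{%s [%s]}" % (label, children[0])
    return "{%s %s}" % (label, " ".join(show(c) for c in children))


if __name__ == "__main__":
    for tree in parse("the man saw the light with the telescope".split()):
        print(show(tree))
```

With this toy grammar the sentence yields exactly two trees, one per attachment site of the prepositional phrase. The part-of-speech ambiguity of "saw" and "light" disappears on its own, because no rule can use the unwanted readings.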

the semantic interpreter

Cyc-NL's semantic component transforms syntactic parses into CycL formulas. The output of the semantic component is "pure" CycL: a parsed sentence can immediately be asserted into the KB, for example, or a parsed question can be presented to the SQL generator in order to pose a database query.

Cyc's semantic interpreter incorporates principles of Montague semantics. Semantic structures are built up piece-by-piece and combined into larger structures. For each syntactic rule, there is a corresponding semantic procedure which applies. Cyc-NL's clausal semantics is basically "verb-driven". Verbs are stored in the lexicon with "templates" for their translation into CycL. For example, the template for "believe" when followed by a that-clause might look like this: (#$believes :SUBJECT :CLAUSE). In translating a sentence like "Mary believes that the blue hat is pretty", we retrieve the appropriate template for "believe", then build up the interpretations of the arguments which will fill the :SUBJECT and :CLAUSE slots.
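The template mechanism can be illustrated with a few lines of Python. This is a sketch under stated assumptions, not Cyc's implementation: only the template (#$believes :SUBJECT :CLAUSE) comes from the text, while the table layout, the function names, and the constants used for "Mary" and for the embedded clause are hypothetical placeholders.

```python
# Verb templates keyed by (verb, complement frame). Only this one entry is
# taken from the text; the keying scheme itself is an assumption.
TEMPLATES = {
    ("believe", "that-clause"): ("#$believes", ":SUBJECT", ":CLAUSE"),
}


def fill(template, bindings):
    """Replace each :KEYWORD slot in a (possibly nested) template with its binding."""
    if isinstance(template, tuple):
        return tuple(fill(part, bindings) for part in template)
    return bindings.get(template, template)


def to_cycl(form):
    """Render a nested tuple as a parenthesized CycL-style string."""
    if isinstance(form, tuple):
        return "(" + " ".join(to_cycl(part) for part in form) + ")"
    return form


if __name__ == "__main__":
    # Hypothetical interpretations of the arguments of
    # "Mary believes that the blue hat is pretty".
    bindings = {
        ":SUBJECT": "#$Mary",                            # hypothetical constant
        ":CLAUSE": ("#$isa", "?HAT", "#$PrettyThing"),   # hypothetical formula
    }
    print(to_cycl(fill(TEMPLATES[("believe", "that-clause")], bindings)))
```

The interpretations of the arguments are built first and then dropped into the slots, which mirrors the piece-by-piece composition described above.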

Cyc-NL's semantic component makes use of knowledge in the KB at virtually every level of the interpretation process. In the example "the man saw the light with the telescope", the semantic component would consult the KB to find out whether telescopes are typically used as instruments in seeing, and whether lights are the kinds of things that usually have telescopes. Based on the results of asking the KB, the semantic component would reject the second parse as invalid, and produce a CycL translation of the first parse.
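The filtering step can be pictured as each candidate reading posing a question to the KB and surviving only if the answer is yes. The sketch below is purely illustrative: the KB is a toy set of facts, and the predicate names and the reading records are hypothetical, not Cyc constants or data structures.

```python
# Toy knowledge base of ground facts. The predicate and term names are
# hypothetical stand-ins for whatever the real KB would be asked.
KB = {
    ("typicalInstrumentFor", "Telescope", "Seeing"),
}

# One record per parse: where the prepositional phrase attached, and the
# fact the KB must support for that reading to be plausible.
READINGS = [
    {
        "attachment": "verb phrase",
        "gloss": "the man used a telescope to see the light",
        "requires": ("typicalInstrumentFor", "Telescope", "Seeing"),
    },
    {
        "attachment": "noun phrase",
        "gloss": "the man saw the light which had a telescope",
        "requires": ("typicallyHasPart", "Light", "Telescope"),
    },
]


def kb_holds(fact):
    """Return True if the toy KB contains the fact."""
    return fact in KB


def select_readings(readings):
    """Keep only the readings whose required fact is supported by the KB."""
    return [reading for reading in readings if kb_holds(reading["requires"])]


if __name__ == "__main__":
    for reading in select_readings(READINGS):
        print(reading["attachment"], "->", reading["gloss"])
```

Only the verb-phrase attachment survives here, matching the outcome described above. A real KB query would involve inference over general knowledge rather than a literal set-membership test.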

Using commonsense knowledge to guide the interpretation process allows us to deal with the ever-present problem of ambiguity in natural language without having to rely solely on statistical techniques.



Copyright © 2002-2009 Cycorp, Inc. All Rights Reserved.