Lingua Franca Magazine Volume 11, No. 6 - September 2001 (COVER STORY)
The Know-It-All Machine - An audacious quest to teach a computer common sense -- one fact at a time
FLORENCE ROSSIGNOL HAS JUST FINISHED using an on-line travel site to plan a package tour across Europe. The site has prompted her for a few facts about herself: her date of birth, her education and nationality, her occupation. She has typed in that she was born in 1945 and trained as a nurse. She has also volunteered the fact that she is claustrophobic. As far as on-line shopping goes, it looks like an everyday event.
Except this Web site is smart -- unusually smart. It has been outfitted with a copy of Cyc (pronounced sike), artificial-intelligence software touted for its ability to process information with humanlike common sense.
At one point, Cyc detects a problem: The proposed tour involves taking the Channel Tunnel from London to France; Rossignol is claustrophobic. The Web site notes that Rossignol "may dislike" the Channel Tunnel, and Cyc justifies the assertion with a series of ten related statements, including:
31 miles is greater than 50 feet.
The Channel Tunnel is 31 miles long.
Florence Rossignol suffers from claustrophobia.
Any path longer than 50 feet should be considered "long" in a travel context.
If a long tunnel is a route used by a tour, a claustrophobic person taking the tour might dislike the tunnel.
At the same time, Cyc scours the list of various cities on the tour and takes special notice of Geneva, where one can visit the Red Cross Museum. This time, Cyc's thinking features the following steps:
The Red Cross Museum is found in Geneva.
Florence Rossignol is a nurse.
Nursing is what nurses do.
The Red Cross Museum (organization) has nursing as its "focus."
If an organization has a particular type of activity as its "focus," and a person holds a position in which they perform that activity, that person will feel significantly about that organization.
Bingo. The travel site tells Rossignol to make sure she catches the Red Cross Museum in Geneva -- but for God's sake, don't take the Channel Tunnel.
Doug Lenat is pleased. Though this impressive display happens to be a promotional demo (Rossignol is a fictitious character), it is, he explains, genuinely representative of his invention's unique abilities. Most computer programs are utterly useless when it comes to everyday reasoning because they don't have very much common sense. They don't know that claustrophobics are terrified of enclosed spaces. They don't know that fifty feet can sometimes be considered a "long" distance. They don't even know something as tautological as "Nursing is what nurses do."
Cyc, however, does know such things -- because Lenat has been teaching it about the world one fact at a time for seventeen long years. "We had to kick-start a computer, give it all the things we take for granted," he says. Ever since 1984, the former Stanford professor has been sitting in Cyc's Austin, Texas, headquarters and writing down the platitudes of our "consensus reality" -- all the basic facts that we humans know about the world around us: "Water is wet"; "Everyone has a mother"; "When you let go of things they usually fall." Cyc currently has a database of about 1.5 million of these key assertions. Taken together, they are helping Lenat create what he calls the first true artificial intelligence (AI) -- a computer that will be able to speak in English, reason about the world, and, most unnerving, learn on its own. Cyc is easily the biggest and most ambitious AI project on the planet, and by the time it's completed, it will probably have consumed Lenat's entire career.
Encoding common sense is so formidable a task that no other AI theorists have ever dared to try anything like it. Most have assumed it isn't even possible. Indeed, with Cyc, Lenat has tweaked the noses of legions of AI researchers who have largely given up on the rather sci-fi-like dream of creating humanlike intelligence and have focused instead on much smaller projects -- so-called expert systems that perform very limited intelligent tasks, such as controlling a bank machine or an elevator. "Doug's really one of the only people still trying to slay the AI dragon," says Bill Andersen, a Ph.D. candidate specializing in ontologies (information hierarchies) at the University of Maryland and a former Department of Defense researcher who has used Cyc in several Defense Department experiments.
For all his progress, Lenat still receives mixed responses from much of the academic AI community. Not only does Cyc's highly pragmatic approach fly in the face of much scholarly AI theory, but its successes have taken place at Lenat's Cycorp, which is developing Cyc as a for-profit venture. Incensing his critics, Lenat has published almost no academic papers on Cyc in recent years, raising suspicions that it may have many undisclosed flaws. "We don't really know what's going on inside it, because he doesn't show anyone," complains Doug Skuce, an AI researcher at the University of Ottawa.
Lenat, meanwhile, revels in his bad-boy image. He accuses academic AI experts of being theory-obsessed and unwilling to do the hard work necessary to tackle common sense. "They want it to be easy. There are people who'd rather talk about doing it than actually do it," he says, laughing.
Still, some skeptics think Lenat could benefit from more deliberation and less action. "It's kind of crazy," says the Yale computer science professor Drew McDermott about Lenat's ambition. "Philosophers have been debating common sense for years, and they don't even know how it works. Lenat thinks he's going to get common sense going in a computer?" Push Singh, a graduate student at the Massachusetts Institute of Technology who is building a rival upstart to Cyc, has his own doubts: "Lenat and his team have been going for fifteen years and have only one million rules? They'll never get enough knowledge in there at that rate."
Lenat plans to take a huge step toward silencing his critics this fall, when he begins to "open source" Cyc. That is, he intends for the first time to allow people outside of Cycorp to experiment with their own copies of Cyc -- and train it in their own commonsense knowledge. Eventually he proposes to let everyone in the world talk to Cyc and teach it new information, to elevate its knowledge to a level of near omniscience. "It'll get to the point where there's no one left for it to talk to," says Lenat, his eyes twinkling mischievously.
But when everyone can speak to Cyc, what exactly will they tell it? Can Cyc -- or any AI system, for that matter -- truly embody the common knowledge of humanity, with all its many layers, its contradictions and ambiguities?
SINCE THIS is 2001, Lenat has spent the year fielding jokes about HAL 9000, the fiendishly intelligent computer in Arthur C. Clarke's 2001: A Space Odyssey. On one occasion, when television reporters came to film Cyc, they expected to see a tall, looming structure. But because Cyc doesn't look like much -- it's just a database of facts and a collection of supporting software that can fit on a laptop -- they were more interested in the company's air conditioner. "It's big and has all these blinking lights," Lenat says with a laugh. "Afterwards, we even put a sign on it saying, CYC 2001, BETTER THAN HAL 9000."
But for all Lenat's joking, HAL is essentially his starting point for describing the challenges facing the creation of commonsense AI. He points to the moment in the film 2001 when HAL is turned on -- and its first statement is "Good morning, Dr. Chandra, this is HAL. I'm ready for my first lesson."
The problem, Lenat explains, is that for a computer to formulate sentences, it can't be starting to learn. It needs to already possess a huge corpus of basic, everyday knowledge. It needs to know what a morning is; that a morning might be good or bad; that doctors are typically greeted by title and surname; even that we greet anyone at all. "There is just tons of implied knowledge in those two sentences," he says.
This is the obstacle to knowledge acquisition: Intelligence isn't just about how well you can reason; it's also related to what you already know. In fact, the two are interdependent. "The more you know, the more and faster you can learn," Lenat argued in his 1989 book, Building Large Knowledge-Based Systems, a sort of midterm report on Cyc. Yet the dismal inverse is also true: "If you don't know very much to begin with, then you can't learn much right away, and what you do learn you probably won't learn quickly."
This fundamental constraint has been one of the most frustrating hindrances in the history of AI. In the 1950s and 1960s, AI experts doing work on neural networks hoped to build self-organizing programs that would start almost from scratch and eventually grow to learn generalized knowledge. But by the 1970s, most researchers had concluded that learning was a hopelessly difficult problem, and were beginning to give up on the dream of a truly human, HAL-like program. "A lot of people got very discouraged," admits John McCarthy, a pioneer in early AI. "Many of them just gave up."
Undeterred, Lenat spent eight years of Ph.D. work -- and his first few years as a professor at Stanford in the late 1970s and early 1980s -- trying to craft programs that would autonomously "discover" new mathematical concepts, among other things. Meanwhile, most of his colleagues turned their attention to creating limited, task-specific systems that were programmed to "know" everything that was relevant to, say, monitoring and regulating elevator movement. But even the best of these expert systems are prone to what AI theorists call "brittleness" -- they fail if they encounter unexpected information. In one famous example, an expert system for handling car loans issued a loan to an eighteen-year-old who claimed that he'd had twenty years of job experience. The software hadn't been specifically programmed to check for this type of discrepancy and didn't have the common sense to notice it on its own. "People kept banging their heads against this same brick wall of not having this common sense," Lenat says.
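The car-loan failure can be made concrete with a minimal Python sketch of the kind of plausibility check the expert system lacked. The function name implausible_experience and the earliest_working_age threshold of 14 are assumptions chosen for illustration; nothing here is drawn from the actual loan software.

```python
def implausible_experience(age, years_experience, earliest_working_age=14):
    """Return True when the claimed experience cannot fit into the applicant's life.

    earliest_working_age is an assumed threshold, used only for illustration.
    """
    longest_possible_career = max(age - earliest_working_age, 0)
    return years_experience > longest_possible_career

# The applicant from the anecdote: eighteen years old, twenty years of experience.
applicant = {"age": 18, "years_experience": 20}
flagged = implausible_experience(applicant["age"], applicant["years_experience"])
print(flagged)  # prints True
```

A check like this catches one discrepancy only because someone thought to write it, which is exactly the brittleness problem: every such rule has to be anticipated in advance.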
By 1983, however, Lenat had become convinced that commonsense AI was possible -- but only if someone were willing to bite the bullet and codify all common knowledge by brute force: sitting down and writing it out, fact by fact by fact. After conferring with MIT's AI maven Marvin Minsky and Apple Computer's high-tech thinker Alan Kay, Lenat estimated the project would take tens of millions of dollars and twenty years to complete.
"All my life, basically," he admits. He'd be middle-aged by the time he could even figure out if he was going to fail. He estimated he had only between a 10 and 20 percent chance of success. "It was just barely doable," he says.
But that slim chance was enough to capture the imagination of Admiral Bobby Inman, a former director of the National Security Agency and head of the Microelectronics and Computer Technology Corporation (MCC), an early high-tech consortium. (Inman became a national figure in 1994 when he withdrew as Bill Clinton's appointee for secretary of defense, alleging a media conspiracy against him.) Inman invited Lenat to work at MCC and develop commonsense AI for the private sector. For Lenat, who had just divorced and whose tenure decision at Stanford had been postponed for a year, the offer was very appealing. He moved immediately to MCC in Austin, Texas, and Cyc was born.
LENAT BEGAN building Cyc by setting himself a seemingly modest challenge. He picked a pair of test sentences that Cyc would eventually have to understand: "Napoleon died in 1821. Wellington was greatly saddened." To comprehend them, Cyc would need to grasp such basic concepts as death, time, warfare, and France, as well as the sometimes counterintuitive aspects of human emotion, such as why Wellington would be saddened by his enemy's demise. Lenat and a few collaborators began writing these concepts down and constructing a huge branching-tree chart to connect them. They produced a gigantic list of axiomatic statements -- fundamental assumptions -- that described each concept in Cyc's database: its properties, how it interacted with other things. "We took enormous pieces of white paper," Lenat remembers, "and filled walls, maybe 150 feet long by about 8 feet high, with little notes and circles and arrows and whatnot."
Over the next few years, those axioms ballooned in number -- eventually including statements as oddly basic as:
You should carry a glass of water open end up.
The U.S.A. is a big country.
When people die, they stay dead.
The axioms aren't written in everyday English, which is too ambiguous and nuanced a language for a computer to understand. Instead, Cyc's "Ontological Engineers" -- Lenat's staff of philosophers and programmers, who call themselves Cyclists -- express each axiom in CycL, a formal language that Lenat's team devised. Based on the sort of symbolic notation that logicians and philosophers use to formalize claims about the world, CycL looks like this:
(forAll ?X (implies (owns Fred ?X) (objectFoundInLocation ?X FredsHouse)))
This expression states that if Fred owns any object, ?X, then that object is in Fred's house. In other words, as Cyclists put it, "all Fred's stuff is in his house." (Of course, as with all Cyc's knowledge, this claim becomes useful only in conjunction with other truths that Cyc knows -- such as the fact that a person's car or beachfront property is too large to fit in his house.)
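For readers who find the notation opaque, here is a minimal Python sketch of what applying such a rule amounts to. The tuple layout for facts, the sample possessions, and the function name apply_ownership_rule are assumptions made for illustration; this is not how Cyc stores or evaluates CycL.

```python
# Facts are (predicate, subject, object) tuples. This layout and the sample
# possessions are illustrative assumptions, not Cyc's internal representation.
facts = {
    ("owns", "Fred", "Toaster"),
    ("owns", "Fred", "Guitar"),
    ("owns", "Mary", "Bicycle"),
}

def apply_ownership_rule(facts, owner, location):
    """For every ?X with (owns owner ?X), derive (objectFoundInLocation ?X location)."""
    derived = set()
    for predicate, subject, obj in facts:
        if predicate == "owns" and subject == owner:
            derived.add(("objectFoundInLocation", obj, location))
    return derived

new_facts = apply_ownership_rule(facts, "Fred", "FredsHouse")
for fact in sorted(new_facts):
    print(fact)
```

Mary's bicycle is left alone, because the rule quantifies only over things Fred owns.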
Cyc's inventory of the world, however, is only one part of its setup. The other part is its "inference engine," which allows Cyc to deploy its immense store of factual knowledge. This engine includes Cyc's "heuristic layer" -- a collection of more than five hundred small modules of software code that perform logical inferences and deductions, as well as other feats of data manipulation. One module, for example, implements traditional modus ponens logic: If Cyc knows a fact of the form "If X, then Y," and Cyc knows "X," then it will conclude "Y." Other modules have the ability to sort facts by, say, chronological order.
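The modus ponens step described above can be sketched in a few lines of Python. This is a generic forward-chaining loop, not Cyc's inference engine; the function name forward_chain and the plain-English fact strings (loosely modeled on the Channel Tunnel demo) are assumptions for illustration.

```python
def forward_chain(facts, rules):
    """Apply modus ponens repeatedly until no new conclusions appear.

    rules is a list of (premises, conclusion) pairs: when every premise is
    already known, the conclusion is added to the set of known facts.
    """
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and all(p in known for p in premises):
                known.add(conclusion)
                changed = True
    return known

# Hypothetical facts and rules, written as plain strings for readability.
facts = {
    "tunnel is 31 miles long",
    "tour uses tunnel",
    "traveler is claustrophobic",
}
rules = [
    (["tunnel is 31 miles long"], "tunnel is long"),
    (["tunnel is long", "tour uses tunnel", "traveler is claustrophobic"],
     "traveler may dislike tunnel"),
]

conclusions = forward_chain(facts, rules)
print("traveler may dislike tunnel" in conclusions)  # prints True
```

The second rule fires only after the first has added "tunnel is long," which is why the loop keeps going until a full pass produces nothing new.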
On the one hand, the inference engine is what actually gives Cyc its innate smarts; without it, Cyc wouldn't be able to do anything with the information at its disposal. But on the other hand, as Lenat emphasizes, a computer can have state-of-the-art powers of data manipulation and still be worthless from a practical point of view; no machine can help you reason about the real world if it does not have commonsense knowledge to work with. Data manipulation, in Lenat's view, is the comparatively easy part. It's the data themselves that are devilishly difficult.
From the perspective of computing power, commonsense knowledge presents an additional difficulty in its sheer mass. As Cyc's knowledge base grew, the program had to sort through thousands of facts whenever it tried to reason. It began to slow down. "If you're trying to talk about the weather today, you don't want Cyc worrying about whether spiders have eight legs," Lenat explains. So the Cyclists have created "contexts" -- clumpings of like-minded facts that help speed up inferencing. If Cyc is told a fact about tour trips to Greece, for example, it begins with its existing knowledge about Europe, travel, trains, and the like. In this sense, Cyc's processing strategies are akin to human cognition; we can discuss any given topic only by ignoring "99.999 per cent of our knowledge," as Lenat has written.
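The idea of contexts can be illustrated with a minimal Python sketch in which facts are filed under named contexts and a query consults only the contexts it asks for. The class name ContextualKnowledgeBase, the context labels, and the sample facts are all assumptions for illustration; Cyc's actual context mechanism is not described in enough detail here to reproduce.

```python
from collections import defaultdict

class ContextualKnowledgeBase:
    """Toy knowledge base that groups facts by context (illustrative only)."""

    def __init__(self):
        self._facts_by_context = defaultdict(set)

    def add(self, context, fact):
        self._facts_by_context[context].add(fact)

    def relevant_facts(self, contexts):
        """Return only the facts filed under the requested contexts."""
        selected = set()
        for context in contexts:
            selected |= self._facts_by_context.get(context, set())
        return selected

kb = ContextualKnowledgeBase()
kb.add("Weather", "rain makes things wet")
kb.add("Zoology", "spiders have eight legs")
kb.add("Travel", "trains carry passengers between cities")
kb.add("Europe", "Greece is in Europe")

# A question about a tour of Greece looks only at travel and Europe facts.
tour_facts = kb.relevant_facts(["Travel", "Europe"])
print(sorted(tour_facts))
```

The spider fact stays in the knowledge base but never enters the working set, so the reasoner has fewer facts to sift through.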
BY THE EARLY 1990s, Cyc had acquired hundreds of thousands of facts about the world and could already produce some startlingly powerful results. For example, Cyc could search through databases to find inconsistent information, such as twins who were listed with different birth dates. Cyc didn't need to be specially programmed to look for that sort of error -- it just "knew" the commonsense idea that twins are always the same age. Pleased with Cyc's progress, Lenat spun his venture off from MCC to form Cycorp, a free-standing company.
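A hand-written version of the twins check might look like the Python sketch below. The names, dates, and the function find_twin_inconsistencies are invented sample data and helpers for illustration; the point of the anecdote is that Cyc reached the same result from a general fact about twins rather than from a purpose-built routine like this one.

```python
from datetime import date

# Made-up sample records; not drawn from any real database.
birth_dates = {
    "Ann": date(1970, 5, 1),
    "Beth": date(1970, 5, 1),
    "Carl": date(1982, 3, 9),
    "Dave": date(1982, 9, 3),
}
twin_pairs = [("Ann", "Beth"), ("Carl", "Dave")]

def find_twin_inconsistencies(birth_dates, twin_pairs):
    """Return the twin pairs whose recorded birth dates disagree."""
    return [
        (first, second)
        for first, second in twin_pairs
        if birth_dates[first] != birth_dates[second]
    ]

inconsistent = find_twin_inconsistencies(birth_dates, twin_pairs)
print(inconsistent)  # prints [('Carl', 'Dave')]
```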
Still, teaching Cyc new knowledge remains an excruciatingly slow process, filled with trial and error. If the Cyclists make a mistake, or forget to articulate explicitly some aspect of a concept, Cyc can reach some wholly implausible conclusions about the world.
A few months ago, for instance, Charles Klein, Cycorp's thirty-three-year-old director of ontological engineering, was asking Cyc a few test questions when he discovered something odd. Cyc apparently believed that if a bronze statue were melted into slag, it would remain a statue. What had gone wrong? Why was Cyc making such a basic mistake?
After a bit of forensic work, Klein found the problem. The Cyclists hadn't completely distinguished the concepts of bronze and statue. Cyc had been told that bronze was a material that retained its essential property -- its "bronzeness," as it were -- no matter what state it was in, solid or liquid. But now Cyc was trying to apply that fact to the statue aspect of "bronze statue." Cyc hadn't been told anything about statues that would invalidate its conclusion; nobody had ever thought it necessary to tell Cyc, for example, that statues are only statues if they're more or less in their original form. It's common sense, sure -- but who would bother to meditate on it? "Trying to think of everything," Klein quips, "is quite daunting."
This is the chief problem that all Cyclists face: Commonsense knowledge is invisible. It's defined as much by what we don't say as by what we do say. Common knowledge is what we assume everyone has, because it's, well, obvious. This is precisely what makes commonsense knowledge so powerful as an intellectual tool -- but it's also what makes it so hard to identify and codify.
"There's no book that you could read about common sense," Lenat points out, "that says things like, If you leave a car unlocked for ten seconds, then turn around, the odds of it still being there are really high. But if you leave a car unlocked for ten years, the odds are really low that it'll still be there when you come back."
The example of the unlocked car seems to raise a troublesome theoretical question: Is this common sense because it is a basic fact about the world that most everyone knows? Or is it common sense because it demonstrates sound reasoning? Just as important: Is this sort of conceptual concern a genuine threat to Cyc, or just more ivory-tower quibbling?
IN ACADEME, Cyc has always been a black sheep. Everyone in AI knows about it, and virtually everyone views it with skepticism. Lenat's critics have lambasted him for lacking a coherent theory of how intelligence and knowledge representation work -- and for rushing ahead with an ill-thought-out project. When they look at Cyc, they see nothing but an ad hoc jumble of facts about the world picked in an overly idiosyncratic way by Lenat's team. For them, making human-level AI requires a better theoretical understanding of human intelligence and the fundamentals of reasoning and representation -- and those are areas that are still, many argue, in their infancy.
In a 1993 issue of Artificial Intelligence, several reviewers sharply critiqued Lenat's 1989 book about Cyc. Yale's Drew McDermott led the charge, arguing that it was impossible to build a commonsense database without solving such philosophical problems as "the nature of causality." "We've been thinking about things like that for millennia," he points out.
McDermott suspects that it may not yet be possible to represent real-world common knowledge in logical, orderly languages such as Cyc's CycL -- or any other language, for that matter. After all, humans don't always store and manipulate knowledge logically or in language. "If you go through a room and you don't bump into things, is that common sense?" he wonders. Nils Nilsson, an AI pioneer and Stanford professor emeritus, shares that concern. "You can describe in words how to swing a golf club," he concedes. "But can that really tell you how to do it? We still don't really know how to represent knowledge."
Critics and fans of Cyc both recognize that the goal of producing a complete inventory of commonsense facts is almost embarrassingly open to theoretical objections. Because so many philosophical issues about how to represent knowledge remain unsolved, building large knowledge bases is "something of a backwater," according to Ernest Davis, a professor of computer science at New York University and the author of Representations of Common Sense Knowledge. Starting in the 1980s, for example, much of the excitement in AI began to center around the use of narrowly focused self-learning systems -- like neural nets -- to crunch enormous bodies of data. Such systems are intended to "learn" to recognize patterns on their own, instead of being painstakingly taught them by humans. "People like it because it's a lot faster," Davis explains. "You'll never be able to get commonsense AI out of it, but you can do some pretty neat things," such as develop programs that can recognize visual images.
Lenat, however, is unmoved. He bashes right back at the naysayers every chance he gets -- often in searingly witty prose. In his response to the 1993 reviews by McDermott and others, he argued that theory-heavy AI experts were suffering from the Hamlet syndrome -- unwilling to take action and stuck in a circle of philosophizing. Too many AI theorists, he sneered, were "researchers with 'physics envy,' insisting that there must be some elegant 'free lunch' approach to achieving machine intelligence."
For Lenat, having a watertight theory isn't necessary for building useful AI. Quite the contrary: He argues that building a body of commonsense knowledge can only be done in a down-and-dirty engineering style. You put together a bunch of facts, test the system, see where it breaks, fix the bugs, and keep on adding more knowledge. Each day, when Cyclists talk to Cyc, they discover new erroneous assumptions that Cyc has -- or new information that it doesn't have. "It's iterative. You have to do it every day, keep at it," says the Cyclist Charles Pearce.
Lenat compares building up Cyc to building a bridge. "You know, you have a stream and you want to build a bridge over it -- you can either do your best, experiment, and build the bridge. Or you can work out the physics of bridge building," he says. "It's only in the last hundred years or so that the theory of bridge building has been understood. Before that, it was almost like apprenticing to a master. There would be someone who would just intuitively know how to build a bridge -- and every now and then, he would build one that would fall down."
Lenat's critics also complain that by developing Cyc in the private sector, he is forced to keep his cards too close to his chest. Aside from publishing his now-dated 1989 book, he and his staff have produced barely a handful of papers about Cyc in academic or trade magazines. In fact, many AI academics -- even Lenat's supporters -- say they don't know enough of what's going on inside Cyc to be able to approve or disapprove. "He doesn't publish, because he doesn't need to," says one critic. "When I suggest things they ought to do, they just say, Well, we're not being paid to do that."
Compounding this resentment is the fact that in the world of large-scale knowledge bases, Cyc is the biggest -- perhaps the only -- game in town. As a result, some academics worry that Cyc's example discourages other AI researchers from building competing knowledge bases. Lenat's seventeen years of painstaking labor have shown exactly how difficult and involved the project is. For his part, Lenat quietly relishes the monopoly he has on commonsense knowledge. Some of his funders, including the Department of Defense and GlaxoSmithKline, already use experimental versions of Cyc to help "scrub" data in databases, cleaning out data-entry errors. Ultimately Lenat hopes to license Cyc to all software makers worldwide -- as a layer of intelligence that will make their systems less brittle. It could become like "Intel inside," he suggests, or the Microsoft Windows of the AI world.
In the meantime, Lenat cheerily admits to having "almost no relationship" with the academic world. It has rejected him, he argues -- not the other way around. Ten years ago, he sent some of his staff to speak, on invitation, at a major academic conference on common sense. Their papers were "really practical stuff," he says. "All this really hard-core work we'd done on how to represent knowledge about concepts like fluids and countries and people." But the conference organizers confined the Cyc papers and speakers to a single panel, Lenat recalls, and few people showed up. "It was like they'd rather talk about doing things than actually hear from people who are doing things," he complains.
WHATEVER the theoretical inelegance of Cyc, Lenat can always fall back on one powerful defense: It works well -- or at least better than anything comparable so far. During a visit to Cycorp, I watch a videotaped demonstration of a biological warfare expert teaching Cyc about a new weapon similar to anthrax. Cyc demonstrates its grasp of common knowledge -- not just about the physical world, but about the rather more ephemeral world of pop culture.
At one point, the expert asks Cyc what it knows about anthrax. Cyc pauses for a second, then asks: "Do you mean Anthrax (the heavy metal band), anthrax (the bacterium), or anthrax infection (the infection)?" The official notes that it's the bacterium, not the band. Cyc asks what type of organism the agent infects. The military official types: "People."
Cyc thinks again for an instant, then responds: "By People, I assume you mean persons, not People magazine."
By the end of the exchange, Cyc has successfully absorbed various facts about the agent -- how it is destroyed (by encasing it in concrete), its color (green), and which terrorists possess it (Osama bin Laden). But the demo also illustrates some of Cyc's limitations. Obviously, it wasn't easily able to figure out the military context of the exchange -- otherwise it wouldn't have needed to ask whether "anthrax" signified a heavy metal band.
An even bigger limitation lurks behind that one: the fact that common sense is almost infinite. This is a problem that still threatens to doom Lenat's grandest ambitions. Sure, you could eventually input the billions of bits of common knowledge worldwide. But at the rate the Cyclists are going, that would take millennia; the limited resources of Cycorp's programmers are not enough. Even Cyc's supporters see this as a major stumbling point. "The amount of knowledge you need will easily outpace the ability of the builders to input it," as Nilsson says.
Part of the problem, Lenat concedes, has been that nobody except for scientists and philosophers trained in formal-logic languages can master CycL well enough to input knowledge reliably into Cyc. "This isn't a skill the average person has," he says. Moreover, Cyc's early development was particularly sensitive. Because one small piece of wrong information could cause enormous problems later, Lenat had to be careful, he says, to prevent erroneous material from getting in. Even though he gave out restricted parts of Cyc's knowledge base to academics to examine and experiment with, he didn't accept new knowledge from them.
But Lenat remains optimistic. As Cyc grows larger and more robust, he says, it is becoming less fragile and more likely to detect nonsense inputs. Starting this fall, Lenat will release parts of Cyc -- for free -- to anyone who wants them. Technically sophisticated users will be able to add knowledge to their own copies of Cyc and, if they choose, send the new information back to Cycorp to be integrated into the master copy -- the "master" Cyc, as it were. For the first few years, Lenat's team will carefully scan the outside knowledge to make sure it catches any problems or nonsense information that Cyc doesn't flag on its own. But if all goes well, Cyc will be able to harness the worldwide labor of those who want to input new facts -- allowing Cyc's knowledge base to grow at a far more rapid clip than ever before.
Several years down the line, when Lenat has sufficiently improved Cyc's understanding of ordinary English, anyone -- nonscientists, average Joes, whoever -- will be able to talk to it. Eventually Cyc could even be turned loose on the Web, allowing it to read and absorb the mind-boggling collection of information on the Internet. When Cyc becomes an open-source brain, conversant in everyday English, the pace of its growth could be explosive. "The number of people who can help it grow will increase from a few hundred to a billion," Lenat marvels.
But if Cyc becomes open for anyone to talk to, what will it learn? Will people lie, dissemble, or try to delude it? Last fall, MIT's Push Singh assembled a similar experimental on-line project, Open Mind, to collect everyday intelligence. He was a fan of Cyc and wanted to try building his own knowledge database, but he didn't want to spend decades crafting commonsense statements. He met with David Stork, chief scientist at Ricoh Silicon Valley, who has been working on projects that allow multiple people to collaborate on-line. Singh and Stork realized that an open-source approach would be useful for gathering common knowledge and quickly assembled an on-line database at www.openmind.org to solicit facts from nonspecialists. It was up and running by late last summer; by the summer of 2001, nearly seven thousand people had input more than 300,000 facts.
I look over a handful of the entries. Some are obviously uncontroversial, such as "A square is a closed shape with four equal sides at right angles" or "An adult male human is called a man." Others veer into the realm of custom and opinion, such as "The first thing you do when you hang out at a bar is order a drink" and "Christmas is a commercialized holiday."
For now, Singh has no way of checking the information's veracity, other than reading each input himself. "At this point, we're just seeing whether this is a viable way to collect commonsense knowledge," he says. But after examining several thousand of the information pieces that have been input, he's found that contributors are by and large honest. Their mistakes are not so much sneaky as inadvertent, the result of unclear writing. A better trained user, for example, might have modified the example above to say "Many people feel Christmas is a commercialized holiday." As Singh notes: "People really want to be of help! But they're untrained in how to express information clearly."
Though Open Mind is a collection of plain-language statements that does not include a formal logical language, Lenat is watching Singh's project with interest. When he opens up Cyc to the layperson, he'll face the same challenges. Cyc's basic common sense will have to be robust enough to recognize clearly erroneous information and reject it. More problematically, it will have to recognize when it encounters a subjective belief and categorize it as such. For example, Cyc might read some information about a car on a General Motors Web site. Lenat says Cyc ought to trust basic facts on the site, such as the identifying numbers of specific car parts, since "General Motors is actually the best expert on that." But other material -- such as "supposedly third-party reports that just happen to favor General Motors," Lenat notes sardonically -- ought to be disregarded.
The potential for mistakes is serious: Cyc's accepting a belief as a fact would be akin to an impressionable young kid's absorbing a dubious bit of information from an adult. At that point, the only way to fix it would be to do "brain surgery" -- have a Cyclist go in and manually rewrite the fact in CycL. If Cyc is open and there are millions and millions of new facts being input every day, that could be virtually impossible.
"You begin to see just how complex this could be," Lenat says.
STILL, he's willing to try. After so many years of pounding away at Cyc, Lenat has nothing to lose in pressing ahead. He draws me a graph that shows Cyc's learning curve. From 1985 to 2000, the line curves upward gradually -- the "brain surgery" phase during which the Cyclists input knowledge by hand. But then at 2001, the curve steepens dramatically as the open-source phase takes over, and thousands -- or millions -- more inputters join in. Lenat extends the curve maybe ten years into the future. As the curve reaches the point where Cyc has read everything there is to read and spoken with everyone willing to tell it facts, it will begin to flatten out. "It'll know all there is to know," he says. "At that point, the only way it could learn more is by doing experiments itself."
Will 2001, then, be as talismanic as some would hope -- the year that HAL-like intelligence is born? Lenat is optimistic. "I'm very, very excited," he says. But he's made rash predictions in the past: In the late 1980s, he confidently forecast that 1994 would be the year Cyc would begin learning via natural-language conversations with average people.
Intelligence is unruly stuff -- which makes the behavior of artificial intelligence sometimes hard to predict. Lenat tells me a cautionary tale from his days as a young professor in the late 1970s and early 1980s. Back then, he designed a self-learning program called EURISKO. It was intended to generate new heuristics -- new types of search strategies -- all on its own by slightly mutating bits of LISP computer code. And it did successfully manage to produce unique new rules to parse data sets.
But then trouble struck. In the mornings when Lenat arrived at work, he'd find the program had mysteriously shut down overnight. This happened again and again, puzzling him. What was causing it to crash?
Finally he discovered that it wasn't a crash at all; it was a strange and unexpected new strategy. At some point, EURISKO had altered its rules so that "making no errors at all" was as important as "making productive new discoveries." Then EURISKO realized that if it turned itself off, it wouldn't make mistakes. It was bizarre behavior -- but it was logical. It made sense. "Just not common sense," Lenat says with a laugh.