Automated Intelligent Data Integration

Cyc is not restricted to the knowledge in the KB. Just like humans, Cyc can be taught to access external data sources, understanding data in days, not weeks or months.

In recent years, companies have realized that data is a valuable asset that can be used to derive insights and improve business outcomes. The Cyc Knowledge Base is a source of power that can be leveraged to enrich your data and uncover its latent value.

There are two ways that Cyc integrates with external knowledge and data. For external ontologies — sources that provide relatively static, general, re-usable, type level information (“knowledge” rather than “data”) — we typically import their content into the knowledge base.

Cyc has automated tools for importing external ontologies. Once an ontology is imported, that knowledge can be used by the Cyc inference engine. Since CycL is a much more flexible and expressive representation language than ontology standards like OWL / RDF, importing an ontology into Cyc can allow you to create abstractions over the knowledge therein that make it easier to work with and reason about.

But for most data sources, Cyc uses a powerful system called Semantic Knowledge Source Integration (SKSI) to access the data on demand, where it lives. The ability to access a variety of data sources “in situ”, without the need for ETL processes or data warehouses, sets Cyc apart from machine learning techniques, which require that you move all of your data into a curated dataset before you can start generating value.

Cyc supports a variety of data formats and we continue to add more as standards change and our customers’ needs evolve.

The Cyc platform includes a suite of automated and semi-automated tools to make mapping data sources easy.

Cyc can even reason efficiently using data from multiple disparate sources.

To illustrate the power of reasoning over multiple in-situ data sources, consider the following query.

We can ask Cyc which US cities are at greatest risk of an anthrax attack, as part of an effort to adequately prepare and defend against hostile threats.

This question can only be answered by combining general knowledge of the sort you can only find in the Cyc KB with data, much of which is publicly available.

Cyc is able to use knowledge about what it means to be a “major US city”, what factors contribute to risk of anthrax attack, and which animals are zoonotic hosts for anthrax to break this question down into its component parts.

Having applied its knowledge to transform one big question into a bunch of small questions, Cyc can recognize that these small questions are all grounded in data. In fact, Cyc has access to all the relevant public data sources necessary to answer these small questions. It can find all of the major US cities by using the Geographic Names Information System, find information about livestock populations from a USDA database, hospital bed information from NIH reports, and weather from a NOAA API.

Cyc’s inference engine is able to put all of this data together with the relevant knowledge in the KB to answer the original query. If the data in the underlying data sources changes, that’s okay — Cyc will just query them again the next time we ask it to solve this question. Cyc appropriately factors this problem into a knowledge component in the KB, which is fairly stable, reusable, and general, and a data component which may be highly dynamic (and does not need to be managed or curated by Cycorp).

As with any Cyc query, Cyc is able to show its work and explicitly explain the reasoning behind its conclusion.

Imagine what insights Cyc could produce using your data together with the power of the Knowledge Base.



7000 North Mopac Expressway, Suite #200
Austin, TX 78731, USA