Linked Data - Entities

Calais is now officially part of the Linking Open Data (LOD) Cloud.

Linked Data is a method of exposing, sharing, and connecting data on the Web via dereferenceable URIs (Uniform Resource Identifiers).
The method rests on four principles:

  • Use URIs to identify things that you expose to the Web as resources.
  • Use HTTP URIs so that people/machines can locate and dereference these things.
  • Provide useful information about the resource when its URI is dereferenced.
  • Include links to other, related URIs in the exposed data as a means of improving information discovery on the Web.

The Calais ecosystem is exposed via Linked Data endpoints. When Calais extracts an entity from a given text, it also returns an entity URI. This URI is dereferenceable: you can submit an HTTP request, programmatically or via a browser, and receive useful information and links to other Linked Data and Web assets, all relevant to the entity that the URI describes.

The assets Calais currently links to are:

  • DBpedia
  • Wikipedia
  • Freebase
  • Reuters.com
  • GeoNames
  • Shopping.com
  • IMDB
  • LinkedMDB

The breadth and depth of the information available at each URI vary with how unambiguously Calais can identify the entity.

As an example, “Paris” is an ambiguous city name, because there are Paris, France; Paris, Texas; and others. The URI of the ambiguous “Paris” is:
http://d.opencalais.com/genericHasher-1/56fc901f-59a3-3278-addc-b0fc69b283e7
This URI offers disambiguation options, each leading to another resource that is known to Calais and can disambiguate “Paris”. If you follow any of these options, you’ll reach a resource that includes useful information such as geographical coordinates, links to other Linked Data or Web assets, and more. For example, “Paris, France” is one of these options and its URI endpoint is:
http://d.opencalais.com/er/geo/city/ralg-geo1/797c999a-d455-520d-e5cf-04ca7fb255c1
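Dereferencing an entity URI such as the ones above can be sketched in a few lines of Python. This is a minimal sketch, not an official client: the function names are hypothetical, and asking for RDF/XML through an `Accept` header is an assumption about how the endpoint negotiates content, not something this page specifies.

```python
import urllib.request

# Entity URI for "Paris, France", taken from the example above.
PARIS_FRANCE_URI = (
    "http://d.opencalais.com/er/geo/city/ralg-geo1/"
    "797c999a-d455-520d-e5cf-04ca7fb255c1"
)


def build_entity_request(entity_uri, accept="application/rdf+xml"):
    """Prepare (but do not send) an HTTP GET request for an entity URI.

    The Accept header is an assumption: it asks for RDF/XML, which the
    endpoint may or may not honor.
    """
    return urllib.request.Request(
        entity_uri, headers={"Accept": accept}, method="GET"
    )


def fetch_entity(entity_uri, timeout=10.0):
    """Send the request and return the raw response body as bytes."""
    request = build_entity_request(entity_uri)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

Calling `fetch_entity(PARIS_FRANCE_URI)` would perform the actual network request; the returned bytes can then be handed to whatever RDF parser you prefer.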

The richest endpoints belong to the entity type Company. For disambiguated companies Calais provides rich information, such as ticker symbol, officers and directors, corporate website, industry codes and much more.
Here are a few examples of useful company endpoints:

International Business Machines Corporation

Starbucks Corporation

Alcoa Inc.

We are expanding the coverage of the Calais Linked Data ecosystem on an ongoing basis. To date, we have populated meaningful data and/or links to other assets for the following entity types:

 

Entity                   Example
City                     Sacramento
Company                  Dow Jones & Company, Inc.
Country                  Canada
EntertainmentAwardEvent  the Oscars
Holiday                  Memorial Day
MarketIndex              S&P
MedicalCondition         pancreatic cancer
MedicalTreatment         acupuncture
Movie                    Iron Monkey
MusicAlbum               Dark Side of the Moon
MusicGroup               Pink Floyd
OperatingSystem          Ubuntu
ProgrammingLanguage      Java
ProvinceOrState          Saskatchewan, Canada
PublishedMedium          the Boston Globe
RadioStation             WBUR
SportsEvent              the FIFA World Cup
SportsGame               basketball
SportsLeague             Scottish Football Alliance
TVShow                   Lost
TVStation                WGBH

Important: For a subset of entities (currently, Companies, Electronic products and Geographies), entity disambiguation is provided with the RDF response (see details here). In such cases, it is recommended that you access the URI of the disambiguated entity because it will include more information and relevant links.
Disambiguated entities are easy to find in the RDF response. Their types are always of the form:
http://s.opencalais.com/1/type/er/<EntityType>
while the types of other entities are of the form:
http://s.opencalais.com/1/type/em/e/<EntityType>
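The two type patterns above make it straightforward to tell disambiguated entities apart in code. The following is a small illustrative sketch: the function name and the returned labels are hypothetical, and it assumes you have already pulled the type URI strings out of the RDF response.

```python
# URI prefixes taken from the two type patterns described above.
DISAMBIGUATED_PREFIX = "http://s.opencalais.com/1/type/er/"
OTHER_PREFIX = "http://s.opencalais.com/1/type/em/e/"


def classify_entity_type(type_uri):
    """Return (kind, entity_type) for a Calais type URI.

    kind is "disambiguated", "other", or "unknown" (hypothetical labels);
    entity_type is the part of the URI after the prefix, or None when the
    URI matches neither pattern.
    """
    if type_uri.startswith(DISAMBIGUATED_PREFIX):
        return "disambiguated", type_uri[len(DISAMBIGUATED_PREFIX):]
    if type_uri.startswith(OTHER_PREFIX):
        return "other", type_uri[len(OTHER_PREFIX):]
    return "unknown", None
```

When an entity is classified as disambiguated, its URI is the one worth dereferencing, since it carries more information and relevant links.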

Note: See a description of the RDF schema for a Linked Data endpoint in OWL.