Linked Data - Entities
Calais is now officially part of the Linking Open Data (LOD) Cloud.
Linked Data is a method of exposing, sharing, and connecting data via dereferenceable URIs (Uniform Resource Identifiers) on the Web.
The method includes four principles:
- Use URIs to identify things that you expose to the Web as resources.
- Use HTTP URIs so that people/machines can locate and dereference these things.
- Provide useful information about the resource when its URI is dereferenced.
- Include links to other, related URIs in the exposed data as a means of improving information discovery on the Web.
The Calais ecosystem is exposed via Linked Data endpoints. When Calais extracts an entity from a given text it also returns an entity URI. This URI is dereferenceable – you can submit an HTTP request, programmatically or via a browser, and get in response useful information and links to other Linked Data and Web assets – all relevant to the entity that’s described by the URI.
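Dereferencing an entity URI programmatically, as described above, can be sketched as follows. This is a minimal illustration, not an official client: the function names are hypothetical, the URI in the usage line is a placeholder rather than a real Calais entity, and the `Accept: application/rdf+xml` header assumes the endpoint supports content negotiation, which the server may or may not honor.

```python
from urllib.request import Request, urlopen


def build_dereference_request(entity_uri: str,
                              accept: str = "application/rdf+xml") -> Request:
    """Build a GET request for an entity URI (hypothetical helper)."""
    # Assumption: asking for RDF/XML through the Accept header. The endpoint
    # may ignore it and return its default representation instead.
    return Request(entity_uri, headers={"Accept": accept}, method="GET")


def dereference(entity_uri: str, timeout: float = 10.0) -> bytes:
    """Send the request and return the raw response body."""
    with urlopen(build_dereference_request(entity_uri), timeout=timeout) as response:
        return response.read()


if __name__ == "__main__":
    # Placeholder URI for illustration only; substitute a URI returned by Calais.
    request = build_dereference_request("http://example.org/entity/placeholder")
    print(request.get_method(), request.full_url, request.get_header("Accept"))
```

Building the request separately from sending it makes it possible to inspect the headers without touching the network. Opening the same URI in a browser should show the human-readable view of the entity.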
The assets Calais currently links to are:
The breadth and depth of information provided at each URI vary based on how unambiguously Calais can identify the entity.
As an example, “Paris” is an ambiguous city name, because it could refer to Paris, France; Paris, Texas; or other places. The URI of the ambiguous “Paris” is:
This URI offers disambiguation options, each leading to another resource that is known to Calais and can disambiguate “Paris”. If you follow any of these options, you’ll find the resource that includes useful information such as geographical coordinates, links to other Linked Data or Web assets and more. For example, “Paris, France” is one of these options and its URI endpoint is:
The richest endpoints belong to the entity type Company. For disambiguated companies Calais provides rich information, such as ticker symbol, officers and directors, corporate website, industry codes and much more.
Here are a few examples of useful company endpoints:
We are expanding the coverage of the Calais Linked Data ecosystem on an ongoing basis. To date, we have populated meaningful data and/or links to other assets for the following entity types:
|Entity type|Example|
|Company|Dow Jones & Company, Inc.|
|MusicAlbum|Dark Side of the Moon|
|PublishedMedium|the Boston Globe|
|SportsEvent|the FIFA World Cup|
|SportsLeague|Scottish Football Alliance|
Important: For a subset of entities (currently, Companies, Electronic products and Geographies), entity disambiguation is provided with the RDF response (see details here). In such cases, it is recommended that you access the URI of the disambiguated entity because it will include more information and relevant links.
It’s pretty easy to find disambiguated entities in the RDF response – their types are always of the form: http://s.opencalais.com/1/type/er/<EntityType>
while for other entities, the types are of the form:
Note: See a description of the RDF schema for a Linked Data endpoint in OWL.
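The type check described above, which recognizes disambiguated entities by the `http://s.opencalais.com/1/type/er/` prefix of their type URI, can be sketched as a simple string test. This is an illustrative helper with hypothetical names. It assumes the type URIs have already been pulled out of the RDF response, and the non-matching URI in the usage lines is a made-up placeholder, not the actual form used for other entities.

```python
from typing import Optional

# Prefix of type URIs for disambiguated entities, as given in the text above.
DISAMBIGUATED_TYPE_PREFIX = "http://s.opencalais.com/1/type/er/"


def disambiguated_entity_type(type_uri: str) -> Optional[str]:
    """Return the <EntityType> part of a disambiguated type URI, else None."""
    if not type_uri.startswith(DISAMBIGUATED_TYPE_PREFIX):
        return None
    entity_type = type_uri[len(DISAMBIGUATED_TYPE_PREFIX):]
    # A bare prefix with nothing after it names no entity type.
    return entity_type or None


if __name__ == "__main__":
    print(disambiguated_entity_type("http://s.opencalais.com/1/type/er/Company"))
    # Placeholder URI standing in for any type that lacks the prefix.
    print(disambiguated_entity_type("http://example.org/type/Person"))
```

Entities for which the helper returns a type name are the ones whose URIs are worth dereferencing first, since they carry the richer information and links.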