FAQ - Linked Data

Q: What is EM?

A: EM stands for "Entity Markup". An EM refers to an entity as it appears in the text, disregarding its context. For example, if the text contains "Moscow in Texas and Moscow in Maine", the resulting RDF will have only one EM for "Moscow". An EM provides some basic information about the entity but, most importantly, it provides a link to all Repository entities called "Moscow".

<rdf:Description rdf:about="http://d.opencalais.com/genericHasher-1/0c44e300-49f5-39ab-84ac-fbdced5c31ec"> 
  <rdf:type rdf:resource="http://s.opencalais.com/1/type/em/e/City" />
  <c:name>Moscow</c:name>
  </rdf:Description>
The green link leads to a Repository page, which can contain one of the following:
  • Redirection Page: If the entity is unique (there is no other entity by the same name in the Repository), there will be a redirection link to the entity's page in the Repository.
  • Disambiguation Page: If the entity is not unique (there are other entities by the same name in the Repository), there will be a list of links to all entities in the Repository with that name.
  • Info Page: If there is no such entity in the Repository, basic data about the entity (derived from the current document) will be displayed.
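
As a rough illustration of reading an EM, the sketch below parses the fragment above with Python's standard library and pulls out the link, type, and name. The fragment is wrapped in an rdf:RDF root so that it parses on its own, and the namespace bound to the c: prefix is an assumption, since the FAQ never states it. The helper name read_em is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Assumption: the FAQ does not show which namespace the c: prefix is bound to.
C_NS = "http://s.opencalais.com/1/pred/"

EM_FRAGMENT = f"""
<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:c="{C_NS}">
  <rdf:Description rdf:about="http://d.opencalais.com/genericHasher-1/0c44e300-49f5-39ab-84ac-fbdced5c31ec">
    <rdf:type rdf:resource="http://s.opencalais.com/1/type/em/e/City" />
    <c:name>Moscow</c:name>
  </rdf:Description>
</rdf:RDF>
"""


def read_em(xml_text):
    """Return the link, type URI, and name of the first rdf:Description."""
    root = ET.fromstring(xml_text)
    desc = root.find(f"{{{RDF_NS}}}Description")
    return {
        "link": desc.get(f"{{{RDF_NS}}}about"),
        "type": desc.find(f"{{{RDF_NS}}}type").get(f"{{{RDF_NS}}}resource"),
        "name": desc.findtext(f"{{{C_NS}}}name"),
    }


em = read_em(EM_FRAGMENT)
print(em["name"], em["link"])
```

The "link" value is the URL that leads to the redirection, disambiguation, or info page described in the list above.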

Q: What is ER?

A: An ER refers to an entity in the text, taking its context into account (in other words, the entity is disambiguated). For example, if the text contains "Moscow in Texas and Moscow in Maine", the resulting RDF will have an ER for each of "Moscow, Texas" and "Moscow, Maine" (assuming that the Repository contains those entities). An ER provides information about the entity as well as a link to the entity's Repository page, which contains extended information about it.

<rdf:Description rdf:about="http://d.opencalais.com/er/geo/city/ralg-geo1/636f6e15-44b1-0d89-017f-e6356385c6f9">
  <rdf:type rdf:resource="http://s.opencalais.com/1/type/er/Geo/City" />
  <c:docId rdf:resource="http://d.opencalais.com/dochash-1/2797640c-b7d3-3c52-a9db-092ea7ebb1b3" />
  <!-- Moscow -->
  <c:subject rdf:resource="http://d.opencalais.com/genericHasher-1/0c44e300-49f5-39ab-84ac-fbdced5c31ec" />
  <c:name>Moscow,Texas,United States</c:name>
  <c:shortname>Moscow</c:shortname>
  <c:containedbystate>Texas</c:containedbystate>
  <c:containedbycountry>United States</c:containedbycountry>
  <c:latitude>30.9131</c:latitude>
  <c:longitude>-94.825</c:longitude>
  </rdf:Description>
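
To make the relationship between ERs and EMs concrete, here is a minimal sketch that groups ER names by the EM they point to through c:subject. The sample is a trimmed version of the two Moscow ERs shown in this FAQ, wrapped in an rdf:RDF root. The c: namespace URI and the function name ers_by_em are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
# Assumption: the namespace bound to the c: prefix is not given in the FAQ.
C = "{http://s.opencalais.com/1/pred/}"

SAMPLE = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:c="http://s.opencalais.com/1/pred/">
  <rdf:Description rdf:about="http://d.opencalais.com/er/geo/city/ralg-geo1/636f6e15-44b1-0d89-017f-e6356385c6f9">
    <rdf:type rdf:resource="http://s.opencalais.com/1/type/er/Geo/City" />
    <c:subject rdf:resource="http://d.opencalais.com/genericHasher-1/0c44e300-49f5-39ab-84ac-fbdced5c31ec" />
    <c:name>Moscow,Texas,United States</c:name>
  </rdf:Description>
  <rdf:Description rdf:about="http://d.opencalais.com/er/geo/city/ralg-geo1/d7bc4e8b-cbf3-b49e-f3e3-a56709eb6e71">
    <rdf:type rdf:resource="http://s.opencalais.com/1/type/er/Geo/City" />
    <c:subject rdf:resource="http://d.opencalais.com/genericHasher-1/0c44e300-49f5-39ab-84ac-fbdced5c31ec" />
    <c:name>Moscow,Maine,United States</c:name>
  </rdf:Description>
</rdf:RDF>
"""


def ers_by_em(xml_text):
    """Map each EM link to the names of the ERs whose c:subject points at it."""
    groups = defaultdict(list)
    for desc in ET.fromstring(xml_text).iter(RDF + "Description"):
        rdf_type = desc.find(RDF + "type")
        subject = desc.find(C + "subject")
        if rdf_type is None or subject is None:
            continue  # not an ER-shaped description
        if "/er/" not in rdf_type.get(RDF + "resource", ""):
            continue
        groups[subject.get(RDF + "resource")].append(desc.findtext(C + "name"))
    return dict(groups)


groups = ers_by_em(SAMPLE)
for em_link, names in groups.items():
    print(em_link, names)
```

Both ERs land under the same key, which matches the description above: one EM for "Moscow", and one ER per disambiguated city.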

Q: How can I retrieve my document from the OpenCalais repository?

A: Follow the link for your document, given in its rdf:about attribute:
<rdf:Description c:calaisRequestID="0ae66255-e4e9-4902-a060-15dc98c9c993" c:id="http://id.opencalais.com/BxkiXhlcYkXgXdUFhx790A"
rdf:about="http://d.opencalais.com/dochash-1/2797640c-b7d3-3c52-a9db-092ea7ebb1b3">

Q: Can I view entity/document as HTML/RDF?

A: Yes. Linked Data supports both RDF and HTML browsers. To switch views, change the URL ending from ".html" to ".rdf" or vice versa. For example, to save the RDF, change
http://d.opencalais.com/dochash-1/2797640c-b7d3-3c52-a9db-092ea7ebb1b3.html to
http://d.opencalais.com/dochash-1/2797640c-b7d3-3c52-a9db-092ea7ebb1b3.rdf and save the file.
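
The suffix swap can be done programmatically. The helper below is a small sketch; the name switch_view is hypothetical, and it only rewrites the URL string without fetching anything.

```python
def switch_view(url):
    """Swap a Linked Data URL between its .html and .rdf forms."""
    if url.endswith(".html"):
        return url[: -len(".html")] + ".rdf"
    if url.endswith(".rdf"):
        return url[: -len(".rdf")] + ".html"
    raise ValueError("URL must end in .html or .rdf: " + url)


html_url = "http://d.opencalais.com/dochash-1/2797640c-b7d3-3c52-a9db-092ea7ebb1b3.html"
rdf_url = switch_view(html_url)
print(rdf_url)
```

Calling switch_view on the result returns the original .html URL, so the function works in both directions.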

Q: How do I know if the current entity is ER or EM?

A: If the rdf:resource value of the rdf:type element contains /em/, the entity is an EM:

<rdf:Description rdf:about="http://d.opencalais.com/genericHasher-1/d896ede1-c911-378c-be1f-f764c04dd725">
  <rdf:type rdf:resource="http://s.opencalais.com/1/type/em/e/NaturalFeature" />
  <c:name>Rhode Island</c:name>
  </rdf:Description>
If the rdf:resource value of the rdf:type element contains /er/, the entity is an ER:
<rdf:Description rdf:about="http://d.opencalais.com/er/geo/city/ralg-geo1/d7bc4e8b-cbf3-b49e-f3e3-a56709eb6e71">
  <rdf:type rdf:resource="http://s.opencalais.com/1/type/er/Geo/City" />
  <c:docId rdf:resource="http://d.opencalais.com/dochash-1/2797640c-b7d3-3c52-a9db-092ea7ebb1b3" />
  <!-- Moscow -->
  <c:subject rdf:resource="http://d.opencalais.com/genericHasher-1/0c44e300-49f5-39ab-84ac-fbdced5c31ec" />
  <c:name>Moscow,Maine,United States</c:name>
  <c:shortname>Moscow</c:shortname>
  <c:containedbystate>Maine</c:containedbystate>
  <c:containedbycountry>United States</c:containedbycountry>
  <c:latitude>45.0706</c:latitude>
  <c:longitude>-69.8911</c:longitude>
  </rdf:Description>
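
The same check is easy to express in code. This sketch applies the /em/ and /er/ rule to the rdf:type resource URI as a plain string; the function name entity_kind is hypothetical.

```python
def entity_kind(type_uri):
    """Classify an rdf:type resource URI as "EM", "ER", or None if neither."""
    if "/em/" in type_uri:
        return "EM"
    if "/er/" in type_uri:
        return "ER"
    return None


print(entity_kind("http://s.opencalais.com/1/type/em/e/NaturalFeature"))
print(entity_kind("http://s.opencalais.com/1/type/er/Geo/City"))
```

Pass it the rdf:resource value of the rdf:type element rather than the rdf:about URL, since the rule above is stated in terms of the type.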

Q: What entity types can be resolved?

A: Currently, the resolvable types are Geo (Country, City, ProvinceOrState), Company, and Product (electronics).

Q: The Calais Linked Data System is a repository of linked data assets. What happens when OpenCalais extracts a new entity from a submission?

A: Short answer: We automatically build a stubbed-out XML page.

What happens from there depends on the entity type. For some entity types, we perform entity resolution (also called disambiguation), documented here.

If the entity is of a type (e.g., company, geography, product) that OpenCalais disambiguates, the system automatically creates a new XML page and a pointer from the new instance to the disambiguated entity. For instance, the first time we extract "IBM France Ltd", OpenCalais automatically creates a new XML page and points it to the disambiguated XML page for IBM.

Here is the HTML rendering of the IBM page in the Calais Linked Data system: http://d.opencalais.com/er/company/ralg-tr1r/9e3f6c34-aa6b-3a3b-b221-a07aa7933633.html. Notice that the page for IBM has some content assets. These were obtained from within Thomson Reuters.

The disambiguated page may have pointers to other pages in the Linked Data cloud, such as DBpedia, the CIA World Factbook, and MusicBrainz. An application can use these links to traverse the Linked Data cloud and retrieve a composite of content assets. Conversely, if no disambiguated page exists, OpenCalais can query our internal databases to retrieve some content assets and populate the XML page with assets similar to those shown for IBM. This generally happens for large publicly traded companies. The content in the XML page is generally a subset of the content that can be obtained on Reuters.com.

In the case of small companies, for which we do not have content assets within Thomson Reuters to populate the XML page, OpenCalais builds a stubbed-out page.

Here is the page for Kupel’s Bakery, a local bakery in Brookline, Massachusetts. While Kupel’s makes good bagels, it’s not likely to make the New York Stock Exchange in the near future. The existence of the Kupel’s page shows that OpenCalais has extracted it from at least one document processed through the service.

The same process occurs for entities that we do not disambiguate, for instance, people. If "John Jones" appears in content, Calais does not know which John Jones has been extracted. OpenCalais creates an XML entry that identifies John Jones as a person.

At present, there is no way to query the Calais Linked Data System to see if an entry for a particular entity exists.