RDF

This page gives general guidelines for interpreting the Calais response in RDF.

To view the full RDF schema, visit the RDFS page.

For extracted metadata elements, the RDF includes the following:

Document Information

The RDF response includes general document and transaction information, such as the document language, the submission date and time, and the request ID. It also includes the input content after it has been converted into valid XML for processing by the Calais backend server (except when the TEXT/RAW option is used).

By default, every RDF response includes the original body. To exclude it from the RDF output, set the omitOutputtingOriginalText parameter in the processing directives section of paramsXML to TRUE:

c:omitOutputtingOriginalText="TRUE"
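If you build paramsXML programmatically, the directive is just an attribute on the processing directives element. The sketch below uses Python's standard xml.etree.ElementTree. The element names c:params and c:processingDirectives, the namespace URI bound to the c: prefix, and the helper build_params_xml are assumptions for illustration; check them against the paramsXML you actually submit.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the "c:" prefix; verify it against your own paramsXML.
C_NS = "http://s.opencalais.com/1/pred/"
ET.register_namespace("c", C_NS)


def build_params_xml(omit_original_text: bool) -> str:
    """Build a minimal, hypothetical paramsXML document (sketch only)."""
    params = ET.Element(f"{{{C_NS}}}params")  # assumed root element name
    directives = ET.SubElement(params, f"{{{C_NS}}}processingDirectives")  # assumed name
    if omit_original_text:
        # Documented directive: leave the original body out of the RDF output.
        directives.set(f"{{{C_NS}}}omitOutputtingOriginalText", "TRUE")
    return ET.tostring(params, encoding="unicode")


print(build_params_xml(True))
```

Other processing directives would be set as further attributes on the same element.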

The RDF header, which includes a summary of all entities extracted from the text, is sorted alphabetically by entity type (the same order used in Simple Format).

Metadata Element (Entity/Event&Fact) Information:

For each unique metadata element, the information includes the element type (for example, Company, Person, Acquisition), attribute values, and an ID (hash) of this unique element.

If the Relevance feature is turned on, the RDF also includes the relevance score for this unique entity.

When an attribute value is given as a reference to another element's ID (hash) rather than as a literal string, the RDF precedes it with an XML comment containing the actual value, for readability.

Examples:

The Entity Company for "ClearForest Ltd." may look like this:

<rdf:Description rdf:about="http://d.opencalais.com/comphash-1/899a2db3-ce69-3926-ba4f-6dea099c3fc9">
<rdf:type rdf:resource="http://s.opencalais.com/1/type/em/e/Company"/>
<c:name>ClearForest Ltd.</c:name>
<c:nationality>N/A</c:nationality>
</rdf:Description>
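A client typically recognizes an entity node by its rdf:type URI and reads its c: properties as attributes. The following is a minimal sketch with Python's xml.etree.ElementTree. It assumes the c: prefix is bound to http://s.opencalais.com/1/pred/ (this page does not state the URI) and wraps the example above in an rdf:RDF root so that it parses on its own; the function entities is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
C_NS = "http://s.opencalais.com/1/pred/"  # assumed URI for the "c:" prefix
ENTITY_TYPE_PREFIX = "http://s.opencalais.com/1/type/em/e/"

# The Company example above, wrapped in an rdf:RDF root that declares both prefixes.
SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:c="{C_NS}">
<rdf:Description rdf:about="http://d.opencalais.com/comphash-1/899a2db3-ce69-3926-ba4f-6dea099c3fc9">
<rdf:type rdf:resource="http://s.opencalais.com/1/type/em/e/Company"/>
<c:name>ClearForest Ltd.</c:name>
<c:nationality>N/A</c:nationality>
</rdf:Description>
</rdf:RDF>"""


def entities(rdf_xml: str) -> list[dict]:
    """Return one dict per entity node: its ID (hash URI), short type name and literal attributes."""
    root = ET.fromstring(rdf_xml)
    found = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        type_el = desc.find(f"{{{RDF_NS}}}type")
        type_uri = type_el.get(f"{{{RDF_NS}}}resource", "") if type_el is not None else ""
        if not type_uri.startswith(ENTITY_TYPE_PREFIX):
            continue  # skip Events&Facts, instances, resolution nodes, etc.
        attributes = {
            child.tag.split("}", 1)[1]: child.text  # "{uri}name" -> "name"
            for child in desc
            if child.tag.startswith(f"{{{C_NS}}}") and child.text is not None
        }
        found.append({
            "id": desc.get(f"{{{RDF_NS}}}about"),
            "type": type_uri[len(ENTITY_TYPE_PREFIX):],  # e.g. "Company"
            "attributes": attributes,
        })
    return found


print(entities(SAMPLE))
```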

The Event&Fact Acquisition between "Reuters" and "ClearForest Ltd." may look like this:

<rdf:Description rdf:about="http://d.opencalais.com/genericHasher-1/e83cd693-2146-32a2-b1fe-c4a73615dbf0">
<rdf:type rdf:resource="http://s.opencalais.com/1/type/em/r/Acquisition"/>
<!--Reuters-->
<c:company_acquirer rdf:resource="http://d.opencalais.com/comphash-1/48344864-ce62-3064-ae05-a3b41fab186c"/>
<!--ClearForest Ltd.-->
<c:company_beingacquired rdf:resource="http://d.opencalais.com/comphash-1/9dd2192a-4cd2-3b9a-ac2f-b6a0d1fed773"/>
<c:status>planned</c:status>
</rdf:Description>
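Standard XML parsers, including Python's ElementTree with its default settings, discard comments, so the readable values such as <!--Reuters--> are not available to a program. A client can recover them by looking up the referenced node's c:name. The sketch below assumes the c: namespace URI shown in the code, and the two Company nodes in the sample are a minimal hypothetical stand-in for the nodes those hashes point to in a full response; resolve_references is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
C_NS = "http://s.opencalais.com/1/pred/"  # assumed URI for the "c:" prefix

# Hypothetical minimal document: the Acquisition above plus the two Company nodes it references.
SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:c="{C_NS}">
<rdf:Description rdf:about="http://d.opencalais.com/comphash-1/48344864-ce62-3064-ae05-a3b41fab186c">
  <rdf:type rdf:resource="http://s.opencalais.com/1/type/em/e/Company"/>
  <c:name>Reuters</c:name>
</rdf:Description>
<rdf:Description rdf:about="http://d.opencalais.com/comphash-1/9dd2192a-4cd2-3b9a-ac2f-b6a0d1fed773">
  <rdf:type rdf:resource="http://s.opencalais.com/1/type/em/e/Company"/>
  <c:name>ClearForest Ltd.</c:name>
</rdf:Description>
<rdf:Description rdf:about="http://d.opencalais.com/genericHasher-1/e83cd693-2146-32a2-b1fe-c4a73615dbf0">
  <rdf:type rdf:resource="http://s.opencalais.com/1/type/em/r/Acquisition"/>
  <!--Reuters-->
  <c:company_acquirer rdf:resource="http://d.opencalais.com/comphash-1/48344864-ce62-3064-ae05-a3b41fab186c"/>
  <!--ClearForest Ltd.-->
  <c:company_beingacquired rdf:resource="http://d.opencalais.com/comphash-1/9dd2192a-4cd2-3b9a-ac2f-b6a0d1fed773"/>
  <c:status>planned</c:status>
</rdf:Description>
</rdf:RDF>"""


def resolve_references(rdf_xml: str) -> dict[str, dict[str, str]]:
    """Map each node ID to its c: properties, replacing rdf:resource IDs with the target's c:name."""
    root = ET.fromstring(rdf_xml)  # comments are dropped by the default parser
    nodes = {d.get(f"{{{RDF_NS}}}about"): d for d in root.findall(f"{{{RDF_NS}}}Description")}
    # Lookup table: node ID -> c:name literal, for nodes that have one.
    names = {}
    for node_id, node in nodes.items():
        name = node.findtext(f"{{{C_NS}}}name")
        if name is not None:
            names[node_id] = name
    resolved = {}
    for node_id, node in nodes.items():
        props = {}
        for child in node:
            if not child.tag.startswith(f"{{{C_NS}}}"):
                continue
            local = child.tag.split("}", 1)[1]
            ref = child.get(f"{{{RDF_NS}}}resource")
            # Keep the raw ID if the referenced node is not in this document.
            props[local] = names.get(ref, ref) if ref is not None else (child.text or "")
        resolved[node_id] = props
    return resolved
```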

Metadata Element (Entity/Event&Fact) Instances:

The RDF includes one or more individual instances (mentions) of each unique metadata element. Each instance includes the following:

  • c:docId: URI of the document in which this mention was detected.
  • c:subject: URI of the unique metadata element that this mention refers to.
  • c:detection: snippet of the input content in which the metadata element was identified, with the preceding and following text enclosed in square brackets.
  • c:prefix: snippet of the input content that precedes the current instance.
  • c:exact: snippet of the input content that was matched (the mention itself).
  • c:suffix: snippet of the input content that follows the current instance.
  • c:offset: character offset of the instance, relative to the input content after it has been converted into XML.
  • c:length: length of the instance, in characters.

Examples:

An instance for the unique company ClearForest Ltd. may look like this:

<rdf:Description rdf:about="http://d.opencalais.com/dochash-1/00b00ecd-7e8b-3773-b30f-2169abd75efe/Instance/47">
  <rdf:type rdf:resource="http://s.opencalais.com/1/type/sys/InstanceInfo" />
  <c:docId rdf:resource="http://d.opencalais.com/dochash-1/00b00ecd-7e8b-3773-b30f-2169abd75efe" />
  <c:subject rdf:resource="http://d.opencalais.com/comphash-1/899a2db3-ce69-3926-ba4f-6dea099c3fc9" />
<!-- Company: ClearForest Ltd.;   -->
  <c:detection>[Reuters to acquire text search firm ]ClearForest[ </TITLE><DATE> Mon Apr 30, 2007 7:00am EDT]</c:detection>
  <c:prefix>Reuters to acquire text search firm</c:prefix>
  <c:exact>ClearForest</c:exact>
  <c:suffix></TITLE><DATE> Mon Apr 30, 2007 7:00am EDT</c:suffix>
  <c:offset>54</c:offset>
  <c:length>11</c:length>
</rdf:Description>
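Because c:offset and c:length refer to the XML-converted input, slicing that text reproduces c:exact (here, offset 54 and length 11 cover the 11 characters of "ClearForest"). The sketch below shows that slice and a split of the bracketed c:detection layout seen in the example above. It assumes the bracketed prefix contains no "]" and the bracketed suffix no "["; note that in the example the bracketed prefix keeps a trailing space that c:prefix omits. Both helper names are hypothetical.

```python
import re

# "[prefix]exact[suffix]" layout as seen in the c:detection examples.
# Assumes the prefix contains no "]" and the suffix contains no "[".
DETECTION_RE = re.compile(r"^\[(.*?)\](.*)\[(.*)\]$", re.DOTALL)


def mention_text(converted_text: str, offset: int, length: int) -> str:
    """Return the span addressed by c:offset and c:length in the XML-converted input."""
    if offset < 0 or length < 0 or offset + length > len(converted_text):
        raise ValueError("offset/length fall outside the converted input")
    return converted_text[offset:offset + length]


def split_detection(detection: str) -> tuple[str, str, str]:
    """Split a c:detection value into (prefix, exact, suffix)."""
    match = DETECTION_RE.match(detection)
    if match is None:
        raise ValueError(f"unexpected c:detection layout: {detection!r}")
    return match.group(1), match.group(2), match.group(3)
```

Comparing mention_text(...) with c:exact is a cheap way to confirm that you are slicing the converted content rather than the original submission.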

An instance for the unique Acquisition element between "Reuters" and "ClearForest Ltd." may look like this:

<rdf:Description rdf:about="http://d.opencalais.com/dochash-1/00b00ecd-7e8b-3773-b30f-2169abd75efe/Instance/24">
  <rdf:type rdf:resource="http://s.opencalais.com/1/type/sys/InstanceInfo" />
  <c:docId rdf:resource="http://d.opencalais.com/dochash-1/00b00ecd-7e8b-3773-b30f-2169abd75efe" />
  <c:subject rdf:resource="http://d.opencalais.com/genericHasher-1/5ecd6782-6163-3514-b0ca-98bbb29090c9" />
<!--Acquisition: company_acquirer: ReutersGroup Plc; company_beingacquired: ClearForest Ltd.; date: 2007-04-30; datestring: Monday; status: announced; -->
  <c:detection>[April 30 (Reuters) - News and ]information supplier ReutersGroup Plc (RTR.L: Quote Profile, Research) said on Monday the company would acquire ClearForest Ltd., a maker of software used to search vast archives of news[, Web pages and documents for relevant facts.]</c:detection>
  <c:prefix>April 30 (Reuters) - News and</c:prefix>
  <c:exact>information supplier ReutersGroup Plc (RTR.L: Quote, Profile, Research) said on Monday the company would acquire ClearForest Ltd., a maker of software used to search vast archives of news</c:exact>
  <c:offset>200</c:offset>
  <c:length>185</c:length>
</rdf:Description>
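Since each mention is a separate InstanceInfo node, collecting all mentions of one element means grouping instance nodes by c:subject. A minimal sketch, assuming the c: namespace URI shown in the code; the sample keeps only the fields of the two instances above that the function reads, and instances_by_subject is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
C_NS = "http://s.opencalais.com/1/pred/"  # assumed URI for the "c:" prefix
INSTANCE_TYPE = "http://s.opencalais.com/1/type/sys/InstanceInfo"

# Trimmed copies of the two instance nodes above.
SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:c="{C_NS}">
<rdf:Description rdf:about="http://d.opencalais.com/dochash-1/00b00ecd-7e8b-3773-b30f-2169abd75efe/Instance/47">
  <rdf:type rdf:resource="{INSTANCE_TYPE}"/>
  <c:subject rdf:resource="http://d.opencalais.com/comphash-1/899a2db3-ce69-3926-ba4f-6dea099c3fc9"/>
  <c:exact>ClearForest</c:exact>
  <c:offset>54</c:offset>
  <c:length>11</c:length>
</rdf:Description>
<rdf:Description rdf:about="http://d.opencalais.com/dochash-1/00b00ecd-7e8b-3773-b30f-2169abd75efe/Instance/24">
  <rdf:type rdf:resource="{INSTANCE_TYPE}"/>
  <c:subject rdf:resource="http://d.opencalais.com/genericHasher-1/5ecd6782-6163-3514-b0ca-98bbb29090c9"/>
  <c:offset>200</c:offset>
  <c:length>185</c:length>
</rdf:Description>
</rdf:RDF>"""


def instances_by_subject(rdf_xml: str) -> dict[str, list[tuple[int, int, str]]]:
    """Group mentions as (offset, length, exact) tuples under the URI of the element they refer to."""
    root = ET.fromstring(rdf_xml)
    grouped = defaultdict(list)
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        type_el = desc.find(f"{{{RDF_NS}}}type")
        if type_el is None or type_el.get(f"{{{RDF_NS}}}resource") != INSTANCE_TYPE:
            continue  # not an instance (mention) node
        subject_el = desc.find(f"{{{C_NS}}}subject")
        offset = desc.findtext(f"{{{C_NS}}}offset")
        length = desc.findtext(f"{{{C_NS}}}length")
        if subject_el is None or offset is None or length is None:
            continue
        exact = desc.findtext(f"{{{C_NS}}}exact") or ""
        grouped[subject_el.get(f"{{{RDF_NS}}}resource")].append((int(offset), int(length), exact))
    for mentions in grouped.values():
        mentions.sort()  # document order, by offset
    return dict(grouped)
```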

Disambiguation Elements

Disambiguation results are integrated into the RDF output as follows.

For Companies: Resolution nodes are added to the output RDF.
Each such node contains the following information:

  • rdf:type: the type of the resolved company (http://s.opencalais.com/1/type/er/Company).
  • c:subject: URI of the referred company entity. A resolution node may contain multiple subject properties, one for each company entity that was resolved to this single company.
  • c:docId: URI of the document in which this resolution was created.
  • c:score: a score representing the certainty with which the company was resolved.
  • c:name: the formal English legal name of the resolved company.
  • c:ticker: the company's ticker symbol.

The RDF example below shows a Company entity (top RDF node) and the respective resolution node.

<rdf:Description rdf:about="http://d.opencalais.com/comphash-1/64136b2b-cb4e-36ac-9f32-f58f4c1f1c8a">
  <rdf:type rdf:resource="http://s.opencalais.com/1/type/em/e/Company" />
  <c:name>British Airways</c:name>
  <c:nationality>British</c:nationality>
</rdf:Description>
<rdf:Description rdf:about="http://d.opencalais.com/er/company/ralg-tr1r/58ad4ecb-2df0-3d46-8333-2d25dcb364d9">
  <rdf:type rdf:resource="http://s.opencalais.com/1/type/er/Company" />
  <c:docId rdf:resource="http://d.opencalais.com/dochash-1/88096fc6-9ea2-3c9f-a0a0-c29a0a5fdced" />
  <c:subject rdf:resource="http://d.opencalais.com/comphash-1/64136b2b-cb4e-36ac-9f32-f58f4c1f1c8a" />
  <c:score>1.0</c:score>
  <c:name>British Airways PLC</c:name>
  <c:ticker>BAY</c:ticker>
</rdf:Description>
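To attach the resolved name and ticker to each Company entity, a client can index resolution nodes by their c:subject values, remembering that one node may list several subjects. A minimal sketch, assuming the c: namespace URI shown in the code and using the British Airways example above; company_resolutions is a hypothetical helper.

```python
import math
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
C_NS = "http://s.opencalais.com/1/pred/"  # assumed URI for the "c:" prefix
RESOLVED_COMPANY_TYPE = "http://s.opencalais.com/1/type/er/Company"

SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:c="{C_NS}">
<rdf:Description rdf:about="http://d.opencalais.com/comphash-1/64136b2b-cb4e-36ac-9f32-f58f4c1f1c8a">
  <rdf:type rdf:resource="http://s.opencalais.com/1/type/em/e/Company" />
  <c:name>British Airways</c:name>
  <c:nationality>British</c:nationality>
</rdf:Description>
<rdf:Description rdf:about="http://d.opencalais.com/er/company/ralg-tr1r/58ad4ecb-2df0-3d46-8333-2d25dcb364d9">
  <rdf:type rdf:resource="{RESOLVED_COMPANY_TYPE}" />
  <c:docId rdf:resource="http://d.opencalais.com/dochash-1/88096fc6-9ea2-3c9f-a0a0-c29a0a5fdced" />
  <c:subject rdf:resource="http://d.opencalais.com/comphash-1/64136b2b-cb4e-36ac-9f32-f58f4c1f1c8a" />
  <c:score>1.0</c:score>
  <c:name>British Airways PLC</c:name>
  <c:ticker>BAY</c:ticker>
</rdf:Description>
</rdf:RDF>"""


def company_resolutions(rdf_xml: str) -> dict[str, dict]:
    """Map each Company entity URI to the name, ticker and score of its resolution node."""
    root = ET.fromstring(rdf_xml)
    resolved = {}
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        type_el = desc.find(f"{{{RDF_NS}}}type")
        if type_el is None or type_el.get(f"{{{RDF_NS}}}resource") != RESOLVED_COMPANY_TYPE:
            continue  # not a company resolution node
        score_text = desc.findtext(f"{{{C_NS}}}score")
        info = {
            "name": desc.findtext(f"{{{C_NS}}}name"),
            "ticker": desc.findtext(f"{{{C_NS}}}ticker"),  # None if the node has no ticker
            "score": float(score_text) if score_text is not None else math.nan,
        }
        # One resolution node may carry several c:subject properties, one per resolved entity.
        for subject in desc.findall(f"{{{C_NS}}}subject"):
            resolved[subject.get(f"{{{RDF_NS}}}resource")] = info
    return resolved
```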

For Geographies: Resolution nodes are added to the output RDF.

Each such node contains the ID of the referred entity (a city, province or state, or country found in the input) in its "c:subject" property, the resolved name in its "c:name" property, and the latitude and longitude of the location in its "c:lat" and "c:long" properties. It also includes the type of the resolved geography in its "rdf:type" property and the URI of the referenced document in its "c:docId" property.

The RDF example below shows a city entity (top RDF node) and the respective resolution node.

<rdf:Description rdf:about="http://d.opencalais.com/genericHasher-1/96e9e28b-f95c-3f9c-a374-b3bcfbc02cfd">
  <rdf:type rdf:resource="http://s.opencalais.com/1/type/em/e/City"/>
  <c:name>Golden</c:name>
</rdf:Description>
<rdf:Description rdf:about="http://d.opencalais.com/er/geo/ralg-geo1/e3f5b88c-f2f2-6e4f-7e2c-f0452221c341">
  <rdf:type rdf:resource="http://s.opencalais.com/1/type/er/Geo"/>
  <c:docId rdf:resource="http://d.opencalais.com/dochash-1/3508bef0-f669-3dec-829c-d3344507f857"/>
  <!--Golden-->
  <c:subject rdf:resource="http://d.opencalais.com/genericHasher-1/96e9e28b-f95c-3f9c-a374-b3bcfbc02cfd"/>
  <c:name>Golden,Colorado,United States</c:name>
  <c:lat>39.7556</c:lat>
  <c:long>-105.2206</c:long>
</rdf:Description>
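The coordinates arrive as text, so a client mapping the results needs to convert c:lat and c:long to numbers. A minimal sketch, assuming the c: namespace URI shown in the code and using the Golden example above; geo_coordinates is a hypothetical helper, and nodes without both coordinates are skipped.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
C_NS = "http://s.opencalais.com/1/pred/"  # assumed URI for the "c:" prefix
GEO_TYPE = "http://s.opencalais.com/1/type/er/Geo"

SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:c="{C_NS}">
<rdf:Description rdf:about="http://d.opencalais.com/genericHasher-1/96e9e28b-f95c-3f9c-a374-b3bcfbc02cfd">
  <rdf:type rdf:resource="http://s.opencalais.com/1/type/em/e/City"/>
  <c:name>Golden</c:name>
</rdf:Description>
<rdf:Description rdf:about="http://d.opencalais.com/er/geo/ralg-geo1/e3f5b88c-f2f2-6e4f-7e2c-f0452221c341">
  <rdf:type rdf:resource="{GEO_TYPE}"/>
  <c:docId rdf:resource="http://d.opencalais.com/dochash-1/3508bef0-f669-3dec-829c-d3344507f857"/>
  <!--Golden-->
  <c:subject rdf:resource="http://d.opencalais.com/genericHasher-1/96e9e28b-f95c-3f9c-a374-b3bcfbc02cfd"/>
  <c:name>Golden,Colorado,United States</c:name>
  <c:lat>39.7556</c:lat>
  <c:long>-105.2206</c:long>
</rdf:Description>
</rdf:RDF>"""


def geo_coordinates(rdf_xml: str) -> dict[str, tuple[str, float, float]]:
    """Map each geography entity URI to (resolved name, latitude, longitude)."""
    root = ET.fromstring(rdf_xml)
    coords = {}
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        type_el = desc.find(f"{{{RDF_NS}}}type")
        if type_el is None or type_el.get(f"{{{RDF_NS}}}resource") != GEO_TYPE:
            continue  # not a geography resolution node
        subject = desc.find(f"{{{C_NS}}}subject")
        lat = desc.findtext(f"{{{C_NS}}}lat")
        lon = desc.findtext(f"{{{C_NS}}}long")
        if subject is None or lat is None or lon is None:
            continue
        name = desc.findtext(f"{{{C_NS}}}name") or ""
        coords[subject.get(f"{{{RDF_NS}}}resource")] = (name, float(lat), float(lon))
    return coords
```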

For Electronic Products: Resolution nodes are added to the output RDF.
Each such node contains the ID of the referred entity in its "c:subject" property and the full resolved name in its "c:name" property. It also includes the type of the resolved product in its "rdf:type" property (http://s.opencalais.com/1/type/er/Product/Electronics) and the URI of the referenced document in its "c:docId" property.

The RDF example below shows a product entity (top RDF node) and the respective resolution node.

<rdf:Description rdf:about="http://d.opencalais.com/genericHasher-1/1b96dd47-f951-30a4-b8c1-2f9ee4a9a9ac">
<rdf:type rdf:resource="http://s.opencalais.com/1/type/em/e/Product"/>
<c:name>Canon PowerShot SD100 / IXUS II Digital Camera</c:name>
<c:producttype>Electronics</c:producttype>
</rdf:Description>
<rdf:Description rdf:about="http://d.opencalais.com/er/product/electronics/ralg-prd1/e73d293e-4d4b-3eae-9fa7-16902bc6a7c3">
<rdf:type rdf:resource="http://s.opencalais.com/1/type/er/Product/Electronics"/>
<c:docId rdf:resource="http://d.opencalais.com/dochash-1/2c42d62e-d212-3e0d-bf56-9440d8c7bbd6"/>
<!--Canon PowerShot SD100 / IXUS II Digital Camera-->
<c:subject rdf:resource="http://d.opencalais.com/genericHasher-1/1b96dd47-f951-30a4-b8c1-2f9ee4a9a9ac"/>
<c:name>Canon PowerShot SD100 / IXUS II Digital Camera</c:name>
<c:score>1</c:score>
</rdf:Description>

System Messages

Timeout Message: A 20-second timeout applies when large input content is submitted to Calais. Rather than dropping the transaction when the timeout is reached, Calais returns the metadata extracted so far and indicates that a timeout occurred for the submitted content.
The message will be as follows:

<c:message>
<rdf:Description>
  <rdf:type rdf:resource="http://s.opencalais.com/1/type/sys/Message" />
   <c:messageCode>201</c:messageCode>
   <c:text>Partial metadata extraction due to timeout </c:text>
 </rdf:Description>
</c:message>
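A client that stores or indexes results may want to flag partial extractions by checking for message code 201. This page does not say where c:message sits in the full response, so the sketch below searches the whole tree, and the rdf:RDF wrapper in the sample is an assumption, as is the c: namespace URI. message_codes and is_partial are hypothetical helpers.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
C_NS = "http://s.opencalais.com/1/pred/"  # assumed URI for the "c:" prefix
MESSAGE_TYPE = "http://s.opencalais.com/1/type/sys/Message"
TIMEOUT_CODE = 201  # "Partial metadata extraction due to timeout"

# The message above inside an assumed rdf:RDF wrapper.
SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:c="{C_NS}">
<c:message>
<rdf:Description>
  <rdf:type rdf:resource="{MESSAGE_TYPE}" />
  <c:messageCode>201</c:messageCode>
  <c:text>Partial metadata extraction due to timeout </c:text>
</rdf:Description>
</c:message>
</rdf:RDF>"""


def message_codes(rdf_xml: str) -> list[int]:
    """Collect c:messageCode values from every system Message node, wherever it appears."""
    root = ET.fromstring(rdf_xml)
    codes = []
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        type_el = desc.find(f"{{{RDF_NS}}}type")
        if type_el is None or type_el.get(f"{{{RDF_NS}}}resource") != MESSAGE_TYPE:
            continue
        code = desc.findtext(f"{{{C_NS}}}messageCode")
        if code is not None:
            codes.append(int(code.strip()))
    return codes


def is_partial(rdf_xml: str) -> bool:
    """True if the response reports that extraction stopped at the timeout."""
    return TIMEOUT_CODE in message_codes(rdf_xml)
```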

 

Examples

Two example files are attached: an input TEXT/XML document submitted to OpenCalais, and the resulting RDF output file.


  • RDF-input_09Jun15.txt (3.07 KB)
  • RDF-output_09Jun15.txt (137 KB)