API Calls

API Usage Quotas

Currently, the default API usage quotas are 40,000 transactions per day and 4 transactions per second. If you have a reasonable case for a larger quota, please drop us a note at questions@opencalais.com and we’d be happy to talk.
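A client can stay inside the default quotas by spacing its own calls. The following is a minimal sketch, not part of the Calais API: the class name QuotaGuard and its parameters are hypothetical, and it assumes the quotas are enforced per license key and that evenly spaced calls are acceptable.

```python
import time

MAX_PER_SECOND = 4      # default per-second quota quoted above
MAX_PER_DAY = 40_000    # default per-day quota quoted above


class QuotaGuard:
    """Hypothetical client-side limiter; call acquire() before each API call."""

    def __init__(self, per_second=MAX_PER_SECOND, per_day=MAX_PER_DAY,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / per_second
        self.per_day = per_day
        self.clock = clock
        self.sleep = sleep
        self.last_call = None
        self.day_start = None
        self.calls_today = 0

    def acquire(self):
        now = self.clock()
        # Start a new 24-hour window when the previous one has expired.
        if self.day_start is None or now - self.day_start >= 86_400:
            self.day_start = now
            self.calls_today = 0
        if self.calls_today >= self.per_day:
            raise RuntimeError("daily quota exhausted")
        # Space calls evenly so that no more than per_second fit in one second.
        if self.last_call is not None:
            wait = self.min_interval - (now - self.last_call)
            if wait > 0:
                self.sleep(wait)
                now = self.clock()
        self.last_call = now
        self.calls_today += 1
```

The clock and sleep arguments exist only so the limiter can be exercised without real waiting; in normal use the defaults are enough.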

API Invocation

SOAP

The Calais Web Service currently provides one web method - Enlighten. The web method signature is as follows:

String Enlighten(String licenseID, String content, String paramsXML)

Field Name | Type | Definition | Notes
licenseID | String | API access key | Obtained through registration
content | String | Content to be annotated | Max input length is 100,000 characters
paramsXML | String | Processing and user directives and external metadata (see next topic - Input Parameters) | Max parameters length is 16,000 characters
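The length limits in the table can be checked on the client before a call is made. This is a sketch only: the function name check_enlighten_args is hypothetical, and it assumes the limits are counted in characters as Python counts them, which the service may not do in exactly the same way.

```python
MAX_CONTENT_CHARS = 100_000   # max input length from the table above
MAX_PARAMS_CHARS = 16_000     # max parameters length from the table above


def check_enlighten_args(license_id, content, params_xml):
    """Raise ValueError if an argument would break a documented limit."""
    if not license_id:
        raise ValueError("licenseID is required")
    if len(content) > MAX_CONTENT_CHARS:
        raise ValueError(f"content is {len(content)} characters, "
                         f"limit is {MAX_CONTENT_CHARS}")
    if len(params_xml) > MAX_PARAMS_CHARS:
        raise ValueError(f"paramsXML is {len(params_xml)} characters, "
                         f"limit is {MAX_PARAMS_CHARS}")
    return license_id, content, params_xml
```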

Your starting point is the WSDL, which can be found at http://api.opencalais.com/enlighten/?wsdl.

The Web Service URL for SOAP requests is http://api.opencalais.com/enlighten/.

Visit the Gallery and download our Calais sample client that includes sample code for your reference.


REST

Another way to invoke the Web Service is REST. The Web Service URL for REST requests is http://api.opencalais.com/enlighten/rest/.

When constructing the HTTP request, you must URL-encode all of your arguments. No other escaping is necessary; in particular, XML/HTML characters such as "<" and "&" do not need to be escaped separately.

The structure of the argument line is:

licenseID=<URL-encoded string>&content=<URL-encoded string>&paramsXML=<URL-encoded string>

You can use REST with HTTP GET or HTTP POST. 

When using HTTP GET, append the argument line to the REST URL.

When using HTTP POST, include the argument line in the body of the HTTP request and always use the "application/x-www-form-urlencoded" content type.
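The REST argument line and both request styles can be sketched with the Python standard library. The helper names below are hypothetical. The sketch also assumes that arguments are encoded as UTF-8 before URL-encoding and that, for GET, the argument line is joined to the REST URL with "?"; neither detail is stated above.

```python
from urllib.parse import urlencode
from urllib.request import Request

REST_URL = "http://api.opencalais.com/enlighten/rest/"


def build_argument_line(license_id, content, params_xml):
    """URL-encode the three arguments in the documented order."""
    return urlencode({
        "licenseID": license_id,
        "content": content,
        "paramsXML": params_xml,
    })


def build_get_url(license_id, content, params_xml):
    """HTTP GET: append the argument line to the REST URL."""
    return REST_URL + "?" + build_argument_line(license_id, content, params_xml)


def build_post_request(license_id, content, params_xml):
    """HTTP POST: argument line in the body, form-urlencoded content type."""
    body = build_argument_line(license_id, content, params_xml).encode("ascii")
    return Request(
        REST_URL,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

Passing the result of build_post_request to urllib.request.urlopen would send the request. That step is left out here because it needs a valid license key.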

You can download a very simple HTML form that posts to our web channel here.

HTTP POST - Obsolete

In older versions you could also invoke the service via an HTTP POST request using the following URL: http://api.opencalais.com/enlighten/calais.asmx/Enlighten.

When using this obsolete HTTP POST URL, the argument line must be built as described in the REST subsection above. The REST option described above replaces the old HTTP POST requests.

The exact representation of the HTTP requests and responses for SOAP, REST and HTTP POST can be found here.

HTTP Traffic Compression

To receive a Gzipped response, the client must ask for one. It is assumed that the client knows how to handle Gzip.

To request a Gzip response, add the header "Accept-Encoding: gzip" to the web request. This tells the server that the client can handle a Gzip response.

Important note when decompressing a response stream: pass the response stream directly to the Gzip stream for decompression. Otherwise, the data might get corrupted and fail to decompress.
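In Python, the advice above could look roughly like this. It is a sketch, not the sample code shipped with this page: the function names are hypothetical, and it assumes the server marks a compressed body with the standard "Content-Encoding: gzip" response header.

```python
import gzip
from urllib.request import urlopen


def read_body(response_stream, content_encoding):
    """Return the response body, decompressing straight from the stream."""
    if (content_encoding or "").lower() == "gzip":
        # Hand the raw stream to the gzip reader without touching it first.
        with gzip.GzipFile(fileobj=response_stream) as gz:
            return gz.read()
    return response_stream.read()


def fetch(request):
    """Send a urllib.request.Request and ask for a gzipped response."""
    request.add_header("Accept-Encoding", "gzip")
    with urlopen(request) as response:
        return read_body(response, response.headers.get("Content-Encoding"))
```

Checking the response header before decompressing keeps the client working if the server answers without compression.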

Sample source code to demonstrate using HTTP traffic compression is included in the attached file (at the bottom of this page).

Input Parameters

Description

The input content sent to Calais is accompanied by parameters XML that includes processing directives, user directives and external metadata. Please note that paramsXML must be HTTP encoded (escaped).

The input parameters for the API call are summarized here:

Parameter | Section | Definition | Values | Default
contentType | Processing Directives | Format of the input content (see details below) | "TEXT/XML", "TEXT/TXT", "TEXT/HTML" or "TEXT/RAW" | TEXT/RAW
outputFormat | Processing Directives | Format of the returned results | "XML/RDF", "Text/Simple" or "Text/Microformats" | XML/RDF
reltagBaseURL | Processing Directives | Base URL to be put in rel-tag microformats | <the base URL>, for example "http://myblog.com/tag" | (none)
calculateRelevanceScore | Processing Directives | Indicates whether the extracted metadata will include relevance score for each unique entity | "true" or "false" | true
enableMetadataType | Processing Directives | Indicates whether the output (RDF only) will include Generic Relation extractions | "GenericRelations" - to enable this metadata type | (none)
discardMetadata | Processing Directives | Indicates whether the output will exclude Entity Disambiguation results | "er/Company;er/Geo" - to exclude disambiguation results | (none)
allowDistribution | User Directives | Indicates whether the extracted metadata can be distributed | "true" or "false" | false
allowSearch | User Directives | Indicates whether future searches can be performed on the extracted metadata | "true" or "false" | false
externalID | User Directives | User-generated ID for the submission | any string | (none)
submitter | User Directives | Identifier for the content submitter | any string | (none)

In addition, users can add external metadata that will be returned in the response. This can be done by embedding an RDF representation of the user's metadata in the externalMetadata node. Please make sure you embed RDF-compliant metadata (will be supported in future versions).

The example in the next section shows how the parameters XML should look. You may change the parameter values and the user's external metadata as explained above. Please note that paramsXML must start with the root element "<c:params>" ("c" can be replaced with any other prefix, but the namespace must still be "http://s.opencalais.com/1/pred/").

Example Parameter File

<c:params xmlns:c="http://s.opencalais.com/1/pred/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <c:processingDirectives c:contentType="text/txt" c:enableMetadataType="GenericRelations" c:outputFormat="xml/rdf">
  </c:processingDirectives>
  <c:userDirectives c:allowDistribution="true" c:allowSearch="true" c:externalID="17cabs901" c:submitter="ABC">
  </c:userDirectives>
  <c:externalMetadata>
  </c:externalMetadata>
</c:params>
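A parameters XML like the one above can also be generated in code rather than written by hand. The sketch below uses Python's xml.etree.ElementTree; the function name build_params_xml is hypothetical. It leaves externalMetadata empty and does not declare the rdf namespace, since nothing in the generated document uses it.

```python
import xml.etree.ElementTree as ET

C_NS = "http://s.opencalais.com/1/pred/"


def build_params_xml(processing, user):
    """Build paramsXML from two dicts of directive name -> value."""
    ET.register_namespace("c", C_NS)  # serialize with the "c" prefix
    root = ET.Element(f"{{{C_NS}}}params")
    sections = (("processingDirectives", processing),
                ("userDirectives", user))
    for tag, directives in sections:
        # Directive names are namespaced attributes, as in the example file.
        attrs = {f"{{{C_NS}}}{name}": value
                 for name, value in directives.items()}
        ET.SubElement(root, f"{{{C_NS}}}{tag}", attrs)
    ET.SubElement(root, f"{{{C_NS}}}externalMetadata")
    return ET.tostring(root, encoding="unicode")


params_xml = build_params_xml(
    {"contentType": "text/txt", "outputFormat": "xml/rdf"},
    {"allowDistribution": "true", "allowSearch": "true",
     "externalID": "17cabs901", "submitter": "ABC"},
)
```

The resulting string still has to be URL-encoded like any other argument when it is sent over REST.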


Input Content

Language

Calais currently supports English only. To ensure non-English content is not processed, Calais applies a Language Identification module before processing the text for entities, events and facts. This module may fail to recognize the language if the submitted content is very short.

If the submitted content is less than 100 characters and the language cannot be recognized, Calais will assume the language is English by default and will process the text for entities, events and facts. In addition, in such cases, Calais will return "Input Text Too Short" as the language code in the RDF.

Format

As described in the previous section, Calais supports four formats of content: TEXT/XML, TEXT/HTML, TEXT/TXT and TEXT/RAW.

TEXT/HTML: will apply cleansing of HTML tags and scripts, hence entity and event detection will be relative to the cleaned text. For optimal results it is recommended to use this contentType when submitting HTML content.

TEXT/XML: will apply the XML converter for escaping the necessary characters, hence entity and event detection will be relative to the cleaned text. For optimal results it is recommended to use this contentType when submitting XML content. The XML converter also supports the NewsML standard.

If the content is submitted as TEXT/XML, Calais will process the following XML sections:

Document Section | Supported XML Tag Names
Document Title | TITLE, HEADLINE, HEADER
Document Body | BODY, DESCRIPTION, CONTENT
Document Date | DATE, DATETIME, DATEANDTIME, PUBDATE

Document Title and Document Body should contain the content that will be processed by Calais.

Document Date matters because, once detected, it is used to resolve relative date mentions (e.g., "yesterday") when such mentions appear in Calais's events and facts. If Document Date is not provided, relative dates will be resolved based on the current date.

Please make sure your XML content conforms to these tag names in order for the metadata to be extracted optimally. 
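As an illustration, TEXT/XML content that uses the supported tag names could be assembled as follows. This is a sketch under assumptions: the root element name DOCUMENT and the date format are hypothetical, since only the section tag names are specified above.

```python
import xml.etree.ElementTree as ET


def build_document(title, body, date):
    """Wrap title, body and date in tag names that Calais recognizes."""
    doc = ET.Element("DOCUMENT")          # hypothetical root element name
    ET.SubElement(doc, "TITLE").text = title
    ET.SubElement(doc, "BODY").text = body
    ET.SubElement(doc, "DATE").text = date  # date format is an assumption
    # ElementTree escapes "&" and "<" in the text while serializing.
    return ET.tostring(doc, encoding="unicode")


xml_content = build_document(
    "Acme opens Paris office",
    "Acme Corp. said yesterday that it opened an office in Paris.",
    "2008-05-14",
)
```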

TEXT/TXT: will apply Calais' legacy text converter; entity and event detection will be relative to the cleaned text. You can use this contentType when submitting plain text.

TEXT/RAW (default): will not apply any conversion to the submitted content; entity and event detection (offset/length) will match the submitted content exactly. You can use this contentType when submitting plain text. Note that this is the only contentType option that works exactly on the submitted input content without modifying/cleansing it at all.
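Because TEXT/RAW leaves the input untouched, an offset/length pair can be applied directly to the string that was submitted. The sketch below assumes zero-based offsets counted in characters, which is not stated above. The offset and length are computed locally as stand-ins, not taken from a real Calais response.

```python
def mention_at(content, offset, length):
    """Return the span of `content` that an offset/length pair points to."""
    return content[offset:offset + length]


content = "Acme Corp. opened an office in Paris."
# Stand-in values: a real offset and length would come from the Calais output.
offset = content.index("Paris")
length = len("Paris")
mention = mention_at(content, offset, length)
```

With the other content types the same lookup would have to be made against the cleaned text, not the original input.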


Security

Calais supports SSL encryption of traffic to and from the service. The SSL certificate is issued by GoDaddy. To use it, simply use https:// instead of http:// in the service URL.


 

Attachment | Size
HTMLform.zip | 343 bytes