API Calls
API Usage Quotas
Currently, the default API usage quotas are 40,000 transactions per day and 4 transactions per second. If you have a reasonable case for a larger quota, please drop us a note at questions@opencalais.com and we'd be happy to talk.
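A client can stay under the per-second quota by spacing its calls. The following is a minimal sketch, not part of the Calais client code: the `Throttle` class and its parameters are hypothetical names, and it only enforces a minimum gap between calls (it does not track the daily quota).

```python
import time


class Throttle:
    """Keeps consecutive calls at least 1/per_second seconds apart."""

    def __init__(self, per_second=4, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / per_second
        self.clock = clock      # injectable so the class can be tested without waiting
        self.sleep = sleep
        self.next_allowed = None  # earliest time the next call may start

    def wait(self):
        """Block until the next call is allowed, then reserve the following slot."""
        now = self.clock()
        if self.next_allowed is not None and now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.min_interval
```

Call `wait()` immediately before each request; with the default of 4 per second, calls end up at least 0.25 seconds apart.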
API Invocation
SOAP
The Calais Web Service currently provides one web method - Enlighten. The web method signature is as follows:
String Enlighten(String licenseID, String content, String paramsXML)
| Field Name | Type | Definition | Notes |
| licenseID | String | API access key | Obtained through registration |
| content | String | Content to be annotated | Max input length is 100,000 characters |
| paramsXML | String | Processing and user directives and external metadata (see next topic - Input Parameters) | Max parameters length is 16,000 characters |
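The limits in the table can be checked on the client before a call is made. This is a rough sketch: the function name is hypothetical, and it assumes the limits are counted in characters of the unencoded strings, which the table does not state explicitly.

```python
MAX_CONTENT_CHARS = 100_000  # "Max input length" from the table above
MAX_PARAMS_CHARS = 16_000    # "Max parameters length" from the table above


def check_enlighten_args(license_id, content, params_xml):
    """Return a list of problems; an empty list means the arguments pass these checks."""
    problems = []
    if not license_id:
        problems.append("licenseID is empty")
    if len(content) > MAX_CONTENT_CHARS:
        problems.append(
            f"content is {len(content)} characters (limit {MAX_CONTENT_CHARS})"
        )
    if len(params_xml) > MAX_PARAMS_CHARS:
        problems.append(
            f"paramsXML is {len(params_xml)} characters (limit {MAX_PARAMS_CHARS})"
        )
    return problems
```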
Your starting point is the WSDL, which can be found at http://api.opencalais.com/enlighten/?wsdl.
The Web Service URL for SOAP requests is http://api.opencalais.com/enlighten/.
Visit the Gallery and download our Calais sample client that includes sample code for your reference.
REST
Another way to invoke the Web Service is REST. The Web Service URL for REST requests is http://api.opencalais.com/enlighten/rest/.
When constructing the HTTP request, you must URL-encode all of your arguments. No other escaping is necessary; for example, you do not need to XML/HTML-escape characters such as "<" and "&".
The structure of the argument line is:
licenseID=url encoded string&content=url encoded string&paramsXML=url encoded string
You can use REST with HTTP GET or HTTP POST.
When using HTTP GET, append the argument line to the REST URL.
When using HTTP POST, include the argument line in the body of the HTTP request and always use the "application/x-www-form-urlencoded" content type.
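The REST steps above can be sketched in Python using only the standard library. The helper names are hypothetical, and joining the argument line to the URL with "?" for GET is an assumption about what "append" means here. The functions only build the request and do not send it.

```python
from urllib.parse import urlencode
from urllib.request import Request

REST_URL = "http://api.opencalais.com/enlighten/rest/"


def build_argument_line(license_id, content, params_xml):
    """URL-encode the three arguments into a single argument line."""
    return urlencode(
        {"licenseID": license_id, "content": content, "paramsXML": params_xml}
    )


def build_get_url(license_id, content, params_xml):
    """HTTP GET: append the argument line to the REST URL (assumed '?' separator)."""
    return REST_URL + "?" + build_argument_line(license_id, content, params_xml)


def build_post_request(license_id, content, params_xml):
    """HTTP POST: argument line in the body, form-urlencoded content type."""
    body = build_argument_line(license_id, content, params_xml).encode("ascii")
    return Request(
        REST_URL,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

Passing the result of `build_post_request(...)` to `urllib.request.urlopen` would send the request. Given the input length limits, POST is likely the safer choice for long content.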
You can download a very simple HTML form that posts to our web channel here.
HTTP POST - Obsolete
In older versions you could also invoke the service via an HTTP POST request using the following URL: http://api.opencalais.com/enlighten/calais.asmx/Enlighten.
When using HTTP POST, the argument line needs to be built in a way described above in the REST subsection. The REST option described above replaces the old HTTP POST requests.
The exact representation of the HTTP requests and responses for SOAP, REST and HTTP POST can be found here.
HTTP Traffic Compression
In order to get a Gzipped response, the client must ask for a Gzipped response. It is assumed that the client knows how to handle Gzip.
When requesting a Gzip response, the client should add the header "Accept-Encoding: gzip" to the web request; this tells the server that the client can handle a Gzip response.
Important note when unGzipping a response stream: Try to pass the response stream directly to the Gzip stream for decompression. Otherwise, the data might get corrupted, creating a problem in decompressing it.
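As an illustration of the advice above, here is a minimal Python sketch that requests a Gzipped response and decompresses directly from the response stream. The function names are hypothetical, and the sketch assumes the server marks compressed responses with a "Content-Encoding: gzip" header, which is standard HTTP behavior but is not stated on this page.

```python
import gzip
from urllib.request import Request


def with_gzip(request):
    """Tell the server that this client can handle a Gzip response."""
    request.add_header("Accept-Encoding", "gzip")
    return request


def read_body(response_stream, content_encoding):
    """Read the body, decompressing straight from the stream when it is gzipped."""
    if (content_encoding or "").lower() == "gzip":
        # Hand the response stream directly to the Gzip stream.
        with gzip.GzipFile(fileobj=response_stream) as gz:
            return gz.read()
    return response_stream.read()
```

With `urllib.request.urlopen`, the response object itself can be passed as `response_stream`, and `response.headers.get("Content-Encoding")` supplies the second argument.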
Sample source code to demonstrate using HTTP traffic compression is included in the attached file (at the bottom of this page).
Input Parameters
Description
The input content sent to Calais is accompanied by parameters XML that includes processing directives, user directives and external metadata. Please note that paramsXML must be HTTP encoded (escaped).
The input parameters for the API call are summarized here:
| Parameter | Section | Definition | Values | Default |
| contentType | Processing Directives | Format of the input content (see details below) | "TEXT/XML", "TEXT/TXT", "TEXT/HTML" or "TEXT/RAW" | TEXT/RAW |
| outputFormat | Processing Directives | Format of the returned results | "XML/RDF", "Text/Simple" or "Text/Microformats" | XML/RDF |
| reltagBaseURL | Processing Directives | Base URL to be put in rel-tag microformats | <the base URL>, for example "http://myblog.com/tag" | (none) |
| calculateRelevanceScore | Processing Directives | Indicates whether the extracted metadata will include relevance score for each unique entity | "true" or "false" | true |
| enableMetadataType | Processing Directives | Indicates whether the output (RDF only) will include Generic Relation extractions | "GenericRelations" - to enable this metadata type | (none) |
| discardMetadata | Processing Directives | Indicates whether the output will exclude Entity Disambiguation results | "er/Company;er/Geo" - to exclude disambiguation results | (none) |
| allowDistribution | User Directives | Indicates whether the extracted metadata can be distributed | "true" or "false" | false |
| allowSearch | User Directives | Indicates whether future searches can be performed on the extracted metadata | "true" or "false" | false |
| externalID | User Directives | User-generated ID for the submission | any string | (none) |
| submitter | User Directives | Identifier for the content submitter | any string | (none) |
In addition, users can add external metadata that will be returned in the response. This can be done by embedding an RDF representation of the user's metadata in the externalMetadata node. Please make sure you embed RDF-compliant metadata (will be supported in future versions).
The example in the next section shows how the parameters XML should look. You may change the parameter values and the user's external metadata as explained above. Please note that paramsXML must start with the root element "<c:params>" ("c" can be replaced with any other prefix, but the namespace must still be "http://s.opencalais.com/1/pred/").
Example Parameter File
<c:params xmlns:c="http://s.opencalais.com/1/pred/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<c:processingDirectives c:contentType="text/txt" c:enableMetadataType="GenericRelations" c:outputFormat="xml/rdf">
</c:processingDirectives>
<c:userDirectives c:allowDistribution="true" c:allowSearch="true" c:externalID="17cabs901" c:submitter="ABC">
</c:userDirectives>
<c:externalMetadata>
</c:externalMetadata>
</c:params>
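A parameters XML like the one above can also be generated programmatically. The following Python sketch uses the standard library's ElementTree; the function name and its two dictionary arguments are hypothetical, and the output differs cosmetically from the hand-written example (empty elements are self-closed, and the unused rdf namespace is not declared).

```python
import xml.etree.ElementTree as ET

C_NS = "http://s.opencalais.com/1/pred/"


def build_params_xml(processing, user):
    """Build a paramsXML string from two dicts of directive name -> value."""
    ET.register_namespace("c", C_NS)  # serialize the namespace with the "c" prefix

    def qualified(name):
        return f"{{{C_NS}}}{name}"

    root = ET.Element(qualified("params"))
    ET.SubElement(
        root,
        qualified("processingDirectives"),
        {qualified(k): v for k, v in processing.items()},
    )
    ET.SubElement(
        root,
        qualified("userDirectives"),
        {qualified(k): v for k, v in user.items()},
    )
    ET.SubElement(root, qualified("externalMetadata"))  # left empty in this sketch
    return ET.tostring(root, encoding="unicode")
```

For example, `build_params_xml({"contentType": "text/txt", "outputFormat": "xml/rdf"}, {"allowDistribution": "true", "submitter": "ABC"})` returns a string whose root element is `c:params` in the required namespace.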
Input Content
Language
Calais currently supports English only. To ensure non-English content is not processed, Calais applies a Language Identification module before processing the text for entities, events and facts. This module may fail to recognize the language if the submitted content is short.
If the submitted content is less than 100 characters and the language cannot be recognized, Calais will assume the language is English by default and will process the text for entities, events and facts. In addition, in such cases, Calais will return "Input Text Too Short" as the language code in the RDF.
Format
As described in the previous section, Calais supports four formats of content: TEXT/XML, TEXT/HTML, TEXT/TXT and TEXT/RAW.
TEXT/HTML: will apply cleansing of HTML tags and scripts, hence entity and event detection will be relative to the cleaned text. For optimal results it is recommended to use this contentType when submitting HTML content.
TEXT/XML: will apply the XML converter for escaping the necessary characters, hence entity and event detection will be relative to the cleaned text. For optimal results it is recommended to use this contentType when submitting XML content. The XML converter also supports the NewsML standard.
If the content is submitted as TEXT/XML, Calais will process the following XML sections:
| Document Section | Supported XML Tag Names |
| Document Title | TITLE, HEADLINE, HEADER |
| Document Body | BODY, DESCRIPTION, CONTENT |
| Document Date | DATE, DATETIME, DATEANDTIME, PUBDATE |
Document Title and Document Body should contain the content that will be processed by Calais.
Document Date matters because, once detected, it is used to resolve relative date mentions (e.g., "yesterday") that appear in Calais's events and facts. If Document Date is not provided, relative dates will be resolved against the current date.
Please make sure your XML content conforms to these tag names in order for the metadata to be extracted optimally.
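As a rough illustration, TEXT/XML content using the supported tag names could be assembled as below. This is a sketch under assumptions: the wrapper element name `DOCUMENT` and the date format in the usage note are hypothetical, since this page does not specify a required root element or date syntax.

```python
import xml.etree.ElementTree as ET


def build_xml_content(title, body, date_text):
    """Wrap a title, date and body in tag names that Calais recognizes."""
    doc = ET.Element("DOCUMENT")  # hypothetical wrapper element, not from the docs
    ET.SubElement(doc, "TITLE").text = title
    ET.SubElement(doc, "DATE").text = date_text
    ET.SubElement(doc, "BODY").text = body
    # ElementTree escapes "&" and "<" in the text while serializing.
    return ET.tostring(doc, encoding="unicode")
```

A call such as `build_xml_content("Quarterly results", "Acme Corp reported earnings yesterday.", "2008-05-01")` yields a string that would be submitted as `content` with contentType set to TEXT/XML.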
TEXT/TXT: will apply Calais' legacy text converter; entity and event detection will be relative to the cleaned text. You can use this contentType when submitting plain text.
TEXT/RAW (default): will not apply any conversion to the submitted content; entity and event detection (offset/length) will match the submitted content exactly. You can use this contentType when submitting plain text. Note that this is the only contentType option that works exactly on the submitted input content without modifying/cleansing it at all.
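Because TEXT/RAW offsets match the submitted content exactly, a detected mention can be recovered by slicing the original string. The sketch below assumes that offsets are zero-based character positions; this page does not state the indexing base, so verify it against a real response. The function name is hypothetical.

```python
def mention_text(content, offset, length):
    """Return the substring that an offset/length pair points at."""
    if offset < 0 or length < 0 or offset + length > len(content):
        raise ValueError("offset/length fall outside the submitted content")
    return content[offset:offset + length]
```

For the content "Acme Corp hired Jane Doe.", an offset of 16 and a length of 8 would select "Jane Doe" under the zero-based assumption.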
Security
Calais supports SSL security of traffic to and from Calais. GoDaddy is the authority for SSL certification. Simply use https:// instead of http://.
| Attachment | Size |
|---|---|
| HTMLform.zip | 343 bytes |
