Calais 4.0 has arrived!

KristaThomas | January 14, 2009

On the one-year anniversary of our launch, we are extremely pleased to announce the debut of Calais 4.0. With more than 9,000 of you processing more than 1 million documents per day, it was time to take Calais to the next level.

Effective today, Calais 4.0 goes beyond metatagging to help you automatically integrate your content with Linked Data assets from Wikipedia, DBpedia, GeoNames, the Internet Movie Database (IMDB), and more.

It also introduces a global metadata transport layer that makes it easy for you to share rich semantic metadata with content consumers such as search engines, news aggregators, and 'related stories' recommendation services in order to reach downstream readers.

Calais 4.0 in practice:

  1. With Calais 4.0, each document - and every significant semantic element within that document - is assigned a unique identifier (a 'uniform resource identifier', or URI). These identifiers are returned to the content owner along with the rest of the metadata that Calais discovered.
  2. Unlike simple "tagging" solutions, the rich semantic metadata Calais returns can be used to enhance publishers' content for improved search, navigation, ad placement and syndication. Each of the semantic elements Calais 4.0 discovers also provides a key to unlocking additional content assets in the Linked Data ecosystem. 
  3. Finally, the unique document and entity identifiers returned by Calais can be shared with content partners and content consumers like search engines, etc. to enable the seamless transfer of not just content, but of the underlying meaning and relevance of that content.
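
As a rough sketch of how a client might collect those identifiers, the snippet below parses a small RDF/XML payload with Python's standard library and lists every `rdf:about` URI it finds. The sample payload, the `example.org` URIs and the `extract_uris` helper are illustrative assumptions, not actual Calais output.

```python
import xml.etree.ElementTree as ET

# Standard RDF namespace; rdf:about carries the URI of each described resource.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical payload: structure and URIs are invented for illustration only.
SAMPLE_RDF = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/schema#">
  <rdf:Description rdf:about="http://example.org/doc/1">
    <ex:title>Sample article</ex:title>
  </rdf:Description>
  <rdf:Description rdf:about="http://example.org/entity/company/ibm">
    <ex:name>IBM</ex:name>
  </rdf:Description>
</rdf:RDF>
"""

def extract_uris(rdf_xml):
    """Return every rdf:about URI in the payload, in document order."""
    root = ET.fromstring(rdf_xml.strip())
    about = "{%s}about" % RDF_NS
    return [el.attrib[about] for el in root.iter() if about in el.attrib]

if __name__ == "__main__":
    for uri in extract_uris(SAMPLE_RDF):
        print(uri)
```

Storing these URIs alongside your own content records is one way to keep the document and entity identifiers available for later sharing with partners.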

To see Calais 4.0 in action, use the Calais Viewer Technology Preview tool. To see an example of a Linked Data entity, see the URI for IBM.

With this release, we are also publishing the Calais schema in RDFS. This will enable you to access a growing toolkit of schema-aware tools to work with Calais' metadata output.

Finally, in keeping with our commitment to 'connect everything,' additional advances in version 4.0 include:

  • Entity identification in French: the first step in an aggressive plan, continuing throughout 2009, to incorporate the world's major languages.
  • Significant enhancements to the semantic metadata generation capabilities in the areas of product identification, competitive intelligence and judicial events.
  • Significant enhancements to automated document-level categorization in the areas of recreation, environment, weather and legal.

How to get started:

As is our practice, Calais 4.0 will be in technology preview for a period of two months, so that you can do testing and provide feedback on any issues you may encounter. To get started: 

  • Use the same Calais API key that you use today when submitting documents to
  • Use the Web address instead of
  • We strongly recommend that you follow the attached RDFS documents when parsing the RDF output files. The RDFS will help resolve any changes in the RDF files.
  • After a few weeks of evaluating the R4 Technology Preview, we recommend that you move to the R4 production candidate release found at

If you encounter any problems, you can revert to the Calais 3.1 service by using the Web address instead of

Please share your feedback on Calais 4.0 in the R4 forum (adding link this afternoon).

See you in the Linked Data cloud!
-The Calais team


Schema Links

Try this page on Calais 4.0, when we originally released the Schema:

In the links below:

Here is the Calais Ontology in OWL:

The OpenCalais initiative



Just remembered another thing: it would be good if we could have a podcast of the launch and a live channel, maybe Twitter updates! I am looking forward to the development.

nice post

I am looking to use this system on my site too. It looks great, and the tagging would definitely enhance my site. Thanks for the post, cheers.

The RDF schema in the PDF for the release notes, but I haven't found a link to the schema on the site. Could you point me in the right direction? Thanks.

It is a great fact that it introduces a global metadata transport layer that makes it easy for you to share rich semantic metadata. It is very helpful for users.

Have to agree it's useful

Especially with words and phrases being added increasingly these days in CMS systems, one might want to consider these. 

-Gerals, API zone

Looking good! Great site!


It sounds like you guys worked extremely hard on this project, and all the effort has paid huge dividends. 

When are other languages coming on board?

I wish you the very best for the future.

Very useful. Is there currently an OpenCalais extension for Joomla, something similar to the Drupal module that's in the pipeline?

great info

This is exactly what I need for one of my SEO clients. Thank you.


Any chance there's an available webcast for that one? Would have loved to see the presentation in NYC. I was too busy attending a conference on status production in Toronto - interesting stuff too.

I saw a presentation of Calais at the NY Semantic Meetup last night; very impressive. Seems that their team has been hard at work on some very interesting items. And having Thomson Reuters behind is tremendous. Great to see continued innovation from the Calais team. We evaluated Calais a few months ago. Extraction accuracy was okay, but didn't meet the reqs for our use case by that time. The Linked Data support makes it a lot more attractive now.

great work

Great release with GREAT upgrades.
I hope Italian recognition is going to come soon!

Categorization for French?

Hi Calais team,

I've tried French and it works, but is there any categorization for the French language? The new API always returns "None" as a category when sending a French extract.



Unable to reproduce your results

I see the example has a page for IBM. I have a few questions. What was the input that was used to create the webpage? I would like to see the RDFS that was produced. I assume that some other code parsed the output, or was the page a direct output option from Calais?


The page you're linking to is a URI - our Linked Data endpoint for the company entity IBM.

These types of pages (for a wide range of entities) are produced by Calais when we detect the entity in the text you send us. If, for example, you sent a news article mentioning IBM (and this was the first time we'd seen IBM), we would create this Linked Data page.

The page follows the Linked Data (see Wikipedia for a good overview) standard. If you retrieve The system will return HTML if you are calling from a browser and RDF if you are calling with a type of RDF. If you want to force it into one or the other you can retrieve or

Hope that answers the question.
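
For readers who want to try the HTML-versus-RDF behavior described in this reply, here is a minimal sketch of HTTP content negotiation using Python's standard library. It assumes that "calling with a type of RDF" means sending an `Accept: application/rdf+xml` header; the exact media types the Calais endpoint honors are not stated here. `ENTITY_URI` is a hypothetical placeholder, since the actual URI is not given in the text, and `build_request` and `fetch` are illustrative helper names.

```python
import urllib.request

# Hypothetical placeholder: substitute the real Linked Data URI of the entity.
ENTITY_URI = "http://example.org/entity/ibm"

def build_request(uri, want_rdf):
    """Build a GET request whose Accept header asks for RDF/XML or HTML."""
    accept = "application/rdf+xml" if want_rdf else "text/html"
    return urllib.request.Request(uri, headers={"Accept": accept})

def fetch(uri, want_rdf=True):
    """Send the request and return the raw response body (needs network access)."""
    with urllib.request.urlopen(build_request(uri, want_rdf), timeout=10) as resp:
        return resp.read()

if __name__ == "__main__":
    req = build_request(ENTITY_URI, want_rdf=True)
    print(req.full_url, req.get_header("Accept"))
```

A browser sends an HTML-oriented `Accept` header by default, which would explain why the same URI shows a web page there and RDF to a client that asks for it.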


RDF Schema?

Hi -
I saw the embedded icon for the RDF schema in the PDF for the release notes, but I haven't found a link to the schema on the site.  Could you point me in the right direction?

Seconding this...

We have also struggled finding the RDF schema - is it just not published yet?

release notes?

I can't "find" the release notes that are linked to at the bottom on the page, neither on my mac nor pc. Anything wrong there?
Also - what about the spec for the schema - is that published yet?
thanks for all the nice work!

release notes

Hi Bob -- sorry for the bum link.  Try the release notes link in Tom's blog post here:

-Krista
The OpenCalais initiative