The Calais Web Service Roadmap

Overview

We think we have a pretty good idea of what the big picture for 2008 looks like, and we'll share it here. But it's early days yet: we'll be listening carefully to the Calais community for ideas and suggestions and will modify our roadmap accordingly.

Calais/2008 in Four Movements

R1 – January 2008

R1 will allow users to submit text and receive back rich, semantically tagged content. During this period we will also sponsor a number of contests and bounties for applications developed using the Calais API. R1.1 and R1.2 will be released in February and March and will extend the number and types of entities, facts and events extracted. The R1 release will support English-language content and will work best on news, press releases, blog entries and other well-written prose. Future releases will incorporate specialty capabilities for patents, blogs, entertainment and sports news, scientific documents and financial filings.

R2 – April 2008

Calais R2 is a big step forward. In addition to the functionality of R1, R2 will provide users with a persistent GUID, allowing anyone with the GUID to call the Calais service and access the metadata generated for the original document. For example, an RSS reader may have only a snippet of the original article, but by using the Calais GUID the reader can filter, aggregate and present information based on the rich semantic content of the original document. R2.1 and R2.2 will focus on the normalization of extracted entities such as company names and the incorporation of selected industry-standard ontologies.

The second significant feature of R2 is support for user-generated metadata. When content is submitted to Calais for processing, the user can attach their own “bottom-up” metadata, which will be available to all downstream consumers via the Calais GUID.
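
To make the R2 workflow concrete, here is a minimal sketch in Python of how a downstream consumer such as an RSS reader might use a GUID to filter on metadata it never saw in the snippet. Everything in it is an assumption made for illustration: MetadataStore is an in-memory stand-in rather than the Calais service, and the Metadata fields, the submit and lookup methods and the filter_feed helper are hypothetical names, not part of any published Calais API.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field


@dataclass
class Metadata:
    """Hypothetical record shape; the real response format is not specified here."""
    entities: dict[str, list[str]]                       # e.g. {"Company": ["Acme Corp"]}
    user_tags: list[str] = field(default_factory=list)   # "bottom-up" metadata from the submitter


class MetadataStore:
    """In-memory stand-in for the service (an assumption, not the Calais API)."""

    def __init__(self) -> None:
        self._records: dict[str, Metadata] = {}

    def submit(self, entities: dict[str, list[str]], user_tags: list[str] | None = None) -> str:
        """Store metadata for a document and return a persistent GUID for it."""
        guid = str(uuid.uuid4())
        self._records[guid] = Metadata(dict(entities), list(user_tags or []))
        return guid

    def lookup(self, guid: str) -> Metadata:
        """Return the metadata stored under the given GUID."""
        return self._records[guid]


def filter_feed(items: list[dict], store: MetadataStore, entity_type: str, name: str) -> list[dict]:
    """Keep feed items whose full-document metadata mentions the named entity."""
    return [
        item for item in items
        if name in store.lookup(item["guid"]).entities.get(entity_type, [])
    ]


# A publisher submits two documents; the second carries no user tags.
store = MetadataStore()
guid_a = store.submit({"Company": ["Acme Corp"]}, user_tags=["merger"])
guid_b = store.submit({"Person": ["Jane Doe"]})

# The reader holds only snippets and GUIDs, yet can filter on the full metadata.
feed = [
    {"guid": guid_a, "snippet": "Shares rose after..."},
    {"guid": guid_b, "snippet": "In an interview..."},
]
company_items = filter_feed(feed, store, "Company", "Acme Corp")
```

In this sketch the reader never needs the original article text: the GUID alone is enough to reach both the extracted entities and the tags attached by the submitter.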

R3 – July 2008

Calais R3 begins our journey to incorporate a number of additional languages within Calais. Japanese, Spanish and French are on the roadmap for R3 through R3.2, with additional languages to follow.

R4 – September 2008

Calais R4 is the next big step: it will provide users with a development environment for creating new extraction capabilities unique to their needs. Want to analyze movie reviews to extract ratings? Automatically process the latest detailed weather forecasts from NOAA? This is where users will be able to create these capabilities and share them with the rest of the Calais community.

Iterative Releases

Throughout the year, we’ll be adding new entities, facts and events to the Calais extraction capability based on user input. Additionally, in the second half of the year we hope to begin an experimental project allowing the submission of image, audio or video content for automated analysis and processing.