Calais R3 Now Available
We completed our two-week Technology Preview of Calais Release 3, and it's now fully available to users. The details are located here.
This is a long post, so I’ll highlight the significant changes right here:
- Many new entities and events
- A REST interface to the Calais web service
- Document-level categorization into standard news categories
- Exhaustive extraction
- A variety of miscellaneous bug fixes
- Higher performance
Some details on R3:
What’s in R3?
First, as with every release, we are expanding and enhancing the universe of entities and relationships extracted by Calais. The details are located in the R3 Forum, but here are a few highlights:
- New entities include Sports League, Programming Language, Operating System, Medical Treatment and Company Ticker
- New events include Movie Releases, Album Releases and a variety of business-related items such as Bonus Shares Issuances, Types of Business Relationship and others.
Second, after many requests, we have implemented a REST interface to Calais. This should simplify access to the service from a variety of environments.
Third, we have added a new document categorization capability.
Categorization examines your text and attempts to place the document as a whole into one of a number of news-related categories. This capability will be significantly expanded in the future, but it should provide immediate benefit to anyone aggregating news content today. The initial categories supported are Business, Sports, Entertainment, Health, Politics and Technology.
Fourth, and depending on what you’re using Calais for this could be a big deal, R3 includes a Generic Relations capability. Generic Relations exposes every relationship in your document in which at least one member is a known entity type, even if the relationship type hasn’t been predefined. This is sometimes called Exhaustive Extraction. The capability is designed for semantic processing experts who know what they are doing: the volume of output can be quite large, but so is the potential for in-depth information discovery.
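To make the "at least one known entity" rule concrete, here is a minimal sketch in Python. It is not the Calais implementation or its output format; the `Mention` and `Relation` types, the `KNOWN_ENTITY_TYPES` set and the `generic_relations` function are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumption: a small, made-up set of entity types the extractor recognizes.
KNOWN_ENTITY_TYPES = {"Company", "Person", "City"}


@dataclass(frozen=True)
class Mention:
    text: str
    entity_type: Optional[str] = None  # None when the phrase is not a known entity


@dataclass(frozen=True)
class Relation:
    subject: Mention
    verb: str
    obj: Mention


def is_known(mention: Mention) -> bool:
    """True when the mention carries one of the known entity types."""
    return mention.entity_type in KNOWN_ENTITY_TYPES


def generic_relations(candidates: List[Relation]) -> List[Relation]:
    """Keep every candidate relation with at least one known-entity member.

    The verb is never checked against a predefined list, which is what
    makes the extraction "exhaustive".
    """
    return [r for r in candidates if is_known(r.subject) or is_known(r.obj)]
```

Because the verb is unconstrained, a filter like this keeps far more relations than a fixed event schema would, which is why the output volume can grow quickly.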
And finally, we’ve done our best to solve the extraction-related issues that have been reported to us. We can’t promise 100%, but you should see significant improvement.
What’s Coming?
Let’s limit ourselves to the very short term – things you can expect to see in the next month or less.
- Company Disambiguation. This is a big deal and the first step toward richer entity disambiguation throughout Calais. With company disambiguation we will use everything from the name of the company to the names of people to the geographies mentioned to return a single authoritative name for the company. A simple example: “IBM”, “International Business Machines”, “IBM Professional Services” will all be detected as companies – and will all be linked back to a single definitive reference for “IBM”.
- Geo Disambiguation. The same effort, applied to geographies. No longer will we be confused about whether we are talking about Paris, TX or Paris, France.
- A super secret skunkworks project. Just think of it as putting a semantic layer on top of the web. The whole web. Right now.
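As a rough illustration of the company disambiguation idea described above, the sketch below maps several surface forms to one canonical name and uses surrounding context (for example, geographies mentioned in the text) to break ties. This is a toy under stated assumptions, not how Calais does it: `CompanyRecord`, `RECORDS`, `resolve` and the two "Acme" entries are hypothetical and exist only for the example.

```python
from dataclasses import dataclass, field
from typing import Iterable, Optional, Set


@dataclass
class CompanyRecord:
    canonical: str
    aliases: Set[str]                                       # lowercase surface forms
    context_hints: Set[str] = field(default_factory=set)    # lowercase context terms


# Assumption: a tiny hand-written reference table. The Acme rows are invented.
RECORDS = [
    CompanyRecord(
        "IBM",
        {"ibm", "international business machines", "ibm professional services"},
        {"armonk"},
    ),
    CompanyRecord("Acme Corp", {"acme"}, {"paris"}),
    CompanyRecord("Acme Ltd", {"acme"}, {"london"}),
]


def resolve(mention: str, context_terms: Iterable[str] = ()) -> Optional[str]:
    """Return a single canonical name for a company mention, or None.

    Candidates are found by alias; ties are broken by how many context
    terms from the document overlap with each candidate's hints.
    """
    key = " ".join(mention.lower().split())  # normalize case and whitespace
    context = {term.lower() for term in context_terms}
    matches = [r for r in RECORDS if key in r.aliases]
    if not matches:
        return None
    best = max(matches, key=lambda r: len(r.context_hints & context))
    return best.canonical
```

With this table, `resolve("International Business Machines")` and `resolve("IBM Professional Services")` both come back as `"IBM"`, while `resolve("Acme", ["London"])` picks `"Acme Ltd"` over the other candidate. The same pattern would apply to geographies such as Paris, TX versus Paris, France.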





