
Many thanks to all who participated in testing Calais 3.1.  Thanks to your help and your feedback, 3.1 is now live.

The new release includes several important new capabilities, including company and geographic disambiguation -- the ability to resolve textual ambiguities around company names (e.g. IBM vs. IBM Corporation vs. International Business Machines) and geographies (e.g. Calais, Maine vs. Calais, France).

Company Disambiguation: Calais 3.1's company disambiguation capability draws on a reference database of tens of millions of company names and their variations. This database is primarily focused on public companies, but is being expanded on an ongoing basis to cover a broader range of companies. In addition to cross-referencing variations on the name, Calais 3.1 uses hints found in the text, such as location or industry, as evidence.
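To give a feel for what cross-referencing name variations means, here is a minimal sketch in Python. It is not how Calais works internally: the `ALIASES` table, `normalize` and `resolve_company` are all hypothetical names made up for illustration, and a three-entry dictionary stands in for the real reference database.

```python
# Hypothetical alias table: a tiny stand-in for a real reference database.
ALIASES = {
    "ibm": "International Business Machines",
    "ibm corporation": "International Business Machines",
    "international business machines": "International Business Machines",
}


def normalize(name: str) -> str:
    """Lowercase the name, drop punctuation, and collapse whitespace."""
    cleaned = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(cleaned.split())


def resolve_company(name: str):
    """Return the canonical company name for a known variation, else None."""
    return ALIASES.get(normalize(name))


for mention in ["IBM", "I.B.M.", "IBM Corporation", "Acme Widgets"]:
    print(mention, "->", resolve_company(mention))
```

A lookup like this only covers the name-variation half of the problem; using location or industry hints from the text would need additional scoring on top.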

Geographic Disambiguation: Calais 3.1 uses elements of Metaweb's Freebase and other public data assets to determine which town, city, state and/or country a given document is referring to. As with company disambiguation, this capability uses hints in the surrounding text to refine results. Calais 3.1 also returns geographic coordinates, which can help jump-start developers working on mapping applications.
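The idea of using hints in the surrounding text can be sketched as follows. This is a toy illustration, not the Calais algorithm: the `CANDIDATES` gazetteer, its hint words and the `disambiguate` function are assumptions invented for this example, and the coordinates are approximate.

```python
# Hypothetical mini-gazetteer; hint words are made up, coordinates approximate.
CANDIDATES = [
    {"name": "Calais, France", "hints": {"france", "french", "channel", "dover"},
     "lat": 50.95, "lon": 1.86},
    {"name": "Calais, Maine", "hints": {"maine", "canada", "brunswick"},
     "lat": 45.19, "lon": -67.28},
]


def disambiguate(text, candidates=CANDIDATES):
    """Pick the candidate whose hint words overlap most with the text."""
    words = {w.strip(".,;:!?()\"'").lower() for w in text.split()}
    scored = [(len(c["hints"] & words), c) for c in candidates]
    best_score, best = max(scored, key=lambda pair: pair[0])
    # With no supporting hint at all, decline to guess.
    return best if best_score > 0 else None


place = disambiguate("The ferry from Dover arrived in Calais this morning.")
if place is not None:
    print(place["name"], place["lat"], place["lon"])
```

Returning the coordinates along with the resolved name is what makes the result directly usable for plotting on a map.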

Calais 3.1 also features the following improvements:

Increased efficiency and scalability: 

  • In response to demand from large publishers and other high-volume users, Calais 3.1 gives users the option to receive just their semantic metadata in return (i.e. without the redundant copy of their original text). This reduces bandwidth utilization.  Note that Calais does not retain the original text.
  • Calais 3.1 offers support for HTTP traffic compression, which can dramatically reduce the size of a content transaction and similarly reduce bandwidth consumption.
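To see why compression matters for metadata-heavy responses, here is a small self-contained sketch using Python's standard `gzip` module. The payload is made-up data standing in for a response, and gzip is assumed here purely for illustration; this post does not specify which encoding the service uses or how a client should request it.

```python
import gzip
import json

# Made-up, repetitive metadata payload standing in for a service response.
payload = json.dumps(
    [{"type": "Company", "name": "IBM", "offset": i} for i in range(200)]
).encode("utf-8")

compressed = gzip.compress(payload)
ratio = len(compressed) / len(payload)

# Decompression restores the original bytes exactly.
restored = gzip.decompress(compressed)

print(f"original: {len(payload)} bytes, compressed: {len(compressed)} bytes")
print(f"compressed size is {ratio:.0%} of the original")
```

Structured metadata tends to repeat the same field names over and over, which is exactly the kind of content that compresses well. In general HTTP usage, a client signals that it accepts compressed responses with an `Accept-Encoding` request header.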

New output formats and integrations:

Additional semantic entities:

  • New elements in the Calais 3.1 vocabulary include PatentFiling, PatentIssuance, FDAPhase, PersonEmailAddress and PersonEmployment, as well as new elements for PersonAttributes and SecondaryIssuance.
  • Of particular interest is PersonRelation, an entity that extracts references to symmetric relationships between people in the areas of business, academics, military service or politics. It can even detect friendships, marital status, etc.

All of the new entities & improvements in the Calais 3.0 release, including:

  • Exhaustive Extraction / Generic Relations, a capability that exposes all of the relationships in a document, provided that one of the entities in the relationship is a known entity type -- even if the relationship type has not been predefined.
  • Document-level Categorization, a capability that places a given document into one of a number of news-related categories, including Business, Sports, Entertainment, Health, Politics and Technology.
  • A REST interface that simplifies access to the service from a variety of environments.

Keep the feedback coming and thanks again for your help.
