OpenCalais Release 4.1 Available Today
- Introducing Social Tags – a knowledgebase-driven tagging solution
- Entity extraction now supported in English, French and Spanish
- Significant improvements to Linked Data depth and breadth
- Introducing the “Recession Pack” of topical fact and event extraction
Over the last several months we’ve been hard at work in the boiler room doing a fair amount of engineering work on OpenCalais. It’s not exciting work, but we’ve made significant improvements to the system’s reliability and scalability.
Now it’s time to release some user-visible enhancements. OpenCalais 4.1 is released today and Release 4.2 will follow in just two to three weeks. Unlike our last several releases, these will be rolled out as simple updates to the existing web service. If you’re not interested in any of the new features, no changes are required to your application.
Here’s what’s coming in Release 4.1 and 4.2:
Folksonomies, Ontologies, Vocabularies and Stuff (4.1)
OpenCalais is a great semantic data extraction engine. If you write an article about the relative merits of Porsche and BMW at the test track in Leipzig, we’ll diligently identify Porsche and BMW as companies and Leipzig as a geography. We’ll create Linked Data URIs to represent these things and open up access to the Linked Data ecosystem so you can enhance your article with other content assets.
But… sometimes you just want a great description. The kind of tags a human would put on the article. Like “Car racing” or “Automobiles”. The kind of tag that would, for example, be very searchable and therefore …. SEO’able (that definitely is not a word).
In 4.1 we’re introducing OpenCalais Social Tags. Social Tags is our attempt to emulate how a human might tag the document. Social Tags does some fairly sophisticated analysis of your entire document and maps it to a knowledgebase based on Wikipedia and other assets. From that process we generate Social Tags.
We’d suggest you experiment with using them for content tagging and navigation – we’d also really like to see some experimentation around using the Social Tags as keywords for ad placement and meta-tags for HTML pages. Sounds like an opportunity for SEO and improved ad placement to us.
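If you want to try the meta-tag idea, here is a minimal sketch of one way to turn a list of Social Tags into an HTML keywords meta tag. It assumes you have already pulled the tag names out of the OpenCalais response into a plain list of strings; the function name `keywords_meta_tag` and the `limit` parameter are our own illustrative choices, not part of the OpenCalais API.

```python
from html import escape


def keywords_meta_tag(social_tags, limit=10):
    """Build an HTML keywords meta tag from a list of tag names.

    `social_tags` is assumed to be a plain list of strings that you have
    already extracted from the service response (hypothetical input shape).
    """
    seen = set()
    unique = []
    for tag in social_tags:
        cleaned = " ".join(tag.split())  # collapse stray whitespace
        key = cleaned.casefold()         # compare tags case-insensitively
        if cleaned and key not in seen:
            seen.add(key)
            unique.append(cleaned)

    # Keep only the first `limit` tags and escape them for use in an attribute.
    content = ", ".join(unique[:limit])
    return f'<meta name="keywords" content="{escape(content, quote=True)}">'


if __name__ == "__main__":
    print(keywords_meta_tag(["Car racing", "Automobiles", "car racing"]))
```

Duplicates are dropped case-insensitively and the output is HTML-escaped, so a tag containing quotes or ampersands will not break the page markup.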
Because it’s a new approach, Social Tags is going to require ongoing refining and tuning. You can expect some strange results in its first few months out in the real world. When you see them, we’d love to hear about them in our Forum.
New Granularity in News Categories (4.2)
We’ve added a number of additional news categories. Our news topic categorization capabilities have been expanded and now map quite closely to the 17 top-level categories of the IPTC.
OpenCalais Entity Extraction in Spanish (4.2)
Beginning with Release 4.2, OpenCalais will support entity extraction in Spanish. The detailed list of supported entity types is covered in the release notes.
Linked Data Breadth, Depth and Access (4.2)
We’ve significantly upgraded the Linked Data URIs for Company. The content is refreshed more frequently, company competitors are cross-linked to the appropriate OpenCalais Company URI and links to new information sources such as CrunchBase are now included. If you’re investigating Linked Data – particularly around Companies – you’ll find the changes especially useful. With one call to OpenCalais and a few HTTP fetches you can build a complete picture of a company, its industry, its competitors, its location and many other items.
We’re also exposing our Linked Data endpoints in a new format: JSON. In addition to HTML and RDF you can now retrieve companies, geographies and any other Linked Data URI as JSON by appending .json to the URI or calling us with an appropriate caller type.
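As a rough sketch of the "append .json" approach, the snippet below builds the JSON form of a Linked Data URI and fetches it using only the Python standard library. The example URI on `example.org` is a placeholder, not a real OpenCalais URI, and the helper names `json_uri` and `fetch_linked_data` are ours. The structure of the returned JSON is not described here, so the code simply hands back whatever the endpoint returns.

```python
import json
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlopen


def json_uri(linked_data_uri):
    """Return the URI with '.json' appended to its path.

    Query string and fragment, if any, are preserved.
    """
    parts = urlsplit(linked_data_uri)
    path = parts.path.rstrip("/")
    if not path.endswith(".json"):
        path += ".json"
    return urlunsplit(
        (parts.scheme, parts.netloc, path, parts.query, parts.fragment)
    )


def fetch_linked_data(linked_data_uri, timeout=10.0):
    """Fetch the JSON representation of a Linked Data URI and parse it."""
    with urlopen(json_uri(linked_data_uri), timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Placeholder URI for illustration only; substitute a URI returned
    # by the service for one of your own documents.
    print(json_uri("http://example.org/linked-data/company/1234"))
```

To go from "one call and a few HTTP fetches" to a company profile, you would call `fetch_linked_data` on the Company URI and then on whichever linked URIs you find in the result.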
Opt-In Publishing of Document Metadata URIs (4.1)
By default OpenCalais stores all document-level metadata as a Linked Data URI and makes it accessible via a (secret) identifier. This is useful if you want to share document metadata with someone else by providing them with a list of URIs rather than a massive file. Beginning with Release 4.1 this will shift from our default behavior to an opt-in function. No reason to make many millions of document-level URIs accessible if you don’t plan to share them later.
New and Enhanced Events and Facts (4.1)
Given the current…. environment many of our new events and facts focus on company performance and actions. We’ve added a wide range of event types including company accounting changes, labor issues, layoffs, earnings restatements, delayed filings and quite a few others. Go wild. We’re calling it the OpenCalais Recession Pack – and we hope it proves quite useless in the near future.
Yes, it’s true. We have had some bugs. Release 4.1 improves accuracy and reliability of a wide range of extractions and addresses a few specific processing errors we’ve discovered.
Release 4.1 is a small but important milestone for OpenCalais. Over the past 18 months we’ve dramatically expanded the range and improved the quality of our entity and event extraction capabilities. We’ve achieved our goal of providing the highest-quality entity extraction toolkit available and done it at a scale that’s surprised even us. In addition to entity extraction we’ve invested in the heavy lifting necessary for sophisticated fact and event extraction – which sets the stage for a whole new class of significantly more sophisticated semantically enabled applications.
With the release of Social Tags we are taking this capability into a whole new arena. While our current Social Tags database is based on Wikipedia and other public assets, we don’t plan to stop there. Social Tags is a general-purpose solution that can apply to any knowledge domain. As we move forward we’ll be investigating opportunities to provide tagging in areas ranging from economics to environment to politics. Social Tags is just the first step down an exciting path.