Calais Yahoo! Pipes Web Service - Overview and Operation
Introduction
The OpenCalais Web service analyzes submitted text and returns rich semantic data about it. With Yahoo! Pipes, users can easily create custom RSS feeds. The Calais Pipes Service allows Yahoo! Pipes users to enrich their custom RSS feeds with semantic metadata.
Users have two options for using the Calais Pipes Service:
- Install and configure the Pipes Service on your own server.
- Select the "Web Service" Pipe template from Yahoo! Pipes and paste the link below. Replace "[Your Calais API Key]" with a valid Calais API key, and be sure to remove the square brackets.
http://pipes.opencalais.com/CalaisPipes/CalaisPipes?licenseID=[Your Calais API Key]&richLinks=true
This document describes the Calais Pipes Service and offers guidance on its operation and optimization, regardless of the option you select. Companion documentation provides instruction if you choose to install and run it on your own server.
About Calais
The OpenCalais Web service analyzes text and provides semantic data such as:
- Names of persons mentioned in the text
- Names of companies mentioned in the text
- Events described in the text such as bankruptcies and mergers
The results of the analysis can later be used by humans or machines to evaluate the text. For example, a person interested in news stories relating to a specific merger might find the OpenCalais semantic analysis useful in determining the relevance of a specific article. More information about Calais can be found at http://www.opencalais.com.
About Yahoo! Pipes
Yahoo! Pipes allows users to create custom RSS feeds by combining information from various sources. For example, a user can combine two different RSS feeds into a single feed, or filter out specific items from an existing feed.
The customized feeds are created using the Yahoo! Pipes editor and are free to use for anyone with a Yahoo! ID. It is possible to extend Yahoo! Pipes functionality by providing Web services that introduce functionality not available in the editor.
For more on Yahoo! Pipes visit http://pipes.yahoo.com.
Calais Yahoo! Pipes
Calais Yahoo! Pipes Web Service extends Yahoo! Pipes' functionality by allowing the following:
- Add brief semantic data to feed items' descriptions in human readable form
- Redirect feed items' links to the Calais Semantic Proxy, where the original item text will be presented with full semantic data information
For example, consider the following RSS feed item from CNN as it is presented in Google Reader:
Using Calais Pipes Service the item will be changed to the following:

Clicking on the item will lead either to the original story on CNN.com or to an enriched version of it on the Calais Semantic Proxy. Which of the two happens is decided by the pipe creator.
Technical Background
Understanding the technical background of the Calais Pipes Service is important, so that you may make full use of its features and understand its limitations.
Service Operation
The Calais Pipes Service is invoked from within a Yahoo! Pipe. When called, it receives a list of RSS feed items, typically news stories. Each item has a link to the full story, a title, and a description. The title and description are usually displayed in the news reader for the user to examine. The link is followed when the user clicks on the item in the news reader.
When Calais Pipes Service receives the list of items it does the following for each item:
- Retrieve the original page by following the provided item's link
- Call the OpenCalais Web service with the page contents and retrieve semantic data
- Append to the item's description a short list of entities and events identified by OpenCalais (the chosen items are those that appeared most frequently in the text)
- Optionally replace the item's link with a link to the Semantic Proxy for a visual presentation of the semantic data
The result is then sent back to the calling Yahoo! Pipe.
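The per-item steps above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the service's actual implementation: `fetch_page`, `analyze`, and `proxy_link` are hypothetical callables supplied by the caller, items are assumed to be plain dictionaries, and the appended text only approximates the format the service produces.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional, Tuple

# (entity or event type, name), e.g. ("Person", "Kobe Bryant")
Mention = Tuple[str, str]


def summarize(mentions: List[Mention], limit: int = 5) -> str:
    """Build a short summary of the most frequently mentioned items."""
    parts = []
    for (kind, name), n in Counter(mentions).most_common(limit):
        times = "once" if n == 1 else f"{n} times"
        parts.append(f"{kind} - {name} [{times}]")
    return "(OpenCalais found: " + ", ".join(parts) + ")"


def enrich_item(
    item: Dict[str, str],
    fetch_page: Callable[[str], str],          # hypothetical: URL -> page text
    analyze: Callable[[str], List[Mention]],   # hypothetical: text -> mentions
    proxy_link: Optional[Callable[[str], str]] = None,  # hypothetical
) -> Dict[str, str]:
    """Apply the steps listed above to one feed item."""
    text = fetch_page(item["link"])             # retrieve the original page
    mentions = analyze(text)                    # obtain semantic data
    enriched = dict(item)                       # leave the input untouched
    enriched["description"] = item["description"] + " " + summarize(mentions)
    if proxy_link is not None:                  # optional link replacement
        enriched["link"] = proxy_link(item["link"])
    return enriched
```

Passing the fetching and analysis steps in as callables keeps the sketch free of any guessed network or API details.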
Benefits
The resulting information is based not only on the data sent from the pipe but on the full text of the story that each item represents. This means that the new item description, after the service has processed the data, can include information that was not available before.
The pipe creator can then make use of the full range of the Yahoo! Pipes editor's features on the new data. For example, suppose you want to give special attention to items referencing Washington State. The word Washington may not have appeared at all in the item's title or in the short description. It may have appeared in the description referring to the city - Washington DC - or as a person's last name.
If you place a filter (a module available in the Yahoo! Pipes editor) before the call to Calais you may miss items you intended your filter to get. Or you may find that your filter handles irrelevant items. If you place the filter after the Calais call, you can make the filter more accurate and your Yahoo! Pipes more relevant.
Limitations
The benefits come at a price. The process of fetching the original story text from the Web and then sending it over to Calais for processing consumes time and bandwidth.
As a result, the retrieval of an RSS feed that contains many items may take as long as 60 seconds, sometimes more.
Keep this in mind when you design your pipe. It might make more sense to call Calais on fewer items than the entire feed.
Using Calais Pipes Service
This section describes how to use the Calais Pipes Service. It assumes you have spent some time creating pipes with the Yahoo! Pipes editor.
The Web Service Module
To invoke the Calais Pipes Service from within a Yahoo! Pipe you will need to use the Web service module. This module appears in the Operators section of the editor. It looks like this:
To invoke the Calais Pipes Service enter the following URL into the top edit box:
http://pipes.opencalais.com/CalaisPipes/CalaisPipes?licenseID=[Your Calais API Key]
You need to replace "[Your Calais API Key]" with a valid Calais API key, and be sure to remove the square brackets. To obtain an API key see http://www.opencalais.com.
The value of the bottom edit box depends on previous options you used in the pipe. Usually you should just enter the word 'items' in this box. The Calais Pipes Service doesn't change the general structure of the feed (and therefore the path to the item list). If it was changed before, enter the changed value instead.
Connections
The new module needs to be connected to other elements in your pipe. At its entry point (the top) it expects a list of feed items, such as what's returned by the Fetch Feed source. The output of the module is a list of feed items, and can be connected to any element in the pipe that can process such a list (e.g., Pipe Output for presentation of the results).
A resulting pipe may look like this:
In this example the above pipe would attach semantic data to the description of each item (in this case from the CNN World News RSS feed). The links of each item will remain the same, so clicking on such an item within a news reader will lead to the original story.
If you wish to replace links to point to the Calais Semantic Proxy, so that clicking on the item will lead to the original text with rich semantic data attached, add '&richLinks=true' at the end of the Web service URL. The URL should now look like this:
http://pipes.opencalais.com/CalaisPipes/CalaisPipes?licenseID=[Your Calais API Key]&richLinks=true
You need to replace "[Your Calais API Key]" with a valid Calais API key, and be sure to remove the square brackets.
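If you generate the Web service URL programmatically, a small helper can reduce the risk of leaving the placeholder or its square brackets in place. The following is a minimal sketch; the function name `calais_pipes_url` and its parameters are illustrative and not part of the service.

```python
from urllib.parse import urlencode

BASE_URL = "http://pipes.opencalais.com/CalaisPipes/CalaisPipes"


def calais_pipes_url(api_key: str, rich_links: bool = False) -> str:
    """Build the Web service URL for the given Calais API key."""
    # Reject an empty key or one that still carries the placeholder brackets.
    if not api_key or "[" in api_key or "]" in api_key:
        raise ValueError("Supply a valid Calais API key without square brackets")
    params = {"licenseID": api_key}
    if rich_links:
        # Redirect item links to the Calais Semantic Proxy.
        params["richLinks"] = "true"
    return BASE_URL + "?" + urlencode(params)
```

For example, `calais_pipes_url("abc123", rich_links=True)` yields a URL ending in `?licenseID=abc123&richLinks=true`, where `abc123` stands in for a real key.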
Optimization
The pipe constructed in the previous section may take a long time to execute, as the number of items in the feed can be large. You can use the Sort and Truncate operators to speed things up. Attach the Fetch Feed element to a Sort element. Sort by item.pubdate in descending order to get the newest items. Connect the Sort element to a Truncate element which removes any item after the 5th. Connect the Truncate element to the Web service element.
Your new pipe should look like this:
This means that your new feed will return the five newest items in the feed, after processing by Calais. Note that this does not necessarily mean that subscribers to your pipe will miss any items. This depends on how often the feed you entered in the Fetch Feed is updated, and on the implementation of particular news readers. Google Reader, for example, will keep record of items published on your pipe and accumulate them over time.
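For readers who think in code, the effect of the Sort and Truncate elements can be approximated in Python. This is only a sketch of the idea, not how Yahoo! Pipes works internally; it assumes each item is a dictionary whose `pubdate` value is a `datetime`.

```python
from datetime import datetime
from typing import Any, Dict, List


def newest_items(items: List[Dict[str, Any]], limit: int = 5) -> List[Dict[str, Any]]:
    """Sort by publication date, newest first, and keep the first `limit` items."""
    ordered = sorted(items, key=lambda item: item["pubdate"], reverse=True)
    return ordered[:limit]


if __name__ == "__main__":
    feed = [
        {"title": f"Story {day}", "pubdate": datetime(2008, 5, day)}
        for day in range(1, 8)
    ]
    # Only these five items would be passed on to the Web service element.
    for item in newest_items(feed):
        print(item["title"], item["pubdate"].date())
```

Truncating before the Web service call is what saves time, because the service fetches and analyzes one page per item it receives.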
Publication
The final step is to publish your pipe. Use the Properties button in the Yahoo! Pipes editor and click Publish. You and your audience can now access the custom RSS feed.
Go to My Pipes and click View Results on the new pipe. You should see a list of items with the Calais information appended to the description of each item. Click More Options and then Get as RSS. This is the link that should be added to news readers in order to use the new pipe.
Advanced Uses
The pipes above show the most basic integration of the Calais Pipes Service into Yahoo! Pipes. Naturally, you can use the flexibility of Yahoo! Pipes (and perhaps call other Web services as well) to construct your own pipes.
This section provides some useful examples, but they cover only the tip of the iceberg.
Post Calais Filtering
One advanced use is to introduce a filter after the Calais Pipes Service call in the pipe. In this example you would connect the output of the Web service module in your pipe to a Filter element.
It makes sense to filter based on the description, since this is the part changed by the Calais Pipes Service. If you want to filter based on other elements, it is better to do it before the Web service module, so that your pipe runs faster.
Suppose you are interested in a particular athlete and want your pipe to include stories that mention this athlete. There may be stories that do not mention the athlete in the title or within the short description available in the RSS feed. You can construct a pipe similar to the ones presented earlier, and add a filter element in the following way:
By entering 'Person - Kobe Bryant' in the filter rule, you match the syntax of the information appended by the Calais Pipes Service. The filter then selects items based on the semantic data rather than on the original title or description alone.
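The logic of such a filter rule can be expressed in Python as a simple substring test on the enriched description. This is a sketch of the idea only; the function names and the dictionary-based item layout are assumptions made for illustration.

```python
from typing import Dict, List


def mentions_person(item: Dict[str, str], name: str) -> bool:
    """True if the appended Calais data lists the given person."""
    return f"Person - {name}" in item.get("description", "")


def keep_stories_about(items: List[Dict[str, str]], name: str) -> List[Dict[str, str]]:
    """Permit only items whose description mentions the person."""
    return [item for item in items if mentions_person(item, name)]
```

Matching on 'Person - Kobe Bryant' rather than on the bare name avoids hits where the same words appear as a different entity type.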
Regular Expressions
Using regular expressions you can modify the appended information in items' descriptions. The format of the appended information does not change, and therefore can be easily manipulated with regular expressions. The format is:
(OpenCalais found: <Entity1> - <Name1> [N times], <Entity2> - <Name2> [N times], ...,
<Entity5> - <Name5> [N times], Event - <Event1> [N times], Event - <Event2> [N times], ...)
Note that 'N times' is replaced by 'once' when N is 1. There may also be fewer than five entities, no entities at all, no events, and so on.
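To illustrate how regular this format is, the following Python sketch parses the appended text into (type, name, count) tuples. It assumes the square brackets around the count are literal characters in the output, which the format above suggests but does not state outright; the function name is illustrative.

```python
import re
from typing import List, Tuple

# The whole appended block: "(OpenCalais found: ... )"
FOUND_RE = re.compile(r"\(OpenCalais found: (.*)\)")
# One entry: "<Type> - <Name> [once]" or "<Type> - <Name> [N times]"
ENTRY_RE = re.compile(r"(?:^|, )([^,\[\]]+?) - (.+?) \[(once|(\d+) times)\]")


def parse_calais_summary(description: str) -> List[Tuple[str, str, int]]:
    """Return (type, name, count) for each entry; empty list if none found."""
    match = FOUND_RE.search(description)
    if match is None:
        return []
    entries = []
    for kind, name, times, number in ENTRY_RE.findall(match.group(1)):
        count = 1 if times == "once" else int(number)
        entries.append((kind, name, count))
    return entries
```

A description without the appended block, or with an empty one, simply yields an empty list.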
Using the Yahoo! Pipes editor Regex module you can change the way information is presented to your feed user, and remove information. For example, suppose your custom feed users would not benefit from information about persons appearing in the text. However, you would like them to see information about other entities, such as Companies, Countries, etc.
Attaching the following Regex module after the Calais Pipes Service module in your pipe will remove any persons from the appended text:
Note that the 'with' edit box in the first rule is empty. The word 'text' shown in italics is added by the editor itself.
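The same idea can be tried outside the editor. The Python sketch below removes person entries with one substitution and tidies a leftover separator with a second. The patterns are assumptions derived from the format described above (including literal square brackets around the count) and are not necessarily the rules used in the Regex module.

```python
import re

# Rule 1: drop each "Person - <Name> [...]" entry and a following ", " if any.
PERSON_RE = re.compile(r"Person - .+? \[(?:once|\d+ times)\](?:, )?")
# Rule 2: remove a dangling ", " left when the last entry was a person.
TRAILING_RE = re.compile(r", \)")


def remove_persons(description: str) -> str:
    """Strip person entries from the text appended by the Calais Pipes Service."""
    without_persons = PERSON_RE.sub("", description)
    return TRAILING_RE.sub(")", without_persons)
```

Other entity types, such as companies and countries, are left untouched because the first pattern only starts matching at the literal text 'Person - '.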
Start Exploring...
To get started quickly you can use the Calais Sample Pipe. This pipe is publicly available to all Yahoo! Pipes users. You can clone it, put in your Calais API key, and start using Calais Pipes.
Note that the pipe will not work until you enter a valid Calais API key. The Web service module's URL in the sample pipe is:
http://pipes.opencalais.com/CalaisPipes/CalaisPipes?licenseID=[Your Calais API Key]
Replace '[Your Calais API Key]' with a valid Calais API key (be sure to remove the square brackets '[]' as well). Now you can start exploring...
