Technical Background
Understanding how the Calais Pipes Service works will help you make full use of its features and recognize its limitations.
Service Operation
The Calais Pipes Service is invoked from within a Yahoo! Pipes pipe. When called, it receives a list of RSS feed items, typically news stories. Each item has a link to the full story, a title, and a description. A news reader usually displays the title and description for the user to examine, and follows the link when the user clicks the item.
When the Calais Pipes Service receives the list of items, it does the following for each item:
- Retrieve the original page by following the provided item's link
- Call the OpenCalais Web service with the page contents and retrieve semantic data
- Append a short list of entities and events identified by OpenCalais; the entities and events chosen are those that appeared most frequently in the text
- Optionally replace the item's link with a link to the Semantic Proxy for a visual presentation of the semantic data
The result is then sent back to the calling pipe.
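The per-item steps above can be sketched roughly as follows. This is only an illustration, not the service's actual code: `fetch_page`, `extract_mentions`, and `proxy_url_for` are hypothetical callables standing in for the page retrieval, the OpenCalais call, and the Semantic Proxy link. The sketch also assumes that the list is appended to the item's description, and the bracketed `[Calais: ...]` layout is invented for the example.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional

Item = Dict[str, str]  # assumed keys: "title", "link", "description"


def enrich_items(
    items: List[Item],
    fetch_page: Callable[[str], str],               # hypothetical: URL -> page text
    extract_mentions: Callable[[str], List[str]],   # hypothetical: text -> one name per mention
    top_n: int = 5,
    proxy_url_for: Optional[Callable[[str], str]] = None,  # hypothetical: URL -> proxy URL
) -> List[Item]:
    """Return enriched copies of the items; the input list is left unchanged."""
    enriched = []
    for item in items:
        # Step 1: retrieve the original page behind the item's link.
        page_text = fetch_page(item["link"])
        # Step 2: obtain semantic data (here simply a list of mentioned names).
        mentions = extract_mentions(page_text)
        # Step 3: keep the names mentioned most frequently and append them.
        top = [name for name, _count in Counter(mentions).most_common(top_n)]
        new_item = dict(item)
        if top:
            new_item["description"] = item["description"] + " [Calais: " + ", ".join(top) + "]"
        # Step 4 (optional): point the link at a visual presentation instead.
        if proxy_url_for is not None:
            new_item["link"] = proxy_url_for(item["link"])
        enriched.append(new_item)
    return enriched
```

Passing the network-facing parts in as callables keeps the sketch independent of any particular HTTP client or OpenCalais client library.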
Benefits
The resulting information is based not only on the data sent from the pipe but also on the full text of the story that each item represents. This means that the new item description, after the service has processed the data, can include information that was not available before.
The pipe creator can then apply the full range of the Yahoo! Pipes editor's features to the new data. For example, suppose you want to give special attention to items referencing Washington State. The word Washington may not appear at all in an item's title or short description. Or it may appear in the description but refer to the city (Washington, DC) or to a person's last name.
If you place a filter (a module available in the Yahoo! Pipes editor) before the call to Calais, you may miss items that you intended the filter to catch, or the filter may let through irrelevant items. If you place the filter after the Calais call, you can make the filter more accurate and your pipe more relevant.
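The effect of filter placement can be illustrated with a small sketch. The helper `filter_items` is a hypothetical stand-in for the Yahoo! Pipes filter module, and the sample items, including the text appended to the enriched descriptions, are invented for the example.

```python
from typing import Dict, List

Item = Dict[str, str]  # assumed keys: "title", "description"


def filter_items(items: List[Item], phrase: str) -> List[Item]:
    """Keep items whose title or description contains the phrase (case-insensitive)."""
    needle = phrase.lower()
    return [
        item for item in items
        if needle in (item["title"] + " " + item["description"]).lower()
    ]


# Items as they arrive from the feed, before the Calais call.
before = [
    {"title": "Budget talks stall", "description": "Lawmakers met in Washington."},
    {"title": "New ferry routes", "description": "Service expands next spring."},
]

# The same items after a hypothetical Calais call appended entity names.
after = [
    {"title": "Budget talks stall",
     "description": "Lawmakers met in Washington. [Calais: Washington, DC]"},
    {"title": "New ferry routes",
     "description": "Service expands next spring. [Calais: Washington State]"},
]

# Filtering first matches the DC story and misses the ferry story.
matched_before = filter_items(before, "Washington")
# Filtering after enrichment can target the precise entity name.
matched_after = filter_items(after, "Washington State")
```

In this invented data, the early filter returns only the budget story, while the late filter returns only the ferry story, which is the one that actually concerns Washington State.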
Limitations
The benefits come at a price. The process of fetching the original story text from the Web and then sending it over to Calais for processing consumes time and bandwidth.
As a result, the retrieval of an RSS feed that contains many items may take as long as 60 seconds, sometimes more.
Keep this in mind when you design your pipe. It may make more sense to call Calais on a subset of the items rather than on the entire feed.
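One way to act on this advice is to cap the number of items that reach the Calais call. The sketch below assumes a hypothetical `enrich` callable that represents the Calais step; `enrich_first` and `max_items` are names chosen for illustration only.

```python
from typing import Callable, Dict, List

Item = Dict[str, str]


def enrich_first(
    items: List[Item],
    enrich: Callable[[List[Item]], List[Item]],  # hypothetical stand-in for the Calais call
    max_items: int = 10,
) -> List[Item]:
    """Enrich only the first max_items items and pass the rest through unchanged."""
    head = enrich(items[:max_items])  # the slow, bandwidth-heavy step runs on fewer items
    tail = items[max_items:]          # remaining items skip the Calais call
    return head + tail
```

Whether the remaining items should be passed through unchanged, as here, or dropped from the feed depends on what the pipe is for.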
