User offline. Last seen 1 year 25 weeks ago. Offline
Joined: 05/19/2008

There's a lot that I like about the Drupal module (especially the fact that it's done!) but I wonder about the information that gets thrown away when the tags are inserted into the taxonomy system, and about how the module could support semantic markup of the original text.

As I understand it, Calais associates a different GUID with every discernibly different "John Smith," but to Drupal's taxonomy system, all John Smiths are alike. Is that correct?
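To make the concern concrete, here is a minimal sketch of the difference between keying terms by name and keying them by GUID. It is not the module's code: the record layout, the GUID strings, and the function names `terms_by_name` and `terms_by_guid` are all hypothetical, chosen only for illustration.

```python
# Hypothetical entity records; the GUID values are invented for this example.
entities = [
    {"guid": "guid-aaa", "name": "John Smith", "type": "Person"},
    {"guid": "guid-bbb", "name": "John Smith", "type": "Person"},
    {"guid": "guid-aaa", "name": "John Smith", "type": "Person"},
]


def terms_by_name(records):
    """Name-only keying: every "John Smith" collapses into a single term."""
    return {record["name"] for record in records}


def terms_by_guid(records):
    """GUID keying: one term per distinct entity, with the name kept as a label."""
    return {record["guid"]: record["name"] for record in records}


print(len(terms_by_name(entities)))  # number of terms when keyed by name
print(len(terms_by_guid(entities)))  # number of terms when keyed by GUID
```

With name-only keying the two distinct people become indistinguishable, while GUID keying keeps them apart and still lets both display as "John Smith".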

And what about supporting inline markup? The Drupal Way is to store the original text unmolested and apply processing on output, but that might not be the smart thing to do in this case, especially if we've lost some of the data that Calais generated.
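As a rough sketch of the output-time approach, the stored text could stay untouched while entity mentions are wrapped in spans when the page is rendered. This assumes each mention is available as an (offset, length, GUID) tuple and that mentions do not overlap; the function name `annotate` and the choice of the RDFa `about` attribute to carry the GUID are illustrative assumptions, not what the module does.

```python
from html import escape


def annotate(text, mentions):
    """Return HTML with each mention wrapped in a span carrying its GUID.

    text is the original stored text and is never modified.
    mentions is a list of (offset, length, guid) tuples, assumed non-overlapping.
    """
    parts = []
    pos = 0
    for offset, length, guid in sorted(mentions):
        # Plain text between the previous mention and this one.
        parts.append(escape(text[pos:offset]))
        surface = escape(text[offset:offset + length])
        parts.append('<span about="{}">{}</span>'.format(escape(guid), surface))
        pos = offset + length
    # Remaining text after the last mention.
    parts.append(escape(text[pos:]))
    return "".join(parts)


html_out = annotate("John Smith spoke.", [(0, 10, "guid-aaa")])
print(html_out)
```

This only works if the offsets and GUIDs are kept somewhere alongside the node, which is exactly the data that is lost if only the term names reach the taxonomy.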

As background for my question: We're working on implementing Drupal as the primary consumer-facing component in our newspaper websites. We're contemplating applying Calais to both legacy content (loaded either as a pull-feed or through XMLRPC push) from newsroom management systems, and to Web-first content from community bloggers and staffers.

 

 


Just wanted to provide an update.

The Calais GUID has been integrated with Taxonomy terms as of 6.x 2.0. It is in the term_data table and is loaded with the terms.

As for Microformats & RDFa, I am working with some of the people pushing them forward in D7 to get a good, usable (and specifically compatible) implementation that we can try to put in place for D6.


Hey Steve,


You are right that all John Smiths would currently be viewed as the same. We are not entirely sure how to get around that, but it seems like more of a problem with Drupal's Taxonomy system than with the module (I'm one of the authors). We wanted to use the taxonomy system because of all the goodies already built upon it, but you are right, we should investigate some way around that. I can think of a few ways to add the GUID to the taxonomy term, and that would be beneficial.


As far as markup is concerned, that is one of the next things on our list: providing Microformats/RDFa markup of entities. We are investigating the proper way to take advantage of this and will be rolling it out in our next big release. Any input in the process is, of course, greatly appreciated.


Many of our clients are in the publishing/journalism world, and things that help one typically help all.


Thanks,
Frank