The initial set of 6 Document Categorizations in the Calais Categorization Taxonomy maps in an obvious way onto 6 of the 17 top-level subjects of the IPTC Descriptive NewsCodes Taxonomy, as shown below. IPTC currently defines 17 such top-level Subject codes, which are subdivided into SubjectMatter codes and further into SubjectDetail codes (totalling about 1300 terms).
Can you comment on the intended relationship between the Calais Categorization Taxonomy and the IPTC Descriptive NewsCodes Taxonomy going forward? Will new Calais categories necessarily map to IPTC Descriptive NewsCodes? Will there be further subdivisions (and if so, will these map to the IPTC SubjectMatter and SubjectDetail codes)?
The following is the obvious mapping between current Calais and IPTC taxonomies:
Calais: Business_Finance
IPTC: economy, business and finance (04000000)
Calais: Entertainment_Culture
IPTC: arts, culture and entertainment (01000000)
Calais: Health_Medical_Pharma
IPTC: health (07000000)
Calais: Politics
IPTC: politics (11000000)
Calais: Sports
IPTC: sport (15000000)
Calais: Technology_Internet
IPTC: science and technology (13000000)
Calais: Other
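For anyone who wants to use this mapping programmatically, here is a minimal sketch in Python. The category names and IPTC codes are copied from the list above; the names CALAIS_TO_IPTC and iptc_subject_for are hypothetical and not part of any Calais or IPTC API. Since no IPTC counterpart is listed for "Other", the lookup simply returns None for it.

```python
from typing import Optional, Tuple

# Calais category -> (IPTC subject code, IPTC subject name), as listed above.
# "Other" is deliberately absent: no IPTC counterpart is given for it.
CALAIS_TO_IPTC = {
    "Business_Finance": ("04000000", "economy, business and finance"),
    "Entertainment_Culture": ("01000000", "arts, culture and entertainment"),
    "Health_Medical_Pharma": ("07000000", "health"),
    "Politics": ("11000000", "politics"),
    "Sports": ("15000000", "sport"),
    "Technology_Internet": ("13000000", "science and technology"),
}


def iptc_subject_for(calais_category: str) -> Optional[Tuple[str, str]]:
    """Return (code, name) for a Calais category, or None if it has no mapping."""
    return CALAIS_TO_IPTC.get(calais_category)


if __name__ == "__main__":
    for category in ("Sports", "Other"):
        print(category, "->", iptc_subject_for(category))
```

Returning None rather than raising an error keeps the sketch usable if Calais later adds categories that have no direct NewsCodes equivalent.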

Comments
John -
I guess it shouldn't come as a surprise that the Calais taxonomy has similarities to the IPTC NewsCodes - after all, news content (publishers, aggregators, etc.) is one of our target audiences.
We can't commit to an exact mapping between the Calais taxonomy and NewsCodes in all future releases, but it is very likely that many high-level topics in Calais will map to NewsCodes top-level subjects.
Regards,
Thanks for the info. And it is nice to see the categorization in action in R3. Do you plan to offer any form of user-defined categories or will document categorization remain limited to the (expanding) predefined Calais taxonomy?
John -
Down the road we do plan to have some sort of user-defined elements in OpenCalais. Whether these will be document-level categories or new entity types (or both), we still don't know, but the idea is to provide a certain level of "flexibility" for users to define what they need.
Regards,
Michal