Topic Codes / Classification: Four Different Ways to Express “Aboutness”

Open Calais has four ways to tell us what an unstructured document is about. We call this the “aboutness” of a document.

The first, and the simplest and most high-level, is the IPTC topics: the first level of 17 Subject codes, shown here.

The second is SocialTags. These are derived from the Wikipedia taxonomy, which is very broad. It is not a finite, static list but a dynamic one that is updated as Wikipedia topics are updated. SocialTags are a great representation of topics: they are “user generated”, always up to date, and cover major current events happening in the world. For example, an article could be tagged with new topics such as “November 2015 Paris attacks”, along with other relevant topics created around the time of the actual events.

Many users praise the value that SocialTags add to an unstructured document.

The third is the Reuters Classification Schema, used by the Reuters Editorial News Agency and Reuters.com. This is a very high-quality taxonomy, and it is available as part of Intelligent Tagging.

The fourth is the Industry codes taxonomy, derived from the intelligence gathered by Thomson Reuters analysts. These Industry codes build on high-quality curated data from Thomson Reuters to decide automatically which industries are relevant to an unstructured document, i.e. which industries the document is mostly about. The codes are taken from the Thomson Reuters Business Classification taxonomy.

I encourage you to experiment with the API and the viewer and see this for yourself.
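If you call the API and request JSON output, one convenient first step is to group the returned items by kind of aboutness. The Python sketch below does that on a small hand-written sample. It assumes that each returned item carries a `_typeGroup` label (for example `topics`, `socialTag` or `industry`) and a `name`; those field names, the sample data and the `group_aboutness` helper are assumptions for illustration only, so check them against the actual responses you receive.

```python
import json
from collections import defaultdict

# Hypothetical sample, hand-written for illustration. The keys, the
# "_typeGroup" values and the "name" fields are ASSUMPTIONS about the
# response shape, not copied from a real Open Calais response.
SAMPLE_RESPONSE = """
{
  "doc": {"info": {"docId": "example"}},
  "http://example.com/topic/1": {"_typeGroup": "topics", "name": "Politics"},
  "http://example.com/tag/1": {"_typeGroup": "socialTag", "name": "November 2015 Paris attacks"},
  "http://example.com/industry/1": {"_typeGroup": "industry", "name": "Airlines"}
}
"""


def group_aboutness(response: dict) -> dict[str, list[str]]:
    """Collect item names under their (assumed) "_typeGroup" label."""
    groups: dict[str, list[str]] = defaultdict(list)
    for entry in response.values():
        # Skip anything that is not an object, such as scalar metadata.
        if not isinstance(entry, dict):
            continue
        group = entry.get("_typeGroup")
        name = entry.get("name")
        # Items without both a group label and a name (e.g. "doc") are ignored.
        if group and name:
            groups[group].append(name)
    return dict(groups)


if __name__ == "__main__":
    parsed = json.loads(SAMPLE_RESPONSE)
    for group, names in sorted(group_aboutness(parsed).items()):
        print(f"{group}: {', '.join(names)}")
```

Grouping by a label rather than hard-coding the four taxonomies keeps the sketch working even if the response contains additional item types you have not looked at yet.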

I hope this helps you understand how Open Calais and Intelligent Tagging classify documents using different taxonomies: from high-level (IPTC), to professional (the Reuters Classification Schema and the Thomson Reuters Business Classification), to the more dynamic and general taxonomy derived from Wikipedia.

If you have further questions, please contact us at questions@opencalais.com

Ofer Harari
