Open Calais has four ways to tell us what an unstructured document is about. We call this the “aboutness” of a document.
The first, and the simplest and most high-level, is the IPTC topics. These are the first level of the IPTC taxonomy: the 17 Subject codes shown here.
The second is the SocialTags, which are derived from the Wikipedia taxonomy. This is a very broad taxonomy. It is not a finite, static list but a dynamic one, updated as Wikipedia topics are updated. It represents a document through “user generated” topics that are always up to date and that cover major current events happening in the world. For example, an article could be tagged with a new topic like “November 2015 Paris attacks”, or with other relevant topics created close in time to the actual events.
Many users praise the value that SocialTags add to an unstructured document.
The fourth is the Industry codes taxonomy, derived from the intelligence gathered by Thomson Reuters analysts. Open Calais uses this high-quality, curated Thomson Reuters data to decide automatically which industries are relevant to an unstructured document, i.e. which industries the document is mostly about. The codes are taken from the Thomson Reuters Business Classification taxonomy.
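To make the idea of “aboutness” more concrete, here is a minimal Python sketch that groups tags by the taxonomy they come from. It does not call Open Calais. The sample data is hand-written for illustration, and the key names "_typeGroup" and "name", as well as the group labels "topics", "socialTag" and "industry", are assumptions about the shape of a tagging response, so check them against the actual output you receive.

```python
from collections import defaultdict

# Hypothetical, hand-written stand-in for a tagging response.
# The keys "_typeGroup" and "name" and the group labels are assumptions,
# not a confirmed description of the Open Calais output format.
sample_response = {
    "doc": {"info": {"docId": "example"}},
    "tag/1": {"_typeGroup": "topics", "name": "Politics"},
    "tag/2": {"_typeGroup": "socialTag", "name": "November 2015 Paris attacks"},
    "tag/3": {"_typeGroup": "industry", "name": "Airlines"},
}


def group_tags_by_taxonomy(response):
    """Collect tag names under the taxonomy group each tag belongs to."""
    grouped = defaultdict(list)
    for item in response.values():
        if not isinstance(item, dict):
            continue
        group = item.get("_typeGroup")
        name = item.get("name")
        # Entries without both a group and a name (e.g. document metadata) are skipped.
        if group and name:
            grouped[group].append(name)
    return dict(grouped)


aboutness = group_tags_by_taxonomy(sample_response)
for taxonomy, names in sorted(aboutness.items()):
    print(f"{taxonomy}: {', '.join(names)}")
```

Grouping by taxonomy keeps the high-level topics, the Wikipedia-derived SocialTags and the industry codes apart, so each can be used at the level of detail it was designed for.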
We hope this helps you understand how Open Calais and Intelligent Tagging classify documents using different taxonomies: from the high-level (IPTC), to the professional (Reuters Classification Schema and Business Schema), to the more dynamic and general taxonomy derived from Wikipedia.
If you have more questions, please contact us at email@example.com.