It's about understanding (not the keywords…)

We want automated NLP tools to help us improve document search, generate accurate and relevant alerts, filter out what is not relevant, and quickly find what we need. In other cases, they should also help us find what we didn’t know we needed.


To do that, we need the automation to really understand what those documents are about, not only extract keywords.


Let’s take a simple example. The financial domain is an area Open Calais is strong at, so let’s take a more generic one: a non-financial document.


If you look at the topics and the SocialTags, they are very relevant to the content and help us understand what the document is about. These are not extracted keywords. They reflect an understanding of what the document is about.

The company is properly extracted, normalized (different names for the same company are normalized to a single correct company name) and resolved.

People are correctly identified as people, with their full names.
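To make the normalization of company names described above a little more concrete, here is a minimal sketch in Python. It is not how Open Calais works internally and does not use its API. The alias table, the company names ("Acme Corporation", "Globex") and the function names are all hypothetical, chosen only to show the idea of mapping different surface names to one canonical name.

```python
from collections import Counter

# Hypothetical alias table for illustration only; not Open Calais data or its API.
CANONICAL_NAMES = {
    "acme": "Acme Corporation",
    "acme corp": "Acme Corporation",
    "acme corp.": "Acme Corporation",
    "acme corporation": "Acme Corporation",
}


def normalize_company(mention: str) -> str:
    """Map a surface mention to one canonical name; unknown mentions pass through."""
    # Lowercase and collapse whitespace so trivial variations share one lookup key.
    key = " ".join(mention.lower().split())
    return CANONICAL_NAMES.get(key, mention)


def count_companies(mentions):
    """Count mentions per canonical company rather than per surface string."""
    return Counter(normalize_company(m) for m in mentions)


mentions = ["Acme", "ACME Corp.", "Acme  Corporation", "Globex"]
counts = count_companies(mentions)
print(counts)
```

A plain keyword count would treat the three Acme spellings as three different terms. After normalization they are counted as one company, which is what makes search and alerts on that company reliable. A real system would of course need far more than a lookup table (context, disambiguation, resolution to a known entity).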

It takes a LOT of years of research, constant enhancement and maintenance to leap from keyword extraction to understanding.


Of course there will be (many) cases where Open Calais will provide incorrect metadata.

The point is to show the difference between simple, superficial keyword extraction and a real attempt to understand what the content is about, and its context. This is key to achieving a game-changing experience: significantly more relevant and focused search, relevant alerts, filtering out irrelevant content, and finding the insights we are looking for (and those we didn’t know we needed to look for).

Ofer Harari

See also

What’s in it for me

Relevance vs confidence
