1. Yes – semantic metadata is critical to search accuracy and relevance. It is also key to generating relevant alerts on the entities, events and topics you care about (issuers in your financial portfolio, people you are tracking, etc.).
2. Subject matter experts have to be involved in building a metadata extraction platform, not only technologists.
3. Metadata extracted and applied to content/documents has to be based on high-quality curated data in order to be correct and relevant. For example, knowing that a person works at a particular company is important for establishing that person's identity. Knowing which industries a company is active in is key to understanding the context of a document, etc.
4. The world is changing. Companies change names; companies are acquired. Metadata has to be based on high-quality data that is kept correct and up to date. There is a lot of NLP software out there. When you apply metadata to your content, you need ongoing service and maintenance that preserves the currency and accuracy of the data. Standalone NLP software is nice for a few days, but you will soon find that you need a lot of daily maintenance and operations work to correct and fix quality issues. Make sure you allocate enough resources to do that, have someone do it for you, or use a service that has this ongoing maintenance built in.
5. Build vs. buy – building such capabilities, together with high-quality data and daily maintenance and service, is very costly. Do the math and make a calculated decision on whether you really want to build such a big and expensive team (maybe you have to, because of a specific domain of expertise or some other reason), or acquire a service that does all that at a lower cost.
6. Partnership – if you plan to partner and acquire such a service, consider all of the above first. Does the provider have all of the above capabilities? If yes, that is a great start. Then try to go for a service in which the provider has a stake, has skin in the game – i.e. the provider cares about keeping the quality high because it needs the same service for its own use cases and platforms, regardless of whether you buy it or not. Find a vendor that “eats its own dog food” or “drinks its own champagne”. This is close to an insurance policy that the quality will be as high as the provider can make it.
7. Metadata that attempts to understand the content – sometimes simple keywords are used as metadata. This might be sufficient for some use cases. If you want to make your platform more effective and relevant, look for metadata that attempts to understand the content, identifying the entities, events and topics a document is about rather than only the strings it contains.
8. Deployment options – look for the flexibility to choose between calling a cloud web service through an API and running the service installed behind your firewall.
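
Points 4 and 7 can be illustrated with a small sketch. It contrasts plain keyword matching with tagging against a curated entity record that carries a stable ID and a maintained list of names. Everything in it is an assumption made for illustration: the `Entity` and `EntityRegistry` classes, the `ORG-001` identifier, and the company names "Acme Corp" and "Apex Holdings" are invented, and the substring matching is deliberately naive compared with what a real extraction service would do.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """One curated record: a stable ID plus every name the entity is known by."""
    entity_id: str                       # hypothetical identifier scheme
    canonical_name: str
    aliases: set = field(default_factory=set)
    industries: set = field(default_factory=set)


class EntityRegistry:
    """Hypothetical curated data store that someone has to keep current."""

    def __init__(self):
        self._entities = {}

    def add(self, entity):
        self._entities[entity.entity_id] = entity

    def rename(self, entity_id, new_name):
        # Maintenance step: keep the old name as an alias so that older
        # documents still resolve to the same entity ID.
        entity = self._entities[entity_id]
        entity.aliases.add(entity.canonical_name)
        entity.canonical_name = new_name

    def tag(self, text):
        # Naive case-insensitive substring match against all known names.
        lowered = text.lower()
        found = []
        for entity in self._entities.values():
            names = {entity.canonical_name, *entity.aliases}
            if any(name.lower() in lowered for name in names):
                found.append(entity.entity_id)
        return sorted(found)


def keyword_tag(text, keywords):
    """Plain keyword metadata: returns the keywords that literally appear."""
    lowered = text.lower()
    return sorted(k for k in keywords if k.lower() in lowered)


registry = EntityRegistry()
registry.add(Entity("ORG-001", "Acme Corp", industries={"Industrial Machinery"}))

old_doc = "Acme Corp reported strong quarterly earnings."
new_doc = "Apex Holdings announced a new plant."

print(keyword_tag(new_doc, ["Acme Corp"]))   # [] : the keyword misses the renamed company
print(registry.tag(new_doc))                 # [] : stale curated data misses it too

registry.rename("ORG-001", "Apex Holdings")  # the ongoing maintenance step
print(registry.tag(new_doc))                 # ['ORG-001']
print(registry.tag(old_doc))                 # ['ORG-001']
```

Until the rename is recorded, the curated registry does no better than the keyword list. The value comes from the maintenance step, after which documents using either name resolve to the same ID, so an alert on that entity keeps working across the name change.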