User offline. Last seen 1 year 3 weeks ago. Offline
Joined: 06/11/2009

Hi,
I am an academic currently using entity enrichment for some machine learning problems. Because academic work requires me to account for the details of every system component, I was glad to find that the "Handbook of Data Mining" by Ye has a wonderful chapter on ClearForest and the engine that OpenCalais evolved from.
However, I would like to know how the system (at least roughly) computes the relevance score. I would very much appreciate it if you could get back to me on this.
I presume it is most likely some TF-IDF value of the entity term, normalised into the 0-1 range. However, for entities with a count of 1 the score varies quite a bit, so I wonder what this score is based on. Does it perhaps take into account the relationships, and types of relationships, found with the entity in question?
Thanks a lot,
Martin
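
To make the TF-IDF guess above concrete, here is a minimal sketch of that hypothesis only. It is not OpenCalais' actual scoring, which this thread does not disclose. The function name `tfidf_relevance`, the smoothed IDF formula, the divide-by-maximum normalisation, and the toy corpus are all assumptions chosen for illustration.

```python
import math
from collections import Counter

def tfidf_relevance(doc_entities, corpus_entities):
    """Hypothetical scorer: TF-IDF per entity, scaled into the 0-1 range."""
    n_docs = len(corpus_entities)
    # Document frequency: the number of documents each entity appears in.
    df = Counter()
    for entities in corpus_entities:
        df.update(set(entities))
    # Term frequency: raw mention counts within the document being scored.
    tf = Counter(doc_entities)
    # Smoothed IDF (an assumption) so unseen entities do not divide by zero.
    raw = {e: n * math.log((1 + n_docs) / (1 + df[e])) for e, n in tf.items()}
    top = max(raw.values(), default=0.0)
    if top == 0:
        return {e: 0.0 for e in raw}
    # Normalise by the highest-scoring entity in this document.
    return {e: score / top for e, score in raw.items()}

# Toy corpus: each document is the list of entity mentions found in it.
corpus = [
    ["IBM", "IBM", "Reuters", "Zurich"],
    ["Reuters", "Paris"],
    ["Reuters", "London"],
    ["Zurich", "Paris"],
]
scores = tfidf_relevance(corpus[0], corpus)
for entity, score in sorted(scores.items()):
    print(f"{entity}: {score:.3f}")
```

In this toy run "Reuters" and "Zurich" each occur once in the first document yet receive different scores, because "Reuters" is more common across the corpus. So a plain TF-IDF scheme could by itself produce some variation among count-1 entities, although that does not show it is what the service does.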

User offline. Last seen 1 year 3 weeks ago. Offline
Joined: 12/31/1969

We use some clues from the whole content of the document when we calculate the relevance. I hope this helps.
If you have some ideas to share with us, we will be happy to discuss them.

User offline. Last seen 1 year 3 weeks ago. Offline
Joined: 06/11/2009

I see, so perhaps you use some heuristic rules to check for certain clues in the document in relation to a given entity, in order to raise or lower the entity's score. Is that what you meant, or am I misinterpreting your answer?

I don't think I have any suggestions, at least not at this point; I'd simply like to know a little more about the algorithm you currently use to work out the score. Even if you do not want to disclose too much, I'd appreciate just a taste of the cues your scoring routine looks for.

Originally I was referring to Ronen Feldman's chapter "Mining Text Data" (Chapter 21, pp. 481-518) in Nong Ye's Handbook of Data Mining, which gave a great overview of what OpenCalais does behind the scenes to detect entities and work out relationships. But I couldn't find any details on the score feature. Can you perhaps point me to some literature, or divulge a little more detail about how it's computed and what clues it might use?

Thanks very much for your time, I do appreciate this!
(Feel free to contact me directly via email if you prefer)

Martin
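
For readers wondering what "clues from the whole content of the document" might look like in practice, the following is a purely hypothetical sketch of heuristic score adjustment. The two cues (an early first mention, and mentions spread across the text), the function name `adjust_relevance`, and every weight are invented for illustration; none of them comes from OpenCalais or from the replies in this thread.

```python
def adjust_relevance(base_score, mention_offsets, doc_length,
                     lead_fraction=0.1, lead_boost=0.2, spread_boost=0.2):
    """Hypothetical rule-based adjustment of a base relevance score."""
    if not mention_offsets or doc_length <= 0:
        return base_score
    score = base_score
    first, last = min(mention_offsets), max(mention_offsets)
    # Cue 1 (assumed): an entity first mentioned in the opening part of
    # the document is treated as more central to it.
    if first < lead_fraction * doc_length:
        score += lead_boost
    # Cue 2 (assumed): mentions spread over more of the text earn a
    # proportionally larger boost than mentions clustered in one spot.
    spread = (last - first) / doc_length
    score += spread_boost * spread
    # Clamp so the result stays in the 0-1 range.
    return max(0.0, min(1.0, score))

# Two entities, each mentioned once, with the same base score.
early = adjust_relevance(0.3, [5], 1000)    # mentioned in the opening
late = adjust_relevance(0.3, [900], 1000)   # mentioned near the end
print(early, late)
```

Under rules of this kind, two entities that each appear once can end up with different scores depending only on where they appear, which would be one possible reading of the variation described in the question.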