User offline. Last seen 2 years 17 weeks ago. Offline
Joined: 01/11/2010

I've been looking through the documentation and tried a few web searches, but couldn't find definitions of the {dochash, genericHasher, comphash}-1 algorithms. Are these publicly (i.e., openly) defined? (These produce the magic numbers used to uniquely, lexico-semantically ground entities and such in the OpenCalais RDF output format.)

Please advise.

- Luke

User offline. Last seen 2 years 17 weeks ago. Offline
Joined: 01/11/2010

Hello, Sumit -

I wanted to follow up here with a specific example from the documentation. Which part of the text in the example person entity description below was md5sum'd to get 39ff7fbd-7150-3225-be2d-a15d6fce1d34?

I tried:
echo "Alexander Graham Bell" | md5sum | sed -E 's/(........)(....)(....)(....)(............)/\1-\2-\3-\4-\5/' | awk '{print $1}'
which returns:
  8eeaefcb-cace-0e79-8bb0-cbbf6dc0615e


<rdf:Description rdf:about="http://d.opencalais.com/pershash-1/39ff7fbd-7150-3225-be2d-a15d6fce1d34">
  <rdf:type rdf:resource="http://s.opencalais.com/1/type/em/e/Person"/>
  <c:name>Alexander Graham Bell</c:name>
  <c:persontype>N/A</c:persontype>
  <c:nationality>N/A</c:nationality>
</rdf:Description>
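
For anyone repeating this experiment, here is a minimal Python sketch of two ways to turn an MD5 digest into the dashed form. Two things are worth noting about the attempt above: `echo` appends a trailing newline, which `md5sum` includes in the hash, and the key in the example has the bit pattern of a version-3 (name-based, MD5) UUID (third group starts with `3`, fourth group starts with `b`). Whether OpenCalais actually builds its keys this way, and which input string it hashes, is not stated anywhere in this thread; the function names below are hypothetical and the whole block is an assumption-laden illustration, not the OpenCalais algorithm.

```python
import hashlib
import uuid


def md5_hex_dashed(text):
    """Plain MD5 of the UTF-8 text (no trailing newline), split 8-4-4-4-12."""
    digest = hashlib.md5(text.encode("utf-8")).hexdigest()
    return "-".join(
        [digest[0:8], digest[8:12], digest[12:16], digest[16:20], digest[20:32]]
    )


def md5_as_uuid3(text):
    """MD5 of the text with the UUID version (3) and variant bits forced.

    Assumption: no namespace is mixed in; only the raw text bytes are hashed.
    """
    raw = hashlib.md5(text.encode("utf-8")).digest()
    return str(uuid.UUID(bytes=raw, version=3))


if __name__ == "__main__":
    name = "Alexander Graham Bell"
    print(md5_hex_dashed(name))
    print(md5_as_uuid3(name))
    # The key from the documentation example parses as a version-3 UUID.
    print(uuid.UUID("39ff7fbd-7150-3225-be2d-a15d6fce1d34").version)
```

The two functions differ only in the third and fourth groups, where the UUID form overwrites six bits. Neither is claimed to reproduce the key above, since the hashed input is still unknown.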

Thanks in advance,
- Luke

User offline. Last seen 2 years 17 weeks ago. Offline
Joined: 01/11/2010

Hello, Sumit -

Thanks for the info!  So what specific text do you take the MD5 hash of when creating:

  • dochash-1
  • comphash-1
  • genericHasher-1

Thanks,

- Luke

User offline. Last seen 1 year 27 weeks ago. Offline
Joined: 12/15/2008

Hi Luke,
We use MD5 hashing to generate the unique keys for Open Calais extractions. Wikipedia can give you more details on MD5 hashing.

The different prefixes specify the type of entity. For example, docHash appears in the URI to mark it as a document URI. Similarly, comphash is for companies and pershash is for person names. Currently, all other entities use genericHasher.
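
As a rough illustration of the prefix scheme described above, the following Python sketch splits an OpenCalais-style URI into its entity kind, version suffix, and key. The URI layout is inferred from the single example in this thread, and the function and table names are hypothetical, so treat it as a sketch under those assumptions.

```python
from urllib.parse import urlparse

# Prefix-to-kind table taken from the description above; keys are lowercased
# because the thread spells the prefixes with mixed case.
PREFIX_KINDS = {
    "dochash": "document",
    "comphash": "company",
    "pershash": "person",
    "generichasher": "other entity",
}


def classify_calais_uri(uri):
    """Return (kind, version, key) for a URI shaped like .../<prefix>-<n>/<key>."""
    parts = urlparse(uri).path.strip("/").split("/")
    if len(parts) < 2:
        raise ValueError("expected a path of the form /<prefix>-<n>/<key>")
    prefix, key = parts[0], parts[1]
    name, _, version = prefix.rpartition("-")
    return PREFIX_KINDS.get(name.lower(), "unknown"), version, key


if __name__ == "__main__":
    print(classify_calais_uri(
        "http://d.opencalais.com/pershash-1/39ff7fbd-7150-3225-be2d-a15d6fce1d34"
    ))
```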

Hope this helps.
Thanks.
sumit