Open hash functions? Can't find {dochash, genericHasher, comphash}-1 algorithms
Posted on: Mon, 01/11/2010 - 16:34
I've been looking through the documentation and tried a few web searches, but couldn't find {dochash, genericHasher, comphash}-1 algorithms. Are these publicly (i.e., openly) defined? (These are the magic numbers used to uniquely, lexico-semantically ground entities and such in the OpenCalais RDF output format.)
Please advise.
- Luke

Hello, Sumit -
I wanted to follow up here with a specific example from the documentation. Which part of the text in the example person entity description below was md5sum'd to get 39ff7fbd-7150-3225-be2d-a15d6fce1d34?
I tried:
echo "Alexander Graham Bell" | md5sum | sed -E 's/(........)(....)(....)(....)(............)/\1-\2-\3-\4-\5/' | awk '{print $1}'
which returns:
8eeaefcb-cace-0e79-8bb0-cbbf6dc0615e
The example description from the documentation is:
<rdf:Description rdf:about="http://d.opencalais.com/pershash-1/39ff7fbd-7150-3225-be2d-a15d6fce1d34">
<rdf:type rdf:resource="http://s.opencalais.com/1/type/em/e/Person"/>
<c:name>Alexander Graham Bell</c:name>
<c:persontype>N/A</c:persontype>
<c:nationality>N/A</c:nationality>
</rdf:Description>
Thanks in advance,
- Luke
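For anyone retracing the attempt above, here is a rough Python sketch of the same idea. Two things are worth noting. First, `echo` appends a newline, so the shell pipeline hashes "Alexander Graham Bell\n" and not the bare name. Second, the ID from the documentation parses as an RFC 4122 version-3 UUID (the MD5-based layout, where six bits of the digest are overwritten), so a plain MD5 hex digest regrouped 8-4-4-4-12 would not generally match it even with the right input. The function names below are hypothetical, and the thread does not say what input text OpenCalais actually hashes or whether it adds a namespace, so neither helper is expected to reproduce the documented ID.

```python
import hashlib
import uuid

# ID copied from the documentation example quoted in this thread.
DOC_ID = "39ff7fbd-7150-3225-be2d-a15d6fce1d34"


def md5_dashed(text: str) -> str:
    """Plain MD5 hex digest regrouped 8-4-4-4-12, like the shell pipeline."""
    hex_digest = hashlib.md5(text.encode("utf-8")).hexdigest()
    return str(uuid.UUID(hex=hex_digest))


def md5_as_uuid3_layout(text: str) -> str:
    """MD5 digest with the RFC 4122 version/variant bits forced to version 3.

    Assumption: no namespace is prepended. The real OpenCalais input
    is not given in this thread.
    """
    digest = hashlib.md5(text.encode("utf-8")).digest()
    return str(uuid.UUID(bytes=digest, version=3))


if __name__ == "__main__":
    parsed = uuid.UUID(DOC_ID)
    # Inspect the version and variant fields of the documented ID.
    print("version:", parsed.version, "variant:", parsed.variant)
    # What the shell attempt hashed (echo adds the trailing newline).
    print(md5_dashed("Alexander Graham Bell\n"))
    # Same digest family, but laid out as a version-3 UUID.
    print(md5_as_uuid3_layout("Alexander Graham Bell"))
```

Trying different candidate strings (with or without the trailing newline, with other fields appended) through `md5_as_uuid3_layout` is one way to test guesses about the input, but that remains guesswork until the input is documented.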
Hello, Sumit -
Thanks for the info! So what specific text do you take the MD5 hash of when creating:
Thanks,
- Luke
Hi Luke,
We use MD5 hashing to generate the unique keys for Open Calais extractions. Wikipedia can give you more details on MD5 hashing.
The different prefixes specify the type of entity. For example, docHash appears in the URI to identify it as a document URI. Similarly, comphash is for companies and pershash is for person names. Currently, for all other entities we use the genericHasher.
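As a small illustration of the prefix scheme described above, the following Python sketch splits an OpenCalais resource URI into its hasher prefix, the numeric suffix, and the ID. The mapping table only contains the prefixes named in this thread, the URI layout is inferred from the single example URI posted here, and the function name `split_calais_uri` is hypothetical. Prefixes are compared case-insensitively because the thread spells them both as "dochash" and "docHash".

```python
from urllib.parse import urlparse

# Only the prefixes mentioned in this thread; other values may exist.
PREFIX_MEANING = {
    "dochash": "document",
    "comphash": "company",
    "pershash": "person",
    "generichasher": "other entity",
}


def split_calais_uri(uri: str) -> tuple[str, str, str]:
    """Return (entity kind, hasher version suffix, ID) for a URI such as
    http://d.opencalais.com/pershash-1/<id>."""
    parts = urlparse(uri).path.strip("/").split("/")
    if len(parts) < 2:
        raise ValueError(f"unexpected URI layout: {uri!r}")
    hasher, ident = parts[0], parts[1]
    # "pershash-1" -> name "pershash", version "1"
    name, _, version = hasher.rpartition("-")
    kind = PREFIX_MEANING.get(name.lower(), "unknown")
    return kind, version, ident


if __name__ == "__main__":
    example = ("http://d.opencalais.com/pershash-1/"
               "39ff7fbd-7150-3225-be2d-a15d6fce1d34")
    print(split_calais_uri(example))
```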
Hope this helps.
Thanks.
sumit