Two c:subject
Posted on: Sun, 04/19/2009 - 13:53
Hi!
If the text from http://geimint.blogspot.com/2009/04/dragons-fire-plas-2nd-artillery-corp... is sent to Calais, the following appears in the RDF response:
<rdf:Description rdf:about="http://d.opencalais.com/er/geo/country/ralg-geo1/8a7d7ba2-88ca-0f0e-a1ec-f975b026e8e1">
<rdf:type rdf:resource="http://s.opencalais.com/1/type/er/Geo/Country"/>
<c:docId rdf:resource="http://d.opencalais.com/dochash-1/2b42292b-e25f-34f8-a632-4012663fd2a2"/>
<c:subject rdf:resource="http://d.opencalais.com/genericHasher-1/3d1db47c-b851-32ab-af0b-0bffa4dad5d1"/>
<c:name>China</c:name>
<c:shortname>China</c:shortname>
<c:latitude>32.9042932784</c:latitude>
<c:longitude>110.467708512</c:longitude>
<c:subject rdf:resource="http://d.opencalais.com/genericHasher-1/f37b4fb4-c237-3f84-80ce-8ee520395b8c"/>
</rdf:Description>
Note that it has two <c:subject> items. What does it mean to have two <c:subject> like this?
Cheers,
Lars

Hello,
Having two (or more) subjects in a resolution node is perfectly fine:
A resolution node holds in its c:subject field the ID of the entity or entities it refers to.
Having more than one c:subject means that the node serves as the resolution node for more than one entity.
This situation is rare, but it still has to be taken into account:
a single resolution node refers both to 'China' and to 'Republic of China'.
You could argue that these two entities should have been normalized into a single entity, and you would be absolutely right, but note the following:
Semantic extraction and disambiguation run in separate, independent code layers, so it is possible for semantic extraction to miss the normalization of two entities and for disambiguation to then "normalize" them into one by giving them a single resolution node.
All in all, these cases seem to be rare and should mostly be eliminated by semantic-extraction normalization, but we cannot guarantee that they never occur. I will forward this case to the semantic-extraction team to check why normalization did not kick in here. The general message is that it could (and probably will) surface again, and users should be aware of this possibility.
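For anyone consuming the RDF, a minimal sketch of one way to cope with this is to always read c:subject as a list rather than a single value. The Python below is only an illustration: the function name `subjects_by_resolution_node` is made up, the `c:` namespace URI is an assumption (the snippet above does not show its declaration, so check it against your own response), and the `rdf:RDF` wrapper and trimmed sample are added just to make the example parse on its own.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Assumption: namespace bound to the "c:" prefix; verify against the real response.
C_NS = "http://s.opencalais.com/1/pred/"

# Trimmed copy of the node from the question, wrapped in rdf:RDF so it parses alone.
SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:c="{C_NS}">
<rdf:Description rdf:about="http://d.opencalais.com/er/geo/country/ralg-geo1/8a7d7ba2-88ca-0f0e-a1ec-f975b026e8e1">
<rdf:type rdf:resource="http://s.opencalais.com/1/type/er/Geo/Country"/>
<c:subject rdf:resource="http://d.opencalais.com/genericHasher-1/3d1db47c-b851-32ab-af0b-0bffa4dad5d1"/>
<c:name>China</c:name>
<c:subject rdf:resource="http://d.opencalais.com/genericHasher-1/f37b4fb4-c237-3f84-80ce-8ee520395b8c"/>
</rdf:Description>
</rdf:RDF>"""


def subjects_by_resolution_node(rdf_xml):
    """Map each node's rdf:about URI to the list of all its c:subject URIs."""
    root = ET.fromstring(rdf_xml)
    result = {}
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        about = desc.get(f"{{{RDF_NS}}}about")
        # findall returns every c:subject child, in document order,
        # so a node with two subjects yields a two-element list.
        subjects = [
            s.get(f"{{{RDF_NS}}}resource")
            for s in desc.findall(f"{{{C_NS}}}subject")
        ]
        if subjects:
            result[about] = subjects
    return result


if __name__ == "__main__":
    for node, subjects in subjects_by_resolution_node(SAMPLE).items():
        print(node)
        for subject in subjects:
            print("  refers to entity:", subject)
```

Keeping the value a list means the common single-subject case and the rare multi-subject case go through the same code path.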
HTH
Meir