User offline. Last seen 26 weeks 4 days ago. Offline
Joined: 08/25/2008

(I intended to write this as a reply to your "Diffs Explained" entry but for some reason when I composed a reply there was no link for adding File Attachments, so I created this as a new topic.)

Thank you for your explanation; I am glad to hear that we should expect the same results from REST and SOAP. However, I resubmitted to SOAP, then REST, then SOAP again (all within a few minutes) and confirmed that I got different relevance scores between REST and SOAP, but the same relevance scores from both SOAP submissions.

The attached files illustrate this as well as a more disturbing result:  Calais finds "The Atlanta Journal-Constitution" as a Published Medium when submitted via SOAP but not when submitted via REST POST.

I've attached the input text and the resulting SOAP rdf (includes a Published Medium) and REST rdf (no Published Medium). Note: my REST submission goes through a translation from windows-1250 to UTF-8 as well as url-encoding before being posted. Also note that the REST rdf does identify The Atlanta Journal-Constitution as a company but not as a Published Medium.
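
For anyone trying to reproduce this, here is a minimal sketch of the REST preparation step described above (windows-1250 to UTF-8, then url-encoding). The function name `prepare_rest_body` is made up for illustration, and the use of Python's standard `urllib.parse.quote` is an assumption; it is not the encoder used in the original submission.

```python
from urllib.parse import quote


def prepare_rest_body(raw: bytes) -> str:
    """Transcode windows-1250 bytes to UTF-8 and percent-encode the result.

    Hypothetical helper: every byte outside the unreserved set is
    percent-encoded, so '\r' and '\n' stay visible as %0D and %0A
    instead of being folded into '+' or dropped.
    """
    text = raw.decode("cp1250")           # windows-1250 -> Unicode string
    utf8_bytes = text.encode("utf-8")     # Unicode string -> UTF-8 bytes
    return quote(utf8_bytes, safe="")     # percent-encode the UTF-8 bytes


if __name__ == "__main__":
    sample = b"Caf\xe9\r\nTech"           # 0xE9 is 'e acute' in windows-1250
    print(prepare_rest_body(sample))
```

Inspecting the encoded string for `%0A` is a quick way to check that newlines are still present before the text is posted.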

Thanks for looking into this,

John

Attachments:
Georgia Techs quarterback depth established.txt (4.63 KB)
Georgia Techs quarterback depth established.txt - SOAP.rdf (122.34 KB)
Georgia Techs quarterback depth established.txt - REST.rdf (122.16 KB)

User offline. Last seen 2 years 32 weeks ago. Offline
Joined: 06/01/2008

Hi John,

Thank you, this is great news.

 

I can confirm that OpenCalais replaces each '\r' char in the content with a space char.

It is up to you to decide whether you wish to strip those before sending, or to get them transformed into spaces.

Obviously, they are replaced, and not removed, in order to maintain the correctness of the offset-length values of the extraction results.
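
A small sketch of why replacing rather than removing keeps the offset-length values valid. The sample text and the helper names `replace_cr` and `strip_cr` are invented for illustration; this is not the server code, only the arithmetic behind the statement above.

```python
def replace_cr(text: str) -> str:
    """Replace each '\\r' with a space; the text length is unchanged."""
    return text.replace("\r", " ")


def strip_cr(text: str) -> str:
    """Remove each '\\r'; everything after it shifts left by one."""
    return text.replace("\r", "")


# Hypothetical document with a Windows-style line ending.
original = "Line one\r\nGeorgia Tech wins"
entity = "Georgia Tech"

# Offset and length as they would be computed against the submitted text.
offset = original.index(entity)
length = len(entity)

replaced = replace_cr(original)
stripped = strip_cr(original)

# With replacement, the same offset/length still selects the entity.
replaced_slice = replaced[offset:offset + length]

# With stripping, the same offset/length is off by one character.
stripped_slice = stripped[offset:offset + length]

if __name__ == "__main__":
    print(repr(replaced_slice))
    print(repr(stripped_slice))
```

If you strip the '\r' chars yourself before sending, the offsets in the results refer to the stripped text, so any highlighting has to be done against that same stripped copy.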

Regards

Meir

 

User offline. Last seen 2 years 32 weeks ago. Offline
Joined: 06/01/2008

Hello John,

I think I've got it: it is the newlines issue, in reversed form. Your REST requests somehow drop the newlines, and so the results are different.

Usually we face such issues with SOAP requests, due to the '\r' char, but in your case all newlines are "\r\n",
and so they 'survive' the SOAP transmission, appearing just as '\n' on our server end, which is enough.

I'm now getting the PublishedMedium through both SOAP and REST, if I make sure newlines appear in the text.
Also, if I omit them, I'm back to the former results, without the PublishedMedium.

So it is definitely the newlines you should be investigating in your REST code:
somehow they seem to get lost (which may also account for the relevance score diffs).

HTH
Meir
 

User offline. Last seen 26 weeks 4 days ago. Offline
Joined: 08/25/2008

Meir,

Thank you very much for your time and insight on this issue. The UrlEncode mechanism I was using translated all "isspace()" characters to '+' (including '\r' and '\n'). Changing it to only translate actual 0x20 space characters to '+' and to drop any '\r' characters entirely and translate all other "isspace()" characters to their appropriate %Hex encodings seems to give me exactly the same results via REST as via SOAP.
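
The encoding rule described above can be sketched roughly as follows. This is an illustrative reconstruction, not the original code: the function name `url_encode_for_calais` is hypothetical, and the choice to leave only ASCII letters, digits and `-_.~` unescaped is an assumption.

```python
def url_encode_for_calais(text: str) -> str:
    """Form-style URL encoding that keeps newlines intact.

    - 0x20 (space) becomes '+'
    - '\\r' (0x0D) is dropped entirely
    - unreserved ASCII characters pass through unchanged
    - every other byte, including '\\n' and '\\t', becomes %XX
    """
    unreserved_punctuation = "-_.~"
    out = []
    for byte in text.encode("utf-8"):
        ch = chr(byte)
        if byte == 0x20:
            out.append("+")
        elif byte == 0x0D:
            continue  # drop carriage returns
        elif byte < 0x80 and (ch.isalnum() or ch in unreserved_punctuation):
            out.append(ch)
        else:
            out.append(f"%{byte:02X}")
    return "".join(out)


if __name__ == "__main__":
    print(url_encode_for_calais("Georgia Tech\r\nquarterback\tdepth"))
```

The earlier behaviour, where every "isspace()" character became '+', would have turned the "\r\n" in that sample into "++", which decodes to two spaces and loses the line break.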

Can you confirm that I should strip '\r' when sending via REST? If I instead send '\r' through as its %Hex equivalent, it gets returned as a space when it appears within a detection (which doesn't happen via SOAP, since SOAP strips them).

Thanks,

John

User offline. Last seen 2 years 32 weeks ago. Offline
Joined: 06/01/2008

Hello again,

I forgot to ask:

Are you passing a paramsXML, or are you using the default (by supplying it as null)?

If you are passing it, could you please:

1.  Double-check and make sure it is the same for SOAP and for REST.

2.  Post it here, removing any private info.

Tx

Meir

 

 

 

User offline. Last seen 2 years 32 weeks ago. Offline
Joined: 06/01/2008

Hello John,

I tried to reproduce this here, but failed again:

I get a consistent reply that matches the one you get through your REST call.

Just to be on the safe side:

Could you please double-check the URLs and make sure they all point to api.opencalais.com and not, by chance, to beta.opencalais.com?

I must say I'm puzzled by this:

Is there any chance that a cache is being used somewhere locally on your side?

Can you resubmit to SOAP, adding a few extra non-whitespace chars at the end of the doc?

This should rule out any caching mechanism.

Tx,

Will keep you posted on anything new I have on this,

Meir

 

 

 

 

User offline. Last seen 26 weeks 4 days ago. Offline
Joined: 08/25/2008

I added a couple of words to the end, and the SOAP response still includes the PublishedMedium entry and relevance scores that differ from the REST response submitted just minutes apart.

The SOAP URL in the .NET web reference is:

http://api.opencalais.com/enlighten/calais.asmx

Here is the ParamsXml:

"<c:params xmlns:c="http://s.opencalais.com/1/pred/">
  <c:processingDirectives  c:outputFormat="xml/rdf" c:enableMetadataType="GenericRelations,SocialTags" docRDFaccessible="false" />
  <c:userDirectives c:allowDistribution="false" c:allowSearch="false" c:externalID="1" c:submitter="TestCalais" />
  <c:externalMetadata />
</c:params>"