Text too long : error
Posted on: Tue, 04/28/2009 - 02:21
Hey there, using Semantic Proxy I get this error about 5% of the time:
"SemanticProxy processing error: Internal Exception when trying to use Calais WS (RemoteException)) - Text length has exceeded the allowed size ."
Is there a documented maximum length, or a way around this issue?
This is running with api1.opencalais.com using the latest release of the Drupal module.
Patchak

Unfortunately, breaking the document into smaller chunks isn't going to cut it. OpenCalais returns the occurrence positions of extracted entities, and relevance ranking for entities, both of which only pertain to the uploaded document -- which in this case is just a chunk of something larger. It would be possible, albeit unwieldy, to offset all the occurrence positions by the proper amount. The relevances are the bigger problem, as none of them make sense in relation to the document as a whole.
Any workarounds?
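For the occurrence positions, the offsetting described above could be sketched roughly as follows. This is only an illustration: the entity layout (a dict with "name" and "offsets" keys) and the function name merge_chunk_results are assumptions, not the actual OpenCalais response format, and the sketch does nothing about relevance scores, which remain per-chunk values.

```python
def merge_chunk_results(chunk_results):
    """Combine per-chunk entity occurrences into whole-document positions.

    chunk_results: list of (chunk_start, entities) pairs, where chunk_start is
    the character offset of the chunk within the full document and each entity
    is a dict with hypothetical keys "name" and "offsets" (positions relative
    to the start of that chunk).
    """
    merged = {}
    for chunk_start, entities in chunk_results:
        for entity in entities:
            positions = merged.setdefault(entity["name"], [])
            # Shift each chunk-relative position by the chunk's start offset.
            positions.extend(chunk_start + off for off in entity["offsets"])
    return {name: sorted(positions) for name, positions in merged.items()}
```

An entity found at position 3 in a chunk that starts at character 100 would end up at position 103 in the merged result.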
Hey there, thanks for this! It's great! We'll let you know how it goes. Our site should launch really soon, and it's going to be huge ;)
Patchak
I'm not sure how you are using api1.opencalais.com together with Semanticproxy. Semanticproxy calls the Open Calais API at api.opencalais.com (which is the same as api1.opencalais.com) directly. All you need to provide to Semanticproxy is the web link of the news article plus your Calais API key, and the output format (RDF, JSON or HTML).
Anyway, I will check the length issue in more depth and post an update.
Thanks,
Ofer
Hi,
We have increased the size of the HTML documents Semanticproxy can process. The limit remains 100KB for the actual meaningful part of the HTML page. That is, the main article of the page (after stripping out all HTML sections that are not related to the main article/content) must be smaller than 100KB.
Let us know if this works for you.
Ofer
I want to submit close to a megabyte's worth of data. How can I do this for a single document?
Hi,
The best way to get this done is to break your document into smaller chunks (somewhere in the 25-40 KB range), submit the smaller docs individually and merge the returned results.
Please note that you will have to use the Open Calais API for this and cannot use Semantic Proxy.
sumit
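For anyone trying the chunking approach, here is a minimal sketch of the splitting step. It assumes plain text with paragraphs separated by blank lines, and the 30 KB default is just a value picked from the 25-40 KB range suggested above. The function name split_into_chunks is made up for illustration, and submitting each chunk to the Open Calais API is left out.

```python
def split_into_chunks(text, max_bytes=30 * 1024):
    """Split text on paragraph boundaries into chunks of roughly max_bytes.

    Returns a list of (start_offset, chunk) pairs, where start_offset is the
    character position of the chunk in the original text. A single paragraph
    larger than max_bytes is kept whole, so it can still exceed the limit.
    """
    chunks = []
    start = 0  # character offset where the current chunk begins
    pos = 0    # character offset of the paragraph being examined
    size = 0   # UTF-8 size of the current chunk so far
    for para in text.split("\n\n"):
        piece = para + "\n\n"
        piece_bytes = len(piece.encode("utf-8"))
        # Close the current chunk if adding this paragraph would overflow it.
        if size and size + piece_bytes > max_bytes:
            chunks.append((start, text[start:pos]))
            start, size = pos, 0
        size += piece_bytes
        pos += len(piece)
    if start < len(text):
        chunks.append((start, text[start:]))
    return chunks
```

Keeping the start offset alongside each chunk makes it possible to map positions in the returned results back to the original document when merging.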