User offline. Last seen 1 year 46 weeks ago. Offline
Joined: 09/23/2008

A large proportion of the pages I'm sending are returning the error "Text length has exceeded the allowed size."

This even includes longer Wikipedia pages.

Is it recommended that I break pages up before sending them? (That would be annoying.)

I'd rather just pay to increase my text limit.

User offline. Last seen 1 year 1 week ago. Offline
Joined: 03/08/2009

I have written a PHP function that splits the data into chunks based on a parameter supplied by the programmer. The function can be found at the following address:

http://blog.leverlogic.com/userpages.php?postid=24
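
For anyone who just wants the general idea, here is a minimal sketch of chunking in Python. It is not the linked PHP function; the name `chunk_text` and the `max_chars` parameter are made up for illustration, and the 100,000 default simply mirrors the limit that triggers the error in this thread. It splits plain text at whitespace where it can, so words are not cut in half.

```python
def chunk_text(text, max_chars=100_000):
    """Split text into pieces of at most max_chars characters,
    breaking at whitespace where possible (hypothetical helper)."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        if end < len(text):
            # Look for the last space or newline inside the window
            # so that words stay intact.
            cut = max(text.rfind(" ", start, end), text.rfind("\n", start, end))
            if cut > start:
                end = cut + 1  # keep the whitespace with the left-hand chunk
        chunks.append(text[start:end])
        start = end
    return chunks
```

Each chunk could then be submitted as its own request. Note that this works on text, not markup: splitting raw HTML this way can still cut through a tag.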

 

User offline. Last seen 15 min 36 sec ago. Offline
Joined: 04/30/2008

Thanks so much. It's great when the developer community helps each other.

User offline. Last seen 42 weeks 3 days ago. Offline
Joined: 05/16/2008

Currently, submitted content is limited to 100,000 characters per "transaction" (otherwise you get that error message). We may increase this limit in the future.
If you have specific needs for submitting larger texts, please drop us a note at questions@opencalais.com.

User offline. Last seen 1 year 46 weeks ago. Offline
Joined: 09/18/2008

For a service like SemanticProxy, it's a pain to have to break it up manually. Could you just automatically analyze the first 100,000 characters and return what you can, instead of failing out entirely?

Tom
User offline. Last seen 12 weeks 4 days ago. Offline
Joined: 05/07/2008

While I like the idea of just processing the first 100K characters, I'm worried about truncating the analysis without a mechanism for informing the user that it wasn't a full analysis.

Thoughts? Ideas?

User offline. Last seen 1 year 46 weeks ago. Offline
Joined: 09/23/2008

Raise an exception, or insert a message. So long as it's consistent, it doesn't matter too much how the error is added to the response.
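
To make the "insert a message" option concrete, here is a rough sketch in Python. Everything in it is hypothetical: `truncate_with_notice` and the `truncated` / `original_length` fields are invented for illustration and are not part of any OpenCalais or SemanticProxy response format. The point is only that the caller always gets the same explicit flag, whether or not anything was cut.

```python
MAX_CHARS = 100_000  # the per-transaction limit quoted earlier in this thread


def truncate_with_notice(text, max_chars=MAX_CHARS):
    """Return the text cut to max_chars plus a flag saying whether it was cut.

    Hypothetical helper: the field names are assumptions, not a real API.
    """
    return {
        "text": text[:max_chars],
        "truncated": len(text) > max_chars,
        "original_length": len(text),
    }
```

A caller would check `result["truncated"]` before trusting the analysis as complete.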



I went ahead and wrote a script to cut off webpages at 100k characters and run a SemanticProxy analysis on them. I get a Parsing Error each time. That's not much of a surprise, considering that I'm cutting off HTML documents in arbitrary places, but it does mean that this issue needs to be addressed before I can consider using SemanticProxy for my service.

User offline. Last seen 1 year 12 weeks ago. Offline
Joined: 10/05/2008

I encountered this today while working on a project that will use Open Calais. One thing I found that helped was to strip out all JavaScript and CSS before submitting my HTML content. This might be a good idea for SemanticProxy too. In Python, all I did to cut down the character count for the URLs I was pulling was:

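# Typed from memory on a separate machine, so please excuse any typos.
import re

# Remove <script>, <noscript>, and <style> blocks, contents included.
# re.DOTALL lets '.' match newlines, so the blocks are caught even when
# they span several lines and the newlines don't have to be stripped first.
p = re.compile(r'<script.*?</script>|<noscript.*?</noscript>|<style.*?</style>', re.IGNORECASE | re.DOTALL)
html = p.sub('', html)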

After doing this I haven't hit the limit on any pages yet.
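
For reuse, the same stripping step can be wrapped in a function together with a length check. This is only a sketch: `strip_non_content` and `fits_limit` are names invented here, and the 100,000 figure is the limit quoted earlier in this thread. A regex like this is a blunt tool and will not cope with every malformed page.

```python
import re

MAX_CHARS = 100_000  # limit quoted earlier in this thread

# Matches whole <script>, <noscript>, and <style> elements, case-insensitively,
# including ones that span multiple lines.
_NON_CONTENT = re.compile(
    r"<script\b.*?</script>|<noscript\b.*?</noscript>|<style\b.*?</style>",
    re.IGNORECASE | re.DOTALL,
)


def strip_non_content(html):
    """Return html with script, noscript, and style elements removed."""
    return _NON_CONTENT.sub("", html)


def fits_limit(html, max_chars=MAX_CHARS):
    """True if the content is short enough to submit in one transaction."""
    return len(html) <= max_chars
```

Typical use would be to call `strip_non_content` on the fetched page and only fall back to splitting when `fits_limit` still returns False.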

Tom
User offline. Last seen 12 weeks 4 days ago. Offline
Joined: 05/07/2008

We're working on it. We plan to extend the size limit in the near future to deal with content such as Wikipedia articles. We'll also improve the error messaging in our next release.