When processing the following URL:

http://news.google.com/news/url?sa=T&ct=us/2-0&fd=R&url=http://www.bloomberg.com/apps/news%3Fpid%3D20601068%26sid%3DamWount.ezqk%26refer%3Dhome&cid=1319023909&ei=K2HLSdPGMoecMpCp0L8M&usg=AFQjCNHOPoxYZ9AWVlS42Rv4QGMgDHUOKw

which is obviously a Google News redirect, I get the following error:

Content Permissions Validator Exception ! Domain: news.google.com Disallows: /news

What does this mean, and why do I get it? Is the only way around this problem to extract the destination URL from the query string and submit that, or is there a better way? (I'm processing RSS feeds automatically.)

Thanks,

Frank

Hi Frank,

We comply with robots.txt and honor its rules for every site. Unfortunately, Google does not permit robots to read its news pages.

Look at Google's robots.txt:

User-agent: *
Allow: /searchhistory/
Disallow: /search
Disallow: /groups
Disallow: /images
Disallow: /catalogs
Disallow: /catalogues
Disallow: /news
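
If you want to run the same kind of check yourself before submitting a URL, here is a minimal sketch using Python's standard urllib.robotparser. This is not our validator's code, just an illustration; the rules are the ones quoted above, and the helper name is_allowed is made up for the example. Python's parser applies the first matching rule, which may differ from how other crawlers resolve Allow/Disallow conflicts.

```python
from urllib.robotparser import RobotFileParser

# The rules quoted above, inlined so the sketch runs without network access.
ROBOTS_LINES = """\
User-agent: *
Allow: /searchhistory/
Disallow: /search
Disallow: /groups
Disallow: /images
Disallow: /catalogs
Disallow: /catalogues
Disallow: /news
""".splitlines()


def is_allowed(url, agent="*"):
    """Return True if the quoted rules let `agent` fetch `url` (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_LINES)
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # The path starts with /news, so the last Disallow rule matches.
    print(is_allowed("http://news.google.com/news/url?sa=T"))
```

Any path under /news, including the /news/url redirect, is rejected by these rules.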

Thanks,

Ofer

I see. Thank you, Ofer. Better to abide than be banned.

I guess I will have to write a few patterns to extract the source URL and request it directly via SP. Thanks for the crystal-clear explanation.
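
For anyone else hitting this, here is a rough sketch of the extraction in Python, using only the standard library. It assumes the destination is always carried in the `url` query parameter, as in the example above; I have not checked other redirect formats, and the function name extract_destination is just my own.

```python
from urllib.parse import urlsplit, parse_qs


def extract_destination(redirect_url):
    """Return the percent-decoded `url` query parameter, or None if it is absent."""
    query = urlsplit(redirect_url).query
    values = parse_qs(query).get("url")
    return values[0] if values else None


if __name__ == "__main__":
    redirect = (
        "http://news.google.com/news/url?sa=T&ct=us/2-0&fd=R"
        "&url=http://www.bloomberg.com/apps/news%3Fpid%3D20601068"
        "%26sid%3DamWount.ezqk%26refer%3Dhome"
        "&cid=1319023909&ei=K2HLSdPGMoecMpCp0L8M"
        "&usg=AFQjCNHOPoxYZ9AWVlS42Rv4QGMgDHUOKw"
    )
    print(extract_destination(redirect))
    # prints http://www.bloomberg.com/apps/news?pid=20601068&sid=amWount.ezqk&refer=home
```

The destination site's own robots.txt still applies to the extracted URL, so it can be rejected for the same reason.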