Hello, I am processing some texts with enlighten called via Perl thusly:
$calais->enlighten($buffer, contentType => 'TEXT/TXT', outputFormat => "application/json", enableMetadataType => 'GenericRelations,SocialTags');
The socialTags response in some cases contains unprintable characters even though the input text is plain ASCII. One example is a text containing "fatwa". The name value of the socialTag returned looks like this, broken down into characters:
Char           Decimal Byte Value
F              70
a              97
t              116
w              119
Ã              195
(unprintable)  132
Â              194
(unprintable)  129
The associated references are:
id: http://d.opencalais.com/dochash-1/50093719-3bea-3028-9bf8-1dde9d930d65/SocialTag/5
socialTag: http://d.opencalais.com/genericHasher-1/461dee2e-0177-3b46-9d10-384939c504f
I suppose the last four bytes could represent two Unicode characters, but I don't know why that would be, given that there is only one more letter in the word, the first "a" was encoded fine, and the original text was sent in plain ASCII.
Does anyone know what is going on here?
Thanks,
Steve

Hi Steve,
Open Calais uses UTF-8 encoding for Unicode characters. Since UTF-8 is backward compatible with ASCII (see more at http://en.wikipedia.org/wiki/UTF-8), ASCII characters keep their single-byte values, while non-ASCII characters are encoded as sequences of two or more bytes with values above 127.
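To make the byte values concrete, here is a rough Perl sketch that decodes the eight bytes from the question. The second step rests on an assumption that this thread does not confirm: that the tag name was UTF-8 which got read as Latin-1 and then encoded as UTF-8 a second time somewhere along the way. Under that assumption, the four trailing bytes collapse into a single character.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# The byte values listed in the question for the socialTag name.
my $raw = pack('C*', 70, 97, 116, 119, 195, 132, 194, 129);

# First pass: interpret the bytes as UTF-8.
# This gives "Fatw" followed by U+00C4 and U+0081 (six characters).
my $once = decode('UTF-8', $raw);

# ASSUMPTION: the text was encoded twice. Map each character back to a
# single Latin-1 byte, then decode those bytes as UTF-8 once more.
my $bytes = encode('ISO-8859-1', $once);
my $twice = decode('UTF-8', $bytes);

# The last character is now a single code point (a with macron).
printf "U+%04X\n", ord(substr($twice, -1));   # prints U+0101
```

If that reading is right, the tag name is "Fatw" plus U+0101, which would explain why a non-ASCII character shows up even though the submitted text was plain ASCII.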
To preserve character encoding when sending requests (in case your text has Unicode characters) and when reading responses (in case the Open Calais response has Unicode characters), it's best to use UTF-8 throughout. In Java this is done by specifying the encoding for the input and output streams used to communicate with the web service; it should be similar for Perl.
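On the Perl side, a minimal sketch of that idea might look like the following. It does not call the service: $response_bytes is a hypothetical stand-in for the raw JSON bytes of a response, and the "name" key is made up for illustration. Encode and JSON::PP are core modules in recent Perls; whether the enlighten call already does any of this internally is something I have not checked.

```perl
use strict;
use warnings;
use Encode qw(encode);
use JSON::PP qw(decode_json);

# Outgoing text: encode the character string to UTF-8 bytes before sending.
# For plain ASCII input the bytes are unchanged.
my $buffer        = "plain ASCII text about a fatwa";
my $request_bytes = encode('UTF-8', $buffer);

# HYPOTHETICAL response body, standing in for what the service returns.
my $response_bytes = encode('UTF-8', qq({"name":"Fatw\x{0101}"}));

# decode_json expects UTF-8 bytes and returns Perl character strings.
my $data = decode_json($response_bytes);
my $name = $data->{name};

# Tell Perl to encode characters as UTF-8 when printing.
binmode(STDOUT, ':encoding(UTF-8)');
print "$name\n";
printf "%d characters\n", length $name;   # 5 characters
```

The point is to decode bytes into characters exactly once on the way in and to encode exactly once on the way out; decoding or encoding twice is what tends to produce byte sequences like the one in the question.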
sumit