User offline. Last seen 3 years 18 weeks ago. Offline
Joined: 09/04/2008

I just heard Tom talk at the SDForum SIG, so I got an account and a key, downloaded the Python API and rdflib, and ran some examples.

calais.analyze("Zenith National Insurance Corp. (NYSE:ZNT) reported net income of $20.8 million for the fourth quarter of 2003.")
gave:

5071Ffef-A83B-32D4-8Ee3-A1B0Ecceed9C :: Zenith National Insurance Corp. (Company)
736A403D-157B-3E80-86A7-Acc404607Cb2 :: Usd (Currency)

However, a lot of examples seemed to lose information. For example, calais.analyze("Health Web site drkoop.com Inc. said on Tuesday it received $20 million in equity financing, a new management team and a reconfigured board.")
just gives:

736A403D-157B-3E80-86A7-Acc404607Cb2 :: Usd (Currency)

Am I missing something? Do the examples work, but the Python port is out of date? Or are some features not implemented yet?

Thanks,

Bill

  

User offline. Last seen 3 years 18 weeks ago. Offline
Joined: 09/04/2008

Thanks.  A few other suggestions: 

1) In print_summary:

        if not hasattr(self, "doc"): return None   #BP
 

2) replacing

not content

with

(0 == len(content.strip()))

lets me handle the case where two spaces are passed in.

 

3) in init

        self.raw_response = json.load(StringIO(raw_result))
fails when an XML error string is passed in. I worked around it like this, though there is probably a better way:

        if "<Error" in raw_result: #BP
            #print raw_result
            self.raw_response = None
            self.simplified_response = None
            return
 

 

User offline. Last seen 8 weeks 1 day ago. Offline
Joined: 12/31/1969

Bill,

Thanks  again for your suggestions.  I've integrated 1) and 2) into the code.  For 3) I throw a ValueError exception, passing it the error XML as a parameter, so that when it does break, people will know what error was returned by Calais. 
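
A minimal sketch of the ValueError approach described above, for readers following along. This is not the actual python-calais code: the function name parse_calais_result is hypothetical, and the check assumes an error payload begins with an <Error element, as in the error shown elsewhere in this thread.

```python
import json


def parse_calais_result(raw_result):
    """Decode a raw Calais response string (hypothetical helper).

    Assumption: a successful call returns JSON text, while a failed call
    returns an XML document whose root element is <Error>.
    """
    if raw_result.lstrip().startswith("<Error"):
        # Pass the error XML along so the caller can see what Calais returned.
        raise ValueError(raw_result)
    return json.loads(raw_result)
```

A caller can then wrap the call in try/except ValueError and inspect exc.args[0] to read the error XML, instead of getting a JSON decoding failure with no context.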

~ Jordan

P.S. Send me an email if you'd like to be added to the python-calais project members on Google Code. 

User offline. Last seen 8 weeks 1 day ago. Offline
Joined: 12/31/1969

I have just released python-calais v.1.0, which is a complete rewrite.  This will fix your problems.  Download from: http://code.google.com/p/python-calais/

 

~ Jordan

User offline. Last seen 3 years 18 weeks ago. Offline
Joined: 09/04/2008

Thanks, this produces a much better result.  A few things I noticed.

1) in the example file, you probably want to replace:
calais.print_summary()
with
result2.print_summary()

2) Sometimes Calais returns no relations (or topics, entities, etc.). In that case the attribute does not exist on the object, and the print routine (print_relations, for example) fails. I patched my copy of your calais.py by making the print routine's first line something like:
        if not hasattr(self,"relations"): return ""   #BP

3) Passing an empty string to calais.analyze, e.g.
result = calais.analyze("")
fails in CalaisResponse.__init__ with:

<Error Method="ProcessText" calaisRequestID="66bc6511-717f-4bce-b004-d2aa1700ae38" CreationDate="2009-01-10 23:41:09" CalaisVersion="R3.1_7.1.1160.5"><Exception>content is invalid: 'Empty' .</Exception></Error>

I patched mine by checking for an empty string in analyze input.
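
The three points above can be sketched roughly as follows. This is an illustration under stated assumptions, not the real calais.py: ResultSketch, relations_summary, is_blank and error_text are hypothetical names, and the only real library calls are getattr and xml.etree.ElementTree from the standard library.

```python
import xml.etree.ElementTree as ET


class ResultSketch:
    """Hypothetical stand-in for a response object, not the real CalaisResponse."""

    def __init__(self, relations=None):
        # Mimic the behaviour described above: the attribute is only set
        # when the service actually returned something for it.
        if relations is not None:
            self.relations = relations

    def relations_summary(self):
        # getattr with a default avoids AttributeError when nothing came back.
        relations = getattr(self, "relations", None)
        if not relations:
            return ""
        return "\n".join(str(r) for r in relations)


def is_blank(content):
    """True for None, the empty string, or whitespace-only input."""
    return content is None or len(content.strip()) == 0


def error_text(error_xml):
    """Pull the <Exception> message out of an <Error> document like the one above."""
    root = ET.fromstring(error_xml)
    return (root.findtext("Exception") or "").strip()
```

Checking is_blank(text) before calling analyze avoids the round trip that produces the 'Empty' error, and error_text gives a short message to report when an error document does come back.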

Thanks again,
Bill

User offline. Last seen 8 weeks 1 day ago. Offline
Joined: 12/31/1969

Thank you very much for the feedback, Bill.  I've made the changes you suggested.

~ Jordan

User offline. Last seen 3 years 18 weeks ago. Offline
Joined: 09/04/2008

I think I misplaced the reply.  Please see above.

Thanks,

Bill

User offline. Last seen 2 years 27 weeks ago. Offline
Joined: 05/16/2008

Bill - I tried the two examples with the Calais Viewer. It's a nice tool that shows you visually which entities, events and facts are extracted from a given text.
For the first example I got the two entities (Company, Currency), but I also got two events: CompanyEarningsAnnouncement and CompanyTicker. Perhaps the Python API isn't configured to show event types, just entities? (There are other user-developed tools that highlight entity extraction only, so that might be the case here.)
Regarding the second example, Calais indeed extracted just the currency. I believe there's a problem with identifying lower-case company names. When I changed the company name to start with a capital 'D' (Drkoop.com), I got more interesting results: the company name was extracted, and so was the CompanyInvestment event. We will look into the issues around identifying lower-case company names and will fix them in one of the upcoming versions.
Regards, Michal

User offline. Last seen 3 years 18 weeks ago. Offline
Joined: 09/04/2008

Thanks Michal, the tool is helpful. The Dr. Koop example is from the "CompanyInvestment" pattern here (http://opencalais.com/node/2514). Besides the company name issue, I had hoped to see "Status" as well. Does the viewer show that also somewhere?
I also tried the example "Expertcity, Inc. is the leading provider of ... Investors include Sun Microsystems" and was unable to see either Sun or status.

I put in a few of my examples from Thursday. E.g.
Distance - "10 miles/16 km northwest of San Francisco."
Time - "Best times: April through June, September through October"
and a few others. However, it looks like there is no special support for time or distance concepts yet. Is that correct?

Thanks,
Bill

User offline. Last seen 2 years 27 weeks ago. Offline
Joined: 05/16/2008

Bill - you can see the extracted attributes (such as Company, Status etc.) in a tool-tip if you hover over a highlighted mention in the text area of the Calais Viewer, or if you hover over the specific mention on the left pane - under the relevant event/fact type.
We will look into why these examples aren't extracted properly (with the lower-case drkoop.com).
Regarding Distance and Time - true, we don't yet have metadata elements to identify these concepts, but we're always happy to add user wishes to our development pipeline. These will be added and prioritized alongside the other wishes submitted.

Regards.

Tom
User offline. Last seen 1 year 7 weeks ago. Offline
Joined: 05/07/2008

Bill:

Also - you definitely want to give it a try with some large blocks of text. Try a news story or something similar and see how it works for you.

Tom

User offline. Last seen 3 years 18 weeks ago. Offline
Joined: 09/04/2008

Tom,
Thanks for the reply. I actually ran 345 examples Thursday from a travel DB, using both the OpenCalais API and beta URLs, with each example segmented by location, cost, time and description, so 345 x 2 x 4 examples altogether. Results are not at the point where I can use the service, but that is probably due to the unsupported Python interface. I look forward to the Python update, or if that is not on your path, I will try to fix it myself in a month or so.
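
For scale, the batch described above can be enumerated as a cross product. This is only a sketch: the endpoint labels and the build_jobs helper are hypothetical, and the record ids stand in for the 345 travel DB entries.

```python
from itertools import product

SEGMENTS = ("location", "cost", "time", "description")
ENDPOINTS = ("api", "beta")  # hypothetical labels for the two URLs


def build_jobs(record_ids):
    """Return one (record, endpoint, segment) job per combination."""
    return list(product(record_ids, ENDPOINTS, SEGMENTS))


jobs = build_jobs(range(345))
print(len(jobs))  # 345 x 2 x 4 = 2760
```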

Thanks, and good presentation,
Bill

Tom
User offline. Last seen 1 year 7 weeks ago. Offline
Joined: 05/07/2008

Bill:

We'll take a look and also try and get the author to bring his code up to date - everything's working - it just looks like the python stuff hasn't been updated in a bit.

Tom