I'm finally about to get serious about implementing some ideas around the latest Calais linked data support. However, dealing with RDF, especially when dereferencing URIs, can come with a high price tag for large-scale deployment: bandwidth. Some RDF documents can become insanely huge! XML-based RDF should be a good candidate for applying compression to take care of this issue. However, I wasn't very successful getting a compressed response from Calais or from other sites such as Freebase and DBpedia.
I'm sending the following headers along with my request:
Accept: application/rdf+xml
Accept-Encoding: gzip
Can you confirm that you support gzip encoding in your linked data RDF response? Also, do you have any insights on how other sites, such as dbpedia, handle this issue?
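For anyone who wants to reproduce the check, here is a minimal Python sketch of the request described above. It sends the same two headers and only gunzips the body when the server actually reports Content-Encoding: gzip. The function names (build_request, decode_body, fetch_rdf) are made up for this illustration, and fetch_rdf performs a real network call, so its result depends entirely on what the server chooses to do.

```python
import gzip
import urllib.request
from typing import Optional, Tuple


def build_request(url: str) -> urllib.request.Request:
    """Build a GET request that asks for RDF/XML and advertises gzip support."""
    return urllib.request.Request(
        url,
        headers={"Accept": "application/rdf+xml", "Accept-Encoding": "gzip"},
    )


def decode_body(body: bytes, content_encoding: Optional[str]) -> bytes:
    """Gunzip the body only if the server says it compressed it."""
    if content_encoding and content_encoding.strip().lower() == "gzip":
        return gzip.decompress(body)
    return body


def fetch_rdf(url: str) -> Tuple[bytes, bool]:
    """Fetch url and return (payload, was_compressed). Needs network access."""
    with urllib.request.urlopen(build_request(url)) as resp:
        encoding = resp.headers.get("Content-Encoding")
        raw = resp.read()
    was_compressed = bool(encoding and encoding.strip().lower() == "gzip")
    return decode_body(raw, encoding), was_compressed
```

If was_compressed comes back False even though the header was sent, the server (or something between you and the server) is not compressing the response.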

Hi,
Sorry, I'm not having much luck with my queries. I'm trying to output a list of Person names (the c:name values). An example of the data:
<rdf:Description rdf:about="http://d.opencalais.com/pershash-1/4204cc2a-9de4-3731-a7ff-0763c69ba583">
<rdf:type rdf:resource="http://s.opencalais.com/1/type/em/e/Person"/>
<c:name>D.M. Campbell</c:name>
<c:persontype>N/A</c:persontype>
<c:nationality>N/A</c:nationality>
</rdf:Description>
The query:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX c: <http://s.opencalais.com/1/pred/>
SELECT ?personname ?person WHERE {
?person rdf:type <http://s.opencalais.com/1/type/em/e/Person> .
?person c:name ?personname
}
was suggested below for a user who had the same issue as me, but this query actually returns all entities with c:name, as if the query line " ?person rdf:type <http://s.opencalais.com/1/type/em/e/Person>" has no effect.
Any ideas? Thanks
I have verified that the query is correct using JRDF. Which SPARQL engine are you using?
Rafi
We are currently having performance issues with large objects so I am trying to enable transport compression with Tomcat. I've enabled compression in Tomcat.
Hi Rafi,
Thank you for the help. It worked. I have one more question. I am trying to run the following query, but it is failing. Basically, what I want is to fetch the state name from Geo/ProvinceOrState which is not in Geo/City.
Please let me know how can I get the results. Thanks in advance.
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX c: <http://s.opencalais.com/1/pred/>
SELECT ?shortname WHERE {
?state rdf:type <http://s.opencalais.com/1/type/er/Geo/ProvinceOrState> .
?city rdf:type <http://s.opencalais.com/1/type/er/Geo/City> .
?state c:shortname ?shortname .
?city c:containedbystate ?statename . FILTER (?shortname != ?statename) }
Thanks,
Sachin
Hi Sachin,
I apologize for the delayed response.
As far as I understand, you would like to retrieve all states in a document except those that have cities. Here is the query that returns those states:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX c: <http://s.opencalais.com/1/pred/>
SELECT ?state ?statename WHERE {
{?city rdf:type <http://s.opencalais.com/1/type/er/Geo/City>. ?city c:containedbystate ?citystate. }
{?state rdf:type <http://s.opencalais.com/1/type/er/Geo/ProvinceOrState>. ?state c:shortname ?statename. }
FILTER (?statename != ?citystate).
}
Best,
Rafi
Hi Rafi,
Thanks for all the responses.
The following query returns results only if the ?city resource is available in the RDF. How can I write a query that returns results even if the ?city resource (<http://s.opencalais.com/1/type/er/Geo/City>) is not available in the RDF,
that is, if the ?city resource is null or does not exist?
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX c: <http://s.opencalais.com/1/pred/>
SELECT ?state ?statename WHERE {
{?city rdf:type <http://s.opencalais.com/1/type/er/Geo/City>. ?city c:containedbystate ?citystate. }
{?state rdf:type <http://s.opencalais.com/1/type/er/Geo/ProvinceOrState>. ?state c:shortname ?statename. }
OPTIONAL {?city c:containedbystate ?statename. }
FILTER (!bound(?statename) || ?statename != ?citystate).}
Thanks,
Sachin
Hi Rafi,
Thanks for your help. That worked. But I have one doubt. I have an RDF document like the one below. When I run the following query, no results show up, although Geo/ProvinceOrState actually has one result. I noticed that it does not return results if the "<c:containedbystate></c:containedbystate>" element is not present in Geo/City. How can I get results back if that particular element is not present in the RDF?
Thanks in advance.
Query:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX c: <http://s.opencalais.com/1/pred/>
SELECT ?stateName ?country ?latitude ?longitude ?state WHERE {
{?geoCityResource rdf:type <http://s.opencalais.com/1/type/er/Geo/City>.
?geoCityResource c:containedbystate ?geoStateName.}
{?geoStateResource rdf:type <http://s.opencalais.com/1/type/er/Geo/ProvinceOrState> .
?geoStateResource c:shortname ?stateName .
?geoStateResource c:containedbycountry ?country .
?geoStateResource c:latitude ?latitude .
?geoStateResource c:longitude ?longitude .}
FILTER (?geoStateName != ?stateName).}
RDF:
<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#' xmlns:c='http://s.opencalais.com/1/pred/'>
<rdf:Description rdf:about='http://d.opencalais.com/er/geo/city/ralg-geo1/f497898f-2b9b-7cda-ec7b-85d896acbe3e'>
<rdf:type rdf:resource='http://s.opencalais.com/1/type/er/Geo/City'/>
<c:docId rdf:resource='http://d.opencalais.com/dochash-1/3f598527-7794-3eab-97e8-6b9594a3db38'/>
<c:name>Washington,United States</c:name>
<c:shortname>Washington</c:shortname>
<c:containedbycountry>United States</c:containedbycountry>
<c:latitude>38.89</c:latitude>
<c:longitude>-77.03</c:longitude>
</rdf:Description>
<rdf:Description rdf:about='http://d.opencalais.com/er/geo/provinceorstate/ralg-geo1/fd9e1e90-96a9-04ff-1397-445c4e93208b'>
<rdf:type rdf:resource='http://s.opencalais.com/1/type/er/Geo/ProvinceOrState'/>
<c:name>California,United States</c:name>
<c:shortname>California</c:shortname>
<c:containedbycountry>United States</c:containedbycountry>
<c:latitude>36.4885198674</c:latitude>
<c:longitude>-119.701379437</c:longitude>
</rdf:Description>
</rdf:RDF>
Thanks,
Sachin
Hi Sachin,
You can try using the OPTIONAL keyword (see http://www.w3.org/TR/rdf-sparql-query/#optionals).
Regards,
Rafi
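To make the intended result concrete, here is a rough Python sketch (plain standard library, not SPARQL) of what an OPTIONAL-style match should produce for the RDF posted above: states are still returned when a Geo/City resource has no c:containedbystate element. It assumes that c:containedbystate holds the same string as the state's c:shortname, as the queries in this thread do. The helper names and the trimmed SAMPLE document are only for illustration.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
C_NS = "http://s.opencalais.com/1/pred/"
TYPE_BASE = "http://s.opencalais.com/1/type/er/Geo/"

# Trimmed copy of the RDF from the question: the city has no c:containedbystate.
SAMPLE = """<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#' xmlns:c='http://s.opencalais.com/1/pred/'>
<rdf:Description rdf:about='http://d.opencalais.com/er/geo/city/ralg-geo1/f497898f-2b9b-7cda-ec7b-85d896acbe3e'>
<rdf:type rdf:resource='http://s.opencalais.com/1/type/er/Geo/City'/>
<c:shortname>Washington</c:shortname>
</rdf:Description>
<rdf:Description rdf:about='http://d.opencalais.com/er/geo/provinceorstate/ralg-geo1/fd9e1e90-96a9-04ff-1397-445c4e93208b'>
<rdf:type rdf:resource='http://s.opencalais.com/1/type/er/Geo/ProvinceOrState'/>
<c:shortname>California</c:shortname>
</rdf:Description>
</rdf:RDF>"""


def resources_of_type(root, type_name):
    """Yield rdf:Description elements whose rdf:type is the given Geo type."""
    wanted = TYPE_BASE + type_name
    for desc in root.iter("{%s}Description" % RDF_NS):
        for rdf_type in desc.findall("{%s}type" % RDF_NS):
            if rdf_type.get("{%s}resource" % RDF_NS) == wanted:
                yield desc
                break


def states_without_cities(rdf_xml):
    """Return shortnames of states that no city names in c:containedbystate."""
    root = ET.fromstring(rdf_xml)
    city_states = set()
    for city in resources_of_type(root, "City"):
        # findtext returns None when the element is absent; such a city is
        # skipped instead of making the whole match fail.
        state = city.findtext("{%s}containedbystate" % C_NS)
        if state:
            city_states.add(state.strip())
    names = []
    for state in resources_of_type(root, "ProvinceOrState"):
        name = state.findtext("{%s}shortname" % C_NS)
        if name and name.strip() not in city_states:
            names.append(name.strip())
    return names


print(states_without_cities(SAMPLE))  # prints ['California']
```

The point is only that a missing c:containedbystate must not eliminate the state row, which is what OPTIONAL together with a !bound(...) test expresses in SPARQL.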
I have an RDF file open in the JRDF SPARQL GUI. The following is a sample RDF block:
<rdf:Description
rdf:about="http://d.opencalais.com/pershash-1/4c786afa-7ff9-3dfb-ac94">
<rdf:type rdf:resource="http://s.opencalais.com/1/type/em/e/Person"/>
<c:name>Sachin Reddy</c:name>
</rdf:Description>
<rdf:Description
rdf:about="http://d.opencalais.com/genericHasher-1/c524eab1-888a-37c4">
<rdf:type rdf:resource="http://s.opencalais.com/1/type/em/e/Organization"/>
<c:name>Some Company</c:name>
</rdf:Description>
I need to write a query to get the c:name values for all resources with
rdf:resource="http://s.opencalais.com/1/type/em/e/Person".
Please help me write a query for this scenario and fetch the RDF
results.
Thanks in advance,
Sachin
The SPARQL query below lists person URIs and their corresponding names. A good source for learning the SPARQL query language is this page on the W3C website: http://www.w3.org/TR/2008/REC-rdf-sparql-query-20080115/
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX c: <http://s.opencalais.com/1/pred/>
SELECT ?personname ?person WHERE {
?person rdf:type <http://s.opencalais.com/1/type/em/e/Person> .
?person c:name ?personname
}
Rafi
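As a cross-check on what the query above should return, here is a small Python sketch that walks the same kind of RDF/XML with the standard library and keeps only resources typed as Person. It is not a SPARQL engine. The SAMPLE document is the block from the question wrapped in an rdf:RDF root (the wrapper and the function name person_names are additions for this illustration).

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
C_NS = "http://s.opencalais.com/1/pred/"
PERSON_TYPE = "http://s.opencalais.com/1/type/em/e/Person"

# The sample block from the question, wrapped in an rdf:RDF root element.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:c="http://s.opencalais.com/1/pred/">
<rdf:Description rdf:about="http://d.opencalais.com/pershash-1/4c786afa-7ff9-3dfb-ac94">
<rdf:type rdf:resource="http://s.opencalais.com/1/type/em/e/Person"/>
<c:name>Sachin Reddy</c:name>
</rdf:Description>
<rdf:Description rdf:about="http://d.opencalais.com/genericHasher-1/c524eab1-888a-37c4">
<rdf:type rdf:resource="http://s.opencalais.com/1/type/em/e/Organization"/>
<c:name>Some Company</c:name>
</rdf:Description>
</rdf:RDF>"""


def person_names(rdf_xml):
    """Return (resource URI, c:name) pairs for resources typed as Person."""
    root = ET.fromstring(rdf_xml)
    pairs = []
    for desc in root.iter("{%s}Description" % RDF_NS):
        types = {
            t.get("{%s}resource" % RDF_NS)
            for t in desc.findall("{%s}type" % RDF_NS)
        }
        if PERSON_TYPE not in types:
            continue  # the rdf:type restriction: skip organizations etc.
        uri = desc.get("{%s}about" % RDF_NS)
        for name in desc.findall("{%s}name" % C_NS):
            pairs.append((uri, (name.text or "").strip()))
    return pairs


for uri, name in person_names(SAMPLE):
    print(name, uri)
```

If a SPARQL engine also returns "Some Company" for this data, the rdf:type pattern is being ignored by that engine rather than by the query.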
Oh, I was referring to dereferenceable URIs, not the API. E.g.
http://d.opencalais.com/er/company/ralg-tr1r/9e3f6c34-aa6b-3a3b-b221-a07...
We currently do not support compression from our Linked Data RDF.
Do you need to retrieve the document RDF using the DocID URI received in the api.opencalais.com response RDF? This can be a large RDF document, and we are checking options for compression.
As for entity URIs - the RDFs are relatively small (less than 10K).
Do you need to retrieve entity RDFs or document RDFs?
Ofer
I'd like to work with entity URIs. You're right, each RDF is pretty small. However, it adds up when you retrieve a couple of entity URIs for one document.
What I'd like to do is collect additional information for a document by going through most of the entities found, retrieving the linked data entity RDFs and fetching RDFs from DBpedia and Freebase. After I'm done, I end up with a couple of pretty interesting facts, but also with lots of HTTP round trips and megabytes of bandwidth consumption.
Maybe HTTP compression wouldn't help a lot, I just wanted to give it a try. After thinking about it, a SPARQL endpoint would probably make more sense for my use case. Any plans on that? :)
If SPARQL was supported, what types of queries would you use?
Can you give a few examples (and also the use case behind each example)?
Thanks,
Ofer
Sure. E.g. let's say I'd like to look up some facts through DBpedia for all people mentioned in a given document. This could be done using something like the following query:
SELECT ?name ?uri WHERE {
?instance c:docId <http://d.opencalais.com/dochash-1/be5ac6e1-c387-30ab-a967-9dc0de51835b> .
?instance c:subject ?person .
?person rdf:type <http://s.opencalais.com/1/type/em/e/Person> .
?person c:name ?name .
?person owl:sameAs ?uri
}
The URI can be used to further query the dbpedia.org endpoint. That's two round trips and minimal overhead for this case.
To make this even more efficient I could run this query for a whole set of documents by using a constraint like:
Hi,
I've checked the compression against api/api1/beta and it works fine for me.
In my mind, the most common problem with compression is that somewhere along the
way from the target server to your PC the response gets decompressed by an intermediary (e.g. a firewall).
Check with your sysadmin (provided that you are working from a PC in a network).
Here is my suggestion: find a PC with a direct connection to the internet and try sending the request from there.
Keep us updated.
Ruslan.