- From: Jörn Hees <j_hees@cs.uni-kl.de>
- Date: Thu, 5 Aug 2010 02:37:44 +0200
- To: public-lod@w3.org
Hi all,

I have a question related to IRI dereferencing best practices.

For a project I need to run a number of cross-domain SPARQL queries like this one:

    SELECT ?s ?p ?o WHERE {
      <http://dbpedia.org/resource/Barack_Obama> ?p ?o.
      OPTIONAL { ?p rdfs:label ?lp. }
      OPTIONAL { ?o rdfs:label ?lo. }
    }

(I need a human-readable version of all triples starting at dbpedia:Barack_Obama.)

For now I need this not only for Barack_Obama but for the 1000 DBpedia resources corresponding to the 1000 most visited pages of this month in the English Wikipedia.

Quite simple, I thought: I don't need the whole of DBpedia, only a reasonably small subset. For every distinct ?p and ?o I should be able to acquire the information I need simply by looking up the URI and retrieving the data.

But there are some problems:

1. DBpedia still uses skos:subject quite often, even though it's deprecated. If I look up the URI http://www.w3.org/2004/02/skos/core#subject I'm silently redirected to the current SKOS definition at http://www.w3.org/TR/skos-reference/skos.html#subject, but there is no #subject in it anymore. This means: no rdfs:label for a property which is ubiquitous in DBpedia. Am I missing some header option for the content negotiation, or is this a problem on the w3.org end?

2. When dereferencing DBpedia URIs I repeatedly found a suspiciously similar number of triples per fetched IRI in the local cache: 2001 triples, sometimes 2002. I remembered: ah, yes... you don't have to return all triples, just the "useful" ones. I think what currently happens is that on the DBpedia side a cut is made after 2000 triples, probably to reduce traffic overhead.

   Still, please go to http://dbpedia.org/data/United_States.rdf (you get there by content negotiation from http://dbpedia.org/resource/United_States; the HTML page shows different content). Notice that you get almost only inverse triples (?s ?p <http://dbpedia.org/resource/United_States>). There is no rdfs:label, no rdf:type, etc. in it, while all these useful things are in the HTML version.

I'm not pointing this out to say that there is a problem in DBpedia. I think this is a serious problem of scale. How do you decide what is useful for someone dereferencing your URIs? How do you keep unnecessary traffic low at the same time?

I think maybe a few standard triples should be included in any case (e.g., rdfs:label, rdf:type, ...); luckily I'm not the one to decide about these first-class properties ;) . But is there perhaps a property or header field called something like "responsibleSPARQLendpoint"? With such a property one could at some point say "this is what you get; if you need anything else, please contact this SPARQL endpoint".

I would love to get some feedback on these problems, or on whether my whole approach is on the wrong track.

Cheers,
Jörn
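
For reference, the dereferencing step and the check for "inverse" triples described above could be sketched roughly as follows. This is only an illustration in Python using the standard library, not the code actually used for the project. The function names (build_rdf_request, fetch_rdf, split_by_direction, has_label) are made up for this sketch, and triples are assumed to be plain (subject, predicate, object) string tuples that some RDF parser has already produced.

```python
import urllib.request

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def build_rdf_request(iri, accept="application/rdf+xml"):
    """Build a GET request that asks for RDF/XML via the Accept header."""
    return urllib.request.Request(iri, headers={"Accept": accept})


def fetch_rdf(iri, timeout=30.0):
    """Dereference iri and return (final URL after redirects, raw body)."""
    with urllib.request.urlopen(build_rdf_request(iri), timeout=timeout) as resp:
        return resp.geturl(), resp.read()


def split_by_direction(triples, resource):
    """Split triples into those leaving resource and those pointing at it."""
    outgoing = [t for t in triples if t[0] == resource]
    incoming = [t for t in triples if t[2] == resource and t[0] != resource]
    return outgoing, incoming


def has_label(triples, resource):
    """True if at least one rdfs:label triple has resource as its subject."""
    return any(s == resource and p == RDFS_LABEL for s, p, _ in triples)
```

Turning the bytes returned by fetch_rdf into triples is left out on purpose; it needs an RDF parser, which the standard library does not provide. Once the triples are available, comparing len(outgoing) with len(incoming) and calling has_label would show whether a truncated response still carries the basic descriptive triples.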
Received on Thursday, 5 August 2010 00:38:29 UTC