Re: Dereferencing a URI vs querying a SPARQL endpoint from Leigh Dodds on 2009-05-21 (semantic-web@w3.org from May 2009)

From: Leigh Dodds <leigh.dodds@talis.com>
Date: Thu, 21 May 2009 08:57:26 +0100
To: Daniel Schwabe <dschwabe@inf.puc-rio.br>
Cc: public-lod@w3.org, semantic-web@w3.org
Message-ID: <f323a4470905210057g71220f0fsee4acc1a8f8793d0@mail.gmail.com>
Hi Daniel,

In my own experience, dereferencing a linked data URI and running a
DESCRIBE on that URI in a SPARQL endpoint often give very different
sets of results.

The variance depends on how the linked data is being generated. For
applications that generate the RDF as an alternate view of the
human-readable page, the returned results may just be whatever is
currently on that page. I believe there was some recent discussion
about whether the RDF returned should be enough to "reconstitute" the
human-readable description or whether it might be a slightly different
set of results, e.g. whether it contains information about referenced
resources that might be presented to a human user on the same page,
or just references to those URIs.

I'm not sure whether anyone has done a proper survey of the default
DESCRIBE algorithms for common endpoints. Anecdotally, I was under the
impression that a CBD (Concise Bounded Description) was the most
common form. This would give <?uri ?p ?o> for your chosen URI, plus
the closure over referenced bnodes, but not <?s ?p ?uri>. This means
that a DESCRIBE may not return all the data either.
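
As a rough illustration of the CBD-style behaviour described above,
here is a minimal Python sketch over an in-memory set of triples. The
tuple representation, the "_:" prefix for bnodes, the ex:/foaf: sample
data and the function names are all assumptions made for the example;
it also ignores the reification part of the CBD definition, and it is
not the algorithm of any particular endpoint.

```python
def is_bnode(term):
    # Assumption for this sketch: blank nodes are strings starting with "_:".
    return isinstance(term, str) and term.startswith("_:")


def concise_bounded_description(triples, uri):
    """Collect <uri ?p ?o> plus the closure over referenced bnodes.

    Triples where `uri` is the object (<?s ?p uri>) are deliberately
    left out, which is the gap discussed in the text.
    """
    result = set()
    pending = [uri]
    seen = set()
    while pending:
        node = pending.pop()
        if node in seen:
            continue
        seen.add(node)
        for s, p, o in triples:
            if s == node:
                result.add((s, p, o))
                # Only follow blank nodes; named resources are not expanded.
                if is_bnode(o):
                    pending.append(o)
    return result


# Hypothetical sample data.
GRAPH = {
    ("ex:alice", "foaf:name", '"Alice"'),
    ("ex:alice", "ex:address", "_:addr"),
    ("_:addr", "ex:city", '"Bristol"'),
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:name", '"Bob"'),
    ("ex:carol", "foaf:knows", "ex:alice"),
}

cbd = concise_bounded_description(GRAPH, "ex:alice")
for triple in sorted(cbd):
    print(triple)
```

In this sample the bnode's ex:city triple is included, while the
incoming ex:carol foaf:knows ex:alice triple and the description of
ex:bob are not.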

Without a common way to discover the capabilities of the endpoint, so
that you can determine which DESCRIBE algorithm is being used, or a
means to select one in the request, you're left with the option of
doing a CONSTRUCT query to ensure you get all of the data that you
require.
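
One possible shape for such a CONSTRUCT query, sketched in Python as a
small query builder. The helper name, the template and the example URI
are hypothetical; the query asks for both <uri ?p ?o> and <?s ?q uri>,
but it does not follow bnodes, so it is only a partial substitute for
a richer DESCRIBE.

```python
# Template for a query returning triples in both directions.
# Template triples with unbound variables are simply omitted from the
# CONSTRUCT result, so each UNION branch contributes only its own pattern.
QUERY_TEMPLATE = """CONSTRUCT {
  <URI> ?p ?o .
  ?s ?q <URI> .
}
WHERE {
  { <URI> ?p ?o }
  UNION
  { ?s ?q <URI> }
}"""

# Characters that cannot appear inside a SPARQL <...> IRI reference.
FORBIDDEN = set('<>"{}|^`\\')


def build_construct_query(uri):
    """Return a CONSTRUCT query for triples with `uri` as subject or object."""
    if not uri or any(ch in FORBIDDEN or ch.isspace() for ch in uri):
        raise ValueError("not usable as an IRI reference: %r" % (uri,))
    return QUERY_TEMPLATE.replace("<URI>", "<" + uri + ">")


# Hypothetical resource URI, for illustration only.
print(build_construct_query("http://example.org/resource/r"))
```

The resulting string would be sent to the endpoint with whatever
SPARQL client is already in use.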

It's to be able to start addressing issues like this that I proposed
this feature [1] to the SPARQL WG; there's also the related proposed
feature [2]. Due to other limitations [3], CONSTRUCT isn't a complete
workaround either.

Based on my initial monitoring of the WG discussion, I don't think that
any of these are likely to make it into SPARQL 2, so I think it's left
to the community to try and draw together some recommendations and
implementation experience.

Cheers,

L.

[1]. http://www.w3.org/2009/sparql/wiki/Feature:ControlOfDescribeQueries
[2]. http://www.w3.org/2009/sparql/wiki/Feature:DefaultDescribeResult
[3]. http://www.w3.org/2009/sparql/wiki/Feature:Constructing_containers_and_collections

2009/5/20 Daniel Schwabe <dschwabe@inf.puc-rio.br>:
> Dear all,
>
> while designing Explorator [1], where one can explore one or more triple
> repositories that provide SPARQL endpoints (as well as direct URI
> dereferencing), I found the following question, to which I don't really know
> the answer...
>
> For the sake of this discussion, I'm considering only such sites, i.e.,
> those that provide SPARQL endpoints.
> For a given URI r, is there any relation between the triples I get when I
> dereference it directly, as opposed to querying the SPARQL endpoint for all
> triples <r, ?p, ?o> ?  Should there be (I could also get <?s, ?p, r>, for
> example) ?
> For sites such as dbpedia I believe that I get the same set of triples. But
> I believe this is not a general behavior.
> Should there be a good practice about this for LoD sites that provide SPARQL
> endpoints?
> At the very least, perhaps this could also be described in the semantic
> sitemap.xml, no?
>
> Cheers
> D
>
> [1] http://www.tecweb.inf.puc-rio.br/explorator
> --
> Daniel Schwabe
> Tel:+55-21-3527 1500 r. 4356
> Fax: +55-21-3527 1530
> http://www.inf.puc-rio.br/~dschwabe Dept. de Informatica, PUC-Rio
> R. M. de S. Vicente, 225
> Rio de Janeiro, RJ 22453-900, Brasil



-- 
Leigh Dodds
Programme Manager, Talis Platform
Talis
leigh.dodds@talis.com
http://www.talis.com
Received on Thursday, 21 May 2009 07:58:10 UTC