Re: Dereferencing a URI vs querying a SPARQL endpoint from Pierre-Antoine Champin on 2009-05-21 (semantic-web@w3.org from May 2009)

From: Pierre-Antoine Champin <swlists-040405@champin.net>
Date: Thu, 21 May 2009 10:51:12 +0100
To: Peter Ansell <ansell.peter@gmail.com>
CC: semantic-web@w3.org, public-lod@w3.org
Message-ID: <4A152410.2050805@champin.net>

Peter Ansell wrote:
> Hi,
> 
> If you have a dataset that is very large and highly interlinked on
> particular URIs, the DESCRIBE response may be too large to reasonably
> transmit to a user over the internet (and to expect a SPARQL endpoint
> to give out in one chunk). This assumes the typical DESCRIBE
> behaviour that SPARQL vendors implement, which picks out r ?p1 ?o
> (forward) and ?s ?p2 r (reverse).

I do not make the assumption above. According to the SPARQL spec, "the
data [returned by a DESCRIBE query] is not prescribed by a SPARQL query,
[...] but, instead, is determined by the SPARQL query processor". This
may not be the full set of incoming and outgoing arcs; this is exactly
the point of DESCRIBE: let the server decide what is relevant,
especially regarding the number of triples to be returned.
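
To make the difference concrete, here is a minimal sketch in Python of the
two selection policies discussed above: the symmetric "forward and reverse"
policy Peter describes, and a forward-only policy a server might prefer.
The function names and the toy triples are hypothetical, and this is not
how any particular SPARQL processor implements DESCRIBE.

```python
from typing import Iterable, List, Tuple

# A triple is (subject, predicate, object); plain strings keep the sketch small.
Triple = Tuple[str, str, str]


def describe_symmetric(triples: Iterable[Triple], r: str) -> List[Triple]:
    """Select 'r ?p1 ?o' (forward) and '?s ?p2 r' (reverse) arcs."""
    return [t for t in triples if t[0] == r or t[2] == r]


def describe_forward_only(triples: Iterable[Triple], r: str) -> List[Triple]:
    """Select only 'r ?p1 ?o' arcs, one possible server-side choice."""
    return [t for t in triples if t[0] == r]


# Hypothetical toy data: a resource with 2 outgoing and 1000 incoming arcs.
toy_graph: List[Triple] = [
    ("ex:taxon", "ex:label", "some label"),
    ("ex:taxon", "ex:rank", "ex:species"),
] + [(f"ex:record{i}", "ex:inTaxon", "ex:taxon") for i in range(1000)]

forward = describe_forward_only(toy_graph, "ex:taxon")
both = describe_symmetric(toy_graph, "ex:taxon")
print(len(forward), len(both))
```

With a heavily linked resource, the symmetric policy returns a description
dominated by incoming arcs, which is the case where letting the server
choose matters most.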

So my expectation, from a service where both dereferenceable URIs and a
SPARQL endpoint are available, is that DESCRIBE <uri> would return the
same graph as dereferencing the URI. More precisely, I think that both
queries (SPARQL: DESCRIBE <uri> / HTTP: GET <uri> requesting RDF data)
have roughly the same semantics.

Of course, this assumes that the whole set of data used to dereference
the URI is available through a single SPARQL endpoint, but I thought
this was one of Daniel's assumptions.
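
As an illustration of the two requests being compared, here is a minimal
sketch in Python that only builds them (nothing is sent over the network).
The endpoint and resource URLs are hypothetical placeholders, the
"application/rdf+xml" media type is just one possible choice of RDF
serialization, and the DESCRIBE query is passed in the "query" parameter
of a GET request as in the SPARQL protocol.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: we ask for RDF/XML; any other RDF media type would work the same way.
RDF_ACCEPT = "application/rdf+xml"


def dereference_request(uri: str) -> Request:
    """HTTP GET on the URI itself, asking for RDF via content negotiation."""
    return Request(uri, headers={"Accept": RDF_ACCEPT})


def describe_request(endpoint: str, uri: str) -> Request:
    """HTTP GET on the SPARQL endpoint carrying a DESCRIBE query for the URI."""
    query = f"DESCRIBE <{uri}>"
    return Request(f"{endpoint}?{urlencode({'query': query})}",
                   headers={"Accept": RDF_ACCEPT})


# Hypothetical URLs, for illustration only.
resource = "http://example.org/resource/1"
get_req = dereference_request(resource)
describe_req = describe_request("http://example.org/sparql", resource)
print(get_req.full_url)
print(describe_req.full_url)
```

Under the expectation stated above, parsing the two response bodies should
yield roughly the same graph; whether a given service behaves that way is
up to its publisher.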

  pa

> 
> If you know that you want both forward and reverse behaviour, then
> you should probably use a SPARQL endpoint and page through the
> possible results with OFFSET and LIMIT until you don't get any more
> results.
> 
> In relation to the Bio2RDF results, the URI that you dereference with
> the federated queries is a mixture of what you could get at a
> particular set of endpoints, with some forward and some reverse
> relations, configured so that the system won't go down just from the
> weight of someone trying to effectively do DESCRIBE
> <http://bio2rdf.org/taxon:9606>. That URI is linked to from a few
> hundred thousand places, but still has only a few forward construct
> triples that come out of the taxonomy database. In this case, the
> direction of the relationship is important in real-world terms because
> it determines the size of the relationship.
> 
> Insisting that anyone who wants information about a taxonomy
> identifier (or some other classification method) must also get
> everything else possibly related to it would produce a mountain of
> information. This is why [1] [2] [3] etc. are available for people
> wanting to get more related links (although there may be slow
> endpoints that make each of those quite long operations).
> 
> Admittedly, the results for resolving Bio2RDF URIs come from multiple
> endpoints, so if you just focused on a single Bio2RDF SPARQL endpoint
> you would get reasonable results from DESCRIBE most of the time.
> 
> Cheers,
> 
> Peter
> 
> [1] http://qut.bio2rdf.org/pageoffset1/links/taxon:9606
> [2] http://qut.bio2rdf.org/pageoffset2/links/taxon:9606
> [3] http://qut.bio2rdf.org/pageoffset3/links/taxon:9606
> 
> 2009/5/21 Pierre-Antoine Champin <swlists-040405@champin.net>:
>> I would expect that a DESCRIBE query to the SPARQL endpoint return what
>> I get when dereferencing the URI.
>>
>>  pa
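
For the OFFSET/LIMIT paging suggested in the quoted message, here is a
minimal sketch in Python. The run_query callable is a hypothetical
stand-in for whatever client actually sends the query to the endpoint, and
the query shape (reverse links only, with an ORDER BY so that successive
pages are stable) is an assumption, not something taken from Bio2RDF.

```python
from typing import Callable, Iterator, List, Tuple

Row = Tuple[str, str]  # (subject, predicate) of one incoming arc


def reverse_links_query(uri: str, limit: int, offset: int) -> str:
    """Build one page of the '?s ?p <uri>' (reverse) query."""
    return (
        f"SELECT ?s ?p WHERE {{ ?s ?p <{uri}> }} "
        f"ORDER BY ?s ?p LIMIT {limit} OFFSET {offset}"
    )


def page_through(run_query: Callable[[str], List[Row]],
                 uri: str, page_size: int = 1000) -> Iterator[Row]:
    """Request successive pages until one comes back empty."""
    offset = 0
    while True:
        rows = run_query(reverse_links_query(uri, page_size, offset))
        if not rows:
            break
        yield from rows
        offset += page_size
```

A caller would pass its own run_query, for example a function wrapping an
HTTP client, and consume the generator lazily so that no single response
has to carry every incoming link at once.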

Received on Thursday, 21 May 2009 09:51:57 UTC