Re: Streaming version of CONSTRUCT from Seaborne, Andy on 2005-02-21 (public-rdf-dawg@w3.org from January to March 2005)

From: Seaborne, Andy <andy.seaborne@hp.com>
Date: Mon, 21 Feb 2005 15:42:45 +0000
To: Bob MacGregor <bmacgregor@siderean.com>
CC: public-rdf-dawg@w3.org
Message-ID: <421A0175.3060803@hp.com>

Bob MacGregor wrote:
> We have an application that is based around, among other things, efficient
> retrieval of tree-shaped RDF graphs.  We tried programming this using
> SELECT combined with OPTIONAL clauses, but the resultant cartesian
> products made that  approach completely infeasible for large datasets.
> 
> Our solution was to return an iterator that generates a tree-shaped graph
> at each iteration.  The closest notion to this in SPARQL is the
> CONSTRUCT clause -- using SPARQL, we could specify a tree-shaped template,
> and return back a graph containing our trees.  However, this would
> also have very bad performance, because in our applications, there
> may be 10,000 trees, but we only want to see the first few (10-20).
> And then maybe a few more.
> 
> The problem with CONSTRUCT as currently defined is that it
> only returns a single graph.  If it were redefined to return a
> stream of graphs (one per template), then we could get exactly the
> efficiency we are looking for.
> 
> Also, it would be nice if LIMIT were defined to apply to CONSTRUCT
> as well as SELECT.  Syntactically, it appears to be legal, but the spec
> appears not to define what it does.  In this case, LIMIT should specify
> a bound on how many times the template can be instantiated.
> 
> Cheers, Bob
> -- 
> 
> Bob MacGregor
> Chief Scientist

Bob,

We had a similar situation in a system we built a couple of years ago.  The trees 
we wanted to handle could not be expressed in CONSTRUCT as they were of 
variable depth.

It might be that DESCRIBE is a better choice for this particular problem - it 
does not do what you want directly but it can be used to do it.

First, issue a SELECT query to find the graph nodes (assumed to have URIs) of 
items of interest then issue one DESCRIBE per URI (because each result is a 
single graph).  Your server has to have processing to calculate the DESCRIBE 
result as the required tree.  The shape of these results is application-domain 
dependent.
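
The two steps above can be sketched roughly as follows.  This is only an
illustration, not a definitive client: run_select and run_describe are
hypothetical callables standing in for whatever connection technology is used,
the item class URI is an assumption, and the ORDER BY is added only so that
paging with LIMIT/OFFSET is stable.

```python
from typing import Any, Callable, Iterator, List, Tuple

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def select_items_query(item_class: str, limit: int, offset: int = 0) -> str:
    """Build a SELECT query returning one page of item URIs."""
    return (
        f"SELECT ?item WHERE {{ ?item <{RDF_TYPE}> <{item_class}> }} "
        f"ORDER BY ?item LIMIT {limit} OFFSET {offset}"
    )


def describe_query(uri: str) -> str:
    """Build one DESCRIBE query for a single item URI."""
    return f"DESCRIBE <{uri}>"


def fetch_trees(
    run_select: Callable[[str], List[str]],   # hypothetical: returns item URIs
    run_describe: Callable[[str], Any],       # hypothetical: returns one graph
    item_class: str,
    limit: int,
    offset: int = 0,
) -> Iterator[Tuple[str, Any]]:
    """Lazily yield (uri, graph) pairs, one DESCRIBE request per item."""
    uris = run_select(select_items_query(item_class, limit, offset))
    for uri in uris:
        # Each DESCRIBE is only issued when the caller asks for the next tree.
        yield uri, run_describe(describe_query(uri))
```

Because fetch_trees is a generator, a caller that only looks at the first few
trees never pays for the rest; asking for "a few more" is another call with a
larger offset.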

Depending on your connection technology to the SPARQL service, this may work, as 
there is one DESCRIBE request per item, unless your application can pick apart 
merged graphs (which depends on the application's data forms).

- - - - - - -

We should specify whether LIMIT applies to CONSTRUCT (and DESCRIBE) or not.  I 
thought we did decide that it did at the last F2F (argument: it makes no sense 
to exclude it) but it has not made it into the document yet.  Mea culpa.

- - - - - - -

The other issue is the general one of whether CONSTRUCT should return multiple 
graphs rather than a merged one.  This is a case of there being uses for both - 
merging can reduce the size of results where templates create duplicate triples.
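
To make the trade-off concrete, here is a small sketch of the two behaviours.
It is a toy model under stated assumptions, not how any SPARQL engine is
implemented: triples are plain tuples of strings, variables are strings
starting with "?", the ex: names are made up, and limit is taken to bound the
number of solutions used to instantiate the template.

```python
from itertools import islice
from typing import Dict, FrozenSet, Iterable, Iterator, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Solution = Dict[str, str]


def instantiate(template: List[Triple], solution: Solution) -> FrozenSet[Triple]:
    """Instantiate the template once; skip triples with an unbound variable."""
    graph = set()
    for triple in template:
        terms = [solution.get(t[1:]) if t.startswith("?") else t for t in triple]
        if None not in terms:
            graph.add(tuple(terms))
    return frozenset(graph)


def construct_stream(
    template: List[Triple], solutions: Iterable[Solution], limit: Optional[int] = None
) -> Iterator[FrozenSet[Triple]]:
    """Streaming variant: one graph per solution, at most `limit` of them."""
    for solution in islice(solutions, limit):
        yield instantiate(template, solution)


def construct_merged(
    template: List[Triple], solutions: Iterable[Solution], limit: Optional[int] = None
) -> Set[Triple]:
    """Merged variant: the union of all instantiations, duplicates collapsed."""
    merged: Set[Triple] = set()
    for graph in construct_stream(template, solutions, limit):
        merged |= graph
    return merged


# Made-up sample data: two documents share the same author.
TEMPLATE = [("?doc", "ex:author", "?person"), ("?person", "rdf:type", "ex:Person")]
SOLUTIONS = [
    {"doc": "ex:doc1", "person": "ex:alice"},
    {"doc": "ex:doc2", "person": "ex:alice"},
    {"doc": "ex:doc3", "person": "ex:bob"},
]
```

With this sample data the stream carries three graphs of two triples each (six
triples in total), while the merged graph holds five, because the triple
typing ex:alice appears only once.  The stream, on the other hand, can be
abandoned after the first graph.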

It does seem to be a tension between features of the connection technology where 
it can handle many small requests and/or package multiple SPARQL requests into a 
single protocol unit vs a more complicated query language.

	Andy

Received on Monday, 21 February 2005 15:44:45 UTC