W3C home > Mailing lists > Public > public-rdf-dawg@w3.org > January to March 2005

Re: Streaming version of CONSTRUCT

From: Seaborne, Andy <andy.seaborne@hp.com>
Date: Mon, 21 Feb 2005 17:31:35 +0000
Message-ID: <421A1AF7.7070304@hp.com>
To: Bob MacGregor <bmacgregor@siderean.com>
CC: public-rdf-dawg@w3.org

Bob MacGregor wrote:
> Andy,
> Seaborne, Andy wrote:
>>Bob MacGregor wrote:
>>>We have an application that is based around, among other things, 
>>>retrieval of tree-shaped RDF graphs.  We tried programming this using
>>>SELECT combined with OPTIONAL clauses, but the resultant cartesian
>>>products made that  approach completely infeasible for large datasets.
>>>Our solution was to return an iterator that generates a tree-shaped 
>>>at each iteration.  The closest notion to this in SPARQL is the
>>>CONSTRUCT clause -- using SPARQL, we could specify a tree-shaped 
>>>and return back a graph containing our trees.  However, this would
>>>also have very bad performance, because in our applications, there
>>>may be 10,000 trees, but we only want to see the first few (10-20).
>>>And then maybe a few more.
>>>The problem with CONSTRUCT as currently defined is that it
>>>only returns a single graph.  If it were redefined to return a
>>>stream of graphs (one per template), then we could get exactly the 
>>>efficiency we are
>>>looking for.
>>>Also, it would be nice if LIMIT were defined to apply to CONSTRUCT
>>>as well as select.  Syntactially, it appears to be legal, but the 
>>>spec appears
>>>not to define what it does.   In this case, LIMIT should specify a 
>>>bound on how
>>>many times the template can be instantiated.
>>>Cheers, Bob
>>>Bob MacGregor
>>>Chief Scientist
>>We had a similar situation in a system we built a couple of years 
>>ago.  The tree we wanted to handle coudl not be expressed in CONSTRUCT 
>>as there were of variable depths.
>>It might be that DESCRIBE is a better choice for this particular 
>>problem - it does not do what you want directly but it can be used to 
>>do it.
> Our trees all follow the same "template".  However, I hadn't looked 
> closely at the template notion within CONSTRUCT.
> Now that I have, I seen that it is practically useless for serious 
> applications, due to the fact that it disallows unbound
> variables.  The SPARQL assumption, which is simply wrong for the general 
> case, is that missing values are somehow erroneous.
> In RDF, unlike relational databases, diversity of data should be 
> presumed to be the norm, and the language should be
> constructed to handle such diversity.  Much better would be a semantics 
> for CONSTRUCT that omitted triples that contained
> unbound variables, but retained triples that are fully bound, i.e., 
> partially instantiated templates would be legal.

To quote the current working draft:

CONSTRUCT with a template
If any such instantiation produces a triple containing an unbound variable, or 
an illegal RDF construct (such as a literal in subject or predicate position) 
then that triple is not included in the RDF graph

Only the triple involved, not the whole template, is skipped on unbound variables.


> represents a strict increase in expressive power, since the WHERE clause 
> has full control of which parts should be optional,
> and which required.
> Note: This is an issue where a smart implementor will simply ignore the 
> restriction on unbound variables in CONSTRUCT
> templates.  By doing so, she is increasing the usability of her product, 
> but still obeying the spec for all legal SPARQL
> queries.  That gives her and edge, at the expense of a loss in 
> consistency of SPARQL implementations.  In general,
> any time the DAWG members opt away from orthogonality and generality, 
> they run the risk of diluting the uniformity
> of the final product.  For example, our implementation will have 
> negation, ORDER BY, etc. (essentially, anything in SQL)
> whether or not its defined in SPARQL, simply because real applications 
> benefit by these constructs.
>>First, issue a SELECT query to find the graph nodes (assumed to have 
>>URIs) of items of interest then issue one DESCRIBE per URI (because 
>>each result is a single graph).  Your server has to have processing to 
>>calculate the DESCRIBE result as the required tree.  The shape of 
>>these results is application-domain dependent.
>>Depending on your connection technology to the SPARQL service this may 
>>work as there are one DESCRIBE request per item unless either your 
>>application can pick apart merged graphs (application dependent on the 
>>data forms).
>>- - - - - - -
>>We should specify whether LIMIT applies to CONSTRUCT (and DESCRIBE) or 
>>not.  I thought we did decide that it did at the last F2F (argument: 
>>it makes no sense to exclude) but it has not made the document yet.  
>>mea culpa.
>>- - - - - - -
>>The other issue is the general whether CONSTRUCT should return 
>>multiple graphs, not a merged one.  This is a case of there being 
>>usages for both - mergeing can reduce the size of results where 
>>templates create duplicate triples.
>>It does seem to be a tension between features of the connection 
>>technology where it can handle many small requests and/or package 
>>multiple SPARQL requests into a single protocol unit vs a more 
>>complicated query language.
> There is indeed a tension.  However, the resolution is apparently not 
> consistent.  Here, the spec has opted on the side
> of small applications that can't scale.  When ORDER BY was disallowed, 
> the spec has opted in favor of applications
> so large (or lengthly) that sorting is impractical (and didn't give 
> users the option to use their own judgement).
> It would be rather easy to add a keyword that controls whether or not 
> one or many graphs should be generated when
> using CONSTRUCT.
>>    Andy
> Bob
Received on Monday, 21 February 2005 17:32:03 UTC

This archive was generated by hypermail 2.3.1 : Wednesday, 7 January 2015 15:00:46 UTC