Re: [Fwd: Re: RDF Data Access Working Group : first working draft of SPARQL]

Steve Harris wrote:
> On Sun, Oct 17, 2004 at 06:27:31 +0100, Andy Seaborne wrote:
> 
>>>1. Is the SELECT clause really useful?  My implementations return all 
>>>variable bindings from the query, and I simply ignore those I don't want.
>>>
>>>...
>>
>>Locally, that is true - I'm sure that when the query processor and the
>>application are in the same process, the QP may ignore the SELECT (I know 
>>I do for everything except presenting results - anything else would be 
>>pure overhead).
> 
> 
> Also SELECT allows you to specify the ordering of result columns, which can be
> helpful for XSLT processing or similar.
>  
> 
>>>6. Can the resulting variable bindings contain repeated 
>>>binding-tuples;  e.g. in response to a query like:
>>>   SELECT ?a ?c
>>>   WHERE  ( ?a ?b ?c )
>>>against the graph:
>>>   :s1 :p1 :o1 .
>>>   :s1 :p2 :o1 .
>>
>>Yes - there can be repeated rows in the table.  It's a bag by the time
>>SELECT has projected out any variables.
>>
>>There seems to be a problem with terminology that needs correcting.  The
>>term "query solution" is used but it is confusing where it applies.  At
>>least the Query Results definition either has to use "bag" or be clear 
>>what it applies to.  Alternative naming might also help.
>>
>>There are two solutions (sets of bindings for variables "a" "b" and "c") 
>>to the pattern match.  Query solutions are not affected by the query 
>>result form.
>>
>>The query form takes solutions and transforms them into the
>>application-level results.  (Implementations may, of course, use all
>>information available in the query request and dataset to perform
>>optimizations to query execution.)
>>
>>SELECT projects just the "a" and "c" bindings in each.  SELECT does not
>>change the number of rows in the table; SELECT DISTINCT does in this case.
>>
>>The language in the document is clearly confusing and I will go back and
>>find better wording and terminology in the pattern matching sections and
>>query form sections.
>>
>>[Steve - you may wish to comment here]
> 
> 
> You certainly would expect repeated results if you think of it as a
> results table; I'm not so sure when you think of it as a
> set/list/bag of bindings. Someone (possibly SimonR?) mentioned that they
> wouldn't expect multiple identical "rows" from a variable binding result,
> but that's outside my experience really.
> 
> 3store only returns unique results, and it's never caused any problems to
> my knowledge. There's a slight overhead, but it's not that great and is
> generally offset (time-wise) by the bandwidth saving.
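
For reference, the example from point 6 can be worked through in a few 
lines of Python.  This is only a sketch: the in-memory triple list and the 
names "triples", "solutions", "rows" and "distinct_rows" are illustrative 
assumptions, not taken from any implementation.

```python
# Hypothetical in-memory stand-in for the two-triple graph in point 6.
triples = [
    (":s1", ":p1", ":o1"),
    (":s1", ":p2", ":o1"),
]

# The pattern ( ?a ?b ?c ) matches every triple: one solution per triple.
solutions = [{"a": s, "b": p, "c": o} for (s, p, o) in triples]

# SELECT ?a ?c : project each solution; the row count is unchanged (a bag).
rows = [(sol["a"], sol["c"]) for sol in solutions]

# SELECT DISTINCT ?a ?c : drop duplicate rows, keeping first-seen order.
distinct_rows = list(dict.fromkeys(rows))

print(len(solutions), len(rows), len(distinct_rows))
```

The two solutions differ only in "b", so once "b" is projected away the 
two rows are identical; only the DISTINCT step reduces them to one.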

The overhead in my implementation (ARQ) is a requirement to retain all 
results in the query processor until the end of the query and check each 
successive result against them.  I consider that a high overhead.  (We 
already have problems in RDQL/Jena with some JDBC drivers which build a 
complete copy of the results before allowing the caller to start 
processing the JDBC-call outputs).

ARQ does not materialise the results at any single point and graphs are 
maintained by hashing.  Results are streamed as they are found, so making 
them unique requires retaining all results already returned, which is a 
potentially large memory requirement.  Without this, ARQ never needs more 
than an amount of memory that is independent of the dataset; it depends 
only on query complexity (how many stages it has).  Some stages buffer one 
result to know whether they have any more to handle, and the basic graph 
pattern matcher runs asynchronously on another thread with a buffer of 
about 5 triples.  There can be more than one graph pattern matcher running 
at once in a single (complex) query.
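
The memory difference can be sketched with two Python generators.  This is 
not ARQ code; "stream_rows" and "stream_distinct" are hypothetical names 
used only to illustrate the trade-off under the assumption that rows are 
hashable tuples.

```python
from typing import Hashable, Iterable, Iterator, Set


def stream_rows(rows: Iterable[Hashable]) -> Iterator[Hashable]:
    """Pass rows through as they arrive; nothing is retained."""
    for row in rows:
        yield row


def stream_distinct(rows: Iterable[Hashable]) -> Iterator[Hashable]:
    """Yield each row the first time it is seen.

    Output is still streamed, but `seen` grows with the number of
    distinct rows already returned, so memory use now depends on the
    size of the result rather than only on the query.
    """
    seen: Set[Hashable] = set()
    for row in rows:
        if row not in seen:
            seen.add(row)
            yield row
```

For example, list(stream_distinct([(":s1", ":o1"), (":s1", ":o1")])) 
yields a single row, at the cost of holding every distinct row in "seen" 
until the query finishes.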

	Andy


> I have no strong
> feelings either way. If we don't specify unique results then I would like
> a DISTINCT keyword or equivalent.
>  
> - Steve
> 

Received on Tuesday, 19 October 2004 20:00:46 UTC