
Re: Updated SPARQL Query Results XML Format draft

From: Seaborne, Andy <andy.seaborne@hp.com>
Date: Thu, 14 Jul 2005 11:59:10 +0100
Message-ID: <42D6457E.5020201@hp.com>
To: Steve Harris <S.W.Harris@ecs.soton.ac.uk>
CC: RDF Data Access Working Group <public-rdf-dawg@w3.org>



Steve Harris wrote:
> On Thu, Jul 14, 2005 at 10:46:46 +0100, Dave Beckett wrote:
> 
>>On Wed, 2005-07-13 at 15:07 +0100, Steve Harris wrote:
>>
>>>On Wed, Jul 13, 2005 at 02:59:45 +0100, Dave Beckett wrote:
>>>
>>>>However, I've also noticed a couple of items in Red Ink that still need
>>>>thinking about:
>>>>
>>>>1. How/if to record duplicates in results. (Section 2.3.3)
>>>>
>>>>When ORDER BY is given, the result format may record index="1",
>>>>index="2" on the <result> element.  (Side issue - "may" or "should" do
>>>>this?)
>>>
>>>I don't see the point to this really, but how does it interact with OFFSET?
>>>Shouldn't the count start from OFFSET + 1?
>>
>>If we keep with this design, I guess so.
>> 
>>
>>>>However when there are duplicates should it generate indexes 1, 2, 2, 3
>>>>where items #2 and #3 are duplicates?  (A query with ORDER BY but no
>>>>SELECT DISTINCT).
>>>
>>>Strong "no" from me. Any numbering should be monotonic.
>>
>>monotonic means order preserving right?  So 1, 2, 2, 3 does preserve the
>>order - if items #2 and #3 are duplicate results.  Otherwise order
>>information is lost.
> 
> 
> Yes, sorry, wrong word. I mean incrementing and consecutive. Or something.
>  
> 
>>The index="number" item was added because we added ORDER BY and before
>>we finished deciding what it would do.
> 
> 
> As XML is inherently ordered it just seems like a waste of bytes to me.
> I still care about bandwidth efficiency for mobile applications and so on.
>  
> 
>>Maybe you just need to know that the results are ordered - i.e. an
>>isOrdered boolean flag.   Is isDistinct also needed?  Those seem to be
>>the two crucial flags that tell you the four forms of variable bindings
>>results you can get:
>>  1. a bag (the default)
>>  2. an ordered sequence (ORDER BY)
>>  3. an ordered sequence with no duplicates (ORDER BY + DISTINCT)
>>  4. a set (DISTINCT)
> 
> 
> Maybe, I'm not clear on any situations where the client might not know, and
> would care.

The only cases I see as being important in the result set

DISTINCTness is detectable in the results whereas ordering is not.
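
To illustrate the asymmetry, here is a minimal Python sketch. It assumes each solution has already been parsed into a dict of variable name to value; the function name `is_distinct` and the sample rows are hypothetical. A duplicate check needs only the rows themselves, whereas no comparable check can tell whether a sequence that merely looks sorted (or unsorted) was produced by an ORDER BY.

```python
from collections import Counter

def is_distinct(rows):
    """Return True if no solution (row of variable bindings) occurs twice."""
    # Each row is a dict of variable name -> value; freeze it so it is hashable.
    counts = Counter(frozenset(row.items()) for row in rows)
    return all(n == 1 for n in counts.values())

# Hypothetical sample data, not taken from any real query.
rows = [
    {"x": "a", "y": "1"},
    {"x": "b", "y": "2"},
    {"x": "a", "y": "1"},  # duplicate of the first row
]
print(is_distinct(rows))      # False
print(is_distinct(rows[:2]))  # True
```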

How about an optional attribute on the <results> element?

    <results order="true">

Then there is no consistency issue about funny index orders, missing indexes, or 
duplicates.
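
As a rough sketch of how a client might consume such a flag, the Python below reads the proposed attribute with the standard library's ElementTree. The document is deliberately simplified and un-namespaced, the helper name `results_are_ordered` is hypothetical, and the `order` attribute is only the suggestion made in this message, so treat this as an illustration rather than a description of the draft format.

```python
import xml.etree.ElementTree as ET

# Simplified, un-namespaced example document (an assumption for illustration).
DOC = """
<sparql>
  <head><variable name="x"/></head>
  <results order="true">
    <result><binding name="x"><literal>a</literal></binding></result>
    <result><binding name="x"><literal>b</literal></binding></result>
  </results>
</sparql>
"""

def results_are_ordered(xml_text):
    """Return True only if <results> carries order="true"."""
    root = ET.fromstring(xml_text.strip())
    results = root.find("results")
    # A missing attribute is read as "no ordering claimed".
    return results is not None and results.get("order") == "true"

print(results_are_ordered(DOC))  # True
```

A single flag on the container means there is nothing per-result to keep consistent.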

The next complexity level would be to number the variable declarations in the 
header indicating the order of the variables but that does not make sense for 
function ordering.  So, just an indication in the <results> element seems fine, 
if anything at all.


>  
> 
>>  Referring to 10.1 Solution Sequences and Result Forms
>>  http://www.w3.org/2001/sw/DataAccess/rq23/#solutionsResults
>>
>>unless the LIMIT and OFFSET indexes are important.
> 
> 
> they may be, but again, the client would be aware of whether it had used
> LIMIT and/or OFFSET, or would be agnostic, I would have thought.

Yes - I think the result set should not encode features of the query where it is 
not necessary.  We have the link element for extensibility.

The only case I can think of is where a result has been written to disk and 
retrieved sometime later - and then only whether the results can be considered 
ordered seems to matter.

When writing application code, I have found that the result processing logic can 
be simpler if the results are known to be ordered, but then it depends on the 
chosen order of variables (ordering tells you that when you have finished one 
value, you'll not see it again, so you can enter it into the application data 
structures, for example).
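
A minimal Python sketch of that simplification, assuming the rows are already parsed into dicts and sorted on a variable named `s` (the variable names, the helper `index_by_subject`, and the sample data are all hypothetical):

```python
from itertools import groupby

def index_by_subject(ordered_rows):
    """Build {subject: [values]} from rows that are sorted on "s"."""
    store = {}
    for subject, group in groupby(ordered_rows, key=lambda r: r["s"]):
        # Because the rows are ordered on "s", this subject will not reappear,
        # so its entry can be finalised as soon as the group ends.
        store[subject] = [r["o"] for r in group]
    return store

# Hypothetical sample data, ordered on "s".
rows = [
    {"s": "alice", "o": "1"},
    {"s": "alice", "o": "2"},
    {"s": "bob", "o": "3"},
]
print(index_by_subject(rows))  # {'alice': ['1', '2'], 'bob': ['3']}
```

This only works when `s` is the primary sort key; if the results were ordered on a different variable, the same subject could reappear later and overwrite its earlier entry.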

> 
> - Steve
> 

	Andy
Received on Thursday, 14 July 2005 10:59:21 GMT
