Re: [Fwd: Unexpected DISTINCT?] from Seaborne, Andy on 2007-03-05 (public-rdf-dawg@w3.org from January to March 2007)

From: Seaborne, Andy <andy.seaborne@hp.com>
Date: Mon, 05 Mar 2007 13:01:22 +0000
To: Bijan Parsia <bparsia@cs.man.ac.uk>
CC: Lee Feigenbaum <feigenbl@us.ibm.com>, RDF Data Access Working Group <public-rdf-dawg@w3.org>
Message-ID: <45EC14A2.20102@hp.com>

Bijan Parsia wrote:
> On Mar 5, 2007, at 9:29 AM, Seaborne, Andy wrote:
> 
> [snip]
>> The cardinality for extensions of BGP matching isn't prescribed by  
>> the spec - it's just a matter of an extension deciding what is  
>> appropriate for its extension.
>>
>> http://www.w3.org/2001/sw/DataAccess/rq23/rq25.html#sparqlBGPExtend
>>
>> which does not include anything on cardinality induced from blank  
>> nodes in BGPs.  Hopefully, that should give freedom to extended  
>> matchings such as OWL-DL.
> 
> Sorry, I didn't mean to suggest that the spec constrained OWL  
> extensions...but if plain SPARQL has an implicit ALL semantics (or an  
> implicit "best effort" semantics) it will be more consistent to  
> follow that. 

No problem - I didn't think you were suggesting it.  I'm not even sure what ALL
would mean for OWL (or even RDFS), given that there can be two or more ways to
derive the same answer.

> It seems that even in simpler cases than OWL, ALL might  
> not be easiest (for a number of reasons). If number of (extra)  
> answers in the non-distinct case *never* mattered, then best effort  
> would be fine, but I'm under the impression that people feel strongly  
> for stable numbers of answers in the non-distinct case. Having  
> explicit ALL with no modifier == best effort seems to accommodate all  
> these needs at the cost of departing from the standard SQL behavior.
> 
> If stable number of (non-distinct) answers doesn't matter, but only  
> an upper bound, then several things become easier spec-wise (though I  
> think that is a bit too loose).

I see it as a decision between consistency vs planning for (known) optimizations.

Currently, sec 12 defines the cardinality, under simple entailment, for BGP
matching, UNION and projection, which are the sources of duplicates.  Steve
added a case where his indexing means that some patterns can be evaluated in
particular ways.
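To make "sources of duplicates" concrete, here is a hypothetical toy model in
Python (an illustration only, not the sec 12 definitions): a solution is a set
of variable bindings, and a solution sequence is a multiset of solutions.
UNION adds multiplicities, projection merges solutions that become equal and
sums their counts, and DISTINCT collapses every count to one.  The BGP
multiplicities are taken as given input here.

```python
from collections import Counter

# Toy model (an assumption, not the spec's formalism): a solution is a
# frozenset of (variable, value) pairs; a solution multiset is a Counter
# mapping each solution to its multiplicity.

def solution(**bindings):
    return frozenset(bindings.items())

def union(left, right):
    # UNION keeps both sides, so multiplicities add.
    return left + right

def project(solutions, variables):
    # Projection drops variables; solutions that become equal are merged
    # and their counts summed, which is how duplicates appear.
    out = Counter()
    for sol, count in solutions.items():
        kept = frozenset((v, x) for v, x in sol if v in variables)
        out[kept] += count
    return out

def distinct(solutions):
    # DISTINCT keeps each solution exactly once.
    return Counter({sol: 1 for sol in solutions})

# Hypothetical BGP result for something like { ?s :p ?o }
bgp = Counter({
    solution(s="a", o=1): 1,
    solution(s="a", o=2): 1,
    solution(s="b", o=1): 1,
})
projected = project(bgp, {"s"})        # ?s = "a" now occurs twice
doubled = union(projected, projected)  # ?s = "a" occurs four times
print(sum(distinct(doubled).values())) # 2
```

Under this model every count is fully determined by the inputs, which is the
kind of defined cardinality the consistency argument relies on.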

If we value consistency across implementations, then the normal mode of 
operation shouldn't be left to implementation choice.  Plain SELECT (no
modifier) will be the common case, so it would need to be defined for
consistency.  It seems odd to leave the common case as "best effort - local
choice" if we value consistency.

If we value optimization, then plain SELECT can return whatever number of
duplicates an implementation chooses, provided the set form contains all the
answers.  And we would need to adjust the test suite accordingly for the
interaction with ORDER BY/LIMIT that prompted the original comment.
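A hypothetical sketch of why ORDER BY/LIMIT is where this shows up: two
implementations that agree on the set of answers but not on multiplicities
can return different results once a LIMIT cuts the ordered sequence.  The
names `order_by_limit`, `impl_a` and `impl_b` are invented for illustration.

```python
def order_by_limit(rows, key, limit):
    # ORDER BY key, then LIMIT: sort the sequence (duplicates kept)
    # and take a prefix of the requested length.
    return sorted(rows, key=key)[:limit]

# Hypothetical ?x values from two implementations for the same plain SELECT.
impl_a = ["a", "a", "b"]  # keeps the duplicate
impl_b = ["a", "b"]       # "best effort": the duplicate was optimized away

assert set(impl_a) == set(impl_b)  # same set form

print(order_by_limit(impl_a, key=lambda v: v, limit=2))  # ['a', 'a']
print(order_by_limit(impl_b, key=lambda v: v, limit=2))  # ['a', 'b']
```

Both answers would be acceptable under an "upper bound only" reading, so a
test expecting one exact result for such a query could not be shared across
implementations.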

My preference at the moment is for consistency.  If implementations deviate
from the spec (and they will for all sorts of reasons - fact of life), we
can't enumerate all the ways they can deviate and still be conformant.
Enumerating a few seems worthless, even dangerous.  So specify the consistent
choice.

	Andy

> 
> Cheers,
> Bijan.

Received on Monday, 5 March 2007 13:01:38 UTC