
Re: [Fwd: Unexpected DISTINCT?]

From: Bijan Parsia <bparsia@cs.man.ac.uk>
Date: Mon, 5 Mar 2007 13:22:28 +0000
Message-Id: <FF0950DF-42D0-4414-A43B-24D4872C6F32@cs.man.ac.uk>
Cc: Lee Feigenbaum <feigenbl@us.ibm.com>, RDF Data Access Working Group <public-rdf-dawg@w3.org>
To: andy.seaborne@hp.com

On 5 Mar 2007, at 13:01, Seaborne, Andy wrote:
[snip]
> I see it as a decision between consistency vs planning for (known)  
> optimizations.

Agreed.

> Currently, sec 12 gives a defined cardinality for BGP matching,  
> UNION and project, which are the sources for duplicates, for simple  
> entailment.  Steve added a case where his indexing means that some  
> patterns can be done in particular ways.
>
> If we value consistency across implementations, then the normal  
> mode of operation shouldn't be left to implementation choice.

I thought the contrary of this was being proposed. I think I could  
accept it *if* access to other desirable states were available. Of  
course, this is distinct from what the default should be.

>   SELECT [nothing] will be the common case so would need to be  
> defined for consistency.  Seems odd to leave the common case being  
> the "best effort - local choice" if we value consistency.

Well, other values might interfere ;)
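To make the bag semantics concrete: the quoted text names BGP matching, UNION and project as the places where duplicates arise. The following is a minimal Python sketch of two of those sources, projection collapsing distinct solutions and UNION adding multiplicities. It assumes solutions are modelled as frozen sets of (variable, value) pairs and a result is a `collections.Counter` of solutions. It is an illustration only, not the spec's algebra or any engine's API, and the helper names `solution`, `project` and `union` are hypothetical.

```python
from collections import Counter

def solution(**bindings):
    # Hypothetical model: a solution mapping as a hashable frozen set
    # of (variable, value) pairs.
    return frozenset(bindings.items())

def project(solutions, variables):
    # Restrict each solution to the projected variables. Different input
    # solutions can collapse to the same output, so multiplicities add up.
    out = Counter()
    for sol, count in solutions.items():
        out[frozenset((v, val) for v, val in sol if v in variables)] += count
    return out

def union(left, right):
    # Bag union: a solution produced by both branches is counted twice.
    return left + right

# Two distinct BGP solutions that differ only in ?y.
bgp = Counter({solution(x="a", y=1): 1, solution(x="a", y=2): 1})
print(project(bgp, {"x"})[solution(x="a")])  # 2

both = union(Counter({solution(x="a"): 1}), Counter({solution(x="a"): 1}))
print(both[solution(x="a")])  # 2
```

Whether those duplicates must be returned, may be dropped, or must be dropped is exactly the cardinality question under discussion.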

> If we value optimization, then plain SELECT can be whatever  
> provided the set form is all the answers.  And we need to adjust  
> the test suite accordingly for the interaction with ORDER BY/LIMIT  
> that was the original comment.
>
> My preference at the moment is for consistency.  If implementations  
> deviate from the spec (and they will for all sorts of reasons -  
> fact of life), we can't enumerate all the ways they can and be  
> conformant.  Enumerating a few seems worthless, even dangerous.  So  
> specify the consistent choice.

That's pretty much my natural position as well.
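A small illustration of the ORDER BY/LIMIT interaction mentioned in the quoted text, under simplifying assumptions: plain strings stand in for solutions, and Python's `sorted` stands in for ORDER BY. Two answers with the same set of solutions, one keeping duplicates and one not, give different results once LIMIT is applied. This is why the test suite would need adjusting if plain SELECT were left to local choice. `order_limit` is a hypothetical helper, not part of any SPARQL API.

```python
def order_limit(rows, limit):
    # Stand-in for ORDER BY (natural sort of the values) followed by LIMIT.
    return sorted(rows)[:limit]

all_rows = ["a", "a", "b"]   # every duplicate kept (ALL)
distinct_rows = ["a", "b"]   # duplicates removed (DISTINCT)

# Both answers contain the same set of solutions...
assert set(all_rows) == set(distinct_rows)

# ...but they disagree once LIMIT cuts the ordered sequence.
print(order_limit(all_rows, 2))       # ['a', 'a']
print(order_limit(distinct_rows, 2))  # ['a', 'b']
```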

One variant of my proposal is to add a "best effort" keyword, instead  
of ALL (really, I only went with ALL). Thus, instead of forcing
distinct in every case, or forcing all in every case, one could hint
to the query engine that one prefers speed to all else (roughly
speaking).

I agree that there is a ton of other tuning one could get crazy with  
and that's not a good idea. I do wonder if "at least the distinct,  
bounded by the all" is a useful space (or even "at least the  
distinct, finitely bounded") to have accessible. (I think I am  
against it as a general default *personally*, but I guess I can see  
that it could make sense to people.)
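One way to read "at least the distinct, bounded by the all" is as a test on result multisets: every solution of the ALL answer appears at least once, none appears more often than in ALL, and nothing else appears. A minimal sketch in Python, assuming solutions are hashable values and answers are finite lists. Both function names are hypothetical.

```python
from collections import Counter

def between_distinct_and_all(answer, all_answer):
    """True if `answer` has every solution of `all_answer` at least once
    and no solution more often than `all_answer` does."""
    got, full = Counter(answer), Counter(all_answer)
    if set(got) != set(full):
        return False  # a solution is missing, or an extra one appeared
    return all(1 <= got[s] <= full[s] for s in full)

def at_least_distinct_finitely_bounded(answer, all_answer):
    """Looser variant: same solutions as `all_answer`, any finite
    multiplicity. A finite list is bounded by construction, so only the
    set of solutions needs checking here."""
    return set(answer) == set(all_answer)

all_answer = ["a", "a", "b"]
print(between_distinct_and_all(["a", "b"], all_answer))            # True
print(between_distinct_and_all(["a", "a", "a", "b"], all_answer))  # False
print(at_least_distinct_finitely_bounded(["a", "a", "a", "b"], all_answer))  # True
```

Under this reading, both the DISTINCT answer and the ALL answer always pass the first check, so an engine could return either (or anything in between) and still conform.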

Cheers,
Bijan.
Received on Monday, 5 March 2007 13:21:53 GMT
