Re: definition of INDISTINCT from Eric Prud'hommeaux on 2007-03-22 (public-rdf-dawg@w3.org from January to March 2007)

From: Eric Prud'hommeaux <eric@w3.org>
Date: Thu, 22 Mar 2007 12:59:44 -0400
To: "Seaborne, Andy" <andy.seaborne@hp.com>
Cc: Jeen Broekstra <j.broekstra@tue.nl>, public-rdf-dawg@w3.org
Message-ID: <20070322165944.GP4098@w3.org>
* Seaborne, Andy <andy.seaborne@hp.com> [2007-03-18 18:01+0000]
> 
> Eric Prud'hommeaux wrote:
> >* Jeen Broekstra <j.broekstra@tue.nl> [2007-03-16 17:39+0100]
> >>Alright, nitpicking a bit:
> >>
> >>Eric Prud'hommeaux wrote:
> >>
> >>>pursuant to
> >>>  ACTION: ericP to draft text about a LOOSE keyword and run it by w3
> >>>  folks to see if we're abusing the "at risk" mechanism.
> >>>I drafted this section. It was slightly more awkward to not have an
> >>>ALL to lean on, but I think this is pretty well defined:
> >>>
> >>>9.4 INDISTINCT
> >>When/where was this term introduced?
> >>
> >>If we decide to add this, I think I would actually prefer LOOSE:
> >>INDISTINCT suggests (to me at least) that it is the opposite of DISTINCT
> >>(which it is not; it would even be acceptable to have the same behavior
> >>as DISTINCT).
> 
> I agree with Jeen.  LOOSE is better.
> 
> INDETERMINATE is a bit long :-)

and LOOSE it is

> >>
> >>>While the DISTINCT modifier ensures that duplicate solutions are
> >>>eliminated from the solution set, INDISTINCT simply permits them to be
> >>>eliminated. The cardinality of any set of variable bindings (solution)
> >>>in an INDISTINCT solution set at least one and not more than the
> >>...*is* at least one...
> >
> >noted
> >
> >>>cardinality of the solution set with no DISTINCT or INDISTINCT
> >>>modifier.
> >>Perhaps better formulation would be to refer to the cardinality of the
> >>solution set as prescribed by the algebra.
> >>
> >>>For example, the query
> >>>
> >>>  PREFIX foaf:    <http://xmlns.com/foaf/0.1/>
> >>>  SELECT INDISTINCT ?name WHERE { ?x foaf:name ?name }
> >>>
> >>>may have one, two (shown here) or three solutions:
> >>>  name
> >>>  "Alice"
> >>>  "Alice"
> >>Of course, this only holds for a dataset which holds at least three
> >>solutions for Alice, you might want to make that more explicit in this
> >>paragraph (referring back to the example dataset explicitly?).
> >
> >I think that in context with the DISTINCT proposal, it's clear. See
> >the attached HTML and tell me if you agree.
> 
> A thought: as DISTINCT and LOOSE are mutually exclusive, maybe 9.3.1 and 
> 9.3.2 would be a better arrangement.  That would mean that the first part 
> of 9.3, which is about queries without DISTINCT or LOOSE can go in 9.3 
> intro.
> 
> (Following that through, OFFSET and LIMIT could go together in a "slice" 
> section - editorial - for CR?)

good idea. will do after LC publication

> I suggest adding text to say that queries with LOOSE may behave differently 
> across different implementations.  I don't know how to say that in the 
> conformance language which does not use "implementation".
> 
> And presumably a LOOSE query may differ across requests of the same query 
> at the same service, given the motivating example we have been using.

I've gone a small step further and said that LOOSE slicing may behave
differently across different query executions.

[[
Note that queries where LOOSE semantics have changed the cardinality
may not be consistently sliced by the LIMIT and OFFSET modifiers.
]]

a bit informal, but good enough until we find a more elegant wording, I think
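
To make the slicing caveat concrete, here is a rough Python sketch. It
assumes the three-"Alice" data implied by the example query, models a
solution sequence as a plain list, and uses helper names
(is_valid_loose, slice_solutions) that are purely illustrative and not
from the draft. It checks the "at least one, at most the unmodified
cardinality" bound and then shows two equally valid LOOSE executions
giving different pages for the same OFFSET/LIMIT:

```python
from collections import Counter

def is_valid_loose(full, candidate):
    """Check a candidate result against the LOOSE cardinality bound.

    `full` is the result with no DISTINCT or LOOSE modifier (assumed data).
    """
    full_counts = Counter(full)
    cand_counts = Counter(candidate)
    # The set of distinct solutions must be unchanged.
    if set(full_counts) != set(cand_counts):
        return False
    # Each solution appears at least once and at most as often as in `full`.
    return all(1 <= cand_counts[s] <= full_counts[s] for s in full_counts)

def slice_solutions(solutions, offset=0, limit=None):
    """Apply OFFSET then LIMIT to a list of solutions."""
    end = None if limit is None else offset + limit
    return solutions[offset:end]

full = ["Alice", "Alice", "Alice"]   # assumed: three foaf:name "Alice" matches
run_a = ["Alice", "Alice", "Alice"]  # execution that eliminated nothing
run_b = ["Alice"]                    # execution that eliminated all duplicates

# Both executions are acceptable under LOOSE.
assert is_valid_loose(full, run_a)
assert is_valid_loose(full, run_b)

# The same OFFSET 1 LIMIT 1 slice differs between the two executions.
page_a = slice_solutions(run_a, offset=1, limit=1)
page_b = slice_solutions(run_b, offset=1, limit=1)
```

Here page_a still holds one "Alice" while page_b is empty, which is
the inconsistency the note warns about.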

> It was suggested that there would be "at risk" text.  This should include 
> mention that queries using LOOSE will not necessarily extend to aggregate 
> functions.

pink text in 1.61:
[[
The LOOSE feature is at risk. Queries where LOOSE semantics have
changed the cardinality will not have any counting semantics and will
therefore not be useful with aggregate functions added to SPARQL or
performed on SPARQL result sets.
]]
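
A small sketch of why counting breaks down, again in Python and again
assuming the three-"Alice" data from the example. The COUNT-style
aggregate here is hypothetical (it is just len() over the result), not
a proposed SPARQL feature:

```python
full = ["Alice", "Alice", "Alice"]  # assumed result with no modifier

# Every result LOOSE permits for this data: one, two or three copies.
loose_results = [full[:k] for k in range(1, len(full) + 1)]

# A hypothetical COUNT over each permitted result.
loose_counts = [len(result) for result in loose_results]

# The same count over the DISTINCT result is always a single value.
distinct_count = len(set(full))
```

Any of the values in loose_counts is a legitimate answer under LOOSE,
so a count over such a result carries no fixed meaning, whereas
distinct_count is fully determined by the data.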
-- 
-eric

office: +1.617.258.5741 NE43-344, MIT, Cambridge, MA 02144 USA

(eric@w3.org)
Feel free to forward this message to any list for any purpose other than
email address distribution.
Received on Thursday, 22 March 2007 17:00:29 UTC