Re: Ambiguity for Literal order in SPARQL from Eric Prud'hommeaux on 2007-02-26 (public-rdf-dawg-comments@w3.org from February 2007)

From: Eric Prud'hommeaux <eric@w3.org>
Date: Mon, 26 Feb 2007 17:25:13 +0100
To: Richard Newman <r.newman@reading.ac.uk>
Cc: Stephane Fellah <stephanef@imagemattersllc.com>, public-rdf-dawg-comments@w3.org
Message-ID: <20070226121718.GA13648@w3.org>
resent due to mailer probs...

* Richard Newman <r.newman@reading.ac.uk> [2007-02-21 14:21-0800]
>
> The text I was paraphrasing refers to extending the semantics of
> FILTER operations, and is quite clear. The difficulty here is that
> ORDER BY uses <, which is also a filter operation, and is subject to
> the operator mappings table: it's *not* clear whether extensions to
> this table apply to ORDER BY at all, or whether they are allowed to
> modify the ordering defined in an unextended implementation.

Hmm, good point. I can propose to add a short paragraph to 11.3.1:
[[
Additional mappings of the '<' operator are expected to control the
relative ordering of the operands, specifically, when used in an
<a href="#modOrderBy"><code>ORDER BY</code></a> clause.
]]

The problem is that I'm not convinced either way about whether they
should affect ordering, nor am I sure how much delay to the
specification this is worth. The current spec does not specify total
ordering in part because the use cases didn't motivate us to. Not
every implementor decision that affects the query results need be
specified, only those where interop is useful.

> Given this sentence in 9.1:
>
> "If the ordering criteria do not specify the order of values, then
> the ordering in the solution sequence is undefined."
>
> from your example one could argue that, according to the standard,
> the ordering for an unextended implementation is only partially
> defined. In an extended implementation the ordering is defined, but
> the expectations of a user according to the standard are met, which
> is fine... for this simple case.
>
> However, ORDER BY clauses rely on the undefined result to do cascading:
>
> ORDER BY ?date ?someotherparameter
>
> ... a client can thus rely on the values of ?date being uncomparable
> to introduce sorting by some other parameter. Adding an
> implementation of < that compares these new datatypes to dateTimes
> alters that ordering clause so that the undefined sections are no
> longer ordered by ?someotherparameter. You upgrade your
> implementation and your queries start yielding different results.
> That's bad.

But how bad? We worked hard to avoid this in terms of the collection
of results. I guess changing the order and then slicing means you have
a different answer in your reported collection.
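
To make the "order, then slice" concern concrete, here is a minimal
Python sketch. It is not taken from the spec or from any engine: the
Row type, the two comparators, the sample rows and the ?other values
are all hypothetical, and it assumes (per Richard's reading) that an
unextended implementation treats a pair it cannot compare as a tie, so
the next ORDER BY key decides.

```python
from collections import namedtuple
from datetime import datetime
from functools import cmp_to_key

# Hypothetical solution rows: subject, datatype of ?date, its value, and ?other.
Row = namedtuple("Row", "subject datatype date other")

ROWS = [
    Row("S1", "java:javaDateDT", datetime(2007, 2, 21, 16, 0), "b"),
    Row("S2", "java:javaDateDT", datetime(2007, 2, 21, 15, 0), "c"),
    Row("S3", "java:javaDateDT", datetime(2007, 2, 21, 17, 0), "a"),
]


def three_way(x, y):
    """Return -1, 0 or 1, like a classic comparison function."""
    return (x > y) - (x < y)


def cmp_unextended(r1, r2):
    # Assumption: only xsd:dateTime pairs are comparable; anything else is a tie.
    if r1.datatype == r2.datatype == "xsd:dateTime":
        return three_way(r1.date, r2.date)
    return 0


def cmp_extended(r1, r2):
    # Assumption: the extension maps java:javaDateDT into the dateTime value space.
    return three_way(r1.date, r2.date)


def order_by(rows, date_cmp):
    """Emulate ORDER BY ?date ?other: fall through to ?other on a ?date tie."""
    def cmp(r1, r2):
        return date_cmp(r1, r2) or three_way(r1.other, r2.other)
    return sorted(rows, key=cmp_to_key(cmp))


# Emulate LIMIT 2 by slicing the ordered sequence.
print([r.subject for r in order_by(ROWS, cmp_unextended)[:2]])  # ['S3', 'S1']
print([r.subject for r in order_by(ROWS, cmp_extended)[:2]])    # ['S2', 'S1']
```

Under these assumptions the two implementations report different
members, not just a different order, once the sequence is sliced.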

> To clarify:
>
> - you can extend < to apply to java:javaDateDTs if this extension
> only changes the behavior of the implementation where otherwise a
> type error would occur.
> - however, it presumably should not affect the ordering of the
> required output of a query when compared to an unextended
> implementation. Thus, ORDER BY must not use the operator mappings table.
> - to extend ordering, users should explicitly use conversion
> operators; it's very clear when these are implementation-dependent,
> so expectations are not confounded.
>
> It is my personal opinion that ORDER BY clauses must be unambiguous,
> comparing only booleans, numerics, dateTimes, and strings. However, I
> don't think that the standard is clear on whether an implementation
> must yield an order that is comparable to the defined order of an
> unextended implementation. Would the WG care to comment?

I like cake.

> >Thanks Richard for additional clarifications,
> >
> >You wrote:
> >
> >"Implementations may *extend* comparisons, but only where a type error
> >would otherwise occur. No implementation should ever produce
> >different results to those mandated by the specification, only
> >additional results. I don't think that custom mappings between value
> >spaces allows for that"
> >
> >I think my custom mappings between value spaces allow for that.
> >
> >Here is an example:
> >
> >urn:S1 dc:created "Wed Feb 21 16:00:00 EST 2007"^^java:javaDateDT
> >urn:S2 dc:created "2007-02-21T16:05:55.265-05:00"^^xsd:dateTime
> >urn:S3 dc:created "Wed Feb 21 15:00:00 EST 2007"^^java:javaDateDT
> >urn:S4 dc:created "2007-02-21T16:10:55.265-05:00"^^xsd:dateTime
> >
> >
> >If the query looks like:
> >
> >SELECT ?s ?date
> >WHERE {?s dc:created ?date}
> >ORDER BY desc(?date)
> >
> >Assuming my implementation supports java:javaDateDT and maps it
> >internally to Date (or Calendar), then I should get the following results:
> >
> >S3,S2,S1,S4
> >
> >If my implementation does not support the custom datatype, then I
> >will have
> >
> >S3,S1,S2,S4
> >
> >(S3 and S1 used lexical comparison, S2,S4 used value space comparison)
> >
> >You can notice that S1 precedes S2 in both cases, which is correct
> >semantically, but the support of the custom datatype by the SPARQL
> >engine produces additional (enhanced) results.
> >
> >"No implementation should ever produce different results to those
> >mandated by the specification, only additional results". This sentence remains
> >ambiguous to me.  I think the result I get is consistent with the
> >specification.
> >
> >What do you think?
> >
> >Best regards
> >
> >Stephane Fellah

-- 
-eric

office: +1.617.258.5741 NE43-344, MIT, Cambridge, MA 02144 USA
cell:   +1.857.222.5741

(eric@w3.org)
Feel free to forward this message to any list for any purpose other than
email address distribution.
Received on Monday, 26 February 2007 16:25:24 UTC