[Bug 29044] array:sort does not define a total order when keys contain NaN

https://www.w3.org/Bugs/Public/show_bug.cgi?id=29044

--- Comment #3 from Michael Kay <mike@saxonica.com> ---
In Action A-615-04 I was asked to explore whether the proposed solution is
consistent with XQuery's "order by" clause in the case of NaN and the empty
sequence.

The "proposed solution" we are talking about is:

<quote>
The order of items in the result is such that, given two items $A and $B:

* If (fn:deep-equal($key($A), $key($B)), then the relative order of $A and $B
in the output is the same as their relative order in the input (that is, the
sort is stable)

* Otherwise, if (deep-less-than($key($A), $key($B)), then $A precedes $B in the
output. 

The function <code>deep-less-than</code> is defined as the boolean result of
the expression:

if (fn:empty($A)) then fn:exists($B)
else if (fn:deep-equal($A[1], $B[1])) then deep-less-than(fn:tail($A),
fn:tail($B))
else $A[1] lt $B[1]

</quote>

First, note that the proposed rule for fn:sort handles a sort key of any
length. The effect of the proposed rule is that if all sort keys are sequences
of length zero and one, the sequences of length zero sort first, which is
consistent with XQuery's option "empty least". The default for this option in
XQuery is implementation-defined. In XSLT, "empty least" is the only choice
available.

If "empty least" is chosen in XQuery, and if "descending" is not specified, and
if "stable" is specified, then the relevant rules are:

<quote>
* If V is an empty sequence and W is not an empty sequence, then W greater-than
V is true.
* If V is NaN and W is neither NaN nor an empty sequence, then W greater-than V
is true.
* If V1 is greater-than V2 ... then T2 precedes T1 in the output tuple stream.
* If neither V1 nor V2 is greater-than the other [then] the original order of
T1 and T2 is preserved in the output tuple stream
<quote>

So considering three values (), NaN, and a value X that is neither NaN nor an
empty sequence, and using ">" to mean "greater than" as defined above, we have:

NaN > ()
X > NaN

and the effect is the same as the XSLT rule: the result is (), followed by NaN,
followed by non-NaN values.

The "proposed rules" for fn:sort do NOT have this effect. Firstly,
deep-less-than(NaN,X) returns false. Secondly, we don't really say what happens
when deep-less-than() returns false. So I propose a revision of the "proposed
rules" as follows:

<quote>
The order of items in the result is such that, given two items $A and $B:

* If (fn:deep-equal($key($A), $key($B)), then the relative order of $A and $B
in the output is the same as their relative order in the input (that is, the
sort is stable)

* Otherwise, $A precedes $B in the output if and only if
(deep-less-than($key($A), $key($B)) returns true

The function <code>deep-less-than($A, $B)</code> is defined as the boolean
result of the expression:

if (fn:empty($A)) then fn:exists($B)
else if (fn:deep-equal($A[1], $B[1])) then deep-less-than(fn:tail($A),
fn:tail($B))
else if ($A[1] ne $A[1] (:that is, $A is NaN)) then fn:true()
else $A[1] lt $B[1]

</quote>

Note that under these rules, given three values (), NaN, and a numeric value X
that is not NaN:

* deep-less-than((), NaN) is true, so () precedes NaN in the result
* deep-less-than(NaN, X) is true, so NaN precedes X in the result

I believe the proposed rule is transitive (lt(A,B) and lt(B,C) => lt(A,C))
though it's a little difficult to prove, especially in the corner cases
involving numeric promotion (which I don't think XSLT or XQuery sorting handle
either.)

So I think that with the revised proposal, the rules for fn:sort and array:sort
are consistent with both XSLT and XQuery assuming stable=yes, descending=no,
empty=least.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.

Received on Wednesday, 9 September 2015 09:27:27 UTC