[Bug 29722] [FO31] fn:sort, array:sort

https://www.w3.org/Bugs/Public/show_bug.cgi?id=29722

Michael Kay <mike@saxonica.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED

--- Comment #3 from Michael Kay <mike@saxonica.com> ---
ACTION A-650-01: Action on Mike Kay to add a collation argument to fn:sort and
array:sort (see bug 29722).  

I will interpret this as an action to produce a proposal, since I don't think
the solution is entirely self-evident.

fn:sort currently has two signatures:

fn:sort($input as item()*) as item()*
fn:sort($input as item()*,
        $key as function(item()) as xs:anyAtomicType*) as item()*

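Adding a collation argument to both clearly doesn't work, because adding it to
fn:sort#1 would create a second two-argument signature, and functions are
distinguished only by name and arity. So I propose to add it only to fn:sort#2.
This is OK, because fn:sort(x) is equivalent to fn:sort(x, data#1), which means
that fn:sort#1 is really just a convenience function.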

So I propose to add a third signature:

fn:sort($input as item()*,
        $key as function(item()) as xs:anyAtomicType*,
        $collation as xs:string) as item()*

The 1- and 2-argument forms are now simple shortcuts: the second argument
defaults to data#1 and the third argument defaults to the default collation in
the static context. It's slightly awkward that the function isn't the last
argument as it usually is in higher-order functions, but we can't have
everything.
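To illustrate how the shorter forms reduce to the three-argument form, here is
a rough sketch in Python rather than XPath. Everything in it is an assumption
made for illustration: a "collation" is modelled as a function mapping a string
to a comparable key, DEFAULT_COLLATION stands in for the default collation of
the static context, the identity function stands in for data#1, and each sort
key is a single value rather than a sequence.

```python
# Hypothetical stand-ins: a "collation" is modelled as a function that maps
# a string to a comparable collation key. None of these names are in the spec.
def codepoint(s):
    return s

def case_blind(s):
    return s.casefold()

DEFAULT_COLLATION = codepoint  # stands in for the static-context default


def sort3(items, key, collation):
    """Three-argument form: the other arities delegate to this one."""
    def sort_key(item):
        k = key(item)
        # Only string keys are affected by the collation.
        return collation(k) if isinstance(k, str) else k
    return sorted(items, key=sort_key)  # Python's sorted() is stable


def sort2(items, key):
    """Two-argument form: the collation defaults."""
    return sort3(items, key, DEFAULT_COLLATION)


def sort1(items):
    """One-argument form: the key defaults too (identity models data#1)."""
    return sort2(items, lambda item: item)
```

With these stand-ins, sort1(["b", "a", "A"]) sorts by codepoint, while
sort3(["b", "a", "A"], lambda s: s, case_blind) treats "a" and "A" as equal and
so keeps them in their input order.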

We need to add to the spec:

The collation used by this function is determined according to the rules in
5.3.5 Choosing a collation.

And then the detail becomes:

The result of the function is obtained as follows:

For each item in the sequence $input, the function supplied as $key is
evaluated with that item as its argument. The resulting values are the sort
keys of the items in the input sequence.

The result sequence contains the same items as the input sequence $input, but
generally in a different order.

The order of items in the result is such that, given two items $A and $B:

Let $collation be the explicit or implicit collation in use.

If fn:deep-equal($key($A), $key($B), $collation), then the relative order of
$A and $B in the output is the same as their relative order in the input (that
is, the sort is stable).

Otherwise, if deep-less-than($key($A), $key($B), $collation), then $A precedes
$B in the output. The function deep-less-than($A, $B, $collation), whose
arguments $A and $B are two sequences of sort keys, is defined as the boolean
result of the expression:

if (fn:empty($A)) then fn:exists($B)
else if (fn:deep-equal($A[1], $B[1], $collation))
     then deep-less-than(fn:tail($A), fn:tail($B), $collation)
else if ($A[1] ne $A[1] (:that is, $A[1] is NaN:)) then fn:true()
else if (is-string($A[1]) and is-string($B[1]))
     then fn:compare($A[1], $B[1], $collation) < 0
else $A[1] lt $B[1]

where the function is-string($X) returns true if and only if $X is an instance
of xs:string, xs:anyURI, or xs:untypedAtomic.

This ordering of sequences is referred to by mathematicians as "lexicographic
ordering".
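For anyone who wants to experiment with the ordering, the following is a
minimal Python model of it, not the XPath definition itself. It rests on
several assumptions: sort keys are Python lists, a collation is modelled as a
function from a string to a comparable key, is_string covers only Python str
(no xs:anyURI or xs:untypedAtomic), and the helper names atomic_equal and
sort_by_key are invented here. The case of a non-empty $A against an empty $B
follows my literal reading of the expression.

```python
import math
from functools import cmp_to_key


def is_string(x):
    return isinstance(x, str)


def is_nan(x):
    return isinstance(x, float) and math.isnan(x)


def atomic_equal(a, b, collation):
    """Models fn:deep-equal on two single atomic values (NaN equals NaN)."""
    if is_nan(a) and is_nan(b):
        return True
    if is_string(a) and is_string(b):
        return collation(a) == collation(b)
    return a == b


def deep_less_than(a, b, collation):
    """Lexicographic 'less than' on two lists of sort keys."""
    if not a:
        return len(b) > 0
    if not b:
        # Literal reading of the expression: with $B empty, only the NaN
        # branch can yield true.
        return is_nan(a[0])
    if atomic_equal(a[0], b[0], collation):
        return deep_less_than(a[1:], b[1:], collation)
    if is_nan(a[0]):
        return True  # NaN sorts before any other value
    if is_string(a[0]) and is_string(b[0]):
        return collation(a[0]) < collation(b[0])
    return a[0] < b[0]  # raises TypeError for incomparable types


def sort_by_key(items, key, collation):
    """Stable sort of items using deep_less_than on their sort keys."""
    def compare(x, y):
        kx, ky = key(x), key(y)
        if deep_less_than(kx, ky, collation):
            return -1
        if deep_less_than(ky, kx, collation):
            return 1
        return 0  # equal keys: sorted() preserves input order
    return sorted(items, key=cmp_to_key(compare))
```

Note that a key which is a proper prefix of another sorts first, so a key of
[1] precedes a key of [1, 2], just as "ab" precedes "abc" in a dictionary.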


The changes for array:sort are virtually identical.

Received on Wednesday, 20 July 2016 10:23:40 UTC