Re: Discussion of op:same-key from Michael Kay on 2016-06-08 (public-xsl-query@w3.org from June 2016)

From: Michael Kay <mike@saxonica.com>
Date: Wed, 8 Jun 2016 11:06:28 +0100
To: Tim Mills <tim@cbcl.co.uk>
Cc: Abel Braaksma <abel.braaksma@xs4all.nl>, public-xsl-query@w3.org
Message-Id: <58A5DC81-D818-442C-84A0-9E9CC9190F78@saxonica.com>

In discussion yesterday the question was raised, might it be possible to use some shorter xs:decimal representation of an xs:float or xs:double as the "canonical key".

I think that technically this might be possible.

The essential requirements are:

* there must be a mapping M from xs:double values D to xs:decimal values C such that op:same-key(C, M(D)) holds.
* this implies that for every xs:double value D there must be exactly one xs:decimal value C such that op:same-key(C, D) holds
* this should also have the property that (C eq M(D)).
* two different xs:double values must map to different xs:decimal values: if (D1 != D2) then M(D1) != M(D2).
* it's highly desirable that in simple cases, e.g. where D is a whole number, the chosen C should be mathematically equal, e.g. op:same-key(1, 1.0e0) should hold.

(And the same, of course, for xs:float)

The requirement in the spec that C must in all cases be the exact representation of the mathematical value of D is arguably stronger than is needed. Here's an attempt to define a less stringent formulation:

(a) Let S be the theoretical infinite value space S of xs:decimal, and T be the implementation-defined subset of this value space supported by an implementation. Let D be the value space of xs:double, and F be the value space of F.

(b) Let K be the set of distinct tokens used to represent numeric key values. This is a subset of the union of the value spaces of xs:double and xs:decimal; the subset excludes xs:double values that are mathematically equal to some xs:decimal value in T. Two numeric values are the same key if they map to the same value in K: call the mapping function M.

(c) For an xs:float value f, M(f) is M(xs:double(f)).

(d) For an xs:decimal value d, M(d) is d.

(e) For an xs:double value u:

(e1) if there is an xs:decimal value d in T such that u is mathematically equal to T, then M(u) is d.

(e2) otherwise, M(u) is u. 

The way that this works is, if there's no xs:decimal in T precisely equal to an xs:double, then we don't need to convert the xs:double to an xs:decimal, because it will never match any xs:decimal in T.

Now, the interesting thing is, I think the observable results of this algorithm are identical to the algorithm that we currently have in the spec; in other words this algorithm is a conformant implementation of what we have in the spec. But it doesn't require conversion of doubles to infinite-precision decimals; all it requires is the ability to determine whether a given xs:double is exactly representable in the implementation-defined subset of xs:decimal.

That invites the question, is there an efficient way to determine whether, given a mantissa m and exponent e, there is some pair of integers (i, k) such that m*2^e equals i/10^k? I leave that question to the reader.

Michael Kay
Saxonica

Received on Wednesday, 8 June 2016 10:07:14 UTC