[XSLT 2.0] format-number() - rounding large numbers from Michael Kay on 2004-02-06 (public-qt-comments@w3.org from February 2004)

From: Michael Kay <mhk@mhk.me.uk>
Date: Fri, 6 Feb 2004 09:57:54 -0000
To: <public-qt-comments@w3.org>
Message-ID: <005a01c3ec97$b1491740$6401a8c0@pcukmka>

This was originally raised internally at
http://lists.w3.org/Archives/Member/w3c-xsl-wg/2003Dec/0000.html

See also the replies to that message.

To summarize:

Following up on a bug report from a Saxon user, I've been looking at how
large numbers like 1E25 should be formatted by format-number(). (Saxon 7.8
gets it completely wrong.)

After fixing the obvious bug, the expression

<xsl:value-of select="format-number(1E25,'#,######')"/>

produces

10,000000,000000,000905,969664

with the Saxon implementation of the XSLT 2.0 algorithm, while the JDK-based
implementation produces:

10,000000,000000,000000,000000


Internally, the floating point representation of 1E25 is:

4656612873077393 x 2^31

which is the value that Saxon outputs, converted to decimal. Java appears to
be doing some intelligent rounding of the value.
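
The two outputs above can be reproduced outside XSLT. The following is a minimal Python sketch, not Saxon's or the JDK's actual code: the helper `group` is hypothetical, and using the shortest digit string that round-trips to the same double is only one candidate for the "intelligent rounding" that Java appears to do.

```python
from decimal import Decimal


def group(n: int, size: int = 6, sep: str = ",") -> str:
    """Insert a separator every `size` digits, counting from the right."""
    digits = str(n)
    parts = []
    while digits:
        parts.append(digits[-size:])
        digits = digits[:-size]
    return sep.join(reversed(parts))


x = 1e25  # the IEEE 754 double nearest to 10**25

# Exact value held by the double: a 53-bit integer scaled by a power of two.
exact = int(x)
assert exact == 4656612873077393 * 2**31

# Assumption: "rounded" means the shortest decimal digit string that
# still parses back to the same double (this is what repr() produces).
shortest = int(Decimal(repr(x)))

print(group(exact))     # 10,000000,000000,000905,969664
print(group(shortest))  # 10,000000,000000,000000,000000
```

The first line matches what the XSLT 2.0 algorithm yields when applied to the exact value; the second matches the JDK-based output.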

The questions for us are:

(a) should we mandate such rounding?
(b) if so, what are the rules?

A related question is: what if the input is a decimal, rather than a double?
At the moment, we simply "promote" the decimal to a double, which means that
precision will in general be lost before the conversion to a string. So:

(c) should we try to define format-number() so that decimal values
(including integers) are not converted to doubles before being
formatted?
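
To illustrate the precision loss behind question (c), here is a small Python sketch. The input value is hypothetical, and Python's `decimal.Decimal` only stands in for an xs:decimal; it is not a statement about how any XSLT processor stores decimals.

```python
from decimal import Decimal

# Hypothetical decimal input with more digits than a double can hold.
d = Decimal("12345678901234567890.123")

# "Promotion": convert the decimal to an IEEE 754 double.
promoted = float(d)

# Exact value the double actually holds, converted back to a decimal.
after = Decimal(promoted)

print(d)      # the original decimal, all digits intact
print(after)  # the value that would be formatted after promotion

# The promoted value differs from the input, and its fraction is gone.
assert after != d
assert after % 1 == 0
```

Formatting `d` directly would keep every digit, whereas formatting `after` shows digits that were never in the input.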

Michael Kay 

Response from David Marston:

I think the rule is about significant figures. Looking at '1E25', you
can say deterministically that there is one significant figure. That
number should be rendered in such a way that the rendered form can be
interpreted to have one significant figure. In this example,
10,000000,000000,000000,000000
can be interpreted to have anywhere from 1 to 26 significant figures,
whereas
10,000000,000000,000905,969664
can only be interpreted to have 26.

> (c) should we try to define format-number() so that decimal values
> (including integers) are not converted to doubles before being
> formatted?

That seems reasonable. Perhaps the document should acknowledge the
potential to produce strings that express more precision than the number
really has, which would cover a multitude of sins. With that disclaimer,
you can proceed to say that the input number is taken as-is (i.e., with
whatever precision it has) and truncated or stretched without regard to
the deceptiveness of the result.

Response from Michael Kay:

But our input is the IEEE floating point number represented by 1E25, which
is equal to 10,000000,000000,000905,969664.
 
I would like to define that we produce a "rounded" representation, but I'm
not sure how to achieve it. It can't be based on the original lexical form:
we have no idea whether the number started life as "1E25" or as
"10000000000000000905969664E0".
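
The point about the lost lexical form can be checked directly. This short Python sketch assumes only that both literals are parsed as IEEE 754 doubles with correct rounding, as Python's `float()` does:

```python
# Two different lexical forms from the message above.
a = float("1E25")
b = float("10000000000000000905969664E0")

# Both parse to the same double, bit for bit, so a formatter that
# receives the double cannot tell which form was written.
assert a == b
assert a.hex() == b.hex()

print(a.hex(), b.hex())
```

Any rounding rule therefore has to be defined on the double value itself, not on the digits the author originally typed.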
 
Michael Kay

Received on Friday, 6 February 2004 04:57:30 UTC