[Bug 28845] fn:format-number, formatting rules for exponential notation

https://www.w3.org/Bugs/Public/show_bug.cgi?id=28845

Debbie Lockett <debbie@saxonica.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |debbie@saxonica.com

--- Comment #5 from Debbie Lockett <debbie@saxonica.com> ---
Apologies for not being aware of the ongoing discussion in this bug earlier.

I think there remain some clarifications to be made in the specification,
relating to Christian's questions (and the test case numberformat135:
fn:format-number(0.2, '#.e9') expected result '2e-1'). It seems there are three
main issues, to do with significant digits, minimum-integer-part-size, and the
rules for choosing the mantissa and exponent.

Recall that the definition of minimum-integer-part-size (in 4.7.4) is
currently:
"an integer indicating the minimum number of digits that will appear to the
left of the decimal-separator character. It is normally set to the number of
·decimal digit family· characters found in the integer part of the sub-picture.
But if the sub-picture contains no ·decimal digit family· character and no
decimal-separator character, it is set to one."


1. First of all, it would be worth properly defining "significant digit of a
number" in the Spec (this has clearly caused some confusion). Specifically, it
should state that leading zeroes of a number are not significant. So the first
(and only) significant digit of 0.2 is the digit '2'. Note that this definition
holds for *any* number, not just an integer (Mike only referred to integers in
comment 1, but in fact we need it for all numbers). Significant digits are
referred to in "4.7.5 Formatting the number" in rule 5(c); but also earlier in
"4.7.4 Analysing the picture string" - in the Note after the definition of
minimum-integer-part-size; and also the Notes section of 4.7.2 (which says:
"Numbers will always be formatted with the most significant digit on the left."
But I don't know what this means. That numbers read from left to right??? Is it
necessary to state that?)

Thus (using 5(c) "the number of significant digits in the integer part of the
mantissa is equal to the minimum integer part size") we obtain for example:

fn:format-number(0.2, '0.0e9') has expected result '2.0e-1' (not '0.2e0', for
which the mantissa has zero significant digits in its integer part). Here
minimum-integer-part-size = 1.

fn:format-number(0.2, '9e9') has expected result '2e-1' (not '0e0'). Here
minimum-integer-part-size = 1.

fn:format-number(0.2, '000.0e9') has expected result '200.0e-3' (not
'002.0e-1'). minimum-integer-part-size = 3, and the number of significant
digits in the integer part of the mantissa of the result is also 3 (even though
the input 'adjusted number' only has one significant digit '2').
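
The reading of rule 5(c) used in these three examples can be sketched in
Python as follows. This is only an illustration under stated assumptions, not
the spec's algorithm: the function names are hypothetical, only non-zero
values with minimum-integer-part-size >= 1 are handled, and the effect of
rounding on the digit count (e.g. a mantissa of 9.96 printed with one
fractional digit) is ignored.

```python
from decimal import Decimal


def split_mantissa_exponent(value: str, min_int_size: int):
    """Pick (M, E) with value == M * 10**E and exactly `min_int_size`
    significant digits in the integer part of M.

    Sketch only: defined for non-zero values and min_int_size >= 1.
    """
    number = Decimal(value)
    if number == 0 or min_int_size < 1:
        raise ValueError("sketch covers non-zero values, min_int_size >= 1")
    # adjusted() is the power of ten of the first significant digit,
    # e.g. -1 for 0.2, because leading zeroes are not significant.
    exponent = number.adjusted() - (min_int_size - 1)
    mantissa = number.scaleb(-exponent)
    return mantissa, exponent


def render(value: str, min_int_size: int, fraction_digits: int) -> str:
    """Hypothetical helper: print the mantissa with a fixed number of
    fractional digits, followed by 'e' and the exponent."""
    mantissa, exponent = split_mantissa_exponent(value, min_int_size)
    return f"{mantissa:.{fraction_digits}f}e{exponent}"


print(render("0.2", 1, 1))  # corresponds to picture '0.0e9'
print(render("0.2", 1, 0))  # corresponds to picture '9e9'
print(render("0.2", 3, 1))  # corresponds to picture '000.0e9'
```

The three calls are meant to reproduce the expected results quoted above
('2.0e-1', '2e-1' and '200.0e-3').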


2. It appears that actually 4.7.5 rule 5 does not uniquely define the mantissa
and exponent in the case that minimum-integer-part-size = 0. Consider
fn:format-number(0.002, '.0e0'), for which minimum-integer-part-size = 0. Rule
(b) allows (M,E) to be (0.002, 0) or (0.02, -1) or (0.2, -2) (as well as
infinitely many other solutions). For each of these, the number of significant
digits in the integer part of M is zero, so rule (c) cannot choose between
them. What should the expected result be? The options listed give '.0e0',
'.0e-1' or '.2e-2' respectively. The last one looks most meaningful here.

Also consider fn:format-number(0.002, '.000e0'). We have the same options for
(M,E), but now the results are '.002e0' or '.020e-1' or '.200e-2'. Which should
be expected?
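
The ambiguity can be made concrete with a small Python sketch. It simply
renders each candidate (M, E) pair for a given number of fractional digits;
the function name is hypothetical, and dropping the leading zero for
minimum-integer-part-size = 0 is an assumption about how the mantissa would
be printed.

```python
from decimal import Decimal


def render_candidates(value: str, fraction_digits: int, exponents):
    """For each exponent E, take M = value / 10**E and print M with
    `fraction_digits` fractional digits and no integer digits."""
    number = Decimal(value)
    results = []
    for exponent in exponents:
        mantissa = number.scaleb(-exponent)
        # Every candidate has zero significant digits in its integer part,
        # so rule (c) is satisfied by all of them when the minimum is 0.
        assert int(mantissa) == 0
        text = f"{mantissa:.{fraction_digits}f}"
        # minimum-integer-part-size = 0: drop the leading zero (assumption).
        results.append(f"{text.lstrip('0')}e{exponent}")
    return results


print(render_candidates("0.002", 1, [0, -1, -2]))  # picture '.0e0'
print(render_candidates("0.002", 3, [0, -1, -2]))  # picture '.000e0'
```

All candidates pass the check on the integer part, which is the point: rules
(b) and (c) alone do not single one of them out.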


3. Again recall the definition of minimum-integer-part-size:
"It is normally set to the number of ·decimal digit family· characters found in
the integer part of the sub-picture. But if the sub-picture contains no
·decimal digit family· character and no decimal-separator character, it is
set to one."

I think the last sentence should be changed to (A) "But if the *mantissa part*
contains no ·decimal digit family· character and no decimal-separator
character, it is set to one.", to catch the case for the picture '#e9' (as well
as '#', '###', etc).
e.g. fn:format-number(0.2, '#e9')
Note that the sub-picture contains a ·decimal digit family· character (though
the mantissa part doesn't), so by the current definition
minimum-integer-part-size = 0 (it is not set to one). We actually now hit the
problem described in point 2. With the new definition minimum-integer-part-size
is set to one, and you get the result '2e-1'.

In fact, as far as I can see, this last sentence was redundant before the
introduction of formatting using exponential notation. The formatting rules in
4.7.5 would produce the desired results even without setting the
minimum-integer-part-size to one for certain cases i.e. '#', '###', etc.
(Compare to the analysis for '.#', for which the minimum-integer-part-size is
not set to one.) The sentence *is* now necessary because I think you *do* want
to set the minimum-integer-part-size to one in the case '#e9'.

What about '#.' and '#.suffix' and '#.e9'? Should these pictures really be
valid? They are a bit odd, but are currently allowed. For each of these,
minimum-integer-part-size=0 (by the old and suggested new definition), and by
the formatting rules in 4.7.5 the first two produce results which I think are
reasonable:
e.g. fn:format-number(0.2, '#.') result is '0'
fn:format-number(1.2, '#.') result is '1'
fn:format-number(0.2, '#.suffix') result is '0suffix'
fn:format-number(1.2, '#.suffix') result is '1suffix'

So finally, we get to fn:format-number(0.2, '#.e9'), test case numberformat135.
If the picture '#.e9' is indeed supposed to be valid, I think you should expect
the same results as for the picture '#e9'. Again minimum-integer-part-size
should be set to one (in fact, we already thought this happened (me in the
expected result, and you in the analysis in comment 4) but as described above,
with the current definition this is not the case). Unfortunately, just dealing
with this case requires further intricate tweaking of the definition, something
like "But if the mantissa part contains no ·decimal digit family· character and
no decimal-separator character, or if the mantissa part is not the whole
sub-picture but contains no ·decimal digit family· character and the fractional
part is empty, it is set to one." Then the expected result will be '2e-1'.

A better solution would be to change the definition to (B) "But if the mantissa
part contains no ·decimal digit family· character and the fractional part is
either empty or contains only passive characters, it is set to one." This
would mean that the minimum-integer-part-size is set to one for all of the
pictures '#', '###', '#.', '#.suffix', '#.e9'; and you get (what I believe are)
the right results.
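
To compare the current wording with suggestions (A) and (B), here is a rough
Python sketch of the minimum-integer-part-size computation. It is a
simplification under stated assumptions, not the spec's analysis: it assumes
the default decimal format (digits 0-9, '#', '.', ',', 'e'), handles a single
sub-picture with no prefix, and uses a simplified test for when 'e' acts as
the exponent separator. All function names are hypothetical.

```python
DIGITS = set("0123456789")
# Characters treated as active in this sketch; everything else is passive.
ACTIVE = DIGITS | set("#.,e")


def split_sub_picture(picture: str):
    """Return (mantissa part, integer part, fractional part)."""
    mantissa = picture
    for i in range(1, len(picture) - 1):
        # Treat 'e' as the exponent separator only when it sits between an
        # active character and a digit (a simplification of the spec rule).
        if (picture[i] == "e" and picture[i - 1] in ACTIVE
                and picture[i + 1] in DIGITS):
            mantissa = picture[:i]
            break
    integer, _, fraction = mantissa.partition(".")
    return mantissa, integer, fraction


def has_digit(text: str) -> bool:
    return any(ch in DIGITS for ch in text)


def min_integer_part_size(picture: str, wording: str) -> int:
    """wording is 'current', 'A' or 'B', matching the text above."""
    mantissa, integer, fraction = split_sub_picture(picture)
    if wording == "current":
        set_to_one = not has_digit(picture) and "." not in picture
    elif wording == "A":
        set_to_one = not has_digit(mantissa) and "." not in mantissa
    elif wording == "B":
        only_passive = all(ch not in ACTIVE for ch in fraction)
        set_to_one = not has_digit(mantissa) and only_passive
    else:
        raise ValueError(f"unknown wording: {wording}")
    return 1 if set_to_one else sum(ch in DIGITS for ch in integer)


for picture in ["#", "###", "#e9", "#.", "#.suffix", "#.e9", ".#", "0.0e9"]:
    sizes = [min_integer_part_size(picture, w) for w in ("current", "A", "B")]
    print(f"{picture!r:12} current/A/B = {sizes}")
```

Under these assumptions, only wording (B) yields one for every picture in
the list '#', '###', '#.', '#.suffix', '#.e9', while '.#' stays at zero under
all three wordings.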


Received on Thursday, 30 July 2015 12:34:58 UTC