Proposed change for bug 28845 (format-number()) from Michael Kay on 2015-09-09 (public-xsl-query@w3.org from September 2015)

From: Michael Kay <mike@saxonica.com>
Date: Wed, 9 Sep 2015 18:49:45 +0100
To: public-xsl-query@w3.org
Cc: Debbie Lockett <debbie@saxonica.com>
Message-Id: <BF4EE9A6-D878-4085-970E-0AF8B880B97F@saxonica.com>

> ACTION 614-06 on MKay to review Bug 28845 and make a proposal.

Let's try and establish some principles.

There are two things we need to specify: (a) the "scaling factor" which determines the value of the exponent, and (b) the representation of the result.

By "scaling factor" I mean the number N such that the mantissa M satisfies 10^(N-1) <= M < 10^N. So a scaling factor of zero means the mantissa is in the range 0.1 to 0.999999..., a scaling factor of one means it is in the range 1 to 9.999999..., etc.
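As a quick check of this definition, here is an illustrative Python sketch (not proposed spec text) that splits a number into mantissa and exponent for a given scaling factor N. It uses decimal.Decimal so the arithmetic is exact. Treating zero as 0e0 is an assumption, since the definition above does not cover zero; the name scale is hypothetical.

```python
from decimal import Decimal


def scale(x: Decimal, n: int) -> tuple:
    """Return (mantissa, exponent) such that 10**(n-1) <= |mantissa| < 10**n
    and mantissa * 10**exponent == x, where n is the scaling factor."""
    if x == 0:
        # Assumption: zero has no natural scaling, so write it as 0e0.
        return Decimal(0), 0
    # adjusted() is the power of ten of the most significant digit,
    # i.e. floor(log10(|x|)) computed exactly.
    exponent = x.adjusted() + 1 - n
    # scaleb(k) multiplies by 10**k without any rounding of the digits.
    mantissa = x.scaleb(-exponent)
    return mantissa, exponent
```

For example, scale(Decimal("0.2"), 0) gives (0.2, 0), scale(Decimal("0.2"), 1) gives (2, -1), and scale(Decimal("1.2"), 0) gives (0.12, 1).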

I propose that the scaling factor should be equal to the number of decimal-digit-family characters found in the integer part of the sub-picture. Note that this isn't quite the same as the minimum-integer-part-size, because that is adjusted from 0 to 1 if there is no decimal-digit-family character and no decimal-separator.
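To make the difference between the two quantities concrete, a hypothetical pair of Python helpers follows. It assumes the decimal digit family is ASCII '0' to '9', that '.' is the decimal separator, and that the sub-picture passed in has already had its prefix, suffix and any exponent part removed.

```python
DIGITS = set("0123456789")  # assumed digit family: zero_digit = '0'


def integer_part(sub_picture):
    """Characters left of the decimal separator (the whole string if none)."""
    return sub_picture.split(".", 1)[0]


def scaling_factor(sub_picture):
    """Proposed: count the digit-family characters in the integer part."""
    return sum(c in DIGITS for c in integer_part(sub_picture))


def current_min_integer_part_size(sub_picture):
    """Existing rule: the same count, except that 0 becomes 1 when the
    sub-picture has no digit-family character and no decimal separator."""
    n = scaling_factor(sub_picture)
    has_digit = any(c in DIGITS for c in sub_picture)
    if not has_digit and "." not in sub_picture:
        return 1
    return n
```

So for the sub-picture '#' the scaling factor is 0 while the current minimum-integer-part-size is 1; for '#.' both are 0.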

This leaves two problems, which I propose that we decide to live with:

• there is no way to specify a negative scaling factor. I don't think many people will miss this feature.
• if the scaling factor is zero, there is no way to get a zero digit before the decimal point: the output will be ".1e0" rather than "0.1e0". At the end of this note I'll suggest a solution to this that the WG might like to consider; but some may regard it as feature creep.

The guiding principle for formatting of the mantissa and exponent is then as follows: the mantissa is formatted using exactly the same rules as we would use for the number as a whole in the absence of an exponent separator.

The other problem is that we allow numbers to be output with no significant digits, which makes very little sense.

To tackle this, I propose an adjustment to the rules for minimum-integer-part-size: to replace the current rule that in some circumstances forces this to one, we define the following adjustments: if minimum-integer-part-size and maximum-fractional-part-size are both zero, then:
• if there is an exponent separator, set maximum-fractional-part-size to 1;
• otherwise set minimum-integer-part-size to 1.
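The proposed adjustment is small enough to state directly as code. This is an illustrative sketch only; the function name and the has_exponent flag (true when the sub-picture contains an exponent separator) are hypothetical.

```python
def adjust_sizes(min_int, max_frac, has_exponent):
    """Proposed rule: if neither the integer nor the fractional part could
    hold a digit, make room for exactly one significant digit."""
    if min_int == 0 and max_frac == 0:
        if has_exponent:
            max_frac = 1   # e.g. '#.e0' shows one fractional digit: '.2e0'
        else:
            min_int = 1    # e.g. '#.' shows one integer digit: '0'
    return min_int, max_frac
```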

To achieve this, and to fix the other problems identified, I think the following edits are needed.

In 4.7.4 Analyzing the picture string, add a new variable: The /scaling-factor/ is a non-negative integer used to determine the scaling of the mantissa in exponential notation. It is set to the number of decimal_digit_family characters found in the integer part of the sub-picture.

In 4.7.4, change the definition of minimum-integer-part-size, to read: The minimum-integer-part-size is an integer indicating the minimum number of digits that will appear to the left of the decimal-separator character. It is initially set to the number of decimal_digit_family characters found in the integer part of the sub-picture, but may be adjusted as described below.

In 4.7.4 add a new rule: if the effect of the above rules is that minimum-integer-part-size and maximum-fractional-part-size are both zero, then an adjustment is applied as follows: if an exponent separator is present then maximum-fractional-part-size is changed to 1 (one); otherwise minimum-integer-part-size is changed to 1 (one).

In 4.7.5 change rule 5 to read as follows (but using indented lists): If the minimum exponent size is non-zero, then the adjusted number is scaled to establish a mantissa and an integer exponent. The mantissa and exponent are chosen such that (a) the primitive type of the mantissa is the same as the primitive type of the adjusted number (integer, decimal, float, or double), (b) the mantissa multiplied by ten to the power of the exponent is equal to the adjusted number, and (c) the mantissa is less than 10^N, and at least 10^(N-1), where N is the scaling factor. If the minimum exponent size is zero, then the mantissa is the adjusted number and there is no exponent.

Additional changes suggested by other comments:

In 4.7.5 rule 4, reword as: The adjusted number is determined as follows: If the sub-picture contains a _percent_ character, the adjusted number is the input number multiplied by 100. If the sub-picture contains a _per-mille_ character, the adjusted number is the input number multiplied by 1000. Otherwise the adjusted number is the input number.
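Reworded this way, rule 4 is a simple three-way choice. A minimal Python sketch, assuming '%' and U+2030 ('‰') as the percent and per-mille characters; since the picture-string syntax already forbids a sub-picture from containing one of each, the order of the tests does not matter.

```python
from decimal import Decimal

PERCENT = "%"          # assumed percent character
PER_MILLE = "\u2030"   # assumed per-mille character


def adjusted_number(x: Decimal, sub_picture: str) -> Decimal:
    """Scale the input number according to 4.7.5 rule 4 as reworded."""
    if PERCENT in sub_picture:
        return x * 100
    if PER_MILLE in sub_picture:
        return x * 1000
    return x
```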

Change the Note in 4.7.2 to read: A string is an ordered sequence of characters, and this specification uses terms such as "left" and "right", "preceding" and "following" in relation to this ordering, irrespective of the position of the characters when visually rendered on some output medium. Both in the picture string and in the result string, digits with higher significance (that is, representing higher powers of ten) always precede digits with lower significance, even when the rendered text flow is from right to left.

I noticed one remaining reference to the term "mandatory-digit-sign", which is no longer defined. This is in the note at the start of 4.7.3, which furthermore appears on first reading to contain a rule that is not defined elsewhere. Rephrase the second sentence of the note as: The digits will all be from the same decimal_digit_family, specifically, the sequence of ten consecutive digits starting with the digit assigned to the zero_digit property.
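For illustration, the digit family described in the rephrased note can be derived from the zero_digit alone. This Python sketch is hypothetical; the check that zero_digit is a decimal digit with value zero is an assumption about what a well-formed decimal format requires.

```python
import unicodedata


def digit_family(zero_digit):
    """The ten consecutive characters starting at zero_digit."""
    if unicodedata.decimal(zero_digit, None) != 0:
        raise ValueError("zero_digit must be a decimal digit with value zero")
    start = ord(zero_digit)
    return "".join(chr(start + i) for i in range(10))
```

For example, digit_family("\u0660") yields the Arabic-Indic digits U+0660 to U+0669, and digit_family("0") yields "0123456789".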

To prevent other readers going down a blind alley that I found myself following, add after the penultimate bullet of 4.7.4 the Note: The rules for the syntax of the picture string ensure that if an exponent separator is present, then the minimum-exponent-size will always be greater than zero.

Implications on examples:

fn:format-number(0.2, '#.e9') => ".2e0"

(Explanation: min-int-part-size=0, scaling-factor=0. Initially max-frac-part-size=0, but it gets adjusted to 1 by the new rule. A '#' in the integer part of the picture has no effect except on grouping separators. But see the CODA, which would change the result to 0.2e0.)

fn:format-number(0.2, '9e9') => "2e-1"

(Explanation: min-int-part-size=1, scaling-factor=1.)

fn:format-number(0.2, '000.0e9') => "200.0e-3"

fn:format-number(0.002, '.000e0') => ".200e-2"

fn:format-number(0.2, '#.') => '0'

fn:format-number(1.2, '#.') => '1'

fn:format-number(0.2, '#.suffix') => '0suffix'

fn:format-number(1.2, '#.suffix') => '1suffix'

fn:format-number(0.2, '#.e0') => '.2e0'

fn:format-number(1.2, '#.e0') => '.1e1'

fn:format-number(0.2, '#.e0suffix') => '.2e0suffix'

fn:format-number(1.2, '#.e0suffix') => '.1e1suffix'
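Examples like these can be cross-checked mechanically. The Python sketch below is a toy model of the proposed rules for a deliberately narrow picture syntax, not an implementation of fn:format-number: no prefix, grouping separators, percent or per-mille; only '#', ASCII digits and '.' in the mantissa; 'e' followed by a digit as the exponent separator; anything after that treated as a passive suffix; non-negative input given as a decimal string. It rounds half-to-even and does not handle rounding that pushes the mantissa up to 10^N. All names are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_EVEN

DIGITS = set("0123456789")            # assumed digit family: zero_digit = '0'
MANTISSA_CHARS = DIGITS | {"#", "."}  # optional digit '#', separator '.'


def scale(x, n):
    """Return (m, e) with 10**(n-1) <= |m| < 10**n and m * 10**e == x."""
    if x == 0:
        return x, 0                   # assumption: zero is written as 0e0
    e = x.adjusted() + 1 - n
    return x.scaleb(-e), e


def parse(picture):
    """Split a picture into (mantissa sub-picture, exponent digits, suffix)."""
    i = 0
    while i < len(picture) and picture[i] in MANTISSA_CHARS:
        i += 1
    mantissa = picture[:i]
    exp_digits = ""
    # 'e' counts as the exponent separator only if a digit follows it.
    if picture[i:i + 1] == "e" and picture[i + 1:i + 2] in DIGITS:
        j = i + 1
        while j < len(picture) and picture[j] in DIGITS:
            j += 1
        exp_digits, i = picture[i + 1:j], j
    return mantissa, exp_digits, picture[i:]


def format_number(value, picture):
    mantissa_pic, exp_digits, suffix = parse(picture)
    int_pic, _, frac_pic = mantissa_pic.partition(".")
    scaling_factor = sum(c in DIGITS for c in int_pic)
    min_int = scaling_factor             # proposed: no 0 -> 1 rule here
    min_frac = sum(c in DIGITS for c in frac_pic)
    max_frac = len(frac_pic)             # digits and '#' both count
    min_exp = len(exp_digits)
    # Proposed adjustment: never produce a result with no digits at all.
    if min_int == 0 and max_frac == 0:
        if min_exp > 0:
            max_frac = 1
        else:
            min_int = 1
    mantissa, exponent = Decimal(value), 0
    if min_exp > 0:
        mantissa, exponent = scale(mantissa, scaling_factor)
    mantissa = mantissa.quantize(Decimal(1).scaleb(-max_frac),
                                 rounding=ROUND_HALF_EVEN)
    int_digits, _, frac_digits = format(mantissa, "f").partition(".")
    int_digits = int_digits.lstrip("0").zfill(min_int)
    while len(frac_digits) > min_frac and frac_digits.endswith("0"):
        frac_digits = frac_digits[:-1]
    # The decimal separator is dropped when no fractional digits remain.
    result = int_digits + ("." + frac_digits if frac_digits else "")
    if min_exp > 0:
        sign = "-" if exponent < 0 else ""
        result += "e" + sign + str(abs(exponent)).zfill(min_exp)
    return result + suffix
```

For instance, format_number("0.2", "000.0e9") returns "200.0e-3" and format_number("0.002", ".000e0") returns ".200e-2".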

CODA

If we want to allow formatting of exponential numbers with a zero before the decimal point, for example "0.123e10", here are two ways we could achieve this:

(a) we could say that when the scaling factor is zero, the minimum-integer-part-size is set to one. In this case the leading zero would always appear, and users would not be able to ask for the format ".123e10".

(b) we could say that when an exponent separator is present, any '#' signs in the integer part of the picture contribute to the minimum-integer-part-size. So a single mandatory zero before the decimal point could be requested with the picture "#.000e0". To reflect the change of meaning we could perhaps rename "optional digit character" as "conditional digit marker".
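The two options differ only in how minimum-integer-part-size is computed; the scaling factor is unaffected in both. A hypothetical Python sketch, assuming integer_part is the part of the sub-picture before the decimal separator and the digit family is ASCII '0' to '9':

```python
DIGITS = set("0123456789")  # assumed digit family: zero_digit = '0'


def min_int_option_a(integer_part, has_exponent):
    """Option (a): with an exponent and a zero scaling factor, force one digit."""
    n = sum(c in DIGITS for c in integer_part)
    return 1 if has_exponent and n == 0 else n


def min_int_option_b(integer_part, has_exponent):
    """Option (b): with an exponent, '#' also counts towards the minimum."""
    if has_exponent:
        return sum(c in DIGITS or c == "#" for c in integer_part)
    return sum(c in DIGITS for c in integer_part)
```

Under option (a) the picture ".000e0" gains a leading zero; under option (b) only a '#' before the separator does, so ".000e0" and "#.000e0" remain distinguishable (".123e10" versus "0.123e10").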

Michael Kay
Saxonica
with acknowledgements to Debbie Lockett who found some of the errors in my first attempt.

Received on Wednesday, 9 September 2015 17:50:11 UTC