Canonical representation of float/double and number of digits from Vincent Lefevre on 2009-02-02 (www-xml-schema-comments@w3.org from January to March 2009)

From: Vincent Lefevre <vincent@vinc17.org>
Date: Mon, 2 Feb 2009 16:28:31 +0100
To: www-xml-schema-comments@w3.org
Message-ID: <20090202152831.GA1912@vin.lip.ens-lyon.fr>

[Note: I'm not subscribed to the list; please Cc <vincent@vinc17.org>]

In <http://www.w3.org/TR/xmlschema-2/> (XML Schema Part 2: Datatypes
Second Edition, 28 October 2004), the canonical representation for
the primitive datatypes "float" and "double" does not have special
requirements concerning the number of digits (except for leading and
trailing zeroes). For instance, for float:

  3.2.4.2 Canonical representation

  The canonical representation for float is defined by prohibiting
  certain options from the Lexical representation (§3.2.4.1).
  Specifically, the exponent must be indicated by "E". Leading zeroes
  and the preceding optional "+" sign are prohibited in the exponent.
  If the exponent is zero, it must be indicated by "E0". For the
  mantissa, the preceding optional "+" sign is prohibited and the
  decimal point is required. Leading and trailing zeroes are
  prohibited subject to the following: number representations must be
  normalized such that there is a single digit which is non-zero to
  the left of the decimal point and at least a single digit to the
  right of the decimal point unless the value being represented is
  zero. The canonical representation for zero is 0.0E0.

This means that, for instance, the float value 1 can be represented by
both strings "1.0E0" and "1.00000000001E0": they both map to the value 1
(in single precision, 1.00000000001 rounds to 1) and they both satisfy
the conditions of the canonical representation; thus this
representation is not unique. For uniqueness, I suppose that the
canonical representation should also require that:

  1. the string have the minimum number of digits needed to map back
     to the value (in which case the conditions on leading and
     trailing zeroes are no longer necessary);

  2. the decimal value be correctly rounded to this precision.
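
To make the non-uniqueness and requirements (1) and (2) concrete, here
is a minimal Python sketch. It is not taken from the specification and
is only one possible reading of the two requirements. It assumes that
"float" is IEEE 754 binary32; the helper names to_float32,
parse_float32 and canonical_float32 are hypothetical. Parsing goes
through Python's binary64 float first, a shortcut that ignores possible
double rounding, and NaN and infinities are not handled.

```python
import struct


def to_float32(x: float) -> float:
    """Round a binary64 Python float to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def parse_float32(lexical: str) -> float:
    """Map a lexical representation to a binary32 value.

    Assumption: parsing to binary64 first is good enough for this
    sketch (double rounding is ignored).
    """
    return to_float32(float(lexical))


def canonical_float32(value: float) -> str:
    """Fewest-digit, correctly rounded string that maps back to value."""
    value = to_float32(value)
    if value == 0.0:
        return "0.0E0"  # the sign of zero is ignored in this sketch
    for digits in range(1, 10):  # 9 digits always suffice for binary32
        # Python formats with correct rounding to `digits` digits.
        text = f"{value:.{digits - 1}E}"
        if parse_float32(text) == value:
            break
    mantissa, exponent = text.split("E")
    if "." not in mantissa:
        mantissa += ".0"  # keep at least one digit after the point
    return f"{mantissa}E{int(exponent)}"


# Both strings satisfy the current rules and map to the same float.
assert parse_float32("1.0E0") == parse_float32("1.00000000001E0") == 1.0
# With requirements (1) and (2), only one string remains.
assert canonical_float32(1.0) == "1.0E0"
```

Because the loop stops at the first precision that maps back to the
value, the mantissa cannot end with a superfluous zero, which is why
no separate rule on trailing zeroes is applied here.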

Alternatively, instead of item (1), require the decimal precision to
be 9 for float and 17 for double (this is the definition of Pmin in
the IEEE 754-2008 standard, Section 5.12.2), and keep the conditions
on leading and trailing zeroes.
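
As an illustration of this alternative, the sketch below computes Pmin
as 1 + ceiling(p * log10(2)), where p is the precision in bits (24 for
float, 53 for double), then formats a value with that many significant
digits and removes trailing zeroes. The function names pmin and
canonical_fixed are hypothetical, and the code assumes that the value
passed in is already representable in the target format.

```python
import math


def pmin(p: int) -> int:
    """Decimal digits that are enough to round-trip a p-bit binary value."""
    return 1 + math.ceil(p * math.log10(2))


def canonical_fixed(value: float, p: int = 53) -> str:
    """Round to pmin(p) significant digits, then drop trailing zeroes."""
    mantissa, exponent = f"{value:.{pmin(p) - 1}E}".split("E")
    mantissa = mantissa.rstrip("0")
    if mantissa.endswith("."):
        mantissa += "0"  # at least one digit after the decimal point
    return f"{mantissa}E{int(exponent)}"


assert pmin(24) == 9 and pmin(53) == 17
assert canonical_fixed(1.0) == "1.0E0"
assert canonical_fixed(1.1) == "1.1000000000000001E0"
```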

The former solution allows x = round-to-double(1.1) to be represented
as "1.1" instead of "1.1000000000000001" (which has 17 digits and is
closer to x than 1.1 is). However, it is more difficult to implement,
IMHO, and care must be taken with subnormals.
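
These claims can be checked with a short Python sketch. It relies on
two assumptions about Python itself rather than about XML Schema:
repr() of a float yields the shortest string that maps back to the
same double (Python 3.1 and later), and the "g" format is correctly
rounded. The subnormal case at the end shows one reason for care:
several one-digit strings map to the same subnormal value, so the
minimum number of digits alone does not select a unique string.

```python
from decimal import Decimal

x = 1.1  # round-to-double(1.1)

shortest = repr(x)        # shortest string that maps back to x
seventeen = f"{x:.17g}"   # x correctly rounded to 17 significant digits

assert shortest == "1.1"
assert seventeen == "1.1000000000000001"
assert float(shortest) == float(seventeen) == x

# Decimal(x) is the exact value of the double, so distances are exact
# up to the precision of the decimal context.
exact = Decimal(x)
assert abs(exact - Decimal(seventeen)) < abs(exact - Decimal(shortest))

# Smallest positive subnormal double: one digit is enough, but
# "4e-324" and "5e-324" both map to it; only the latter is correctly
# rounded.
tiny = 5e-324
assert repr(tiny) == "5e-324"
assert float("4e-324") == tiny
assert float(f"{tiny:.17g}") == tiny
```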

Also, do other standards have a notion of canonical representation (in
decimal) for float and double values?

-- 
Vincent Lefèvre <vincent@vinc17.org> - Web: <http://www.vinc17.org/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.org/blog/>
Work: CR INRIA - computer arithmetic / Arenaire project (LIP, ENS-Lyon)

Received on Monday, 2 February 2009 15:29:10 UTC