RE: Canonical representation of float/double and number of digits from Michael Kay on 2009-02-02 (www-xml-schema-comments@w3.org from January to March 2009)

From: Michael Kay <mike@saxonica.com>
Date: Mon, 2 Feb 2009 15:59:18 -0000
To: "'Vincent Lefevre'" <vincent@vinc17.org>, <www-xml-schema-comments@w3.org>
Cc: <vincent@vinc17.org>
Message-ID: <EDA956AB06BF48809DA9395C8FC44A7F@Sealion>

 
You might like to take into account that XSD 1.1 defines a canonical
representation more precisely:

http://www.w3.org/TR/xmlschema11-2/#f-doubleCanmap

Michael Kay
http://www.saxonica.com/

> -----Original Message-----
> From: www-xml-schema-comments-request@w3.org 
> [mailto:www-xml-schema-comments-request@w3.org] On Behalf Of 
> Vincent Lefevre
> Sent: 02 February 2009 15:29
> To: www-xml-schema-comments@w3.org
> Subject: Canonical representation of float/double and number of digits
> 
> 
> [Note: I'm not subscribed to the list, please Cc <vincent@vinc17.org>]
> 
> In <http://www.w3.org/TR/xmlschema-2/> (XML Schema Part 2: 
> Datatypes Second Edition, 28 October 2004), the canonical 
> representation for the primitive datatypes "float" and 
> "double" does not have special requirements concerning the 
> number of digits (except for leading and trailing zeroes). 
> For instance, for float:
> 
>   3.2.4.2 Canonical representation
> 
>   The canonical representation for float is defined by prohibiting
>   certain options from the Lexical representation (§3.2.4.1).
>   Specifically, the exponent must be indicated by "E". Leading zeroes
>   and the preceding optional "+" sign are prohibited in the exponent.
>   If the exponent is zero, it must be indicated by "E0". For the
>   mantissa, the preceding optional "+" sign is prohibited and the
>   decimal point is required. Leading and trailing zeroes are
>   prohibited subject to the following: number representations must be
>   normalized such that there is a single digit which is non-zero to
>   the left of the decimal point and at least a single digit to the
>   right of the decimal point unless the value being represented is
>   zero. The canonical representation for zero is 0.0E0.
> 
> This means that, for instance, the float value 1 can be 
> represented by both strings "1.0E0" and "1.00000000001E0", as 
> they both map to the value 1 and they both satisfy the 
> conditions of the canonical representation; thus this 
> representation is not unique. For uniqueness, I suppose that 
> the canonical representation should also require that:
> 
>   1. the string have the minimum number of digits (in which case,
>      the conditions on leading and trailing zeroes are no longer
>      necessary);
> 
>   2. the decimal value is correctly rounded in this precision.
> 
> Alternatively, instead of item (1), require the decimal 
> precision to be 9 for float and 17 for double (this is the 
> definition of Pmin in the IEEE754-2008 standard, Section 
> 5.12.2), and keep the conditions on leading and trailing zeroes.
> 
> The former solution allows x = round-to-double(1.1) to be 
> represented as "1.1" instead of "1.1000000000000001" (which 
> has 17 digits and whose distance to x is smaller than 
> |x - 1.1|). However, it is more difficult to implement, IMHO, 
> and care must be taken with subnormals.
> 
> Also, do other standards have a notion of canonical representation (in
> decimal) for float and double values?
> 
> --
> Vincent Lefèvre <vincent@vinc17.org> - Web: <http://www.vinc17.org/>
> 100% accessible validated (X)HTML - Blog: <http://www.vinc17.org/blog/>
> Work: CR INRIA - computer arithmetic / Arenaire project (LIP, ENS-Lyon)
>
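
For illustration only, the two options in the quoted message (minimum
number of correctly rounded digits versus a fixed 17 digits for double)
can be sketched in Python. The names to_float32 and xsd_style are
hypothetical helpers invented for this sketch; they come from neither
XSD 1.0 nor XSD 1.1. The sketch assumes CPython's correctly rounded
float() and format(), and it ignores INF, NaN and the sign of zero.

```python
import struct
from typing import Optional


def to_float32(text: str) -> float:
    """Map a decimal literal to an IEEE 754 binary32 value.

    Assumption: going through a Python double first is harmless for the
    literals used here (double rounding is not guarded against in general).
    """
    return struct.unpack("<f", struct.pack("<f", float(text)))[0]


def xsd_style(x: float, digits: Optional[int] = None) -> str:
    """Format a finite double in the XSD 1.0 canonical *shape* (d.dddE<n>).

    digits=None : smallest precision whose correctly rounded decimal
                  round-trips to x (items 1 and 2 of the quoted message).
    digits=17   : fixed precision, the alternative from the quoted message.
    """
    if x == 0:
        return "0.0E0"  # sign of zero deliberately ignored in this sketch
    precisions = range(1, 18) if digits is None else [digits]
    for p in precisions:
        s = format(x, ".%de" % (p - 1))  # p significant digits
        if float(s) == x:
            break
    mantissa, exponent = s.split("e")
    if "." in mantissa:
        mantissa = mantissa.rstrip("0")  # drop trailing zeroes
        if mantissa.endswith("."):
            mantissa += "0"  # keep one digit after the point
    else:
        mantissa += ".0"
    # int() removes the "+" sign and leading zeroes from the exponent
    return "%sE%d" % (mantissa, int(exponent))


# Both literals satisfy the XSD 1.0 rules yet denote the same float value.
same = to_float32("1.0E0") == to_float32("1.00000000001E0")

shortest = xsd_style(1.1)    # minimum-digits option
fixed17 = xsd_style(1.1, 17)  # fixed 17-digit option
```

With these definitions, same is True, shortest is "1.1E0" and fixed17
is "1.1000000000000001E0". The loop makes the implementation cost of the
first option visible: it needs up to 17 trial conversions per value,
whereas the fixed-precision option needs one.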

Received on Monday, 2 February 2009 15:59:59 UTC