RE: Support of IEEE float; Canonical XML

Paul,

The WG has taken a comprehensive look at this issue.

EXI is a format for the XML infoset that is informed by schemas when
they are available, as opposed to being schema-bound. Its goal is
to serve as an efficient alternative encoding of an infoset for
users who exchange infosets.

Given that the APIs in use today are based on text, and we do not
expect the landscape to change significantly because of EXI, we
believe it is logical to keep the schema-informed float value
representation akin to text so that EXI float-to-text conversion requires
minimal processing overhead.

With that in mind, and considering the better compactness and the
amenability to round-tripping with XML that this representation provides,
it is the consensus of the WG that it benefits the majority of users
and is therefore the best way to go.

We thank you for your insight on this issue, and express our appreciation
for your verve in the whole discussion on this and related topics.

Regards,

Taki Kamiya (for the EXI Working Group)


-----Original Message-----
From: Paul Pierce [mailto:prp@teleport.com]
Sent: Wednesday, June 24, 2009 10:29 PM
To: Taki Kamiya; EXI Comments
Subject: "RE: Support of IEEE float; Canonical XML"

Taki,

The precision of the XML Schema datatypes float and double is specified and
cannot be "lost" in translation. What can be lost in the round trip from XML
to IEEE floating point back to XML is the accurate representation of a value
to the specified precision.

Thus, in your example, the decimal representation 0.1 need only be translated
into enough binary digits to maintain the specified precision of the datatype
(float or double). Because XML Schema specifies a precision that matches the
corresponding IEEE representation, the IEEE representation has enough binary
digits to do the job. On translation back into XML, it should be sufficient
to use the IEEE round-to-even option and perform the XML Schema canonicalization
in order to recover the representation 1.0E-1, which is exactly the same value
in canonical form.
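
The round trip described above can be sketched in Python. This is only an
illustration under stated assumptions, not the procedure of any specification:
the helper names (to_float32, xml_float_round_trip) are hypothetical, the
decimal text is first parsed as a double and then rounded to single precision
(which in rare cases can differ from rounding the decimal directly), and the
shortest digit string that maps back to the same binary value is taken as the
canonical mantissa. NaN and infinities are not handled.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (binary64) to the nearest IEEE binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def xml_float_round_trip(lexical: str) -> str:
    """Hypothetical helper: XML text -> binary32 -> canonical-style XML text."""
    value32 = to_float32(float(lexical))
    # Nine significant digits always identify a binary32 value uniquely,
    # so try 1 to 9 digits and keep the shortest text that maps back.
    for precision in range(9):
        text = f"{value32:.{precision}e}"
        if to_float32(float(text)) == value32:
            break
    mantissa, exponent = text.split("e")
    if "." not in mantissa:
        mantissa += ".0"  # canonical form keeps a digit after the point
    return f"{mantissa}E{int(exponent)}"


print(xml_float_round_trip("0.1"))
```

With the input "0.1" the sketch prints 1.0E-1, even though the binary32 value
in the middle of the trip is not exactly one tenth.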

With respect to the user-specified datatype, I would like to point out that it
is never acceptable in a standards process to avoid a difficult decision in
the body of the standard by falling back on an extension option. This
discussion should proceed as if the user datatype option didn't exist (as
indeed it shouldn't, but we will get to that after the datatype representations
are cleared up. And after that, it will be time to tackle compression, which
seems to need quite a bit of work.)

Paul

> Hi Paul,
>
> The problem of numeric precision loss is most likely to occur not when the
> data starts with EXI, but when it originates in XML and gets transcoded to EXI
> then back to XML.
>
> The WG observed that any finite-length base-2 number can always be converted
> to an equivalent finite-length base-10 number, but not vice versa. There
> are finite-length base-10 numbers that, when converted to base 2, need
> infinitely many digits to represent exactly the same value.
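
The observation quoted above can be checked with exact arithmetic. The
following Python sketch is only an illustration; the function name
has_finite_binary_expansion is hypothetical. It relies on the fact that a
reduced fraction has a finite base-2 expansion only when its denominator is a
power of two, while every power of two divides a power of ten, so every finite
base-2 number is also a finite base-10 number.

```python
from decimal import Decimal
from fractions import Fraction


def has_finite_binary_expansion(q: Fraction) -> bool:
    """True if the reduced fraction q terminates when written in base 2."""
    d = q.denominator
    return d & (d - 1) == 0  # power-of-two test


# 0.375 = 3/8 terminates in base 2; 0.1 = 1/10 does not.
print(has_finite_binary_expansion(Fraction("0.375")))
print(has_finite_binary_expansion(Fraction("0.1")))

# The other direction always works: 2**-10 has an exact, finite decimal form.
print(Decimal(2.0 ** -10))

# The double nearest to 0.1 is a different number from one tenth.
print(Fraction(0.1) == Fraction(1, 10))
```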

Received on Wednesday, 22 July 2009 17:59:33 UTC