"RE: [EXI LC comments] Mapping from xs:float and xs:double to exi:Float and vice versa" from Paul Pierce on 2009-02-25 (public-exi-comments@w3.org from February 2009)

From: Paul Pierce <prp@teleport.com>
Date: 25 Feb 2009 02:17:11
To: "EXI Comments" <public-exi-comments@w3.org>
Message-ID: <paul.20090225t020707z.1@msi945g3/192.168.1.127>
John,

Floating point is a tricky thing. It's easy for us amateurs to mess it up without knowing. (I'm an ex-system programmer, and have implemented floating point care in several operating systems for scientific use. Fortunately I had a floating point expert looking over my shoulder.) Historically, each computer manufacturer designed their own floating point system. This caused endless problems, both in achieving correct results on a given system and in duplicating results between different systems.

The IEEE-754 standard was developed to address these problems, and has been so successful that effectively all processors with floating point capability implement it natively. It's still floating point, and it's still tricky to use in sensitive applications, but so many of the early problems have been addressed that many users need not worry about it at all, and those who must worry have access to options to tune it to their needs.

There is a great deal of work going into various kinds of distributed computation using XML as the transport language between processors. EXI would of course be a boon, as it retains all the advantages of XML while making it possible to transport much more data efficiently. In effectively all cases, the processors on each end of the transport are using IEEE-754 arithmetic. Since IEEE-754 is a binary format, the only way EXI could mess it up would be to translate the data values into some other format and back.

With XML, the only way to represent data values is in character strings or base-64 encoded binary. But base-64 binary loses type information and is not human readable. It is possible, with care, to translate IEEE-754 floating point values into XML-Schema compliant character strings in such a way that they can be translated back exactly, with only the loss of some special representations of NaNs as noted in XML-Schema. As long as existing XML APIs are used, the programmer has control of the representation and can take the necessary care to ensure that data values are preserved as precisely as necessary.
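
To make "with care" concrete, here is a minimal sketch in Python of one way to do that round trip. It assumes that 17 significant decimal digits are used for a double (enough to identify any IEEE-754 binary64 value uniquely) and it compares bit patterns instead of values so that -0.0 is checked too. The helper names double_to_lexical, lexical_to_double and bits are made up for this illustration and are not part of any XML or EXI API.

```python
import math
import struct

def double_to_lexical(x: float) -> str:
    """Format a double as an xs:double-style string that reads back exactly."""
    if math.isnan(x):
        return "NaN"           # sign and payload of the NaN are lost here
    if math.isinf(x):
        return "INF" if x > 0 else "-INF"
    return format(x, ".17g")   # 17 significant digits suffice for binary64

def lexical_to_double(s: str) -> float:
    """Parse the strings produced by double_to_lexical."""
    if s == "NaN":
        return math.nan
    if s == "INF":
        return math.inf
    if s == "-INF":
        return -math.inf
    return float(s)

def bits(x: float) -> bytes:
    """Raw big-endian binary64 bit pattern, used for exact comparison."""
    return struct.pack(">d", x)

# A few awkward values: non-terminating binary fractions, the smallest
# subnormal, the largest finite double, and negative zero.
samples = [0.1, 1.0 / 3.0, 5e-324, 1.7976931348623157e308, -0.0]

round_trip_exact = all(
    bits(lexical_to_double(double_to_lexical(x))) == bits(x) for x in samples
)
```

The point of the sketch is that the care lives in the choice of digit count and the special-value handling, which is exactly the part an end-user programmer controls today when writing the strings by hand.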

One would expect programmers to embrace the newly proposed APIs that support data transfer to and from the XML Infoset using native datatypes. But this places the programmer at the mercy of us amateurs implementing the new APIs, unless the Infoset representation is identical to that of the incoming datatype. (Perhaps you and I are not strictly amateurs, but certainly someone implementing EXI will be, and some poor scientific programmer will have to use their work.)

For XML, there is nothing to be done. But for EXI, this makes a very strong case for IEEE-754 representation of floating point. Since the implementation of the translation moves from the end-user programmer to the library programmer, it is not possible to argue that "Any work-arounds developed to address rounding issues for text XML will continue to work for EXI", as it may be only the end-user programmer who understands the necessary work-arounds.

Also, of course, using the same representation at the API and in the Infoset representation makes it run faster, perhaps quite a bit faster if there is a lot of data.

The rationale you gave (below) for the floating point representation in the existing EXI spec is consistent with the best overall approach to EXI design. EXI should certainly mimic XML in every way possible, especially for all the structure and for character data. But for numeric datatypes, and especially for the XML Schema float and double types where the spec explicitly references IEEE-754 and thereby sets reasonable expectations, blind adherence to XML semantics is excessive and, I think, will prove shortsighted. EXI has great potential but will be crippled permanently if things like the numeric datatype representation are not fixed.

Paul Pierce


--- Original Message ---

Mohamed,
 
Thank you for your question regarding the EXI Float data type representation. One of the advantages of using a base 10 representation is that it avoids rounding issues when moving floating point data between EXI and text XML and between EXI and an application that uses the standard XML interfaces. XML, EXI and the standard XML interfaces all use a base 10 representation for floating point numbers, so no rounding issues will occur in these circumstances.
 
You are correct that rounding issues may occur when moving floating point data between EXI and a base 2 representation. These rounding issues will be identical to those that occur moving floating point data between text XML and a base 2 representation, so EXI maintains the same behavior as XML in these cases. As such we avoid introducing any *new* rounding issues. Any work-arounds developed to address rounding issues for text XML will continue to work for EXI.
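
As a rough illustration of the kind of rounding issue meant here, the following Python sketch shows two things: the decimal value 0.1 has no exact base 2 representation, and a base 2 value written with too few decimal digits does not read back as the same value. The choice of 15 digits as the "too few" case and the variable names are assumptions made for this example only.

```python
from decimal import Decimal

# Decimal(0.1) is the exact decimal expansion of the double nearest to 0.1,
# so comparing it with the true decimal 0.1 exposes the representation error.
exact_stored_value = Decimal(0.1)
is_exact_tenth = exact_stored_value == Decimal("0.1")

# Writing a double with only 15 significant digits and reading it back
# yields a different double for a value such as 1/3.
x = 1.0 / 3.0
read_back = float(format(x, ".15g"))
lossy = read_back != x
```

Here is_exact_tenth comes out False and lossy comes out True, which is the behavior an application has to work around whenever it converts between the two bases.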
 
I hope this helps to explain our rationale. Please let us know if you have follow-up questions or comments. 
 
    Thank you,
 
    John
 
    AgileDelta, Inc.
...
Received on Wednesday, 25 February 2009 03:26:21 UTC