"RE: Support of IEEE float; Canonical XML"

All,

The WG seems determined to take what I, and apparently many others, feel is obviously the wrong direction on this issue. This is always a danger when trying to make a standard based on an existing implementation, since the implementers must properly be closely involved but will usually advocate strongly for the status quo. This happened in the first standards committee I was on long ago: the implementers mostly had their way and, I think partly because of that, the standard ultimately failed. I don't know if that is what's happening here, but the effect is the same.

I will try to summarize my arguments for IEEE floating point here for reference. I would urge anyone else who agrees to add their view at this time.

If the proposed recommendation is indeed in "last call", presumably it will eventually come up for a vote. In the meantime, W3C rules require that all comments be addressed. So, to keep things going despite the WG (and probably everyone else) being tired of the matter, I'm also asking for documentation of the WG's claims and evaluation.

If it does come up for a vote, I must reluctantly urge everyone to vote against the recommendation in its current form.



From our discussion so far, changing to IEEE floating point format would basically require two changes. First, the normal representation of data identified as XML Schema float or double would be IEEE 754 32-bit or 64-bit binary, respectively, in the same bit order as n-bit integer. Second, when the preserve-lexical-values option is set the data would be represented in character form as in XML.
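
To make the first change concrete, here is a minimal Python sketch of what such a representation could look like. The function names are hypothetical, and big-endian byte order is only an assumption standing in for "the same bit order as n-bit integer"; the real order would be whatever the spec defines.

```python
import struct

def encode_ieee(value: float, double: bool = True) -> bytes:
    """Pack a value as IEEE 754 binary64 (or binary32), big-endian (assumed order)."""
    return struct.pack(">d" if double else ">f", value)

def decode_ieee(data: bytes) -> float:
    """Unpack the 4 or 8 bytes produced by encode_ieee."""
    return struct.unpack(">d" if len(data) == 8 else ">f", data)[0]

encoded = encode_ieee(0.1)
print(encoded.hex())                # 3fb999999999999a
print(decode_ieee(encoded) == 0.1)  # True
```

The point of the sketch is that both directions are a single call into an existing, native facility, with no decimal conversion involved.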

> Paul,
> 
> The WG has taken a comprehensive look at this issue.
> 
> EXI is a format that is for XML infoset, informed by schemas when
> they are available, as opposed to being schema-bound.

The particular case we are discussing here occurs specifically when EXI encoding is informed by XML Schema. There is no impact on uninformed encoding.

> The goal is
> to serve as an efficient alternative encoding of an infoset, for
> users exchanging infosets.

EXI can be so much more than a mere efficient encoding of XML text-based documents. That's why it's a good thing it's based on encoding the infoset and not just gzipping the XML characters. This means it's possible (when the preserve lexical values option is not set) to focus on data values rather than their character encodings. In the case of encoding data typed as XML Schema float or double, the infoset data is specified (on purpose) such that data values can be uniquely represented in IEEE 754 format.

> 
> Given that APIs that are in use today are based-on text, and we do not
> expect the landscape to significantly change because of EXI, we
> believe it is logical for us to keep the schema-informed float value
> representation akin to text so that EXI float-to-text conversion requires
> minimal processing overhead.

I disagree that the landscape will not change because of EXI; that would be a clear indication that EXI has failed. But more important, there are significant APIs in use today that have binary interfaces. Two classes of such APIs are web services (e.g. the new SOAP-JMS binding) and the many object binding systems for XML (e.g. XMLBeans).

> 
> With that in mind and also considering the better compactness in
> size and amenability to round-trip with XML that it provides, it is the
> consensus of the WG that it should benefit the majority of users
> and therefore is the best way to go with.

Is the supposed size advantage tested and documented? I find it difficult to believe that there would be a size advantage except for human-generated values and sparse data. For machine-generated values expressed at full precision (always the likely default), where EXI would be most useful because of the large quantity of data, it seems unlikely that conversion from binary to a decimal-based representation could be anything but wasteful. For sparse data containing large quantities of simple values such as 0.0, the proposed format might be more efficient, but the compression option will eliminate most of the difference.
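
A rough way to check this intuition is sketched below in Python. It assumes the proposed format stores a float as a base-10 mantissa and exponent, each costing one sign bit plus 8 bits for every 7-bit group of magnitude; that cost model is my assumption for illustration and should be checked against the draft. The function names are hypothetical, and only finite values are handled.

```python
from decimal import Decimal

def varint_bits(n: int) -> int:
    """Assumed cost of a signed integer: 1 sign bit + 8 bits per 7-bit group."""
    groups = max(1, (abs(n).bit_length() + 6) // 7)
    return 1 + 8 * groups

def decimal_parts(value: float) -> tuple:
    """Shortest round-trip base-10 (mantissa, exponent) of a finite float."""
    sign, digits, exponent = Decimal(repr(value)).normalize().as_tuple()
    mantissa = int("".join(map(str, digits)))
    return (-mantissa if sign else mantissa), exponent

def estimated_bits(value: float) -> int:
    """Estimated size of the decimal form under the assumed cost model."""
    mantissa, exponent = decimal_parts(value)
    return varint_bits(mantissa) + varint_bits(exponent)

for v in (0.0, 0.1, 1 / 3, 6.02214076e23):
    print(v, decimal_parts(v), estimated_bits(v), "bits vs 64 for binary64")
```

Under these assumptions, short values such as 0.0 or 0.1 come out well under 64 bits, while a full-precision value such as 1/3 comes out above 64 bits. This is an estimate, not a measurement, which is why real test data is still needed.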

There is little if any net advantage with the current proposal in round-trip to XML; in fact, it will likely turn out to have a net disadvantage compared to IEEE floating point. Both the current spec and the IEEE option will convert accurately to and from character data for XML, given a high quality implementation. But for IEEE floating point, quality implementations already exist and can be leveraged from existing native language libraries. These are already tuned for performance, which cannot be expected of EXI implementations that must be created specifically to conform to a representation that is used nowhere else, even if it is simpler. Also, because EXI will be used where XML is too inefficient, the round-trip case will be used only in special situations such as debugging, so only accuracy matters, not performance.
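
As a small illustration of leaning on an existing library for the round-trip, the Python sketch below converts binary64 values to text and back using only built-ins and checks that the bit patterns are unchanged. Python's `repr` and `float` stand in here for whatever native conversion routines an implementation would actually call; the helper name is hypothetical.

```python
import struct

def roundtrip_via_text(value: float) -> float:
    """binary64 -> shortest decimal text -> binary64, using only built-ins."""
    return float(repr(value))

samples = [0.1, 1 / 3, 2.5e-308, 1.7976931348623157e308, 5e-324]
for v in samples:
    # Compare packed bit patterns so the check is exact, not approximate.
    assert struct.pack(">d", roundtrip_via_text(v)) == struct.pack(">d", v)
print("all samples round-trip exactly")
```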

What will best serve the majority of users should be indicated from evaluating the alternatives against the use cases for quantifiable differences, and against known best practices for intangibles.

The purported advantages of the current proposal:

1. More compact

2. Faster round-trip to XML

3. Otherwise better in round-trip to XML (accuracy is the only aspect we've discussed.)

4. Faster generation/parsing with text-based APIs.

5. Otherwise better with text-based APIs.

I would argue that an IEEE 754 representation would have at least these advantages, in addition to co-opting all the above:

6. It's a standard.

7. It's the native representation on almost all computers.

8. Faster with binary APIs.

9. Otherwise better with binary APIs.

Items 1, 2, 4 and 8 are quantitative and should have been measured. Items 2 and 4, and likewise items 3 and 5, are mechanically the same but would have different weights in the final evaluation.

In combining these items to reach a final conclusion, I give much higher weight to 8 and 9 (binary APIs) than 4 and 5 (text APIs), and no weight to 2 (XML), because EXI is needed where efficiency matters and, if successful, will be used most heavily where the entire path is most efficient.


> 
> We thank you for your insight on this issue, and express our appreciation
> for your verve in the whole discussion on this and related topics.
> 
> Regards,
> 
> Taki Kamiya (for the EXI Working Group)

I'm grateful for the opportunity to participate and hope to make what little contribution I can to the ultimate success of EXI.

Paul Pierce

Received on Thursday, 23 July 2009 18:09:02 UTC