"RE: Support of IEEE float; Canonical XML" from John Schneider on 2009-07-24 (public-exi-comments@w3.org from July 2009)

From: John Schneider <john.schneider@agiledelta.com>
Date: Fri, 24 Jul 2009 10:53:41 -0700
To: "'Paul Pierce'" <prp@teleport.com>, "'Taki Kamiya'" <tkamiya@us.fujitsu.com>, "'EXI Comments'" <public-exi-comments@w3.org>
Message-ID: <E42F39535F9941EE9DDDF5B3AA9E81CE@jcsdell8600>
Paul,

This is a personal response and doesn't represent the position of the EXI
working group. My company created the Efficient XML technology selected as
the basis of the EXI standard, so I assume we are the "implementers" you are
referring to below. I know exactly what you're talking about when you say
"implementers" will often argue for the status quo. I've seen this myself.
More often than not, it is done by a 900 pound gorilla that can throw its
weight around to get what it wants. For what it's worth, we are one of the
smallest companies in the working group and throwing our weight around
wouldn't get us very far. We might just generate enough momentum to knock
over a teacup. :-)

Your speculation that the implementer is arguing for the status quo, while
rational and informed, is incorrect in this case. We have had Efficient XML
implementations both with and without support for IEEE floating point
representation for several years now. So, from an implementation standpoint,
we have no motivation to go one way or another on this issue. In addition,
if you look at the changes made to the EXI specification since we first
submitted it, you'll find far more substantial changes than IEEE. Notable
examples are self-contained sub-trees, bounded string tables, byte-aligned
mode, strict mode and many more. This is clearly not the status quo.

In my experience, the EXI working group and the XBC working group before it
are motivated primarily by technical arguments backed by concrete test
results. They benchmark and test everything before making decisions. They've
run benchmarks to test the impact of bounded integers, restricted charsets,
bounded-string table algorithms, a simplified all group, etc. And they've
run several tests and had lengthy discussions about IEEE. Before the group
ran tests and got into the details, I think more than half of the working
group members favored IEEE. However, as the group began reviewing the test
results, they came to the consensus that the EXI scalable floating point
representation was a better fit for most of the EXI use cases than the IEEE
representation. 

To be perfectly clear, there are definitely some use cases that will prefer
IEEE floating point representation and EXI will support the use of the IEEE
representation for these cases. The question is not whether you will be able
to use the IEEE floating point representation with EXI. You will. The
question is whether IEEE should be the default for all use cases. W3C tests
have shown that making IEEE the default representation will negatively
impact compactness for many/most use cases and negatively impact processing
performance for many others. In addition, it would make it very difficult
for many small devices that don't have built-in IEEE support to process EXI
documents that used this default. Implementing IEEE support on such devices
would require more code footprint than they can generally spare for EXI.
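
To make the compactness trade-off concrete, here is a rough Python sketch.
It is not taken from the EXI specification or from the W3C test results. It
assumes a scalable float is stored as a base-10 mantissa and exponent, each
costing one sign bit plus a variable-length integer in 7-bit groups, and
compares that with a fixed 64-bit IEEE 754 double. The function names and the
cost model are illustrative assumptions only.

```python
import struct
from decimal import Decimal


def varint_len(n: int) -> int:
    """Octets needed to hold a non-negative integer in 7-bit groups."""
    length = 1
    while n >= 0x80:
        n >>= 7
        length += 1
    return length


def scalable_float_bits(text: str) -> int:
    """Approximate bit cost of a decimal mantissa/exponent pair.

    Assumed cost model (hypothetical, not a conformant encoder): each of the
    two integers costs 1 sign bit plus 8 bits per 7-bit group.
    Special values such as NaN and INF are not handled.
    """
    _sign, digits, exponent = Decimal(text).as_tuple()
    mantissa = int("".join(map(str, digits)))
    # Move trailing zeros of the mantissa into the exponent.
    while mantissa and mantissa % 10 == 0:
        mantissa //= 10
        exponent += 1
    if mantissa == 0:
        exponent = 0
    mantissa_bits = 1 + 8 * varint_len(mantissa)
    exponent_bits = 1 + 8 * varint_len(abs(exponent))
    return mantissa_bits + exponent_bits


def ieee_double_bits(text: str) -> int:
    """Bit cost of the same value as a big-endian IEEE 754 double."""
    return 8 * len(struct.pack(">d", float(text)))


if __name__ == "__main__":
    for value in ("0.0", "1.5", "3.141592653589793"):
        print(value, scalable_float_bits(value), ieee_double_bits(value))
```

Under these assumptions, a short value such as 0.0 or 1.5 costs 18 bits
against 64 for a double, while a full-precision value such as
3.141592653589793 costs 74 bits against 64, so the outcome depends on the
data found in each use case.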

The EXI working group has taken a very deep dive on this topic and has a
very informed viewpoint on it. They have been looking at it, testing it and
analyzing it since before the first draft of the EXI spec was published.
They take your comments and feedback very seriously and have discussed and
debated each one at length.  

I do understand that you were not part of the W3C analysis of this topic and
do not have the benefit of the associated technical discussions and test
results. So, it's not completely fair to expect you to be in the same place
as the working group on this topic. I'll recommend the working group share
some of our test data on this topic so you can better see where they are
coming from. 

	All the best,

	John


> -----Original Message-----
> From: public-exi-comments-request@w3.org 
> [mailto:public-exi-comments-request@w3.org] On Behalf Of Paul Pierce
> Sent: Wednesday, July 22, 2009 10:22 PM
> To: Taki Kamiya; EXI Comments
> Subject: "RE: Support of IEEE float; Canonical XML"
> 
> All,
> 
> The WG seems determined to take what I and apparently many 
> others feel is obviously the wrong direction on this issue. 
> This is always a danger when trying to make a standard based 
> on an existing implementation, since the implementers must 
> properly be closely involved but will usually advocate 
> strongly for the status quo. This happened in the first 
> standards committee I was on long ago, the implementers 
> mostly had their way and, I think partly because of that, the 
> standard ultimately failed. I don't know if that is what's 
> happening here but the effect is the same.
> 
> I will try to summarize my arguments for IEEE floating point 
> here for reference. I would urge anyone else who agrees to 
> add their view at this time.
> 
> If the proposed recommendation is indeed in "last call", 
> presumably it will eventually come up for a vote. In the 
> meantime, W3C rules require that all comments be addressed, so to 
> keep things going, despite the WG (and probably everyone else) 
> being tired of the matter, I'm also asking for documentation 
> of the WG claims and evaluation.
> 
> If it does come up for a vote, I must reluctantly urge 
> everyone to vote against the recommendation in its current form.
> 
> 
> 
> From our discussion so far, changing to IEEE floating point 
> format would basically require two changes. First, the normal 
> representation of data identified as XML Schema float or 
> double would be IEEE 754 32-bit or 64-bit binary, 
> respectively, in the same bit order as n-bit integer. Second, 
> when the preserve-lexical-values option is set the data would 
> be represented in character form as in XML.
> 
> > Paul,
> > 
> > The WG has taken a comprehensive look at this issue.
> > 
> > EXI is a format that is for XML infoset, informed by schemas when they 
> > are available, as opposed to being schema-bound.
> 
> The particular case we are discussing here occurs 
> specifically when EXI encoding is informed by XML Schema. 
> There is no impact on uninformed encoding.
> 
> > The goal is
> > to serve as an efficient alternative encoding of an infoset, for users 
> > exchanging infosets.
> 
> EXI can be so much more than a mere efficient encoding of XML 
> text-based documents. That's why it's a good thing it's based on 
> encoding the infoset and not just gzipping the XML 
> characters. This means it's possible (when the preserve 
> lexical values option is not set) to focus on data values 
> rather than their character encodings. In the case of 
> encoding data typed as XML Schema float or double, the 
> infoset data is specified (on purpose) such that data values 
> can be uniquely represented in IEEE 754 format.
> 
> > 
> > Given that APIs that are in use today are based on text, and we do not 
> > expect the landscape to significantly change because of EXI, we 
> > believe it is logical for us to keep the schema-informed float value 
> > representation akin to text so that EXI float-to-text conversion 
> > requires minimal processing overhead.
> 
> I disagree that the landscape will not change because of EXI; 
> that would be a clear indication that EXI has failed. But 
> more important, there are significant APIs in use today that 
> have binary interfaces. Two classes of such APIs are web 
> services (e.g. the new SOAP-JMS binding) and the many object 
> binding systems for XML (e.g. XMLBeans).
> 
> > 
> > With that in mind and also considering the better compactness in size 
> > and amenability to round-trip with XML that it provides, it is the 
> > consensus of the WG that it should benefit the majority of users and 
> > therefore is the best way to go.
> 
> Is the supposed size advantage tested and documented? I find 
> it difficult to believe that there would be a size advantage 
> except for human-generated values and sparse data. For 
> machine-generated values expressed at full precision (always 
> the likely default), where EXI would be most useful because 
> of the large quantity of data, it seems unlikely that 
> conversion from binary to a decimal-based representation could 
> be anything but wasteful. For sparse data containing large 
> quantities of simple values such as 0.0, the proposed format 
> might be more efficient but the compression option will 
> eliminate most of the difference.
> 
> There is little if any net advantage with the current 
> proposal in round-trip to XML; in fact, it will likely turn 
> out to have a net disadvantage compared to IEEE floating 
> point. Both the current spec and the IEEE option will convert 
> accurately to and from character data for XML, given a high 
> quality implementation. But for IEEE floating point, quality 
> implementations already exist and can be leveraged from 
> existing native language libraries. These are already tuned 
> for performance, which cannot be expected of EXI 
> implementations that must be created specifically to conform 
> to a representation that is used nowhere else, even if it is 
> simpler. Also, because EXI will be used where XML is too 
> inefficient, the round-trip case will be used only in special 
> situations such as debugging, so only accuracy matters - not 
> performance.
> 
> What will best serve the majority of users should be 
> indicated from evaluating the alternatives against the use 
> cases for quantifiable differences, and against known best 
> practices for intangibles.
> 
> The purported advantages of the current proposal:
> 
> 1. More compact
> 
> 2. Faster round-trip to XML
> 
> 3. Otherwise better in round-trip to XML (accuracy is the 
> only aspect we've discussed.)
> 
> 4. Faster generation/parsing with text-based APIs.
> 
> 5. Otherwise better with text-based APIs.
> 
> I would argue that an IEEE 754 representation would have at 
> least these advantages, in addition to co-opting all the above:
> 
> 5. It's a standard.
> 
> 6. It's the native representation on almost all computers.
> 
> 7. Faster with binary APIs.
> 
> 8. Otherwise better with binary APIs.
> 
> Items 1, 2, 4 and 7 are quantitative and should have been 
> measured. Items 2 and 4 and items 3 and 5 are mechanically 
> the same but would have different weights in the final evaluation.
> 
> In combining these items to reach a final conclusion, I give 
> much higher weight to 7 and 8 (binary APIs) than 4 and 5 
> (text APIs), and no weight to 2 (XML), because EXI is needed 
> where efficiency matters and, if successful, will be used 
> most heavily where the entire path is most efficient.
> 
> 
> > 
> > We thank you for your insight on this issue, and express our 
> > appreciation for your verve in the whole discussion on this 
> > and related topics.
> > 
> > Regards,
> > 
> > Taki Kamiya (for the EXI Working Group)
> 
> I'm grateful for the opportunity to participate and hope to 
> make what little contribution I can to the ultimate success of EXI.
> 
> Paul Pierce
>
Received on Friday, 24 July 2009 17:54:24 UTC