Re: TAG opinion on XML Binary Format

Your collective analysis agrees with our thinking to a large extent.  
What is unclear is the implication for next steps.  You note that "more 
detailed analysis is needed", "further careful analysis is needed before 
the W3C commits to a direction", "we suggest that a quantitative 
analysis is necessary", "representative binary technologies be 
benchmarked and analyzed", and "the TAG would like to see the above 
issues addressed".  It is unclear how you expect to proceed or how you 
expect "us" to proceed to accomplish this in a coherent, timely, and 
effective way.  "We" expected that these were activities to be performed 
in the early stages of a new working group.  Our documents were 
prepared, secondarily, for those purposes.  If you (collectively) have 
something else in mind, please enlighten us as to both your pending 
options and recommendations and our options.

Our documents should have made it clear that we concluded that one 
format could support all use cases well, although in some sense 
different modes will need to be supported (for example, a range from 
fully self-contained data on one extreme to limited-structure, 
externally typed, range-restricted binary scalars on the other).  This 
is coherent as a single format because both extremes will probably be 
needed within a single instance.  I would be happy to elaborate.  We 
expected to start by considering multiple strategies, but to produce a 
result that would be the best of all available ideas.

I was particularly thrilled by this paragraph:
"Benchmark environments should be as representative as possible of fully 
optimized implementations, not just of the XML parser, but of the 
surrounding application or middleware stack.  We note that different 
application-level optimizations may be necessary to maximize the 
performance of the Binary or text cases respectively.  Care should 
especially be taken to ensure that the performance of particular APIs 
such as DOM or SAX does not obscure the performance possible with either 
option (e.g. both SAX and DOM can easily result in high overhead string 
conversions when UTF-8 is used.)"

This is exactly where my thinking is.

Email message format note: You inadvertently made all of your paragraphs 
preformatted ("<pre>....</pre>"), which made the message painful to read.  
Your email editor has a bug if it didn't show this to you while editing.

Thanks!
sdw

Rice, Ed (HP.com) wrote:

> TAG opinion on XML Binary Format
>
> The TAG has reviewed in detail the documents [1,2,3,4] prepared by the 
> XBC workgroup [5].  While we very much appreciate the significant 
> progress that these notes represent, the TAG believes that more 
> detailed analysis is needed before a W3C Binary XML Recommendation is 
> sufficiently justified.  We are taking no position at this time as to 
> whether Binary XML will prove to be warranted, as there seem to be 
> good arguments on both sides of that question.  Rather, we are 
> suggesting that further careful analysis is needed before the W3C 
> commits to a direction.
>  
> The TAG believes there are disadvantages as well as potential 
> advantages that will result from even a well crafted Binary XML 
> Recommendation.  The advantages are clear: a successful binary format 
> is likely to provide speed gains or size reductions, at least for 
> certain use cases.  The drawbacks are likely to include reduced 
> interoperability with XML 1.0 and XML 1.1 software, and an inability 
> to leverage the benefits of text-based formats.  These are important 
> concerns.  Quoting from the Web Architecture document[6]:
>
>   "The trade-offs between binary and textual data
>   formats are complex and application-dependent.
>   Binary formats can be substantially more compact,
>   particularly for complex pointer-rich data
>   structures. Also, they can be consumed more rapidly
>   by agents in those cases where they can be loaded
>   into memory and used with little or no conversion.
>   Note, however, that such cases are relatively
>   uncommon as such direct use may open the door to
>   security issues that can only practically be
>   addressed by examining every aspect of the data
>   structure in detail.
>
>   "Textual formats are usually more portable and
>   interoperable. Textual formats also have the
>   considerable advantage that they can be directly
>   read by human beings (and understood, given
>   sufficient documentation). This can simplify the
>   tasks of creating and maintaining software, and
>   allow the direct intervention of humans in the
>   processing chain without recourse to tools more
>   complex than the ubiquitous text editor. Finally,
>   it simplifies the necessary human task of learning
>   about new data formats; this is called the "view
>   source" effect."
>
> We therefore believe that the benefits of a binary XML must be 
> predictable and compelling in order to justify development of a 
> Recommendation.
>  
> In particular, we suggest that a quantitative analysis is necessary.  
> For at least a few key use cases, concrete targets should be set for 
> the size and/or speed gains that would be needed to justify the 
> disruption introduced by a new format.  For example, a target might be 
> that "in typical web services scenarios, median speed gains on the 
> order of 3x in combined parsing and deserialization are deemed 
> sufficient to justify a new format."  We further suggest that 
> representative binary technologies be benchmarked and analyzed to a 
> sufficient degree that such speed or size improvements can be 
> reasonably reliably predicted before we commit to a Recommendation.  
> No doubt, any given set of goals or benchmarks will suffer from some 
> degree of imprecision, but if the gains are sufficiently compelling to 
> justify a new format, then they should be relatively easy to 
> demonstrate.  In short, actual measurements should be a prerequisite 
> to preparing a Recommendation.
>  
> In doing such measurements, we believe it is essential that 
> comparisons be done to the best possible text-based XML 1.x 
> implementations, which are not necessarily those that are most widely 
> deployed.  Stated differently:
> if XML 1.x is inherently capable of meeting the needs of users, then 
> our efforts should go into tuning our XML implementations, not 
> designing new formats.  Benchmark environments should be as 
> representative as possible of fully optimized implementations, not 
> just of the XML parser, but of the surrounding application or 
> middleware stack.  We note that different application-level 
> optimizations may be necessary to maximize the performance of the 
> Binary or text cases respectively.  Care should especially be taken to 
> ensure that the performance of particular APIs such as DOM or SAX does 
> not obscure the performance possible with either option (e.g. both SAX 
> and DOM can easily result in high overhead string conversions when 
> UTF-8 is used.)
>  
> The TAG would also appreciate clarification as to how many formats are 
> likely to be included in a Recommendation; it's not clear whether the 
> proposal is for one binary xml format for all cases, or if multiple 
> formats are to be endorsed.  The use of multiple formats is likely to 
> further reduce interoperability.
>  
> We feel that introduction of a binary format would be an important 
> development for those who might benefit from its size or speed, but 
> also for those who might be impacted by its impact on interoperability 
> and perspicuity.  Therefore, in order to justify a potential new 
> format, the TAG would like to see the above issues addressed.  As 
> stated above, we make no prediction as to whether such an analysis 
> will ultimately confirm the need for Binary XML;  if it does, we will 
> be glad to support development of a Recommendation at the W3C.
>  
>
> [1] http://www.w3.org/TR/xbc-use-cases/
> [2] http://www.w3.org/TR/xbc-properties/
> [3] http://www.w3.org/TR/xbc-measurement/
> [4] http://www.w3.org/TR/xbc-characterization/
> [5] http://www.w3.org/XML/Binary/
> [6] http://www.w3.org/TR/webarch/#binary


-- 
swilliams@hpti.com http://www.hpti.com Per: sdw@lig.net http://sdw.st
Stephen D. Williams 703-724-0118W 703-995-0407Fax 20147-4622 AIM: sdw

Received on Saturday, 28 May 2005 03:08:08 UTC