TAG opinion on XML Binary Format

The TAG has reviewed in detail the documents [1,2,3,4] prepared by the
XBC Working Group [5].  While we very much appreciate the significant
progress that these notes represent, the TAG believes that more detailed
analysis is needed before a W3C Binary XML Recommendation is
sufficiently justified.  We are taking no position at this time as to
whether Binary XML will prove to be warranted, as there seem to be good
arguments on both sides of that question.  Rather, we are suggesting
that further careful analysis is needed before the W3C commits to a
direction.
 
The TAG believes there are disadvantages as well as potential advantages
that will result from even a well-crafted Binary XML Recommendation.
The advantages are clear: a successful binary format is likely to
provide speed gains or size reductions, at least for certain use cases.
The drawbacks are likely to include reduced interoperability with XML
1.0 and XML 1.1 software, and an inability to leverage the benefits of
text-based formats.  These are important concerns.  Quoting from the Web
Architecture document[6]:
 
   "The trade-offs between binary and textual data
   formats are complex and application-
   dependent. Binary formats can be substantially
   more compact, particularly for complex
   pointer-rich data structures. Also, they can be
   consumed more rapidly by agents in those cases
   where they can be loaded into memory and used
   with little or no conversion. Note, however,
   that such cases are relatively uncommon as such
   direct use may open the door to security issues
   that can only practically be addressed by
   examining every aspect of the data structure in
   detail.
 
   "Textual formats are usually more portable and
   interoperable. Textual formats also have the
   considerable advantage that they can be
   directly read by human beings (and understood,
   given sufficient documentation). This can
   simplify the tasks of creating and maintaining
   software, and allow the direct intervention of
   humans in the processing chain without recourse
   to tools more complex than the ubiquitous text
   editor. Finally, it simplifies the necessary
   human task of learning about new data formats;
   this is called the "view source" effect."
 
We therefore believe that the benefits of a binary XML format must be
predictable and compelling in order to justify development of a
Recommendation.
 
In particular, we suggest that a quantitative analysis is necessary.
For at least a few key use cases, concrete targets should be set for the
size and/or speed gains that would be needed to justify the disruption
introduced by a new format.  For example, a target might be that "in
typical web services scenarios, median speed gains on the order of 3x in
combined parsing and deserialization are deemed sufficient to justify a
new format."  We further suggest that representative binary technologies
be benchmarked and analyzed to a sufficient degree that such speed or
size improvements can be predicted with reasonable reliability before we commit
to a Recommendation.  No doubt, any given set of goals or benchmarks
will suffer from some degree of imprecision, but if the gains are
sufficiently compelling to justify a new format, then they should be
relatively easy to demonstrate.  In short, actual measurements should be
a prerequisite to preparing a Recommendation.
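
Purely by way of illustration, the following Python sketch shows the
shape of such a measurement: it times combined parsing and
deserialization of a small text XML document against a stand-in binary
encoding.  Python's pickle module is used here only as a placeholder for
a hypothetical binary XML format, and the 3x threshold is simply the
example figure quoted above; real measurements would use representative
payloads and mature implementations of each candidate format.

   # Illustrative sketch only: pickle stands in for a hypothetical
   # binary XML encoding; the 3x target is the example figure above.
   import pickle
   import time
   import xml.etree.ElementTree as ET

   TARGET_SPEEDUP = 3.0    # example threshold from the text
   ITERATIONS = 1000

   # Small synthetic document; real tests would use representative
   # web services payloads.
   xml_text = "<order>" + "".join(
       "<item sku='SKU%d' qty='%d'/>" % (i, i) for i in range(100)
   ) + "</order>"

   def from_text(doc):
       # Combined parse + deserialization of the text form.
       return [dict(item.attrib) for item in ET.fromstring(doc)]

   binary_blob = pickle.dumps(from_text(xml_text))  # stand-in binary form

   def bench(fn, arg):
       start = time.perf_counter()
       for _ in range(ITERATIONS):
           fn(arg)
       return time.perf_counter() - start

   text_time = bench(from_text, xml_text)
   binary_time = bench(pickle.loads, binary_blob)

   speedup = text_time / binary_time
   print("text %.3fs  binary %.3fs  speedup %.2fx"
         % (text_time, binary_time, speedup))
   print("meets example target" if speedup >= TARGET_SPEEDUP
         else "below example target")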
 
In doing such measurements, we believe it is essential that comparisons
be done to the best possible text-based XML 1.x implementations, which
are not necessarily those that are most widely deployed.  Stated
differently: 
if XML 1.x is inherently capable of meeting the needs of users, then our
efforts should go into tuning our XML implementations, not designing new
formats.  Benchmark environments should be as representative as possible
of fully optimized implementations, not just of the XML parser, but of
the surrounding application or middleware stack.  We note that different
application-level optimizations may be necessary to maximize the
performance of the binary and text cases, respectively.  Care should
especially be taken to ensure that the performance of particular APIs
such as DOM or SAX does not obscure the performance possible with either
option (e.g., both SAX and DOM can easily result in high-overhead string
conversions when UTF-8 is used).
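
As an illustration of that last point, the following sketch (again in
Python, with the no-op handler and the document size chosen purely for
demonstration) compares full DOM tree construction with a SAX pass that
builds nothing, over identical UTF-8 input; any large gap between the
two reflects overhead in the API and object model rather than an
inherent property of the text format.

   # Illustrative only: the API layer (DOM construction vs. a no-op
   # SAX pass) can dominate measured performance over the same input.
   import time
   import xml.dom.minidom
   import xml.sax

   xml_bytes = ("<root>" + "".join(
       "<row id='%d'>value %d</row>" % (i, i) for i in range(5000)
   ) + "</root>").encode("utf-8")

   class NoOpHandler(xml.sax.ContentHandler):
       # Receives parse events but builds nothing.
       pass

   def bench(fn, runs=20):
       start = time.perf_counter()
       for _ in range(runs):
           fn()
       return time.perf_counter() - start

   dom_time = bench(lambda: xml.dom.minidom.parseString(xml_bytes))
   sax_time = bench(lambda: xml.sax.parseString(xml_bytes, NoOpHandler()))

   print("DOM build %.3fs   no-op SAX %.3fs" % (dom_time, sax_time))
   # A large gap is overhead of the API and object model, not an
   # inherent cost of text-based XML.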
 
The TAG would also appreciate clarification as to how many formats are
likely to be included in a Recommendation; it's not clear whether the
proposal is for one binary XML format for all cases, or if multiple
formats are to be endorsed.  The use of multiple formats is likely to
further reduce interoperability.
 
We feel that the introduction of a binary format would be an important
development not only for those who might benefit from its size or speed
advantages, but also for those who might be affected by its impact on
interoperability and perspicuity.  Therefore, in order to justify a
potential new format, the
TAG would like to see the above issues addressed.  As stated above, we
make no prediction as to whether such an analysis will ultimately
confirm the need for Binary XML;  if it does, we will be glad to support
development of a Recommendation at the W3C.
 
 
[1] http://www.w3.org/TR/xbc-use-cases/
[2] http://www.w3.org/TR/xbc-properties/
[3] http://www.w3.org/TR/xbc-measurement/
[4] http://www.w3.org/TR/xbc-characterization/
[5] http://www.w3.org/XML/Binary/
[6] http://www.w3.org/TR/webarch/#binary

Received on Tuesday, 24 May 2005 17:26:30 UTC