Re: [binaryXML-30] Binary XML problem statement.

This is a useful starting point.  My major suggestion is that "Binary XML" 
should be treated merely as a label for this cluster of issues, and not as 
the outline of a solution.  I think it would be more effective to think of 
the problem here as *optimized* representations of the information in an 
XML document (or serializations of the Infoset, if you will).  Beginning 
with a discussion of optimization makes one ask the question "What property 
is being optimized?"  Chris Lilley has a nice list of properties that 
alternative XML serializations could address, including Network Efficiency, 
Storage Efficiency, Data typing, Random Access, and Interoperability. [I 
must confess that the Trust Boundaries issue makes no sense to me in this 
context, but that's another discussion]. XML 1.0 appears in retrospect to 
be a nice compromise among all these criteria, obviously weighting 
interoperability the heaviest but not doing too much violence to the others 
(or allowing them to be layered on, e.g. datatype information inserted by a 
schema validator). Chris seems to be arguing that no "binary" format could 
optimize all these simultaneously, and that's certainly true.

A more interesting question, which I think is in the TAG's domain, is 
whether the XML 1.0 compromise, the one-serialization-fits-all approach, is the 
*only* one that the W3C should standardize.  Would there be benefit in 
blessing others that optimize some other property or set of compatible 
properties?  For example, let's say that a "space-optimized" version of XML 
was standardized (it might well be a gzip of XML 1.x), and a 
"speed-optimized" version was also available.  Via HTTP content negotiation or 
out-of-band agreement, producers and consumers of XML could decide which 
format to use in a particular application or network/processing context. 
The downside of that would be that interoperability would suffer -- an 
application that didn't understand the format negotiation mechanism might 
get an "XML" format that it didn't recognize.  (Making the default format 
Unicode-with-angle-brackets would probably handle most of these issues in 
practice, but the guaranteed, universally interoperable *principle* would 
clearly be violated.)  The upside of standardization is the network 
effect -- a small number of standardized formats, rather than a large number 
of proprietary and non-interoperable formats, increases the overall "value" of 
the system by maximizing the number of nodes that can *efficiently* 
interoperate.
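
To make the negotiation mechanics concrete, here is a rough sketch (in 
Python) of how a consumer might behave.  The media types used below 
(application/x-fast-xml, application/xml+gzip) are hypothetical 
placeholders rather than registered types, and the dispatch logic is 
purely illustrative:

    import gzip
    import urllib.request
    import xml.dom.minidom

    # Hypothetical media types -- placeholders for illustration only,
    # not registered or standardized formats.
    ACCEPT = ("application/x-fast-xml, "
              "application/xml+gzip;q=0.8, "
              "application/xml;q=0.5")

    def fetch_document(url):
        """Ask the server for whichever serialization it prefers,
        then dispatch on the Content-Type it actually returns."""
        req = urllib.request.Request(url, headers={"Accept": ACCEPT})
        with urllib.request.urlopen(req) as resp:
            content_type = resp.headers.get_content_type()
            data = resp.read()
        if content_type == "application/xml+gzip":
            # "Space-optimized": plain gzip of XML 1.x; inflate, then parse as usual.
            return xml.dom.minidom.parseString(gzip.decompress(data))
        if content_type == "application/x-fast-xml":
            # "Speed-optimized": would need a dedicated decoder here.
            raise NotImplementedError("no decoder for the hypothetical fast format")
        # Default: ordinary Unicode-with-angle-brackets XML 1.x.
        return xml.dom.minidom.parseString(data)

An application that doesn't participate in the negotiation would simply 
omit the Accept header and expect the default angle-bracket form, which 
is exactly where the interoperability trade-off above shows up.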

So, the discussion points for the TAG would seem to be:

- Is the conception of alternative standardized representations of the XML 
Infoset that optimize specific properties (such as parsing speed *or* 
network bandwidth consumption) consistent with the overall Web 
architecture? [I personally think so ... it seems little different in 
concept than offering SVG, GIF, JPEG, and PDF representations of an image; 
you could also ask for SVG (XML 1.x), SVG-fast (optimized for fast parsing), 
SVG-compressed, etc.]

- Would the benefits of alternative "standard" serializations outweigh the 
costs in complexity / interoperability?  [Good question, but I would tend 
to say "yes" for the usual reasons: this stuff is happening out there in 
the wild, and better to try to coordinate and standardize within a common 
architectural framework than to have one or more of these things create a 
"fork" that destroys interop anyway.]

- Should some specific WG or CG be asked to look into alternative 
standardizations of the Infoset serialization format? [I personally doubt 
it; there isn't much interest or expertise in this inside the W3C that I'm 
aware of, but if some outside group does something like this within the 
constraints of the Webarch, you could say it would be a Good Thing.]





-- 
Mike Champion
