RE: Use cases, advantages and disadvantages of the various choices of sending XML across the wire

Hello Roger,

Thanks for your interest in EXI!


Before I proceed to your actual questions, let me point you to the EXI
Measurements Note, which we re-published just after the EXI format
draft: http://www.w3.org/TR/2007/WD-exi-measurements-20070725/


In order to have an objective basis for future decisions, the EXI WG
spent significant time measuring candidate implementations over as wide
a selection of test data as we could assemble before deciding on a base
format. The measured candidates include nearly all the options you asked
about: plain XML, XML + gzip, FastInfoset, and ASN.1 with the PER and BER
encoding rules. The results are summarized in the Measurements Note and,
if desired, the raw test reports are available directly from W3C's CVS
repository. As far as actual performance and compactness are
concerned, this data set probably provides a much better (reproducible,
documented, and unbiased) base for your own decisions than any advice I
could give you. But, of course, I do know that in the real world many
additional factors (e.g. installed base) tend to influence or dominate
the decision making. It's just that I can't really help you with that. 

In addition to the Measurements Note, the EXI WG intends to publish a
'Best Practices' note, which may provide additional guidance. Its exact
content is not fixed yet, as the document is still being written.


Now, on to your questions:

> Below are various choices for sending XML across the wire.  Would this
> working group provide guidance on when to use each choice? and the
> advantages and disadvantages of each choice? 

As explained above, the EXI WG has provided performance and compactness
measurements, and probably will provide additional information. However,
I suspect we won't ever publish something that will exactly answer your
questions in the way you pose them, as a complete evaluation would
involve many factors outside of WG access or control.


Some (personal!) comments on your choices:

> 1. Send the XML document as is, [...] without any compression [...]

If XML does what you want, at costs that you (and your communication
partners) find acceptable, then use it. In this case, look no further.


> 2. Compress the XML document using a compression tool such as 
> WinZip or Bzip, [...]

Note that general-purpose compression is a trade-off: it buys you one
property (compactness) at the expense of another (processing time).
While that is acceptable in some cases, it is not in others. EXI can
improve both simultaneously.

Furthermore, the EXI measurements show EXI quite consistently
outperforming XML over gzip (using deflate, the same format/algorithm
WinZip usually uses) in both compactness and speed. So as far as XML
transmission is concerned, I would consider EXI the more general
solution: in every use case where compression does well, EXI seems to do
better, and in the many use cases where compression is not acceptable,
EXI still provides many benefits.
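For anyone who wants to look at the XML + gzip trade-off on their own
data, a minimal Python sketch follows. It assumes a synthetic,
hypothetical document (make_sample_xml is illustrative only), uses just
the standard library's gzip module, and times only the added
compression/decompression step, not XML parsing. It is not the EXI WG's
measurement framework.

```python
import gzip
import time


def make_sample_xml(records: int) -> bytes:
    """Build a synthetic, repetitive XML document (hypothetical test data)."""
    rows = "".join(
        f'<order id="{i}"><item>widget</item><qty>{i % 10}</qty></order>'
        for i in range(records)
    )
    return f"<?xml version='1.0'?><orders>{rows}</orders>".encode("utf-8")


def measure_gzip(doc: bytes, level: int = 6) -> dict:
    """Report sizes and timings for a gzip round trip of doc."""
    start = time.perf_counter()
    packed = gzip.compress(doc, compresslevel=level)  # compactness gained here
    mid = time.perf_counter()
    unpacked = gzip.decompress(packed)  # extra processing paid on receipt
    end = time.perf_counter()
    assert unpacked == doc  # deflate is lossless
    return {
        "plain_bytes": len(doc),
        "gzip_bytes": len(packed),
        "ratio": len(packed) / len(doc),
        "compress_s": mid - start,
        "decompress_s": end - mid,
    }


if __name__ == "__main__":
    print(measure_gzip(make_sample_xml(10_000)))
```

Real documents and real parsers will behave differently from this toy
input; the Measurements Note remains the better basis for comparison.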

Of course, the installed base of deflate/gzip/zip may turn out to be a
strong argument.


> 3. Use the compression capabilities inherent in HTTP (gzip content
> encoding, i.e. http+gzip)

I think the HTTP issue is orthogonal to your other questions. My hope is
that EXI could be registered as an HTTP content encoding, too, so it
would work seamlessly with HTTP's built-in content negotiation just as
http+gzip does. Except better, as explained above. :-)  This will
hopefully be covered in the 'Best Practices' note, once available.
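How HTTP content negotiation might pick such a coding can be sketched in
Python as below. This is a simplified illustration, not a complete
Accept-Encoding parser: the "exi" content-coding token is an assumption
(the text above only expresses the hope that such a registration
happens), and parse_accept_encoding / choose_coding are hypothetical
helper names.

```python
def parse_accept_encoding(header: str) -> dict[str, float]:
    """Map each content-coding in an Accept-Encoding header to its q-value."""
    prefs: dict[str, float] = {}
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        coding, _, params = part.partition(";")
        q = 1.0  # default weight when no q parameter is given
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        prefs[coding.strip().lower()] = q
    return prefs


def choose_coding(header: str, supported: list[str]) -> str:
    """Pick the server-supported coding that the client weights highest.

    `supported` is in server preference order, so ties keep that order.
    A coding with q=0 is never chosen; "*" covers codings not listed.
    Falls back to "identity" (send the document unencoded).
    """
    prefs = parse_accept_encoding(header)
    wildcard = prefs.get("*", 0.0)
    best, best_q = "identity", 0.0
    for coding in supported:
        q = prefs.get(coding, wildcard)
        if q > best_q:
            best, best_q = coding, q
    return best
```

For example, a client sending "exi, gzip;q=0.5" to a server that supports
both would receive the (assumed) "exi" coding, while an older client that
only announces "gzip" would still get gzip.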


> 4. Encode the XML document as an ASN.1 BER, DER, or PER file [...]

I am very certain that using ASN.1 for XML transmission will be popular
in all applications that already employ ASN.1 for any of its other
virtues, e.g. in any non-XML context.

My understanding is that ASN.1 hasn't been too well received in the XML
community at large, presumably because it usually requires exact
adherence to the schema and also drops non-declared content (such as
comments, PIs, namespace declarations, ignorable whitespace, etc.) While
this is largely a non-issue in the ASN.1 world, it tends to be a tad
unpopular with the XML crowd. EXI, being a true XML technology, can
encode the full XML InfoSet and supports arbitrary schema deviations.
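To make concrete which infoset items are at stake, the Python sketch
below counts comments, processing instructions, namespace declarations
and whitespace-only text in a document, using the standard library's
xml.dom.minidom. The function name is hypothetical, and counting
whitespace-only text as "ignorable" is an approximation, since strictly
that depends on the schema. Whether a given ASN.1 or EXI encoder keeps
each item depends on the encoder and its options.

```python
from collections import Counter
from xml.dom import Node, minidom


def count_non_schema_items(xml_text: str) -> Counter:
    """Count infoset items a strictly schema-driven binding may not carry."""
    counts: Counter = Counter()
    doc = minidom.parseString(xml_text)

    def walk(node) -> None:
        for child in node.childNodes:
            if child.nodeType == Node.COMMENT_NODE:
                counts["comments"] += 1
            elif child.nodeType == Node.PROCESSING_INSTRUCTION_NODE:
                counts["processing_instructions"] += 1
            elif child.nodeType == Node.TEXT_NODE and not child.data.strip():
                # Approximation: real "ignorable" status needs a schema/DTD.
                counts["whitespace_only_text"] += 1
            elif child.nodeType == Node.ELEMENT_NODE:
                # minidom exposes namespace declarations as xmlns attributes.
                for i in range(child.attributes.length):
                    name = child.attributes.item(i).name
                    if name == "xmlns" or name.startswith("xmlns:"):
                        counts["namespace_declarations"] += 1
                walk(child)

    walk(doc)
    return counts
```

A schema-bound encoding that maps only declared elements and values would
have no place to put any of the items counted here.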

Due to some similarities in the content models of EXI and ASN.1, I would
expect EXI to play nicely with ASN.1. My hope is that the ASN.1 tool
vendors will embrace EXI, so ASN.1 and EXI can cooperate, rather than
making this an either-or issue.


Additionally...

> 4. Encode the XML document as an ASN.1 BER, DER, or PER file [...]
> 5. Encode the XML document as a Fast Infoset file, [...]

Both FastInfoset and ASN.1 (PER (aligned) and BER) were measured as
part of the EXI WG's effort. The chosen EXI baseline consistently beat
all of them on compactness, and usually won or
came close to the leading candidate in processing time. (The
measurements showed a leading group of several candidates, whose
relative performance was usually contained within a relatively small
band, with various members of said group taking 1st place once in a
while. I suspect implementation aspects, such as the time spent
optimizing the respective implementations, were greater factors in
determining the exact placement than actual format properties.) Please
refer to the EXI Measurements Note for details.


> 6. Encode the XML as an Efficient XML Interchange file, and then send
> the EXI file.

Well now, I personally think that is an *excellent* plan! :-)


Of course, having said all that, I have to point out that the caveat I
voiced initially still applies: Any real world deployment decision will
have to involve a number of factors beyond the actual format capability.
As far as capability goes, we hope people will carefully examine and
repeat our measurements and will come to the same conclusions as we did.

Roger, if you think there is anything the EXI WG can do to help people
make such a decision, please let us know!



Sincerely,
Daniel Vogelheim

Received on Tuesday, 7 August 2007 14:48:17 UTC