- From: Tim Bray <tbray@textuality.com>
- Date: Mon, 02 Dec 2002 17:22:32 -0800
- To: Michael Mealling <michael@neonym.net>
- Cc: "Champion, Mike" <Mike.Champion@SoftwareAG-USA.com>, www-tag@w3.org
Michael Mealling wrote:
> I for one would appreciate it. There are several protocols I've been
> working with that, due to their particular nature, would benefit from an
> efficient serialization that was very specifically _not_ 'just gzip'.
> The model we're working with requires the impact to the server to be
> very low as well since the cost to recover is higher than the cost to
> requery. If gzip is used then that relationship flipflops and the impact
> to the entire system is extremely significant. Thus the reason why we
> keep coming back to WBXML as the solution.
This is a tough problem. If the tag density is very high relative to
running text, you can try to binary-encode markup with a dictionary
(what WBXML does IIRC); of course if you wanted to retain XML's virtue
of being self-contained you'd want to include the dictionary in the
message, which would wipe out most of the benefit when the messages are
short. Another approach would be simply to be rigorously
minimal in choosing tag names, e.g.
<m a="33.34.44.55" from="foo@bar.org"><a u="3" h="ajfoeiw"/></m>
at which point the savings from compression are less significant.
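A toy sketch of that trade-off, under loud assumptions: the one-byte-token scheme below is made up for illustration and is not the real WBXML wire format, and the verbose names (message, address, attachment, uid, hash) are hypothetical expansions of the terse example. Only the terse message comes from the text above.

```python
# Toy dictionary encoding; NOT the WBXML format, just an illustration.

# Hypothetical verbose form of the terse message quoted above.
VERBOSE = (b'<message address="33.34.44.55" from="foo@bar.org">'
           b'<attachment uid="3" hash="ajfoeiw"/></message>')
TERSE = b'<m a="33.34.44.55" from="foo@bar.org"><a u="3" h="ajfoeiw"/></m>'

# The shared dictionary: position i maps to the one-byte token 0x80 + i.
NAMES = [b"message", b"address", b"from", b"attachment", b"uid", b"hash"]

# What it would cost to ship the dictionary inside the message:
# every name plus a one-byte terminator.
DICTIONARY = b"".join(name + b"\x00" for name in NAMES)


def tokenize(doc: bytes, names: list[bytes]) -> bytes:
    """Replace each known name with a single high byte.

    Naive on purpose: it would also rewrite a name that happened to occur
    inside text content, which a real encoder must not do.
    """
    out = doc
    # Longest names first, so a short name never clobbers part of a longer one.
    for i, name in sorted(enumerate(names), key=lambda pair: -len(pair[1])):
        out = out.replace(name, bytes([0x80 + i]))
    return out


def detokenize(doc: bytes, names: list[bytes]) -> bytes:
    """Inverse of tokenize for pure-ASCII input."""
    out = doc
    for i, name in enumerate(names):
        out = out.replace(bytes([0x80 + i]), name)
    return out


if __name__ == "__main__":
    encoded = tokenize(VERBOSE, NAMES)
    print("verbose XML:           ", len(VERBOSE), "bytes")
    print("tokens, shared dict:   ", len(encoded), "bytes")
    print("tokens + inlined dict: ", len(DICTIONARY) + len(encoded), "bytes")
    print("terse tag names:       ", len(TERSE), "bytes")
```

For a message this short, the tokenized form only wins if both ends already share the dictionary; once the dictionary travels with the message, the total exceeds the plain verbose XML, while the terse-names version gets close to the tokenized size with no dictionary at all.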
If the markup density is lower, the problem reduces to that of
compressing text, which is fortunately well understood. The considerable
redundancy in most XML compresses beautifully under all the standard
algorithms, so you can pick any particular cost-effectiveness point from
the menu.
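As a rough sketch of what picking from the menu might look like, the snippet below estimates markup density and then tries a few zlib levels. The sample traffic is invented, the regex-based density measure is a crude assumption, and zlib stands in for "the standard algorithms"; real message traffic would have to be measured the same way before drawing conclusions.

```python
import re
import time
import zlib

TAG = re.compile(rb"<[^>]*>")


def markup_density(doc: bytes) -> float:
    """Fraction of bytes that sit inside <...> tags (crude regex estimate)."""
    return sum(len(m.group()) for m in TAG.finditer(doc)) / len(doc)


def compression_menu(doc: bytes, levels=(1, 6, 9)):
    """Return (level, compressed size, seconds spent) for each zlib level."""
    menu = []
    for level in levels:
        start = time.perf_counter()
        packed = zlib.compress(doc, level)
        menu.append((level, len(packed), time.perf_counter() - start))
    return menu


# Hypothetical text-heavy traffic: little markup, repetitive running text.
SAMPLE = b"<log>" + b"".join(
    b"<entry>request %d from host-%d completed without error</entry>" % (i, i % 7)
    for i in range(500)
) + b"</log>"


if __name__ == "__main__":
    print(f"markup density: {markup_density(SAMPLE):.2f}")
    for level, size, seconds in compression_menu(SAMPLE):
        print(f"zlib level {level}: {len(SAMPLE)} -> {size} bytes "
              f"in {seconds * 1e3:.2f} ms")
```

Lower levels cost less CPU per message and higher levels usually squeeze out a few more bytes; which point is acceptable depends on how expensive server time is relative to bandwidth.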
I'd really want to see some hard statistical data about the
characteristics of the message traffic in question before I went out on
a limb as to the best way to deal with the problem. Also it's not clear
that there's a solution with acceptable cost-effectiveness across a
broad spectrum of applications, even if the apps are limited to the
wireless-networking space.
-Tim
Received on Monday, 2 December 2002 20:22:31 UTC