Re: binaryXML, marshalling, and trust boundaries

On Mon, 2002-12-02 at 20:22, Tim Bray wrote:
> Michael Mealling wrote:
> 
> > I for one would appreciate it. There are several protocols I've been
> > working with that, due to their particular nature, would benefit from an
> > efficient serialization that was very specifically _not_ 'just gzip'.
> > The model we're working with requires the impact to the server to be
> > very low as well since the cost to recover is higher than the cost to
> > requery. If gzip is used then that relationship flipflops and the impact
> > to the entire system is extremely significant. Thus the reason why we
> > keep coming back to WBXML as the solution.
> 
> This is a tough problem.  If the tag density is very high relative to 
> running text, you can try to binary-encode markup with a dictionary 
> (what WBXML does IIRC); of course if you wanted to retain XML's virtue 
> of being self-contained you'd want to include the dictionary in the 
> message, which would blow off most of the benefit in the case the 
> messages are short.

The messages are short and well understood (i.e. no arbitrary namespaces
or other 'higher order' XML features are used). Currently the entire
thing is handled just fine by a DTD. The issue is really packet size and
CPU time on the server side. 

> Another approach would be simply to be rigorously 
> minimal in choosing tag names, e.g.
> 
>     <m a="33.34.44.55" from="foo@bar.org"><a u="3" h="ajfoeiw"/></m>
> 
> at which point the savings from compression are less significant.

But still large enough to be an issue. I'm trying to get to the point
where tag names are limited to bits rather than bytes.
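
To make "bits rather than bytes" concrete, here is a minimal sketch of a
fixed-width tag code. It assumes both ends already share the tag list
(e.g. derived from the DTD), so no dictionary travels in the message. The
tag vocabulary and the pack_tags/unpack_tags helpers are hypothetical
illustrations, not WBXML's actual token format.

```python
# Hypothetical vocabulary; in practice it would come from the DTD that
# client and server already share.
TAGS = ["m", "a", "query", "response", "error"]

# Fixed-width code: just enough bits to number every tag (at least 1).
WIDTH = max(1, (len(TAGS) - 1).bit_length())
CODE = {name: i for i, name in enumerate(TAGS)}


def pack_tags(names):
    """Pack a sequence of tag names into bytes, WIDTH bits per tag."""
    acc = 0
    for name in names:
        acc = (acc << WIDTH) | CODE[name]
    nbits = WIDTH * len(names)
    pad = (-nbits) % 8  # zero bits appended to reach a byte boundary
    return (acc << pad).to_bytes((nbits + pad) // 8, "big")


def unpack_tags(data, count):
    """Recover `count` tag names from bytes produced by pack_tags."""
    nbits = WIDTH * count
    acc = int.from_bytes(data, "big") >> (len(data) * 8 - nbits)
    mask = (1 << WIDTH) - 1
    return [TAGS[(acc >> (WIDTH * (count - 1 - i))) & mask]
            for i in range(count)]
```

With five tags each name costs 3 bits, so three tags fit in 2 bytes where
even one-letter names with angle brackets would take 9. The receiver needs
the tag count (or an end marker) from elsewhere in the message, which this
sketch leaves out.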

> If the markup density is lower, the problem reduces to that of 
> compressing text, which is fortunately well-understood, the considerable 
> redundancy in most XML compresses beautifully per all the standard 
> algorithms, so you can pick any particular cost-effectiveness point from 
> the menu.

In most cases the query from the client to the server is small enough
that compressing the text doesn't buy you much beyond what simply
stripping to 7-bit ASCII would give you. It's the tag name sizes and the
redundancy in closing tags that offer the largest size savings.

> I'd really want to see some hard statistical data about the 
> characteristics of the message traffic in question before I went out on 
> a limb as to the best way to deal with the problem.  Also it's not clear 
> that there's a solution with acceptable cost-effectiveness across a 
> broad spectrum of applications, even if the apps are limited to the 
> wireless-networking space. -Tim

While we don't have specific data on this exact application, the model
is very close to that of DNS: queries are exact match lookup, queries
and responses are limited to 512 byte UDP packets, recovery from a
transport layer failure is to requery via TCP, etc. The issue is that if
the number of TCP-only queries at the roots increases by even a small
amount, the requirements to scale the system to even the last decade's
usage levels become cost-prohibitive.
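
As a rough illustration of why tag overhead matters under that model, the
sketch below renders the same made-up response with verbose and with terse
element names and checks each against the 512-byte ceiling. The element
names, the sample addresses, and the render/choose_transport helpers are
all assumptions for the example, not our actual protocol.

```python
UDP_LIMIT = 512  # bytes; the DNS-style datagram ceiling described above


def render(records, tag):
    """Wrap each record in <tag>...</tag> and return the ASCII bytes."""
    return "".join(f"<{tag}>{r}</{tag}>" for r in records).encode("ascii")


def choose_transport(payload):
    """Answer over UDP if the payload fits, otherwise force a TCP requery."""
    return "udp" if len(payload) <= UDP_LIMIT else "tcp"


# Hypothetical response: 20 addresses from a documentation-only range.
records = [f"192.0.2.{i}" for i in range(20)]

verbose = render(records, "address")  # 19 bytes of markup per record
terse = render(records, "a")          # 7 bytes of markup per record
```

Here the verbose form comes to 570 bytes and tips the exchange over to
TCP, while the terse form is 330 bytes and stays in a single datagram. The
payload is identical; only the tag names changed.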

So imagine what you would want to do to XML if DNS version 2.0 had to
use it but still had to maintain its current 'network footprint' and
client/server interaction methods. 

Also, to be clear, this isn't for the wireless world; it's for the
current internet but at a scale that HTTP over TCP simply can't touch.

-MM

Received on Monday, 2 December 2002 20:56:45 UTC