[binaryXML-30] Binary Infosets


As DanC aptly pointed out, discussions on binary infosets tend to quickly 
revolve around similar structures, leading to sterile stalemates. In order to 
help avoid that, I have summarized a little information, in part stemming from 
the recent xml-dev thread on the topic[1]. I won't go into specifics about 
existing formats, preferring instead to limit the scope of this post to what 
binary infosets are and what in my experience people expect from them.

It's Not XML

Despite the language that is colloquially used to describe them ("binary XML"), 
binary infosets are *not* XML, and no one is pretending they are. They don't 
intend to compete with XML. XML is more than just a way to serialize an infoset 
(in fact that's a backwards way of seeing it); binary infosets, on the other 
hand, are just that.

It just so happens that some applications only need to pass around an infoset, 
and there are cases in which a highly efficient serialization of that infoset 
is a highly desirable way, if not the only way, of doing it (most of the time 
due to resource constraints).

Typical Features

Being on the receiving end of feature requests for binary infosets, I've seen a 
relatively wide set of needs expressed. The requirements tend to vary according 
to whether one is dealing with people from mobile, embedded, broadcast, web 
services, etc., but they often overlap across communities (if only because 
those sectors do). Here is a quick list off the top of my head:

  - Size. Binary infosets ought to be as compact as possible.

  - Speed. They should be faster to read than XML is to parse, and thus also 
faster than decompressing and then parsing generically compressed XML.

  - Genericity. They should be applicable to any infoset. Requiring a schema is 
often OK, but should not be needed in all cases (and even given a schema, it is 
generally required that arbitrary extensions be includable without prior 

  - Memory Efficiency. It should be possible to use the binary infoset as an 
in-memory representation of a DOM (or similar) with lazy decoding of the content.

  - Streamability. It should be possible to produce a binary infoset stream that 
can be picked up at an arbitrary position and still made sense of. This 
functionality may require that the binary infoset be split up into subtree 
fragments that can be independently understood (this is easier than it usually 
sounds to people unused to stream applications).

  - Skippability. It is often desirable to skip entire subtrees either because 
you don't need them, or because you know you won't understand them, with minimal 
cost (i.e. without parsing the subtree).

  - Change Resilience. In the case of schema-constrained infosets, new versions 
of the schema should not require applications using the old schema information 
to be upgraded. Even in case of radical change, it should be possible to send 
the same information to both new and old applications. This relies on 
skippability.

  - Fault Tolerance. If a fragment is lost during transmission, it should impact 
the result as little as possible. This is linked to streamability.

For different communities the above requests will have different rationales, but 
those are generally the things that I see.
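
To make skippability a little more concrete, here is a minimal sketch in Python. It assumes a made-up, length-prefixed encoding that I am inventing purely for illustration (it is not any existing or proposed format, and the names encode, read_header and find_child_text are hypothetical). Each element is written as a one-byte name length, the name, a four-byte body length, and then the body (text followed by child elements). Because the body length comes first, a reader can jump over a subtree it does not need without decoding anything inside it.

```python
import struct


def encode(name, children=(), text=""):
    """Encode one element: name length (1 byte), name, body length (4 bytes), body.

    The body is the text length (4 bytes), the text, then the encoded children.
    """
    name_bytes = name.encode("utf-8")
    text_bytes = text.encode("utf-8")
    body = struct.pack(">I", len(text_bytes)) + text_bytes + b"".join(children)
    header = struct.pack(">B", len(name_bytes)) + name_bytes
    return header + struct.pack(">I", len(body)) + body


def read_header(buf, pos):
    """Return (name, body_start, body_end) for the element starting at pos."""
    (name_len,) = struct.unpack_from(">B", buf, pos)
    name = buf[pos + 1 : pos + 1 + name_len].decode("utf-8")
    (body_len,) = struct.unpack_from(">I", buf, pos + 1 + name_len)
    body_start = pos + 1 + name_len + 4
    return name, body_start, body_start + body_len


def find_child_text(buf, wanted):
    """Return (text, skipped) for the first child of the root named `wanted`.

    Children with other names are skipped by jumping to their end offset;
    their contents are never decoded. `skipped` lists the names jumped over.
    """
    _, start, end = read_header(buf, 0)
    (root_text_len,) = struct.unpack_from(">I", buf, start)
    pos = start + 4 + root_text_len  # first child follows the root's text
    skipped = []
    while pos < end:
        name, child_start, child_end = read_header(buf, pos)
        if name == wanted:
            (text_len,) = struct.unpack_from(">I", buf, child_start)
            text = buf[child_start + 4 : child_start + 4 + text_len]
            return text.decode("utf-8"), skipped
        skipped.append(name)
        pos = child_end  # skip the whole subtree in one step
    return None, skipped


# A small document: an <svg> subtree we do not care about, then a <title>.
doc = encode("doc", [
    encode("svg", [encode("path", text="M0 0"), encode("g", [encode("rect")])]),
    encode("title", text="hello"),
])
title, skipped = find_child_text(doc, "title")
```

In this sketch the whole svg subtree costs a single header read, however deep it is. The price is that the writer must know each subtree's size before emitting it, which is one of the trade-offs against streamability that a real format would have to settle.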

It's Already Happening

People are already creating binary formats for the infoset, and either ratifying 
them as part of larger standards (MPEG, TV Anytime, 3GPP...) or using them 
within their own projects. The latter is not much of a concern to me, but I see 
a problem with the former because:

   - in a number of cases, the binary format is not used within a closed and 
controlled context, but rather in an open, Web-related situation;
   - the format is most of the time ad hoc, and limited to that vertical 
industry consortium's standard(s);
   - the format is most of the time encumbered;
   - ad hoc formats are of varying quality, to say the least.

This is progressively leading us to a balkanized situation in which I can very 
well imagine that if I wanted to send XHTML+SVG to a device, I'd have to use 
different binary encodings for XHTML and for SVG specified separately by two 
different consortia (possibly creating my own kludge to wrap both). I'd also 
have to pay royalties to do that, and it might be a poor format with serious 
issues. Chances are that making that balkanized mess interoperable will be 
difficult, if not impossible.

[1] http://lists.xml.org/archives/xml-dev/200212/threads.html#00159

Robin Berjon <robin.berjon@expway.fr>
Research Engineer, Expway
7FC0 6F5F D864 EFB8 08CE  8E74 58E6 D5DB 4889 2488

Received on Wednesday, 11 December 2002 14:47:20 UTC