Binary Babel Bridge

Dear XML Binary Characterization Working Group,

  According to the minutes of the Binary Interchange Workshop, a number
of people think that we should have one alternate encoding for XML
rather than many. Even though that makes sense at first sight, I beg to
differ. I think that all binary formats are alternate encodings of XML,
or that XML is an alternate encoding for all binary formats, or rather
for any format for that matter.

What is lacking is a bridge. I believe this is nothing really new, but I
would like the XBC WG to explicitly consider whether building a good
enough bridge might actually better address the valid and invalid use
cases the group will find and explore.

A bridge would probably be a new schema/transformation language that
can be used to describe a mapping between XML markup and binary data.
I already missed such a language back in the mid-1990s, when I reverse
engineered and implemented a number of binary formats. Among these was
the .tie file format, the mission description files for LucasArts'
famous Star Wars: Tie Fighter. I tried to document what I found out in
a more readable fashion than Turbo Pascal source code and, looking at
some prior art for other formats, I came up with something like

  +-| Flightgroup |--------------------------------------------------+
  | This contains the full description of a single FlightGroup.      |
  +------------------------------------------------------------------+
  Offset Length  Type   Description

      0      11  (char) Name of the Flightgroup
    +11       1  (byte) Unknown (Separator)
    +12      11  (char) Name of Pilot
    +23       1  (byte) Unknown (Separator)
    +24      11  (char) Crafts Cargo 1
    +35       1  (byte) Unknown (Separator)
    +36      11  (char) Crafts Cargo 2
    +47       1  (byte) Unknown (Separator)
    +48       1  (byte) Special Craft
    +49       1  (byte) Special Craft Random byte
    +50       1  (byte) Type Identifier
    +51       1  (byte) Number of crafts in FG
    +52       1  (byte) FG start status
    +53       1  (byte) type of FG's missiles
    +54       1  (byte) type of FG's beam weapon
    +55       1  (byte) IFF Identifier
    +56       1  (byte) AI level
  ...

The most important byte here is at offset 50, as that byte defines the
type of my craft. If you have ever watched Star Wars you would probably
like to avoid flying a simple Tie Fighter or Tie Interceptor, as they do
not have shield technology... I would have preferred a more
machine-readable format. A bridge would allow me to map this structure
into something like

  <flightGroup>
    <name          />
    <pilot         />
    <cargo         />
    <specialCargo  />
    <specialCraft  />
    <randomSpecial />
    <type          />
    <craftCount    />
    <status        />
    <missiles      />
    <beam          />
    <iff           />
    <ai            />
    ...
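
To make the idea concrete, here is a minimal Python sketch of what a
bin2xml step for the first 57 bytes of this record could look like. The
field table, the function name, the NUL-padded Latin-1 reading of the
text fields, and the pairing of "Crafts Cargo 1/2" with <cargo> and
<specialCargo> are all assumptions made for illustration; nothing here
is an existing tool or a verified description of the .tie format.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical bridge description for offsets 0..56 of a flight group
# record: (element name, struct format code). The "x" entries are the
# unknown separator bytes, which are skipped.
FIELDS = [
    ("name", "11s"), (None, "x"),
    ("pilot", "11s"), (None, "x"),
    ("cargo", "11s"), (None, "x"),
    ("specialCargo", "11s"), (None, "x"),
    ("specialCraft", "B"),
    ("randomSpecial", "B"),
    ("type", "B"),
    ("craftCount", "B"),
    ("status", "B"),
    ("missiles", "B"),
    ("beam", "B"),
    ("iff", "B"),
    ("ai", "B"),
]
LAYOUT = struct.Struct("<" + "".join(code for _, code in FIELDS))
NAMES = [name for name, _ in FIELDS if name is not None]


def flightgroup_to_xml(record: bytes) -> ET.Element:
    """Map the known prefix of a flight group record to an XML element."""
    values = LAYOUT.unpack_from(record, 0)
    root = ET.Element("flightGroup")
    for name, value in zip(NAMES, values):
        if isinstance(value, bytes):
            # Assumption: text fields are NUL-padded single-byte strings.
            value = value.split(b"\0", 1)[0].decode("latin-1")
        ET.SubElement(root, name).text = str(value)
    return root
```

Calling ET.tostring(flightgroup_to_xml(record)) on a record would then
yield markup in the shape sketched above, with the remaining, still
unknown bytes simply left out.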

This would already have been extremely helpful when reverse engineering
the format, as it would probably have given me a better idea of what I
already knew about the format and what I did not... What I apparently
did instead was generate (or write?) files like

  ...
  024 +
  025 ¦
  026 ¦
  027 ¦
  028 ¦
  029 CARGO 2
  02A ¦
  02B ¦
  02C ¦
  02D ¦
  02E +
  02F
  030 SPECIAL CRAFT
  031 -->SET TO 01 IF SPC = RANDOM
  032 TYP
  033 STÄRKE
  034 SITUATION
  035 RAKETENTYP
  036 BEAM WEAPON
  037 BESITZER
  ...

Based on what I knew about the format I wrote a tool to extract mission
goals from the mission files. While you can access the primary and
secondary goals during normal combat, the bonus goals are hidden, which
hinders serving the empire well by heading for a perfect score. Now I've
got a problem. The executable binary of that program is probably on a
computer that is currently inaccessible to me. I have the source code,
but in order to compile it I would probably need to install Turbo
Pascal, which will probably turn out to be difficult on my current
operating system, and it is also likely that the installation files
reside on the same computer as the binary I am looking for. If I had a
machine-readable schema and a bin2xml tool I would not have this
problem. I could write an XSLT that processes the resulting XML to list
the bonus goals in an XHTML document, or one that generates an SVG
animation like the one you would see in the mission briefing. I could
also write a Perl one-liner that just changes my craft type and use the
xml2bin tool to write the data back to a .tie file.
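
As a rough sketch of that round trip, the following Python fragment
decodes the single-byte fields of a flight group record into XML,
changes the craft type in the XML, and writes the change back while
leaving all other bytes untouched. The names bin2xml and xml2bin are
used here only as hypothetical function names, the offsets are the ones
from the table above, and the record is stand-in data rather than a real
.tie file.

```python
import xml.etree.ElementTree as ET

# Hypothetical subset of a bridge description: element name -> byte offset.
# Offsets come from the flight group table; all fields are single bytes.
BYTE_FIELDS = {
    "specialCraft": 48, "randomSpecial": 49, "type": 50, "craftCount": 51,
    "status": 52, "missiles": 53, "beam": 54, "iff": 55, "ai": 56,
}


def bin2xml(record: bytes) -> ET.Element:
    """Expose the described bytes of a record as child elements."""
    root = ET.Element("flightGroup")
    for name, offset in BYTE_FIELDS.items():
        ET.SubElement(root, name).text = str(record[offset])
    return root


def xml2bin(root: ET.Element, record: bytes) -> bytes:
    """Write element values back into a copy of the original record."""
    patched = bytearray(record)
    for child in root:
        patched[BYTE_FIELDS[child.tag]] = int(child.text)
    return bytes(patched)


record = bytes(range(64))        # stand-in data, not a real .tie record
doc = bin2xml(record)
doc.find("type").text = "5"      # 5 is an arbitrary example value
patched = xml2bin(doc, record)
```

Passing the original record to xml2bin is a deliberate choice in this
sketch: bytes that the description does not yet cover survive the round
trip unchanged, which matters for a format that is only partly
understood.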

There is <http://www.wotsit.org/>, an archive of format descriptions
covering hundreds of binary bitmap and vector graphic formats. Some of
them are in a format like the one illustrated above, some are in plain
English, some use source code, etc. Imagine what it would look like if
there was a machine-readable mapping to markup for each format: it would
be trivial to map between these formats using bin2xml, XSLT, and
xml2bin. At first sight it might sound like a bad idea to encode bitmap
graphics in XML, but it would allow using XML tools to change some
colors, add a border, rotate or mirror the image, add text from bitmap
fonts by merging bitmap glyph XML documents with the rest of the image
for internationalization, and so on.

Another example of what I would like to do is to XQuery my mail/usenet
reader's binary message database to gather statistics on my mail traffic
and convert these statistics into an SVG diagram, or to convert the
binary message database of my instant messenger into an XHTML document
to archive a discussion on my homepage.

There are countless binary formats and their number grows every day.
Even if it were at some point possible to come up with a binary XML
encoding that is satisfactory for many users and use cases, it would not
be considered for many of the applications that use binary formats
today, even though it would be very valuable for some users. It would
also not help with "legacy" formats: you would need specialized tools to
convert these formats into XML, which likely means that you need to
depend on third-party software or run into portability problems (your
C++ might not work on a Java PDA).

A bridge between binary formats and XML could also be an upgrade
strategy for a number of users. Much like there already are filters for
some data formats like CSV that read the data and provide a stream of
SAX events or build a DOM in memory, you could use the bridge format to
encode and decode the binary data and process it as if it were XML if
you want, while still maintaining interoperability with parties that
would still like to use other means.

It seems feasible to provide such a bridge, as most binary formats have
a sufficiently simple structure and use a limited set of common data
types, for example 32-bit integers in network byte order. I believe it
would also address a number of use cases. It would not really matter how
many so-called binary XML encodings exist; you would have a single
bridge between them. And you would incorporate vast amounts of binary
data into the XML world at low cost.
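
To illustrate how small such a set of data types can be, here is a
Python sketch of a decoder driven by a list of (name, type) pairs. The
type names, the decode function, and the header layout are invented for
this example and do not describe any existing schema language or file
format.

```python
import struct

# Hypothetical type vocabulary for a bridge schema: type name -> struct code.
TYPES = {
    "uint8": ">B",
    "uint16be": ">H",
    "uint32be": ">I",   # 32-bit integer in network byte order
    "uint16le": "<H",
    "uint32le": "<I",
}


def decode(schema, data):
    """Decode consecutive fields from data according to (name, type) pairs."""
    offset = 0
    result = {}
    for name, type_name in schema:
        layout = struct.Struct(TYPES[type_name])
        (result[name],) = layout.unpack_from(data, offset)
        offset += layout.size
    return result


# Invented header layout, purely for illustration.
HEADER = [("width", "uint32be"), ("height", "uint32be"), ("depth", "uint8")]
header = decode(HEADER, bytes.fromhex("0000000d0000000a08"))
```

The decoder itself knows nothing about any particular format; everything
format-specific lives in the schema list, which is the part a bridge
language would have to standardize.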

A good general purpose bridge would also allow encoding arbitrary
infosets into a binary format. Maybe the community agrees on a single
bridge for this purpose, maybe they use four due to the diversity of use
cases. With a bridge it does not really matter: you would have only two
processors, one for XML and one for the bridge, rather than, say, four:
XML, W3C Binary XML, and two common proprietary formats that evolved
because W3C Binary XML did not cover their needs. You would also avoid
endless discussions on what W3C Binary XML should do and what not.

There are of course downsides here. A bridge processor would probably
not be as lightweight as a W3C Binary XML processor, designing the
format might turn out to be difficult, companies might be opposed to
this approach as it could make proprietary data formats more open than
desired, and certain properties of binary formats like checksums,
compression, and encryption would be difficult to express through
markup.

On the other hand, it could allow more lightweight data formats, as they
can be customized, and a receiving agent might not need to care at all
about implementing a bridge processor but could rather focus on
processing the binary data, while the bridge only ensures
interoperability and simple access to the data. You could in fact
generate parsers from the bridge document, which should help
portability, as you could easily generate a JavaScript processor for
use from SVG images and a C++ processor for your data center backend.
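
A minimal sketch of such parser generation, here emitting Python source
rather than JavaScript or C++: the field list stands in for a bridge
document, and generate_parser, parse_header, and the two fields are
hypothetical names chosen for the example. All fields are assumed to be
big-endian and packed without padding.

```python
def generate_parser(func_name, fields):
    """Return Python source for a parser of the given (name, code) fields.

    The codes are struct format characters; big-endian is assumed.
    """
    fmt = ">" + "".join(code for _, code in fields)
    names = tuple(name for name, _ in fields)
    return (
        "import struct\n"
        "\n"
        f"def {func_name}(data):\n"
        f"    values = struct.unpack_from({fmt!r}, data)\n"
        f"    return dict(zip({names!r}, values))\n"
    )


# Generate the source once, then load it as if it were a shipped module.
source = generate_parser("parse_header", [("width", "I"), ("flags", "H")])
namespace = {}
exec(source, namespace)
parse_header = namespace["parse_header"]
```

The generated source has no dependency on the generator, so a receiving
agent could ship only the emitted parser and never see the bridge
document at all.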

Well, as I've said, I am sure that there is prior art in this
direction, and some discussions on this matter suggest that efforts
might already be heading towards something more like a bridge rather
than just a, say, binary encoding of the PSVI. As I think incorporating
existing and future binary content into the XML world is a strong use
case for anything that involves "binary" and "XML", I think that this
slightly different perspective on your charter should be explicitly
mentioned.

I thought about posting this to public-xml-binary@w3.org, but the list
is not mentioned on the public group web site and there appears to be no
public information on the scope of the list, and I would rather avoid
the complaints if this is just for announcements or some such. It would
be good if you linked the list from the group's page and said something
about the scope of the list on that page and the archive web site. Note
that I copied www-archive@w3.org, which is a publicly archived mailing
list.

regards.

Received on Thursday, 29 April 2004 19:13:21 UTC