- From: Bjoern Hoehrmann <derhoermi@gmx.net>
- Date: Fri, 30 Apr 2004 01:13:01 +0200
- To: member-xml-binary@w3.org
- Cc: www-archive@w3.org
Dear XML Binary Characterization Working Group,

According to the minutes of the Binary Interchange Workshop, a number of people think that we should rather have one alternate encoding for XML than many. Even though that makes sense at first sight, I beg to differ. I think that all binary formats are alternate encodings of XML, or that XML is an alternate encoding for all binary formats. Or rather any format, for that matter. What is lacking is a bridge. I believe this is nothing really new, but I would like the XBC WG to explicitly consider whether building a good enough bridge might actually better address the valid and invalid use cases the group will find and explore.

A bridge would probably be a new schema/transformation language that can be used to describe a mapping between XML markup and binary data. I already missed such a format back in the mid-1990s, when I reverse engineered and implemented a number of binary formats. Among these was the .tie file format, the mission description files for LucasArts' famous Star Wars: Tie Fighter. I tried to document what I found out in a more readable fashion than Turbo Pascal source code and, looking at some prior art for other formats, I came up with something like

+-| Flightgroup |--------------------------------------------------+
| This contains the full description of a single FlightGroup.      |
+------------------------------------------------------------------+

Offset Length Type   Description
     0     11 (char) Name of the Flightgroup
   +11      1 (byte) Unknown (Separator)
   +12     11 (char) Name of Pilot
   +23      1 (byte) Unknown (Separator)
   +24     11 (char) Crafts Cargo 1
   +35      1 (byte) Unknown (Separator)
   +36     11 (char) Crafts Cargo 2
   +47      1 (byte) Unknown (Separator)
   +48      1 (byte) Special Craft
   +49      1 (byte) Special Craft Random byte
   +50      1 (byte) Type Identifier
   +51      1 (byte) Number of crafts in FG
   +52      1 (byte) FG start status
   +53      1 (byte) type of FG's missiles
   +54      1 (byte) type of FG's beam weapon
   +55      1 (byte) IFF Identifier
   +56      1 (byte) AI level
   ...

The most important byte here is at offset 50, as that byte defines the type of my craft. If you have ever watched Star Wars, you would probably like to avoid flying a simple Tie Fighter or Tie Interceptor, as they do not have shield technology... I would have preferred a more machine-readable format. A bridge would allow me to map this structure into something like

<flightGroup>
  <name />
  <pilot />
  <cargo />
  <specialCargo />
  <specialCraft />
  <randomSpecial />
  <type />
  <craftCount />
  <status />
  <missiles />
  <beam />
  <iff />
  <ai />
  ...

This would already have been extremely helpful when reverse engineering the format, as it would probably have allowed me to get a better idea of what I already know about the format and what not... What I apparently did was generate (or write?) files like

...
024 +
025 ¦
026 ¦
027 ¦
028 ¦
029 CARGO 2
02A ¦
02B ¦
02C ¦
02D ¦
02E +
02F
030 SPECIAL CRAFT
031 -->SET TO 01 IF SPC = RANDOM
032 TYP
033 STÄRKE
034 SITUATION
035 RAKETENTYP
036 BEAM WEAPON
037 BESITZER
...

Based on what I knew about the format, I wrote a tool to extract mission goals from the mission files. While you can access the primary and secondary goals during normal combat, the bonus goals are hidden, which hinders serving the Empire well by heading for a perfect score. Now I've got a problem.
The executable binary of that program is probably on a computer that is currently inaccessible to me. I have the source code, but in order to compile it I would probably need to install Turbo Pascal, which will probably turn out to be difficult on my current operating system, and it is also likely that the installation files reside on the same computer as the binary I am looking for. If I had a machine-readable schema and a bin2xml tool I would not have this problem: I could write an XSLT that processes the resulting XML to list the bonus goals in an XHTML document, or one that generates an SVG animation like the one you would see in the mission briefing. I could also write a Perl one-liner that just changes my craft type and use the xml2bin tool to write the data back to a .tie file.

There is <http://www.wotsit.org/>, an archive of format descriptions, with hundreds of binary bitmap and vector graphic formats. Some of them are in a format like the one illustrated above, some are in plain English, some use source code, etc. Imagine what it would look like if there were a machine-readable mapping to markup for each format: it would be trivial to map between these formats using bin2xml, XSLT, and xml2bin. At first sight it would sound like a bad idea to encode bitmap graphics in XML, but it would allow using XML tools on them to change some colors, add a border, rotate or mirror the image, add text from bitmap fonts by merging bitmap glyph XML documents with the rest of the image for internationalization, and so on.

Another example of what I would like to do is to XQuery my mail/usenet reader's binary message database to gather statistics of my mail traffic and convert these statistics into an SVG diagram; or to convert the binary message database of my instant messenger into an XHTML document to archive a discussion on my homepage.

There are countless binary formats and their number grows every day.
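
To make the bin2xml/xml2bin idea concrete, here is a minimal sketch in Python; it is not an existing tool. It assumes the Flightgroup layout listed earlier (offsets 0 to 56 only), that the (char) fields are NUL-padded Latin-1 strings, and that every field maps to one child element. The names FLIGHTGROUP, bin2xml, xml2bin and the sep1..sep4 elements for the unknown separator bytes are made up for illustration.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical "bridge" description: (element name, struct format) in file order.
# It mirrors the Flightgroup table up to offset 56; anything beyond is unknown.
FLIGHTGROUP = [
    ("name", "11s"), ("sep1", "B"),
    ("pilot", "11s"), ("sep2", "B"),
    ("cargo", "11s"), ("sep3", "B"),
    ("specialCargo", "11s"), ("sep4", "B"),
    ("specialCraft", "B"), ("randomSpecial", "B"),
    ("type", "B"), ("craftCount", "B"), ("status", "B"),
    ("missiles", "B"), ("beam", "B"), ("iff", "B"), ("ai", "B"),
]

# "<" selects standard sizes without alignment padding, so offsets match the table.
RECORD = struct.Struct("<" + "".join(fmt for _, fmt in FLIGHTGROUP))


def bin2xml(data: bytes) -> ET.Element:
    """Decode one record into a <flightGroup> element, one child per field."""
    root = ET.Element("flightGroup")
    for (name, _), value in zip(FLIGHTGROUP, RECORD.unpack_from(data)):
        if isinstance(value, bytes):
            # Assumption: text fields are NUL-padded Latin-1.
            value = value.rstrip(b"\0").decode("latin-1")
        ET.SubElement(root, name).text = str(value)
    return root


def xml2bin(root: ET.Element) -> bytes:
    """Encode a <flightGroup> element back into the fixed-size binary record."""
    values = []
    for name, fmt in FLIGHTGROUP:
        text = root.findtext(name) or ""
        values.append(text.encode("latin-1") if fmt.endswith("s") else int(text))
    return RECORD.pack(*values)  # "11s" pads short strings with NUL bytes


# Usage: decode a made-up record, change the craft type, and write it back.
sample = RECORD.pack(b"Alpha", 0, b"Pilot", 0, b"Cargo", 0, b"", 0,
                     0, 0, 1, 3, 0, 0, 0, 1, 2)
doc = bin2xml(sample)
doc.find("type").text = "5"  # arbitrary value, not a known craft code
patched = xml2bin(doc)
print(ET.tostring(doc, encoding="unicode"))
```

The field list is the only format-specific part; the two functions are generic, which is the point of a bridge. A real bridge language would also need repetition, conditionals, and computed fields such as checksums, none of which this sketch attempts.
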
Even if it would at some point be possible to come up with a binary XML encoding that is satisfactory for many users and use cases, it will not be considered for many of the applications that use binary formats today, even though it would be very valuable for some users. It would also not help with "legacy" formats: you would need specialized tools to convert these formats into XML, which likely means that you need to depend on third-party software or run into portability problems (your C++ might not work on a Java PDA).

A bridge between binary formats and XML could also be an upgrade strategy for a number of users. Much like there already are filters for some data formats like CVS that read the data and provide a stream of SAX events or build a DOM in memory, you could use the bridge format to encode and decode the binary data and process it as if it were XML if you want, while still maintaining interoperability with parties that would still like to use other means.

It seems feasible to provide such a bridge, as most binary formats have a sufficiently simple structure and use a limited set of common data types, for example 32-bit integers in network byte order.

I believe it would also address a number of use cases. It would not really matter how many so-called binary XML encodings exist; you would have a single bridge between them. And you would incorporate infinite amounts of binary data into the XML world at low cost. A good general-purpose bridge would also allow encoding arbitrary infosets into a binary format; maybe the community agrees on a single bridge for this purpose, maybe they use four due to the diversity of use cases. With a bridge it does not really matter: you would have only two processors, one for XML and one for the bridge, rather than, say, four: XML, W3C Binary XML, and two common proprietary formats that evolved because W3C Binary XML did not cover their needs.
You would also avoid endless discussions on what W3C Binary XML should do and what not.

There are of course downsides here. A bridge processor would probably not be as lightweight as a W3C Binary XML processor; designing the format might turn out to be difficult; companies might be opposed to this approach, as it could make proprietary data formats more open than desired; and certain properties of binary formats like checksums, compression, and encryption would be difficult to express through markup. On the other hand, it could allow more lightweight data formats, as they can be customized, and a receiving agent might not need to care at all about implementing a bridge processor but could rather focus on processing the binary data, with the bridge only ensuring interoperability and simple access to the data. You could in fact generate parsers from the bridge document, which supposedly helps portability, as you could easily generate a JavaScript processor for use from SVG images and a C++ processor for your data center backend.

Well, as I've said, I am sure that there is prior art in this direction, and some discussions on this matter suggest that efforts might already be heading towards something more like a bridge rather than just a, say, binary encoding of the PSVI. As I think incorporating existing and future binary content into the XML world is a strong use case for anything that involves "binary" and "XML", I think that this slightly different perspective on your charter should be explicitly mentioned.

I thought about posting this to public-xml-binary@w3.org, but the list is not mentioned on the public group web site and there appears to be no public information on the scope of the list, and I would rather avoid the complaints if this is just for announcements or somesuch. It would be good if you linked the list from the group's page and said something about the scope of the list on that page and the archive web site.
Note that I copied www-archive@w3.org, which is a publicly archived mailing list.

regards.
Received on Thursday, 29 April 2004 19:13:21 UTC