Yet another perspective on EXI from Henry S. Thompson on 2007-11-05 (public-xml-core-wg@w3.org from November 2007)

Forwarded message 1

From: Henry S. Thompson <ht@inf.ed.ac.uk>
Date: Fri, 02 Nov 2007 12:17:07 +0000
Subject: Yet another perspective on EXI
To: w3t-arch@w3.org
Message-ID: <f5bk5p03kqk.fsf@hildegard.inf.ed.ac.uk>
Archived-At: <http://www.w3.org/mid/f5bk5p03kqk.fsf@hildegard.inf.ed.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

There are a number of existing classes of 'languages' which might
provide the right way to look at EXI:

 1) Authoring languages, with media types:    XML (and any +xml friends),
                                              HTML, CSS, PostScript, even troff

 2) Application-specific persistence/transfer
                 languages, with media types: PDF, MS Word, Mathematica

 3) General-purpose compression schemes:      gzip, deflate, zip, bz

 4) General-purpose persistence/transfer
          languages, _some_ with media types: ASN.1, JSON, Java Object
                                              Serialization, pickle

 5) Audio/image/video encoding languages,
          with media types:                   MPEG, MP3, OGG, PNG

So what can we eliminate?  (5) is obviously out, but I guess a case
could be made for any of the other four.  Taking the charset route
suggests (1); taking the Content-Encoding route suggests (3).  I
thought seriously about (4) for a while, when I was focussed on the
infoset transfer aspect, but in fact I think, perhaps somewhat
surprisingly, that the right answer is (2).

Whereas the relevant precedent for thinking about (3) is SVGZ, the
relevant precedent for (2) is PostScript and PDF.  Without too much
violence to the facts, we can say that EXI is to XML as PDF is to
PostScript---a distillation of the same object model into an opaque
and more efficient transfer/archival format.

Let's look at the practical consequences of the three plausible
options (leaving out (4)):

                  (1)                          (2)              (3)

Precedent         XML                          PDF              SVGZ

Media Type        application/xml              application/exi  application/xml

charset           x.exi                        N/A              ad lib.

Content-Encoding  N/A                          N/A              x.exi

Magic number      <?xml.*encoding=.x\.exi.*?>  TBD              TBD
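
To make the difference between the three options concrete, here is a
rough sketch of how a receiver might tell them apart from the HTTP
headers alone.  It assumes the labels used in the table above
(application/exi as a media type, x.exi as a charset or
Content-Encoding value), none of which is registered; the function
names are made up for illustration.

```python
def parse_content_type(value):
    """Split a Content-Type value into (media_type, params)."""
    parts = [p.strip() for p in value.split(";")]
    media_type = parts[0].lower()
    params = {}
    for part in parts[1:]:
        if "=" in part:
            key, val = part.split("=", 1)
            params[key.strip().lower()] = val.strip().strip('"').lower()
    return media_type, params


def exi_option(headers):
    """Return 1, 2 or 3 for the option the headers match, else None.

    The header values tested here are the hypothetical ones from the
    table, not registered names.
    """
    h = {k.lower(): v for k, v in headers.items()}
    media_type, params = parse_content_type(h.get("content-type", ""))
    encoding = h.get("content-encoding", "").strip().lower()

    if media_type == "application/exi":
        return 2  # EXI as its own format, like PDF
    if media_type == "application/xml" and encoding == "x.exi":
        return 3  # EXI as a compression layer, like SVGZ
    if media_type == "application/xml" and params.get("charset") == "x.exi":
        return 1  # EXI as just another character encoding of XML
    return None
```

Under (2) the media type alone settles the matter, whereas under (1)
and (3) a receiver has to look at a second field before it knows the
body is not ordinary XML text.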

Does anyone else find this compelling, or even worth considering?  It
has the tremendous advantage that it might just be seen as a win for
all sides: it decouples EXI from XML enough to keep the XML folk
happy without, I hope, decoupling it so much as to lose the people who
want EXI in the first place.  It's an endorsement of the value of the
infoset, without compromising the value of the existing serialisation.

If so, I think maybe a name change really _is_ the right way to go:
Efficient eXchange of Infosets.

ht
- -- 
 Henry S. Thompson, HCRC Language Technology Group, University of Edinburgh
                     Half-time member of W3C Team
    2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440
            Fax: (44) 131 650-4587, e-mail: ht@inf.ed.ac.uk
                   URL: http://www.ltg.ed.ac.uk/~ht/
[mail really from me _always_ has this .sig -- mail without it is forged spam]
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.6 (GNU/Linux)

iD8DBQFHKxVHkjnJixAXWBoRAtXmAJsEE1JW/02DLziu0A+/5w3SCMytYQCdE7yn
Db1UlZIfJm7SStSp+HiqSGY=
=OTYZ
-----END PGP SIGNATURE-----