XML Core WG review of Efficient XML Interchange (EXI) Format 1.0, draft of 2007-07-16

This is the XML Core WG's review of EXI WD1 (2007-07-16).  Items are
mostly in the order they appear in the draft, and do not appear in
priority order.

0) The XML Core WG remains concerned about the whole concept of EXI as an
alternative representation of XML infosets, but does not have consensus
about whether it is a Good Thing, a Bad Thing, or a Neutral Thing.
Further comment on this fundamental point may be forthcoming later.

1) We find the draft somewhat hard to follow.  In particular, the
non-standard grammar notation is not easy to grasp at a glance, the
explanation of compression should be postponed until after the grammars
section, and the explanation of event codes is very hard to follow.

2) We believe it is essential to provide (as called out in an editorial
note) a better magic number for EXI.  The current magic number is only
2 bits long, and serves to discriminate between EXI and XML, but not
between EXI and other formats.  This should be fixed by using a 3- or
4-byte magic number.
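
To illustrate the concern in item 2, here is a minimal Python sketch.
Both the 2-bit pattern (binary 10) and the 4-byte value are assumptions
made for illustration only; neither is taken from the draft.

```python
# Hypothetical values, for illustration only.
TWO_BIT_PATTERN = 0b10      # assumed 2-bit distinguishing pattern
LONG_MAGIC = b"\x89EXI"     # hypothetical 4-byte magic number


def passes_two_bit_check(data: bytes) -> bool:
    """True if the top two bits of the first byte match the pattern."""
    return len(data) > 0 and (data[0] >> 6) == TWO_BIT_PATTERN


def passes_long_magic_check(data: bytes) -> bool:
    """True if the stream starts with the full multi-byte magic number."""
    return data[:len(LONG_MAGIC)] == LONG_MAGIC


xml_doc = b"<?xml version='1.0'?><a/>"
png_file = b"\x89PNG\r\n\x1a\n"   # the standard PNG signature

# '<' is 0x3C (top bits 00), so XML is rejected by the 2-bit check,
# but PNG's first byte 0x89 (top bits 10) is accepted by it.
print(passes_two_bit_check(xml_doc), passes_two_bit_check(png_file))
print(passes_long_magic_check(png_file))
```

Under these assumptions the 2-bit check separates EXI from XML text but
accepts an unrelated binary format, while the longer check does not.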

3) We believe that an XML document containing xsi:type attributes
should be treated as a schema-informed document rather than a schemaless
document.  This allows processes that create a single XML document to
decorate it with xsi:type attributes and then get good compression from
an EXI encoder following in the pipeline.

4) Reversing the digits when representing decimal fractions (and
fractions of seconds in the date-time datatypes) is very unnatural.
We think it is better to use an (all digits as one integer, scale factor)
pair.  Thus instead of representing 12.345 as (12,543) it would be
(12345,3).  This is one byte longer, but much easier to decode properly.
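
The two pairings in item 4 can be sketched in Python as follows.  The
function names are hypothetical, only non-negative values are handled,
and the byte-level encoding of each integer is left out.

```python
from decimal import Decimal


def encode_reversed(text: str) -> tuple[int, int]:
    """Draft-style pair: (integral part, fractional digits reversed)."""
    integral, _, frac = text.partition(".")
    return int(integral), int(frac[::-1] or "0")


def encode_scaled(text: str) -> tuple[int, int]:
    """Proposed pair: (all digits as one integer, count of fraction digits)."""
    integral, _, frac = text.partition(".")
    return int(integral + frac), len(frac)


def decode_scaled(digits: int, scale: int) -> Decimal:
    """Decoding the proposed pair is a single shift of the decimal point."""
    return Decimal(digits).scaleb(-scale)


print(encode_reversed("12.345"))   # (12, 543)
print(encode_scaled("12.345"))     # (12345, 3)
print(decode_scaled(12345, 3))     # 12.345
```

Decoding the reversed form requires turning the second integer back
into a digit string and reversing it, whereas the scaled form needs
only the one shift shown in decode_scaled.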

5) IEEE float representation is better on all counts than the EXI-specific
representation.  It's true that some hardware can't process it directly,
but *no* hardware can process the current EXI representation.
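
As a small illustration of item 5, an IEEE 754 double can be written
and read back with Python's standard struct module.  The choice of
big-endian byte order here is an assumption, not something the draft
or this review specifies.

```python
import struct


def encode_double(value: float) -> bytes:
    """Pack a float as an 8-byte big-endian IEEE 754 binary64 value."""
    return struct.pack(">d", value)


def decode_double(data: bytes) -> float:
    """Unpack an 8-byte big-endian IEEE 754 binary64 value."""
    return struct.unpack(">d", data)[0]


encoded = encode_double(0.1)
print(len(encoded), decode_double(encoded) == 0.1)
```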

6) The current date-time representation expresses a date as ((year-2000),
(month*31+day), hour*3600+minute*60+seconds, reversed fractional
second).  However, logically years and months can be reduced to months,
and days can be reduced to seconds, since leap seconds are ignored.
We therefore propose the following triple: ((year-2000)*12+month,
(day*86400+hour*3600+minute*60+seconds) scaled, scale factor).  If
fraction scaling is rejected, this would become ((year-2000)*12+month,
day*86400+hour*3600+minute*60+seconds, reversed fractional second).
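
A minimal Python sketch of the proposed triple in item 6 follows.  The
function names are hypothetical.  The sketch weights hours by 3600 so
that the second field is a true count of seconds, and it takes the
scale factor to be the number of fractional-second digits; both are
assumptions about the intent of the proposal.

```python
from decimal import Decimal


def encode_datetime(year, month, day, hour, minute, second):
    """Return (months since 2000, scaled seconds within month, scale)."""
    second = Decimal(str(second))
    scale = max(0, -second.as_tuple().exponent)   # fraction digits
    months = (year - 2000) * 12 + month
    seconds = day * 86400 + hour * 3600 + minute * 60 + second
    return months, int(seconds.scaleb(scale)), scale


def decode_datetime(months, scaled, scale):
    """Invert encode_datetime; seconds come back as a Decimal."""
    # month runs 1..12, so shift by one before splitting off the year
    year_offset, month_index = divmod(months - 1, 12)
    seconds = Decimal(scaled).scaleb(-scale)
    day, rest = divmod(seconds, 86400)
    hour, rest = divmod(rest, 3600)
    minute, second = divmod(rest, 60)
    return (2000 + year_offset, month_index + 1,
            int(day), int(hour), int(minute), second)


triple = encode_datetime(2007, 10, 25, 21, 29, "32.5")
print(triple)                      # (94, 22373725, 1)
print(decode_datetime(*triple))
```

Note the shift by one when decoding the month field: because months
are counted from 1, a plain division by 12 would misplace December.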

7) We believe that the current representation of strings has no
material advantage over UTF-8, since although it uses at most 3 bytes
per character, 4-byte UTF-8 characters are very rare except in documents
written in obsolete scripts.

8) We are strongly concerned about the concept of pluggable codecs as a
barrier to interoperability, and believe that the draft should contain a
strong health warning about the use of these: they should be used only in
cases where there is explicit agreement between the communicating parties,
and never for documents intended for consumption by a general audience.

-- 
Híggledy-pìggledy / XML programmers            John Cowan
Try to escape those / I-eighteen-N woes;        http://www.ccil.org/~cowan
Incontrovertibly / What we need more of is      cowan@ccil.org
Unicode weenies and / François Yergeaus.

Received on Thursday, 25 October 2007 21:29:32 UTC