DRAFT XML Core WG review of Efficient XML Interchange (EXI) Format 1.0

[I have written this draft in the first person plural, but its current
state reflects John Cowan's views only.]

This is the XML Core WG's review of EXI WD1 (2007-07-16).  Items appear
mostly in the order in which they occur in the draft; they are not in
priority order.

1) We find the draft somewhat hard to follow.  In particular, the
non-standard grammar notation is not easy to grasp at a glance, the
explanation of event codes is very hard to follow, and the explanation
of compression should be moved to after the grammars section.

2) We believe it is essential to provide (as called out in an editorial
note) a better magic number for EXI.  The current magic number is only
2 bits long; it distinguishes EXI from XML, but not from other formats.
This should be fixed by using a 3- or 4-byte magic number.
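As a rough illustration of why the length matters, the Python sketch below compares a check on two leading bits with a check on a full 4-byte magic number.  The byte string b"$EXI" and the bit pattern 0b10 are illustrative assumptions, not values taken from the draft.

```python
# Hypothetical 4-byte magic number, chosen only for illustration.
MAGIC = b"$EXI"

def looks_like_exi(data: bytes) -> bool:
    """Return True if data starts with the full 4-byte magic number."""
    return data[:len(MAGIC)] == MAGIC

def two_bit_match_rate(pattern: int = 0b10) -> float:
    """Fraction of all 256 possible first bytes whose top two bits equal pattern."""
    matches = sum(1 for b in range(256) if b >> 6 == pattern)
    return matches / 256

def four_byte_match_rate() -> float:
    """Chance that a uniformly random 4-byte prefix equals MAGIC."""
    return 1 / 256 ** len(MAGIC)

print(two_bit_match_rate())    # 0.25
print(four_byte_match_rate())  # about 2.3e-10
```

A 2-bit test accepts a quarter of all possible first bytes, so many non-EXI binary formats would pass it; a fixed 4-byte prefix matches random data only about once in 2^32 tries.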

3) We believe that an XML document containing xsi:type attributes
should be treated as a schema-informed document rather than a schemaless
document.  This allows processes that create a single XML document to
decorate it with xsi:type attributes and then get good results from an
EXI encoder following in the pipeline.
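A minimal sketch of the decision an encoder later in the pipeline might make, assuming it simply scans the document for xsi:type attributes before choosing a mode.  The function name choose_mode and the two mode labels are hypothetical; the namespace URI is the standard XML Schema instance namespace.

```python
import xml.etree.ElementTree as ET

# ElementTree reports namespaced attributes as "{namespace-uri}local-name".
XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"

def choose_mode(xml_text: str) -> str:
    """Hypothetical policy: any xsi:type attribute makes the document schema-informed."""
    root = ET.fromstring(xml_text)
    if any(XSI_TYPE in el.attrib for el in root.iter()):
        return "schema-informed"
    return "schemaless"

doc = ('<order xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"'
       ' xmlns:xs="http://www.w3.org/2001/XMLSchema">'
       '<total xsi:type="xs:decimal">12.345</total></order>')
print(choose_mode(doc))                                     # schema-informed
print(choose_mode("<order><total>12.345</total></order>"))  # schemaless
```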

4) Reversing the digits of decimal fractions (and of fractional seconds
in the date-time datatypes) is very unnatural.  We think it is better to
use an (unscaled integer, scale factor) pair, where the unscaled integer
holds all the digits and the scale factor is the number of digits after
the decimal point.  Thus 12.345 would be represented as (12345,3) instead
of (12,543).  This may cost one extra byte, for the scale factor, but is
much easier to decode correctly.
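The two representations can be contrasted in a short Python sketch.  It works on decimal strings for simplicity; encode_reversed handles unsigned values only, and the function names are hypothetical.

```python
from decimal import Decimal

def encode_reversed(text: str) -> tuple[int, int]:
    """Current-draft style (unsigned only): integer part, then fraction digits reversed."""
    int_part, _, frac_part = text.partition(".")
    return int(int_part or "0"), int(frac_part[::-1] or "0")

def encode_scaled(text: str) -> tuple[int, int]:
    """Proposed style: all digits as one signed integer, plus the count of fraction digits."""
    sign = "-" if text.startswith("-") else ""
    int_part, _, frac_part = text.lstrip("+-").partition(".")
    return int(sign + (int_part or "0") + frac_part), len(frac_part)

def decode_scaled(unscaled: int, scale: int) -> Decimal:
    """Undo encode_scaled by shifting the decimal point left by `scale` places."""
    return Decimal(unscaled).scaleb(-scale)

print(encode_reversed("12.345"))  # (12, 543)
print(encode_scaled("12.345"))    # (12345, 3)
print(decode_scaled(12345, 3))    # 12.345
```

Decoding the scaled form is a single shift of the decimal point, whereas the reversed form needs its fraction digits re-reversed before the value can be reassembled.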

5) IEEE 754 floating-point representation is better on every count than
the specialized representation.  It is true that some hardware cannot
process IEEE floats directly, but *no* hardware can process the current
EXI representation directly.
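For comparison, IEEE 754 values can be written and read with one standard-library call each.  This Python sketch uses the struct module's big-endian binary64 format; it does not model the draft's specialized representation.

```python
import struct

def encode_double(x: float) -> bytes:
    """IEEE 754 binary64, big-endian: always exactly 8 bytes."""
    return struct.pack(">d", x)

def decode_double(data: bytes) -> float:
    """Inverse of encode_double."""
    return struct.unpack(">d", data)[0]

print(encode_double(1.0).hex())              # 3ff0000000000000
print(decode_double(encode_double(12.345)))  # 12.345
```

The ">f" format gives the 4-byte single-precision form in the same way.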

6) The current date-time representation expresses a date-time as
((year-2000), (month*31+day), (hour*3600+minute*60+seconds), reversed
fractional second).  However, years and months can logically be reduced
to months, and days can be reduced to seconds, since leap seconds are
ignored.  We therefore propose the following triple: ((year-2000)*12+month,
(day*86400+hour*3600+minute*60+seconds)*10^scale, scale factor), where the
scaled value includes the fractional seconds.  If fraction scaling is
rejected, this would become ((year-2000)*12+month,
day*86400+hour*3600+minute*60+seconds, reversed fractional second).
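A minimal Python sketch of the proposed scaled triple and its inverse follows.  It ignores time zones, takes seconds as a string so that no fractional digits are lost, and uses hypothetical function names.

```python
def encode_datetime(year, month, day, hour, minute, seconds: str):
    """Proposed triple: (months since 2000, scaled day-and-time seconds, scale)."""
    months = (year - 2000) * 12 + month
    whole, _, frac = seconds.partition(".")
    total = day * 86400 + hour * 3600 + minute * 60 + int(whole)
    scale = len(frac)
    return months, total * 10 ** scale + int(frac or "0"), scale

def decode_datetime(months, scaled, scale):
    """Inverse of encode_datetime; seconds come back as a string."""
    year, month0 = divmod(months - 1, 12)
    total, frac = divmod(scaled, 10 ** scale)
    day, rem = divmod(total, 86400)
    hour, rem = divmod(rem, 3600)
    minute, sec = divmod(rem, 60)
    seconds = f"{sec}.{frac:0{scale}d}" if scale else str(sec)
    return year + 2000, month0 + 1, day, hour, minute, seconds

print(encode_datetime(2007, 8, 20, 17, 38, "58.5"))  # (92, 17915385, 1)
print(decode_datetime(92, 17915385, 1))              # (2007, 8, 20, 17, 38, '58.5')
```

Every field is recovered with plain integer division, and Python's floor division keeps years before 2000 working as well.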

7) We believe that the current representation of strings has no
material advantage over UTF-8.  Although it uses at most 3 bytes per
character, the only characters that need 4 bytes in UTF-8 are those
outside the Basic Multilingual Plane, and these are very rare except in
documents written in obsolete scripts.
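The byte counts can be checked with a small Python sketch, assuming the current representation writes each character's code point as an unsigned integer in 7-bit groups (consistent with the draft's 3-byte maximum).  Any length prefix is ignored, and the helper names are hypothetical.

```python
def uvarint_len(n: int) -> int:
    """Bytes needed to write n as an unsigned integer in 7-bit groups."""
    length = 1
    while n >= 0x80:
        n >>= 7
        length += 1
    return length

def sizes(text: str) -> tuple[int, int]:
    """(UTF-8 bytes, per-code-point 7-bit-group bytes) for text."""
    utf8 = len(text.encode("utf-8"))
    groups = sum(uvarint_len(ord(ch)) for ch in text)
    return utf8, groups

print(sizes("hello"))       # (5, 5)
print(sizes("中"))          # (3, 3)
print(sizes("\U00010400"))  # (4, 3): a Deseret letter, outside the BMP
```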

[This discharges my action.]

-- 
Híggledy-pìggledy / XML programmers            John Cowan
Try to escape those / I-eighteen-N woes;        http://www.ccil.org/~cowan
Incontrovertibly / What we need more of is      cowan@ccil.org
Unicode weenies and / François Yergeaus.

Received on Monday, 20 August 2007 17:38:58 UTC