RE: DRAFT XML Core WG review of Efficient XML Interchange (EXI) Format 1.0

Interesting and most likely worthwhile comments.

But what about the bigger picture?  I assume EXI is
a way to serialize an XML infoset that differs from
the XML 1.x syntax--is this correct?  Does EXI describe
the same set of infosets as XML 1.x?

Is EXI something the XML Core WG should support, allow,
or argue against?  


> -----Original Message-----
> From: 
> [] On Behalf Of John Cowan
> Sent: Monday, 2007 August 20 12:39
> To:
> Subject: DRAFT XML Core WG review of Efficient XML 
> Interchange (EXI) Format 1.0
> [I have written this draft in the first person plural, but its current
> state reflects John Cowan's views only.]
> This is the XML Core WG's review of EXI WD1 (2007-07-16).  Items are
> mostly in the order they appear in the draft, and do not appear in
> priority order.
> 1) We find the draft somewhat hard to follow; in particular, the unusual
> and non-standard grammar notation is not easy to grasp at a glance;
> the explanation of compression should be postponed to after the grammars
> section; the explanation of event codes is very hard to follow.
> 2) We believe it is essential to provide (as called out in an editorial
> note) a better magic number for EXI.  The current magic number is only
> 2 bits long, and serves to discriminate between EXI and XML, but not
> between EXI and other formats.  This should be fixed by using a 3-4 byte
> magic number.
> 3) We believe that an XML document containing xsi:type attributes
> should be treated as a schema-informed document rather than a schemaless
> document.  This allows processes that create a single XML document to
> decorate it with xsi:type attributes and then get good results from an
> EXI encoder following in the pipeline.
> 4) Reversing the digits when representing decimal fractions (and
> fractions of seconds in the date-time datatypes) is very unnatural.
> We think it is better to use a (total digits, scale factor) pair.
> Thus instead of representing 12.345 as (12,543) it would be (12345,3).
> This is one byte longer, but much easier to decode properly.
> 5) IEEE float representation is better on all counts than the specialized
> representation.  It's true that some hardware can't process it directly,
> but *no* hardware can process the current EXI representation.
> 6) The current date-time representation expresses a date as ((year-2000),
> (month*31+day), hour*3600+minute*60+seconds, reversed fractional
> second).  However, logically years and months can be reduced to months,
> and days can be reduced to seconds, since leap seconds are ignored.
> We therefore propose the following triple: ((year-2000)*12+month,
> day*86400+hour*3600+minute*60+seconds scaled, scale factor).  If
> fraction scaling is rejected, this would become ((year-2000)*12+month,
> day*86400+hour*3600+minute*60+seconds, reversed fractional second).
> 7) We believe that the current representation of strings has no
> material advantage over UTF-8, since although it uses at most 3 bytes
> per character, 4-byte UTF-8 characters are very rare except in documents
> written in obsolete scripts.
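
To make item 4 concrete, here is a minimal Python sketch of the two decimal
representations being compared.  The function names are hypothetical, it
handles only non-negative plain decimals such as "12.345", and it models the
pairs as Python integers rather than the actual EXI bit layout, which is not
reproduced here.

```python
from decimal import Decimal


def encode_reversed(value: str) -> tuple[int, int]:
    """Draft-style pair: (integral part, fraction digits reversed)."""
    integral, _, fraction = value.partition(".")
    # Reversing keeps leading zeros of the fraction: "005" becomes 500.
    return int(integral), int(fraction[::-1] or "0")


def decode_reversed(integral: int, reversed_fraction: int) -> Decimal:
    """Undo encode_reversed by reversing the fraction digits again."""
    return Decimal(f"{integral}.{str(reversed_fraction)[::-1]}")


def encode_scaled(value: str) -> tuple[int, int]:
    """Proposed pair: (all digits as one integer, scale factor)."""
    sign, digits, exponent = Decimal(value).as_tuple()
    unscaled = int("".join(map(str, digits)))
    return (-unscaled if sign else unscaled), -exponent


def decode_scaled(unscaled: int, scale: int) -> Decimal:
    """Undo encode_scaled: shift the decimal point left by `scale` digits."""
    return Decimal(unscaled).scaleb(-scale)
```

With the scaled pair, decoding is a single shift of the decimal point, while
the reversed form needs a digit-string reversal on both ends.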
> [This discharges my action.]
> -- 
> Híggledy-pìggledy / XML programmers            John Cowan
> Try to escape those / I-eighteen-N woes;        
> Incontrovertibly / What we need more of is
> Unicode weenies and / François Yergeaus.
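
For item 6, a rough Python sketch of the proposed (months, seconds) pair may
help in judging it.  This is only an illustration under stated assumptions:
month runs 1..12 and day 1..31 as in the quoted formula, an hour counts as
3600 seconds and a day as 86400, only whole seconds are handled (the scale
factor or reversed fraction is left out), and the function names are made up
for this example.

```python
SECONDS_PER_MINUTE = 60
SECONDS_PER_HOUR = 3600    # assumption: hours are weighted in seconds
SECONDS_PER_DAY = 86400


def encode_datetime(year, month, day, hour, minute, second):
    """Return (months counted from 2000, seconds within the month)."""
    months = (year - 2000) * 12 + month
    seconds = (day * SECONDS_PER_DAY + hour * SECONDS_PER_HOUR
               + minute * SECONDS_PER_MINUTE + second)
    return months, seconds


def decode_datetime(months, seconds):
    """Invert encode_datetime with plain integer division."""
    # month is 1..12 in the formula, so shift to 0..11 before dividing
    year_offset, month0 = divmod(months - 1, 12)
    day, rest = divmod(seconds, SECONDS_PER_DAY)
    hour, rest = divmod(rest, SECONDS_PER_HOUR)
    minute, second = divmod(rest, SECONDS_PER_MINUTE)
    return 2000 + year_offset, month0 + 1, day, hour, minute, second


# The send time of the quoted message, 2007-08-20 12:39:00
print(encode_datetime(2007, 8, 20, 12, 39, 0))
```

Both fields decode with nothing more than divmod, and dates before 2000 still
round-trip because Python's divmod floors toward negative infinity.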

Received on Monday, 20 August 2007 19:11:14 UTC