RE: DRAFT XML Core WG review of Efficient XML Interchange (EXI) Format 1.0 from Grosso, Paul on 2007-08-20 (public-xml-core-wg@w3.org from August 2007)

From: Grosso, Paul <pgrosso@ptc.com>
Date: Mon, 20 Aug 2007 15:10:56 -0400
To: <public-xml-core-wg@w3.org>
Message-ID: <CF83BAA719FD2C439D25CBB1C9D1D30208734AED@HQ-MAIL4.ptcnet.ptc.com>
Interesting and most likely worthwhile comments.

But what about the bigger picture?  I assume EXI is
a (different from XML 1.x) way to serialize an XML
infoset--is this correct?  Does EXI describe the same
set of infosets as XML 1.x?

Is EXI something the XML Core WG should support, allow,
or argue against?  

paul


> -----Original Message-----
> From: public-xml-core-wg-request@w3.org 
> [mailto:public-xml-core-wg-request@w3.org] On Behalf Of John Cowan
> Sent: Monday, 2007 August 20 12:39
> To: public-xml-core-wg@w3.org
> Subject: DRAFT XML Core WG review of Efficient XML 
> Interchange (EXI) Format 1.0
> 
> 
> [I have written this draft in the first person plural, but its current
> state reflects John Cowan's views only.]
> 
> This is the XML Core WG's review of EXI WD1 (2007-07-16).  Items are
> mostly in the order they appear in the draft, and do not appear in
> priority order.
> 
> 1) We find the draft somewhat hard to follow; in particular, the unusual
> and non-standard grammar notation is not easy to grasp at a glance;
> the explanation of compression should be postponed until after the
> grammars section; the explanation of event codes is very hard to follow.
> 
> 2) We believe it is essential to provide (as called out in an editorial
> note) a better magic number for EXI.  The current magic number is only
> 2 bits long, and serves to discriminate between EXI and XML, but not
> between EXI and other formats.  This should be fixed by using a 3-4 byte
> magic number.
> 
> 3) We believe that an XML document containing xsi:type attributes
> should be treated as a schema-informed document rather than a schemaless
> document.  This allows processes that create a single XML document to
> decorate it with xsi:type attributes and then get good results from an
> EXI encoder following in the pipeline.
> 
> 4) Reversing the digits when representing decimal fractions (and
> fractions of seconds in the date-time datatypes) is very unnatural.
> We think it is better to use a (total digits, scale factor) pair.
> Thus instead of representing 12.345 as (12,543) it would be (12345,3).
> This is one byte longer, but much easier to decode properly.
> 
> 5) IEEE float representation is better on all counts than the specialized
> representation.  It's true that some hardware can't process it directly,
> but *no* hardware can process the current EXI representation.
> 
> 6) The current date-time representation expresses a date as
> ((year-2000), (month*31+day), hour*3600+minute*60+seconds, reversed
> fractional second).  However, logically years and months can be reduced
> to months, and days can be reduced to seconds, since leap seconds are
> ignored.  We therefore propose the following triple:
> ((year-2000)*12+month, day*86400+hour*3600+minute*60+seconds scaled,
> scale factor).  If fraction scaling is rejected, this would become
> ((year-2000)*12+month, day*86400+hour*3600+minute*60+seconds, reversed
> fractional second).
> 
> 7) We believe that the current representation of strings has no
> material advantage over UTF-8, since although it uses at most 3 bytes
> per character, 4-byte UTF-8 characters are very rare except in documents
> written in obsolete scripts.
> 
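To make the comparison in item 4 concrete, here is a minimal sketch of the
two decimal representations discussed there. The function names are
hypothetical, the input is assumed to be a non-negative decimal string, and
nothing here models the actual EXI bit-level encoding; it only shows how the
two pairs are derived and decoded.

```python
from decimal import Decimal


def encode_reversed(text: str) -> tuple[int, int]:
    """Draft-style pair: (integral part, fractional digits reversed)."""
    integral, _, fraction = text.partition(".")
    # Reversing keeps leading zeros of the fraction ("045" -> 540).
    return int(integral), int(fraction[::-1] or "0")


def decode_reversed(integral: int, reversed_fraction: int) -> Decimal:
    """Undo encode_reversed by reversing the digit string again."""
    fraction = str(reversed_fraction)[::-1]
    return Decimal(f"{integral}.{fraction}")


def encode_scaled(text: str) -> tuple[int, int]:
    """Proposed pair: (all digits as one integer, count of fractional digits)."""
    integral, _, fraction = text.partition(".")
    return int(integral + fraction), len(fraction)


def decode_scaled(digits: int, scale: int) -> Decimal:
    """Undo encode_scaled: shift the decimal point left by `scale` places."""
    return Decimal(digits).scaleb(-scale)


print(encode_reversed("12.345"))  # (12, 543)
print(encode_scaled("12.345"))    # (12345, 3)
```

Decoding the scaled pair is a single shift of the decimal point, whereas
the reversed form needs a digit-string reversal first, which is the
"easier to decode properly" point made in item 4.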
> [This discharges my action.]
> 
> -- 
> Híggledy-pìggledy / XML programmers            John Cowan
> Try to escape those / I-eighteen-N woes;        http://www.ccil.org/~cowan
> Incontrovertibly / What we need more of is      cowan@ccil.org
> Unicode weenies and / François Yergeaus.
>
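The date-time proposal in item 6 can be sketched the same way. This is an
illustration only: the function names are hypothetical, it handles whole
seconds only (no scale factor or fractional part), it assumes months and
days are numbered from 1, and it assumes an hour contributes 3600 seconds
to the second component.

```python
def encode_datetime(year: int, month: int, day: int,
                    hour: int, minute: int, second: int) -> tuple[int, int]:
    """Collapse a date-time into (months since 2000, seconds within month)."""
    months = (year - 2000) * 12 + month
    seconds = day * 86400 + hour * 3600 + minute * 60 + second
    return months, seconds


def decode_datetime(months: int, seconds: int) -> tuple[int, int, int, int, int, int]:
    """Invert encode_datetime using floor division and remainders."""
    # month runs 1..12, so shift by one before dividing.
    year_offset, month_index = divmod(months - 1, 12)
    day, rest = divmod(seconds, 86400)
    hour, rest = divmod(rest, 3600)
    minute, second = divmod(rest, 60)
    return 2000 + year_offset, month_index + 1, day, hour, minute, second


print(encode_datetime(2007, 8, 20, 15, 10, 56))  # (92, 1782656)
```

Because Python's divmod floors toward negative infinity, the same decoding
also works for dates before 2000, where the month count is zero or negative.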
Received on Monday, 20 August 2007 19:11:14 UTC