Re: XML Core WG review of Efficient XML Interchange (EXI) Format 1.0, draft of 2007-07-16 from Greg White on 2007-11-06 (public-xml-core-wg@w3.org from November 2007)

From: Greg White <gwhite@stanford.edu>
Date: Tue, 6 Nov 2007 12:25:26 -0500
To: John Cowan <cowan@ccil.org>
Cc: public-exi@w3.org, public-xml-core-wg@w3.org
Message-Id: <D0A3FFF7-8165-460D-B9F8-1AF1C9DCAC4A@stanford.edu>
Dear XML Core colleagues,

Firstly, thank you very much for reviewing the Efficient XML
Interchange format specification. To update you on the status of our
processing of your remarks: we shall be discussing your comments at
our f2f meeting during the W3C Technical Plenary week (your comments
are scheduled for this Thursday, 15.30-17.00). A number of us have
been discussing your comments among ourselves, and the consensus so
far seems to be to agree with you on a number of your technical
points. Following our f2f meeting, please expect a considered reply,
including how we intend to proceed with our specification document.
Please also note that your remarks will be used to edit the other
imminent publications of the working group, namely a Primer on the
specification itself, Best Practices for users, and a document we are
calling "Impacts", which attempts to critically weigh the impact of
EXI on users of XML and on the interoperability of the Web generally.

Cheers
Greg White, for the EXI Working Group.

On Oct 25, 2007, at 5:29 PM, John Cowan wrote:

>
> This is the XML Core WG's review of EXI WD1 (2007-07-16).  Items are
> mostly in the order they appear in the draft, and do not appear in
> priority order.
>
> 0) The Core XML WG remains concerned about the whole concept of EXI as
> an alternative representation of XML infosets, but does not have
> consensus about whether it is a Good Thing, a Bad Thing, or a Neutral
> Thing.  Further comment on this fundamental point may be forthcoming
> later.
>
> 1) We find the draft somewhat hard to follow.  In particular, the
> unusual and non-standard grammar notation is not easy to grasp at a
> glance; the explanation of compression should be postponed until
> after the grammars section; and the explanation of event codes is
> very hard to follow.
>
> 2) We believe it is essential to provide (as called out in an
> editorial note) a better magic number for EXI.  The current magic
> number is only 2 bits long, and serves to discriminate between EXI and
> XML, but not between EXI and other formats.  This should be fixed by
> using a 3-4 byte magic number.
>
> 3) We believe that an XML document containing xsi:type attributes
> should be treated as a schema-informed document rather than a
> schemaless document.  This allows processes that create a single XML
> document to decorate it with xsi:type attributes and then get good
> compression from an EXI encoder following in the pipeline.
>
> 4) Reversing the digits when representing decimal fractions (and
> fractions of seconds in the date-time datatypes) is very unnatural.
> We think it is better to use a (total digits, scale factor) pair.
> Thus instead of representing 12.345 as (12,543) it would be (12345,3).
> This is one byte longer, but much easier to decode properly.
>
> 5) IEEE float representation is better on all counts than the
> EXI-specific representation.  It's true that some hardware can't
> process it directly, but *no* hardware can process the current EXI
> representation.
>
> 6) The current date-time representation expresses a date as
> ((year-2000), (month*31+day), hour*3600+minute*60+seconds, reversed
> fractional second).  However, logically years and months can be
> reduced to months, and days can be reduced to seconds, since leap
> seconds are ignored.  We therefore propose the following triple:
> ((year-2000)*12+month, day*86400+hour*3600+minute*60+seconds scaled,
> scale factor).  If fraction scaling is rejected, this would become
> ((year-2000)*12+month, day*86400+hour*3600+minute*60+seconds,
> reversed fractional second).
>
> 7) We believe that the current representation of strings has no
> material advantage over UTF-8: although it uses at most 3 bytes per
> character, 4-byte UTF-8 characters are very rare except in documents
> written in obsolete scripts.
>
> 8) We are strongly concerned about the concept of pluggable codecs as
> a barrier to interoperability, and believe that the draft should
> contain a strong health warning about the use of these: they should be
> used only in cases where there is explicit agreement between the
> communicating parties, and never for documents intended for
> consumption by a general audience.
>
> -- 
> Híggledy-pìggledy / XML programmers            John Cowan
> Try to escape those / I-eighteen-N woes;        http://www.ccil.org/~cowan
> Incontrovertibly / What we need more of is      cowan@ccil.org
> Unicode weenies and / François Yergeaus.
>
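Item 2 above argues that a 2-bit prefix can separate EXI from XML but not from other binary formats. The following is a minimal Python sketch of that difference. It makes two hypothetical assumptions, neither taken from the draft: the 2-bit prefix is binary `10`, and `b"$EXI"` serves as a 4-byte signature. The standard PNG file signature stands in for an unrelated format. The function names are illustrative.

```python
# Hypothetical values for illustration only; not taken from the EXI draft.
ASSUMED_DISTINGUISHING_BITS = 0b10   # assumed 2-bit EXI prefix
HYPOTHETICAL_MAGIC = b"$EXI"         # illustrative 4-byte signature

# The standard 8-byte PNG file signature, as an example of another binary format.
PNG_SIGNATURE = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])


def sniff_two_bits(data: bytes) -> bool:
    """True if the top two bits of the first byte equal the assumed prefix."""
    return len(data) >= 1 and (data[0] >> 6) == ASSUMED_DISTINGUISHING_BITS


def sniff_magic(data: bytes) -> bool:
    """True if the stream starts with the full hypothetical 4-byte signature."""
    return data[:4] == HYPOTHETICAL_MAGIC


if __name__ == "__main__":
    xml = b"<?xml version='1.0'?><doc/>"
    # '<' is 0x3C, whose top two bits are 00, so both tests reject XML.
    print(sniff_two_bits(xml), sniff_magic(xml))                      # False False
    # PNG starts with 0x89 (top bits 10), so the 2-bit test misfires.
    print(sniff_two_bits(PNG_SIGNATURE), sniff_magic(PNG_SIGNATURE))  # True False
```

For uniformly random leading bytes, a 2-bit test matches about 1 time in 4. A 4-byte signature matches about 1 time in 2^32.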
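Item 4 above contrasts the draft's reversed-fraction pair with an (all digits, scale) pair. The Python sketch below shows both round trips for 12.345. It assumes non-negative values written in plain decimal notation, with no sign and no exponent. The function names are hypothetical. Reversing the fractional digits keeps leading zeros such as the "0" in ".05", which would otherwise vanish when the digits are read as an integer. The scaled form avoids the reversal by recording how many digits follow the point.

```python
from decimal import Decimal


def encode_reversed(value: str) -> tuple[int, int]:
    """Draft-style pair: integral part and *reversed* fractional digits."""
    integral, _, fraction = value.partition(".")
    # Reversal turns leading fractional zeros ("05") into trailing ones ("50").
    return int(integral), int(fraction[::-1] or "0")


def decode_reversed(integral: int, reversed_fraction: int) -> Decimal:
    if reversed_fraction == 0:
        return Decimal(integral)
    return Decimal(f"{integral}.{str(reversed_fraction)[::-1]}")


def encode_scaled(value: str) -> tuple[int, int]:
    """Proposed pair: all digits as one integer, plus digits after the point."""
    _sign, digits, exponent = Decimal(value).as_tuple()
    return int("".join(map(str, digits))), -exponent


def decode_scaled(unscaled: int, scale: int) -> Decimal:
    # Move the decimal point `scale` places to the left.
    return Decimal(unscaled).scaleb(-scale)


if __name__ == "__main__":
    print(encode_reversed("12.345"), encode_scaled("12.345"))  # (12, 543) (12345, 3)
```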
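The following is a minimal Python sketch of the triple proposed in item 6, packed and unpacked. It rests on these assumptions:

- Each hour contributes 3600 seconds, so the second component counts seconds, consistent with `day*86400`.
- Month and day are 1-based.
- Seconds arrive as a decimal string such as "30.25".
- Time zones are ignored.

The function names are hypothetical. The second component holds the whole seconds-within-month quantity in units of 10^-scale seconds.

```python
from decimal import Decimal


def pack_datetime(year, month, day, hour, minute, seconds: str):
    """Return ((year-2000)*12+month, scaled seconds-in-month, scale)."""
    sec = Decimal(seconds)
    _, _, exponent = sec.as_tuple()
    scale = max(0, -exponent)               # digits after the decimal point
    months = (year - 2000) * 12 + month
    whole = day * 86400 + hour * 3600 + minute * 60
    return months, whole * 10**scale + int(sec.scaleb(scale)), scale


def unpack_datetime(months, scaled, scale):
    m = months - 1                          # month is 1-based
    year, month = 2000 + m // 12, m % 12 + 1
    whole, frac = divmod(scaled, 10**scale)
    day, rest = divmod(whole, 86400)
    hour, rest = divmod(rest, 3600)
    minute, sec = divmod(rest, 60)
    seconds = Decimal(sec) + Decimal(frac).scaleb(-scale)
    return year, month, day, hour, minute, seconds


if __name__ == "__main__":
    print(pack_datetime(2007, 10, 25, 17, 29, "30.25"))  # (94, 222297025, 2)
```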
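Item 7 rests on UTF-8's code-point ranges: only characters at U+10000 and above, outside the Basic Multilingual Plane, need four bytes. The following Python sketch computes the UTF-8 length of each code point and the share of 4-byte characters in a string. The function names are illustrative. The sketch does not model the draft's own string representation.

```python
def utf8_length(code_point: int) -> int:
    """Number of bytes UTF-8 uses for one Unicode scalar value."""
    if code_point < 0x80:
        return 1          # ASCII
    if code_point < 0x800:
        return 2          # e.g. Latin-1 Supplement, Greek, Cyrillic, Hebrew, Arabic
    if code_point < 0x10000:
        return 3          # rest of the Basic Multilingual Plane, e.g. CJK
    return 4              # supplementary planes, e.g. Gothic U+10330


def share_needing_four_bytes(text: str) -> float:
    """Fraction of characters in `text` that UTF-8 encodes in 4 bytes."""
    if not text:
        return 0.0
    return sum(utf8_length(ord(ch)) == 4 for ch in text) / len(text)


if __name__ == "__main__":
    sample = "A\u00e9\u4e2d\U00010330"   # 'A', 'é', '中', Gothic letter AHSA
    for ch in sample:
        # Cross-check against Python's own UTF-8 encoder.
        assert utf8_length(ord(ch)) == len(ch.encode("utf-8"))
        print(f"U+{ord(ch):04X}: {utf8_length(ord(ch))} byte(s)")
    print(share_needing_four_bytes(sample))  # 0.25
```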
Received on Tuesday, 6 November 2007 17:29:57 UTC