- From: John Cowan <cowan@ccil.org>
- Date: Thu, 25 Oct 2007 17:29:20 -0400
- To: public-exi@w3.org
- Cc: public-xml-core-wg@w3.org
This is the XML Core WG's review of EXI WD1 (2007-07-16). Items are mostly in the order they appear in the draft, and do not appear in priority order.

0) The XML Core WG remains concerned about the whole concept of EXI as an alternative representation of XML infosets, but does not have consensus about whether it is a Good Thing, a Bad Thing, or a Neutral Thing. Further comment on this fundamental point may be forthcoming later.

1) We find the draft somewhat hard to follow. In particular, the unusual and non-standard grammar notation is not easy to grasp at a glance; the explanation of compression should be postponed until after the grammars section; and the explanation of event codes is very hard to follow.

2) We believe it is essential to provide (as called out in an editorial note) a better magic number for EXI. The current magic number is only 2 bits long, and serves to discriminate between EXI and XML, but not between EXI and other formats. This should be fixed by using a 3- or 4-byte magic number.

3) We believe that an XML document containing xsi:type attributes should be treated as a schema-informed document rather than a schemaless document. This allows processes that create a single XML document to decorate it with xsi:type attributes and then get good compression from an EXI encoder following in the pipeline.

4) Reversing the digits when representing decimal fractions (and fractions of seconds in the date-time datatypes) is very unnatural. We think it is better to use a (total digits, scale factor) pair. Thus, instead of representing 12.345 as (12,543), it would be (12345,3). This is one byte longer, but much easier to decode properly.

5) IEEE float representation is better on all counts than the EXI-specific representation. It's true that some hardware can't process it directly, but *no* hardware can process the current EXI representation.
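To make the comparison in item 4 concrete, here is a minimal sketch of the two pair representations as the example there describes them. The function names are hypothetical and the string-based parsing is an assumption for illustration only; neither is taken from the EXI draft, and the reversed form follows only our reading of the (12,543) example.

```python
from decimal import Decimal


def encode_reversed(text):
    """Pair as in the (12,543) example: integer part, fraction digits reversed."""
    negative = text.startswith("-")
    whole, _, frac = text.lstrip("+-").partition(".")
    integral = int(whole or "0")
    reversed_fraction = int(frac[::-1] or "0")
    return (-integral if negative else integral), reversed_fraction


def encode_scaled(text):
    """Proposed pair: all digits as one integer, plus the count of fraction digits."""
    negative = text.startswith("-")
    whole, _, frac = text.lstrip("+-").partition(".")
    digits = int((whole + frac) or "0")
    return (-digits if negative else digits), len(frac)


def decode_scaled(digits, scale):
    """Recover the value: shift the decimal point left by `scale` places."""
    return Decimal(digits).scaleb(-scale)


print(encode_reversed("12.345"))  # (12, 543)
print(encode_scaled("12.345"))    # (12345, 3)
print(decode_scaled(12345, 3))    # 12.345
```

Decoding the scaled pair is a single shift of the decimal point, whereas the reversed form requires un-reversing a digit string before the two halves can be recombined.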
6) The current date-time representation expresses a date as ((year-2000), (month*31+day), hour*3600+minute*60+seconds, reversed fractional second). However, logically years and months can be reduced to months, and days can be reduced to seconds, since leap seconds are ignored. We therefore propose the following triple: ((year-2000)*12+month, day*86400+hour*3600+minute*60+seconds scaled, scale factor). If fraction scaling is rejected, this would become ((year-2000)*12+month, day*86400+hour*3600+minute*60+seconds, reversed fractional second).

7) We believe that the current representation of strings has no material advantage over UTF-8: although it uses at most 3 bytes per character, 4-byte UTF-8 characters are very rare except in documents written in obsolete scripts.

8) We are strongly concerned about the concept of pluggable codecs as a barrier to interoperability, and believe that the draft should contain a strong health warning about their use: they should be used only in cases where there is explicit agreement between the communicating parties, and never for documents intended for consumption by a general audience.

-- 
Híggledy-pìggledy / XML programmers             John Cowan
Try to escape those / I-eighteen-N woes;        http://www.ccil.org/~cowan
Incontrovertibly / What we need more of is      cowan@ccil.org
Unicode weenies and / François Yergeaus.
Received on Thursday, 25 October 2007 21:29:32 UTC