- From: Vogelheim, Daniel <daniel.vogelheim@siemens.com>
- Date: Tue, 29 Jan 2008 21:49:13 +0100
- To: "John Cowan" <cowan@ccil.org>, <public-xml-core-wg@w3.org>
- Cc: <public-exi@w3.org>
Hello,

> This is the XML Core WG's review of EXI WD1 (2007-07-16). [...]

Many thanks for your review! We greatly appreciate it. It will take us a while to fully take in your feedback and reflect it in our documents. We'd like to offer right here some brief comments that reflect our current thinking on the issues you raised, without meaning to preempt the eventual resolutions by the WG.

In terms of time frame, we just published a 2nd draft of our format specification [1]. This draft mainly finishes existing content and only partially addresses the comments and issues you raised, or other valuable feedback from the recent TPAC meeting. The subsequent draft should be more accommodating. Additionally, we have published initial drafts of two supplementary documents, the EXI Primer [2] and the EXI Best Practices [3]. We hope these will help to better explain our format and its use to the wider XML community.

So here are our comments:

> 0) The Core XML WG remains concerned about the whole concept of
> EXI as an alternative representation of XML infosets, but does
> not have consensus about whether it is a Good Thing, a Bad Thing,
> or a Neutral Thing. Further comment on this fundamental point may
> be forthcoming later.

Thank you for your considerations. Judging from feedback we have received, particularly at the recent TPAC, we do indeed believe that this is a central point of any discussion of EXI (and related technologies).

In this mail, we'd like to call particular attention to only the following aspect: At its core, concerns about the concept of EXI often seem centered around the perceived benefit/cost ratio of an alternative Infoset encoding. The benefits are covered elsewhere; for the cost side, please observe that EXI is expressly intended to be used as an "opt-in" technology through content negotiation or similar techniques.
For the popular use-case of XML over HTTP transmission using built-in, standard HTTP content negotiation, this would allow seamless deployment of EXI among EXI-capable participants or EXI-capable proxy servers, without any change requirements for those participants that do not wish or need to implement EXI. Such an approach should guarantee zero cost to the audience that sees no benefit in EXI for their own purposes, and should dramatically change the benefit/cost ratio in EXI's favor.

We would be very interested in discussing this subject further with XML Core, the TAG, and the general public. We are looking forward to your further comments.

(The EXI WG is well aware that some proponents of similar technologies have styled their respective offerings as XML replacements, which is not and has not been the standpoint of the EXI WG. We hope that the W3C WG's work will be judged on its own words and merits.)

> 1) We find the draft somewhat hard to follow; in particular,
> the unusual and non-standard grammar notation is not easy to
> grasp at a glance; the explanation of compression should be
> postponed to after the grammars section; the explanation of
> event codes is very hard to follow.

In the most recent draft, we have begun to rework some potentially confusing parts, e.g. by improving the grammar notation. The WG will look at other opportunities to improve readability of the specifications.

Additionally, we'd like to draw your attention to the EXI Primer. This is a supplementary document, which provides a more gentle introduction to the EXI format. Presumably, most interested parties will look at the Primer first. Armed with this knowledge, the EXI specification should become a lot more amenable.

> 2) We believe it is essential to provide (as called out in
> an editorial note) a better magic number for EXI. The current
> magic number is only 2 bits long, and serves to discriminate
> between EXI and XML, but not between EXI and other formats.
> This should be fixed by using a 3-4 byte magic number.

This is in principle agreed within the WG, but the mechanism(s) are still under discussion. There are presently several proposals under consideration that would rectify this. The proposals mainly differ in the allowed 'magic' identifier(s), and in whether and which of these identifiers would be mandatory or optional.

> 3) We believe that an XML document containing xsi:type
> attributes should be treated as a schema-informed document
> rather than a schemaless document. This allows processes
> that create a single XML document to decorate it with
> xsi:type attributes and then get good compression from
> an EXI encoder following in the pipeline.

There is one easy work-around with the present specification: If the encoder is informed by an empty XML Schema, it will know about all built-in XML Schema types and will behave as you suggest.

The WG has not found such documents to be common, and thus we are unsure of whether such a feature will find widespread use. The WG intends to discuss this proposal in the more general context of allowing typed encoding for schema-less documents. The evaluation will presumably depend on the expected uptake of such a feature in the community vs. the complexity it will add to the specification.

> 8) We are strongly concerned about the concept of pluggable
> codecs as a barrier to interoperability, and believe that the
> draft should contain a strong health warning about the use of
> these: they should be used only in cases where there is explicit
> agreement between the communicating parties, and never for
> documents intended for consumption by a general audience.

The EXI WG agrees on this and has added a clarification to the latest draft.

The comments 4)-7) all concern the representation of simple type content.
These particular items generally allow evaluation by comparing performance of a sample implementation over our test suite, which shall be the main criterion for selecting among alternatives. We have scheduled all of the following for discussion, with 5) and 7) already being under discussion within the WG.

Again, without preempting any such future evaluation or discussion, here is a list of comments on and/or reasons for the current representations:

> 4) Reversing the digits when representing decimal fractions
> (and fractions of seconds in the date-time datatypes) is
> very unnatural. We think it is better to use a (total digits,
> scale factor) pair. Thus instead of representing 12.345 as
> (12,543) it would be (12345,3). This is one byte longer,
> but much easier to decode properly.

The WG finds it hard to quantify "very unnatural" and "much easier". Our intention is to compare performance of either method, and presumably select the simpler one when there is little difference.

> 5) IEEE float representation is better on all counts than
> the EXI-specific representation. It's true that some hardwares
> can't process it directly, but *no* hardware can process
> the current EXI representation.

The IEEE float representation tends to be larger than the current variable length representation. xsd:float is often used to represent non-scientific data (e.g. a person's age), where this matters significantly. So at least in that aspect an IEEE 754 representation will be at a significant disadvantage.

On the plus side, several WG members have significant interest and experience in using EXI for scientific data transmission and intend to look very closely at direct IEEE 754 encoding and to evaluate the corresponding issues.

> 6) The current date-time representation expresses a date as
> ((years-2000), (month*31+day), hour*1440+minute*60+seconds,
> reversed fractional second).
> However, logically years and
> months can be reduced to months, and days can be reduced to
> seconds, since leap seconds are ignored. We therefore propose
> the following triple: ((year-2000)*12+month,
> day*86400+hour*1440+minute*60+seconds scaled, scale factor). If
> fraction scaling is rejected, this would become ((year-2000)*12+month,
> day*86400+hour*1440+minute*60+seconds, reversed fractional second).

The current representation was modeled after the various XML Schema date or time related simple types, and tries to encompass all of them. Merging several fields usually works well for a type that includes all of the merged fields, but less well for a type that includes only one of them. An example would be xsd:gMonthDay, which fits quite naturally into the EXI representation but not into the proposed one. Other adversely affected types would include xsd:gYear and xsd:gDay. A type for which the proposed scheme may work well would be xsd:duration.

An initial analysis suggests that the differences between the two methods would mostly be pretty small, except maybe in the cases listed above. If time permits, we'll use the data found in the test suite to more accurately assess the two.

> 7) We believe that the current representation of strings has no
> material advantage over UTF-8, since although it uses at most 3 bytes
> per character, 4-byte UTF characters are very rare except in documents
> written in obsolete scripts.

The UTF-8 design incorporates a number of features that are not of much interest in the case of EXI, such as the ability to discern whether any byte marks the beginning of a character. While for the popular ASCII characters the compactness is the same, that is not the case for other character ranges.

Note that the EXI design will always do at least as well as UTF-8. E.g., there is a range of code points where EXI uses 2 bytes, versus 3 for UTF-8. Any content in such scripts would therefore be 50% larger in UTF-8 vs. current EXI.
This would include the Devanagari script (used in several Indic languages, including Hindi), Thai, Hangul Jamo (but not Hangul syllables; Korea), Hiragana and Katakana (but not Kanji/CJK unified; Japan). The EXI WG can't endorse the rarity claim, as these scripts appear to be in daily use by easily over one billion people, with little observable tendency to obsolete any of them.

Again, we'd like to thank you for your thorough review. Due to timing constraints, the recently released draft unfortunately does not yet reflect many of your recommendations; please bear with us. We sincerely hope your attention and criticism will accompany us throughout our way towards a Recommendation.

[1] EXI Format Specification, 2nd PWD:
    http://www.w3.org/TR/2007/WD-exi-20071219/
[2] EXI Primer, 1st PWD:
    http://www.w3.org/TR/2007/WD-exi-primer-20071219/
[3] EXI Best Practices, 1st PWD:
    http://www.w3.org/TR/2007/WD-exi-best-practices-20071219/

Yours Truly,
The EXI WG
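
Illustration for 7): the size comparison between the EXI string representation and UTF-8 can be checked with a small sketch. This is not taken from the specification text. It assumes that each character is written as its Unicode code point in a variable-length unsigned integer carrying 7 payload bits per octet, which is consistent with the "at most 3 bytes per character" figure quoted above. The function names exi_len and utf8_len are made up for this example.

```python
def exi_len(cp: int) -> int:
    """Octets needed for a code point, ASSUMING a variable-length
    unsigned integer with 7 payload bits per octet."""
    n = 1
    while cp >= 0x80:
        cp >>= 7
        n += 1
    return n


def utf8_len(cp: int) -> int:
    """Octets needed for the same code point in UTF-8."""
    return len(chr(cp).encode("utf-8"))


# One sample code point per script mentioned in the mail.
samples = {
    "Latin A": 0x0041,
    "Devanagari KA": 0x0915,
    "Thai KO KAI": 0x0E01,
    "Hangul Jamo KIYEOK": 0x1100,
    "Hiragana A": 0x3042,
    "Katakana A": 0x30A2,
    "CJK unified (one)": 0x4E00,
    "Hangul syllable GA": 0xAC00,
}

for name, cp in samples.items():
    print(f"{name:20} U+{cp:04X}  assumed EXI: {exi_len(cp)}  UTF-8: {utf8_len(cp)}")
```

Under this assumption, code points from U+0800 through U+3FFF take 2 octets instead of UTF-8's 3, which covers Devanagari, Thai, Hangul Jamo, Hiragana and Katakana. CJK unified ideographs and Hangul syllables lie above U+3FFF and take 3 octets in both encodings.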
Received on Tuesday, 29 January 2008 20:55:25 UTC