RE: "Request for response to original XML Core WG comments" from Cokus, Michael S. on 2009-09-24 (public-xml-core-wg@w3.org from September 2009)

From: Cokus, Michael S. <msc@mitre.org>
Date: Thu, 24 Sep 2009 16:32:39 -0400
To: Paul Pierce <prp@teleport.com>, EXI Comments <public-exi-comments@w3.org>
CC: "public-xml-core-wg@w3.org" <public-xml-core-wg@w3.org>
Message-ID: <D93B26E07F2DD147A3E17BA1602444B807CDEC000A@IMCMBX2.MITRE.ORG>
Hello Paul,

Back in March, I promised an update on the EXI WG actions in response
to comments from XML Core.  These are discussed below.  The original
comments from XML Core are quoted, followed by the updated information.
I apologize for not following up on this sooner.


> 0) The Core XML WG remains concerned about the whole concept of
> EXI as an alternative representation of XML infosets, but does
> not have consensus about whether it is a Good Thing, a Bad Thing,
> or a Neutral Thing. Further comment on this fundamental point may
> be forthcoming later.

The EXI group's initial response indicated that EXI is intended to be used as an
"opt-in" technology.  We would also like to note that we believe EXI is
necessary: without a common standard, a number of diverging approaches would
create serious interoperability issues.


> 1) We find the draft somewhat hard to follow; in particular,
> the unusual and non-standard grammar notation is not easy to
> grasp at a glance; the explanation of compression should be
> postponed to after the grammars section; the explanation of
> event codes is very hard to follow.

In our initial response, we reported that work was underway to address this
comment.  We have since revised the specification accordingly.  The compression
section has been moved as recommended, uses of the term "event code" are now
linked to its definition, and the definition itself has been expanded and clarified.

In addition, we would like to clarify that the grammar notation in the EXI
specification is based on common conventions used to describe the grammars of
Java, C#, and JavaScript.  The primary difference is that we have added
annotations giving the EXI event code associated with each production in the
grammar.  We trust that keeping this difference in mind will make the grammar
notation easier to interpret, since the rest of the conventions are commonly
used in practice.
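To illustrate what the event code annotations cost in an encoded stream, here is a minimal Python sketch.  It assumes the rule in the EXI specification that each part of an event code is written as an n-bit unsigned integer, where n = ceil(log2 m) and m is the number of distinct values that part can take in its context (zero bits when m = 1).  The helper names and the example grammar are hypothetical.

```python
from typing import Dict, List, Tuple

EventCode = Tuple[int, ...]


def part_bits(m: int) -> int:
    """Bits for one event-code part with m distinct values: ceil(log2 m)."""
    if m < 1:
        raise ValueError("an event-code part needs at least one value")
    return (m - 1).bit_length()  # equals ceil(log2 m), and 0 when m == 1


def event_code_costs(codes: List[EventCode]) -> Dict[EventCode, int]:
    """Map each event code of one (hypothetical) grammar to its size in bits."""
    costs = {}
    for code in codes:
        total = 0
        for i in range(len(code)):
            prefix = code[:i]
            # Distinct values of part i among the codes sharing this prefix.
            siblings = {c[i] for c in codes if len(c) > i and c[:i] == prefix}
            total += part_bits(len(siblings))
        costs[code] = total
    return costs


# A grammar whose productions carry the event codes 0, 1, 2.0 and 2.1.
print(event_code_costs([(0,), (1,), (2, 0), (2, 1)]))
# {(0,): 2, (1,): 2, (2, 0): 3, (2, 1): 3}
```

In this example the single-part codes 0 and 1 cost two bits each, while the two-part codes 2.0 and 2.1 cost three.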


> 2) We believe it is essential to provide (as called out in
> an editorial note) a better magic number for EXI.  The current
> magic number is only 2 bits long, and serves to discriminate
> between EXI and XML, but not between EXI and other formats.
> This should be fixed by using a 3-4 byte magic number.

In the initial EXI working group response, we reported that this topic was
already under discussion.  Since then, a magic number has been added to the
EXI format.  The change is noted in
http://www.w3.org/TR/2008/WD-exi-20080919/#changes2 .
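As a rough illustration of how a reader might use the magic number, here is a minimal Python sketch.  It assumes the header layout of the EXI 1.0 format as we understand it: an optional four-byte EXI Cookie "$EXI", followed by the two Distinguishing Bits 1 and 0 at the top of the next byte.  The function name is hypothetical, and the check is a heuristic, not a full header parser.

```python
EXI_COOKIE = b"$EXI"  # bytes 0x24 0x45 0x58 0x49


def looks_like_exi(data: bytes) -> bool:
    """Heuristically decide whether a byte stream starts with an EXI header."""
    if data.startswith(EXI_COOKIE):
        data = data[len(EXI_COOKIE):]  # the cookie is optional; skip it if present
    if not data:
        return False
    # The two most significant bits of the first header byte must be 1, 0.
    return (data[0] & 0b1100_0000) == 0b1000_0000


print(looks_like_exi(b"$EXI\x80"))               # True
print(looks_like_exi(b"<?xml version='1.0'?>"))  # False
```

With the cookie present, a reader can tell EXI apart from other formats, not only from XML; without it, only the two distinguishing bits are available, and any stream whose first byte falls in 0x80-0xBF passes this check.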


> 3) We believe that an XML document containing xsi:type
> attributes should be treated as a schema-informed document
> rather than a schemaless document.  This allows processes
> that create a single XML document to decorate it with
> xsi:type attributes and then get good compression from
> an EXI encoder following in the pipeline.

In our initial response, we suggested a temporary workaround to address this.
Since then, a more elegant means of achieving the effect described by XML Core
has been devised, and we have revised the specification accordingly.  An empty
schemaID indicates that the EXI encoding is schema-informed but uses no
user-defined types (i.e., it uses only the built-in XSD types, which may be
referenced using xsi:type).
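A minimal Python sketch of how a decoder might interpret this option.  The function name and return strings are hypothetical; only the empty-schemaID case is taken from the text above, and treating an absent schemaID as "no statement about the schema" is our assumption.

```python
from typing import Optional


def schema_mode(schema_id: Optional[str]) -> str:
    """Interpret a schemaID header option (hypothetical helper)."""
    if schema_id is None:
        # Assumption: no schemaID means the stream makes no statement about
        # the schema, so the decoder must learn it some other way.
        return "unspecified"
    if schema_id == "":
        # Empty schemaID: schema-informed, but only the built-in XSD types
        # are available, e.g. as targets of xsi:type="xsd:int".
        return "built-in types only"
    return "schema-informed, schema " + repr(schema_id)


print(schema_mode(""))  # built-in types only
```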


> 4) Reversing the digits when representing decimal fractions
> (and fractions of seconds in the date-time datatypes) is
> very unnatural. We think it is better to use a (total digits,
> scale factor) pair. Thus instead of representing 12.345 as
> (12,543) it would be (12345,3). This is one byte longer,
> but much easier to decode properly.

In our initial response, we stated that our intention was to employ simpler
techniques when performance did not differ significantly.  In this case,
however, the difference in compaction is quite significant.  In addition, we
believe that writing the digits from right to left is no more complex than
writing them from left to right.  We have therefore decided to keep this
approach (reversed digits), because it provides better efficiency without
increasing complexity.
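The reversed-digit form is straightforward to implement.  Here is a minimal Python sketch, assuming a decimal is carried as a sign, the integral part and the reversed fractional digits, matching the (12, 543) example in the quoted comment; the function names are hypothetical, and the Unsigned Integer octet coding of each part is omitted.  Reversing turns leading zeros of the fraction (the "05" in 0.05) into trailing zeros ("50"), so they survive when the digits are stored as an integer.

```python
from typing import Tuple


def encode_decimal(lexical: str) -> Tuple[bool, int, int]:
    """Split a decimal string into (negative, integral, reversed_fraction)."""
    negative = lexical.startswith("-")
    integral, _, fraction = lexical.lstrip("+-").partition(".")
    # Read the fraction digits right to left: "345" -> 543, "05" -> 50.
    reversed_fraction = int(fraction[::-1]) if fraction else 0
    return negative, int(integral or "0"), reversed_fraction


def decode_decimal(negative: bool, integral: int, reversed_fraction: int) -> str:
    """Rebuild the decimal string by reversing the fraction digits again."""
    fraction = str(reversed_fraction)[::-1]  # 543 -> "345", 50 -> "05"
    return ("-" if negative else "") + str(integral) + "." + fraction


print(encode_decimal("12.345"))                 # (False, 12, 543)
print(decode_decimal(*encode_decimal("0.05")))  # 0.05
```

Trailing zeros of the fraction become leading zeros after reversal and are dropped, so "1.50" decodes as "1.5", which is the same value.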


> 5) IEEE float representation is better on all counts than
> the EXI-specific representation.  It's true that some hardware
> can't process it directly, but *no* hardware can process
> the current EXI representation.

The issue of floating-point encoding has generated much good discussion, and
we appreciate all the comments we have received on this important issue.  The
EXI group has run several tests and found that, for the majority of EXI use
cases, EXI float has significant advantages over IEEE float.  We plan to share
some of the test results publicly shortly.
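For readers comparing the two formats, here is a minimal Python sketch of the decimal-based representation.  It assumes, as in the EXI 1.0 format as we understand it, that a float is carried as a mantissa and a base-10 exponent, each encoded as an EXI Integer.  The function name is hypothetical; range limits and the special values INF and NaN are not handled.

```python
from decimal import Decimal
from fractions import Fraction
from typing import Tuple


def to_exi_float(lexical: str) -> Tuple[int, int]:
    """Return (mantissa, exponent) with value == mantissa * 10**exponent."""
    d = Decimal(lexical)
    if not d.is_finite():
        raise ValueError("INF and NaN are not handled in this sketch")
    sign, digits, exponent = d.as_tuple()
    mantissa = int("".join(str(x) for x in digits))
    return (-mantissa if sign else mantissa), exponent


print(to_exi_float("1.1"))      # (11, -1)
print(to_exi_float("-12.5E3"))  # (-125, 2)
# A binary64 double can only approximate 1.1; the decimal pair is exact.
print(Fraction(1.1) == Fraction(11, 10))  # False
```

Because mantissa and exponent are decimal integers, a lexical value such as 1.1 round-trips exactly, which an IEEE binary64 value cannot do.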


> 6) The current date-time representation expresses a date as
> ((years-2000), (month*31+day), hour*1440+minute*60+seconds,
> reversed fractional second).  However, logically years and
> months can be reduced to months, and days can be reduced to
> seconds, since leap seconds are ignored. We therefore propose
> the following triple: ((year-2000)*12+month,
> day*86400+hour*1440+minute*60+seconds scaled, scale factor).  If
> fraction scaling is rejected, this would become ((year-2000)*12+month,
> day*86400+hour*1440+minute*60+seconds, reversed fractional second).

In our initial response, we explained that the EXI date-time encoding was
modeled after the various XML Schema date- and time-related simple types.
Our analysis shows that the two representations are comparable in size.
When representing the full dateTime type, the two approaches differ by 4 bits,
with the current EXI format being smaller in the majority of cases.  We have
therefore decided to retain the original date-time representation for EXI,
because it is "closer" to XML Schema and has comparable (or slightly better) size.
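One rough way to reproduce the size comparison is sketched below in Python.  It is not the working group's actual analysis.  It assumes the field layout of the published EXI 1.0 dateTime encoding (Year as an Integer offset from 2000, MonthDay = Month * 32 + Day in 9 bits, Time = ((Hour * 64) + Minute) * 64 + Second in 17 bits), bit-packed alignment, no fractional seconds or time zone, and, as our own assumption, a fixed-width field for the proposal's combined day-and-seconds value.  All function names are hypothetical.

```python
def uint_octets(n: int) -> int:
    """Octets in an EXI Unsigned Integer: 7 value bits per octet."""
    count = 1
    while n >= 0x80:
        n >>= 7
        count += 1
    return count


def exi_datetime_bits(year: int) -> int:
    """Bits for an EXI dateTime without fraction or time zone (years >= 2000)."""
    if year < 2000:
        raise ValueError("sketch only handles years from 2000 on")
    year_bits = 1 + 8 * uint_octets(year - 2000)  # sign bit + Unsigned Integer
    month_day_bits = 9                            # Month * 32 + Day
    time_bits = 17                                # ((Hour*64) + Minute)*64 + Second
    return year_bits + month_day_bits + time_bits


def proposed_datetime_bits(year: int, month: int) -> int:
    """Bits for the XML Core proposal, assuming a fixed-width seconds field."""
    if year < 2000:
        raise ValueError("sketch only handles years from 2000 on")
    months = (year - 2000) * 12 + month
    month_bits = 1 + 8 * uint_octets(months)
    # day*86400 + second of day, at most 31*86400 + 86399: 22 bits.
    seconds_bits = (32 * 86400 - 1).bit_length()
    return month_bits + seconds_bits


for year, month in [(2009, 9), (2015, 6)]:
    print(year, month, exi_datetime_bits(year), proposed_datetime_bits(year, month))
```

Under these assumptions the fixed fields differ by 26 - 22 = 4 bits in the proposal's favor, but its combined month count needs a second octet from August 2010 onward, while the EXI year offset stays within one octet until 2128.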


> 7) We believe that the current representation of strings has no
> material advantage over UTF-8, since although it uses at most 3 bytes
> per character, 4-byte UTF characters are very rare except in documents
> written in obsolete scripts.

In our initial response, we noted that a number of languages in common use are
represented in UTF-8 using 4 bytes per character.  We therefore concluded that
the EXI design (which uses at most 3 bytes per character) would result in
significant savings in size.  To our knowledge, there were no further
questions or responses concerning this comment.
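To make the per-character comparison concrete, here is a minimal Python sketch.  It assumes that EXI writes each character as its code point encoded as an Unsigned Integer with 7 value bits per octet, so even the largest code point, U+10FFFF, fits in three octets.  The helper names are hypothetical, and the string length prefix and string-table hits are ignored.

```python
def exi_char_octets(code_point: int) -> int:
    """Octets for one character written as an EXI Unsigned Integer."""
    count = 1
    while code_point >= 0x80:  # each octet carries 7 bits of the value
        code_point >>= 7
        count += 1
    return count


def char_sizes(text: str):
    """Return (EXI octets, UTF-8 octets) for the characters of text."""
    exi = sum(exi_char_octets(ord(ch)) for ch in text)
    utf8 = len(text.encode("utf-8"))
    return exi, utf8


for sample in ["A", "\u00e9", "\u0915", "\u4e2d", "\U00010348"]:
    print(f"U+{ord(sample):04X}", char_sizes(sample))
```

Under this model, characters from U+0800 to U+3FFF (for example Devanagari) take two octets instead of three, and supplementary characters take three instead of four; CJK ideographs in the Basic Multilingual Plane take three octets either way.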


> 8) We are strongly concerned about the concept of pluggable
> codecs as a barrier to interoperability, and believe that the
> draft should contain a strong health warning about the use of
> these: they should be used only in cases where there is explicit
> agreement between the communicating parties, and never for
> documents intended for consumption by a general audience.

We agree, and said as much in our initial response.  A note has been placed
in section 7.4, "Datatype Representation Map", to address this:

http://www.w3.org/TR/2008/WD-exi-20080919/#datatypeRepresentationMap


Thanks for your interest and comments.  We hope this response adequately
explains the actions the working group has taken to address the comments
from XML Core.  Please let us know if you have additional questions.


Mike Cokus (for the EXI Working Group)
The MITRE Corporation
757-896-8553; 757-826-8316 (fax)
903 Enterprise Parkway
Hampton, VA 23666

>-----Original Message-----
>From: public-exi-comments-request@w3.org [mailto:public-exi-comments-request@w3.org] On Behalf Of Cokus, Michael S.
>Sent: Sunday, March 15, 2009 5:16 PM
>To: Paul Pierce; EXI Comments
>Subject: RE: "Request for response to original XML Core WG comments"
>
>Hello Paul,
>
>Thanks much for your comments!
>
>The EXI Working Group responded publicly to the XML Core comments in January of last year [1].
>But as you noted, the group has indeed taken further action since then to address their comments.
>We are working on an update to our original response to clarify the group's actions/resolutions
>(including any changes made to the EXI specification) to address the comments from XML Core.  We
>expect to post the update within the next couple of weeks.
>
>Thanks again,
>
>Mike Cokus (for the EXI Working Group)
> 
>[1] http://lists.w3.org/Archives/Public/public-exi/2008Jan/0003.html
>
>>-----Original Message-----
>>From: public-exi-comments-request@w3.org [mailto:public-exi-comments-request@w3.org] On Behalf Of Paul Pierce
>>Sent: Friday, February 27, 2009 3:56 PM
>>To: EXI Comments
>>Subject: "Request for response to original XML Core WG comments"
>>
>>These original comments on the first draft were acknowledged but never
>>publicly responded to as far as I can tell, so I'm incorporating them
>>here in the comments list by reference:
>>
>>XML Core WG review of Efficient XML Interchange (EXI) Format 1.0, draft
>>of 2007-07-16
>>http://lists.w3.org/Archives/Public/public-exi/2007Oct/0005.html
>>
>>I know the EXI WG discussed these long ago and incorporated a few into
>>the spec, but these comments are very important and must have a
>>substantial public response.
>>
>>In my opinion comments 2, 4, 5, 7, 8 must be more carefully considered
>>for inclusion; 2 and 8 as mandatory rather than options.
>>
>>Paul Pierce
Received on Thursday, 24 September 2009 20:33:18 UTC