RE: "RE: "Request for response to original XML Core WG comments"" from Cokus, Michael S. on 2009-11-02 (public-exi-comments@w3.org from November 2009)

From: Cokus, Michael S. <msc@mitre.org>
Date: Mon, 2 Nov 2009 10:02:03 -0500
To: Paul Pierce <prp@teleport.com>, EXI Comments <public-exi-comments@w3.org>
CC: "public-xml-core-wg@w3.org" <public-xml-core-wg@w3.org>
Message-ID: <D93B26E07F2DD147A3E17BA1602444B807D0CA84EB@IMCMBX2.MITRE.ORG>

> Michael,

Hi Paul,

>
> Thank you. I would like further discussion on the following two (in addition
> to the ongoing IEEE floating point discussion, where I look forward to seeing
> the relevant test results.)

Please find our responses below, in line with your comments/questions.

Just as a cross-reference, the floating point test data is discussed in a recent posting to the EXI public comments list: http://lists.w3.org/Archives/Public/public-exi-comments/2009Oct/0000.html

>
>
> > > 7) We believe that the current representation of strings has no
> > > material advantage over UTF-8, since although it uses at most 3 bytes
> > > per character, 4-byte UTF characters are very rare except in documents
> > > written in obsolete scripts.
> >
> > In our initial response we noted that a number of languages in common use are
> > represented in UTF using 4 bytes.  So we concluded that the EXI design (which
> > uses 3 bytes) would result in significant savings in size.  To our knowledge,
> > there were no further questions/responses concerning this comment.
>
> Is it possible that the languages that are inefficiently coded in UTF-8 work
> better in UTF-16? A lot of XML documents are coded in either UTF-8 or UTF-16,
> plus some heavily used programming languages use UTF string encoding natively.
> It would be very cool if EXI processors could move character data straight
> across. EXI could have a single bit to indicate either UTF-8 or UTF-16,
> corresponding to this common subset of the XML encoding declaration.
>
> If UTF-16 isn't good enough, is there another relatively simple way to import
> a subset of the XML encoding declaration into EXI in such a way that most
> characters can travel between EXI and XML or across APIs without translation?

We thought John Cowan's comments on this topic were quite good:
http://lists.w3.org/Archives/Public/public-exi-comments/2009Sep/0004.html
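
As a rough illustration of the byte counts under discussion, the sketch below compares per-character sizes in UTF-8, UTF-16, and an EXI-style encoding. It assumes each character is written as its Unicode code point using a variable-length unsigned integer (7 payload bits per octet, high bit as a continuation flag), and it ignores the string length prefix and any string-table effects. The function names are illustrative and not taken from any EXI implementation.

```python
def exi_style_uint(value: int) -> bytes:
    """Encode a non-negative integer as 7-bit groups, least significant
    group first; the high bit is set on every octet except the last."""
    if value < 0:
        raise ValueError("value must be non-negative")
    out = bytearray()
    while True:
        group = value & 0x7F
        value >>= 7
        if value:
            out.append(group | 0x80)  # more octets follow
        else:
            out.append(group)         # final octet
            return bytes(out)


def char_sizes(ch: str) -> tuple:
    """Return (utf8, utf16, exi_style) byte counts for one character."""
    return (
        len(ch.encode("utf-8")),
        len(ch.encode("utf-16-le")),  # little-endian form, so no BOM is counted
        len(exi_style_uint(ord(ch))),
    )


# Sample characters from different ranges (chosen for illustration only).
samples = {
    "LATIN CAPITAL A (U+0041)": "\u0041",
    "DEVANAGARI KA (U+0915)": "\u0915",
    "CJK IDEOGRAPH (U+4E2D)": "\u4e2d",
    "GOTHIC HWAIR (U+10348)": "\U00010348",
}

for name, ch in samples.items():
    print(name, char_sizes(ch))
```

Under these assumptions, code points below U+4000 take at most 2 octets and every other code point takes 3, so the difference from UTF-8 appears in the U+0800 to U+3FFF range and in the supplementary planes.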

>
> > > 8) We are strongly concerned about the concept of pluggable
> > > codecs as a barrier to interoperability, and believe that the
> > > draft should contain a strong health warning about the use of
> > > these: they should be used only in cases where there is explicit
> > > agreement between the communicating parties, and never for
> > > documents intended for consumption by a general audience.
> >
> > We agree and said as much in our initial response.  A note has been placed
> > in section 7.4 "Datatype Representation Map" to address this:
> >
> > http://www.w3.org/TR/2008/WD-exi-20080919/#datatypeRepresentationMap
> >
>
> I would very much like to see this pluggable codec/user datatype feature
> disappear altogether. It is already effectively present in schema and need
> not be duplicated in EXI. Leaving it to schema would make EXI more like XML
> and would be, I think, better design in having good separation of function.
>

The purpose of the Datatype Representation Map is distinct from that of the user-defined datatype feature in XML Schema.  The former provides the capability to associate a user-defined datatype *representation* with a given datatype.  In other words, the Datatype Representation Map allows the user to pick how data of a given datatype is represented (i.e., "written") in the EXI stream.  It is the mechanism by which a user can tell an EXI processor to employ a particular encoding/compression method for certain types of data.
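
To make the distinction concrete, here is a minimal sketch of the idea: the datatype stays whatever the schema says, while a separate map selects how values of that datatype are written. This is a conceptual illustration only, not the EXI wire format or any real processor's API; the map keys, the function names, and the choice of a big-endian IEEE 754 double as the alternate representation are all assumptions made for the example.

```python
import struct
from typing import Callable, Dict

Encoder = Callable[[str], bytes]


def default_encoder(lexical: str) -> bytes:
    """Stand-in for a processor's built-in representation (hypothetical)."""
    return lexical.encode("utf-8")


def ieee_double_encoder(lexical: str) -> bytes:
    """Alternate representation: 8-byte big-endian IEEE 754 double."""
    return struct.pack(">d", float(lexical))


# Hypothetical representation map: datatype name -> chosen representation.
# The schema still defines what "xsd:double" means; the map only changes
# how its values are serialized.
representation_map: Dict[str, Encoder] = {
    "xsd:double": ieee_double_encoder,
}


def encode_value(datatype: str, lexical: str) -> bytes:
    """Write a value using the mapped representation, else the default."""
    encoder = representation_map.get(datatype, default_encoder)
    return encoder(lexical)
```

In this sketch, `encode_value("xsd:double", "1.5")` yields 8 bytes, while a datatype with no map entry falls back to the default representation. Both parties must agree on the map for the stream to be decodable, which is the interoperability concern raised in comment 8.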

> So I guess I'm asking for a robust case for its existence, beyond the few
> cases already discussed (e.g. floating point) where the standard leans on
> user datatypes to support a standard representation in lieu of the default
> EXI-specific representation. What use cases require user datatypes and why
> can't they use schema? Are there other considerations?

Another good example is found in the visualization domain (X3D). An optimized serialization of X3D data can be achieved by employing application-specific compression (e.g. combining coplanar polygons, quantizing colors).  In some cases lossy compression is an acceptable option.  Supporting these types of compression goes beyond the capabilities of XML Schema.
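
As a small illustration of the kind of lossy, application-specific representation meant here, the sketch below quantizes floating-point color components in the range 0.0 to 1.0 down to a fixed number of bits. It is not taken from X3D or from any EXI codec; the function names and the 8-bit default are assumptions for the example.

```python
from typing import Tuple


def quantize_color(rgb: Tuple[float, ...], bits: int = 8) -> Tuple[int, ...]:
    """Map each component in [0.0, 1.0] to an integer in [0, 2**bits - 1].
    Out-of-range inputs are clamped; the mapping is lossy by design."""
    levels = (1 << bits) - 1
    return tuple(round(min(max(c, 0.0), 1.0) * levels) for c in rgb)


def dequantize_color(q: Tuple[int, ...], bits: int = 8) -> Tuple[float, ...]:
    """Recover approximate floating-point components from quantized values."""
    levels = (1 << bits) - 1
    return tuple(v / levels for v in q)


original = (1.0, 0.0, 0.2)
packed = quantize_color(original)    # three small integers instead of three floats
restored = dequantize_color(packed)  # close to, but not necessarily equal to, original
```

With 8 bits per component, the round-trip error for in-range values is bounded by half a quantization step (0.5/255). A type system can say that a value is a color, but not that this loss is acceptable for a given application.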

>
> Paul

Hope this addresses your questions, Paul.

Thanks,

--mike


Mike Cokus
The MITRE Corporation
757-896-8553; 757-826-8316 (fax)
903 Enterprise Parkway, Hampton, VA

Received on Monday, 2 November 2009 15:02:53 UTC