RE: "RE: "Request for response to original XML Core WG comments"" from Grosso, Paul on 2009-11-09 (public-xml-core-wg@w3.org from November 2009)

From: Grosso, Paul <pgrosso@ptc.com>
Date: Mon, 9 Nov 2009 14:45:06 -0500
To: <public-xml-core-wg@w3.org>
Message-ID: <CF83BAA719FD2C439D25CBB1C9D1D302115DC332@HQ-MAIL4.ptcnet.ptc.com>
John et al.,

Any comments on this response from the EXI WG to our comments?

paul

> -----Original Message-----
> From: public-xml-core-wg-request@w3.org [mailto:public-xml-core-wg-request@w3.org] On Behalf Of Cokus, Michael S.
> Sent: Monday, 2009 November 02 9:02
> To: Paul Pierce; EXI Comments
> Cc: public-xml-core-wg@w3.org
> Subject: RE: "RE: "Request for response to original XML Core WG comments""
> 
> > Michael,
> 
> Hi Paul,
> 
> >
> > Thank you. I would like further discussion on the following two (in
> > addition to the ongoing IEEE floating point discussion, where I look
> > forward to seeing the relevant test results.)
> 
> Please find our responses below, in line with your comments/questions.
> 
> Just as a cross-reference, the floating point test data is discussed
> in a recent posting to the EXI public comments list:
> http://lists.w3.org/Archives/Public/public-exi-comments/2009Oct/0000.html
> 
> >
> >
> > > > 7) We believe that the current representation of strings has no
> > > > material advantage over UTF-8, since although it uses at most 3
> > > > bytes per character, 4-byte UTF characters are very rare except in
> > > > documents written in obsolete scripts.
> > >
> > > In our initial response we noted that a number of languages in
> > > common use are represented in UTF using 4 bytes.  So we concluded
> > > that the EXI design (which uses 3 bytes) would result in significant
> > > savings in size.  To our knowledge, there were no further
> > > questions/responses concerning this comment.
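The per-character size claims in point 7 can be checked with a small Python sketch. It assumes, as a simplification, that an EXI string is written as a sequence of Unicode code points, each encoded as an EXI Unsigned Integer (7 value bits per octet, with the high bit set on every octet except the last); the length prefix and string-table handling are ignored, and the helper names are illustrative rather than taken from any EXI implementation.

```python
def exi_uint_len(cp: int) -> int:
    """Octets needed to write a code point as an EXI-style Unsigned Integer:
    7 value bits per octet, continuation bit on all but the last octet."""
    n = 1
    while cp >= 0x80:
        cp >>= 7
        n += 1
    return n


def compare(text: str) -> dict:
    """Character-payload sizes only (no length prefix, no string tables)."""
    return {
        "utf8": len(text.encode("utf-8")),
        "utf16": len(text.encode("utf-16-le")),  # no BOM
        "exi": sum(exi_uint_len(ord(c)) for c in text),
    }


# Latin, Cyrillic, Devanagari, Han ideographs, and a non-BMP character.
for sample in ["Hello", "Привет", "नमस्ते", "你好", "\U0001F600"]:
    print(repr(sample), compare(sample))
```

Under these assumptions EXI never needs more than 3 bytes per code point. It saves a byte over UTF-8 for code points U+0800 to U+3FFF (e.g. Devanagari) and outside the BMP, matches UTF-8 elsewhere, and for Han ideographs in U+4E00 to U+9FFF UTF-16 (2 bytes) is smaller than either.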
> >
> > Is it possible that the languages that are inefficiently coded in UTF-8
> > work better in UTF-16? A lot of XML documents are coded in either UTF-8
> > or UTF-16, plus some heavily used programming languages use UTF string
> > encoding natively. It would be very cool if EXI processors could move
> > character data straight across. EXI could have a single bit to indicate
> > either UTF-8 or UTF-16, corresponding to this common subset of the XML
> > encoding declaration.
> >
> > If UTF-16 isn't good enough, is there another relatively simple way to
> > import a subset of the XML encoding declaration into EXI in such a way
> > that most characters can travel between EXI and XML or across APIs
> > without translation?
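Paul's single-bit suggestion can be sketched in Python as a hypothetical encoding; it is not part of EXI. One document-level flag selects UTF-8 or UTF-16LE for all character data, so a decoder could hand the bytes to an API using the same encoding without transcoding. The flag values, the 4-byte length prefix and the function names are assumptions made for illustration.

```python
# Hypothetical document-level flag (not part of EXI): 0 = UTF-8, 1 = UTF-16LE.
FLAG_UTF8 = 0
FLAG_UTF16 = 1
CODECS = {FLAG_UTF8: "utf-8", FLAG_UTF16: "utf-16-le"}


def choose_utf16(strings: list[str]) -> bool:
    """Pick UTF-16 only if it makes the character data strictly smaller."""
    utf8_size = sum(len(s.encode("utf-8")) for s in strings)
    utf16_size = sum(len(s.encode("utf-16-le")) for s in strings)
    return utf16_size < utf8_size


def encode_document(strings: list[str], use_utf16: bool) -> bytes:
    flag = FLAG_UTF16 if use_utf16 else FLAG_UTF8
    out = bytearray([flag])  # a whole byte here; a real format could spend one bit
    for s in strings:
        payload = s.encode(CODECS[flag])
        out += len(payload).to_bytes(4, "big")  # simplistic fixed-width length prefix
        out += payload  # raw UTF-8 or UTF-16LE bytes, usable without transcoding
    return bytes(out)


def decode_document(data: bytes) -> list[str]:
    flag = data[0]
    if flag not in CODECS:
        raise ValueError(f"unknown encoding flag {flag}")
    strings, pos = [], 1
    while pos < len(data):
        n = int.from_bytes(data[pos:pos + 4], "big")
        pos += 4
        strings.append(data[pos:pos + n].decode(CODECS[flag]))
        pos += n
    return strings


doc = ["你好", "世界"]
blob = encode_document(doc, choose_utf16(doc))
print(blob[0], decode_document(blob))
```

The choice is made once per document, mirroring the XML encoding declaration; `choose_utf16` simply picks whichever is smaller, though a sender could equally pick the encoding its own APIs use.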
> 
> We thought John Cowan's comments on this topic were quite good:
> http://lists.w3.org/Archives/Public/public-exi-comments/2009Sep/0004.html
> 
> >
> > > > 8) We are strongly concerned about the concept of pluggable
> > > > codecs as a barrier to interoperability, and believe that the
> > > > draft should contain a strong health warning about the use of
> > > > these: they should be used only in cases where there is explicit
> > > > agreement between the communicating parties, and never for
> > > > documents intended for consumption by a general audience.
> > >
> > > We agree and said as much in our initial response.  A note has been
> > > placed in section 7.4 "Datatype Representation Map" to address this:
> > >
> > > http://www.w3.org/TR/2008/WD-exi-20080919/#datatypeRepresentationMap
> > >
> >
> > I would very much like to see this pluggable codec/user datatype feature
> > disappear altogether. It is already effectively present in schema and
> > need not be duplicated in EXI. Leaving it to schema would make EXI more
> > like XML and would be, I think, better design in having good separation
> > of function.
> >
> 
> The purpose of the Datatype Representation Map is distinct from that of
> the user-defined datatype feature in XML Schema.  The former provides
> the capability to associate a user-defined datatype *representation*
> with a given datatype.  In other words, the Datatype Representation Map
> allows the user to pick how data of a given datatype is represented
> (i.e., "written") in the EXI stream.  It is the mechanism by which a
> user can tell an EXI processor that he wants a particular
> encoding/compression method employed for certain types of data.
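The distinction Mike draws can be illustrated with a Python sketch loosely modelled on the idea of a Datatype Representation Map: the schema still fixes the datatype (here `xsd:double`), while the map only selects how values of that datatype are written. All names (the `user:ieeeBinary64` representation, the class and its methods) are hypothetical, the "default" representation is a plain UTF-8 stand-in rather than EXI's built-in encoding, and the last lines show the interoperability concern from point 8: a peer that has not agreed on the same representation cannot decode the data.

```python
import struct
from typing import Callable, Dict, Tuple

Codec = Tuple[Callable[[str], bytes], Callable[[bytes], str]]


def default_encode(lexical: str) -> bytes:
    # Stand-in for a processor's built-in representation (NOT EXI's real encoding).
    return lexical.encode("utf-8")


def default_decode(data: bytes) -> str:
    return data.decode("utf-8")


def ieee64_encode(lexical: str) -> bytes:
    # Hypothetical user-defined representation: IEEE 754 binary64, big-endian.
    return struct.pack(">d", float(lexical))


def ieee64_decode(data: bytes) -> str:
    return repr(struct.unpack(">d", data)[0])


IEEE64: Codec = (ieee64_encode, ieee64_decode)


class DatatypeRepresentationMap:
    def __init__(self, mapping: Dict[str, str], known: Dict[str, Codec]):
        self.mapping = dict(mapping)  # datatype QName -> representation name
        self.known = dict(known)      # representations this processor implements

    def _codec(self, datatype: str) -> Codec:
        rep = self.mapping.get(datatype)
        if rep is None:
            return (default_encode, default_decode)  # datatype not remapped
        if rep not in self.known:
            raise LookupError(f"no codec for representation {rep!r}")
        return self.known[rep]

    def encode(self, datatype: str, lexical: str) -> bytes:
        return self._codec(datatype)[0](lexical)

    def decode(self, datatype: str, data: bytes) -> str:
        return self._codec(datatype)[1](data)


sender = DatatypeRepresentationMap({"xsd:double": "user:ieeeBinary64"},
                                   {"user:ieeeBinary64": IEEE64})
blob = sender.encode("xsd:double", "0.1")  # 8 bytes of IEEE 754 binary64
print(len(blob), sender.decode("xsd:double", blob))

# A receiver that never agreed to implement the user representation.
receiver = DatatypeRepresentationMap({"xsd:double": "user:ieeeBinary64"}, {})
try:
    receiver.decode("xsd:double", blob)
except LookupError as err:
    print("receiver cannot decode:", err)
```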
> 
> > So I guess I'm asking for a robust case for its existence, beyond the
> > few cases already discussed (e.g. floating point) where the standard
> > leans on user datatypes to support a standard representation in lieu of
> > the default EXI-specific representation. What use cases require user
> > datatypes and why can't they use schema? Are there other considerations?
> 
> Another good example is found in the visualization domain (X3D). An
> optimized serialization of X3D data can be achieved by employing
> application-specific compression (e.g. combining coplanar polygons,
> quantizing colors).  In some cases lossy compression is an acceptable
> option.  Supporting these types of compression goes beyond the
> capabilities of XML Schema.
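As a sketch of the kind of application-specific, lossy step Mike mentions, the Python below quantizes X3D-style RGB colors (three floats in [0, 1]) to a fixed number of bits per component. The 8-bit default and the function names are assumptions for illustration; nothing here is taken from an X3D or EXI codec. XML Schema can constrain the value space of such a field, but it has no way to say that rounding each component to the nearest 1/255 is acceptable when serializing.

```python
def quantize_color(rgb: tuple[float, ...], bits: int = 8) -> tuple[int, ...]:
    """Lossy: map each component in [0.0, 1.0] to the nearest of 2**bits levels."""
    levels = (1 << bits) - 1
    out = []
    for c in rgb:
        if not 0.0 <= c <= 1.0:
            raise ValueError(f"color component out of range: {c}")
        out.append(int(c * levels + 0.5))  # round half up to the nearest level
    return tuple(out)


def dequantize_color(q: tuple[int, ...], bits: int = 8) -> tuple[float, ...]:
    levels = (1 << bits) - 1
    return tuple(v / levels for v in q)


lexical = "0.8 0.2 0.2"  # an SFColor-style value as it might appear in X3D XML
rgb = tuple(float(x) for x in lexical.split())
q = quantize_color(rgb)  # each component fits in one byte at 8 bits
print(q, dequantize_color(q))
```

The reconstruction error per component is at most half a quantization step (0.5/255 at 8 bits), which is the kind of application-level tolerance a schema cannot express.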
> 
> >
> > Paul
> 
> Hope this addresses your questions, Paul.
> 
> Thanks,
> 
> --mike
> 
> 
> Mike Cokus
> The MITRE Corporation
> 757-896-8553; 757-826-8316 (fax)
> 903 Enterprise Parkway, Hampton, VA
Received on Monday, 9 November 2009 19:46:35 UTC