- From: Cokus, Michael S. <msc@mitre.org>
- Date: Mon, 2 Nov 2009 10:02:03 -0500
- To: Paul Pierce <prp@teleport.com>, EXI Comments <public-exi-comments@w3.org>
- CC: "public-xml-core-wg@w3.org" <public-xml-core-wg@w3.org>
> Michael,

Hi Paul,

>
> Thank you. I would like further discussion on the following two (in addition
> to the ongoing IEEE floating point discussion, where I look forward to seeing
> the relevant test results.)

Please find our responses below, in line with your comments/questions. Just as a cross-reference, the floating point test data is discussed in a recent posting to the EXI public comments list:

http://lists.w3.org/Archives/Public/public-exi-comments/2009Oct/0000.html

>
> > > 7) We believe that the current representation of strings has no
> > > material advantage over UTF-8, since although it uses at most 3 bytes
> > > per character, 4-byte UTF characters are very rare except in documents
> > > written in obsolete scripts.
> >
> > In our initial response we noted that a number of languages in common use are
> > represented in UTF using 4 bytes. So we concluded that the EXI design (which
> > uses 3 bytes) would result in significant savings in size. To our knowledge,
> > there were no further questions/responses concerning this comment.
>
> Is it possible that the languages that are inefficiently coded in UTF-8 work
> better in UTF-16? A lot of XML documents are coded in either UTF-8 or UTF-16,
> plus some heavily used programming languages use UTF string encoding natively.
> It would be very cool if EXI processors could move character data straight
> across. EXI could have a single bit to indicate either UTF-8 or UTF-16,
> corresponding to this common subset of the XML encoding declaration.
>
> If UTF-16 isn't good enough, is there another relatively simple way to import
> a subset of the XML encoding declaration into EXI in such a way that most
> characters can travel between EXI and XML or across API's without translation?
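
To make the byte counts in item 7 concrete, here is a minimal Python sketch comparing per-character sizes. It assumes each EXI character is written as its Unicode code point in a 7-bits-per-octet unsigned integer, and it ignores the string length prefix and any string-table effects. The function names are hypothetical and are not taken from the draft or from any EXI implementation.

```python
def exi_char_octets(code_point: int) -> int:
    """Octets needed for one code point, assuming 7 payload bits per octet."""
    octets = 1
    while code_point >= 0x80:  # more than 7 bits remain
        code_point >>= 7
        octets += 1
    return octets


def compare_sizes(text: str) -> dict:
    """Character-data size in bytes under each encoding (no length prefix, no BOM)."""
    return {
        "utf8": len(text.encode("utf-8")),
        "utf16": len(text.encode("utf-16-be")),
        "exi": sum(exi_char_octets(ord(ch)) for ch in text),
    }


if __name__ == "__main__":
    samples = ["A", "\u00e9", "\u4e2d", "\U00020000"]
    for sample in samples:
        print(f"U+{ord(sample):04X}", compare_sizes(sample))
```

Under these assumptions, no code point up to U+10FFFF needs more than three octets, since 21 bits fit in three 7-bit groups. Characters outside the Basic Multilingual Plane take four bytes in both UTF-8 and UTF-16.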
We thought John Cowan's comments on this topic were quite good:

http://lists.w3.org/Archives/Public/public-exi-comments/2009Sep/0004.html

>
> > > 8) We are strongly concerned about the concept of pluggable
> > > codecs as a barrier to interoperability, and believe that the
> > > draft should contain a strong health warning about the use of
> > > these: they should be used only in cases where there is explicit
> > > agreement between the communicating parties, and never for
> > > documents intended for consumption by a general audience.
> >
> > We agree and said as much in our initial response. A note has been placed
> > in section 7.4 "Data Representation Map" to address this:
> >
> > http://www.w3.org/TR/2008/WD-exi-20080919/#datatypeRepresentationMap
> >
>
> I would very much like to see this pluggable codec/user datatype feature
> disappear altogether. It is already effectively present in schema and need
> not be duplicated in EXI. Leaving it to schema would make EXI more like XML
> and would be, I think, better design in having good separation of function.
>

The purpose of the Datatype Representation Map is distinct from that of the user-defined datatype feature in XML Schema. The former provides the capability to associate a user-defined datatype *representation* with a given datatype. In other words, the Datatype Representation Map allows the user to pick how data of a given datatype is represented (i.e., "written") in the EXI stream. It is the mechanism by which a user can tell an EXI processor that he wants a particular encoding/compression method employed for certain types of data.

> So I guess I'm asking for a robust case for its existence, beyond the few
> cases already discussed (e.g. floating point) where the standard leans on
> user datatypes to support a standard representation in lieu of the default
> EXI specific representation. What use cases require user datatypes and why
> can't they use schema? Are there other considerations?
Another good example is found in the visualization domain (X3D). An optimized serialization of X3D data can be achieved by employing application-specific compression (e.g. combining coplanar polygons, quantizing colors). In some cases lossy compression is an acceptable option. Supporting these types of compression goes beyond the capabilities of XML Schema.

>
> Paul

Hope this addresses your questions, Paul.

Thanks,
--mike

Mike Cokus
The MITRE Corporation
757-896-8553; 757-826-8316 (fax)
903 Enterprise Parkway, Hampton, VA
Received on Monday, 2 November 2009 15:02:52 UTC