- From: Robin Berjon <robin.berjon@expway.fr>
- Date: Sat, 2 Sep 2006 18:15:41 +0200
- To: www-tag@w3.org
Dear all,

this is an issue that I know has been discussed here and there, but I don't recall it being brought up as a TAG issue proper, and I don't believe I've seen any final resolution on the matter (if there has been one, I haven't seen it). I'm thinking about it largely from an XML and efficient XML background, but I believe it applies equally to the variety of RDF syntaxes hanging around.

For context, this is how it is discussed in the XBC Properties Document under Content Type Management[0]:

"""
The media type and encoding infrastructure provides for a common and simple way of identifying the contents of a document and the content coding with which it is transmitted. It is fundamental to the functioning of the Web and enables powerful features such as content negotiation. While required for the Web, these mechanisms are not specific to it and are typically reused in many other situations. It is therefore desirable that formats meant to be used on the Web define (and preferably register) the media type and/or encoding that one is to use when transmitting them.

There are multiple ways in which an alternate XML format could define how media types and encodings are to be used with it. Several options of note and their associated trade-offs are:

• The alternate XML serialization is considered to just be a content coding. In this case it may have a media type (as gzip does with 'application/gzip' in addition to the 'gzip' content coding) but the principal way of using it is to keep the original media type of the XML content and only change the content coding. The upside of this approach is that the existing content dispatching system is untouched, that the media type information is fully useful, and that the content coding infrastructure is put to good use.
  The downside is that there is philosophical and technical dissent as to whether an alternate XML serialization is an encoding in the way that gzip is, a discussion that needs to involve considerations concerning the 5.22 Roundtrip Support[1], 5.5 Directly Readable and Writable[2], and 5.16 Integratable into XML Stack[3] properties. With this approach content negotiation is fully possible. The behaviour of fragment identifiers does not need to be re-specified.

• The alternate XML format is not a mere content coding but requires the definition of one or more media types. This case subdivides into two options:

    o There is only the alternate XML format's media type. Any content sent using that format must have that media type. The upside of this approach is that it is simple. The downside is that you lose all media type information of the original XML content so that you must then define another system to provide that information, or define new media types for all possible content (application/binxhtml, image/binsvg, etc.). With this content negotiation is entirely impossible (or rather, totally useless) unless new media types are defined for all things XML. The behaviour of fragment identifiers becomes impossible to specify, or has to be re-specified for all the new media types.

    o A new media type suffix is defined in the manner that it was done for XML content (e.g., "+bix") to be used for all content expressed using the alternate XML serialization. The upside of this approach is that it's simple and that the diversity of media types is maintained. The downside is that it requires much more intrusive modifications to systems that rely on existing media types. With this content negotiation is possible, but with lesser power. The behaviour of fragment identifiers has to be re-specified to map back to the one in +xml types.
"""

In short, I think that it boils down to defining what exactly may constitute content coding.
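
To make the three quoted options concrete, here is a minimal sketch of the response headers each one would produce for an SVG document. The `exi` content-coding token, the `application/exi` type, and the `+exi` suffix are placeholder names assumed for illustration only; none of them is a registered value, and the function name is hypothetical.

```python
# Illustrative only: "exi", "application/exi" and "+exi" are assumed
# placeholder names, not registered content codings or media types.

def headers_for(option, xml_type="image/svg+xml"):
    """Return the HTTP response headers implied by each quoted option."""
    if option == "content-coding":
        # Option 1: keep the original media type, change only the coding.
        return {"Content-Type": xml_type, "Content-Encoding": "exi"}
    if option == "single-type":
        # Option 2a: one media type for everything in the alternate format;
        # the original type (SVG here) is no longer visible in the headers.
        return {"Content-Type": "application/exi"}
    if option == "suffix":
        # Option 2b: swap the "+xml" suffix for a new one.
        if not xml_type.endswith("+xml"):
            raise ValueError("expected a +xml media type: " + xml_type)
        return {"Content-Type": xml_type[: -len("+xml")] + "+exi"}
    raise ValueError("unknown option: " + option)


for option in ("content-coding", "single-type", "suffix"):
    print(option, headers_for(option))
```

With the defaults, the content-coding option leaves Content-Type at image/svg+xml, the single-type option loses the SVG information altogether, and the suffix option yields image/svg+exi.
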
The gzip case is simple: it reproduces the original content byte for byte, which firmly places it in the content coding basket. And it's extremely tempting to just stop there: if it loses only one byte from the original physical representation of the content, no matter how irrelevant that byte may be, then it's not a content coding and actually defines a new media type. Temptingly simple, but as described above, potentially extremely impractical too.

Taking the SVG case as an example, there's a lot of information that can be lost with no impact. First there's everything that does not typically matter in XML. By that I don't mean genuine DM constructs such as comments, but the parts that are normally ignored: e.g. attribute order, white space between attributes, the difference between the two ways of writing empty elements. And then there's a lot specific to SVG that can be modified with no impact either, for instance the exact syntax of path data. Can an encoding that optimises those away still honestly count itself as an HTTP content coding?

If adding new media types were a zero-cost operation, the solution would be simple: just add new ones, probably using some form of +exi suffix, yielding image/svg+exi. But it is not: there are many XML types, and given that one of the primary goals of EXI is to disrupt the existing ecosystem as little as possible, this cost would tend to look like a bad idea.

So the question is: if one does not go with the stringent byte-preserving approach, where do we draw the line? Everything is an encoding at some level but one doesn't see image/raw documents being shuttled around with content coding JPEG.

It may be that this problem is EXI-specific, and that all that is required is for the EXI WG to draw some guidelines if it does end up producing a format (or in fact even if it doesn't given that people will be using efficient XML anyway).
I would find such a response satisfactory, but I would prefer that the TAG mulled over the issue first and then told them that that is the way to go, rather than see the EXI WG produce such guidelines only for us all to realise later that there is actually a more general take to be had there, and risk that the general take contradicts what the EXI folks would come up with.

Have a nice week-end!

[0] http://www.w3.org/TR/xbc-properties/#content-type-management
[1] http://www.w3.org/TR/xbc-properties/#roundtrip-support
[2] http://www.w3.org/TR/xbc-properties/#directly-readable-writable
[3] http://www.w3.org/TR/xbc-properties/#integratable-into-xml-stack

-- 
Robin Berjon
Senior Research Scientist
Expway, http://expway.com/
Received on Saturday, 2 September 2006 16:16:40 UTC