- From: Robin Berjon <robin.berjon@expway.fr>
- Date: Sat, 2 Sep 2006 18:15:41 +0200
- To: www-tag@w3.org
Dear all,
this is an issue that I know has been discussed here and there, but I
don't recall it being brought up as a TAG issue proper, and I don't
believe I've seen any final resolution on the matter (if there has
been one, I haven't seen it).
I'm thinking about it largely from an XML and efficient XML
background, but I believe it applies equally to the variety of RDF
syntaxes hanging around. For context, this is how it is discussed in
the XBC Properties Document under Content Type Management[0]:
"""
The media type and encoding infrastructure provides for a common and
simple way of identifying the contents of a document and the content
coding with which it is transmitted. It is fundamental to the
functioning of the Web and enables powerful features such as content
negotiation. While required for the Web, these mechanisms are not
specific to it and are typically reused in many other situations.
It is therefore desirable that formats meant to be used on the Web
define (and preferably register) the media type and/or encoding that
one is to use when transmitting them.
There are multiple ways in which an alternate XML format could define
how media types and encodings are to be used with it. Several options
of note and their associated trade-offs are:
• The alternate XML serialization is considered to just be a
content coding. In this case it may have a media type (as gzip does
with 'application/gzip' in addition to the 'gzip' content coding) but
the principal way of using it is to keep the original media type of
the XML content and only change the content coding. The upside of
this approach is that the existing content dispatching system is
untouched, that the media type information is fully useful, and that
the content coding infrastructure is put to good use. The downside is
that there is philosophical and technical dissent as to whether an
alternate XML serialization is an encoding in the way that gzip is —a
discussion that needs to involve considerations concerning the 5.22
Roundtrip Support[1], 5.5 Directly Readable and Writable[2], and 5.16
Integratable into XML Stack[3] properties. With this approach content
negotiation is fully possible. The behaviour of fragment identifiers
does not need to be re-specified.
• The alternate XML format is not a mere content coding but
requires the definition of one or more media types. This case
subdivides into two options:
o There is only the alternate XML format's media type. Any
content sent using that format must have that media type. The upside
of this approach is that it is simple. The downside is that you lose
all media type information of the original XML content so that you
must then define another system to provide that information, or
define new media types for all possible content
(application/binxhtml, image/binsvg, etc.). With this content negotiation is
entirely impossible (or rather, totally useless) unless new media
types are defined for all things XML. The behaviour of fragment
identifiers becomes impossible to specify, or has to be re-specified
for all the new media types.
o A new media type suffix is defined in the manner that it was
done for XML content (e.g., "+bix") to be used for all content
expressed using the alternate XML serialization. The upside of this
approach is that it's simple and that the diversity of media types is
maintained. The downside is that it requires much more intrusive
modifications to systems that rely on existing media types. With this
content negotiation is possible, but with lesser power. The behaviour
of fragment identifiers has to be re-specified to map back to the one
in +xml types.
"""
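To make the three options above concrete, here is a rough sketch of the HTTP headers each one would put on an encoded SVG document. The "exi" content coding token, the "application/exi" media type and the "+exi" suffix are illustrative assumptions on my part, not registered values, and the function name is hypothetical.

```python
def label_options(original_type, coding="exi",
                  generic_type="application/exi", suffix="+exi"):
    """Return the headers each labelling option would use.

    All default tokens are assumptions made up for illustration.
    """
    # Strip a trailing "+xml" so that "image/svg+xml" becomes "image/svg".
    if original_type.endswith("+xml"):
        base = original_type[:-len("+xml")]
    else:
        base = original_type
    return {
        # Option 1: keep the original media type, change only the coding.
        "content_coding": {
            "Content-Type": original_type,
            "Content-Encoding": coding,
        },
        # Option 2a: one media type for everything; the original type is lost.
        "single_media_type": {"Content-Type": generic_type},
        # Option 2b: a new suffix replaces "+xml" on every XML media type.
        "suffix": {"Content-Type": base + suffix},
    }


if __name__ == "__main__":
    for name, headers in label_options("image/svg+xml").items():
        print(name, headers)
```

Only the first option leaves the original "image/svg+xml" visible to anything that dispatches on Content-Type.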
In short, I think that it boils down to defining what exactly may
constitute content coding. The gzip case is simple: it reproduces the
original content byte for byte, which firmly places it in the content
coding basket.
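As a small illustration of that byte-for-byte property, the following sketch uses Python's standard gzip module; the sample SVG bytes and the function name are made up for the example.

```python
import gzip


def roundtrips(data: bytes) -> bool:
    """Check that gzip gives back exactly the bytes it was given."""
    encoded = gzip.compress(data)
    decoded = gzip.decompress(encoded)
    return decoded == data


if __name__ == "__main__":
    # Hypothetical sample document, odd spacing included on purpose.
    sample = b'<svg  width="10"   height="5"><!-- kept --></svg>\n'
    print(roundtrips(sample))
```

Every byte survives, including the odd spacing and the comment, which is what makes the gzip case uncontroversial.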
And it's extremely tempting to just stop there: if it loses only one
byte from the original physical representation of the content, no
matter how irrelevant that byte may be, then it's not a content
coding and actually defines a new media type. Temptingly simple, but
as described above, potentially extremely impractical too.
Taking the SVG case as an example, there's a lot of information that
can be lost with no impact. First there's everything that does not
typically matter in XML. By that I don't mean genuine data model (DM) constructs
such as comments, but the parts that are normally ignored: e.g.
attribute order, white space between attributes, the difference
between the two ways of writing an empty element (<a/> versus
<a></a>). And then there's a lot specific to SVG that
can be modified with no impact either, for instance the exact syntax
of path data. Can an encoding that optimises those away still
honestly count itself as an HTTP content coding?
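A quick way to see the two layers of "irrelevant" information is to compare documents after XML canonicalization. This is only a sketch, assuming Python 3.8 or later for xml.etree.ElementTree.canonicalize; the sample documents omit the SVG namespace for brevity and the helper name is hypothetical.

```python
from xml.etree.ElementTree import canonicalize  # available from Python 3.8


def same_xml(a: str, b: str) -> bool:
    """Compare two documents after XML canonicalization (C14N 2.0)."""
    return canonicalize(a) == canonicalize(b)


# Differences XML itself ignores: attribute order, spacing between
# attributes, and the two ways of writing an empty element.
doc_a = '<svg><rect  width="10"   height="5"/></svg>'
doc_b = '<svg><rect height="5" width="10"></rect></svg>'

# A difference only SVG knows to ignore: two spellings of the same path.
path_a = '<svg><path d="M0,0L10,10"/></svg>'
path_b = '<svg><path d="M 0 0 L 10 10"/></svg>'

if __name__ == "__main__":
    print(doc_a == doc_b, same_xml(doc_a, doc_b))
    print(path_a == path_b, same_xml(path_a, path_b))
```

The first pair differs as bytes but compares equal once canonicalized; the second pair stays different, because generic XML tooling knows nothing about path data syntax. An encoding that normalises path data is therefore discarding information at a format-specific level, not just at the XML level.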
If adding new media types were a zero-cost operation, the solution
would be simple: just add new ones, probably using some form of +exi
suffix, yielding image/svg+exi. But it's not: there are many XML
types, and given that one of the primary goals of EXI is to disrupt
the existing ecosystem as little as possible, paying this cost would
look like a bad idea.
So the question is: if one does not go with the stringent
byte-preserving approach, where does one draw the line? Everything is an
encoding at some level, but one doesn't see image/raw documents being
shuttled around with content coding JPEG. It may be that this problem
is EXI-specific, and that all that is required is for the EXI WG to
draw some guidelines if it does end up producing a format (or in fact
even if it doesn't given that people will be using efficient XML
anyway). I would find such a response satisfactory, but I would
prefer that the TAG mulled over the issue first and then told them
that that is the way to go, rather than see the EXI WG produce such
guidelines only for us all to realise later that there is actually a
more general take to be had there, and risk that the general take
contradicts what the EXI folks would come up with.
Have a nice week-end!
[0] http://www.w3.org/TR/xbc-properties/#content-type-management
[1] http://www.w3.org/TR/xbc-properties/#roundtrip-support
[2] http://www.w3.org/TR/xbc-properties/#directly-readable-writable
[3] http://www.w3.org/TR/xbc-properties/#integratable-into-xml-stack
--
Robin Berjon
Senior Research Scientist
Expway, http://expway.com/
Received on Saturday, 2 September 2006 16:16:40 UTC