content negotiation anti-principle

Setting up Web sites which support content negotiation has become much
easier in recent years.  In part this is because XML converts readily
into other forms, using XSLT and tools like Batik (which converts SVG to
binary image formats).  XML's remarkable transformability has also
spurred the growth of frameworks like Cocoon, which make setting up this
kind of application relatively simple.  These projects may in fact be
dangerous.

Rereading various W3C specifications, I find that they effectively dance
around or ignore content negotiation issues.  The few mechanisms which
can be read as supporting content negotiation can just as easily be read
as mechanisms for avoiding it.

There are several reasons for this:

1) In some cases (RDF, Namespaces in XML), URIs are used primarily as
identifiers for resources that have only a loose connection to any
particular representation.

2) In some cases (XHTML's object element, script and style elements in
XHTML and in other specs), a variety of choices are provided which
effectively offer the client the option of routing around content
negotiation by providing a list of types and URIs (or URI references)
which are likely to return representations of that particular type.  An
unexpected representation type is probably a nuisance, not precisely an
error.

3) In many cases (XML 1.0 system identifiers, XLink/XPointer, XInclude),
the developers of the specifications appear to assume that the
relationship between the URI or URI reference and the representation is
under the control of the developer, and that any unfortunate results
which prove otherwise are pretty much cause for throwing errors.

It seems that the principle emerging from these many varied
specifications (and from this perhaps perverse reading of them) is that
content negotiation, while it can and does happen on the Web, is not
actually integral to specifications intended for client-side processing.
Servers do weird things, and the best way to handle this is to just let
it go.  HTTP/1.1's content-negotiation features (and the philosophical
disconnect between resources and representations) are not actually
central to the operation of the Web.

Given this set of widely shared beliefs and plenty of concrete
specifications, the section on "consistent representation" [1] in the
Web Architecture draft should be rewritten so that phrasings like
"consistently represent the same resource" become "provide a single
consistent representation of the resource at any given time."

To make this more concrete, it means that it is worth reinforcing the
common practice whereby a URI ending in ".html" always returns HTML
(text/html), while discouraging the use of URIs like
"http://simonstl.com/assembly" where no such association is made and I
have [an XML parts list and directions], [an SVG diagram], and [a SMIL
document exploring the assembly process] lurking behind content
negotiation.
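For readers who haven't set this up themselves, here is a minimal sketch
of the two arrangements being contrasted: server-driven negotiation on a
single URI using the HTTP/1.1 Accept header, versus one URI per media
type.  The paths, file names, and the choice of application/smil+xml
for the SMIL document are hypothetical, and the Accept parsing is
simplified (it ignores media-range parameters other than q).

```python
# Hypothetical illustration: one negotiated URI vs. one URI per media type.

def parse_accept(header):
    """Parse an Accept header into (media_range, q) pairs (simplified)."""
    ranges = []
    for part in header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        if not pieces[0]:
            continue
        media_range, q = pieces[0].lower(), 1.0
        for param in pieces[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        ranges.append((media_range, q))
    return ranges


def quality(media_type, ranges):
    """q-value the client gives media_type; the most specific range wins."""
    best_q, best_specificity = 0.0, -1
    type_, _, _ = media_type.partition("/")
    for media_range, q in ranges:
        r_type, _, r_subtype = media_range.partition("/")
        if media_range == media_type:
            specificity = 2
        elif r_type == type_ and r_subtype == "*":
            specificity = 1
        elif media_range == "*/*":
            specificity = 0
        else:
            continue
        if specificity > best_specificity:
            best_q, best_specificity = q, specificity
    return best_q


# Negotiated: one URI hides several representations (hypothetical files).
NEGOTIATED = {
    "/assembly": {
        "application/xml": "assembly-parts.xml",
        "image/svg+xml": "assembly-diagram.svg",
        "application/smil+xml": "assembly-process.smil",
    },
}

# Fixed: each URI always yields exactly one media type.
FIXED = {
    "/assembly.xml": ("application/xml", "assembly-parts.xml"),
    "/assembly.svg": ("image/svg+xml", "assembly-diagram.svg"),
    "/assembly.smil": ("application/smil+xml", "assembly-process.smil"),
}


def negotiate(path, accept):
    """Return (media_type, file) with the highest q-value, or None."""
    ranges = parse_accept(accept)
    best_type, best_q = None, 0.0
    for media_type in NEGOTIATED[path]:  # dict order breaks ties
        q = quality(media_type, ranges)
        if q > best_q:
            best_type, best_q = media_type, q
    if best_type is None:
        return None  # a real server might answer 406 Not Acceptable
    return best_type, NEGOTIATED[path][best_type]


def fixed(path):
    """No negotiation: the URI alone determines the media type."""
    return FIXED[path]
```

With the negotiated URI, what a client gets from "/assembly" depends on
what it sends: "image/svg+xml, application/xml;q=0.5" yields the SVG,
"*/*" yields whatever the server happens to prefer, and "text/html"
yields nothing at all.  With the fixed mapping, "/assembly.svg" is SVG
no matter who asks, which is the consistency argued for above.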

This clears a number of pathways for client specification developers,
one for each of the three cases above (listed here in reverse order):

3) Specifications which don't care about multiple media types are free
to ignore their existence entirely.  The developer using the
specification must either control the media type used to represent
referenced resources or be able to rely on practices which ensure a
consistent media type, or errors will probably result, but at some point
those errors are unavoidable.

2) Specifications which might want to support multiple media types (or
other content features) can provide explicit paths to URIs or URI
references which have consistent media types, much like XHTML.
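As a rough illustration of that second pathway, the sketch below has a
client walk nested XHTML object elements, each declaring a type and a
data URI, and pick the first alternative whose declared type it
supports, so the choice happens on the client rather than through
negotiation.  The markup, file names, and the choose_alternative helper
are hypothetical, and the sketch trusts each declared type rather than
checking what the server actually returns.

```python
# Hypothetical client-side selection among explicitly typed alternatives.
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

MARKUP = """<object xmlns="http://www.w3.org/1999/xhtml"
        data="assembly.smil" type="application/smil+xml">
  <object data="assembly.svg" type="image/svg+xml">
    <object data="assembly.xml" type="application/xml">
      <p>An assembly parts list.</p>
    </object>
  </object>
</object>"""


def choose_alternative(markup, supported):
    """Return (type, data) of the outermost object whose declared type is
    in `supported`, or None if the client must fall back to plain content."""
    tag = "{%s}object" % XHTML_NS
    element = ET.fromstring(markup)
    while element is not None and element.tag == tag:
        media_type = element.get("type")
        if media_type in supported:
            return media_type, element.get("data")
        element = element.find(tag)  # next nested (fallback) object
    return None
```

A client that handles SVG and XML but not SMIL would get
("image/svg+xml", "assembly.svg"); one that handles none of the declared
types would render the innermost paragraph instead.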

1) Specifications which only use resources in an abstract manner can
continue to do so, though content-negotiation can no longer be used as a
strong justification for treating resources differently from
representations.  (Change over time still offers hope on that count, of
course, as does the semi-existence of purely abstract resources.)

It also has a number of side effects worth considering.  The one that
immediately comes to mind is that developers (including the W3C) who use
content negotiation to provide schemas and other information about
namespaces at the namespace URI should settle on a single format
instead.  This may provide yet more reason to focus on RDDL (the
Resource Directory Description Language).

Perhaps in this process of removing uncertainty from URI processing we
can strike down some other pieces of uncertainty, such as questions
about processing external resources, or the relationship(s) between
XInclude, parsers, and applications.  Or perhaps we should pause for a
while, and ask whether this is a principle or anti-principle.

A few glasses of New Year's champagne may help with that discussion.

[1] - http://www.w3.org/2001/tag/2002/webarch-20021206#pr-rep-ambiguity

-- 
Simon St.Laurent
Ring around the content, a pocket full of brackets
Errors, errors, all fall down!
http://simonstl.com -- http://monasticxml.org

Received on Monday, 30 December 2002 23:28:52 UTC