Re: [nsMediaType-3] Principles and corner cases

At the TAG telecon this morning, I agreed to do a write-up on
some of the issues around [nsMediaType-3], including core
issues and problematic corner cases.

I believe the TAG probably has consensus on these principles:

P1. Media types are an important part of the web architecture;
dispatching on them, when possible, is efficient and robust
and well-understood.

P2. When processing XML resources, dispatching to software
modules on the basis of namespaces is desirable and correct
behavior.

P3. Clearly in many cases context matters: you can't in the
general case reach into the middle of a resource and
safely process some element based only on its namespace.

P4. The namespace of the root element of an XML resource
has a special status, if only because it provides the 
outermost level of context.
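
To make P1, P2 and P4 concrete, here is a minimal sketch of one possible
dispatch order: media type first, and root-element namespace only when the
media type is a generic XML one. The handler names and the two lookup tables
are hypothetical, invented for illustration; nothing here is agreed TAG
behavior.

```python
import xml.etree.ElementTree as ET
from io import BytesIO

# Hypothetical handler tables; the handler names are illustrative only.
BY_MEDIA_TYPE = {
    "image/svg+xml": "svg-renderer",
    "application/xhtml+xml": "xhtml-renderer",
}
BY_ROOT_NAMESPACE = {
    "http://www.w3.org/2000/svg": "svg-renderer",
    "http://www.w3.org/1999/xhtml": "xhtml-renderer",
}
GENERIC_XML = {"text/xml", "application/xml"}


def root_namespace(body: bytes) -> str:
    """Return the namespace URI of the root element ('' if it has none)."""
    # The first "start" event is always the root element, so stop there.
    for _event, elem in ET.iterparse(BytesIO(body), events=("start",)):
        tag = elem.tag  # ElementTree spells qualified names as "{uri}local"
        return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""
    return ""


def choose_handler(media_type: str, body: bytes) -> str:
    """Pick a handler name from the media type, then the root namespace."""
    media_type = media_type.split(";", 1)[0].strip().lower()
    if media_type in BY_MEDIA_TYPE:   # P1: a specific media type decides
        return BY_MEDIA_TYPE[media_type]
    if media_type in GENERIC_XML:     # P2/P4: fall back to the root namespace
        return BY_ROOT_NAMESPACE.get(root_namespace(body), "generic-xml")
    return "unknown"
```

Note that this sketch looks only at the root element, in line with P3 and
P4; it makes no attempt to dispatch on namespaces found deeper in the
resource.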

Agreeing on all this doesn't make the problems go away.
Here are some that arise - maybe they're corner/pathological
cases that can be overlooked, but they should be considered:

C1. As demonstrated by the example of XSLT on this list, 
the namespace of a root element can be misleading.  It has 
been suggested that the same problem is likely to show up in
XQuery.

C2. Namespace processing obviously becomes more relevant 
in the case where the resource is served as text/xml or
application/xml.  There is currently no consensus as to
whether or when it's desirable to serve resources with
either of these media types.

C3. The issue has been raised of whether MIME headers
or media types are useful in signaling the makeup of
XML resources which contain markup from multiple
namespaces.  There's no consensus on this issue.

C4. There is the possibility of inconsistency between the
media type and what the namespace says.  This is a specific
case of a more general problem of what happens when there's
an inconsistency between any of the MIME headers and
anything about the document content.  Here are four examples
that illustrate both the general and specific problem:

- simple obvious inconsistency, e.g. a server sends a resource
  with media type application/xhtml+xml, but the root element has
  a namespace declaration saying it's SVG
- a slight variation where the resource in the SVG namespace is
  sent with a media type of application/xml.
- certain browsers have been known to sniff into resource content
  and decide to render as HTML [or not] based on whether 
  there's an internal subset, or whether the first few hundred
  bytes have tags that "look like" HTML.
- there is the whole issue of the charset header.  This has
  spawned huge volumes of debate that I won't reproduce here -
  the basic problem comes from the fact that a conformant
  XML processor can with very high probability determine the
  correct encoding of a resource by reading it.  What then if the
  server 
  (a) sends an incorrect charset header, or 
  (b) transcodes the resource so that the XML self-description is
      wrong (allowed for text/* resources) - this is particularly
      nasty when the XML processor uses the charset parameter
      to read the doc, but then breaks it by saving it in its
      non-self-describing form.
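
The first two examples above can be told apart mechanically. Here is a
rough sketch that compares the media type with the root element's
namespace. The EXPECTED_NS table is an assumption made for illustration
(it uses application/xhtml+xml as the XHTML media type), and the three
result labels are invented here.

```python
import xml.etree.ElementTree as ET

# Assumed mapping from a specific media type to the root namespace it implies.
EXPECTED_NS = {
    "application/xhtml+xml": "http://www.w3.org/1999/xhtml",
    "image/svg+xml": "http://www.w3.org/2000/svg",
}


def classify(media_type: str, body: bytes) -> str:
    """Return 'consistent', 'conflict', or 'unspecified'."""
    media_type = media_type.split(";", 1)[0].strip().lower()
    root = ET.fromstring(body)
    # ElementTree spells qualified names as "{uri}local".
    ns = root.tag[1:].split("}", 1)[0] if root.tag.startswith("{") else ""
    expected = EXPECTED_NS.get(media_type)
    if expected is None:
        # e.g. application/xml: the header makes no claim about the namespace
        return "unspecified"
    return "consistent" if ns == expected else "conflict"
```

An SVG document served as application/xhtml+xml comes out as "conflict",
while the same document served as application/xml comes out as
"unspecified": the header is not wrong, it just says less than the content
does. What software should do in either case is exactly the open question.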

It should be pointed out that IETF considers (correctly) that
there are security issues raised whenever a software module
steps outside the bounds set by the MIME headers.  
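
In that spirit, here is a small sketch of the charset case that only
detects a disagreement between the charset parameter and the XML
declaration, and leaves the policy decision to the caller rather than
silently overriding the header. It is a simplification: it assumes an
ASCII-compatible encoding so the declaration can be read as bytes, it does
not do the full autodetection described in the XML 1.0 specification, and
it does not normalize encoding aliases. The function names are made up for
this example.

```python
import re
from email.message import Message

# Matches an XML declaration at the very start and captures its encoding name.
_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def header_charset(content_type: str):
    """Return the lower-cased charset parameter of a Content-Type, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def declared_encoding(body: bytes):
    """Return the lower-cased encoding named in the XML declaration, or None."""
    if body.startswith(b"\xef\xbb\xbf"):  # skip a UTF-8 byte order mark
        body = body[3:]
    match = _DECL.match(body[:200])
    return match.group(1).decode("ascii").lower() if match else None


def charset_conflict(content_type: str, body: bytes) -> bool:
    """True only when both sources name an encoding and the names differ."""
    header = header_charset(content_type)
    declared = declared_encoding(body)
    # Plain string comparison: aliases such as "utf8" are not normalized.
    return header is not None and declared is not None and header != declared
```

For instance, a body starting with an XML declaration that names UTF-8,
served as "text/xml; charset=ISO-8859-1", is flagged as a conflict, which
is case (a) or (b) above depending on which side is actually wrong.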

Cheers, Tim

Received on Monday, 21 January 2002 14:27:27 UTC