Re: [nsMediaType-3] Principles and corner cases (fwd) from Mark Baker on 2002-01-22 (xml-dist-app@w3.org from January 2002)

From: Mark Baker <distobj@acm.org>
Date: Mon, 21 Jan 2002 23:05:54 -0500 (EST)
To: xml-dist-app@w3.org
Message-Id: <200201220405.XAA25684@markbaker.ca>
FYI, the TAG's summary (as communicated by Tim Bray) of our question #3
regarding the relationship between media types and namespaces.

Forwarded message:
> At the TAG telecon this morning, I agreed to do a write-up on
> some of the issues around [nsMediaType-3], including core
> issues and problematic corner cases.
> 
> I believe the TAG probably has consensus on these principles:
> 
> P1. Media types are an important part of the web architecture;
> dispatching on them, when possible, is efficient and robust
> and well-understood.
> 
> P2. When processing XML resources, dispatching to software
> modules on the basis of namespaces is desirable and correct
> behavior.
> 
> P3. Clearly in many cases context matters: you can't in the
> general case reach into the middle of a resource and
> safely process some element based only on its namespace.
> 
> P4. The namespace of the root element of an XML resource
> has a special status, if only because it provides the 
> outermost level of context.
> 
> Agreeing on all this doesn't make the problems go away.
> Here are some that arise - maybe they're corner/pathological
> cases that can be overlooked, but they should be considered:
> 
> C1. As demonstrated by the example of XSLT on this list, 
> the namespace of a root element can be misleading.  It has 
> been suggested that the same problem is likely to show up in
> XQuery.
> 
> C2. Namespace processing obviously becomes more relevant 
> in the case where the resource is served as text/xml or
> application/xml.  There is currently no consensus as to
> whether or when it's desirable to serve resources with
> either of these media types.
> 
> C3. The issue has been raised of whether MIME headers
> or media types are useful in signaling the makeup of
> XML resources which contain markup from multiple
> namespaces.  There's no consensus on this issue.
> 
> C4. There is the possibility of inconsistency between the
> media type and what the namespace says.  This is a specific
> case of a more general problem of what happens when there's
> an inconsistency between any of the MIME headers and
> anything about the document content.  Here are four examples
> that illustrate both the general and specific problem:
> 
> - simple obvious inconsistency, e.g. a server sends a resource
>   with media type application/xhtml+xml, but the root element has
>   a namespace declaration saying it's SVG
> - a slight variation where the resource in the SVG namespace is
>   sent with a media type of application/xml
> - certain browsers have been known to sniff into resource content
>   and decide to render as HTML [or not] based on whether 
>   there's an internal subset, or whether the first few hundred
>   bytes have tags that "look like" HTML.
> - there is the whole issue of the charset header.  This has
>   spawned huge volumes of debate that I won't reproduce here -
>   the basic problem comes from the fact that a conformant
>   XML processor can with very high probability determine the
>   correct encoding of a resource by reading it.  What then if the
>   server 
>   (a) sends an incorrect charset header, or 
>   (b) transcodes the resource so that the XML self-description is
>       wrong (allowed for text/* resources) - this is particularly
>       nasty when the XML processor uses the charset parameter
>       to read the doc, but then breaks it by saving it in its
>       non-self-describing form.
> 
> It should be pointed out that the IETF considers (correctly) that
> there are security issues raised whenever a software module
> steps outside the bounds set by the MIME headers.  
> 
> Cheers, Tim
> 
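
To make P1, P2, P4 and the first two C4 examples concrete, here is a
minimal Python sketch of one possible dispatch policy. It is not taken
from the TAG discussion. The function names, the lookup table and the
media type strings in it are all assumptions made for illustration.
The policy: a specific media type always decides the dispatch, a
generic XML type falls back to the root element's namespace, and a
mismatch is only reported, so the code never steps outside the bounds
set by the MIME headers.

```python
import xml.etree.ElementTree as ET

# Hypothetical table for illustration only; not part of the original message.
NAMESPACE_FOR_MEDIA_TYPE = {
    "image/svg+xml": "http://www.w3.org/2000/svg",
    "application/xhtml+xml": "http://www.w3.org/1999/xhtml",
}
GENERIC_XML_TYPES = {"text/xml", "application/xml"}


def root_namespace(xml_bytes):
    """Return the namespace URI of the root element, or None if it has none."""
    tag = ET.fromstring(xml_bytes).tag
    # ElementTree spells qualified names as "{namespace-uri}local-name".
    if tag.startswith("{"):
        return tag[1:].split("}", 1)[0]
    return None


def classify(media_type, xml_bytes):
    """Return (dispatch_key, consistent) for a labelled XML resource."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = media_type.split(";", 1)[0].strip().lower()
    ns = root_namespace(xml_bytes)
    if media_type in GENERIC_XML_TYPES:
        # P2/P4: the label is generic, so the root namespace is the only hint.
        return ns, True
    expected = NAMESPACE_FOR_MEDIA_TYPE.get(media_type)
    # C4: the label still decides; a disagreement is reported, not acted on.
    return media_type, expected is None or expected == ns
```

With an SVG document, classify("application/xml", doc) dispatches on
the SVG namespace, while classify("application/xhtml+xml", doc) keeps
the XHTML label and flags the pair as inconsistent. Note that C1 (XSLT)
still defeats the namespace fallback; the sketch does nothing about it.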


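The charset case can be sketched the same way. The Python below is a
deliberately simplified illustration, not a conformant XML encoding
detector: it looks only at a byte order mark and at an encoding
declaration in an ASCII-compatible XML declaration, and it ignores
UTF-32 and BOM-less UTF-16. The function names are hypothetical. It
shows how a receiver could notice case (a), where the charset parameter
disagrees with what the document says about itself.

```python
import codecs
import re

# Byte order marks checked first; UTF-32 is ignored in this sketch.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16"),
    (codecs.BOM_UTF16_LE, "utf-16"),
]
# Encoding pseudo-attribute inside a leading XML declaration.
DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def declared_encoding(xml_bytes):
    """Return the encoding the document claims for itself (lowercased)."""
    for bom, name in BOMS:
        if xml_bytes.startswith(bom):
            return name
    match = DECL.match(xml_bytes)
    if match:
        return match.group(1).decode("ascii").lower()
    # XML's default when there is neither a BOM nor a declaration.
    return "utf-8"


def charset_conflict(charset_param, xml_bytes):
    """True if the MIME charset parameter disagrees with the document."""
    if charset_param is None:
        return False
    # codecs.lookup normalises aliases (e.g. latin-1 and ISO-8859-1);
    # it raises LookupError for names Python does not know.
    header = codecs.lookup(charset_param).name
    internal = codecs.lookup(declared_encoding(xml_bytes)).name
    return header != internal
```

Detecting the conflict is the easy part. Which side should win, and
what to write back when saving, is exactly the debate the message
declines to reproduce, so the sketch only reports the disagreement.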
-- 
Mark Baker, Chief Science Officer, Planetfred, Inc.
Ottawa, Ontario, CANADA.      mbaker@planetfred.com
http://www.markbaker.ca   http://www.planetfred.com
Received on Monday, 21 January 2002 23:04:22 UTC