Re: [nsMediaType-3] Principles and corner cases from Tim Berners-Lee on 2002-02-02 (www-tag@w3.org from February 2002)

From: Tim Berners-Lee <timbl@w3.org>
Date: Fri, 1 Feb 2002 19:48:36 -0500
To: <www-tag@w3.org>, "Tim Bray" <tbray@textuality.com>
Message-ID: <007b01c1ab83$5910e730$6501a8c0@CREST>
----- Original Message -----
From: "Tim Bray" <tbray@textuality.com>
To: <www-tag@w3.org>
Sent: Monday, January 21, 2002 2:27 PM
Subject: Re: [nsMediaType-3] Principles and corner cases


> At the TAG telecon this morning, I agreed to do a write-up on
> some of the issues around [nsMediaType-3], including core
> issues and problematic corner cases.
>
> I believe the TAG probably has consensus on these principles:
>
> P1. Media types are an important part of the web architecture;
> dispatching on them, when possible, is efficient and robust
> and well-understood.
>
> P2. When processing XML resources, dispatching to software
> modules on the basis of namespaces is desirable and correct
> behavior.
>
> P3. Clearly in many cases context matters: you can't in the
> general case reach into the middle of a resource and
> safely process some element based only on its namespace.

I would like us to state the stronger case: the meaning of XML is
always dependent on the parental context.
Otherwise, we manacle the design of new languages.

> P4. The namespace of the root element of an XML resource
> has a special status, if only because it provides the
> outermost level of context.

Practically, when a document is turned into an object, the root
namespace determines what sort of object it becomes. In other words, an
SVG document (whether or not you have XSLT inside it) will always give you
a graphic object, and you can expect that object to be able to render itself.
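A minimal sketch of dispatching on the outermost element's namespace only (P4 as read in the reply above). The two namespace URIs are the real W3C ones; the handler table, the labels "graphic", "transform" and "generic-xml", and the function names are hypothetical, for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Real W3C namespace names for SVG and XSLT.
SVG_NS = "http://www.w3.org/2000/svg"
XSLT_NS = "http://www.w3.org/1999/XSL/Transform"

# Hypothetical handler table: the kind of object built for each root namespace.
HANDLERS = {
    SVG_NS: "graphic",
    XSLT_NS: "transform",
}


def root_namespace(xml_text: str) -> Optional[str]:
    """Return the namespace URI of the outermost element, or None if it has none."""
    root = ET.fromstring(xml_text)
    # ElementTree spells qualified names as "{uri}localname".
    if root.tag.startswith("{"):
        return root.tag[1:].split("}", 1)[0]
    return None


def object_kind(xml_text: str) -> str:
    # Only the root namespace is consulted; XSLT (or anything else) nested
    # inside an SVG root is left to the SVG handler to interpret in context.
    return HANDLERS.get(root_namespace(xml_text), "generic-xml")


svg_with_xslt = (
    '<svg xmlns="http://www.w3.org/2000/svg" '
    'xmlns:xsl="http://www.w3.org/1999/XSL/Transform">'
    '<xsl:value-of select="title"/></svg>'
)
print(object_kind(svg_with_xslt))  # graphic
```

The nested `xsl:` element never influences the choice of handler; only the handler for the root vocabulary sees it, and only in the context that vocabulary gives it.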

> Agreeing on all this doesn't make the problems go away.
> Here are some that arise - maybe they're corner/pathological
> cases that can be overlooked, but they should be considered:
>
> C1. As demonstrated by the example of XSLT on this list,
> the namespace of a root element can be misleading.  It has
> been suggested that the same problem is likely to show up in
> XQuery.

I propose that we in fact clean this up by treating an SVG document
with embedded XSLT as an SVG document.  The XSLT vocabulary
extends SVG to allow one to insert functions of other things.
But the document is still an SVG document. And if you imagine
that I now invent a conditional compilation language, I could
use it to exclude the XSLT bits, so that processing with XSLT is not
even required.
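The thought experiment can be sketched as follows, assuming a toy stand-in for the hypothetical conditional-compilation language that simply removes every element in the XSLT namespace. The function name `strip_vocabulary` is hypothetical. The point it illustrates: excluding the XSLT bits leaves the outermost element, and hence the document's type, unchanged.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
XSLT_NS = "http://www.w3.org/1999/XSL/Transform"


def strip_vocabulary(root: ET.Element, ns: str) -> ET.Element:
    """Remove every descendant element in namespace `ns`; the root is kept.

    A toy stand-in for the hypothetical conditional-compilation language:
    it excludes the XSLT bits without touching the outermost element.
    """
    prefix = "{" + ns + "}"
    for parent in list(root.iter()):
        for child in list(parent):
            # Skip non-element nodes (e.g. comments), whose tag is not a str.
            if isinstance(child.tag, str) and child.tag.startswith(prefix):
                parent.remove(child)
    return root


doc = ET.fromstring(
    '<svg xmlns="http://www.w3.org/2000/svg" '
    'xmlns:xsl="http://www.w3.org/1999/XSL/Transform">'
    '<rect width="10" height="10"/>'
    '<xsl:value-of select="title"/></svg>'
)
before = doc.tag
after = strip_vocabulary(doc, XSLT_NS).tag
print(before == after)  # True: still an SVG document
```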

To do anything else leads, as far as I can see, to total chaos.
The supreme court will have to sit every time we need to figure
out whether it is an SVG, XSLT, or Conditional document.


> C2. Namespace processing obviously becomes more relevant
> in the case where the resource is served as text/xml or
> application/xml.  There is currently no consensus as to
> whether or when it's desirable to serve resources with
> either of these media types.

I would say that it is essential to be able to serve an arbitrary XML
document up as application/xml, but if the namespace of the
outermost element in fact has its own media type, then that media type
should be used to provide more transparency.
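A minimal sketch of that serving rule, assuming the server already knows the root namespace. The mapping is illustrative: application/xhtml+xml and image/svg+xml are the types those vocabularies are served with today, but the table and the function name are hypothetical.

```python
from typing import Optional

# Illustrative table of root namespaces that have a media type of their own.
DEDICATED_TYPES = {
    "http://www.w3.org/1999/xhtml": "application/xhtml+xml",
    "http://www.w3.org/2000/svg": "image/svg+xml",
}


def media_type_for(root_ns: Optional[str]) -> str:
    # Prefer the vocabulary's own media type for transparency; any other
    # well-formed XML can still be served as application/xml.
    return DEDICATED_TYPES.get(root_ns, "application/xml")


print(media_type_for("http://www.w3.org/2000/svg"))  # image/svg+xml
print(media_type_for("urn:example:unregistered"))    # application/xml
```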

> C3. The issue has been raised of whether MIME headers
> or media types are useful in signaling the makeup of
> XML resources which contain markup from multiple
> namespaces.  There's no consensus on this issue.

It seems to me that, when there is no explicit media type for the
outermost element, being able to put the outermost namespace
as a parameter to the media type is useful for visibility, as per
Roy's dissertation.  However, putting a list of other namespaces
on the line does not seem to be a win, as it doesn't tell you
in what way, if any, they are used, or anything about the type of
object by which the document might be represented in your software.
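A sketch of carrying only the outermost namespace as a media-type parameter, using the standard library's `email.message.Message` to build and parse the header. The parameter name `ns` is hypothetical, chosen purely for illustration; the functions are likewise hypothetical.

```python
from email.message import Message
from typing import Optional

# Hypothetical parameter name, chosen for illustration only.
ROOT_NS_PARAM = "ns"


def content_type_with_root_ns(root_ns: str) -> str:
    """Build an application/xml Content-Type carrying only the root namespace."""
    msg = Message()
    msg["Content-Type"] = "application/xml"
    # set_param quotes the value, since a URI contains MIME specials like ':' and '/'.
    msg.set_param(ROOT_NS_PARAM, root_ns)
    return msg["Content-Type"]


def root_ns_from_content_type(value: str) -> Optional[str]:
    """Recover the (hypothetical) root-namespace parameter, if present."""
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_param(ROOT_NS_PARAM)


header = content_type_with_root_ns("http://www.w3.org/2000/svg")
print(header)
```

Only one namespace is carried. A list of every namespace present would not say how each is used, which is the objection raised above.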


> C4. There is the possibility of inconsistency between the
> media type and what the namespace says.  This is a specific
> case of a more general problem of what happens when there's
> an inconsistency between any of the MIME headers and
> anything about the document content.  Here are three examples
> that illustrate both the general and specific problem:
>
> - simple obvious inconsistency, e.g. a server sends a resource
>   with media type application/xhtml+xml, but the root element has a namespace
>   declaration saying it's SVG
> - a slight variation where the resource in the SVG namespace is
>   sent with a media-type of application/xml.

That is not inconsistent according to the RFC.

> - certain browsers have been known to sniff into resource content
>   and decide to render as HTML [or not] based on whether
>   there's an internal subset, or whether the first few hundred
>   bytes have tags that "look like" HTML.

I think we had a consensus on the call that that was bad and a
security hazard.

> - there is the whole issue of the charset header.  This has
>   spawned huge volumes of debate that I won't reproduce here -
>   the basic problem comes from the fact that a conformant
>   XML processor can with very high probability determine the
>   correct encoding of a resource by reading it.  What then if the
>   server
>   (a) sends an incorrect charset header, or
>   (b) transcodes the resource so that the XML self-description is
>       wrong (allowed for text/* resources) - this is particularly
>       nasty when the XML processor uses the charset parameter
>       to read the doc, but then breaks it by saving it in its
>       non-self-describing form.

This I think is well defined by Martin's summary graph
http://www.w3.org/2001/tag/2002/mime/xml-charset.svg
(source: http://www.w3.org/2001/tag/2002/mime/xml-charset.xml)

The conclusion here is that you cannot transcode an XML document without
looking inside it.  I understand from Roy that there is no expectation that you
can do this to a MIME body. So doing it would simply be wrong.
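To make the transcoding hazard concrete, here is a sketch of a simplified subset of the non-normative encoding-detection heuristics in XML 1.0 Appendix F (UTF-16 without a BOM, UTF-32 and EBCDIC are deliberately not handled). The function names and the "intermediary" are hypothetical. Transcoding the bytes without rewriting the in-document declaration leaves a document that no longer decodes as it claims.

```python
import codecs
import re

# Simplified subset of the (non-normative) heuristics in XML 1.0, Appendix F.
_ENCODING_DECL = re.compile(
    rb"<\?xml[^>]*?encoding\s*=\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']"
)


def sniff_xml_encoding(data: bytes) -> str:
    """Guess the encoding of an XML entity from its first bytes."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if data.startswith((codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)):
        return "utf-16"
    match = _ENCODING_DECL.match(data)
    if match:
        return match.group(1).decode("ascii").lower()
    return "utf-8"  # XML's default with no BOM and no encoding declaration


def decodes_as_declared(data: bytes) -> bool:
    """True if the bytes can be decoded using the encoding they claim."""
    try:
        data.decode(sniff_xml_encoding(data))
        return True
    except (UnicodeDecodeError, LookupError):
        return False


original = '<?xml version="1.0" encoding="UTF-8"?><p>caf\u00e9</p>'.encode("utf-8")
# A hypothetical intermediary transcodes the body to ISO-8859-1 but cannot
# fix the in-document declaration without parsing the XML.
transcoded = original.decode("utf-8").encode("iso-8859-1")
print(decodes_as_declared(original), decodes_as_declared(transcoded))  # True False
```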

> It should be pointed out that IETF considers (correctly) that
> there are security issues raised whenever a software module
> steps outside the bounds set by the MIME headers.

Yes.
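In the spirit of C4 and the security point, a receiver could check the root namespace against the declared media type and report a mismatch rather than sniff and silently override the header. A sketch, assuming an illustrative table of media types tied to one root vocabulary (the table and function name are hypothetical). As noted above, application/xml places no constraint on the root namespace.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Illustrative table of media types tied to a single root vocabulary.
EXPECTED_ROOT_NS = {
    "application/xhtml+xml": "http://www.w3.org/1999/xhtml",
    "image/svg+xml": "http://www.w3.org/2000/svg",
}


def check_consistency(media_type: str, body: str) -> Optional[str]:
    """Return an error message if the root namespace contradicts the media type.

    The declared media type is never overridden: a mismatch is reported,
    not silently "fixed" by sniffing. Generic types such as application/xml
    are not in the table and so impose no constraint.
    """
    expected = EXPECTED_ROOT_NS.get(media_type)
    if expected is None:
        return None
    root = ET.fromstring(body)
    actual = root.tag[1:].split("}", 1)[0] if root.tag.startswith("{") else None
    if actual != expected:
        return f"{media_type} declared, but root element is in namespace {actual!r}"
    return None


svg = '<svg xmlns="http://www.w3.org/2000/svg"/>'
print(check_consistency("application/xhtml+xml", svg))
```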

> Cheers, Tim
>
Received on Friday, 1 February 2002 19:48:26 UTC