Re: 9 July 2003 draft of "Client handling of MIME headers" available from Roy T. Fielding on 2003-07-10 (www-tag@w3.org from July 2003)

From: Roy T. Fielding <fielding@apache.org>
Date: Thu, 10 Jul 2003 20:14:29 +0200
To: Tim Bray <tbray@textuality.com>
Cc: "Ian B. Jacobs" <ij@w3.org>, www-tag@w3.org
Message-Id: <5926BAE2-B302-11D7-BE2F-000393753936@apache.org>

>> generators and recipients of messages.  You'll have to make that process
>> a normative requirement.
>
> I believe that XML does in fact ensure that they are processed in the 
> same way.  Check out the coverage of character encodings in 
> http://www.w3.org/TR/REC-xml, in particular 4.3.3 (#charencoding) and 
> the #sec-guessing appendix.  It is possible, with some effort, to 
> "fool" an XML processor with a bogus encoding declaration, but (unless 
> I'm missing something) not for anything but ASCII characters.
>
> So at the moment, I can't visualize a security vulnerability that 
> would occur as a result of charset settings <emph>in the case where 
> data is served as XML and given to an XML processor</emph>.  Of 
> course, I wouldn't be that surprised if someone could dream up a 
> counter-example.  But I couldn't, and I tried.

For example, you are assuming that an intermediary is going to use a
full XML parser while processing documents to search for and remove
viral Word macros that happen to be in XML format.  I have no doubt
that if everyone follows the same rules then there won't be an issue.
However, parsing XML correctly is slow and subject to various DoS
concerns of its own, so intermediate filters do not follow all of the
XML rules.  They make shortcuts according to how they have been
configured, and they do so for good reasons.
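
To make the shortcut problem concrete, here is a minimal sketch, not taken
from any real filter.  It assumes a hypothetical intermediary that scans raw
bytes for an ASCII pattern, and a hypothetical payload (the <macro> element
and its content are made up) encoded as UTF-16.  The function names are
illustrative only.

```python
import xml.etree.ElementTree as ET

# Hypothetical payload: the same markup, once as UTF-8 and once as UTF-16.
# Python's "utf-16" codec prepends a byte-order mark to the encoded bytes.
MARKUP = "<doc><macro>payload</macro></doc>"
doc_utf8 = MARKUP.encode("utf-8")
doc_utf16 = ('<?xml version="1.0" encoding="UTF-16"?>' + MARKUP).encode("utf-16")


def naive_filter_flags(body: bytes, needle: bytes = b"<macro>") -> bool:
    """Shortcut filter: byte search that assumes an ASCII-compatible encoding."""
    return needle in body


def parser_sees_macro(body: bytes) -> bool:
    """Full XML processing: the parser detects the encoding on its own."""
    root = ET.fromstring(body)
    return root.find("macro") is not None


print(naive_filter_flags(doc_utf8))    # byte scan on the UTF-8 copy
print(naive_filter_flags(doc_utf16))   # byte scan on the UTF-16 copy
print(parser_sees_macro(doc_utf16))    # what the recipient's parser finds
```

In UTF-16 every ASCII character is followed (or preceded) by a zero byte, so
the byte scan does not match the UTF-16 copy even though a conforming XML
processor at the recipient finds the element.  The filter and the recipient
end up processing the same message differently.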

> I agree on authoritativeness, but not with the rest of the sentence.  
> If the XML processor's auto-detection of the character type disagrees 
> with the media type metadata, then that is an error condition and the 
> agent MUST report an error.  Since an XML processor's autodetection of 
> encoding is infinitely more likely to be correct than, for example, an 
> Apache server's guess based on file extension, local policies, and so 
> on, the best solution is to do as we say and *not* provide the charset 
> parameter <emph>for data served as XML</emph>, unless you are *really 
> sure* that it knows what the encoding is, and to

No, that is the point that I object to.  A server can never be *really sure*
about anything -- that is complete nonsense.  A server is configured to do
a certain job in a certain way.  If the configuration causes the type to
be mislabeled, then it is either the fault of the configuration or the
fault of the data provider for providing data that isn't allowed by that
server.  Placing constraints on how servers are configured based on the
theory that authors are more likely to be right is contradictory to the
right of server owners to protect themselves from evil content.  The
software cannot be required to make a judgement call.

It is not our job and doesn't belong in the finding.  If we want to
require that XML be processed in the way that is suggested, then the
right thing to do is forbid the use of a charset parameter in the
first place.

....Roy

Received on Thursday, 10 July 2003 14:15:11 UTC