Re: 9 July 2003 draft of "Client handling of MIME headers" available from Tim Bray on 2003-07-09 (www-tag@w3.org from July 2003)

From: Tim Bray <tbray@textuality.com>
Date: Wed, 09 Jul 2003 16:57:31 -0700
To: "Roy T. Fielding" <fielding@apache.org>
Cc: "Ian B. Jacobs" <ij@w3.org>, www-tag@w3.org
Message-ID: <3F0CABEB.5020708@textuality.com>

Roy T. Fielding wrote:

> Sorry for the late comments, but the following is incorrect:

I'll skip Roy's comments about the terminology and so on; I assume he's right.

>  Second, the charset parameter is
> usually supplied by the server if the security checks it places on
> the content are dependent upon the configured character encoding
> for that content.  This strategy exists because of security flaws in
> deployed browsers that allow auto-selection of character encoding
> to change the interpretation of certain fields from raw data to
> executable content.  So, contrary to this finding, such servers must
> provide a default charset parameter to work around security flaws
> and, in particular, the boneheaded way that browsers try to
> autoselect character encoding, which is not recommended by HTTP.

I'm really unconvinced.

In a substantial proportion of cases, when you're sending an XML message 
over the web, the recipient will either not be a browser at all or will 
be a reasonably modern one.  In particular, if the recipient is capable 
of parsing XML and doing anything sensible with it, it seems unlikely 
that it's going to be doing stupid autoselection, and if it is, it's 
already breaking the rules of the governing spec, namely XML 1.*, which 
is very specific about how autoselection is to be done.
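
To make concrete what the XML rules look like, here is a minimal sketch 
of the detection described in XML 1.0 (Appendix F for the byte-pattern 
checks, section 4.3.3 for the UTF-8 default).  The function name 
sniff_xml_encoding is made up for illustration, the EBCDIC case is left 
out, and it assumes no charset was supplied by the transport; a real 
XML processor does more than this.

```python
import re

# Byte-order marks, checked first. The UTF-32 marks must come before the
# UTF-16 ones, because FF FE 00 00 also starts with FF FE.
_BOMS = [
    (b"\x00\x00\xfe\xff", "utf-32-be"),
    (b"\xff\xfe\x00\x00", "utf-32-le"),
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xfe\xff", "utf-16-be"),
    (b"\xff\xfe", "utf-16-le"),
]

# With no byte-order mark, the way "<" and "?" are laid out in the first
# four bytes still reveals a UTF-16 or UTF-32 family.
_PATTERNS = [
    (b"\x00\x00\x00<", "utf-32-be"),
    (b"<\x00\x00\x00", "utf-32-le"),
    (b"\x00<\x00?", "utf-16-be"),
    (b"<\x00?\x00", "utf-16-le"),
]

# Encoding pseudo-attribute of an XML declaration in an ASCII-compatible
# encoding, matched on raw bytes so no decoding guess is needed.
_DECL = re.compile(
    rb"<\?xml[^>]*?encoding\s*=\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']"
)


def sniff_xml_encoding(data: bytes) -> str:
    """Return a lowercase encoding label for the start of an XML entity.

    Hypothetical helper, illustrative only: EBCDIC detection and
    external (e.g. HTTP charset) information are not handled.
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    for prefix, name in _PATTERNS:
        if data.startswith(prefix):
            return name
    # ASCII-compatible case: trust the declaration if there is one.
    match = _DECL.match(data)
    if match:
        return match.group(1).decode("ascii").lower()
    # No mark, no declaration: the entity is taken to be UTF-8.
    return "utf-8"
```

The point is that nothing here is a heuristic over the document's 
content: the answer comes from a byte-order mark, the fixed shape of 
"<?", or the document's own encoding declaration.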

In the case where it is a browser, it seems unlikely that the old HTML 
machinery with broken autoselection would be invoked on data served as 
*/xhtml+xml (or */*+xml for that matter); and of course the advice here 
only applies when data is explicitly served as XML.

So unless it can be demonstrated that there are broken agents out there 
that will take data *served as XML* and apply broken character-encoding 
heuristics, then I think the language is OK as stated; because for 
things served as XML, trying to assert the character encoding is 
actively damaging unless you know for sure what it is, in which case 
it's merely redundant, because anything that has a real XML processor in 
it doesn't need to be told.

-- 
Cheers, Tim Bray
         (ongoing fragmented essay: http://www.tbray.org/ongoing/)

Received on Thursday, 10 July 2003 00:53:58 UTC