Re: 9 July 2003 draft of "Client handling of MIME headers" available

From: Roy T. Fielding <fielding@apache.org>
Date: Thu, 10 Jul 2003 15:44:04 +0200
Cc: "Ian B. Jacobs" <ij@w3.org>, www-tag@w3.org
To: Tim Bray <tbray@textuality.com>
Message-Id: <92521501-B2DC-11D7-BE2F-000393753936@apache.org>

>>  Second, the charset parameter is
>> usually supplied by the server if the security checks it places on
>> the content are dependent upon the configured character encoding
>> for that content.  This strategy exists because of security flaws in
>> deployed browsers that allow auto-selection of character encoding
>> to change the interpretation of certain fields from raw data to
>> executable content.  So, contrary to this finding, such servers must
>> provide a default charset parameter to work around security flaws
>> and, in particular, the boneheaded way that browsers try to
>> autoselect character encoding, which is not recommended by HTTP.
> I'm really unconvinced.
> In a substantial proportion of cases, when you're sending an XML 
> message over the web, the recipient will either not be a browser at 
> all or will be a reasonably modern one.  In particular, if the 
> recipient is capable of parsing XML and doing anything sensible with 
> it, it seems unlikely that it's going to be doing stupid 
> autoselection, and if it is, it's already breaking the rules of the 
> governing spec, namely XML 1.*, which is very specific about how 
> autoselection is to be done.

I have more problems with "reasonably modern" browsers than the ones
that simply follow the standards.  XML doesn't define how applications
are expected to process the content within elements and attributes.
XML does not prevent someone from implementing client-side scripting
within XML elements.  Therefore, the only way that XML can enable
auto-selection of character encodings without opening a security hole
is by requiring that they always be processed in the same way by both
generators and recipients of messages.  You'll have to make that process
a normative requirement.

The media type charset parameter tells the recipient that the content
is to be processed in one and only one way, at least until the user
changes the context in which it is being processed.  A server should
apply the charset parameter if it has made assumptions about the
character encoding (for the sake of efficiency).  If there is a conflict,
the user must be alerted so that the conflict can be resolved correctly,
just as is the case for the rest of the media type.  In short, the
exceptions listed in that section are neither needed nor desirable:
the media type is authoritative and that's all there is to it.  If you
don't want to allow servers the freedom to be efficient, then do not
allow the charset parameter on application/*xml.  That way at least
everyone is following the same rules and we get interoperability,
albeit a bit slower than for data formats that don't need to be
byte-fondled during delivery.
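
As a rough illustration of the rule argued for above (not code from this
thread), a recipient might treat the charset parameter as authoritative and
report a conflict rather than silently picking a winner.  The names
effective_charset, declared_xml_encoding and CharsetConflict are
hypothetical, and falling back to the XML declaration when no charset
parameter is present is an assumption of this sketch, not something the
message specifies.

```python
import re
from email.message import Message
from typing import Optional


class CharsetConflict(Exception):
    """Raised so the caller can alert the user instead of guessing."""


def declared_xml_encoding(body: bytes) -> Optional[str]:
    # Reads the encoding pseudo-attribute from an XML declaration.
    # Only handles ASCII-compatible encodings; UTF-16 input is out of scope here.
    m = re.match(
        rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']', body
    )
    return m.group(1).decode("ascii").lower() if m else None


def media_type_charset(content_type: str) -> Optional[str]:
    # Let the standard library parse the Content-Type parameters.
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()  # lowercased, or None if absent


def effective_charset(content_type: str, body: bytes) -> Optional[str]:
    header = media_type_charset(content_type)
    inline = declared_xml_encoding(body)
    if header is None:
        # Assumption: with no charset parameter, use the in-document label.
        return inline
    if inline is not None and inline != header:
        # The media type is authoritative; a mismatch is surfaced, not "fixed".
        raise CharsetConflict(f"header says {header!r}, document says {inline!r}")
    return header
```

A caller would catch CharsetConflict and surface it to the user; no
sniffing of the body beyond the declaration takes place.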

Received on Thursday, 10 July 2003 09:57:36 UTC
