- From: Roy T. Fielding <fielding@apache.org>
- Date: Thu, 10 Jul 2003 15:44:04 +0200
- To: Tim Bray <tbray@textuality.com>
- Cc: "Ian B. Jacobs" <ij@w3.org>, www-tag@w3.org
>> Second, the charset parameter is
>> usually supplied by the server if the security checks it places on
>> the content are dependent upon the configured character encoding
>> for that content. This strategy exists because of security flaws in
>> deployed browsers that allow auto-selection of character encoding
>> to change the interpretation of certain fields from raw data to
>> executable content. So, contrary to this finding, such servers must
>> provide a default charset parameter to work around security flaws
>> and, in particular, the boneheaded way that browsers try to
>> autoselect character encoding, which is not recommended by HTTP.
>
> I'm really unconvinced.
>
> In a substantial proportion of cases, when you're sending an XML
> message over the web, the recipient will either not be a browser at
> all or will be a reasonably modern one. In particular, if the
> recipient is capable of parsing XML and doing anything sensible with
> it, it seems unlikely that it's going to be doing stupid
> autoselection, and if it is, it's already breaking the rules of the
> governing spec, namely XML 1.*, which is very specific about how
> autoselection is to be done.

I have more problems with "reasonably modern" browsers than with the ones that simply follow the standards. XML doesn't define how applications are expected to process the content within elements and attributes. XML does not prevent someone from implementing client-side scripting within XML elements. Therefore, the only way that XML can enable auto-selection of character encodings without opening a security hole is by requiring that the content always be processed in the same way by both generators and recipients of messages. You'll have to make that process a normative requirement. The media type charset parameter tells the recipient that the content is to be processed in one and only one way, at least until user intervention changes the context in which it is being processed.
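To make the recipient-side rule concrete, here is a minimal Python sketch. It assumes the position argued in this message: a charset parameter on the media type, when present, fixes the encoding. An XML declaration that names a different encoding is treated as a conflict to surface to the user, not something to resolve silently. The helper names (`effective_encoding`, `CharsetConflict`, and the others) are hypothetical and illustrative only. This is not how any particular XML parser or browser behaves. The fallback when no charset parameter is present is deliberately simplified: it uses the declaration, or else UTF-8, and skips XML 1.0's byte-order-mark autodetection.

```python
import codecs
import re
from email.message import Message

# Hypothetical pattern: extracts the encoding label from an XML declaration
# such as <?xml version="1.0" encoding="ISO-8859-1"?> at the start of the body.
_XML_DECL_ENCODING = re.compile(
    rb'^<\?xml[^>]*?\sencoding\s*=\s*(["\'])([A-Za-z][A-Za-z0-9._-]*)\1'
)


class CharsetConflict(Exception):
    """Raised so the caller can alert the user instead of guessing."""


def charset_from_content_type(content_type):
    """Return the lowercased charset parameter, or None if it is absent."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def xml_declared_encoding(body):
    """Return the lowercased encoding named in the XML declaration, or None."""
    m = _XML_DECL_ENCODING.match(body)
    return m.group(2).decode("ascii").lower() if m else None


def effective_encoding(content_type, body):
    """Choose one encoding for body, treating the media type as authoritative."""
    from_media_type = charset_from_content_type(content_type)
    from_document = xml_declared_encoding(body)

    if from_media_type is None:
        # No charset parameter: fall back to the in-document declaration.
        # Simplification: default to UTF-8 and skip BOM-based autodetection.
        return from_document or "utf-8"

    if from_document is not None:
        # Compare canonical codec names so aliases such as "latin-1" and
        # "ISO-8859-1" are not reported as conflicts. Unknown labels make
        # codecs.lookup raise LookupError, which is left to the caller.
        if codecs.lookup(from_document).name != codecs.lookup(from_media_type).name:
            raise CharsetConflict(
                f"Content-Type says {from_media_type!r}, "
                f"document says {from_document!r}"
            )
    # The media type wins: the content is processed one way only.
    return from_media_type
```

With this design, a server that labels a body `application/xml; charset=utf-8` gets exactly that interpretation. If the document claims something else, the recipient raises instead of auto-selecting, which leaves room for the user to resolve the mismatch.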
A server should apply the charset parameter if it has made assumptions about the character encoding (for the sake of efficiency). If there is a conflict, the user must be alerted so that the conflict can be resolved correctly, just as is the case for the rest of the media type.

In short, the exceptions listed in that section are neither needed nor desirable: the media type is authoritative and that's all there is to it. If you don't want to allow servers the freedom to be efficient, then do not allow the charset parameter on application/*xml. That way at least everyone is following the same rules and we get interoperability, albeit a bit slower than for data formats that don't need to be byte-fondled during delivery.

....Roy
Received on Thursday, 10 July 2003 09:57:36 UTC