- From: Tim Bray <tbray@textuality.com>
- Date: Thu, 10 Jul 2003 09:27:44 -0700
- To: "Roy T. Fielding" <fielding@apache.org>
- Cc: "Ian B. Jacobs" <ij@w3.org>, www-tag@w3.org
The subject under debate (I think; maybe I'm wrong, and that's why we're having trouble syncing up) is what we ought to say about the use of the charset parameter when accompanying data served as XML, by which we mean */xml or */*+xml.

Roy T. Fielding wrote:

> I have more problems with "reasonably modern" browsers than the ones
> that simply follow the standards. XML doesn't define how applications
> are expected to process the content within elements and attributes.
> XML does not prevent someone from implementing client-side scripting
> within XML elements. Therefore, the only way that XML can enable
> auto-selection of character encodings without opening a security hole
> is by requiring that they always be processed in the same way by both
> generators and recipients of messages. You'll have to make that process
> a normative requirement.

I believe that XML does in fact ensure that they are processed in the same way. Check out the coverage of character encodings in http://www.w3.org/TR/REC-xml, in particular 4.3.3 (#charencoding) and the #sec-guessing appendix. It is possible, with some effort, to "fool" an XML processor with a bogus encoding declaration, but (unless I'm missing something) not for anything but ASCII characters.

So at the moment, I can't visualize a security vulnerability that would occur as a result of charset settings <emph>in the case where data is served as XML and given to an XML processor</emph>. Of course, I wouldn't be that surprised if someone could dream up a counter-example. But I couldn't, and I tried.

> In short, the
> exceptions listed in that section are neither needed nor desirable:
> the media type is authoritative and that's all there is to it.

I agree on authoritativeness, but not with the rest of the sentence. If the XML processor's auto-detection of the character encoding disagrees with the media type metadata, then that is an error condition and the agent MUST report an error.
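As a rough illustration of the error condition just described, here is a minimal Python sketch. It is not taken from the XML Recommendation or from any real processor: the names `sniff_xml_encoding` and `check_charset` are hypothetical, and the sniffing covers only byte-order marks and ASCII-compatible encoding declarations, which is a small subset of what the #sec-guessing appendix describes.

```python
import codecs
import re
from typing import Optional

# Byte-order marks, with the 4-byte UTF-32 marks checked before the
# 2-byte UTF-16 marks so that FF FE 00 00 is not mistaken for UTF-16.
_BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16"),
    (codecs.BOM_UTF16_LE, "utf-16"),
]

# Simplified match for the encoding pseudo-attribute of an XML declaration.
# Assumption: the declaration itself is readable as ASCII bytes.
_DECL = re.compile(
    rb'^<\?xml\s[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def sniff_xml_encoding(data: bytes) -> str:
    """Guess the encoding from a BOM or the XML declaration (hypothetical helper)."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    match = _DECL.match(data)
    if match:
        return match.group(1).decode("ascii").lower()
    # With no BOM and no declaration, XML 1.0 entities default to UTF-8.
    return "utf-8"


def check_charset(data: bytes, charset: Optional[str]) -> str:
    """Return the detected encoding; raise if a charset parameter disagrees."""
    detected = sniff_xml_encoding(data)
    if charset is not None:
        # codecs.lookup() normalizes aliases (e.g. "latin-1" and "ISO-8859-1");
        # it raises LookupError for names Python does not know.
        if codecs.lookup(charset).name != codecs.lookup(detected).name:
            raise ValueError(
                f"charset parameter {charset!r} disagrees with detected {detected!r}"
            )
    return detected
```

With a document that declares ISO-8859-1, `check_charset(doc, None)` simply returns the detected encoding, while `check_charset(doc, "utf-8")` raises, which is the "MUST report an error" case. Omitting the charset parameter means the mismatch cannot arise in the first place.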
An XML processor's autodetection of encoding is infinitely more likely to be correct than, for example, an Apache server's guess based on file extension, local policies, and so on. So the best solution is to do as we say and *not* provide the charset parameter <emph>for data served as XML</emph>, unless you are *really sure* that the server knows what the encoding is, and to recognize that in this case the information is purely redundant. It may be of interest to intermediate entities such as caches and proxies (although it's not obvious to me how), but it can never be of positive utility to the receiving agent, if the receiving agent is an XML processor. So why can't we say that?

> If you
> don't want to allow servers the freedom to be efficient, then do not
> allow the charset parameter on application/*xml.

I'd go for that, and extend the ban to application/*+xml, but I see no reason why this would decrease efficiency.

-- 
Cheers, Tim Bray
(ongoing fragmented essay: http://www.tbray.org/ongoing/)
Received on Thursday, 10 July 2003 12:40:14 UTC