- From: Chris Lilley <chris@w3.org>
- Date: Wed, 9 Apr 2003 04:46:55 +0200
- To: "Rick Jelliffe" <ricko@topologi.com>
- CC: www-tag@w3.org
On Tuesday, April 8, 2003, 2:18:13 PM, Rick wrote:

RJ> Chris Lilley <chris@w3.org>
>> - request a modification the +XML media type registration to state that
>> encoding information MUST NOT be supplied by the server unless it is
>> known to be correct and agrees with any internal encoding information
>> in the content.

RJ> IIRC the extra thing to keep in mind is Japanese dumb transcoding proxies,
RJ> which may transcode text/* without rewriting the headers.

Yes, and I am aware that this use case is what drove the 'charset
parameter everywhere' design. The solution to this is to use
application/xml, not text/xml. Transcoding proxies are unlikely to be
altering image, video, application and so forth.

RJ> AFAIK
RJ> they are the only thing that may make it desirable to favour any explicit
RJ> MIME header charset over the XML encoding PI.

Yes, agreed. Which is not in itself a good reason to spread that
behaviour into the previously safe world of image, video, application
and so forth.

RJ> Maybe some Japanese expert can comment on whether they are still a
RJ> factor to be considered.

I gather that they are. For text. Which has its own problems already.
Any text/* type *has* to make sense to the user when displayed as if
it were text/plain; charset=us-ascii. That is a poor fit for XML.

RJ> If it is true that the Japanese dumb transcoding proxies are the only
RJ> thing that actually may supercede the XML header, one approach
RJ> might be to make a special case to reflect it, rather than a general
RJ> framework. For example, to say "The charset parameter should
RJ> not be used for */xml unless it a regional character set from
RJ> locale with more than one common ASCII-derived encoding
RJ> and which have transcoding proxies widely deployed.
RJ> NOTE: at the current time, this means Japanese encodings
RJ> in particular shift-JIS and EUCJ."

Eek. Let me think about that suggestion a little more. How is it
different from saying the same thing in the XML encoding declaration?
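To make the proxy failure mode concrete, here is a minimal sketch in
Python. It is an illustration under stated assumptions, not a model of
any real proxy: the document, the function names and the
UTF-8/ISO-8859-1 pair are all hypothetical, and the pair stands in for
the Shift-JIS/EUC-JP case only because it is easy to run with a stock
parser. The "proxy" re-encodes the bytes without touching the encoding
declaration. A consumer that only has the bytes then rejects the
document, while one that is handed the correct charset out of band can
still read it.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: declares UTF-8 internally, contains non-ASCII text.
ORIGINAL = '<?xml version="1.0" encoding="UTF-8"?><p>caf\u00e9</p>'.encode("utf-8")


def dumb_transcode(body: bytes, src: str, dst: str) -> bytes:
    """Re-encode the bytes but leave the XML encoding declaration as it was."""
    return body.decode(src).encode(dst)


def is_well_formed(body: bytes) -> bool:
    """Parse using only in-document encoding information (no MIME charset)."""
    try:
        ET.fromstring(body)
        return True
    except ET.ParseError:
        return False


def parse_with_charset(body: bytes, charset: str) -> ET.Element:
    """Parse with an external encoding that overrides the declaration,
    standing in for a charset parameter that matches the actual bytes."""
    parser = ET.XMLParser(encoding=charset)
    parser.feed(body)
    return parser.close()


# What the hypothetical proxy puts on the wire: ISO-8859-1 bytes that
# still claim encoding="UTF-8" in the declaration.
transcoded = dumb_transcode(ORIGINAL, "utf-8", "iso-8859-1")

if __name__ == "__main__":
    print("original well formed:  ", is_well_formed(ORIGINAL))
    print("transcoded well formed:", is_well_formed(transcoded))
    print("with matching charset: ", parse_with_charset(transcoded, "iso-8859-1").text)
```

Once the transcoded bytes are saved to a file, the out-of-band charset
is gone and only the stale declaration remains, which is the case the
second check represents.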
I really, really want to avoid the situation where an XML file is well
formed over the wire but ceases to be well formed when the server or
other backend, filesystem-based processor manipulates it because the
charset parameter is not present and the encoding declaration is
wrong. Transcoding proxies do exactly that - make XML documents not
well formed. The solution is to stop the dumb proxies breaking
documents and if you can't stop them, then just don't use text/xml.

RJ> (I don't know whether there are EBCDIC transcoding proxies
RJ> in use too. I suspect not: the trend seems to be minor character
RJ> sets to disappear in favour of Unicode at the source, and
RJ> for recipients to be able to parse common character sets.)

I think you are right.

Thanks for the comments - appreciated.

--
 Chris mailto:chris@w3.org
Received on Tuesday, 8 April 2003 22:47:08 UTC