
Re: internet media types and encoding

From: Chris Lilley <chris@w3.org>
Date: Wed, 9 Apr 2003 04:46:55 +0200
Message-ID: <2433378265.20030409044655@w3.org>
To: "Rick Jelliffe" <ricko@topologi.com>
CC: www-tag@w3.org

On Tuesday, April 8, 2003, 2:18:13 PM, Rick wrote:


RJ> Chris Lilley <chris@w3.org>

>> - request a modification to the +XML media type registration to state that
>> encoding information MUST NOT be supplied by the server unless it is
>> known to be correct and agrees with any internal encoding information
>> in the content.

RJ> IIRC the extra thing to keep in mind is Japanese dumb transcoding proxies,
RJ> which may transcode text/* without rewriting the headers.

Yes, and I am aware that this use case is what drove the 'charset
parameter everywhere' design.

The solution to this is to use application/xml, not text/xml.

Transcoding proxies are unlikely to alter image, video, application
and so forth.
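
As a rough illustration of the rule proposed at the top of this message, here is a minimal Python sketch of a server choosing its Content-Type header. The function names are hypothetical, the regular expression only handles ASCII-compatible encodings, and treating a missing declaration as "nothing to disagree with" is an assumption of the sketch rather than anything the registration says.

```python
import re
from typing import Optional

# Hypothetical helper: pull the encoding name out of an XML declaration.
# Only works when the declaration itself is readable as ASCII bytes.
_DECL = re.compile(rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']')


def declared_encoding(body: bytes) -> Optional[str]:
    """Return the lower-cased encoding named in the XML declaration, if any."""
    match = _DECL.match(body)
    return match.group(1).decode("ascii").lower() if match else None


def content_type(body: bytes, known_charset: Optional[str] = None) -> str:
    """Send a charset parameter only when it is known and does not
    contradict the document's own encoding declaration."""
    base = "application/xml"  # not text/xml, per the advice above
    if known_charset is None:
        return base  # the server does not know, so it says nothing
    declared = declared_encoding(body)
    if declared is not None and declared != known_charset.lower():
        return base  # conflict: stay silent rather than mislead the client
    return f"{base}; charset={known_charset}"
```

With a document declaring Shift_JIS, `content_type(doc, "shift_jis")` adds the parameter, while `content_type(doc, "euc-jp")` and `content_type(doc)` both fall back to a bare `application/xml`.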

RJ>  AFAIK
RJ> they are the only thing that may make it desirable to favour any explicit
RJ> MIME header charset over the XML encoding PI.

Yes, agreed. Which is not in itself a good reason to spread that
behaviour into the previously safe world of image, video, application
and so forth.

RJ> Maybe some Japanese expert can comment on whether they are still a
RJ> factor to be considered.

I gather that they are. For text. Which has its own problems already.
Any text/* type *has* to make sense to the user when displayed as if it
were text/plain; charset=us-ascii.

That is a poor fit for XML.

RJ> If it is true that the Japanese dumb transcoding proxies are the only
RJ> thing that actually may supersede the XML header, one approach
RJ> might be to make a special case to reflect it, rather than a general
RJ> framework.  For example, to say "The charset parameter should 
RJ> not be used for */xml unless it is a regional character set from a
RJ> locale with more than one common ASCII-derived encoding
RJ> and which has transcoding proxies widely deployed. 
RJ> NOTE: at the current time, this means Japanese encodings
RJ> in particular shift-JIS and EUCJ."

Eek. Let me think about that suggestion a little more. How is it
different from saying the same thing in the XML encoding declaration?

I really, really want to avoid the situation where an XML file is well
formed over the wire but ceases to be well formed when the server or
other backend, filesystem-based processor manipulates it because the
charset parameter is not present and the encoding declaration is
wrong.
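
A small Python sketch of that failure mode, using only the standard library. The sample document is hypothetical: it is stored as ISO-8859-1 but its declaration wrongly claims UTF-8. Passing `encoding=` to `XMLParser` stands in for a client that honours the HTTP charset parameter; it is an approximation, not a model of any particular user agent.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: Latin-1 bytes whose XML declaration wrongly says UTF-8.
BODY = '<?xml version="1.0" encoding="UTF-8"?><p>caf\u00e9</p>'.encode("iso-8859-1")


def parse_over_wire(body: bytes, charset: str) -> str:
    """Parse as an HTTP client might: the charset parameter overrides
    whatever the encoding declaration claims."""
    parser = ET.XMLParser(encoding=charset)
    return ET.fromstring(body, parser=parser).text


def well_formed_on_disk(body: bytes) -> bool:
    """Parse as a filesystem tool would: no headers, so only the
    encoding declaration is available."""
    try:
        ET.fromstring(body)
        return True
    except ET.ParseError:
        return False
```

Here `parse_over_wire(BODY, "iso-8859-1")` succeeds because the header papers over the wrong declaration, while `well_formed_on_disk(BODY)` reports the same bytes as not well formed once the header is gone.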

Transcoding proxies do exactly that - make XML documents not well
formed. The solution is to stop the dumb proxies breaking documents
and, if you can't stop them, then just don't use text/xml.
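
To make the breakage concrete, here is a hedged Python simulation of a proxy that re-encodes a Shift_JIS document as EUC-JP without touching the encoding declaration. The function names and sample text are made up for illustration, and real proxies may behave differently. The check is deliberately simple: it asks whether decoding the bytes with the declared encoding still yields the original text.

```python
import re

_DECL = re.compile(rb'encoding="([^"]+)"')

# Hypothetical sample document, declared and initially encoded as Shift_JIS.
ORIGINAL_TEXT = (
    '<?xml version="1.0" encoding="Shift_JIS"?>'
    "<p>\u65e5\u672c\u8a9e</p>"
)


def dumb_transcode(body: bytes, src: str, dst: str) -> bytes:
    """Re-encode the bytes but leave the XML declaration as it was,
    the way a naive transcoding proxy might."""
    return body.decode(src).encode(dst)


def readable_as_declared(body: bytes, expected_text: str) -> bool:
    """True if decoding with the declared encoding recovers the text."""
    declared = _DECL.search(body).group(1).decode("ascii")
    try:
        return body.decode(declared) == expected_text
    except UnicodeDecodeError:
        return False


original = ORIGINAL_TEXT.encode("shift_jis")
proxied = dumb_transcode(original, "shift_jis", "euc-jp")
```

`readable_as_declared(original, ORIGINAL_TEXT)` holds for the untouched bytes but not for `proxied`: the declaration still says Shift_JIS while the payload is now EUC-JP, so only an external charset parameter could rescue it.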


RJ> (I don't know whether there are EBCDIC transcoding proxies 
RJ> in use too. I suspect not: the trend seems to be for minor character
RJ> sets to disappear in favour of Unicode at the source, and
RJ> for recipients to be able to parse common character sets.)

I think you are right. Thanks for the comments - appreciated.

-- 
 Chris                            mailto:chris@w3.org
Received on Tuesday, 8 April 2003 22:47:08 GMT
