Re: UTF-16 and MIME text/*

"McDonald, Ira" wrote:
> 
> Hi Chris,
> 
> I hope you meant to say, XML which is encoded in UTF-16 should
> not be served as "text/xml". 

No, unfortunately, that was not the logical conclusion that I was able
to draw.

>  XML which is encoded in UTF-8 is
> perfectly safe to serve as "text/xml" and SHOULD be.

XML which is encoded in UTF-8 and XML which is encoded in UTF-16 can
both omit the encoding declaration. The server will have a hard time
telling them apart. In addition, since both UTF-8 and UTF-16 are
required to be supported, software might well convert between these two
encodings based on, for example, whichever gives the smaller file size.
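
To make that concrete, here is a rough sketch of about the only check
a server could apply to an XML entity that carries no encoding
declaration: look for a byte order mark. The function name
sniff_xml_encoding is my own invention for illustration, and the sketch
deliberately ignores every encoding other than UTF-8 and UTF-16.

```python
import codecs


def sniff_xml_encoding(data: bytes) -> str:
    """Guess UTF-8 vs UTF-16 for XML that has no encoding declaration.

    Hypothetical helper: it only inspects the byte order mark and
    ignores all other encodings.
    """
    if data.startswith(codecs.BOM_UTF16_BE) or data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16"
    # A UTF-8 BOM is optional; with no BOM and no declaration,
    # XML 1.0 says the entity is UTF-8.
    return "utf-8"


# The same document, with no encoding declaration, in both encodings.
doc = "<doc>caf\u00e9</doc>"
as_utf8 = doc.encode("utf-8")
as_utf16 = doc.encode("utf-16")  # Python's "utf-16" codec prepends a BOM

print(sniff_xml_encoding(as_utf8), len(as_utf8))
print(sniff_xml_encoding(as_utf16), len(as_utf16))
```

Nothing in the markup itself differs between the two; only the bytes
do, so a server that labels by file extension or configuration alone
cannot tell which one it is serving.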

> 
> Oddly, RFC 3023 (XML Media Types) actually discusses using
> "text/xml" with UTF-16 encoding ONLY over HTTP transport
> (how this could be safe for the receiver AFTER the resource
> is moved by HTTP transport is not explained in RFC 3023).

Yes, exactly. But the point is not the handling when text/xml is
recognised. The point is that text/* has a lot of (IMHO) unfortunate
rules which apply to the entire text/* hierarchy, and one of those is
the requirement to be able to blindly assume things about end of line
markers. So proxies, middleware, mail-to-web gateways and so forth are
unfortunately allowed to wreak havoc on XML files based on some rather
dated assumptions.
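
As an illustration of the kind of damage I mean, here is a minimal
sketch. The function naive_eol_rewrite is a hypothetical stand-in for
an intermediary that canonicalises line endings byte by byte; I am not
claiming any particular product does exactly this.

```python
def naive_eol_rewrite(body: bytes) -> bytes:
    """Hypothetical gateway step: turn every LF byte into CR LF,
    assuming the body is ASCII-compatible text."""
    return body.replace(b"\n", b"\r\n")


doc = "<a>\n<b/>\n</a>"
expected = doc.replace("\n", "\r\n")

# UTF-8: LF is the single byte 0x0A, so the rewrite is harmless.
utf8_after = naive_eol_rewrite(doc.encode("utf-8"))
print(utf8_after.decode("utf-8") == expected)

# UTF-16 (little-endian here): LF is the byte pair 0A 00. Inserting a
# lone 0x0D byte shifts every following 16-bit code unit by one byte.
utf16_after = naive_eol_rewrite(doc.encode("utf-16-le"))
print(utf16_after.decode("utf-16-le", errors="replace") == expected)
```

The UTF-8 copy survives with its line endings changed; the UTF-16 copy
no longer decodes to the same document.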

Thus, the safe way to ship XML and ensure end-to-end integrity is to use
a non-text type such as application/xml or some more specific type such
as image/svg+xml, application/xhtml+xml, and so forth.

-- 
Chris

Received on Friday, 8 June 2001 18:04:40 UTC