
Re: UTF-16 and MIME text/*

From: Chris Lilley <chris@w3.org>
Date: Sat, 09 Jun 2001 00:03:48 +0200
Message-ID: <3B214BC4.4A283E31@w3.org>
To: "McDonald, Ira" <imcdonald@sharplabs.com>
CC: John Cowan <cowan@mercury.ccil.org>, Bjoern Hoehrmann <derhoermi@gmx.net>, www-international@w3.org, phoffman@imc.org

"McDonald, Ira" wrote:
> Hi Chris,
> I hope you meant to say, XML which is encoded in UTF-16 should
> not be served as "text/xml". 

No, unfortunately, that was not the logical conclusion that I was able
to draw.

>  XML which is encoded in UTF-8 is
> perfectly safe to serve as "text/xml" and SHOULD be.

XML which is encoded in UTF-8 and XML which is encoded in UTF-16 can
both omit the encoding declaration. The server will have a hard time
telling them apart. In addition, since both UTF-8 and UTF-16 are
required to be supported, software might well convert between these two
encodings based on, for example, whichever gives the smaller file size.
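
As a rough illustration of the point above: with no encoding declaration and no charset parameter, a server or receiver can only tell the two encodings apart by inspecting the first bytes of the entity. The sketch below is a minimal guess at such a check, loosely following the autodetection idea in the (non-normative) appendix of XML 1.0. The function name sniff_xml_family is hypothetical, and it deliberately ignores every encoding other than UTF-8 and UTF-16.

```python
import codecs


def sniff_xml_family(data: bytes) -> str:
    """Guess UTF-8 vs UTF-16 from the leading bytes of an XML entity.

    Hypothetical helper for illustration only; it does not handle
    UTF-32, EBCDIC, or any encoding named in an encoding declaration.
    """
    # A byte order mark, when present, settles the question.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    # No BOM: see how the characters "<?" are laid out in memory.
    if data.startswith(b"\x00<\x00?"):
        return "utf-16-be"
    if data.startswith(b"<\x00?\x00"):
        return "utf-16-le"
    # Fall back to the XML default.
    return "utf-8"


doc = '<?xml version="1.0"?><a/>'
print(sniff_xml_family(doc.encode("utf-8")))
print(sniff_xml_family(codecs.BOM_UTF16_BE + doc.encode("utf-16-be")))
```

A server that maps file extensions to media types typically does none of this, which is why the charset it advertises (or omits) may not match the bytes it sends.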

> Oddly, RFC 3023 (XML Media Types) actually discusses using
> "text/xml" with UTF-16 encoding ONLY over HTTP transport
> (how this could be safe for the receiver AFTER the resource
> is moved by HTTP transport is not explained in RFC 3023).

Yes, exactly. But the point is not the handling when text/xml is
recognised. The point is that text/* has a lot of (IMHO) unfortunate
rules which apply to the entire text/* hierarchy, and one of those is
the requirement to be able to blindly assume things about end of line
markers. So proxies, middleware, mail-to-web gateways and so forth are
unfortunately allowed to wreak havoc on XML files based on some rather
dated assumptions.
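
To make the end-of-line hazard concrete, here is a small sketch of what a byte-level line-ending rewrite does to the two encodings. The function naive_crlf is a hypothetical stand-in for an intermediary that canonicalises text/* bodies to CRLF without knowing the charset; it is not taken from any real proxy or gateway.

```python
def naive_crlf(data: bytes) -> bytes:
    """Hypothetical text/* intermediary: turn every LF byte into CR LF.

    It works on raw bytes and knows nothing about the character encoding.
    """
    return data.replace(b"\n", b"\r\n")


doc = "<a>\n<b/>\n</a>"

# UTF-8: the line endings change, but the markup survives.
utf8_out = naive_crlf(doc.encode("utf-8")).decode("utf-8")
print(repr(utf8_out))

# UTF-16: LF is the byte pair 00 0A, so inserting a single 0D byte
# shifts every following byte by one and misaligns the code units.
utf16_out = naive_crlf(doc.encode("utf-16-be")).decode("utf-16-be", errors="replace")
print(repr(utf16_out))
print("<b/>" in utf8_out, "<b/>" in utf16_out)
```

In the UTF-8 case the document is altered but still parses; in the UTF-16 case the element between the two line breaks is no longer recognisable as markup at all.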

Thus, the safe way to ship XML and ensure end-to-end integrity is to use
a non-text type such as application/xml or some more specific type such
as image/svg+xml, application/xhtml+xml, and so forth.

Received on Friday, 8 June 2001 18:04:40 UTC
