- From: Chris Newman <Chris.Newman@Sun.COM>
- Date: Fri, 06 Dec 2002 13:13:41 -0800
- To: Marcin Hanclik <mhanclik@poczta.onet.pl>
- Cc: ietf-charsets@iana.org
begin quotation by Marcin Hanclik on 2002/11/25 21:09 +0100:
> Your explanation means that you cannot send UTF-16 encoding, because it
> cannot preserve CRLF.
> You could not send any unicode characters (apart from UTF-8) in MIME
> then!!!

As Ned said, you can't send UTF-16 in the "text" top-level media type in MIME (with a notable exception for the HTTP variant of MIME), but you could use it in an "application/text" media type in SMTP and MIME.

On the flip side, why would you want to? UTF-16 is a terrible encoding for interoperability. There are 3 published non-interoperable variants of UTF-16 (big-endian, little-endian, BOM/switch-endian) and only one of the variants can be auto-detected with any chance of success (and none of them can be auto-detected as well as UTF-8). It's not a fixed-width encoding, so you don't get the fixed-width benefits that UCS-4 would provide (unless you ignore a slew of plane-1 characters) and it doesn't have any of the useful characteristics of UTF-8 (nearly complete compatibility with code written to operate on 8-bit character strings).

So this raises the question: why would any sensible protocol designer ever want to transport UTF-16 over the wire? There may be a few rare corner cases where it makes sense, but in general UTF-8 is superior in almost all instances.

I suspect the only reason we see UTF-16 on the wire is because some programmers are too lazy to convert from an internal variant of UTF-16 to interoperable UTF-8 on the wire, and haven't thought through the bad consequences of their laziness.

See RFC 2277 -- the IETF has a clear policy recommending UTF-8 with good reason.

- Chris
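
The points above can be checked with a short sketch. It assumes Python and its standard codecs, which the message itself does not mention; the sample string and the variable names are purely illustrative.

```python
import codecs

# Illustrative sample: one ASCII letter followed by a CRLF line break.
text = "A\r\n"

utf8 = text.encode("utf-8")
utf16_be = text.encode("utf-16-be")   # big-endian, no BOM
utf16_le = text.encode("utf-16-le")   # little-endian, no BOM
utf16_bom = text.encode("utf-16")     # BOM followed by native byte order

# UTF-8 leaves ASCII (including CRLF) byte-for-byte intact.
utf8_is_ascii_compatible = utf8 == b"A\r\n"

# In every UTF-16 variant the CR and LF octets are separated by a zero
# byte, so the octet pair 0D 0A never appears in the encoded data.
crlf_octets_present = {
    "utf-8": b"\r\n" in utf8,
    "utf-16-be": b"\r\n" in utf16_be,
    "utf-16-le": b"\r\n" in utf16_le,
    "utf-16 (BOM)": b"\r\n" in utf16_bom,
}

# The BOM-less variants cannot be told apart from the bytes alone:
# big-endian "A" read as little-endian silently becomes U+4100.
misread = "A".encode("utf-16-be").decode("utf-16-le")

# UTF-16 is not fixed-width: a plane-1 character needs a surrogate pair.
bmp_len = len("A".encode("utf-16-be"))               # one 16-bit unit
plane1_len = len("\U00010400".encode("utf-16-be"))   # two 16-bit units

for name, present in crlf_octets_present.items():
    print(f"{name}: CRLF octets present = {present}")
print(f"BE bytes read as LE: U+{ord(misread):04X}")
print(f"bytes per character: BMP={bmp_len}, plane 1={plane1_len}")
```

Only the UTF-8 bytes contain the CRLF octet pair that the "text" media type depends on, which is the restriction described at the top of the reply.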
Received on Friday, 6 December 2002 16:18:46 UTC