Re: UTF-16 and MIME text/* from Bjoern Hoehrmann on 2001-08-12 (www-international@w3.org from July to September 2001)

From: Bjoern Hoehrmann <derhoermi@gmx.net>
Date: Sun, 12 Aug 2001 03:13:51 +0200
To: John Cowan <cowan@mercury.ccil.org>
Cc: John Cowan <cowan@mercury.ccil.org>, www-international@w3.org, phoffman@imc.org
Message-ID: <ovkbnts2jhmqtne3st8bbrgp36h8upvr34@4ax.com>

* John Cowan wrote:
>> >>    RFC 2781 registers all UTF-16 charsets (UTF-16BE, UTF-16LE and
>> >> UTF-16) as not suitable for use in MIME content types under the
>> >> "text" top-level type. Why?

I'm sorry, maybe I need some more spoon-feeding on this...

># The canonical form of any MIME "text" subtype MUST always represent a
># line break as a CRLF sequence.  Similarly, any occurrence of CRLF in
># MIME "text" MUST represent a line break.  Use of CR and LF outside of
># line break sequences is also forbidden.
>#
># This rule applies regardless of format or character set or sets
># involved.

If you consider 0x00 0x0d 0x00 0x0a (UTF-16BE) or 0x0d 0x00 0x0a 0x00
(UTF-16LE) in the UTF-16 data, then this paragraph applies, since it
refers to the _decoded_ form of the data; RFC 2046 doesn't place
restrictions on the encoded form of the data. What am I missing?
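
To make the octet sequences above concrete, here is a minimal Python
sketch. It is only an illustration: the helper name `octets` and the
variable names are made up for this example, and it assumes Python's
built-in "utf-16-be" and "utf-16-le" codecs.

```python
# Encode a CRLF line break in both UTF-16 byte orders and inspect the octets.
CRLF = "\r\n"

def octets(data):
    """Hypothetical helper: render bytes as a list of hex strings."""
    return [f"0x{b:02x}" for b in data]

be = CRLF.encode("utf-16-be")
le = CRLF.encode("utf-16-le")

print(octets(be))  # ['0x00', '0x0d', '0x00', '0x0a']
print(octets(le))  # ['0x0d', '0x00', '0x0a', '0x00']

# At the octet level, the adjacent pair 0x0D 0x0A does not occur in either
# encoded form of the line break.
crlf_pair_in_be = b"\x0d\x0a" in be
crlf_pair_in_le = b"\x0d\x0a" in le
print(crlf_pair_in_be, crlf_pair_in_le)  # False False

# At the character level, decoding gives back CR followed by LF.
roundtrip_ok = be.decode("utf-16-be") == CRLF and le.decode("utf-16-le") == CRLF
print(roundtrip_ok)  # True

# Conversely, the octet pair 0x0D 0x0A can show up where there is no line
# break at all: the single code point U+0D0A encodes to exactly those octets.
no_linebreak = chr(0x0D0A).encode("utf-16-be")
print(octets(no_linebreak))  # ['0x0d', '0x0a']
```

So whether the quoted rule is met depends on whether "CR" and "LF" are
read as characters in the decoded text or as octets on the wire.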

>CR and LF here refer to the *octets* 0xD and 0xA respectively, as
>explained in section 4.1.2, not to the characters.

This section deals with the charset parameter and with US-ASCII, but I
can't find such a statement in it, and I'm not sure it would apply even
if there were one.
-- 
Björn Höhrmann { mailto:bjoern@hoehrmann.de } http://www.bjoernsworld.de
am Badedeich 7 } Telefon: +49(0)4667/981028 { http://bjoern.hoehrmann.de
25899 Dagebüll { PGP Pub. KeyID: 0xA4357E78 } http://www.learn.to/quote/

Received on Saturday, 11 August 2001 21:34:57 UTC