Re: UTF-16 and MIME text/* from John Cowan on 2001-08-12 (www-international@w3.org from July to September 2001)

From: John Cowan <cowan@mercury.ccil.org>
Date: Sun, 12 Aug 2001 15:12:33 -0400 (EDT)
To: Bjoern Hoehrmann <derhoermi@gmx.net>
CC: John Cowan <cowan@mercury.ccil.org>, www-international@w3.org, phoffman@imc.org
Message-Id: <E15W0f3-0003LV-00@mercury.ccil.org>

Bjoern Hoehrmann scripsit:

> If you consider 0x00 0x0d 0x00 0x0a or 0x0d 0x00 0x0a 0x00 in the UTF-16
> data, then this paragraph applies, since it refers to the _decoded_ form
> of the data; RFC 2046 doesn't place restrictions on the encoded
> form of the data. What am I missing?

No, it's the encoded form that is being restricted.  The whole point of
this is so that naive processors that understand only ASCII and text/plain
can at least figure out where the line breaks are, since for local presentation
purposes (not for retransmission) it may be necessary to convert the
standard line break (0xD 0xA) into something else.  Charsets such as
EBCDIC and UTF-16 (in all their flavors) break this rule and can't be used
in MIME text/* emails.
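To make the octet-level point concrete, here is a small sketch (Python, using its standard codec names; the helper name is just for illustration) that prints the octets each charset produces for the characters CR LF. Only the ASCII-compatible charsets emit the bare octet pair 0x0D 0x0A. UTF-16 interleaves 0x00 octets in either byte order, and EBCDIC (Python's cp500 codec) uses 0x25 for LF.

```python
# Show how the two characters CR LF are serialized as octets in
# several charsets. Charset names are Python codec names.
CRLF_OCTETS = b"\x0d\x0a"

def encoded_line_break(charset: str) -> bytes:
    """Return the octets the given charset uses for the characters CR LF."""
    return "\r\n".encode(charset)

CHARSETS = ["us-ascii", "iso-8859-1", "utf-8",
            "utf-16-be", "utf-16-le", "utf-16", "cp500"]

for charset in CHARSETS:
    octets = encoded_line_break(charset)
    # bytes.hex(sep) needs Python 3.8+; "utf-16" also emits a BOM first.
    print(f"{charset:11} {octets.hex(' '):20} has 0D 0A: {CRLF_OCTETS in octets}")
```

A processor that only knows ASCII can locate line breaks by scanning for 0x0D 0x0A in the first three cases, but not in the UTF-16 or EBCDIC ones.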

> >CR and LF here refer to the *octets* 0xD and 0xA respectively, as
> >explained in section 4.1.2, not to the characters.
> 
> This section deals with the charset parameter and with US-ASCII,
> but I can't find such a statement there, and I'm not sure it would apply if
> there were one.

See RFC 822 for the formal definitions of CR, LF, and CRLF, which make
clear that they are octet-based.
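Because CR and LF are defined as octets, a naive processor that converts line breaks by matching octets goes wrong on UTF-16 in two ways. The sketch below uses a hypothetical `naive_crlf_to_lf` function (not from any real mail library) that rewrites 0x0D 0x0A as 0x0A. It misses a genuine line break in UTF-16BE, and it corrupts U+0D0A (MALAYALAM LETTER UU), whose UTF-16BE encoding happens to be exactly the octets 0x0D 0x0A.

```python
def naive_crlf_to_lf(data: bytes) -> bytes:
    """Hypothetical ASCII-only converter: rewrite octets 0D 0A as a single 0A."""
    return data.replace(b"\r\n", b"\n")

# Case 1: a real line break in UTF-16BE is 00 0D 00 0A, so it is never matched.
real_break = "a\r\nb".encode("utf-16-be")
print("real break unchanged:", naive_crlf_to_lf(real_break) == real_break)

# Case 2: U+0D0A is not a line break, but in UTF-16BE it is the octets 0D 0A,
# so the converter shortens it to one octet: no longer valid UTF-16.
not_a_break = "\u0d0a".encode("utf-16-be")
mangled = naive_crlf_to_lf(not_a_break)
try:
    mangled.decode("utf-16-be")
except UnicodeDecodeError as exc:
    print("corrupted:", exc.reason)
```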

-- 
John Cowan                                   cowan@ccil.org
One art/there is/no less/no more/All things/to do/with sparks/galore
	--Douglas Hofstadter

Received on Sunday, 12 August 2001 15:12:29 UTC