Re: UTF-16 and MIME text/* from Yung-Fong Tang on 2001-08-14 (www-international@w3.org from July to September 2001)

From: Yung-Fong Tang <ftang@netscape.com>
Date: Tue, 14 Aug 2001 12:28:58 -0700
To: John Cowan <cowan@mercury.ccil.org>
CC: Bjoern Hoehrmann <derhoermi@gmx.net>, www-international@w3.org, phoffman@imc.org
Message-ID: <3B797BFA.F9A8552B@netscape.com>

John Cowan wrote:

> Bjoern Hoehrmann scripsit:
>
> > If you consider 0x00 0x0d 0x00 0x0a or 0x0d 0x00 0x0a 0x00 in the UTF-16
> > data, then this paragraph applies, since it refers to the _decoded_ form
> > of the data; RFC 2046 doesn't make restrictions on the encoded
> > form of the data. What am I missing?
>
> No, it's the encoded form that is being restricted.  The whole point of
> this is so that naive processors that understand only ASCII and text/plain
> can at least figure out where the line breaks are, since for local presentation
> purposes (not for retransmission) it may be necessary to convert the
> standard line break (0xD 0xA) into something else.  Charsets such as
> EBCDIC and UTF-16 (in all their flavors) break this rule and can't be used
> in MIME text/* emails.
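
To make the octet-level point concrete, here is a minimal Python sketch (not part of the original exchange). It assumes a hypothetical "naive processor" that looks for the adjacent octets 0x0D 0x0A, and a made-up two-line sample text. The helper name naive_split is an illustration only.

```python
# Sample text with a standard CRLF line break (hypothetical data).
text = "line one\r\nline two"

ascii_bytes = text.encode("ascii")
utf16_be = text.encode("utf-16-be")  # big-endian, no BOM
utf16_le = text.encode("utf-16-le")  # little-endian, no BOM

def naive_split(octets):
    """Split on the adjacent octets 0x0D 0x0A, as an ASCII-only tool might."""
    return octets.split(b"\r\n")

# In UTF-16 the two characters CR LF become four octets, so the
# pair 0x0D 0x0A never appears side by side in this sample.
crlf_be = "\r\n".encode("utf-16-be")  # 00 0D 00 0A
crlf_le = "\r\n".encode("utf-16-le")  # 0D 00 0A 00

ascii_lines = naive_split(ascii_bytes)
be_lines = naive_split(utf16_be)
le_lines = naive_split(utf16_le)
```

With this sample the ASCII octets split into two lines, while both UTF-16 byte orders come back as a single unsplit chunk, which is the situation the quoted paragraph describes.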

If it is the "encoded form" that is being restricted, then 0x00 0x0d 0x00 0x0a
will still never appear in the "encoded form" if you send the message with the
following headers, right?

Content-Type: text/plain; charset=UTF-16
Content-transfer-encoding: base64
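
As a rough illustration of the question, the following Python sketch base64-encodes a UTF-16 body and checks which octets appear in the transfer-encoded form. The sample body is made up, and big-endian byte order without a BOM is an assumption chosen only to keep the output deterministic. base64.b64encode emits no line breaks of its own; a real mail encoder would additionally wrap the output into lines of at most 76 characters, each ended by an ordinary CRLF.

```python
import base64

# Hypothetical message body with standard CRLF line breaks.
body = "first line\r\nsecond line\r\n"

# Decoded form of the MIME entity: UTF-16 octets (big-endian assumed here).
raw = body.encode("utf-16-be")

# Transfer-encoded form: only base64 alphabet characters plus '=' padding.
encoded = base64.b64encode(raw)

# The UTF-16 line break 00 0D 00 0A is present before transfer encoding...
has_utf16_crlf_in_raw = b"\x00\x0d\x00\x0a" in raw

# ...but neither 0x0D nor 0x0A occurs in the base64 output itself.
has_cr_or_lf_in_encoded = (b"\r" in encoded) or (b"\n" in encoded)

# Decoding restores the original text unchanged.
round_trip = base64.b64decode(encoded).decode("utf-16-be")
```

Whether RFC 2046 is satisfied by this is exactly the point under discussion; the sketch only shows what octets each form contains.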

> > >CR and LF here refer to the *octets* 0xD and 0xA respectively, as
> > >explained in section 4.1.2, not to the characters.
> >
> > This section deals with the Charset Parameter and with US-ASCII,
> > but I can't find such a statement there, and I'm not sure it would
> > apply if there were one.
>
> See RFC 822 for formal definitions of CR, LF, and CRLF, where it is
> made clear that they are octet based.
>
> --
> John Cowan                                   cowan@ccil.org
> One art/there is/no less/no more/All things/to do/with sparks/galore
>         --Douglas Hofstadter

Received on Tuesday, 14 August 2001 15:31:50 UTC