RE: internationalization/ISO10646 question

> Your explanation means that you cannot send UTF-16 encoding, because it
> cannot preserve CRLF.
> You could not send any unicode characters (apart from UTF-8) in MIME then!!!

Reread my message more carefully. This isn't even close to what I'm saying.

What I'm saying is that there are specific restrictions on the text top level
type. These restrictions are there to ensure that material labelled as text is
accessible to a broad class of generic text processing facilities. In
particular, such material is expected to be line oriented and to represent line
breaks in one and only one way: as the CRLF octet sequence.

The restrictions on the text top level type do not extend to other top level
media types. If something is labelled as application/foo, for example, it can
use any sequence of octets it wants for a line break, including not having a
defined line break at all, and the CRLF sequence has no special predefined
meaning. Textual material labelled with some top level type other than text,
such as application/html, is therefore free to use whatever charset it likes,
including UTF-16LE, iso-10646-ucs-2, and so on.
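To make the line break point concrete, here is a minimal sketch in Python. The
helper names line_break_octets and preserves_crlf are made up for illustration;
only the standard str.encode call is real. It checks whether a charset encodes
the CR LF character pair as exactly the two octets 0D 0A, which is what generic
text processing of the text top level type relies on.

```python
# The octet sequence that the text top level type uses for a line break.
CRLF = b"\r\n"

def line_break_octets(charset: str) -> bytes:
    """Encode the CR LF character pair in the given charset."""
    return "\r\n".encode(charset)

def preserves_crlf(charset: str) -> bool:
    """True if the charset encodes CR LF as exactly the octets 0D 0A."""
    return line_break_octets(charset) == CRLF

if __name__ == "__main__":
    for name in ("us-ascii", "iso-8859-1", "utf-8", "utf-16-le", "utf-16-be"):
        print(name, line_break_octets(name).hex(" "), preserves_crlf(name))
```

The 16-bit encodings interleave zero octets with CR and LF, so a line-oriented
tool scanning for the bare CRLF pair will not find the line breaks it expects.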

> The media type you are writing about is to be used in the form:
> Content-Type: text/utf-16...

> and I mean:
> Content-Type: text/plain; charset="UTF-16"

Which is again an illegal combination of media type and charset values.

> So I understand from your mail that BOM should be accepted when we have:
> Content-Type: text/plain; charset="iso-10646-ucs-2"

You understand wrong then. This is an invalid combination of elements, one
specifically prohibited by the standards. The handling of such things is not
specified by the standards. This immediately implies that a handler is free to
interpret this in any way it chooses. It could interpret it with a BOM, without
a BOM, it could interpret it as unlabelled binary data (which is what I'd
recommend it do), it could reject it as invalid, it could turn it into a three
part toccata and fugue and play it on the piano. Whatever. It's illegal.
Period!
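For what it's worth, the "treat it as unlabelled binary data" option could be
sketched as below. This is one possible policy, not something the standards
require. The function name effective_type and the WIDE_CHARSETS set are
assumptions made for illustration, and the set is not a complete list of
charsets that break the CRLF rule. The parsing uses Python's standard email
package.

```python
from email import message_from_string

# Assumed, incomplete set of charset labels whose encodings do not
# represent a line break as the bare CRLF octet pair.
WIDE_CHARSETS = {
    "utf-16", "utf-16le", "utf-16be",
    "iso-10646-ucs-2", "iso-10646-ucs-4",
}

def effective_type(raw_headers: str) -> str:
    """Return the media type a handler following this policy would act on.

    A text/* part labelled with one of the charsets above is an invalid
    combination, so it is downgraded to application/octet-stream.
    Every other part keeps its declared media type.
    """
    msg = message_from_string(raw_headers)
    charset = msg.get_content_charset() or ""  # already lowercased
    if msg.get_content_maintype() == "text" and charset in WIDE_CHARSETS:
        return "application/octet-stream"
    return msg.get_content_type()

if __name__ == "__main__":
    print(effective_type('Content-Type: text/plain; charset="UTF-16"\n\n'))
    print(effective_type('Content-Type: application/foo; charset="UTF-16"\n\n'))
```

A handler could just as well reject such a part outright; the sketch only
shows the option recommended above.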

				Ned

Received on Monday, 25 November 2002 15:24:57 UTC