Re: Registration of new charset "UTF-16" from Erik van der Poel on 1998-05-15 (ietf-charsets@w3.org from April to June 1998)

From: Erik van der Poel <erik@netscape.com>
Date: Fri, 15 May 1998 10:16:26 -0700
To: Chris Newman <Chris.Newman@INNOSOFT.COM>
Cc: MURATA Makoto <murata@apsdc.ksp.fujixerox.co.jp>, Larry Masinter <masinter@parc.xerox.com>, ietf-charsets@ISI.EDU, murata@fxis.fujixerox.co.jp, Tatsuo_Kobayashi@justsystem.co.jp
Message-id: <355C786A.D007FF3C@netscape.com>

Chris,

Thanks for engaging in this discussion.

Chris Newman wrote:

> On Fri, 15 May 1998, MURATA Makoto wrote:
> > Makoto:
> >       This character set is not permitted for use with MIME text/* media
> >       types.  However, the MIME-like mechanism of HTTP may use this
> >       character set for text/*.
>
> I prefer this one or anything along these general lines.
>
> > Erik:
> >       This charset is not suitable for use with text/* media types in
> >       protocols that are sensitive to the line break issues described in
> >       section 4.1.1 of RFC 2046 (MIME). However, this charset is suitable for
> >       use with text/* media types in other protocols. See also section 19.4.1
> >       of RFC 2068 (HTTP).
>
> HTTP is the only protocol which uses MIME but doesn't follow the text
> rules.  I hope there will never be another.  The rules for text media
> types are not email centric.  They were put in so that:
> (1) Any text media type could be displayed directly to the user without
>     interpretation (treated as text/plain).

Sure, let's just present the binary ones and zeroes directly to the users. They can
understand it! ;-)

Somewhat more seriously, what you wrote is quite ASCII-centric and/or
Latin-centric. Languages like Japanese do not use the ASCII character codes.

Nevertheless, you do have a point. If we sent raw UTF-16, and the stupid UA just
blurted that out onto the screen, even American users would be baffled. Now we
wouldn't want that, would we? I mean at least the American users should be spared,
since they invented ARPANET and all. Oops, perhaps that went too far. I'm just
kidding here. I assume you realize that. ObSmiley :-)

> (2) Text has a canonical form so that signatures can be more easily
>     verified.

Hmmm... Perhaps UTF-16 could also have some sort of canonical form defined for it,
so that signatures would be possible.

> (3) Even if the charset is unknown, dumping the message to the screen
>     might be useful.

Again, ASCII-centric and Latin-centric thinking.

> (4) It can normally be sent unencoded through line-oriented protocols
>     with line length limits.

Yup, there are such protocols, e.g. SMTP. Would it be a good idea to list such
protocols in some document?
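
To make the line break issue concrete, here is a minimal Python sketch of why a
line-oriented transport does not see line breaks in UTF-16 text. It is only an
illustration: the sample string and the helper name split_lines_like_smtp are made
up, and the helper just splits on the octet pair CR LF rather than modelling any
real SMTP implementation.

```python
# Illustrative sketch only: the sample text and helper name are hypothetical.
CRLF = b"\r\n"  # the two octets 0x0D 0x0A that line-oriented protocols look for


def split_lines_like_smtp(data: bytes) -> list:
    """Split raw octets on CR LF, the way a line-oriented transport would."""
    return data.split(CRLF)


text = "Hi\r\nthere\r\n"

utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-be")  # big-endian, no byte order mark

# In UTF-8 the line break is the octet pair 0D 0A.
# In UTF-16 it is 00 0D 00 0A, so 0D and 0A are never adjacent,
# and every ASCII character is accompanied by a NUL octet.
print(utf8.hex(" "))
print(utf16.hex(" "))

print(CRLF in utf8)    # the transport can find line breaks
print(CRLF in utf16)   # the transport cannot find any
print(len(split_lines_like_smtp(utf8)))
print(len(split_lines_like_smtp(utf16)))
```

With UTF-16 the whole body looks like one long line full of NUL octets, which is
the kind of thing such protocols and their line length limits do not cope with.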

> I wouldn't be surprised if there are other good reasons for the text rules
> that I'm not familiar with.

Me neither. I haven't given the MIME specs a good read lately, but they probably have
some more info.

Actually, the 4 points that you raise above should probably be in some RFC, no? It
would be a waste to leave them in this obscure mailing list archive. Then again,
maybe these points are already documented in MIME.

> > Larry:
> >       In accordance with the rules on end-of-line convention and 'text/',
> >       UTF-16 is inappropriate for use with 'text' media types. Those media
> >       types which might be deployed with UTF-16 might consider registering an
> >       'application' type as well.
>
> This omits mention of the HTTP exception, which seems important to some.

Yeah, perhaps it's a waste of my time to fight this anti-HTTP sentiment. It's a
shame, though, that we seem to be leaning towards using UTF-16 with application/*
rather than text/*. It's inconsistent. And it's a sudden change from the text/html
that everybody's already used to. Sigh.

Larry's version seems fine to me. Perhaps a reference to the sections of MIME and
HTTP that explain all this could be added to the end of the document? He says,
giving it one more try.

Erik


Received on Friday, 15 May 1998 11:04:33 UTC