Re: Registration of new charset "UTF-16" from Erik van der Poel on 1998-05-19 (ietf-charsets@w3.org from April to June 1998)

From: Erik van der Poel <erik@netscape.com>
Date: Tue, 19 May 1998 11:20:29 -0700
To: Larry Masinter <masinter@parc.xerox.com>
Cc: Chris Newman <Chris.Newman@INNOSOFT.COM>, MURATA Makoto <murata@apsdc.ksp.fujixerox.co.jp>, ietf-charsets@ISI.EDU, murata@fxis.fujixerox.co.jp, Tatsuo_Kobayashi@justsystem.co.jp
Message-id: <3561CD6D.7A6DD043@netscape.com>

Larry Masinter wrote:

> I sent out a poll to the HTTP working group: are there two independent
> interoperable implementations of the HTTP 'exception' that send and
> process text types that don't use CR, LF, or CRLF for end of line?
> If we can't find two independent interoperable implementations, we
> may have to remove the 'feature' before we can progress HTTP/1.1 to
> Draft Standard.

As others have said, the Netscape and Alis clients support UCS-2, and MSIE
supports it to some degree too. (At least as far as end-of-line issues are
concerned, which are relatively trivial.)
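
To illustrate why end-of-line handling differs for UCS-2 at all: CR and LF
are 16-bit code units (0x000D, 0x000A), so the byte pair 0D 0A never appears
adjacent in the stream. The sketch below is only an illustration, not any
client's actual code. The function name is hypothetical, and Python's UTF-16
codecs stand in for UCS-2, which they match for BMP-only text.

```python
def split_ucs2_lines(data, byte_order):
    """Split UCS-2 text into lines, accepting CR, LF, or CRLF.

    byte_order is "big" or "little" (assumed to be known already,
    for example from the charset label or a byte order mark).
    """
    # Assumption: the UTF-16 codecs are used as a stand-in for UCS-2.
    codec = "utf-16-be" if byte_order == "big" else "utf-16-le"
    text = data.decode(codec)
    # Normalize after decoding; scanning raw bytes for b"\r\n" would fail.
    return text.replace("\r\n", "\n").replace("\r", "\n").split("\n")


sample = "a\r\nb\nc\rd".encode("utf-16-be")
lines = split_ucs2_lines(sample, "big")
```

Here the encoded CRLF is the four bytes 00 0D 00 0A, which is why the split is
done on decoded text rather than on the byte stream.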

Netscape looks for the HTTP charset parameter, and recognizes the following
UCS-2-related charset names:

ISO-10646-UCS-2
csUnicode11
ISO-10646-UCS-BASIC
csUnicodeASCII
ISO-10646-Unicode-Latin1
csUnicodeLatin1
ISO-10646
ISO-10646-J-1

The first one is the "main" one. Do Alis and MS use these names too?
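
For anyone who wants to test against these names, here is a minimal sketch of
the lookup: pull the charset parameter out of a Content-Type value and compare
it with the list above. It is not Netscape's code. The function and constant
names are made up for illustration, and case-insensitive matching is an
assumption.

```python
from email.message import Message

# The names listed above, lowercased for comparison (assumed case-insensitive).
UCS2_CHARSET_NAMES = {
    "iso-10646-ucs-2",
    "csunicode11",
    "iso-10646-ucs-basic",
    "csunicodeascii",
    "iso-10646-unicode-latin1",
    "csunicodelatin1",
    "iso-10646",
    "iso-10646-j-1",
}


def charset_from_content_type(value):
    """Return the lowercased charset parameter, or None if absent."""
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_charset()


def is_ucs2_charset(content_type):
    """True if the Content-Type value carries one of the names above."""
    charset = charset_from_content_type(content_type)
    return charset is not None and charset in UCS2_CHARSET_NAMES
```

For example, is_ucs2_charset("text/html; charset=ISO-10646-UCS-2") is true,
while a value with no charset parameter falls through to detection.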

If there is no HTTP charset, we try to detect UCS-2 by looking for the byte
order mark 0xFEFF (big-endian) or its byte-swapped form 0xFFFE
(little-endian). An early implementation looked for zero bytes, but this was
unreliable since some people (Gopher, if I remember correctly) actually use
zero bytes in non-UCS-2 text.
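
A rough sketch of that detection step, assuming the check is made on the
first two bytes of the body. The function name and return values are
hypothetical, not taken from any client.

```python
def detect_ucs2_byte_order(data):
    """Guess UCS-2 byte order from a leading byte order mark.

    Returns "big", "little", or None when no mark is present.
    """
    head = data[:2]
    if head == b"\xfe\xff":
        # U+FEFF stored high byte first.
        return "big"
    if head == b"\xff\xfe":
        # Reads as 0xFFFE when taken big-endian, so the data is byte-swapped.
        return "little"
    # No mark: deliberately no fallback to counting zero bytes, since
    # zero bytes can occur in non-UCS-2 text as well.
    return None
```

Returning None rather than guessing keeps the unreliable zero-byte heuristic
out of the picture; the caller can then fall back to some other default.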

It might be a good idea to do some more extensive UCS-2 interoperability
testing, including charset name testing and end-of-line testing. It sounds
like Makoto has already done some testing.

Erik


Received on Tuesday, 19 May 1998 11:23:36 UTC