Re: revised "generic syntax" internet draft from Martin J. Duerst on 1997-04-27 (uri@w3.org from April 1997)

From: Martin J. Duerst <mduerst@ifi.unizh.ch>
Date: Sun, 27 Apr 1997 14:34:24 +0200 (MET DST)
To: Keld Jørn Simonsen <keld@dkuug.dk>
Cc: John C Klensin <klensin@mci.net>, Edward Cherlin <cherlin@newbie.net>, uri@bunyip.com
Message-Id: <Pine.SUN.3.96.970427141954.245B-100000@enoshima>

On Sat, 26 Apr 1997, Keld Jørn Simonsen wrote:

> "Martin J. Duerst" writes:
> 
> > > (iv) It is not hard to demonstrate that, in the medium to 
> > > long term, there are some requirements for character set 
> > > encoding for which Unicode will not suffice and it will be 
> > > necessary to go to multi-plane 10646
> > 
> > You are not the first or only one to notice this. Unicode
> > currently can encode planes 0 to 16 (for a total of about
> > one million codepoints) by a mechanism called surrogates
> > or UTF-16. Please check your copy of Unicode vol. 2.
> 
> Surely we are not talking Unicode, (an industry standard) but ISO 10646?
> IETF normally specifies ISO standards when available. 10646 is 32 bits.

We are usually (implicitly or explicitly) talking both ISO 10646 and
Unicode, as they are the same for most practical purposes. For official
specification, I agree that ISO 10646 is to be preferred. On the other
hand, a lot of actual systems (in those cases where the differences
actually matter) are closer to Unicode than ISO 10646, and also a lot
of Unicode/ISO 10646 systems are announced/marketed using the name
"Unicode" rather than the number "10646".

My above remark was to point out that if we specify ISO 10646,
but an actual industry standard system uses Unicode, then not
only are the codepoints in the BMP the same, but also both
standards/systems will have a unified code space up to well
over a million codepoints.
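
To make the "well over a million" figure concrete, here is a
minimal sketch of the surrogate (UTF-16) arithmetic in Python.
The function names are made up for illustration and come from
neither standard; the constants are the surrogate ranges that
UTF-16 uses for planes 1 to 16.

```python
def to_surrogates(cp):
    """Split a code point from planes 1..16 into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point is outside planes 1 to 16")
    offset = cp - 0x10000             # 20-bit value, 0x00000..0xFFFFF
    high = 0xD800 + (offset >> 10)    # upper 10 bits -> 0xD800..0xDBFF
    low = 0xDC00 + (offset & 0x3FF)   # lower 10 bits -> 0xDC00..0xDFFF
    return high, low


def from_surrogates(high, low):
    """Recombine a surrogate pair into the original code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# Planes 0 to 16 hold 65,536 code positions each.
TOTAL_CODE_POSITIONS = 17 * 0x10000   # 1,114,112
```

The BMP (plane 0) needs no surrogates, so the pair mechanism only
has to cover the 16 planes above it: 1024 high values times 1024
low values gives exactly 16 * 65,536 positions.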

In addition, for the whole equivalence/normalization question,
we will have to base our work on the equivalences defined in
Unicode, because there are no such equivalences defined in
ISO 10646.
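
As a rough illustration of why this matters for URLs, the sketch
below uses Python's standard unicodedata and urllib.parse modules.
It assumes NFC (a normalization form that Unicode defines) as the
canonical form, which is only one possible choice and not
something this discussion has settled; the helper name is
hypothetical.

```python
import unicodedata
from urllib.parse import quote

precomposed = "\u00e9"    # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"    # "e" followed by COMBINING ACUTE ACCENT


def canonically_equal(a, b):
    """Compare two strings after normalizing both to NFC (an assumed choice)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# The two spellings look identical but escape to different URL bytes.
escaped_precomposed = quote(precomposed)   # UTF-8 bytes C3 A9
escaped_decomposed = quote(decomposed)     # "e" plus UTF-8 bytes CC 81
```

Without an agreed equivalence, the two escaped forms would name
different resources even though a reader sees the same character.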

I hope that in the above sense, an occasional reference to
Unicode in this discussion and in the resulting specs will
be tolerated (:-) even by the strongest ISO 10646 proponents,
and that all of us who know about the usefulness of a Universal
Character Set can work towards making the best use of it
in URLs.

Regards,	Martin.

Received on Sunday, 27 April 1997 08:34:42 UTC