- From: Martin J. Duerst <mduerst@ifi.unizh.ch>
- Date: Sun, 27 Apr 1997 14:34:24 +0200 (MET DST)
- To: Keld J|rn Simonsen <keld@dkuug.dk>
- Cc: John C Klensin <klensin@mci.net>, Edward Cherlin <cherlin@newbie.net>, uri@bunyip.com
On Sat, 26 Apr 1997, Keld J|rn Simonsen wrote:

> "Martin J. Duerst" writes:
>
> > > (iv) It is not hard to demonstrate that, in the medium to
> > > long term, there are some requirements for character set
> > > encoding for which Unicode will not suffice and it will be
> > > necessary to go to multi-plane 10646
> >
> > You are not the first or only one to notice this. Unicode
> > currently can encode planes 0 to 16 (for a total of about
> > one million codepoints) by a mechanism called surrogates
> > or UTF-16. Please check your copy of Unicode vol. 2.
>
> Surely we are not talking Unicode, (an industry standard) but ISO 10646?
> IETF normally specifies ISO standards when available. 10646 is 32 bits.

We are usually (implicitly or explicitly) talking about both ISO 10646
and Unicode, as they are the same for most practical purposes. For
official specification, I agree that ISO 10646 is to be preferred.
On the other hand, a lot of actual systems (in those cases where the
differences actually matter) are closer to Unicode than to ISO 10646,
and a lot of Unicode/ISO 10646 systems are announced/marketed using
the name "Unicode" rather than the number "10646".

My remark above was to point out that if we specify ISO 10646, but
an actual industry-standard system uses Unicode, then not only are
the codepoints in the BMP the same, but both standards/systems will
also have a unified code space up to well over a million codepoints.

In addition, for the whole equivalence/normalization question, we
will have to base our work on the equivalences defined in Unicode,
because there are no such equivalences defined in ISO 10646.

I hope that in the above sense, an occasional reference to Unicode
in this discussion and in the resulting specs will be tolerated (:-)
even by the strongest ISO 10646 proponents, and that all of us who
know about the usefulness of a Universal Character Set can work
towards making the best use of it in URLs.

Regards,	Martin.
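For readers who want to see how the surrogate mechanism mentioned above reaches planes 1 to 16, here is a minimal Python sketch of the standard UTF-16 arithmetic. The function and constant names are hypothetical and chosen only for illustration; they do not come from this thread or from any particular library.

```python
# Sketch of UTF-16 surrogate arithmetic (illustrative names, not a library API).

HIGH_START = 0xD800        # first high (leading) surrogate code unit
LOW_START = 0xDC00         # first low (trailing) surrogate code unit
BMP_LIMIT = 0x10000        # first code point beyond plane 0 (the BMP)
MAX_CODE_POINT = 0x10FFFF  # last code point of plane 16

# Planes 0..16, each with 65,536 positions.
TOTAL_CODE_POINTS = 17 * 0x10000


def to_surrogate_pair(code_point):
    """Split a code point from planes 1..16 into (high, low) surrogates."""
    if not BMP_LIMIT <= code_point <= MAX_CODE_POINT:
        raise ValueError("code point is not in planes 1 to 16")
    offset = code_point - BMP_LIMIT          # 20-bit value
    high = HIGH_START + (offset >> 10)       # upper 10 bits
    low = LOW_START + (offset & 0x3FF)       # lower 10 bits
    return high, low


def from_surrogate_pair(high, low):
    """Recombine a (high, low) surrogate pair into a single code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return BMP_LIMIT + ((high - HIGH_START) << 10) + (low - LOW_START)


if __name__ == "__main__":
    print(TOTAL_CODE_POINTS)
    print([hex(unit) for unit in to_surrogate_pair(0x10FFFF)])
```

Two 10-bit halves give 2^20 positions above the BMP, which together with the BMP itself yields 17 planes of 65,536 positions each (1,114,112 in total), the "well over a million" figure referred to in the message.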
Received on Sunday, 27 April 1997 08:34:42 UTC