Surrogates ? from Mike French on 2001-06-27 (www-xml-blueberry-comments@w3.org from June 2001)

From: Mike French <mfrench@atg.com>
Date: Wed, 27 Jun 2001 08:46:18 -0700
To: www-xml-blueberry-comments@w3.org
Message-ID: <3B39FFCA.877118E5@atg.com>

Does Unicode 3.1 require surrogates?

I know that surrogate encoding schemes have existed for a while (ab initio),
and technically all Unicode processors should support them,
but AFAIK no characters were actually assigned beyond the 16-bit range until 3.x.

If supporting 3.1 implicitly requires that UTF-16 and UTF-8 processing supports 
surrogates, because it has character blocks defined for the full UCS-4 domain,
then this will break a lot of character handling implementations in the real world.
For example, I bet quite a few UTF-8 converters only handle 1, 2 or 3-byte sequences
(enough to hold 16-bit data), not the 4-byte sequences needed for characters
that require surrogates in UTF-16.
And I also know that most Unicode implementations use unsigned short 16-bit
integers to hold character data, not full 32-bit integers.
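
To make the concern concrete, here is a minimal Python sketch of what happens to
one character outside the 16-bit range. The code point U+20000 and the helper
names to_surrogate_pair and fits_in_16_bits are only illustrative assumptions,
not taken from any particular implementation. In standard UTF-8 such a character
takes 4 bytes; a count of 6 would arise only if each surrogate were encoded
separately as a 3-byte sequence.

```python
def to_surrogate_pair(code_point):
    """Split a code point in U+10000..U+10FFFF into two UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point does not need surrogates")
    offset = code_point - 0x10000        # 20-bit value to split in two
    high = 0xD800 + (offset >> 10)       # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits -> low surrogate
    return high, low


def fits_in_16_bits(text):
    """True if every character fits in a single unsigned 16-bit unit."""
    return all(ord(ch) <= 0xFFFF for ch in text)


# Hypothetical sample: a code point beyond the 16-bit range.
sample = chr(0x20000)

high, low = to_surrogate_pair(ord(sample))
utf8_bytes = sample.encode("utf-8")
utf16_bytes = sample.encode("utf-16-be")

print("surrogate pair: %04X %04X" % (high, low))
print("UTF-8 byte count:", len(utf8_bytes))
print("UTF-16 byte count:", len(utf16_bytes))
print("fits in one 16-bit unit:", fits_in_16_bits(sample))
```

A converter that stops at 3-byte UTF-8 sequences, or a buffer of unsigned 16-bit
integers that treats each unit as a whole character, would mishandle this input.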

Anything that hastens the day when surrogates appear in XML,
either explicitly or implicitly, is a very bad idea!

Mike

P.S. Your Unicode link points to XPointer!
     Is this a circular meta-reference?

Received on Wednesday, 27 June 2001 11:47:04 UTC