- From: Mike French <mfrench@atg.com>
- Date: Wed, 27 Jun 2001 08:46:18 -0700
- To: www-xml-blueberry-comments@w3.org
Does Unicode 3.1 require surrogates? I know that surrogate encoding schemes have existed for a while (ab initio), and technically all Unicode processors should support them, but AFAIK there were no actual blocks assigned in the full UCS-4 domain until 3.x.

If supporting 3.1 implicitly requires that UTF-16 and UTF-8 processing supports surrogates, because it has character blocks defined for the full UCS-4 domain, then this will break a lot of character handling implementations in the real world. For example, I bet quite a few UTF-8 converters only handle 1, 2 or 3-byte sequences (enough to hold 16-bit data), not the full 6(?) needed for surrogates. And I also know that most Unicode implementations use unsigned short 16-bit integers to hold character data, not full 32-bit integers.

Anything that hastens the day when surrogates appear in XML, either explicitly or implicitly, is a very bad idea!

Mike

P.S. Your Unicode link points to XPointer! Is this a circular meta-reference???
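
To make the byte-count question concrete, here is a minimal Python sketch of what a code point beyond 16 bits costs in each encoding. It assumes UTF-8 as currently defined (RFC 3629, at most 4 bytes per character) and uses U+20000, the first code point of the CJK Extension B block added in Unicode 3.1, as the example. The function names `to_surrogate_pair` and `utf8_length` are hypothetical helpers written for this illustration, not part of any library.

```python
def to_surrogate_pair(cp):
    """Split a supplementary code point (U+10000..U+10FFFF) into
    a (high, low) pair of UTF-16 surrogate code units."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = cp - 0x10000              # 20 significant bits remain
    high = 0xD800 + (offset >> 10)     # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits
    return high, low


def utf8_length(cp):
    """Number of bytes UTF-8 (per RFC 3629) needs for a code point."""
    if cp < 0x80:
        return 1
    if cp < 0x800:
        return 2
    if cp < 0x10000:
        return 3                       # everything a 16-bit value can hold
    if cp <= 0x10FFFF:
        return 4                       # supplementary characters
    raise ValueError("beyond the Unicode code space")


cp = 0x20000  # assumption: chosen only as a sample Unicode 3.1 character
high, low = to_surrogate_pair(cp)
print(f"UTF-16: {high:04X} {low:04X}")          # two 16-bit code units
print(f"UTF-8:  {utf8_length(cp)} bytes")
print(chr(cp).encode("utf-8").hex(" "))          # cross-check with the built-in codec
```

Under these assumptions, a converter that stops at 3-byte UTF-8 sequences, or a processor that stores each character in a single 16-bit unit, cannot represent this character without extra handling, which is the breakage the message is worried about.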
Received on Wednesday, 27 June 2001 11:47:04 UTC