Errata in Appendix B from Steve Dahl on 1999-08-26 (xml-editor@w3.org from July to September 1999)

From: Steve Dahl <sdahl@goshawk.com>
Date: Wed, 25 Aug 1999 22:32:04 -0400
To: xml-editor@w3.org
Message-ID: <37C4A723.AE03E731@goshawk.com>

I don't understand the following quotes from Appendix B. They seem to
conflict with the general rules given for classifying Unicode
characters.

> The following characters are treated as name-start characters rather than
> name characters, because the property file classifies them as Alphabetic:
> [#x02BB-#x02C1], #x0559, #x06E5, #x06E6.

In the Unicode databases that I can find (Unicode 1.1 through 3.0),
these are all classified as Lm, which should make them name characters,
not name-start characters. Which property file was used to define the
XML spec? Where can I find a copy of that file?

> Character #x00B7 is classified as an extender, because the property list
> so identifies it.

> Character #x0387 is added as a name character, because #x00B7 is its
> canonical equivalent.

These are both classified as Po characters in all of the Unicode
character databases I could find. They're not classified as Lm, which I
assume is what is meant by Extender. Therefore, it seems like they
should not be classified as name characters.
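The two facts in play here, the general category of U+00B7 and U+0387 and the canonical equivalence between them, can be inspected with Python's `unicodedata`. This is a minimal sketch against whatever Unicode version the local Python ships with. One caveat: "Extender" in the Unicode data is a binary property listed in a separate property file, not a general category, and `unicodedata` does not expose that property list, so this check covers only the category and decomposition.

```python
import unicodedata

MIDDLE_DOT = "\u00B7"        # the character Appendix B calls an extender
GREEK_ANO_TELEIA = "\u0387"  # added because #x00B7 is its canonical equivalent


def describe(ch):
    """Return (code point, name, general category, decomposition) for one character."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.decomposition(ch),  # '' when there is no decomposition
    )


if __name__ == "__main__":
    for ch in (MIDDLE_DOT, GREEK_ANO_TELEIA):
        print(describe(ch))
    # U+0387 has a singleton canonical decomposition to U+00B7, so
    # normalization replaces it with the middle dot.
    print(unicodedata.normalize("NFC", GREEK_ANO_TELEIA) == MIDDLE_DOT)
```

Both characters report category `Po`, and U+0387 decomposes canonically to `00B7`, which is the relationship the Appendix B note relies on.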

If these three line items are correct relative to current Unicode
definitions, what is the algorithm we should use to upgrade an XML
processor to Unicode 2.1? For which characters should we *not* trust the
Unicode Consortium's character database for classification?
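To make the question concrete, here is a hypothetical, partial sketch of the kind of table-driven upgrade being asked about: classify by general category, then apply the Appendix B exceptions quoted above as explicit overrides. The category sets and the compatibility-decomposition exclusion follow the general rules in XML 1.0 Appendix B; the function name `classify` and the override sets are illustrative, and the remaining Appendix B rules are deliberately omitted. Whether such an override list is complete for a newer Unicode version is exactly the open question.

```python
import unicodedata

# General-category rules from XML 1.0 Appendix B.
NAME_START_CATEGORIES = {"Ll", "Lu", "Lo", "Lt", "Nl"}
NAME_CATEGORIES = {"Mc", "Me", "Mn", "Lm", "Nd"}

# Hypothetical override tables built from the Appendix B items quoted above.
FORCE_NAME_START = set(range(0x02BB, 0x02C2)) | {0x0559, 0x06E5, 0x06E6}
FORCE_NAME = {0x00B7, 0x0387}  # extender and its canonical equivalent


def classify(cp):
    """Return 'name-start', 'name', or None for one code point (partial sketch)."""
    # Explicit exceptions take precedence over the database classification.
    if cp in FORCE_NAME_START:
        return "name-start"
    if cp in FORCE_NAME:
        return "name"
    ch = chr(cp)
    # Appendix B excludes characters whose decomposition carries a
    # compatibility formatting tag (field 5 begins with '<').
    if unicodedata.decomposition(ch).startswith("<"):
        return None
    cat = unicodedata.category(ch)
    if cat in NAME_START_CATEGORIES:
        return "name-start"
    if cat in NAME_CATEGORIES:
        return "name"
    return None
```

For example, `classify(0x02BC)` yields `"name-start"` only because of the override; by category alone (`Lm`) it would be a name character.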

--
- Steve Dahl
sdahl@goshawk.com

Received on Wednesday, 25 August 1999 22:34:02 UTC