- From: James Clark <jjc@jclark.com>
- Date: Tue, 05 Dec 2000 15:23:36 +0700
- To: www-xml-schema-comments@w3.org
Some comments on Appendix F of Schema Part 2.

1. The section seems to be crying out for a formal grammar.

2. The definition of character class escapes should mention "block escapes". (It should also say that the "valid character class escapes *are* ..." rather than "include ...".)

3. The terminology in the description of category escapes is broken. "Lu", "Ll", etc. are not character properties but possible values of the "General Category" property. It is not satisfactory to say "the following table specifies the main character properties". There needs to be a precise statement of exactly what is allowed as a category escape. What you seem to mean is any two-letter sequence that occurs as the value of the General Category property of some character, or the first letter of such a two-letter sequence. It would be helpful to refer to Section 4.5 of Unicode 3.0.

4. It seems strange to have an escape for name characters but not for name start characters (the characters allowed at the beginning of a name). This means I cannot conveniently write a regex that matches XML names. (Or cannot I do it with \c

5. It would be helpful to say exactly where the definitive list of block names is to be found: in the Blocks.txt file of the Unicode Character Database (http://www.unicode.org/Public/UNIDATA/Blocks.txt). The Unicode standard itself doesn't quite do it: for example, the chart for 0000-007F is entitled "C0 Controls and Basic Latin", whereas Blocks.txt calls it simply "Basic Latin".

6. If I turn the prose description of character class subtraction into a grammar, I get:

     character class ::= character class escape
                       | character class expression
     character class expression ::= '[' character group ']'
     character group ::= positive character group
                       | negative character group
                       | character class subtraction
     negative character group ::= '^' positive character group
     character class subtraction ::= (positive character group | negative character group)
                                     '-' character class expression

   which suggests that a character class subtraction looks like:

     [abc-[def]]

   If this is right, it's deeply confusing that the description of \w uses an incompatible syntax: [...]-[...]. It is also a pretty bizarre feature: is this really necessary? I couldn't find any mention of it in the regexp documentation I consulted. Overloading '-' for two completely different operations (character ranges and class subtraction) doesn't seem like a good design.

James
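To make the grammar in point 6 concrete, here is a minimal Python sketch of a recursive-descent recognizer for it. It is an illustration only: it assumes a simplified positive character group made of literal characters and x-y ranges (no escapes, no character class escapes), and the names `Parser` and `compile_class` are hypothetical. The only place it has to disambiguate the overloaded '-' is the check for a following '['.

```python
from typing import Callable, List, Tuple

# A predicate deciding whether a single character belongs to the class.
Pred = Callable[[str], bool]


class Parser:
    """Recursive-descent recognizer for the character-class grammar in point 6.

    Simplification (assumption): a positive character group is a run of
    literal characters and x-y ranges; escapes are not handled.
    """

    def __init__(self, text: str) -> None:
        self.text = text
        self.pos = 0

    def peek(self, offset: int = 0) -> str:
        i = self.pos + offset
        return self.text[i] if i < len(self.text) else ""

    def expect(self, ch: str) -> None:
        if self.peek() != ch:
            raise ValueError(f"expected {ch!r} at position {self.pos}")
        self.pos += 1

    def char_class_expr(self) -> Pred:
        # character class expression ::= '[' character group ']'
        self.expect("[")
        pred = self.char_group()
        self.expect("]")
        return pred

    def char_group(self) -> Pred:
        # negative character group ::= '^' positive character group
        negated = self.peek() == "^"
        if negated:
            self.pos += 1
        base = self.positive_group()
        group: Pred = (lambda c: not base(c)) if negated else base
        # character class subtraction ::= (positive | negative group) '-' expression
        if self.peek() == "-" and self.peek(1) == "[":
            self.pos += 1
            sub = self.char_class_expr()
            return lambda c: group(c) and not sub(c)
        return group

    def positive_group(self) -> Pred:
        ranges: List[Tuple[str, str]] = []
        while self.peek() not in ("", "[", "]"):
            if self.peek() == "-" and self.peek(1) == "[":
                break  # '-' followed by '[' starts a subtraction, not a range
            lo = self.peek()
            self.pos += 1
            if self.peek() == "-" and self.peek(1) not in ("", "[", "]"):
                hi = self.peek(1)
                self.pos += 2  # consume '-' and the range end
                ranges.append((lo, hi))
            else:
                ranges.append((lo, lo))
        if not ranges:
            raise ValueError(f"empty character group at position {self.pos}")
        return lambda c: any(a <= c <= b for a, b in ranges)


def compile_class(text: str) -> Pred:
    parser = Parser(text)
    pred = parser.char_class_expr()
    if parser.pos != len(text):
        raise ValueError(f"trailing input at position {parser.pos}")
    return pred


consonant = compile_class("[a-z-[aeiou]]")
print(consonant("b"), consonant("e"))  # True False
```

Under this reading, `[a-z-[aeiou]]` matches lowercase consonants, while the `[...]-[...]` form used in the description of \w (for example `[a-z]-[aeiou]`) is rejected as trailing input, which is exactly the incompatibility point 6 describes.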
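Point 3 asks for a precise statement of what may appear as a category escape. The rule suggested there can be sketched in Python as below. This is an approximation under a stated assumption: it relies on the standard `unicodedata` module, whose tables follow whatever Unicode version the interpreter bundles (see `unicodedata.unidata_version`), not necessarily the Unicode 3.0 data the draft refers to. The function name `category_escape_names` is hypothetical.

```python
import sys
import unicodedata


def category_escape_names() -> set:
    """Names allowed in a category escape under the rule suggested in point 3.

    A two-letter name qualifies if it is the General Category value of at
    least one code point; a one-letter name qualifies if it is the first
    letter of such a value.
    """
    # Scan every code point (including unassigned ones, which report "Cn").
    two_letter = {unicodedata.category(chr(cp)) for cp in range(sys.maxunicode + 1)}
    # Add the major-class letters, e.g. "L" for Lu, Ll, Lt, Lm, Lo.
    one_letter = {name[0] for name in two_letter}
    return two_letter | one_letter


names = category_escape_names()
print(sorted(n for n in names if len(n) == 1))  # ['C', 'L', 'M', 'N', 'P', 'S', 'Z']
```

Deriving the set from the data, rather than from a hand-written table, is one way to make "exactly what is allowed" unambiguous for a given Unicode version.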
Received on Tuesday, 5 December 2000 03:25:26 UTC