- From: Kent M Pitman <kmp@harlequin.com>
- Date: Fri, 17 Apr 98 04:29:57 EDT
- To: xml-editor@w3.org
- Cc: kmp@harlequin.com
I reported this one last summer while you were in draft stage, but no action was taken. Maybe you were just too busy. Anyway, my problem with it all didn't go away.

I have continued concern about and objection to the present [...] notational devices--both in the way they are defined and the way they are used...

(1) The notation [a-zA-Z] is too-briefly described in chapter 6 as 'matches any character with a value in the range(s) indicated (inclusive).' I think this needs elaboration. At the VERY least, it should say 'from a to z and from A to Z'.

(2) WORSE, the notation [a-zA-Z0-9_.:] is NOWHERE defined. Indeed, the notation [abc] is not even defined.

(3) [^abc] is only scantily defined, although one must infer from context using superhuman skills that the "^" is part of the "not" notation and not part of the characters that are disallowed. Without more exposition, there is no way to discern that [^abc] doesn't mean

  Char - ( '^' | 'a' | 'b' | 'c' )

since there is no use of [...] shown and one might therefore assume that when hyphens are not present, there is an exclusion applied.

(4) If you assume [abc] is defined as meaning the enclosed characters, then how do you know that [#x12-#x14] doesn't mean

  '#' | 'x' | '-' | '1' | '2' | '4' ?

My conclusion is that you can't let this go without saying. It may be that people can figure this spec out pragmatically, but it is not the case that the spec really DEFINES a notation plainly.

Personally, I would MUCH rather not see a hairy definition for []. I would rather see a simple syntax definition of [], EVEN IF it led to more complex notations like:

  [a-z] | [A-Z] | [0-9] | '_' | '.' | ':'

and even if instead of [^abc] you saw:

  Char - ('a' | 'b' | 'c')

Another thing I like about "Char - ('a' | 'b' | 'c')" is that it makes clear what the set is that abc are being removed from. When you don't specify, it might mean Char or it might be some other set.
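To make the ambiguity in points (2) through (4) concrete, here is a minimal Python sketch of two readings a parser-writer might give the text between the brackets. Neither function is taken from the spec: the names literal_reading and range_reading are hypothetical, and the rules in range_reading (a leading "^" negates, "x-y" is an inclusive range, "#xN" is a hexadecimal code point) are assumptions about the intended meaning, which is exactly what the spec leaves unstated.

```python
import re

# One item is either a '#xN' hexadecimal code point or any single character.
_ITEM = re.compile(r"#x([0-9a-fA-F]+)|(.)", re.DOTALL)


def literal_reading(body: str) -> set[str]:
    """Reading A: every character between the brackets stands for itself."""
    return set(body)


def range_reading(body: str) -> tuple[bool, set[str]]:
    """Reading B (assumed intent): returns (negated, characters listed)."""
    negated = body.startswith("^")
    if negated:
        body = body[1:]

    # Tokenize; remember whether each token was written as plain text,
    # so that only a plain '-' can act as the range operator.
    tokens = []
    for m in _ITEM.finditer(body):
        if m.group(1):
            tokens.append((chr(int(m.group(1), 16)), False))
        else:
            tokens.append((m.group(2), True))

    chars = set()
    i = 0
    while i < len(tokens):
        low = tokens[i][0]
        if i + 2 < len(tokens) and tokens[i + 1] == ("-", True):
            high = tokens[i + 2][0]
            chars.update(chr(c) for c in range(ord(low), ord(high) + 1))
            i += 3
        else:
            chars.add(low)
            i += 1
    return negated, chars


if __name__ == "__main__":
    for body in ("#x12-#x14", "^abc", "a-zA-Z0-9_.:"):
        print(repr(body), sorted(literal_reading(body)), range_reading(body)[0])
```

Under reading A, "#x12-#x14" is the six characters '#', 'x', '1', '2', '-', '4'; under reading B it is the three code points #x12, #x13, #x14. Likewise "^abc" is either four literal characters or a negated set of three. Both readings are consistent with a one-sentence definition, which is the objection being raised.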
Among other things, using a more cumbersome notation would encourage you to name these odd little collections of characters. Why on earth are "_", ".", and ":" allowed in one case but another arbitrary-looking set in another context?? If you named these better, and used descriptions like:

  lc-alpha | uc-alpha | digit | nameprefix

in place of [a-zA-Z0-9_.:], it would make a lot more sense and would have a normative effect on the terminology used by parser-writers to describe these odd little sets.

 -kmp

-----------
DISCLAIMER: The above are my personal feelings and not necessarily Harlequin's official position.
Received on Friday, 17 April 1998 04:26:39 UTC