- From: Kent Pitman <kmp@harlequin.com>
- Date: Sat, 20 Jun 1998 21:10:07 -0400 (EDT)
- To: www-style@w3.org
- Cc: kmp@harlequin.com
In PR-CSS2-19980324, in 4.1.1 Tokenization ...

* It says that nmstart permits upper and lower case alphabetics, but that nmchar permits ONLY lower case alphabetics. Is the omission of uppercase from nmchar really intentional? For example, bullet 2 in 4.1.3 mentions A-Za-z, i.e. both cases, in seeming contradiction to the specified grammar. IF THIS DESCRIPTION OF NMCHAR IS WRONG, THE ERROR SEEMS SEVERE TO ME.

* Do you really mean to allow the hex digits of 'unicode' to be written only in lowercase a-f? Personally, I find this gratuitous, since I prefer uppercase hex, but it's not fatal.

* Also, am I right that 'escape' means to include space through tilde by use of the notation [ -~\200-\4177777]? Surely something more perspicuous could be done. I almost missed the use of hyphen as a range connective and read this as "space and hyphen and tilde...". This is not strictly a bug; it's just really ugly, and it's made worse by your choice of font, in which - and ~ are virtually indistinguishable.

* Same comment for string1 and string2, where it took me forever to figure out why A-Z are missing (they are covered by the range "(-~"). I personally think using hyphen to string together anything other than conceptually meaningful sequences like a-f, a-z, and 0-9 is not a good idea. I guess I can live with ranges of codes; I'd rather see \050-\177 than "(-~".

* I find the apparent use of decimal to describe character codes in running text [... "space" (Unicode code 32), "tab" (9), ...], octal in your tables [... \200-\4177777 ...], and the fact that CSS will ultimately expect me to write in hex (the presumed reason for specifying that the macro token 'unicode' can take on [0-9a-f]) to be IMMENSELY confusing. You're using hex, decimal, and octal on the same page with no indication of which is in use where. Whether this is a bug or not is hard to say, but speaking as an editor of language standards myself, this is a good way to confuse readers.

  You should really fix this, probably by using hex uniformly, since that aspect of the language is probably fixed and the rest can most easily be adjusted to match. (I abhor hex, but would rather see hex used consistently than a mix used in a way that makes it hard to know what's in use at any given time. For example, like the XML spec, you could perhaps use [#x80-#x10FFFF] instead of [\200-\4177777].)

* Isn't the sequence " -~\200-\4177777" the SAME as the simpler sequence " -\4177777"? That is, isn't " -~" the same as "\040-\177", and aren't "\177" and "\200" one after another? I admit it's late at night as I write this and it's been a long time since I used octal, but it sure looks like something that could be usefully contracted. Otherwise it looks like a split sequence instead of basically "any Unicode character from space upward to \4177777".

* Are the conventions for these notations like [...] that you're using defined anywhere? I looked quickly and didn't see them. Is the remark about them being "Lex-style" the definition? What if I don't have a copy of Lex? Could you perhaps offer a pointer to a publicly accessible copy of its spec? Or could you explain the relevant parts so I don't need to go in search? An actual standard makes a good reference, but if Lex doesn't have an associated standard, it's probably not fair to assume your reader uses it. This is not a way to write a standard that stands on its own through the ages.

If I'm looking at an obsolete document and there's a later fix, please do let me know.

-kmp
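The nmstart/nmchar asymmetry raised in the first point can be seen with a small Python sketch. It assumes a simplified transcription of only the ASCII alternatives as the message describes them ([a-zA-Z] for nmstart, [a-z0-9-] for nmchar), leaves out the non-ASCII and escape alternatives, and uses a hypothetical helper `is_ident`. It is an illustration of the concern, not the spec's tokenizer.

```python
import re

# Hypothetical, simplified transcription of the ASCII alternatives only;
# the non-ASCII and escape branches of the real macros are omitted.
NMSTART = r"[a-zA-Z]"
NMCHAR = r"[a-z0-9-]"


def is_ident(text: str, ignore_case: bool = False) -> bool:
    """Return True if all of `text` matches the pattern nmstart nmchar*."""
    flags = re.IGNORECASE if ignore_case else 0
    return re.fullmatch(rf"{NMSTART}{NMCHAR}*", text, flags) is not None


for word in ["color", "Color", "colorRed", "COLOR"]:
    # Print the case-sensitive result next to the case-insensitive one.
    print(word, is_ident(word), is_ident(word, ignore_case=True))
```

Under case-sensitive matching, "Color" is accepted while "colorRed" and "COLOR" are rejected, purely because an uppercase letter appears after the first position. If the tokenizer were run case-insensitively (an assumption, not something the quoted grammar states), the asymmetry would disappear.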
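The base conversions in the later points are easy to get wrong by hand. This Python sketch (the `describe` helper is hypothetical) prints the range endpoints mentioned in the message in octal, decimal, and hex, and checks whether " -~" runs directly into \200.

```python
def describe(octal_literal: str) -> str:
    """Render an octal code, as written in the grammar tables, in all three bases."""
    value = int(octal_literal, 8)
    return f"\\{octal_literal} = {value} decimal = #x{value:X}"


# Endpoints written as literal characters in the grammar.
for ch in [" ", "(", "~"]:
    code = ord(ch)
    print(repr(ch), f"\\{code:03o}", code, f"#x{code:X}")

# Endpoints written as octal escapes in the grammar.
for lit in ["177", "200", "4177777"]:
    print(describe(lit))

# Do " -~" and \200-\4177777 touch? They are adjacent only if nothing lies between.
gap = int("200", 8) - ord("~")
print("codes strictly between ~ and \\200:", gap - 1)
```

On this arithmetic, "~" is \176 (126, #x7E), so " -~" ends at \176 while the next range starts at \200 (128, #x80). Exactly one code lies between them, \177 (DEL), so contracting the pair to " -\4177777" would also admit DEL. The upper bound \4177777 works out to 1114111, i.e. #x10FFFF.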
Received on Sunday, 21 June 1998 14:40:48 UTC