- From: Joshua Cranmer <Pidgeot18@verizon.net>
- Date: Fri, 10 Jun 2011 09:52:07 -0700
- To: www-style@w3.org
- Message-id: <4DF24BB7.9000207@verizon.net>
On 6/10/2011 9:37 AM, Jack Smiley wrote:
> Hi,
>
> I have three questions about the Lex regexes used to define the CSS
> tokens (section 4.1.1, Tokenization)
>
> 1) What do the dashes mean in the character class of the second
> alternate in the definition of URI
>
> |url\({w}([!#$%&*-\[\]-~]|{nonascii}|{escape})*{w}\)
>
> They're not escaped, so I'm assuming they're metacharacters (refer to
> ranges), but ranges don't seem to make sense here (what's the
> character range from * to [ or from ] to ~)?

Look up the ASCII table. The dashes are range operators. The range from * to [ is * + , - . / : ; < = > ? @ [, with 0-9 and A-Z in there as well, and the range from ] to ~ is ] ^ _ ` { | } ~, with a-z as well. In other words, together with the literals ! # $ % &, the class matches every printable ASCII character except space, ", ', (, ), and \.

> 3) Regarding the macro definition for nonascii, why does it go up to
> octal 237? (what's special about 237?) Why not octal 177 (decimal 127
> -- standard ASCII) or octal 377 (decimal 255 -- extended ASCII)?

Octal 237 is 0x9F (decimal 159), the last of the C1 control codes (0x80-0x9F). Presumably the macro is meant to leave those out, so that nonascii starts at octal 240 (0xA0).

-- 
Beware of bugs in the above code; I have only proved it correct, not tried it.
    -- Donald E. Knuth
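
For anyone who wants to check the ranges rather than read them off an ASCII table, here is a minimal Python sketch. It assumes Python's `re` module treats this bracket expression the same way Lex does, which holds for this particular class; the variable names are illustrative only and do not come from the CSS specification.

```python
import re

# Character class from the unquoted-URI alternative in the quoted regex.
URI_CHAR = re.compile(r"[!#$%&*-\[\]-~]")

# Printable ASCII: space (0x20) through tilde (0x7E).
printable = [chr(c) for c in range(0x20, 0x7F)]

# Expand the class by hand: five literals plus the two ranges.
by_hand = set("!#$%&")
for lo, hi in (("*", "["), ("]", "~")):
    by_hand.update(chr(c) for c in range(ord(lo), ord(hi) + 1))

# Cross-check the hand expansion against the regex engine.
by_regex = {ch for ch in printable if URI_CHAR.fullmatch(ch)}
assert by_hand == by_regex

# Printable characters the class does NOT match.
excluded = sorted(set(printable) - by_hand)
print(excluded)

# Octal 237 from the nonascii question, in decimal and hex.
print(0o237, hex(0o237))
```

The excluded list is what remains of printable ASCII once the literals and both ranges are removed.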
Received on Friday, 10 June 2011 16:53:12 UTC