Re: [CSS21] questions about Lex regexes used to define tokens

On 6/10/2011 9:37 AM, Jack Smiley wrote:
> Hi,
>
> I have three questions about the Lex regexes used to define the CSS 
> tokens (section 4.1.1, Tokenization)
>
> 1) What do the dashes mean in the character class of the second 
> alternate in the definition of URI
>
> |url\({w}([!#$%&*-\[\]-~]|{nonascii}|{escape})*{w}\)
>
> They're not escaped, so I'm assuming they're metacharacters (refer to 
> ranges), but ranges don't seem to make sense here (what's the 
> character range from * to [ or from ] to ~)?
They are ranges; look them up in an ASCII table. The first, * to [, is
*+,-./:;<=>?@[, with 0-9 and A-Z in there as well, and the second, ] to ~,
is ]^_`{|}~, with a-z as well.

In other words, it's every printable ASCII character except space, ", ',
(, ), and \.
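
A quick way to check which printable characters fall outside the class is
to run it through a regex engine. The sketch below uses Python's re module
rather than Lex, on the assumption that both read this particular
character class the same way (escaped brackets as literals, unescaped
dashes between two characters as ranges). The names uri_char and excluded
are only for illustration.

```python
import re

# Character class from the second URI alternative, copied verbatim.
uri_char = re.compile(r"[!#$%&*-\[\]-~]")

# Printable ASCII runs from space (0x20) through tilde (0x7E).
printable = [chr(code) for code in range(0x20, 0x7F)]

# Collect the printable characters the class does NOT match.
excluded = [ch for ch in printable if not uri_char.fullmatch(ch)]

for ch in excluded:
    print(f"0x{ord(ch):02X} {ch!r}")
```

Everything it prints is a character that has to be written as an escape
(or put inside a quoted string) to appear in a url().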
>
> 3) Regarding the macro definition for nonascii, why does it go up to 
> octal 237? (what's special about 237?) Why not octal 177 (decimal 127 
> -- standard ASCII) or octal 377 (decimal 255 -- extended ASCII)?
Presumably because octal 200 through 237 (0x80 through 0x9F) are the C1
control characters, so octal 240 (0xA0) is the first code point above
ASCII that is not a control character.
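
For reference, the three octal values mentioned in the question can be
converted with a few lines of Python. This is only arithmetic on the
numbers quoted above; the dictionary name boundaries is made up for the
example.

```python
# Octal values from the question, mapped to their decimal equivalents.
boundaries = {octal: int(octal, 8) for octal in ("177", "237", "377")}

for octal, value in boundaries.items():
    print(f"octal {octal} = decimal {value} = hex 0x{value:02X}")
```

So octal 237 is decimal 159 (0x9F), which sits between the end of ASCII
at 127 and the end of the single-byte range at 255.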

-- 
Beware of bugs in the above code; I have only proved it correct, not tried it. -- Donald E. Knuth

Received on Friday, 10 June 2011 16:53:12 UTC