Re: [css2.1] tokenizer syntax - handling escaped null in badstring

On 07/10/2012 06:30, Glenn Adams wrote:
> I'm referring to what the spec would have one do, as opposed to what UAs
> actually do. Do you agree the tokenizer rule as specified would consume
> an escaped NULL (whether or not a UA actually allows a NULL to get that
> far)?

Yes, this is my understanding of the regexps that define the tokenizer.
U+0000 matches the [^\n\r\f0-9a-f] part of the 'escape' macro and thus
can be escaped with a backslash. It can also appear unescaped, either as
a normal character inside a quoted string or as a DELIM token outside one.
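
As a rough illustration of the point above, here is a minimal sketch that
transcribes the 'unicode', 'escape' and 'string1' macros from the CSS 2.1
tokenizer into Python regexps. The use of Python's re module, the
IGNORECASE flag, and the helper names is_escape and is_string1 are
assumptions of this sketch, not anything the spec prescribes.

```python
import re

# CSS 2.1 macros, transcribed by hand into Python regex syntax
# (assumption: re.IGNORECASE stands in for the spec's case-insensitivity).
UNICODE = r"\\[0-9a-f]{1,6}(?:\r\n|[ \n\r\t\f])?"
ESCAPE = r"(?:" + UNICODE + r"|\\[^\n\r\f0-9a-f])"
NL = r"(?:\n|\r\n|\r|\f)"
STRING1 = r'"(?:[^\n\r\f\\"]|\\' + NL + r"|" + ESCAPE + r')*"'

ESCAPE_RE = re.compile(ESCAPE, re.IGNORECASE)
STRING1_RE = re.compile(STRING1, re.IGNORECASE)


def is_escape(text):
    """Return True if text is exactly one 'escape' per the macro above."""
    return ESCAPE_RE.fullmatch(text) is not None


def is_string1(text):
    """Return True if text is exactly one double-quoted 'string1'."""
    return STRING1_RE.fullmatch(text) is not None


# Backslash followed by U+0000 is an escape, just like backslash + U+0001.
print(is_escape("\\\x00"), is_escape("\\\x01"))
# Backslash followed by a newline is not an escape.
print(is_escape("\\\n"))
# U+0000 inside a quoted string, unescaped and escaped.
print(is_string1('"a\x00b"'), is_string1('"a\\\x00b"'))
```

Under these regexps nothing distinguishes U+0000 from U+0001: both match
the negated character class, escaped or not.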

If we ignore the "undefined" part, U+0000 in CSS behaves just like
U+0001 and many other code points. And I think it should. Zero as a
string terminator is not universal; it is only an implementation detail
of some systems. Sure, we can accommodate such systems by allowing them
to substitute U+FFFD or something similar, but I see no reason to make
U+0000 a terminator on systems that are perfectly fine with a null byte
or code point in the middle of a string.

In any case, any change (from undefined) in this area will probably go
in css3-syntax rather than CSS 2.1.

Simon Sapin

Received on Sunday, 7 October 2012 06:08:29 UTC