- From: Boris Zbarsky <bzbarsky@MIT.EDU>
- Date: Tue, 23 Oct 2012 18:23:38 -0400
- To: "Tab Atkins Jr." <jackalmage@gmail.com>
- CC: www-style@w3.org
On 10/23/12 5:57 PM, Tab Atkins Jr. wrote:
>> In any case once Gecko reaches end-of-escape it looks at the resulting hex
>> value. If that value is 0, it outputs as many '0' as it hex digit chars.
>
> Ah, that violates the "only emit one token per call" invariant that I
> was told was important.

I don't see why. The "output" there is not tokens. It's a string. Specifically, the string the escape expands to.

So as a concrete example, say you have something like this:

  abc\0000def

Gecko would tokenize this as a single token: the identifier "abc0000def". Just like given:

  abc\61 def

Gecko would tokenize as the single identifier token "abcadef".

Note that in the typical string encodings people use (UTF-16 and UTF-8), a CSS escape can easily expand to multiple code units in general, so the only special thing about the \0 stuff is that it can expand into up to 6 code units, whereas most Unicode chars expand into at most 2 in UTF-16 and at most 4 in UTF-8.

> A thought occurs to me, though - maybe it makes sense to be consistent
> with my preferred treatment of literal nulls, and make \0 return
> U+FFFD as well?

I can probably live with that too.

> I've reproduced a slightly better testcase as
> http://www.xanthir.com/etc/css-null-testing/escaped-null-in-selector.html
>
> Here's a repro of what I get out of the CSSOM in FF:
>
> p { background-color: red; color: white; }
> .one { background-color: green; }
> .two { background-color: green; }
> \0 .three { background-color: green; }
> .four { background-color: green; }

Ah, you're seeing a bug in the serializer there, looks like.

Parsing the original text tokenizes an escaped null as a single identifier char, and puts a null in as the tag name in that third selector. But then when you serialize an identifier, Gecko does:

  // Escape all characters below 0x20

and proceeds to snprintf with a format string of "\\%hX ", which is broken for null given how Gecko parses \0.

> Heh, not quite. If FF encounters an escaped literal NULL inside of a
> string or unquoted url, it truncates the string or url at that point.
> It doesn't treat it as invalid, and otherwise parses the token
> normally - it just throws away the contents of the token from the
> escape onward.

Ah, this is amusing. The parser actually keeps the null just fine. But for strings the object model stores them as a pointer with no length and relies on strlen to get the length, which of course truncates at the embedded null whenever someone (which includes the rendering code) asks the object model for anything.

-Boris
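
The three behaviours described in this message (escape expansion, the serializer's "\\%hX " format, and strlen-based truncation) can be modelled roughly as follows. This is a minimal sketch in Python, not Gecko's code: the names parse_ident, serialize_ident and as_c_string are hypothetical, and the handling of a trailing lone backslash is an assumption. The sample inputs avoid hex-digit letters directly after an escape, since the sketch reads a run of up to six hex digits greedily.

```python
HEX_DIGITS = set("0123456789abcdefABCDEF")


def parse_ident(text):
    """Expand CSS escapes in `text`, modelling the behaviour the message describes."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        i += 1  # skip the backslash
        if i >= len(text):
            break  # assumption: a trailing lone backslash is dropped
        # Collect a run of up to six hex digits.
        start = i
        while i < len(text) and i - start < 6 and text[i] in HEX_DIGITS:
            i += 1
        digits = text[start:i]
        if not digits:
            # Non-hex escape: the escaped character stands for itself,
            # which includes a literal U+0000.
            out.append(text[i])
            i += 1
            continue
        value = int(digits, 16)
        if value == 0:
            # As described: emit one '0' per hex digit that was read.
            out.append("0" * len(digits))
        else:
            out.append(chr(value))
        # A single whitespace character after a hex escape is consumed.
        if i < len(text) and text[i] in " \t\n":
            i += 1
    return "".join(out)


def serialize_ident(ident):
    """Escape every character below 0x20 as backslash, hex value, space."""
    out = []
    for ch in ident:
        if ord(ch) < 0x20:
            out.append("\\%X " % ord(ch))
        else:
            out.append(ch)
    return "".join(out)


def as_c_string(s):
    """Model a pointer-without-length string: strlen stops at the first null."""
    return s.split("\x00", 1)[0]


print(repr(parse_ident("abc\\61 def")))               # 'abcadef'
print(repr(parse_ident("abc\\0000 xyz")))             # 'abc0000xyz'
print(repr(parse_ident(serialize_ident("\x00"))))     # '0', not '\x00'
print(repr(as_c_string("foo\x00bar")))                # 'foo'
```

In this model the serializer bug shows up as a failed round trip: an identifier holding a single null serializes to "\0 ", which parses back as the character '0', whereas U+0001 round-trips unchanged.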
Received on Tuesday, 23 October 2012 22:24:07 UTC