- From: Tab Atkins Jr. <jackalmage@gmail.com>
- Date: Tue, 23 Oct 2012 14:57:23 -0700
- To: Boris Zbarsky <bzbarsky@mit.edu>
- Cc: www-style@w3.org
On Mon, Oct 22, 2012 at 8:02 PM, Boris Zbarsky <bzbarsky@mit.edu> wrote:
> On 10/22/12 8:28 PM, Tab Atkins Jr. wrote:
>>
>> * Firefox forces "\0" to be interpreted as escaping a literal "0",
>> even if more digits follow it. (More testing shows that they simply
>> refuse to tokenize \0 as a hex escape, regardless of how many 0s there
>> are. This means that I was lied to - they have to do 6-char lookahead
>> when parsing stylesheets, not 3-char. ^_^)
>
>
> There is no 6-char lookahead here. Pure look-behind. The code sees a
> backslash, starts reading chars one at a time and counting how many
> chars it has read, stopping when it reaches 6 chars or a non-hex-digit char.
> Note that you have to keep track of how many chars you read, because if you
> read all 6 chars you still have to go ahead and swallow the following
> whitespace, if any.
>
> In any case once Gecko reaches end-of-escape it looks at the resulting hex
> value. If that value is 0, it outputs as many '0' as it had hex digit chars.

Ah, that violates the "only emit one token per call" invariant that I was
told was important.

> All of this never requires more than 2-char lookahead that I can see. Maybe
> even 1-char; it's a bit hard to tell from this code.
>
> Note that \0 or \000000 are not valid hex escapes in CSS2.1, which is why
> Gecko never treats them as hex escapes, and I'm pretty surprised that WebKit
> does so. Guess we never had a test in the test suite for little details
> like section 4.1.3? ;)
>
> There's a nice code comment here about that:
>
> // "[at most six hexadecimal digits following a backslash] stand
> // for the ISO 10646 character with that number, which must not be
> // zero.
> // (It is undefined in CSS 2.1 what happens if a style sheet
> // does contain a character with Unicode codepoint zero.)"
> // -- CSS2.1 section 4.1.3
> //
> // Silently deleting \0 opens a content-filtration loophole (see
> // bug 228856), so what we do instead is pretend the "cancels the
> // meaning of special characters" rule applied.

Yeah, I definitely don't want to actually *remove* any characters. A
thought occurs to me, though - maybe it makes sense to be consistent
with my preferred treatment of literal nulls, and make \0 return
U+FFFD as well?

>> Next, I tested an actual escaped null, that is, a \ followed by a null.
>
> ...
>
>> * Firefox appears to convert it into a \0, and then act normally.
>
>
> That's ... odd. I would expect \ followed by an actual null, assuming the
> null gets down to the CSS parser, to just keep the null as a character in
> the tokenization, the same way that \w would work. Link to the testcase you
> were using here?

I've reproduced a slightly better testcase as
http://www.xanthir.com/etc/css-null-testing/escaped-null-in-selector.html

Here's a repro of what I get out of the CSSOM in FF:

    p { background-color: red; color: white; }
    .one { background-color: green; }
    .two { background-color: green; }
    \0 .three { background-color: green; }
    .four { background-color: green; }

The actual source is identical, except that it has a literal NULL
instead of "0 ". It seems that I'm not able to copy-paste the actual
source - when I paste, it's truncated at the NULL. ^_^

>> * Firefox treats it as an invalid value and drops the declaration.
>> Otherwise, acts as normal.
>
> Wouldn't this depend on the value? I'm pretty sure inside a string, say,
> the null would just be preserved.... But of course if you do something like
> this C string: "color: \\\0" then it's not a valid color and will be
> dropped. Again, what was the actual test here?

Heh, not quite.
If FF encounters an escaped literal NULL inside of a string or unquoted
url, it truncates the string or url at that point. It doesn't treat it
as invalid, and otherwise parses the token normally - it just throws
away the contents of the token from the escape onward.

>> 2. Nobody does anything *useful* with nulls, so getting rid of them in
>> the input string is almost certainly just fine.
>
> Modulo issues like https://bugzilla.mozilla.org/show_bug.cgi?id=228856 cited
> in the above code comment.

>> 3. I'd like to know why Firefox refuses to allow a hex-escaped null.
>
> Because that's what CSS2.1 specs, afaict.

Yup, didn't realize that it was explicitly disallowed in the prose.
That's silly.

~TJ
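A minimal Python sketch of the hex-escape handling discussed in this thread, assuming the CSS2.1 rule of at most six hex digits followed by one optional whitespace character (CRLF counting as one). The function name `consume_hex_escape` and its `zero_policy` parameter are hypothetical, not Gecko's or WebKit's code: `"literal"` models Boris's description of Gecko (a zero value becomes as many '0' characters as there were hex digits), and `"fffd"` models the suggestion of returning U+FFFD for \0. Mapping values above U+10FFFF to U+FFFD is an extra assumption not discussed in the thread. The loop only ever inspects the current character, which illustrates Boris's point that no 6-char lookahead is needed.

```python
from __future__ import annotations

HEX_DIGITS = "0123456789abcdefABCDEF"
WHITESPACE = " \t\n\r\f"


def consume_hex_escape(src: str, pos: int, zero_policy: str = "literal") -> tuple[str, int]:
    """Decode a hex escape whose backslash is at src[pos] (hypothetical helper).

    Returns (decoded_text, index just past the escape).
    """
    if src[pos] != "\\":
        raise ValueError("expected a backslash")
    start = i = pos + 1
    # Read hex digits one char at a time, counting them, and stop at six
    # digits or at the first non-hex-digit char (no lookahead needed).
    while i < len(src) and i - start < 6 and src[i] in HEX_DIGITS:
        i += 1
    digits = src[start:i]
    if not digits:
        raise ValueError("not a hex escape")
    # Swallow one trailing whitespace char even if all six digits were read.
    if src.startswith("\r\n", i):
        i += 2
    elif i < len(src) and src[i] in WHITESPACE:
        i += 1
    value = int(digits, 16)
    if value == 0:
        if zero_policy == "literal":
            # Assumed Gecko-like behavior: emit one '0' per hex digit read.
            return "0" * len(digits), i
        if zero_policy == "fffd":
            # The proposed alternative: treat \0 like a literal null, U+FFFD.
            return "\ufffd", i
        raise ValueError(f"unknown zero_policy: {zero_policy!r}")
    if value > 0x10FFFF:
        # Assumption: out-of-range code points also become U+FFFD.
        return "\ufffd", i
    return chr(value), i


print(consume_hex_escape(r"\000 x", 0))                      # ('000', 5)
print(consume_hex_escape(r"\000 x", 0, zero_policy="fffd"))  # ('\ufffd', 5)
print(consume_hex_escape(r"\0000411", 0))                    # ('A', 7): the 7th digit stays literal
```

Keeping the zero case as a policy switch makes it easy to compare the CSS2.1-style "not a hex escape" reading with the U+FFFD alternative without touching the digit-counting loop.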
Received on Tuesday, 23 October 2012 21:58:11 UTC