Re: [css3-syntax] Null bytes and U+0000 from Tab Atkins Jr. on 2012-10-23 (www-style@w3.org from October 2012)

From: Tab Atkins Jr. <jackalmage@gmail.com>
Date: Tue, 23 Oct 2012 14:57:23 -0700
To: Boris Zbarsky <bzbarsky@mit.edu>
Cc: www-style@w3.org
Message-ID: <CAAWBYDASPzvJUtyNv=68tqjpAsx2ESR0nLTsm+dsygdih+9ccA@mail.gmail.com>

On Mon, Oct 22, 2012 at 8:02 PM, Boris Zbarsky <bzbarsky@mit.edu> wrote:
> On 10/22/12 8:28 PM, Tab Atkins Jr. wrote:
>>
>> * Firefox forces "\0" to be interpreted as escaping a literal "0",
>> even if more digits follow it.  (More testing shows that they simply
>> refuse to tokenize \0 as a hex escape, regardless of how many 0s there
>> are.  This means that I was lied to - they have to do 6-char lookahead
>> when parsing stylesheets, not 3-char. ^_^)
>
>
> There is no 6-char lookahead here.  Pure look-behind.  The code sees a
> backslash, then starts reading chars one at a time and counting how many
> chars it has read, stopping when it reaches 6 chars or a non-hex-digit char.
> Note that you have to keep track of how many chars you read, because if you
> read all 6 chars you still have to go ahead and swallow the following
> whitespace, if any.
>
> In any case once Gecko reaches end-of-escape it looks at the resulting hex
> value.  If that value is 0, it outputs as many '0' chars as it read hex digits.

Ah, that violates the "only emit one token per call" invariant that I
was told was important.
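
For illustration, the escape handling Boris describes can be sketched
roughly as below.  This is a minimal Python model based only on the
description in this thread, not Gecko's actual code.  The name
consume_escape is hypothetical.  Swallowing one whitespace char after a
zero-valued escape and mapping out-of-range values to U+FFFD are
assumptions on my part.

```python
HEX_DIGITS = set("0123456789abcdefABCDEF")
CSS_WHITESPACE = set(" \t\n\r\f")


def consume_escape(text, pos):
    """Consume the escape whose backslash is at text[pos].

    Returns (output, next_pos). Hypothetical helper, not Gecko's code.
    """
    assert text[pos] == "\\"
    i = pos + 1
    digits = 0
    # Read one char at a time, counting, until 6 hex digits have been
    # read or a non-hex-digit char is reached. No lookahead is needed.
    while i < len(text) and digits < 6 and text[i] in HEX_DIGITS:
        i += 1
        digits += 1

    if digits == 0:
        # Not a hex escape: the backslash just cancels the special
        # meaning of the next char (the "\w" case).
        if i < len(text):
            return text[i], i + 1
        return "", i

    value = int(text[pos + 1:i], 16)

    # Swallow a single whitespace char after the hex digits, if any.
    # Assumption: done for zero-valued escapes too; the thread does not say.
    if i < len(text) and text[i] in CSS_WHITESPACE:
        i += 1

    if value == 0:
        # Per the description: output as many '0' chars as digits read.
        return "0" * digits, i
    if value > 0x10FFFF:
        # Assumption: out-of-range values become U+FFFD.
        return "\uFFFD", i
    return chr(value), i
```

With this model, consume_escape("\\41 b", 0) gives ("A", 4), while
consume_escape("\\000x", 0) gives ("000", 4).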

> All of this never requires more than 2-char lookahead that I can see. Maybe
> even 1-char; it's a bit hard to tell from this code.
>
> Note that \0 or \000000 are not valid hex escapes in CSS2.1, which is why
> Gecko never treats them as hex escapes, and I'm pretty surprised that WebKit
> does so.  Guess we never had a test in the test suite for little details
> like section 4.1.3?  ;)
>
> There's a nice code comment here about that:
>
>     // "[at most six hexadecimal digits following a backslash] stand
>     // for the ISO 10646 character with that number, which must not be
>     // zero. (It is undefined in CSS 2.1 what happens if a style sheet
>     // does contain a character with Unicode codepoint zero.)"
>     //   -- CSS2.1 section 4.1.3
>     //
>     // Silently deleting \0 opens a content-filtration loophole (see
>     // bug 228856), so what we do instead is pretend the "cancels the
>     // meaning of special characters" rule applied.

Yeah, I definitely don't want to actually *remove* any characters.

A thought occurs to me, though - maybe it makes sense to be consistent
with my preferred treatment of literal nulls, and make \0 return
U+FFFD as well?
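
A rough sketch of what that consistent treatment could look like, in
Python.  Both function names are hypothetical, and mapping values above
0x10FFFF to U+FFFD is an extra assumption that is not discussed in this
thread.

```python
REPLACEMENT = "\uFFFD"


def replace_literal_nulls(css_text):
    """Turn each literal U+0000 in the input into U+FFFD.

    No characters are removed, so nothing is silently deleted.
    """
    return css_text.replace("\x00", REPLACEMENT)


def escape_value_to_char(value):
    """Map the numeric value of a hex escape to the char it stands for."""
    if value == 0:
        # Same result as a literal null: U+FFFD.
        return REPLACEMENT
    if value > 0x10FFFF:
        # Assumption: out-of-range values are also replaced.
        return REPLACEMENT
    return chr(value)
```

Under this sketch a literal null and \0 both end up as the same
character, and the length of the input never shrinks.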

>> Next, I tested an actual escaped null, that is, a \ followed by a null.
>
> ...
>
>> * Firefox appears to convert it into a \0, and then act normally.
>
>
> That's ... odd.  I would expect \ followed by an actual null, assuming the
> null gets down to the CSS parser, to just keep the null as a character in
> the tokenization, the same way that \w would work.  Link to the testcase you
> were using here?

I've reproduced a slightly better testcase as
http://www.xanthir.com/etc/css-null-testing/escaped-null-in-selector.html

Here's a repro of what I get out of the CSSOM in FF:

p { background-color: red; color: white; }
.one { background-color: green; }
.two { background-color: green; }
\0 .three { background-color: green; }
.four { background-color: green; }

The actual source is identical, except that it has a literal NULL
instead of "0 ".  It seems that I'm not able to copy-paste the actual
source - when I paste, it's truncated at the NULL. ^_^
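
Since copy-paste loses the NULL, one way to rebuild the source bytes is
to start from the CSSOM text above and swap "0 " back for a literal
NULL.  This is only a sketch.  The helper names and any output filename
are hypothetical, and the real testcase at the URL above is an HTML page
that may differ in details.

```python
# The CSSOM output quoted above ("\\0" is a backslash followed by "0").
CSSOM_TEXT = (
    "p { background-color: red; color: white; }\n"
    ".one { background-color: green; }\n"
    ".two { background-color: green; }\n"
    "\\0 .three { background-color: green; }\n"
    ".four { background-color: green; }\n"
)


def with_literal_null(cssom_text):
    """Replace the serialized "\\0 " with a backslash plus a literal NULL."""
    return cssom_text.replace("\\0 ", "\\\x00").encode("utf-8")


def write_testcase(path):
    """Write the bytes in binary mode so the NULL survives."""
    with open(path, "wb") as f:
        f.write(with_literal_null(CSSOM_TEXT))


source_bytes = with_literal_null(CSSOM_TEXT)
```

Writing in binary mode avoids any editor or clipboard in the middle
that might truncate at the NULL.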


>> * Firefox treats it as an invalid value and drops the declaration.
>> Otherwise, acts as normal.
>
> Wouldn't this depend on the value?  I'm pretty sure inside a string, say,
> the null would just be preserved....  But of course if you do something like
> this C string: "color: \\\0" then it's not a valid color and will be
> dropped.  Again, what was the actual test here?

Heh, not quite.  If FF encounters an escaped literal NULL inside of a
string or unquoted url, it truncates the string or url at that point.
It doesn't treat it as invalid, and otherwise parses the token
normally - it just throws away the contents of the token from the
escape onward.
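
To make that concrete, here is a small Python model of the behavior I
observed for quoted strings.  It only mimics the result seen from the
outside and says nothing about how Gecko implements it.  The function
name is hypothetical, and hex escapes and newlines inside the string are
deliberately not handled.

```python
def consume_string_observed(text, pos):
    """Model of the observed result for a quoted string.

    text[pos] is the opening quote. Returns (value, next_pos).
    Hypothetical helper; hex escapes and newlines are not handled.
    """
    quote = text[pos]
    i = pos + 1
    out = []
    truncated = False
    while i < len(text):
        ch = text[i]
        if ch == quote:
            # The token still ends normally at the closing quote.
            i += 1
            break
        if ch == "\\" and i + 1 < len(text):
            nxt = text[i + 1]
            if nxt == "\x00":
                # Escaped literal NULL: drop everything from here on.
                truncated = True
            elif not truncated:
                out.append(nxt)
            i += 2
            continue
        if not truncated:
            out.append(ch)
        i += 1
    return "".join(out), i
```

For the input "ab\<NULL>cd" this returns the value "ab", with the
position just past the closing quote, so the rest of the declaration is
parsed as usual.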


>> 2. Nobody does anything *useful* with nulls, so getting rid of them in
>> the input string is almost certainly just fine.
>
> Modulo issues like https://bugzilla.mozilla.org/show_bug.cgi?id=228856 cited
> in the above code comment.


>> 3. I'd like to know why Firefox refuses to allow a hex-escaped null.
>
> Because that's what CSS2.1 specs, afaict.

Yup, didn't realize that it was explicitly disallowed in the prose.
That's silly.

~TJ
Received on Tuesday, 23 October 2012 21:58:11 UTC