Re: [css3-syntax] Null bytes and U+0000

Okay, I did some basic testing.  I was actually surprised - the
results were more varied than I had thought they would be.

All of these tests were done with the following HTML doc (external
stylesheet to avoid any possible complications with HTML):

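<!doctype html>
<link rel=stylesheet href=test.css>
<pre></pre>
<script>
var str = '';
window.onload = function() {
 // Dump the cssText of every rule in the first stylesheet.
 [].slice.call(document.styleSheets[0].cssRules).forEach(function(e) {
  str += e.cssText+'\n';
 });
 document.querySelector('pre').textContent = str;
};
</script>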

First, I tested putting a null inside of a stylesheet.  I wrote up a
basic sheet applying styles to some of the elements in the HTML doc,
then reopened in hex mode and added a null byte.  (Thanks, Sublime!)

* Firefox and Opera appear to just do normal tokenization, treating
the null as a DELIM token.  If it appears in a selector, they throw
away the whole rule (I presume because it doesn't match the grammar
of Selectors).  Elsewhere, they do normal error-recovery.
* IE and WebKit truncate the stylesheet at the point they discover the
null.  They literally just drop the rest of the file on the floor.
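
For anyone who wants to reproduce this without a hex editor, here is a
rough sketch of generating such a sheet from a script.  It assumes
Node.js (for Buffer); the helper name sheetWithNull and the sample
rules are made up for illustration and are not the sheet I actually
tested with.

```javascript
// Hypothetical helper: return the bytes of `css` with a raw null (0x00)
// spliced in at character offset `offset`.
function sheetWithNull(css, offset) {
  const head = Buffer.from(css.slice(0, offset), 'utf8');
  const tail = Buffer.from(css.slice(offset), 'utf8');
  return Buffer.concat([head, Buffer.from([0x00]), tail]);
}

// Illustrative sheet only: the null lands at the start of the second
// rule, i.e. inside its selector.
const bytes = sheetWithNull('p { color: red; }\npre { color: blue; }\n', 18);

// To write it out in Node:
//   require('fs').writeFileSync('test.css', bytes);
```

Moving the offset into a declaration value instead gives the
"elsewhere" case.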

Next, I tested a hex-escaped null, written as \0.  If you put the
escaped null in most places, it does the expected thing, triggering
normal error-recovery because it's not valid anywhere.  The browsers
do weird things if you put it in a selector, though:

* WebKit treats it as an escaped null, no problem.  Notably, it does
*not* throw away rules which contain it in their selector (though it
obviously doesn't match anything) - you can examine the cssText and
see the null, clear as day.
* Firefox forces "\0" to be interpreted as escaping a literal "0",
even if more digits follow it.  (More testing shows that they simply
refuse to tokenize \0 as a hex escape, regardless of how many 0s there
are.  This means that I was lied to - they have to do 6-char lookahead
when parsing stylesheets, not 3-char. ^_^)
* IE and Opera both do something weird - if it appears in a selector,
they keep the rule, but it doesn't match anything, and they return the
empty string for cssText and selectorText on it.  The rule is
definitely kept, though - if you ask for one of the properties set in
it, it's properly returned.
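
To make the difference concrete, here is a simplified sketch of
consuming an escape, starting just after the backslash.  It is not any
browser's actual code.  The refuseNull flag is a hypothetical switch
that models the Firefox behavior described above; with it off, \0
decodes to U+0000 as in WebKit.  Mapping out-of-range values to U+FFFD
is my own assumption, there only so the sketch doesn't throw.

```javascript
// Returns the decoded character and how many input characters were
// consumed after the backslash.
function consumeEscape(input, pos, refuseNull) {
  // Collect up to six hex digits.
  let hex = '';
  while (hex.length < 6 && /[0-9a-fA-F]/.test(input.charAt(pos + hex.length))) {
    hex += input.charAt(pos + hex.length);
  }
  if (hex === '') {
    // Not a hex escape: the escaped character stands for itself.
    return { char: input.charAt(pos), length: 1 };
  }
  const codePoint = parseInt(hex, 16);
  if (codePoint === 0 && refuseNull) {
    // Modelled Firefox behavior: escape a literal "0" and leave any
    // following digits in the input.
    return { char: '0', length: 1 };
  }
  // A single whitespace character after the digits is swallowed
  // (simplified: "\r\n" is not treated as one unit here).
  let length = hex.length;
  if (/[ \t\n\r\f]/.test(input.charAt(pos + length))) length += 1;
  // Assumption: out-of-range values become U+FFFD.
  const char = codePoint > 0x10FFFF ? '\uFFFD' : String.fromCodePoint(codePoint);
  return { char: char, length: length };
}
```

For example, consumeEscape('000', 0, true) yields a literal "0" and
consumes one character, leaving "00" behind, while
consumeEscape('000', 0, false) yields U+0000 and consumes all three.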

Next, I tested an actual escaped null, that is, a \ followed by a null.

In a selector:
* IE does the same thing as \0.
* Chrome appears to truncate the sheet again, then *claims that the
stylesheet contains no rules*, despite clearly applying the
*preceding* rules to the page.
* Firefox appears to convert it into a \0, and then act normally.
* Opera claims that the selector is empty, but otherwise preserves the
rule.  (It doesn't match anything.)

In a value:
* IE ignores the rest of the property's value, but appears to treat it
as *not a syntax error*, and keep the rule in the CSSOM (just missing
some of the value).  Otherwise, acts as normal.
* Chrome acts the same as in a selector.
* Firefox treats it as an invalid value and drops the declaration.
Otherwise, acts as normal.
* Opera is doing something stupid that makes me think it's doing funny
cache stuff, so I can't trust it.

Here are my conclusions:
1. Browsers are remarkably divergent in behavior here, so I can
probably just spec something sane and be done with it.
2. Nobody does anything *useful* with nulls, so getting rid of them in
the input string is almost certainly just fine.
3. I'd like to know why Firefox refuses to allow a hex-escaped null.
If there's a good reason, I can disallow it.

My recommendations:
1. Go ahead and replace nulls in the input stream with U+FFFD.  Most
browsers do stupid, stupid things with nulls, and the one good browser
(FF) should act the same with U+FFFD as it does with U+0000.  Avoiding
the problem seems to be the easiest path to convergence.
2. Unless Firefox has a good reason to disallow \0 (as opposed to,
say, the person who authored their grammar just being overzealous),
I'll allow \0 as a valid hex escape.
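
Recommendation 1 amounts to a one-line preprocessing step on the
decoded input, before tokenization.  A minimal sketch (the function
name replaceNulls is made up):

```javascript
// Replace every U+0000 in the decoded stylesheet text with U+FFFD
// before handing it to the tokenizer.
function replaceNulls(input) {
  return input.replace(/\u0000/g, '\uFFFD');
}
```

After this step the tokenizer never sees a literal null, so the
truncation behavior in IE and WebKit has nothing to trigger on.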


Received on Tuesday, 23 October 2012 00:28:57 UTC