Re: [css3-syntax] Null bytes and U+0000 from Tab Atkins Jr. on 2012-10-23 (www-style@w3.org from October 2012)

From: Tab Atkins Jr. <jackalmage@gmail.com>
Date: Tue, 23 Oct 2012 16:14:31 -0700
To: Boris Zbarsky <bzbarsky@mit.edu>
Cc: www-style@w3.org
Message-ID: <CAAWBYDDmyWDC_pkHPf1kvEu7b8MyLLdi+=FrTXWusnfTvTorfw@mail.gmail.com>

Okay, based on my testing and other feedback on this thread, I've
taken a simple hardline stance:

1. Literal NULLs in the input stream are pre-processed into U+FFFD
before the tokenizer sees them.  This affects both bare NULLs and
"escaped" NULLs (those preceded by a U+005C \), because the
replacement happens before escapes are processed.

2. Hex-escaped NULLs (\0) actually return U+FFFD, same as a hex-escape
for a character beyond the maximum allowed codepoint.

This is now captured in the Syntax draft.
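
As an illustration of the two rules above, here is a minimal Python
sketch. It is not the text of the Syntax draft: the names preprocess,
unescape, and _ESCAPE are hypothetical, and the escape pattern is a
simplified assumption (1 to 6 hex digits plus one optional trailing
whitespace character, or any single escaped character). It does not
treat surrogates or newlines specially.

```python
import re

REPLACEMENT = "\uFFFD"
MAX_CODE_POINT = 0x10FFFF  # largest valid Unicode code point


def preprocess(text: str) -> str:
    """Rule 1: replace every literal U+0000 before tokenizing."""
    return text.replace("\x00", REPLACEMENT)


# Simplified (assumed) escape grammar: a backslash followed by either
# 1-6 hex digits and one optional whitespace character, or any other
# single character.
_ESCAPE = re.compile(r"\\(?:([0-9a-fA-F]{1,6})[ \t\n]?|(.))", re.DOTALL)


def _decode(match: re.Match) -> str:
    hex_digits, literal = match.group(1), match.group(2)
    if hex_digits is None:
        # Non-hex escape: the escaped character stands for itself.
        return literal
    code_point = int(hex_digits, 16)
    # Rule 2: \0 and out-of-range escapes both become U+FFFD.
    if code_point == 0 or code_point > MAX_CODE_POINT:
        return REPLACEMENT
    return chr(code_point)


def unescape(text: str) -> str:
    """Apply rule 1 first, then resolve escapes under rule 2."""
    return _ESCAPE.sub(_decode, preprocess(text))
```

Because preprocess runs first, a backslash followed by a literal NULL
is seen by the escape step as a backslash followed by U+FFFD, so no
input can yield U+0000 in the output of unescape.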

This approach avoids NULLs showing up anywhere in a CSS document
beyond the initial read from the network - everywhere past that, it's
perfectly safe to use C-based string APIs that assume
null-termination.  *Every* browser exhibits some bugs related to this,
so avoiding the problem seems like a good idea.
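
To show why null-termination matters here, the following sketch uses
Python's ctypes module to mimic how a C-style string API would see a
stylesheet. The helper name as_c_string and the sample stylesheet are
made up for illustration; this is not how any particular browser reads
its input.

```python
import ctypes


def as_c_string(data: bytes) -> bytes:
    """Return the bytes a NUL-terminated C string view would expose."""
    buf = ctypes.create_string_buffer(data)
    # .value stops at the first NUL byte, like strlen/strcpy would.
    return buf.value


css = b"a { color: red }\x00b { color: blue }"

# With a raw NULL, everything after it is silently lost.
truncated = as_c_string(css)

# After replacing U+0000 with U+FFFD, the whole sheet survives.
sanitized = css.decode("utf-8").replace("\x00", "\uFFFD").encode("utf-8")
intact = as_c_string(sanitized)
```

Here truncated holds only the first rule, while intact equals the
full sanitized byte string.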

These rules differ from what any current browser does, but the
browsers all disagree with each other anyway (the only "convergence"
is that some browsers simply always truncate the stylesheet at the
first NULL, so NULLs are practically unusable in stylesheets anyway).

Is everyone okay with this?

~TJ

Received on Tuesday, 23 October 2012 23:23:42 UTC