Re: [css3-syntax] Null bytes and U+0000 from Glenn Adams on 2012-10-24 (www-style@w3.org from October 2012)

From: Glenn Adams <glenn@skynav.com>
Date: Wed, 24 Oct 2012 08:03:11 +0800
To: "Tab Atkins Jr." <jackalmage@gmail.com>
Cc: Boris Zbarsky <bzbarsky@mit.edu>, www-style@w3.org
Message-ID: <CACQ=j+fOrS_z6stj+zdG18saDUoG6n+ajuwHe9r3sDPz+47h7A@mail.gmail.com>

On Wed, Oct 24, 2012 at 7:14 AM, Tab Atkins Jr. <jackalmage@gmail.com> wrote:

> Okay, based on my testing and other feedback on this thread, I've
> taken a simple hardline stance:
>
> 1. Literal NULLs in the input stream are pre-processed into U+FFFD
> before the tokenizer sees them.  This affects both bare NULLs and
> "escaped" NULLs (with a U+005C \ preceding it), because it happens
> before escapes are processed.
>
> 2. Hex-escaped NULLs (\0) actually return U+FFFD, same as a hex-escape
> for a character beyond the maximum allowed codepoint.
>
> This is now captured in the Syntax draft.
>
> This approach avoids NULLs showing up anywhere in a CSS document
> beyond the initial read from the network - everywhere past that, it's
> perfectly safe to use C-based string APIs that assume
> null-termination.  *Every* browser evinces some bugs related to this,
> so avoiding the problem seems like a good idea.
>
> These rules are different than what any current browser does, but they
> all do something different (and the only "convergence" is that some
> browsers simply always truncate the stylesheet at the first NULL, so
> NULLs are practically unusable in stylesheets anyway).
>
> Is everyone okay with this?
>

I'll need to look at the actual language you drafted, but it sounds good in
general.
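
For illustration only, the two quoted rules might be sketched as below. The function names `preprocess` and `decode_hex_escape` are hypothetical. The sketch assumes the stylesheet has already been decoded into a Python `str`, and it assumes U+10FFFF as the "maximum allowed codepoint". This is not the Syntax draft's actual text.

```python
REPLACEMENT = "\uFFFD"    # U+FFFD REPLACEMENT CHARACTER
MAX_CODEPOINT = 0x10FFFF  # assumed "maximum allowed codepoint" (Unicode max)

def preprocess(css: str) -> str:
    # Rule 1 (hypothetical helper): replace every literal U+0000 before the
    # tokenizer runs. A backslash followed by a literal NULL is covered too,
    # since escapes have not been processed yet at this stage.
    return css.replace("\x00", REPLACEMENT)

def decode_hex_escape(hex_digits: str) -> str:
    # Rule 2 (hypothetical helper): a hex escape whose value is zero, or whose
    # value exceeds the maximum codepoint, yields U+FFFD instead of a NULL.
    value = int(hex_digits, 16)
    if value == 0 or value > MAX_CODEPOINT:
        return REPLACEMENT
    return chr(value)

if __name__ == "__main__":
    print(repr(preprocess("a\x00b")))       # bare NULL
    print(repr(preprocess("\\\x00")))       # "escaped" NULL
    print(repr(decode_hex_escape("0")))     # \0 hex escape
    print(repr(decode_hex_escape("41")))    # ordinary escape, 'A'
```

Because the replacement happens in a pass before tokenization, no later stage ever sees a U+0000. That is the property that makes C-style null-terminated string handling safe downstream.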

Received on Wednesday, 24 October 2012 00:04:00 UTC