Re: [css3-syntax] Null bytes and U+0000 from Tab Atkins Jr. on 2012-10-23 (www-style@w3.org from October 2012)

From: Tab Atkins Jr. <jackalmage@gmail.com>
Date: Tue, 23 Oct 2012 15:49:13 -0700
To: Boris Zbarsky <bzbarsky@mit.edu>
Cc: www-style@w3.org
Message-ID: <CAAWBYDBiLOScVkP=5wScw9NKf+J3TJN1R=iETF46WRMt-y9YmA@mail.gmail.com>

On Tue, Oct 23, 2012 at 3:23 PM, Boris Zbarsky <bzbarsky@mit.edu> wrote:
> On 10/23/12 5:57 PM, Tab Atkins Jr. wrote:
>>> In any case once Gecko reaches end-of-escape it looks at the resulting
>>> hex value.  If that value is 0, it outputs as many '0' as it saw hex
>>> digit chars.
>>
>> Ah, that violates the "only emit one token per call" invariant that I
>> was told was important.
>
> I don't see why.  The "output" there is not tokens.  It's a string.
> Specifically, the string the escape expands to.

Ah, you're right of course.  For any token where an escape is valid,
it's valid to have digits after the escape.  Never mind, then.
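
A rough Python sketch of the expansion as described in the quote above,
for illustration only.  The function name, the 1-6 digit limit and the
out-of-range fallback are assumptions made for the example, not Gecko's
actual code:

```python
import string

REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def expand_escape_zero_as_digits(hex_digits: str) -> str:
    """Return the string a hex escape expands to (hypothetical helper).

    `hex_digits` is the run of hex digits that followed the backslash.
    """
    # Assumption: an escape carries between 1 and 6 hex digits.
    if not 1 <= len(hex_digits) <= 6:
        raise ValueError("expected 1 to 6 hex digits")
    if any(ch not in string.hexdigits for ch in hex_digits):
        raise ValueError("not a hex digit run")

    value = int(hex_digits, 16)
    if value == 0:
        # Behaviour described in the quote: a zero escape yields one '0'
        # per hex digit consumed, as part of the string, not as a token.
        return "0" * len(hex_digits)
    if value > 0x10FFFF:
        # Assumption for this sketch: out-of-range values become U+FFFD.
        return REPLACEMENT
    return chr(value)
```

The result is just a piece of the string being built for the current
token, so the one-token-per-call invariant is untouched.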

>> A thought occurs to me, though - maybe it makes sense to be consistent
>> with my preferred treatment of literal nulls, and make \0 return
>> U+FFFD as well?
>
> I can probably live with that too.

Given that even FF, which has the sanest treatment of NULL overall,
still screws it up in a few places, I think it's probably best to just
sanitize NULLs out of the character stream entirely.  It's just too
hard to ensure that you're not using C string APIs *somewhere* in the
chain that'll screw up with the NULL.
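
To make that concrete, here is a minimal Python sketch of the idea, under
stated assumptions: the helper names are made up for the example, and the
ctypes buffer merely stands in for "some C string API in the chain".  It
replaces literal U+0000 with U+FFFD before tokenizing, and treats a zero
escape the same way:

```python
import ctypes

REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def sanitize_nulls(css_text: str) -> str:
    """Replace every literal U+0000 in the input with U+FFFD."""
    return css_text.replace("\x00", REPLACEMENT)


def expand_escape(hex_digits: str) -> str:
    """Hypothetical escape expansion where a zero escape gives U+FFFD."""
    value = int(hex_digits, 16)
    # Assumption: out-of-range values are also mapped to U+FFFD here.
    if value == 0 or value > 0x10FFFF:
        return REPLACEMENT
    return chr(value)


def c_string_view(data: bytes) -> bytes:
    """Return what a NUL-terminated C string reader would see of `data`."""
    # .value stops at the first NUL byte, like strlen()-based C APIs do.
    return ctypes.create_string_buffer(data).value


raw = "ab\x00cd"
truncated = c_string_view(raw.encode("utf-8"))
intact = c_string_view(sanitize_nulls(raw).encode("utf-8"))
```

Here `truncated` loses everything after the null, while `intact` still
holds the whole string with U+FFFD in place of the null.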

~TJ

Received on Tuesday, 23 October 2012 22:50:00 UTC