Re: [cssom] serializing U+0000 NULL from Tab Atkins Jr. on 2015-11-29 (www-style@w3.org from November 2015)

From: Tab Atkins Jr. <jackalmage@gmail.com>
Date: Sat, 28 Nov 2015 17:10:41 -0800
To: Simon Pieters <simonp@opera.com>
Cc: www-style list <www-style@w3.org>, Richard Gibson <richard.gibson@gmail.com>, Glenn Adams <glenn@skynav.com>, Boris Zbarsky <bzbarsky@mit.edu>
Message-ID: <CAAWBYDDHf8dgVrnVu0V2Mvu-J0UnSPDN=YxJssMbKQ-=F8n55A@mail.gmail.com>

On Tue, Nov 24, 2015 at 8:19 AM, Simon Pieters <simonp@opera.com> wrote:
> On Thu, 19 Nov 2015 14:58:31 +0100, Richard Gibson
> <richard.gibson@gmail.com> wrote:
>
>> I'm adding a selector escape function to Sizzle (the selection library
>> backing jQuery), and originally planned to match
>> https://drafts.csswg.org/cssom/#the-css.escape()-method but ran into a
>> snag regarding U+0000 NULL [1]. The draft calls for throwing an
>> InvalidCharacterError exception in §2.1, but that doesn't fit with our
>> library. It doesn't seem to fit within CSS, either, though. \0 is a valid
>> escape per CSS Syntax [2], although it is returned as U+FFFD REPLACEMENT
>> CHARACTER, and all major browsers but one now respect it (e.g.,
>> jQuery("<div><span data-attr='&#xFFFD;'/></div>")[0].querySelector("[data-attr='\\0 ']")
>> returns the span); the holdout is Safari, in which [data-attr='\\0 '] is
>> treated as valid but never matches anything.
>>
>> So I'm wondering if you'd be willing to escape NULL rather than throwing
>> an exception. And if not, perhaps you could shed some light on why the
>> exception was added in the first place?
>>
>> A brief history of NULL:
>> * 2010 dbaron includes NULL in proposed escaping language (presumably as
>> "\0 "): https://lists.w3.org/Archives/Public/www-style/2010Feb/0162.html
>> * 2011 CSSOM WD includes escaping NULL:
>> http://www.w3.org/TR/2011/WD-cssom-20110712/#common-serializing-idioms
>> * 2012 (not directly related) Tab suggests a codification in input
>> preprocessing of replacement with U+FFFD REPLACEMENT CHARACTER:
>> https://lists.w3.org/Archives/Public/www-style/2012Oct/0687.html
>> * 2013 CSSOM WD introduces InvalidCharacterError exceptions for escaping
>> NULL (rethrown by CSS.escape):
>> http://www.w3.org/TR/2013/WD-cssom-20131205/#common-serializing-idioms
>> * 2015 no change:
>> https://drafts.csswg.org/cssom/#common-serializing-idioms
>>
>> [1] https://github.com/jquery/sizzle/pull/364#discussion_r44619782
>> [2] http://www.w3.org/TR/2014/CR-css-syntax-3-20140220/#consume-an-escaped-code-point0
>> and http://www.w3.org/TR/2014/CR-css-syntax-3-20140220/#escaping
>
>
> This was apparently changed in
> https://github.com/w3c/csswg-drafts/commit/d83fba15cef8e0afc5b826cab41fa36293fb4c2f
> (before CSS.escape existed).
>
> I suppose Glenn changed it because \0 was not valid CSS 2.1, but I don't
> find any direct statement regarding CSSOM from that month's archive. A
> related email is
> https://lists.w3.org/Archives/Public/www-style/2012Oct/0646.html
>
> I don't mind changing back to escaping U+0000 as \0, or maybe \uFFFD
> directly?

Since all CSS parsing accepts U+0000 or the "\0" escape sequence, and
just converts it to U+FFFD, CSS.escape() should allow it as well.  I'm
fine with doing an eager replacement with U+FFFD, or else just
escaping it as \0, whichever is simpler in the spec.
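
For illustration, the two options could look roughly like this in
JavaScript. This is only a sketch, loosely modeled on the identifier
serialization steps in the CSSOM draft; the function name escapeIdent and
the escapeNull option are hypothetical and do not come from any spec or
library.

```javascript
// Hypothetical helper: serializes a string as a CSS identifier without
// throwing on U+0000. Not the normative CSSOM algorithm.
function escapeIdent(value, options = {}) {
  const s = String(value);
  const escapeNull = options.escapeNull === true; // assumed option name
  let out = "";
  for (let i = 0; i < s.length; i++) {
    const c = s.charCodeAt(i);
    const ch = s.charAt(i);
    const isDigit = c >= 0x0030 && c <= 0x0039;
    if (c === 0x0000) {
      // Option A (default): eager replacement with U+FFFD.
      // Option B: emit the "\0 " escape, which a CSS parser reads as U+FFFD.
      out += escapeNull ? "\\0 " : "\uFFFD";
    } else if (
      (c >= 0x0001 && c <= 0x001f) || c === 0x007f ||
      // A leading digit, or a digit right after a leading "-", is hex-escaped.
      (isDigit && (i === 0 || (i === 1 && s.charCodeAt(0) === 0x002d)))
    ) {
      out += "\\" + c.toString(16) + " ";
    } else if (i === 0 && s.length === 1 && c === 0x002d) {
      // A lone "-" is not a valid identifier on its own.
      out += "\\" + ch;
    } else if (
      c >= 0x0080 || c === 0x002d || c === 0x005f || isDigit ||
      (c >= 0x0041 && c <= 0x005a) ||
      (c >= 0x0061 && c <= 0x007a)
    ) {
      // Non-ASCII, "-", "_", digits, and ASCII letters pass through unchanged.
      out += ch;
    } else {
      // Everything else gets a backslash escape.
      out += "\\" + ch;
    }
  }
  return out;
}
```

Either branch should yield the same value once the result is parsed as
CSS, since the parser turns both a literal U+0000 and the "\0" escape into
U+FFFD; the difference is only in how the serialized string looks.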

~TJ

Received on Sunday, 29 November 2015 01:11:28 UTC