Re: [css2.1] tokenizer syntax - handling escaped null in badstring from Glenn Adams on 2012-10-07 (www-style@w3.org from October 2012)

From: Glenn Adams <glenn@skynav.com>
Date: Sun, 7 Oct 2012 09:56:03 +0800
To: Simon Sapin <simon.sapin@kozea.fr>
Cc: WWW Style <www-style@w3.org>
Message-ID: <CACQ=j+dKV0kz70g_xQmH03iGC4gDCf2FGXtCgtK=uPVCa9SWAA@mail.gmail.com>

On Sat, Oct 6, 2012 at 6:32 PM, Simon Sapin <simon.sapin@kozea.fr> wrote:

> On 06/10/2012 05:58, Glenn Adams wrote:
>
>  The current tokenizer syntax [1] specifies:
>>
>> escape          {unicode}|\\[^\r\n\f0-9a-f]
>> badstring1      \"([^\n\r\f\\"]|\\{nl}|{escape})*\\?
>>
>> Given the following input string:
>>
>> < U+0022 (QUOTATION MARK), U+005C (REVERSE SOLIDUS), U+0000 (NULL) >
>>
>> Does the < U+005C, U+0000 > match escape or does it match the final \\?
>> ? That is, should U+0000 be treated as an escapable character or as EOF
>> (EOS)? The above grammar suggests the former.
>>
>> [1] http://www.w3.org/TR/CSS2/grammar.html
>>
>
>
> The closest spec text I could find is in §4.1.3:
>
>  (It is undefined in CSS 2.1 what happens if a style sheet does
>> contain a character with Unicode codepoint zero.)
>>
>
> Although it is in a paragraph about hexadecimal escapes, I guess it could
> apply to your example too.


OK, but as the current syntax is written for the escape non-terminal, it
will definitely match an escaped NULL. I would have preferred to see NULL
excluded from escaping, i.e., always treating it as EOF/EOS for the purpose
of defining normative tokenization processing.
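
To illustrate the point, here is a minimal sketch that transcribes the
escape, nl, and badstring1 productions into Python's re syntax and runs them
against the three-character input above. The helper name badstring1() and
its excluded_from_escape parameter are hypothetical, introduced only for
this illustration; the second call models the alternative I would have
preferred (NULL excluded from escape), which is not what the spec says.

```python
import re

def badstring1(excluded_from_escape=""):
    """Build the CSS 2.1 badstring1 pattern as a Python regex.

    excluded_from_escape is a hypothetical knob: extra characters to add
    to the negated class in the escape production.
    """
    h = r"[0-9a-f]"
    unicode_ = r"\\" + h + r"{1,6}(?:\r\n|[ \t\r\n\f])?"
    escape = (r"(?:" + unicode_
              + r"|\\[^\r\n\f0-9a-f" + excluded_from_escape + r"])")
    nl = r"(?:\n|\r\n|\r|\f)"
    pattern = r'"(?:[^\n\r\f\\"]|\\' + nl + "|" + escape + r")*\\?"
    # The CSS 2.1 scanner is case-insensitive.
    return re.compile(pattern, re.IGNORECASE)

# < U+0022, U+005C, U+0000 >
INPUT = '"\\\x00'

# Grammar as written: < U+005C, U+0000 > is consumed by {escape}.
as_written = badstring1().match(INPUT)

# Hypothetical variant: U+0000 is not escapable, so only the trailing
# \\? matches and the NULL is left unconsumed.
null_excluded = badstring1(r"\x00").match(INPUT)

print(as_written.end())     # 3
print(null_excluded.end())  # 2
```

With the grammar as written the match covers all three characters; with
NULL excluded from escape it stops after the reverse solidus.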

Received on Sunday, 7 October 2012 01:56:58 UTC