- From: Glenn Adams <glenn@skynav.com>
- Date: Wed, 10 Oct 2012 20:35:51 +0800
- To: Simon Sapin <simon.sapin@kozea.fr>
- Cc: www-style@w3.org
- Message-ID: <CACQ=j+f9pdHRJS=w7t+hwGOwOwfBhTzq5Az7HpxtHKDJYXvOgw@mail.gmail.com>
On Wed, Oct 10, 2012 at 3:37 PM, Simon Sapin <simon.sapin@kozea.fr> wrote:

> On 10/10/2012 01:58, Glenn Adams wrote:
>
>> it would seem a bit easier to not have to admit < \\, NUL > for
>> implementation reasons; there is really no loss of functionality if this
>> is not supported, since if the author really wants a NUL, they can just
>> use < \\, 0, SPACE > or perhaps < \\, 0 > if the context permits.
>
> What exactly would it mean to "not admit" a sequence of codepoints? Abort
> the tokenizer completely and throw away the rest of the stylesheet, have
> the usual error-recovery, or maybe something else?

At present, the CSS2.1 tokenizer grammar specifies

    escape        {unicode}|\\[^\n\r\f0-9a-f]

which accepts the following inputs as legal escapes, each of which contains a Unicode C0 control character:

    U+005C U+0000
    U+005C U+0001
    ...
    U+005C U+0008
    U+005C U+000B
    U+005C U+000E
    ...
    U+005C U+001F

I'm suggesting that the sequence U+005C U+0000 should *not* be accepted as an escape. If it were encountered, it would be handled just like other syntax errors in CSS2.1; e.g., the longest-match rule would exclude such an escape when attempting to read a non-terminal that contains one.

To take an example, let's say we are trying to match badstring1, defined as

    badstring1    \"([^\n\r\f\\"]|\\{nl}|{escape})*\\?

and our input string is

    < U+0022 (QUOTATION MARK), U+005C (REVERSE SOLIDUS), U+0000 (NULL) >

If we don't accept escaped NULL, we would match only

    < U+0022 (QUOTATION MARK), U+005C (REVERSE SOLIDUS) >

which would leave U+0000 as the next unconsumed input character.

But we also have a related problem: whether to accept U+0000 as an unescaped input character. Let's say our input were instead

    < U+0022 (QUOTATION MARK), U+0000 (NULL), U+005C (REVERSE SOLIDUS), U+0000 (NULL) >

We would now match badstring1 as

    < U+0022 (QUOTATION MARK), U+0000 (NULL), U+005C (REVERSE SOLIDUS) >

This anomaly (accepting unescaped NULL but not escaped NULL) is due to the expression

    [^\n\r\f\\"]

which matches all C0 code points except U+000A (\n), U+000C (\f), and U+000D (\r), and thus matches an unescaped U+0000.

To summarize, the current syntax (for badstring1) matches (consumes) both U+005C U+0000 and U+0000, so if we were to remove the escaped form, we would probably want to remove the unescaped form as well.

I can't personally think of any reason to admit either of these in a CSS input stream, since an author who really wishes to include a U+0000 can do so simply by using the unicode escape form, i.e., \0
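For anyone who wants to check the matching behaviour described above, here is a rough sketch in Python. It transcribes the {nl}, {unicode}, {escape} and badstring1 macros from the CSS2.1 grammar into Python `re` syntax. The names `ESCAPE_NO_NUL`, `PLAIN_NO_NUL`, `badstring1` (as a function) and `consumed` are mine, purely for illustration, and the two "NO_NUL" variants are only one possible reading of the proposal. Note also that Python's `re` is a backtracking matcher rather than a flex-style longest-match scanner; I am assuming the two agree on these particular inputs.

```python
import re

# Macros from the CSS2.1 tokenizer grammar, transcribed to Python regex syntax.
NL = r"(?:\n|\r\n|\r|\f)"
UNICODE = r"(?:\\[0-9a-f]{1,6}(?:\r\n|[ \n\r\t\f])?)"

# Current grammar: backslash followed by anything except newline chars and hex digits.
ESCAPE_CURRENT = "(?:" + UNICODE + r"|\\[^\n\r\f0-9a-f])"
# Hypothetical variant: same, but additionally rejects backslash + U+0000.
ESCAPE_NO_NUL = "(?:" + UNICODE + r"|\\[^\x00\n\r\f0-9a-f])"

# Current class of unescaped string characters, and a hypothetical
# variant that also rejects an unescaped U+0000.
PLAIN_CURRENT = r'[^\n\r\f\\"]'
PLAIN_NO_NUL = r'[^\x00\n\r\f\\"]'


def badstring1(escape, plain=PLAIN_CURRENT):
    """Build a compiled badstring1 pattern from the given sub-patterns."""
    pattern = '"(?:' + plain + r"|\\" + NL + "|" + escape + r")*\\?"
    return re.compile(pattern, re.IGNORECASE)


def consumed(regex, text):
    """Return the prefix of text that the pattern consumes ('' if none)."""
    m = regex.match(text)
    return m.group(0) if m else ""


first = '"\\\x00'        # < U+0022, U+005C, U+0000 >
second = '"\x00\\\x00'   # < U+0022, U+0000, U+005C, U+0000 >

for label, rx in [
    ("current grammar", badstring1(ESCAPE_CURRENT)),
    ("no escaped NUL", badstring1(ESCAPE_NO_NUL)),
    ("no NUL at all", badstring1(ESCAPE_NO_NUL, PLAIN_NO_NUL)),
]:
    for text in (first, second):
        print(label, repr(text), "->", repr(consumed(rx, text)))
```

With only the escaped form removed, the second input still consumes the unescaped U+0000 and stops before the final one, which is the anomaly noted above. Removing both forms makes the match stop immediately after the opening quotation mark.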
Received on Wednesday, 10 October 2012 12:36:40 UTC