
[css2.1] eliminating arbitrary back-up in lexical rules

From: Zack Weinberg <zweinberg@mozilla.com>
Date: Tue, 16 Jun 2009 12:46:12 -0700
To: W3C Emailing list for WWW Style <www-style@w3.org>
Message-ID: <20090616124612.206fd193@mozilla.com>
The CSS 2.1 core lexical productions for COMMENT and URI tokens can
absorb an arbitrary amount of text but then fail to match, because
their terminating punctuation is missing.  This requires a conforming
implementation to back up an arbitrary distance and restart, which can
be very difficult to implement.  As no syntactically correct document
contains unterminated COMMENTs or URIs, the extra code required is
wasted effort.

Empirically, nobody seems to implement backing up for unterminated
comments, and browsers are inconsistent about backing up for URIs
lacking the close paren.  (It is not possible to test what happens for
quoted URIs lacking the close quote, because the conformant parse after
backing up would simply absorb all the following text into an INVALID
token.)  See the attached test case.
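To see why the required back-up is unbounded, here is a minimal sketch (my own illustration, not from the spec) using a Python transcription of the core COMMENT production:

```python
import re

# Python transcription of the CSS 2.1 core COMMENT production:
#   \/\*[^*]*\*+([^/*][^*]*\*+)*\/
COMMENT = re.compile(r'/\*[^*]*\*+(?:[^/*][^*]*\*+)*/')

# An unterminated comment followed by the rest of the style sheet.
text = "/* oops, never closed " + "x" * 10000

# A maximal-munch tokenizer speculatively scans to the end of input
# while trying to complete COMMENT, but the token never matches...
assert COMMENT.match(text) is None

# ...so a conforming implementation must rewind to offset 1 (the
# longest real token at offset 0 is the one-character DELIM "/")
# and re-tokenize.  The rewind distance grows with the input:
print(len(text) - 1)  # characters of back-up required
```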

I would like to propose that the following additional INVALID
productions be added to the core tokenization rules to avoid this
awkward requirement for implementors.  With this change in place, it is
still necessary to back up more than one character in some cases when
CDO, CDC, or UNICODE-RANGE fail to match, but not to back up over an
arbitrary amount of text.
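For contrast, here is a sketch (my own illustration, with a made-up input) of the bounded back-up that remains when CDO fails to match:

```python
# Trying CDO ("<!--") against the input "<!- x" matches the first
# three characters and then fails, so the scanner rewinds only two
# characters (back to just past the one-character DELIM "<" token) --
# a fixed bound, unlike the unterminated-comment case.
text = "<!- x"
CDO = "<!--"

i = 0
while i < len(CDO) and i < len(text) and text[i] == CDO[i]:
    i += 1
matched_cdo = (i == len(CDO))

# On failure, rewind to just past the DELIM "<" at offset 0.
backup = 0 if matched_cdo else i - 1
print(matched_cdo, backup)  # False 2
```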

New macros:

invalid-comment1    \/\*[^*]*\*+([^/*][^*]*\*+)*
invalid-comment2    \/\*[^*]*(\*+[^/*][^*]*)*

invalid-url1    url\({w}([!#$%&*-~]|{nonascii}|{escape})*{w}
invalid-url2    url\({w}{invalid}
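The comment macros can be sanity-checked with Python transcriptions (mine, not part of the proposal):

```python
import re

# Python transcriptions of COMMENT and the two proposed macros.
COMMENT          = re.compile(r'/\*[^*]*\*+(?:[^/*][^*]*\*+)*/')
INVALID_COMMENT1 = re.compile(r'/\*[^*]*\*+(?:[^/*][^*]*\*+)*')
INVALID_COMMENT2 = re.compile(r'/\*[^*]*(?:\*+[^/*][^*]*)*')

def classify(s):
    # fullmatch models hitting end-of-input: the token must absorb
    # everything that remains.
    if COMMENT.fullmatch(s):          return 'COMMENT'
    if INVALID_COMMENT1.fullmatch(s): return 'INVALID (ends in *)'
    if INVALID_COMMENT2.fullmatch(s): return 'INVALID (no trailing *)'
    return 'no match'

print(classify('/* terminated */'))     # COMMENT
print(classify('/* runs off the end'))  # INVALID (no trailing *)
print(classify('/* almost done *'))     # INVALID (ends in *)
```

Between them, the two macros should cover any comment that runs off the end of the style sheet, whether or not it stops in the middle of a star run, so the scanner can emit INVALID without rewinding.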

Changed production:

INVALID    {invalid}|{invalid-comment1}|{invalid-comment2}
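A similar check for the URL side (again my own Python transcription; the {escape} and {nonascii} expansions are abridged, and {invalid-url2}/{invalid} are omitted for brevity):

```python
import re

# Abridged expansions of the core macros ({escape} omits {unicode}).
W      = r'[ \t\r\n\f]*'
ESCAPE = r'\\[^\n\r\f0-9a-fA-F]'
URLCH  = r'(?:[!#$%&*-~]|[^\x00-\x7f]|' + ESCAPE + r')'

URI          = re.compile(r'url\(' + W + URLCH + r'*' + W + r'\)')
INVALID_URL1 = re.compile(r'url\(' + W + URLCH + r'*' + W)

print(bool(URI.fullmatch('url(foo.png)')))          # True
print(bool(URI.fullmatch('url(foo.png')))           # False: no close paren
print(bool(INVALID_URL1.fullmatch('url(foo.png')))  # True: absorbed as a
                                                    # single INVALID token
```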

Make analogous changes to Appendix G.  For clarity, it might be
nice to rename the existing {invalid} macro to {invalid-string}, as
the INVALID token would no longer cover only unterminated strings.


Received on Tuesday, 16 June 2009 19:46:52 UTC
