- From: Zack Weinberg <zweinberg@mozilla.com>
- Date: Tue, 16 Jun 2009 12:46:12 -0700
- To: W3C Emailing list for WWW Style <www-style@w3.org>
- Message-ID: <20090616124612.206fd193@mozilla.com>
The CSS 2.1 core lexical productions for COMMENT and URI tokens can absorb an arbitrary amount of text but then fail to match, because their terminating punctuation is missing. This requires a conforming implementation to back up an arbitrary distance and restart, which can be very difficult to implement. As no syntactically correct document contains un-terminated COMMENTs or URIs, the extra code required is pointless. Empirically, nobody seems to implement backing up for un-terminated comments, and browsers are not consistent about backing up for URIs lacking the close paren (it is not possible to test what happens for quoted URIs lacking the close quote, because the conformant parse after backing up would simply absorb all the following text in an INVALID token). See the attached test case.

I would like to propose that the following additional INVALID productions be added to the core tokenization rules to avoid this awkward requirement for implementors. With this change in place, it is still necessary to back up more than one character in some cases when CDO, CDC, or UNICODE-RANGE fail to match, but not to back up over an arbitrary amount of text.

New macros:

  invalid-comment1  \/\*[^*]*\*+([^/*][^*]*\*+)*
  invalid-comment2  \/\*[^*]*(\*+[^/*][^*]*)*
  invalid-url1      url\({w}([!#$%&*-~]|{nonascii}|{escape})*{w}
  invalid-url2      url\({w}{invalid}

(invalid-comment1 covers input that ends immediately after a run of stars inside a comment, invalid-comment2 input that ends elsewhere inside a comment; invalid-url1 covers an unquoted URL missing its close paren, invalid-url2 a quoted URL whose string is unterminated.)

Changed production:

  INVALID  {invalid}|{invalid-comment1}|{invalid-comment2}
           |{invalid-url1}|{invalid-url2}

Make analogous changes to Appendix G. For clarity, it might be nice to rename the existing {invalid} macro to {invalid-string}, as well.

zw
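P.S. As a quick illustration (mine, not part of the proposal), the Python sketch below expands the proposed macros into concrete regular expressions, with {w}, {nonascii}, {escape}, and {invalid} expanded by hand from the CSS 2.1 core grammar, and checks that each one absorbs a made-up unterminated token all the way to end of input. Python's re engine is leftmost-greedy rather than flex-style longest-match, so this demonstrates only the no-backup property, not a complete tokenizer.

  import re

  # CSS 2.1 core-grammar macros (Section 4.1.1), expanded by hand.
  nl       = r'\n|\r\n|\r|\f'
  w        = r'[ \t\r\n\f]*'
  nonascii = r'[^\0-\177]'
  unicode_ = r'\\[0-9a-fA-F]{1,6}(\r\n|[ \n\r\t\f])?'
  escape   = '(' + unicode_ + r')|\\[^\n\r\f0-9a-fA-F]'

  # {invalid}: an unterminated double- or single-quoted string.
  invalid1 = r'"([^\n\r\f\\"]|\\(' + nl + ')|(' + nonascii + ')|(' + escape + '))*'
  invalid2 = r"'([^\n\r\f\\']|\\(" + nl + ')|(' + nonascii + ')|(' + escape + "))*"
  invalid  = '(' + invalid1 + ')|(' + invalid2 + ')'

  # The four proposed INVALID alternatives, paired with test
  # inputs that are truncated just before their terminator.
  cases = [
      (r'/\*[^*]*\*+([^/*][^*]*\*+)*',             # invalid-comment1
       '/* comment cut off after stars **'),
      (r'/\*[^*]*(\*+[^/*][^*]*)*',                # invalid-comment2
       '/* comment cut off mid-text'),
      (r'url\(' + w + '([!#$%&*-~]|(' + nonascii   # invalid-url1
       + ')|(' + escape + '))*' + w,
       'url(http://example.com/a.png'),
      (r'url\(' + w + '(' + invalid + ')',         # invalid-url2
       'url("http://example.com/a.png'),
  ]

  for pattern, text in cases:
      m = re.match(pattern, text)
      # Each pattern must consume the whole remaining input,
      # so the tokenizer never has to back up.
      assert m is not None and m.end() == len(text), (pattern, text)
  print('all four INVALID alternatives absorb to end of input')

Under Python 3 all four assertions pass, i.e. the lexer can commit to an INVALID token at end of input without rewinding.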
Attachments
- text/html attachment: unterm-css.html
Received on Tuesday, 16 June 2009 19:46:52 UTC