Subject: Re: [css2.1] eliminating arbitrary back-up in lexical rules

From: L. David Baron <dbaron@dbaron.org>
Date: Wed, 5 Aug 2009 21:32:46 -0700
To: W3C Emailing list for WWW Style <www-style@w3.org>
Message-ID: <20090806043246.GA15593@pickering.dbaron.org>

On Tuesday 2009-06-16 12:46 -0700, Zack Weinberg wrote:
> invalid-url1    url\({w}([!#$%&*-~]|{nonascii}|{escape})*{w}
> invalid-url2    url\({w}{invalid}

It's worth noting that this proposal *still* doesn't eliminate all
arbitrary backtracking in the formal tokenizer (although I think it
produces the correct results).  In particular, it still requires
that:

  url(arbitrarily-long-text f)

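be tokenized starting with a FUNCTION token, with the rest handled by
parenthesis-matching in the parser.  The tokenizer can only settle on
FUNCTION after it has read past the arbitrarily long text and found
that no URI token matches.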
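
To make the back-up concrete, here is a minimal sketch in Python.  It
is not the proposal's rule set: it assumes a tokenizer with only two
rules (an unquoted URI and FUNCTION), leaves out {escape} and the
quoted-string form of url(), and the names scan_uri and first_token
are made up for illustration.

```python
WHITESPACE = " \t\r\n\f"


def is_url_char(c):
    # Simplified [!#$%&*-~]|{nonascii}; {escape} is left out of this sketch.
    return c in "!#$%&" or "*" <= c <= "~" or ord(c) > 0x7F


def scan_uri(text):
    """Try to match an unquoted URI token at the start of text.

    Returns (end, examined): end is the index just past the token, or None
    if the match fails; examined counts the characters read before deciding.
    """
    i = 4  # caller guarantees that text starts with "url("
    while i < len(text) and text[i] in WHITESPACE:
        i += 1
    while i < len(text) and is_url_char(text[i]):
        i += 1
    while i < len(text) and text[i] in WHITESPACE:
        i += 1
    if i < len(text) and text[i] == ")":
        return i + 1, i + 1
    return None, min(i + 1, len(text))


def first_token(text):
    """Return (token name, lexeme, characters examined before deciding)."""
    if not text.startswith("url("):
        raise ValueError("this sketch only handles input starting with url(")
    end, examined = scan_uri(text)
    if end is not None:
        return "URI", text[:end], examined
    # The URI rule failed, so back up to the only other match: FUNCTION.
    return "FUNCTION", text[:4], examined


long_input = "url(" + "x" * 1000 + " f)"
name, lexeme, examined = first_token(long_input)
print(name, repr(lexeme), "examined:", examined,
      "backed up:", examined - len(lexeme))
```

The number of characters examined grows with the length of the text
inside url(, while the token finally emitted is always the four
characters of "url(".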

It's possible this could be fixed using a third invalid-url token,
but that would give different results for the case:
  url(foo {)
and perhaps also for the case:
  url(foo ()
than would doing the parenthesis/brace/bracket matching according to
the parsing rules.
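
The difference can be sketched in Python under one loud assumption:
that the hypothetical third invalid-url token (which this message
does not define) would simply run to the next ")".  The matching
function is likewise a simplification of the parsing rules; it
ignores strings, escapes and comments, and both function names are
invented for the example.

```python
CLOSER = {"(": ")", "[": "]", "{": "}"}


def end_by_first_paren(text):
    """Hypothetical third invalid-url token: stop just past the first ')'."""
    i = text.find(")")
    return None if i < 0 else i + 1


def end_by_matching(text):
    """Parser-style matching of (), [] and {} starting at the '(' of url(."""
    stack = []
    for i in range(text.index("("), len(text)):
        c = text[i]
        if c in CLOSER:
            stack.append(CLOSER[c])
        elif stack and c == stack[-1]:
            stack.pop()
            if not stack:
                return i + 1
        # A closer that does not match the innermost opener is skipped.
    return None  # still open at end of input


for sample in ["url(foo {) }) bar", "url(foo ()"]:
    print(repr(sample), end_by_first_paren(sample), end_by_matching(sample))
```

In the first sample the token-based approach stops at the ")" inside
the braces, while matching continues to the ")" after the "}".  In
the second the token-based approach ends at the only ")", while
matching finds the outer parenthesis still open at end of input.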

I think I actually prefer using the parsing rules for
parenthesis/bracket/brace matching.  One way to represent this
"formally" may be by giving up on representing url() as a single
token, but instead switching the tokenizer into a different state
(what flex calls a start condition) while inside url().
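
A rough Python sketch of the start-condition idea follows.  Everything
in it is an assumption made for illustration: the token names
(URL_OPEN, URL_TEXT, URL_CLOSE), the choice to leave the url() state
at the first ")", the omission of {escape} and strings, and the use of
first-match rule ordering instead of flex's longest-match rule.

```python
import re

INITIAL, IN_URL = "INITIAL", "IN_URL"

# Each rule is (token name, pattern, state to switch to or None).
RULES = {
    INITIAL: [
        ("URL_OPEN", re.compile(r"url\("), IN_URL),
        ("S", re.compile(r"[ \t\r\n\f]+"), None),
        ("IDENT", re.compile(r"[A-Za-z_][A-Za-z0-9_-]*"), None),
        ("DELIM", re.compile(r".", re.S), None),
    ],
    IN_URL: [
        ("URL_CLOSE", re.compile(r"\)"), INITIAL),
        ("S", re.compile(r"[ \t\r\n\f]+"), None),
        ("URL_TEXT", re.compile(r"(?:[!#$%&*-~]|[^\x00-\x7f])+"), None),
        ("DELIM", re.compile(r".", re.S), None),
    ],
}


def tokenize(text):
    """Tokenize text, switching rule sets while inside url()."""
    state, pos, tokens = INITIAL, 0, []
    while pos < len(text):
        for name, pattern, next_state in RULES[state]:
            m = pattern.match(text, pos)
            if m:
                tokens.append((name, m.group(0)))
                pos = m.end()
                if next_state is not None:
                    state = next_state
                break
    return tokens


for token in tokenize("url(arbitrarily-long-text f)"):
    print(token)
```

Because the contents of url() come out as a sequence of small tokens,
no rule has to read an unbounded amount of text and then give it back;
deciding what a url() as a whole means is left to the parser.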

-David

-- 
L. David Baron                                 http://dbaron.org/
Mozilla Corporation                       http://www.mozilla.com/

Received on Thursday, 6 August 2009 04:33:22 UTC