[css21] URL token grammar doesn't match reality from Tab Atkins Jr. on 2012-05-08 (www-style@w3.org from May 2012)

From: Tab Atkins Jr. <jackalmage@gmail.com>
Date: Tue, 8 May 2012 15:47:48 +0200
To: www-style list <www-style@w3.org>
Message-ID: <CAAWBYDDTfVCTaSrLNEm8C81ffw3iC-1RP_U9AUzpzwmTBxk1-A@mail.gmail.com>

The CSS2.1 Core Grammar currently specifies that the only way to get a
URL token is with the literal characters "u", "r", and "l"
(case-insensitive).  If you escape any of them, you'll instead get a
FUNCTION token.

This doesn't match reality - IE, FF, and Opera all allow the
characters to be escaped and still invoke the normal URL token
parsing.  Here's a testcase with several inputs that parse
differently as a URL token than as a FUNCTION token:

http://software.hixie.ch/utilities/js/live-dom-viewer/saved/1519

IE, FF, and Opera all return results consistent with always using the
special "unquoted url" production.

I propose we change the Core Grammar as follows:

1. Import the U, R, and L productions from Appendix G.
2. Change the URI token production to:
  {U}{R}{L}\({w}{string}{w}\)
  |{U}{R}{L}\({w}([!#$%&*-\[\]-~]|{nonascii}|{escape})*{w}\)
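
To make the difference concrete, here's a rough Python sketch of the
current and proposed productions.  It uses Python's re module as a
stand-in for flex, so it only checks whether a whole string matches a
production; it doesn't model longest-match tokenization.  The U, R,
and L patterns follow the Appendix G macros as I read them, and the
names letter, uri_pattern, CURRENT_URI, and PROPOSED_URI are made up
for this example.

```python
import re

# Macros from the CSS2.1 core grammar, transcribed into Python regex syntax.
NL = r"(?:\n|\r\n|\r|\f)"
W = r"[ \t\r\n\f]*"
NONASCII = r"[^\x00-\x9f]"
UNICODE = r"\\[0-9a-f]{1,6}(?:\r\n|[ \n\r\t\f])?"
ESCAPE = "(?:" + UNICODE + r"|\\[^\n\r\f0-9a-f])"
STRING = (
    '(?:"(?:' + r'[^\n\r\f\\"]' + r"|\\" + NL + "|" + ESCAPE + ')*"'
    + "|'(?:" + r"[^\n\r\f\\']" + r"|\\" + NL + "|" + ESCAPE + ")*')"
)
# Characters allowed in an unquoted url body, as in the proposal above.
URL_BODY = r"(?:[!#$%&*-\[\]-~]|" + NONASCII + "|" + ESCAPE + ")*"


def letter(ch, hex_upper, hex_lower):
    """Appendix G style production: the literal letter, a hex escape
    for either case, or a backslash followed by the letter."""
    return (
        "(?:" + ch
        + r"|\\0{0,4}(?:" + hex_upper + "|" + hex_lower + r")(?:\r\n|[ \t\r\n\f])?"
        + r"|\\" + ch + ")"
    )


U = letter("u", "55", "75")
R = letter("r", "52", "72")
L = letter("l", "4c", "6c")


def uri_pattern(prefix):
    """Build the URI production with the given pattern for the 'url' prefix."""
    args = "(?:" + W + STRING + W + "|" + W + URL_BODY + W + ")"
    return re.compile(prefix + r"\(" + args + r"\)", re.IGNORECASE)


CURRENT_URI = uri_pattern("url")        # literal characters only
PROPOSED_URI = uri_pattern(U + R + L)   # escaped characters allowed

if __name__ == "__main__":
    samples = ["url(foo.png)", r"\75rl(foo.png)", r"u\52 l(foo.png)", r"ur\l(foo.png)"]
    for text in samples:
        print(text,
              bool(CURRENT_URI.fullmatch(text)),
              bool(PROPOSED_URI.fullmatch(text)))
```

Under the current grammar only the first sample is a URI; the escaped
spellings fall through to FUNCTION.  Under the proposed grammar all
four match the URI production.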

(We may need to do the same for the leading "u" on the UNICODE-RANGE
token.  I haven't tested that yet.)

~TJ

Received on Tuesday, 8 May 2012 13:48:38 UTC