RE: [css2.1] Comments within url() tokens

See: http://www.w3.org/TR/CSS21/syndata.html#comments

Specifically: 4.1.9 Comments

'Comments begin with the characters "/*" and end with the characters "*/".
They may occur anywhere between tokens, and their contents have no influence
on the rendering. Comments may not be nested.'

And: 4.1.1 Tokenization
'COMMENT tokens do not occur in the grammar (to keep it readable), but any
number of these tokens may appear anywhere between other tokens.'

This literally means that comments can appear anywhere except within a
token. The only rule that overrides this is in 4.1: 'In this specification,
the expressions "immediately before" or "immediately after" mean with no
intervening whitespace or comments'.

In general, if there is ambiguity between the prose and the grammar, the
prose wins. If you look at the grammar very literally and ignore the prose,
the only place comments are explicitly allowed is between the "!" and the
"important".
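For comparison, that "!" ... "important" case is the one place where the Appendix G scanner builds comments into a token. Assuming the rule reads "!"({w}|{comment})*{I}{M}{P}{O}{R}{T}{A}{N}{T}, a rough Python transcription looks like this. It is a sketch only: case-insensitive matching stands in for the letter macros, and escapes are ignored.

```python
import re

# Assumed Appendix G rule:
#   "!"({w}|{comment})*{I}{M}{P}{O}{R}{T}{A}{N}{T}   {return IMPORTANT_SYM;}
# ({w}|{comment})* is written as ({s}|{comment})*, which accepts the same text
# without an empty alternative inside the repetition.
S = r"[ \t\r\n\f]+"                              # {s}: one or more whitespace
COMMENT = r"/\*[^*]*\*+(?:[^/*][^*]*\*+)*/"      # {comment}: "/*" ... "*/"
IMPORTANT_SYM = re.compile(rf"!(?:{S}|{COMMENT})*important", re.IGNORECASE)


def is_important_sym(text: str) -> bool:
    """True if the whole input is one IMPORTANT_SYM token under this sketch."""
    return IMPORTANT_SYM.fullmatch(text) is not None


for sample in ["!important", "!/* reason */important",
               "! /* a */ /* b */ IMPORTANT", "!imp/**/ortant"]:
    print(f"{sample!r:32} {is_important_sym(sample)}")
```

Comments are accepted between the "!" and the keyword, but not inside the keyword itself.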

Where this gets interesting is that both 4.1.1 and Appendix G.2 define
"url(http://example.com/)" as a single token, which means that comments are
not allowed inside it. I know that Gecko tokenizes this as three separate
tokens: "url(", "http://example.com/", and ")"; that is, a FUNCTION token, a
URL token and a symbol token.
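A rough Python transcription of the single-token rule shows the consequence. It assumes the Appendix G shape {U}{R}{L}"("{w}{string}{w}")" | {U}{R}{L}"("{w}{url}{w}")", omits escapes, and approximates {nonascii} as U+00A0 and above; the variable names simply reuse the macro names. Because {w} matches only whitespace, a comment between the parenthesis and the URL makes the whole match fail:

```python
import re

# Simplified CSS 2.1 URI token (assumptions: no escapes, strings may not
# contain their own quote or a newline, {nonascii} approximated as >= U+00A0).
W = r"[ \t\r\n\f]*"                            # {w}: optional whitespace, no comments
STRING = r"""(?:"[^"\n]*"|'[^'\n]*')"""        # {string}, escapes omitted
URL = r"[!#$%&*-~\u00a0-\U0010ffff]*"          # {url} character class

# url( {w} {string} {w} )  |  url( {w} {url} {w} )
URI_TOKEN = re.compile(
    rf"url\({W}{STRING}{W}\)|url\({W}{URL}{W}\)",
    re.IGNORECASE,
)


def is_uri_token(text: str) -> bool:
    """True if the entire input is one URI token under the simplified rule."""
    return URI_TOKEN.fullmatch(text) is not None


for sample in [
    "url(http://example.com/)",
    'url( "http://example.com/" )',
    "url( /**/ http://example.com/ /**/ )",
]:
    print(f"{sample!r:45} {is_uri_token(sample)}")
```

The first two inputs are single URI tokens; the third is not, which is exactly the case the Mozilla bugs want to allow.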

We could update the spec to allow comments there, although changes to the
core grammar are not taken lightly. The problem is that URLs have quite
different tokenization rules from all other CSS tokens, and the two overlap
here. For example, http://example.com/*xx*/ is a perfectly valid URL
(according to RFC 1738). Is this a URL, or a URL followed by a comment?
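The overlap can be made concrete by resolving the same input both ways. This Python sketch uses "url(http://example.com/*xx*/)" as the input; the two readings are illustrative strategies, not code taken from any browser:

```python
import re

# {comment} from the CSS grammar: "/*" ... "*/", no nesting.
COMMENT = re.compile(r"/\*[^*]*\*+(?:[^/*][^*]*\*+)*/")

source = "url(http://example.com/*xx*/)"

# Reading 1 (hypothetical): strip comments first, then tokenize what remains.
comments_first = COMMENT.sub("", source)

# Reading 2: match the url( ... ) token first; '/' and '*' are ordinary URL
# characters, so the text between the parentheses is taken verbatim.
url_first = source[len("url("):-len(")")]

print(comments_first)  # url(http://example.com)
print(url_first)       # http://example.com/*xx*/
```

Reading 1 even loses the slash after "example.com", because that slash is the first character of "/*".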

It could be relatively safe to allow comments within a url() function so
long as there is whitespace between the comment and the URL part (unless the
URL is quoted), something like:

{U}{R}{L}"("({w}|{comment})*{string}({w}|{comment})*")"	{return URI;}
{U}{R}{L}"("{w}({comment}{s})*{url}({s}{comment})*{w}")"	{return URI;}

If all the major browsers already support this, the change could possibly be
made... although it requires potentially unlimited look-ahead to determine
whether the first "/*...*/" you find is a comment or the URL (which is
generally frowned upon). For example, compare:

url(/*comment*/ /*another*/ relative_url.png)

with:

url(/*comment*/)

where "/*comment*/" is actually a perfectly valid URL.
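The two proposed rules can be transcribed into a Python regular expression to see what each example would produce. This is a sketch under simplifying assumptions: escapes are omitted, {nonascii} is approximated, and the macro names are reused as variable names. Python's backtracking engine quietly supplies the unbounded look-ahead that the rule needs:

```python
import re
from typing import Optional

# Macros (simplified: escapes omitted, {nonascii} approximated as >= U+00A0).
S = r"[ \t\r\n\f]+"                              # {s}
W = r"[ \t\r\n\f]*"                              # {w}
COMMENT = r"/\*[^*]*\*+(?:[^/*][^*]*\*+)*/"      # {comment}
STRING = r"""(?:"[^"\n]*"|'[^'\n]*')"""          # {string}
URL = r"[!#$%&*-~\u00a0-\U0010ffff]*"            # {url}

# {U}{R}{L}"("({w}|{comment})*{string}({w}|{comment})*")"
# ({w}|{comment})* is written as ({s}|{comment})*, which accepts the same text.
QUOTED = rf"url\((?:{S}|{COMMENT})*(?P<quoted>{STRING})(?:{S}|{COMMENT})*\)"
# {U}{R}{L}"("{w}({comment}{s})*{url}({s}{comment})*{w}")"
UNQUOTED = rf"url\({W}(?:{COMMENT}{S})*(?P<bare>{URL})(?:{S}{COMMENT})*{W}\)"

PROPOSED_URI = re.compile(rf"{QUOTED}|{UNQUOTED}", re.IGNORECASE)


def url_part(text: str) -> Optional[str]:
    """Return the text the proposed rules treat as the URL, or None."""
    m = PROPOSED_URI.fullmatch(text)
    if m is None:
        return None
    if m.group("quoted") is not None:
        return m.group("quoted")[1:-1]  # drop the surrounding quotes
    return m.group("bare")


for sample in [
    "url( /**/ http://example.com/ /**/ )",
    'url(/* a */"x.png"/* b */)',
    "url(/**/http://example.com/)",
    "url(/*comment*/ /*another*/ relative_url.png)",
    "url(/*comment*/)",
]:
    print(f"{sample!r:50} -> {url_part(sample)!r}")
```

The last two inputs begin with the same characters, "url(/*comment*/", yet the comment is discarded in one and becomes the URL in the other; the matcher cannot tell which until it reaches the closing parenthesis. The third input shows the whitespace requirement: with no space after "/**/", the comment is read as part of the URL.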

I think it would be safest not to make this change, and to leave all
comments within URL functions illegal.

Peter


PS. 
Looking more closely at the grammar it looks like the "url" token is
incorrectly specified as well:
url		([!#$%&*-~]|{nonascii}|{escape})*

There seem to be a few missing allowable characters, like [_a-z0-9], '/',
'.', and ':'. These seem to be important IMO. Something like:
url		([_a-z0-9!#$%&*-~/:]|"."|{nonascii}|{escape})*

Unless I'm missing something here...
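The class can be checked mechanically. Assuming the grammar's brackets follow the usual flex/regular-expression convention, where "a-b" inside a character class is a range, "*-~" covers every code point from "*" (U+002A) to "~" (U+007E). This Python sketch tests the characters listed above against the class as quoted and prints which printable ASCII characters it leaves out:

```python
import re

# The {url} character class as quoted from the CSS 2.1 grammar.
# Inside [...], "*-~" is a range from '*' (U+002A) to '~' (U+007E),
# not the three characters '*', '-' and '~'.
URL_CHARS = re.compile(r"[!#$%&*-~]")

candidates = "_azAZ09/.:"
missing = [c for c in candidates if not URL_CHARS.fullmatch(c)]
print("missing:", missing)

# Printable ASCII characters (U+0020 to U+007E) that the class excludes.
excluded = [chr(cp) for cp in range(0x20, 0x7F) if not URL_CHARS.fullmatch(chr(cp))]
print("excluded:", excluded)
```

If that reading is right, the listed characters are already covered, and the only printable ASCII characters left out are space, the two quote characters and the parentheses, which are the characters section 4.3.4 says must be escaped in an unquoted URI.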


-----Original Message-----
From: www-style-request@w3.org [mailto:www-style-request@w3.org] On Behalf
Of Zack Weinberg
Sent: Wednesday, September 24, 2008 4:44 PM
To: www-style@w3.org
Subject: [css2.1] Comments within url() tokens


There are longstanding bugs against Mozilla (since at least 2006) which
presume that comments are allowed to appear within whitespace within a
URI token; that is, this is a valid URI token:

 url( /**/ http://example.com/ /**/ )

I do not see text licensing this in CSS2.1.  Both the core (4.1.1) and
the appendix G tokenization rules for URI permit only {w} between the
parentheses and the text of the URL (whether or not quoted). Section
4.3.4 says "optional whitespace" is allowed, and defines "whitespace"
by reference as the S terminal, which does not include comments.

I suspect that for interoperability's sake comments ought to be
allowed...

zw

Received on Friday, 26 September 2008 22:36:51 UTC