Re: [css3-syntax] Escaping U, R or L in url() tokens from Simon Sapin on 2013-01-04 (www-style@w3.org from January 2013)

From: Simon Sapin <simon.sapin@kozea.fr>
Date: Fri, 04 Jan 2013 14:08:56 +0100
To: Bjoern Hoehrmann <derhoermi@gmx.net>
CC: www-style@w3.org
Message-ID: <50E6D468.6050208@kozea.fr>
On 03/01/2013 18:07, Bjoern Hoehrmann wrote:
> Allowing escapes while keeping the `url(...)` notation a single token
> would complicate the tokenizer without adding anything that people do
> care about using.

It’s not that complicated when expressed in terms of a state machine,
as the new[1] css3-syntax does. See my patch attached to another email
in this thread:

http://lists.w3.org/Archives/Public/www-style/2013Jan/0019.html

[1] By new I mean the 2012 ED not yet published as a WD, as opposed to 
the 2003 WD.
http://dev.w3.org/csswg/css3-syntax/
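
A minimal sketch, in Python, of what a state-machine tokenizer for url()
with escapes could look like. This is not the patch linked above and not
the css3-syntax algorithm; the function names (`consume_escape`,
`read_url_token`) and the state names are hypothetical, only unquoted
URLs are handled, and the error handling is an assumption.

```python
HEX = "0123456789abcdefABCDEF"
WHITESPACE = " \t\n\r\f"


def consume_escape(css, pos):
    """css[pos] is the character right after a backslash.

    Return (decoded_character, new_position). Simplification: a newline
    after the backslash is not treated specially here.
    """
    if pos >= len(css):
        return "\uFFFD", pos
    start = pos
    while pos < len(css) and pos - start < 6 and css[pos] in HEX:
        pos += 1
    if pos > start:
        codepoint = int(css[start:pos], 16)
        if codepoint == 0 or codepoint > 0x10FFFF:
            char = "\uFFFD"
        else:
            char = chr(codepoint)
        # A single whitespace character after a hex escape is swallowed.
        if pos < len(css) and css[pos] in WHITESPACE:
            pos += 1
        return char, pos
    return css[pos], pos + 1


def read_url_token(css):
    """Return (type, value, end_position), or None if css is not url(...)."""
    pos = 0
    name = []
    # State "name": read up to "(", decoding escapes as we go, so that
    # an escaped U, R or L still spells "url".
    while pos < len(css) and css[pos] != "(":
        if css[pos] == "\\":
            char, pos = consume_escape(css, pos + 1)
        else:
            char, pos = css[pos], pos + 1
        name.append(char)
    if pos >= len(css) or "".join(name).lower() != "url":
        return None
    pos += 1  # skip "("
    state = "before-url"
    value = []
    while pos < len(css):
        c = css[pos]
        if state == "before-url":
            if c in WHITESPACE:
                pos += 1
            else:
                state = "url"  # re-examine c in the new state
        elif state == "url":
            if c == ")":
                return ("URL", "".join(value), pos + 1)
            elif c in WHITESPACE:
                state = "after-url"
                pos += 1
            elif c in "\"'(":
                return ("BAD_URL", None, pos)
            elif c == "\\":
                char, pos = consume_escape(css, pos + 1)
                value.append(char)
            else:
                value.append(c)
                pos += 1
        else:  # state == "after-url": only whitespace or ")" allowed
            if c == ")":
                return ("URL", "".join(value), pos + 1)
            elif c in WHITESPACE:
                pos += 1
            else:
                return ("BAD_URL", None, pos)
    return ("BAD_URL", None, pos)  # end of input before ")"
```

With this sketch, `read_url_token(r"\75 rl(a\29 b)")` yields a single
URL token whose value is `a)b`: the escape handling is one extra branch
per state rather than a separate mechanism.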


> The selection of tokens is probably more of an accident,
> there are tokens for constructs CSS syntax is actually using,
> like there is a token for `|=` in CSS 2.1, but there is none for `^=`,
> which is a new construct in CSS3.

I argued for removing those tokens in css3-syntax. However, `|=` is
actually useful to disambiguate an attribute operator from a namespace
separator in an attribute name. (Without it, a selector parser becomes
more complex or requires more token look-ahead or both.) Other attribute
operators can just be two DELIM tokens without much impact.
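
To illustrate the point about `|=`, here is a small Python sketch of a
toy tokenizer and attribute-selector parser. It is an assumption-laden
illustration, not tinycss or css3-syntax code: `tokenize` and
`parse_attribute` are hypothetical names, identifiers are simplified,
and values must be plain identifiers. `|=` gets its own DASHMATCH token
while `^=`, `$=`, `*=` and `~=` arrive as two DELIM tokens.

```python
import re

TOKEN_RE = re.compile(r"""
    (?P<DASHMATCH>\|=)              # dedicated token for |=
  | (?P<IDENT>[A-Za-z_][\w-]*)      # simplified identifier
  | (?P<DELIM>.)                    # any other single character
""", re.VERBOSE)


def tokenize(text):
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(text)]


def expect_ident(token):
    kind, value = token
    if kind != "IDENT":
        raise ValueError("expected an identifier, got %r" % (value,))
    return value


def parse_attribute(text):
    """Parse the inside of [...]: return (namespace, name, operator, value)."""
    tokens = tokenize(text) + [("EOF", ""), ("EOF", "")]
    pos = 0
    namespace = None
    name = expect_ident(tokens[pos])
    pos += 1
    # Because |= is a single DASHMATCH token, a "|" DELIM seen here can
    # only be a namespace separator: no extra look-ahead is needed.
    if tokens[pos] == ("DELIM", "|"):
        namespace = name
        name = expect_ident(tokens[pos + 1])
        pos += 2
    kind, value = tokens[pos]
    if kind == "EOF":
        return (namespace, name, None, None)
    if kind == "DASHMATCH":
        operator = "|="
        pos += 1
    elif (kind, value) == ("DELIM", "="):
        operator = "="
        pos += 1
    elif (kind == "DELIM" and value in "^$*~"
          and tokens[pos + 1] == ("DELIM", "=")):
        # Other operators: two DELIM tokens, glued together by the parser.
        operator = value + "="
        pos += 2
    else:
        raise ValueError("unexpected token %r" % (value,))
    return (namespace, name, operator, expect_ident(tokens[pos]))
```

If `|=` were also two DELIM tokens, the parser would have to look past
the `|` to see whether an `=` or an identifier follows before deciding
what the first identifier means.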


> If CSS syntax could be designed from scratch, it's quite possible the
> `url(...)` notation would be just like any other functional notation,
> and no token on its own. That would avoid some of the issues mentioned
> in <http://lists.w3.org/Archives/Public/www-style/2010Jul/0499.html>,
> but I suspect it's too late for that now (see the linked examples).

Yes, it would have been a good idea to make url() a normal function
(though it would maybe require quoted strings). And yes, it is too late.


> Allowing escapes in 'url' should not be a problem though, but I think
> the CSS Working Group should first decide whether it actually wants the
> tokenizer to be frozen, in which case this should be kept as-is, or if
> it wants to modify it to accommodate new constructs like `^=` above, in
> which case it could go either way.

I think it is possible to change the tokenizer now in the new
css3-syntax, and I argue that CSS 2.1 has issues that should be fixed
(like the sign not being part of NUMBER tokens).

But I also think that such changes are costly and that we should refrain
from making more of them after css3-syntax has stabilized a bit.

This is not the case now AFAIK, but the tokenizer could become
effectively frozen in the future if we expose tokens in an API, maybe
for variables or for a lower-level cssom-values.


> (I do note, once more, that we are not blessed with independently
> useable CSS parsers that are fully conforming, and as a result we lack
> tools like validators and pretty printers that you can use even when
> the style sheets you want to process contain unusual syntax; and when
> the core syntax is changed every couple of months, that situation isn't
> likely to improve.)

tinycss (which I made for use in WeasyPrint and CairoSVG) is
independently usable and mostly conforming:

http://packages.python.org/tinycss/

I’d like to make it fully conforming once css3-syntax stabilizes a bit, 
maybe at the next WD.

Most changes to tinycss’s tokenizer (e.g. going from separate INTEGER
and NUMBER tokens, as now, to NUMBER tokens with an 'is integer' flag,
as spec’ed) break API compatibility, so for now I recommend requiring a
specific version if you use it.
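
The two token shapes can be contrasted with a short Python sketch. It
does not reproduce tinycss’s actual API: `NumberToken`,
`number_token_with_flag` and `number_token_two_types` are hypothetical
names, the sign is assumed to be part of the token, and exponents are
left out.

```python
import re
from collections import namedtuple

# Hypothetical token shape: one NUMBER type carrying an is_integer flag.
NumberToken = namedtuple("NumberToken", "type value is_integer")

# Optional sign, then digits with an optional fractional part.
NUMBER_RE = re.compile(r"[+-]?(?:\d+\.\d+|\.\d+|\d+)")


def number_token_with_flag(text):
    """Read a number at the start of text; None if there is none."""
    match = NUMBER_RE.match(text)
    if match is None:
        return None
    representation = match.group()
    is_integer = "." not in representation
    value = int(representation) if is_integer else float(representation)
    return NumberToken("NUMBER", value, is_integer)


def number_token_two_types(text):
    """Same input, but reported as separate INTEGER / NUMBER types."""
    token = number_token_with_flag(text)
    if token is None:
        return None
    return ("INTEGER" if token.is_integer else "NUMBER", token.value)
```

Code that checks `token.type == "INTEGER"` stops matching anything once
the tokenizer switches to the flag style, which is the kind of breakage
that makes pinning a version worthwhile.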

Cheers,
-- 
Simon Sapin
Received on Friday, 4 January 2013 13:09:24 UTC