Re: [css3-values] calc() and whitespaces around '+' and '-' (again)

(12/05/10 17:37), L. David Baron wrote:
> I think if we want to change this, we should just change the
> dimension token throughout CSS

I didn't like this at first because it doesn't solve Nth parsing along
the way, but after thinking about it for a while, I think it is quite
nice because:

  - An argument[1] for requiring whitespace around '+' and '-' was that
we might want to add keywords to calc() in the future. This direction
avoids that problem entirely.
  - We shouldn't cater to edge cases in Nth parsing.

I was wondering whether this would break content on the Web, because
something like 'background-position: 10em-2em;' would start to work
under this proposal, so I ran a grep against dotnetdotcom's web200904[2]
and found that none of the pages has anything like that. (Though
admittedly, this collection consists mostly of HTML files, and it would
have been better if we had a public .css collection.)
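
A search of this kind might look roughly like the sketch below. The
pattern and the name find_dash_joined are illustrative assumptions, not
the grep that was actually run; it looks for a number with an alphabetic
unit immediately followed by '-' and another number.

```python
import re

# Hypothetical pattern (not the original grep): number, alphabetic unit,
# then '-' directly followed by another number, e.g. "10em-2em".
# The lookbehind skips matches that start inside identifiers, hex colors,
# decimals, or after a leading minus sign.
DASH_JOINED = re.compile(
    r"(?<![\w#.-])(\d*\.?\d+)([a-z]+)-(\d*\.?\d+)([a-z%]*)",
    re.IGNORECASE,
)

def find_dash_joined(text):
    """Return every substring that looks like two dimensions joined by '-'."""
    return [m.group(0) for m in DASH_JOINED.finditer(text)]

if __name__ == "__main__":
    print(find_dash_joined("background-position: 10em-2em;"))
    print(find_dash_joined("margin: 10em -2em;"))
```

A value that already has whitespace before the '-' is not reported, so
only declarations whose meaning would change are flagged.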

I'll list several options in this direction:

A1. Part of Andrei Polushin's proposal[3]

  nmstart   [_a-z]|{nonascii}|{escape}
  nmchar    [_a-z0-9-]|{nonascii}|{escape}

  alpha     [a-z]|{nonascii}|{escape}
  alnum     [_a-z0-9]|{nonascii}|{escape}

  restrict  {alpha}{alnum}*
  simple    {nmstart}{nmchar}*
  prefixed  [_-]{restrict}[-]{simple}
  unit      {restrict}|{prefixed}

  %%

  {num}{unit}   {return DIMENSION;}

In other words, a dash is allowed in the unit only when there's
*another* dash (for a vendor prefix). I skipped the part for IDENT.
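
The A1 macros can be tried out with a rough Python translation such as
the one below. It is a simplified sketch: {nonascii} and {escape} are
left out, {num} is assumed to be the CSS 2.1 definition, and the helper
name first_dimension is made up for illustration.

```python
import re

# Simplified translation of the A1 macros. {nonascii} and {escape} are
# deliberately omitted (an assumption to keep the sketch short).
NUM = r"(?:[0-9]*\.[0-9]+|[0-9]+)"       # assumed CSS 2.1 {num}
ALPHA = r"[a-z]"
ALNUM = r"[_a-z0-9]"
NMSTART = r"[_a-z]"
NMCHAR = r"[_a-z0-9-]"
RESTRICT = rf"{ALPHA}{ALNUM}*"
SIMPLE = rf"{NMSTART}{NMCHAR}*"
PREFIXED = rf"[_-]{RESTRICT}-{SIMPLE}"
UNIT = rf"(?:{PREFIXED}|{RESTRICT})"
DIMENSION = re.compile(rf"{NUM}{UNIT}", re.IGNORECASE)

def first_dimension(text):
    """Return the DIMENSION token at the start of text, or None."""
    m = DIMENSION.match(text)
    return m.group(0) if m else None

if __name__ == "__main__":
    for sample in ("10em-2em", "10-webkit-foo", "10-2em"):
        print(sample, "->", first_dimension(sample))
```

With these rules the unit of '10em-2em' ends before the dash, while a
vendor-prefixed unit such as the hypothetical '10-webkit-foo' is still
one token.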

A2. Simply change to

  {num}{alnum}*   {return DIMENSION;}


Of these two, I would say I like A2 better.
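
To make the difference from today's grammar concrete, here is a small
sketch comparing the CSS 2.1 rule ({num}{ident}) with A2. As
assumptions, {nonascii} and {escape} are omitted, and the helper
split_first is hypothetical.

```python
import re

NUM = r"(?:[0-9]*\.[0-9]+|[0-9]+)"       # assumed CSS 2.1 {num}

# CSS 2.1: DIMENSION is {num}{ident}, with ident = -?{nmstart}{nmchar}*
CSS21_DIMENSION = re.compile(rf"{NUM}-?[_a-z][_a-z0-9-]*", re.IGNORECASE)

# A2: {num}{alnum}*. Note that a bare number also matches this pattern;
# a real lexer would need rule order to tell NUMBER from DIMENSION.
A2_DIMENSION = re.compile(rf"{NUM}[_a-z0-9]*", re.IGNORECASE)

def split_first(pattern, text):
    """Return (token, rest) for the match at the start of text, or None."""
    m = pattern.match(text)
    if m is None:
        return None
    return m.group(0), text[m.end():]

if __name__ == "__main__":
    print(split_first(CSS21_DIMENSION, "10em-2em"))
    print(split_first(A2_DIMENSION, "10em-2em"))
```

Under the CSS 2.1 rule the whole string is one DIMENSION with the unit
'em-2em'; under A2 the token stops at '10em' and '-2em' is left for the
following tokens.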

B. Or we can be even more aggressive

  {num}{alpha}*   {return DIMENSION;}

This is obviously more dangerous in terms of breaking the Web. About 30
of the 600k pages in web200904 have declarations like
"padding: 0px0px;" and the like. I checked almost all of them. Most are
no longer accessible or have been fixed (well, this collection was made
three years ago). For some, it makes no difference whether they are
parsed successfully or not. There is only one declaration that would be
affected, but it is a 'MARGIN:1px3px' on a standalone element, and it
makes no visual difference whether it's parsed or not.

The advantage of this is that CSS minimizers would benefit
significantly. It is also more consistent, because currently you can
write 'padding: 10%10%' but not 'padding: 10px10px'.
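
The effect of B on such values can be sketched with a toy tokenizer. It
makes several assumptions: only ASCII letters count as {alpha}, the unit
needs at least one letter so that NUMBER stays a separate token, and the
name tokenize is invented for this example.

```python
import re

NUM = r"(?:[0-9]*\.[0-9]+|[0-9]+)"       # assumed CSS 2.1 {num}

# Toy token rules for option B: the unit is letters only, so a digit
# ends the current DIMENSION and starts the next token.
TOKEN = re.compile(
    rf"(?P<PERCENTAGE>{NUM}%)"
    rf"|(?P<DIMENSION>{NUM}[a-z]+)"
    rf"|(?P<NUMBER>{NUM})"
    r"|(?P<S>\s+)",
    re.IGNORECASE,
)

def tokenize(value):
    """Split a property value into (type, text) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(value):
        m = TOKEN.match(value, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}")
        if m.lastgroup != "S":
            tokens.append((m.lastgroup, m.group(0)))
        pos = m.end()
    return tokens

if __name__ == "__main__":
    print(tokenize("0px0px"))
    print(tokenize("10%10%"))
```

Here '0px0px' splits into two DIMENSION tokens in the same way that
'10%10%' already splits into two PERCENTAGE tokens.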


(12/05/10 17:37), L. David Baron wrote:
> (rather than making the tokenizer context-sensitive, which is a huge
> pain),

(12/05/10 23:51), Tab Atkins Jr. wrote:
> On Thu, May 10, 2012 at 11:27 AM, Kang-Hao (Kenny) Lu
> <kennyluck@csail.mit.edu> wrote:
>> 3. You cannot do tokenization and parsing as two passes as parsing
>> calc() changes the state of the tokenizer.
>> For 3., it isn't a concern for Gecko as far as I can tell, but I
>> don't know about other browsers.
>
> I'm not sure what all browsers do, but at the very least it makes it
> harder to spec. ^_^  I suspect that browsers probably generally use
> integrated tokenizer/parsers, but simpler implementations that aren't
> as perf-sensitive might use separate ones, as I think they're easier.

For what it's worth, WebKit already does mode switching[4] for Nth
parsing (though admittedly it also has a bunch of craziness, and I
wouldn't be surprised if it gets rewritten again* eventually), and I
don't think it would be too difficult for Gecko either.
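
For readers unfamiliar with the idea, a context-sensitive tokenizer can
be sketched as below. This is a toy illustration of mode switching in
general, not WebKit's or Gecko's code; the two patterns, the depth
counter, and the function name are all assumptions.

```python
import re

NUM = r"(?:[0-9]*\.[0-9]+|[0-9]+)"       # assumed CSS 2.1 {num}
# Outside calc(): CSS 2.1 style, the unit may contain dashes.
NORMAL_DIM = re.compile(rf"{NUM}-?[_a-z][_a-z0-9-]*", re.IGNORECASE)
# Inside calc(): hypothetical mode in which the unit stops at a dash.
CALC_DIM = re.compile(rf"{NUM}[a-z][_a-z0-9]*", re.IGNORECASE)

def tokenize(text):
    """Toy tokenizer: dimensions, 'calc(' and single characters only."""
    tokens, pos, depth = [], 0, 0         # depth > 0 means inside calc(
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        if text.startswith("calc(", pos):
            tokens.append("calc(")
            depth += 1
            pos += 5
            continue
        m = (CALC_DIM if depth else NORMAL_DIM).match(text, pos)
        if m:
            tokens.append(m.group(0))
            pos = m.end()
            continue
        # Track parentheses so the mode ends with the matching ')'.
        if depth and text[pos] == "(":
            depth += 1
        elif depth and text[pos] == ")":
            depth -= 1
        tokens.append(text[pos])
        pos += 1
    return tokens

if __name__ == "__main__":
    print(tokenize("calc(10em-2em) 10em-2em"))
```

The same characters '10em-2em' produce three tokens inside calc() and
one token outside it, which is why tokenization cannot run as an
independent first pass in this design.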

But I agree that changing DIMENSION would be better. My only worry is
that people would say we shouldn't touch the core grammar because it's
been there for 10+ years.


[1] http://lists.w3.org/Archives/Public/www-style/2009Apr/0005
[2] http://dotnetdotcom.org/#inde (This file was also used in research
around the quirks mode document.)
[3] http://lists.w3.org/Archives/Public/www-style/2008Mar/0179
[4]
http://trac.webkit.org/browser/trunk/Source/WebCore/css/CSSParser.h?rev=116752#L390
* It was rewritten from a machine-generated lexer to a hand-coded one 3
months ago, it seems.


Cheers,
Kenny

Received on Friday, 11 May 2012 12:52:53 UTC