Re: [css3-values] inaccurate statements about syntax/grammar from Kang-Hao (Kenny) Lu on 2012-04-06 (www-style@w3.org from April 2012)

From: Kang-Hao (Kenny) Lu <kennyluck@csail.mit.edu>
Date: Fri, 06 Apr 2012 08:14:11 +0800
To: "Tab Atkins Jr." <jackalmage@gmail.com>
CC: Simon Sapin <simon.sapin@kozea.fr>, WWW Style <www-style@w3.org>
Message-ID: <4F7E3553.8090105@csail.mit.edu>

(12/04/06 2:28), Tab Atkins Jr. wrote:
> On Thu, Apr 5, 2012 at 5:31 AM, Simon Sapin <simon.sapin@kozea.fr> wrote:
>> I agree with Kenny that "component value" is ill-defined.
>>
>> According to section 2.1 of V&U, <'background-position'> is a single
>> component but "top left" (which is a valid <'background-position'>) is made
>> of two components. Which is it?
> 
> Hm, yeah, that is kind of inconsistent.  fantasai, we'll have to work
> on this definition a bit more.  :/

I never thought that was a definition :p It is not marked up with a <dfn>.

>> For implementing a parser, I found it useful to have an intermediate step
>> between tokenization and parsing: turn a flat sequence of tokens into a
>> "regrouped" tree of {} [] () pairs and atomic tokens. The algorithm is
>> something like this:
>>
>> For {, [, ( and FUNCTION tokens, find the matching }, ] or ) token. The
>> whole sub-sequence is replaced by a single "grouped token" which contains a
>> list of tokens of everything between the start and end tokens. That list is
>> recursively "regrouped".
>>
>> Having this tree structure makes things like error recovery (ignore a
>> whole at-rule and its {} body) or parsing of functional values much easier.
>>
>> All this is only an implementation detail, but could a similar concept be
>> useful to define "component value" in the spec? rgb(0, 127, 255) would be a
>> single component that contains 5 sub-components (assuming you count commas
>> but not white space). Nested functional notation (like rgb() in a gradient)
>> would form a tree of components.
> 
> Yes, that's a more useful definition of "token" for spec purposes,
> once you rise above raw grammar concerns.  (Honestly, I kinda want to
> just write an explicit parser a la HTML and have it emit tokens like
> that.)

I fully support such an effort. In particular, I am looking forward to
the "Tree Construction" part, as I share a lot of Peter Moulder's
questions about error handling, especially in block parsing[1]. The
current grammar-plus-rules approach makes me feel like I am too
stupid and the spec is too smart. I am looking forward to a state
machine that I can happily trace.
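
For concreteness, here is a minimal sketch of the "regrouping" step Simon
describes above. It is an illustration only, not spec text and not Simon's
actual code. The (type, value) token tuples, the Group class and the
regroup() function are all hypothetical names, white space tokens are
assumed to have been dropped already, and the handling of unbalanced input
(a missing closer ends the group at end of input, a stray closer stays an
atomic token) is an assumption that the spec would have to settle.

```python
from dataclasses import dataclass, field

# Hypothetical token model: (type, value) tuples. Not taken from any spec.
# Maps each opening token type to the token type that closes it.
CLOSERS = {"{": "}", "[": "]", "(": ")", "FUNCTION": ")"}


@dataclass
class Group:
    opener: str                      # "{", "[", "(" or "FUNCTION"
    name: str                        # function name for FUNCTION, else ""
    content: list = field(default_factory=list)


def regroup(tokens):
    """Turn a flat token list into a tree of Groups and atomic tokens."""
    pos = 0

    def parse_until(closer):
        nonlocal pos
        out = []
        while pos < len(tokens):
            typ, value = tokens[pos]
            pos += 1
            if typ == closer:
                return out           # matching closer found, group is done
            if typ in CLOSERS:
                name = value if typ == "FUNCTION" else ""
                # Recursively regroup everything up to the matching closer.
                out.append(Group(typ, name, parse_until(CLOSERS[typ])))
            else:
                out.append((typ, value))
        return out                   # assumption: end of input closes groups

    return parse_until(None)


# rgb(0, 127, 255) with white space already removed
tokens = [
    ("FUNCTION", "rgb"),
    ("NUMBER", "0"), (",", ","),
    ("NUMBER", "127"), (",", ","),
    ("NUMBER", "255"),
    (")", ")"),
]
tree = regroup(tokens)
```

With this input, tree holds a single Group named "rgb" whose content has
five entries (three numbers and two commas), which matches Simon's count of
one component with 5 sub-components.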

>> Apparently, UAs are expected to somehow find consecutive DELIM and
>> {NUMBER,DIMENSION,PERCENTAGE} tokens to handle signed numbers. The situation
>> is not as bad as if "-2" was an IDENT, but I still think that the core
>> grammar should be changed to include the optional sign in the {num} macro.
>> This would make many details much easier to deal with, and the only
>> effective change would be that "+/**/100px" would not be valid anymore. But
>> does any web page rely on that?
> 
> If they do, they deserve any pain we cause them.  I agree that the
> sign should be part of the NUM token.

No, they don't, and users of those UAs (WebKit and Opera) don't deserve
any performance regression just because spec writers don't want to say
"If a UA treats <number> as a single token, the UA must do A. Otherwise,
the UA must do B."

As I said, in this case, I think we should make this a multiple choice.
The pain should be on the spec writers.

But if WebKit and Opera people and other implementers of the current
grammar think this can be easily fixed, I'll withdraw my opinion.
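
To make the "B" branch concrete, here is a rough sketch of what a UA that
keeps the sign as a separate DELIM token has to do. The token tuples and
the merge_signs() name are hypothetical, and I am assuming that comments
produce no token while white space produces an "S" token, so "+/**/100px"
and "+100px" yield the same token sequence but "+ 100px" does not.

```python
# Hypothetical token model: (type, value) tuples. Not taken from any spec.
NUMERIC = {"NUMBER", "DIMENSION", "PERCENTAGE"}


def merge_signs(tokens):
    """Fold a DELIM '+' or '-' into the numeric token right after it."""
    out = []
    i = 0
    while i < len(tokens):
        typ, value = tokens[i]
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if (typ == "DELIM" and value in ("+", "-")
                and nxt is not None and nxt[0] in NUMERIC):
            # Sign directly followed by a numeric token: emit one signed token.
            out.append((nxt[0], value + nxt[1]))
            i += 2
        else:
            out.append((typ, value))
            i += 1
    return out


# "+100px" and "+/**/100px" both arrive as DELIM, DIMENSION
signed = merge_signs([("DELIM", "+"), ("DIMENSION", "100px")])

# "+ 100px" has an S token in between, so nothing is merged
unsigned = merge_signs([("DELIM", "+"), ("S", " "), ("DIMENSION", "100px")])
```

A UA whose tokenizer already includes the sign in the number token would
skip this pass entirely, and that is where "+/**/100px" would stop being
valid.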


Cheers,
Kenny

Received on Friday, 6 April 2012 00:14:42 UTC