Re: [css3-values] inaccurate statements about syntax/grammar

(12/04/06 8:14), Kang-Hao (Kenny) Lu wrote:
>>> When implementing a parser, I found it useful to have an intermediate step
>>> between tokenization and parsing: turn a flat sequence of tokens into a
>>> "regrouped" tree of {} [] () pairs and atomic tokens. The algorithm is
>>> something like this:
>>>
>>> For {, [, ( and FUNCTION tokens, find the matching }, ] or ) token. The
>>> whole sub-sequence is replaced by a single "grouped token" which contains a
>>> list of tokens of everything between the start and end tokens. That list is
>>> recursively "regrouped".
>>>
>>> Having this tree structure makes things like error recovery (ignoring a
>>> whole at-rule and its {} body) or parsing of functional values much easier.
>>>
>>> All this is only an implementation detail, but could a similar concept be
>>> useful to define "component value" in the spec? rgb(0, 127, 255) would be a
>>> single component that contains 5 sub-components (assuming you count commas
>>> but not white space). Nested functional notation (like rgb() in a gradient)
>>> would form a tree of components.
>>
>> Yes, that's a more useful definition of "token" for spec purposes,
>> once you rise above raw grammar concerns.  (Honestly, I kinda want to
>> just write an explicit parser a la HTML and have it emit tokens like
>> that.)
> 
> I fully support such an effort. In particular, I am looking forward to
> the "Tree Construction" part, as I share many of Peter Moulder's
> questions about error handling, in particular in block parsing[1]. The
> current grammar+rule based approach makes me feel like I am too stupid
> and the spec is too smart. I am looking forward to a state machine
> that I can happily trace.

Forgot the link > <

[1] http://lists.w3.org/Archives/Public/www-style/2011Jan/0068
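For what it's worth, the regrouping pass quoted above can be sketched in a
few lines of Python. The token representation here (plain strings, with a
FUNCTION token spelled like "rgb(", i.e. name plus opening paren) is an
assumption made for illustration, not the API of any real tokenizer:

```python
# Matching close token for each opening token.
PAIRS = {"{": "}", "[": "]", "(": ")"}


def regroup(tokens):
    """Turn a flat token sequence into a tree of grouped tokens.

    Each {...}, [...], (...) or func(...) run is replaced by a
    ("group", opener, [inner tokens...]) tuple, recursively.
    """
    return _regroup(iter(tokens), end=None)


def _regroup(it, end):
    result = []
    for token in it:
        if token == end:
            # Found the matching close token for this group.
            return result
        # An opening brace/bracket/paren, or a FUNCTION token like "rgb(".
        closer = PAIRS.get(token) or (")" if token.endswith("(") else None)
        if closer:
            # Recursively regroup everything up to the matching closer.
            result.append(("group", token, _regroup(it, closer)))
        else:
            result.append(token)
    # Error recovery: an unclosed group is implicitly closed at end of input.
    return result


# rgb(0, 127, 255) becomes a single component with 5 sub-components:
regroup(["rgb(", "0", ",", "127", ",", "255", ")"])
# → [("group", "rgb(", ["0", ",", "127", ",", "255"])]
```

With this tree in hand, "ignore a whole at-rule and its {} body" is just
dropping one group node instead of scanning for a balanced close brace.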


Cheers,
Kenny

Received on Friday, 6 April 2012 00:24:55 UTC