- From: Kang-Hao (Kenny) Lu <kennyluck@csail.mit.edu>
- Date: Fri, 06 Apr 2012 08:24:26 +0800
- To: "Tab Atkins Jr." <jackalmage@gmail.com>
- CC: Simon Sapin <simon.sapin@kozea.fr>, WWW Style <www-style@w3.org>
(12/04/06 8:14), Kang-Hao (Kenny) Lu wrote:
>>> For implementing a parser, I found useful to have an intermediate step
>>> between tokenization and parsing: turn a flat sequence of tokens into a
>>> "regrouped" tree of {} [] () pairs and atomic tokens. The algorithm is
>>> something like this:
>>>
>>> For {, [, ( and FUNCTION tokens, find the matching }, ] or ) token. The
>>> whole sub-sequence is replaced by a single "grouped token" which contains a
>>> list of tokens of everything between the start and end tokens. That list is
>>> recursively "regrouped".
>>>
>>> Having this tree structure makes much easier things like error recovery
>>> (ignore a whole at-rule and its {} body) or parsing of functional values.
>>>
>>> All this is only an implementation detail, but could a similar concept be
>>> useful to define "component value" in the spec? rgb(0, 127, 255) would be a
>>> single component that contains 5 sub-components (assuming you count commas
>>> but not white space). Nested functional notation (like rgb() in a gradient)
>>> would form a tree of components.
>>
>> Yes, that's a more useful definition of "token" for spec purposes,
>> once you rise above raw grammar concerns. (Honestly, I kinda want to
>> just write an explicit parser a la HTML and have it emit tokens like
>> that.)
>
> I fully support such an effort. In particular, I am looking forward to
> the "Tree Construction" part, as I share a lot of Peter Moulder's
> questions about error handling in, in particular, block parsing[1]. The
> current grammar+rule based approach just makes me feel like I am just
> too stupid and the spec is too smart. I am looking forward to a state
> machine that I can happily trace it.

Forgot the link.

[1] http://lists.w3.org/Archives/Public/www-style/2011Jan/0068

Cheers,
Kenny
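
For reference, the "regrouping" step quoted above could be sketched roughly as follows. This is only an illustration, not code from any implementation discussed in this thread. The token model is an assumption: tokens are plain strings, a FUNCTION token is a name ending in "(" (such as "rgb("), and a group is a hypothetical (opener, children) tuple. Treating an unclosed group as closed at end of input and a stray closer as an atomic token are also assumptions of this sketch.

```python
# Assumed token model: every token is a plain string. A FUNCTION token is
# written as a name followed by "(" (for example "rgb(").
CLOSERS = {"{": "}", "[": "]", "(": ")"}


def regroup(tokens):
    """Turn a flat token sequence into a tree of groups and atomic tokens.

    A group is represented as the tuple (opener, children), where children
    is itself a regrouped list. This representation is hypothetical.
    """
    return _regroup_until(iter(tokens), None)


def _regroup_until(token_iter, closer):
    out = []
    for tok in token_iter:
        if tok == closer:
            # Found the token matching the opener of the current group.
            return out
        # The last character decides whether this token opens a group;
        # tok[-1:] is used so that an empty token cannot raise an error.
        inner_closer = CLOSERS.get(tok[-1:])
        if inner_closer is not None:
            # The recursive call shares the iterator, so it consumes
            # everything up to and including the matching closer.
            out.append((tok, _regroup_until(token_iter, inner_closer)))
        else:
            # Atomic token. A stray closer that does not match the
            # current group also ends up here (assumption of the sketch).
            out.append(tok)
    # End of input: an unclosed group is treated as closed (assumption).
    return out


flat = ["a", "{", "color", ":", "rgb(", "0", ",", "127", ",", "255", ")", ";", "}"]
tree = regroup(flat)
```

With this input, the rgb( group holds five children (three numbers and two commas), which matches the "5 sub-components" count in the quoted message when white space is not counted.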
Received on Friday, 6 April 2012 00:24:55 UTC