- From: Tab Atkins Jr. <jackalmage@gmail.com>
- Date: Thu, 5 Apr 2012 11:28:33 -0700
- To: Simon Sapin <simon.sapin@kozea.fr>
- Cc: "Kang-Hao (Kenny) Lu" <kennyluck@csail.mit.edu>, WWW Style <www-style@w3.org>
On Thu, Apr 5, 2012 at 5:31 AM, Simon Sapin <simon.sapin@kozea.fr> wrote:
> I agree with Kenny that "component value" is ill-defined.
>
> According to section 2.1 of V&U, <'background-position'> is a single
> component but "top left" (which is a valid <'background-position'>) is made
> of two components. Which is it?

Hm, yeah, that is kind of inconsistent. fantasai, we'll have to work on this definition a bit more. :/

> For implementing a parser, I found useful to have an intermediate step
> between tokenization and parsing: turn a flat sequence of tokens into a
> "regrouped" tree of {} [] () pairs and atomic tokens. The algorithm is
> something like this:
>
> For {, [, ( and FUNCTION tokens, find the matching }, ] or ) token. The
> whole sub-sequence is replaced by a single "grouped token" which contains a
> list of tokens of everything between the start and end tokens. That list is
> recursively "regrouped".
>
> Having this tree structure makes much easier things like error recovery
> (ignore a whole at-rule and its {} body) or parsing of functional values.
>
> All this is only an implementation detail, but could a similar concept be
> useful to define "component value" in the spec? rgb(0, 127, 255) would be a
> single component that contains 5 sub-components (assuming you count commas
> but not white space). Nested functional notation (like rgb() in a gradient)
> would form a tree of components.

Yes, that's a more useful definition of "token" for spec purposes, once you rise above raw grammar concerns. (Honestly, I kinda want to just write an explicit parser a la HTML and have it emit tokens like that.)

>> From what I can tell, the only way to parse "-2" is as an IDENT, not a
>> NUMBER, and the only way to parse "+2" is as a DELIM and a NUMBER.
>>
>> I'm either quite mistaken (in which case I'd appreciate a
>> clarification!) or this is a 2.1 error that's needed fixing forever,
>> and all implementations have just hacked around it. If the latter,
>> then once we fix it (presumably to put an optional +/- in the
>> definition of the NUMBER token), your #1 will be addressed because
>> it'll be a single token, as will your following comments about signs
>> in <percentage> (which I've elided for brevity).
>
> As Kenny said, "-2" does not match [-]?{nmstart}{nmchar}* and is not an
> IDENT. It is DELIM and a NUMBER just like "+2".

Yeah, y'all are right.

> Apparently, UAs are expected to somehow find consecutive DELIM and
> {NUMBER,DIMENSION,PERCENTAGE} tokens to handle signed numbers. The situation
> is not as bad as if "-2" was an IDENT, but I still think that the core
> grammar should be changed to include the optional sign in the {num} macro.
> This would make many details much easier to deal with, and the only
> effective change would be that "+/**/100px" would not be valid anymore. But
> does any web page rely on that?

If they do, they deserve any pain we cause them. I agree that the sign should be part of the NUM token.

~TJ
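
For illustration only, the "regrouping" step Simon describes could be sketched roughly as below. This is not code from the thread or from any real parser: the (kind, value) token tuples, the Group class, and the regroup function are all hypothetical names chosen for the sketch, and it assumes a tokenizer has already produced the flat list.

```python
from dataclasses import dataclass, field

# Assumption: tokens are (kind, value) tuples, and for punctuation the kind
# is the character itself, e.g. ("(", "("). FUNCTION tokens carry the name.
CLOSERS = {"{": "}", "[": "]", "(": ")", "FUNCTION": ")"}


@dataclass
class Group:
    """Hypothetical "grouped token": an opener plus everything up to its closer."""
    opener: str                      # "{", "[", "(" or "FUNCTION"
    name: str = ""                   # function name for FUNCTION, else ""
    content: list = field(default_factory=list)


def regroup(tokens):
    """Turn a flat token sequence into a tree of Group nodes and atomic tokens."""
    return _regroup(iter(tokens), None)


def _regroup(it, closer):
    out = []
    for tok in it:
        kind, value = tok
        if kind == closer:
            # Found the matching end token: this group is complete.
            return out
        if kind in CLOSERS:
            # Recurse on the shared iterator so nested pairs form subtrees.
            name = value if kind == "FUNCTION" else ""
            out.append(Group(kind, name, _regroup(it, CLOSERS[kind])))
        else:
            # Atomic token (a stray closer is also kept as-is in this sketch).
            out.append(tok)
    # End of input: an unclosed group is treated as implicitly closed.
    return out
```

With this sketch, the nine tokens of rgb(0, 127, 255) regroup into a single Group whose content holds five non-whitespace tokens (three numbers and two commas), which matches the count given in the quoted message.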
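
As a rough illustration of the sign question, the sketch below compares the CSS 2.1 {num} macro with a variant that carries an optional leading sign, as proposed in the thread. The function name leading_number and the signed flag are made up for this example, and the sketch only matches the numeric prefix of a string; it is not a full tokenizer.

```python
import re

# CSS 2.1 core grammar: num = [0-9]+|[0-9]*\.[0-9]+
# Python alternation is ordered rather than longest-match, so the decimal
# form is listed first to let "1.5" match as a whole.
NUM = r"(?:[0-9]*\.[0-9]+|[0-9]+)"

# Variant discussed in the thread: an optional sign inside the macro.
SIGNED_NUM = r"[+-]?" + NUM


def leading_number(text, signed):
    """Return the numeric prefix matched at the start of text, or None."""
    match = re.match(SIGNED_NUM if signed else NUM, text)
    return match.group(0) if match else None
```

Under the unsigned macro, "-2" has no numeric prefix at all, so the sign has to be a separate DELIM. Under the signed variant, "-2" and "+2" each match as one unit, while "+/**/100px" does not, because the comment separates the sign from the digits.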
Received on Thursday, 5 April 2012 18:29:21 UTC