Re: [css-syntax] Comments on "Parser Algorithms" section

On Tue, Jun 25, 2013 at 4:12 PM, L. David Baron <dbaron@dbaron.org> wrote:
> I'm a little concerned about the "component value" terminology; I'd
> prefer using a term that doesn't involve "value".  But I don't have
> a better idea right now.

As Simon said, we're using this term because it's *precisely* what V&U
means by the term "component value" when it uses it.

> I think it would be clearer if, in "Consume a simple block", the
> sentence:
>   # Create a simple block with its associated token set to the current
>   # input token.
> said:
>   # Create a simple block with its associated token set to the current
>   # input token and its value set to the empty list.
> to make it clear that there are two pieces in the data model of a
> simple block.  (And, in general, it might be better to more clearly
> define the data model of these objects by describing all the parts
> of them when they're created, or perhaps even more explicitly.)

Added.  Note that all of the component values do have their data
models described explicitly, if you follow their definition.  (I've
now added a link to the definition in the "consume a simple block"
section.)
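To make the two-part data model concrete, here is a minimal sketch of a simple block (an associated token plus a value list, both set when the block is created) and a simplified "consume a simple block". It is not the spec's algorithm verbatim. The token kind labels, the `MIRROR` table and the index-passing style are assumptions made for illustration, and only nested simple blocks are recursed into (functions are ignored).

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple

@dataclass
class Token:
    kind: str        # hypothetical kind labels, e.g. "{", "(", "ident"
    value: str = ""

@dataclass
class SimpleBlock:
    associated_token: Token                          # the opening "[", "(" or "{"
    value: List[Any] = field(default_factory=list)   # component values inside the block

# Each opening token's mirror-image closing token.
MIRROR = {"[": "]", "(": ")", "{": "}"}

def consume_simple_block(tokens: List[Token], pos: int) -> Tuple[SimpleBlock, int]:
    """tokens[pos] is the current input token (the opener).
    Returns the block and the index just past its closing token (or EOF)."""
    # Both parts of the data model are set at creation time.
    block = SimpleBlock(associated_token=tokens[pos], value=[])
    ending = MIRROR[tokens[pos].kind]
    pos += 1
    while pos < len(tokens):
        tok = tokens[pos]
        if tok.kind == ending:
            return block, pos + 1
        if tok.kind in MIRROR:
            # A nested opener starts a nested simple block.
            nested, pos = consume_simple_block(tokens, pos)
            block.value.append(nested)
        else:
            block.value.append(tok)
            pos += 1
    return block, pos  # EOF: the unfinished block is still returned
```

For `{a(b)}` this yields a `{` block whose value is the `a` token followed by a nested `(` block containing `b`.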

> "Consume a qualified rule" is a bit unclear as to the type of the
> value; I think the opening sentence should probably replace
> "nothing" with "empty list", since the value is a list.

Fixed.
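A minimal sketch of the point about types, assuming the list-typed field in question is the rule's prelude: it starts as an empty list and is appended to, while the block stays unset until a `{` is seen. The field and token names are hypothetical, and the block body is collected as a flat token list by brace-depth counting, which is a simplification of consuming a real simple block.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional, Tuple

@dataclass
class Token:
    kind: str        # hypothetical kind labels, e.g. "ident", "whitespace", "{"
    value: str = ""

@dataclass
class QualifiedRule:
    prelude: List[Any] = field(default_factory=list)  # an empty list, not "nothing"
    block: Optional[List[Any]] = None                 # unset until a {} block is consumed

def consume_qualified_rule(tokens: List[Token], pos: int) -> Tuple[Optional[QualifiedRule], int]:
    rule = QualifiedRule()
    while pos < len(tokens):
        tok = tokens[pos]
        if tok.kind == "{":
            # Simplification: gather tokens up to the matching "}" by depth counting.
            depth, body = 1, []
            pos += 1
            while pos < len(tokens):
                t = tokens[pos]
                if t.kind == "{":
                    depth += 1
                elif t.kind == "}":
                    depth -= 1
                    if depth == 0:
                        break
                body.append(t)
                pos += 1
            rule.block = body
            return rule, min(pos + 1, len(tokens))
        rule.prelude.append(tok)
        pos += 1
    return None, pos  # EOF before any block: parse error, return nothing
```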

> I also find the spec lacks a bit of precision as to what consuming a
> token actually means in terms of the current input token and the
> next input token.  Many parts of the parsing spec say "consume the
> next input token", which presumably makes it the current input token
> unless there's been a previous "reconsume the current input token".
> I think the spec should be more explicit about this, and also about
> the fact that it would be an error for the spec to say "reconsume
> the current input token" when it's already done so since the last
> "consume the next input token".

I've tried to make it clearer.  Previously, I was waffling between
there being a deque of tokens that you pushed onto and popped off of, and
there just being a list where you tracked your current position.  I've
changed to make it clearer that the spec's chosen abstraction is a
list + position.

"Reconsume the current input token" is now just an instruction to
ignore the next "consume the next input token", and leave the
current input token unchanged.  This should make it clearer that there
shouldn't ever be two successive reconsumptions, and that you don't
need to actually be able to push back an arbitrary number of tokens.
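As a rough illustration of the list + position model, here is a minimal sketch in which "reconsume" is a one-shot flag rather than a real pushback. The class and method names are hypothetical, tokens are plain strings for brevity, and `None` stands in for EOF.

```python
from typing import List, Optional

class TokenStream:
    """Sketch of the list + position abstraction (names are hypothetical)."""

    def __init__(self, tokens: List[str]):
        self.tokens = tokens
        self.pos = -1            # index of the current input token; -1 = nothing consumed yet
        self._reconsume = False  # set by reconsume(), cleared by the next consume_next()

    @property
    def current(self) -> Optional[str]:
        if 0 <= self.pos < len(self.tokens):
            return self.tokens[self.pos]
        return None  # stand-in for EOF

    def next(self) -> Optional[str]:
        """The next input token, without consuming it."""
        if self._reconsume:
            return self.current  # a pending reconsume means "next" is the current token
        i = self.pos + 1
        return self.tokens[i] if i < len(self.tokens) else None

    def consume_next(self) -> Optional[str]:
        if self._reconsume:
            # Ignore this consume: the current input token stays unchanged.
            self._reconsume = False
        elif self.pos < len(self.tokens):
            self.pos += 1
        return self.current

    def reconsume(self) -> None:
        # Two reconsumes without an intervening consume would be a spec bug,
        # which is why one-token "pushback" is all that is ever needed.
        if self._reconsume:
            raise RuntimeError("reconsume called twice without consuming")
        self._reconsume = True
```

For example, with `["a", "b"]`, consuming gives `"a"`; after `reconsume()`, the next consume gives `"a"` again, then `"b"`, then `None`.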

> I think there are also some cases where the wording gets things
> wrong in terms of this model.  For example, I think the first
> sentence of "Consume a declaration" which is currently:
>   # Create a new declaration with its name set to the value of the
>   # current input token.
> should have an additional "and consume the next input token" at the
> end.  Then the following sentence which currently says:
>   # Repeatedly consume 〈whitespace〉s until a non-〈whitespace〉 is
>   # reached.
> should instead say something like:
>   # Until the current input token is non-<whitespace>, repeatedly
>   # consume the next input token.
> Without such a change, it makes the "Consume a declaration" prose
> seem like it errors out every time since the initial identifier is
> not a <colon>.

I've done a thorough review, and fixed up what I found.  I should have
good token hygiene now across the parser.
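A minimal sketch of "consume a declaration" with the token handling dbaron suggested: the name comes from the current input token, the parser then moves past it, skips whitespace, and only then checks for a colon, so the initial identifier is never mistaken for a missing colon. Tokens are modelled as hypothetical `(kind, text)` pairs, and taking all remaining tokens as the value (no `!important` handling or whitespace trimming) is a simplification.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Token = Tuple[str, str]  # (kind, text); the kind labels are hypothetical

@dataclass
class Declaration:
    name: str
    value: List[Token] = field(default_factory=list)

def consume_declaration(tokens: List[Token]) -> Optional[Declaration]:
    """Sketch: tokens[0] is the current input token, an identifier."""
    pos = 0
    decl = Declaration(name=tokens[pos][1])
    pos += 1  # "...and consume the next input token"
    # Until the current input token is non-whitespace, consume the next input token.
    while pos < len(tokens) and tokens[pos][0] == "whitespace":
        pos += 1
    if pos >= len(tokens) or tokens[pos][0] != "colon":
        return None  # parse error: this is not a declaration
    pos += 1  # move past the colon
    decl.value = list(tokens[pos:])  # simplification: the rest becomes the value
    return decl
```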

~TJ

Received on Wednesday, 26 June 2013 23:29:09 UTC