Re: [css3-syntax] Thoughts on proposed Syntax module from L. David Baron on 2012-08-29 (www-style@w3.org from August 2012)

From: L. David Baron <dbaron@dbaron.org>
Date: Tue, 28 Aug 2012 18:03:25 -0700
To: "Kang-Hao (Kenny) Lu" <kanghaol@oupeng.com>
Cc: WWW Style <www-style@w3.org>
Message-ID: <20120829010325.GA19154@crum.dbaron.org>

On Wednesday 2012-08-29 07:59 +0800, Kang-Hao (Kenny) Lu wrote:
> (12/08/29 5:28), L. David Baron wrote:
> > As I said at the face-to-face meeting, I think the approach that the
> > specification takes to CSS's (), [], and {} matching rules is going
> > in the wrong direction.  I think the normative specification text
> > for these should be the general statements about how the processing
> > works, and not the code-like form of the current specification,
> > since I really *don't* want bugs in the specification that break the
> > general rules to end up being codified in the specification.  I
> > worry that ending up with exceptions to these rules could prevent us
> > from making general improvements to parsing technology that would
> > otherwise (without exceptions) be possible.  (For example, we might
> > at some point in the future have generated parsers based on two
> > different but parallel state machines, one describing the correct
> > syntax and another describing the error handling behavior (for when
> > the first state machine goes into a failure state) -- done in a way
> > that a state in the error handling state machine can be determined
> > at parser generation time from the state in the correct-input state
> > machine.)
> 
> It's a bit difficult to me to understand this concern without specific
> examples of "bugs in the specification that break the general rules". Do
> you mean, for example, error handling for BAD_URI should be undefined?

No, I mean that it's very easy to forget to write that it's
time to look for a matching closing ), or closing }, when handling
error cases.  (Example below, after your next quote.)

Or, in Tab's spec that accepts almost anything without going into an
error mode, it's easy to forget to put in special handling for a
closing } or ).  (Note that this model of accepting almost anything is
the opposite of the way parsing works in Gecko.  It also means that
Tab's spec so far really only describes the Chapter 4 grammar, and not
how parsing actually works for the different types of rules and the
different types of values.  It doesn't unify the Chapter 4 grammar with
how things actually get parsed, which I thought was one of the major
goals of rewriting how we specify parsing.)

For example, Tab's spec produces incorrect results for the
following:
  @media screen { @foo bar } p { color: green }
since it produces the data structure shown by the indentation:
  @media screen {
    @foo bar } p { color: green } /* an invalid @media rule */
  } /* implied by EOF */
instead of the data structure shown by the indentation:
  @media screen {
    @foo bar /* an invalid @media rule */
  }
  p { color: green }
as required by CSS 2.1 (although I admit CSS 2.1 is only clear for
the case of an unknown at-keyword, but not for a known at-keyword
with invalid syntax following it; e.g., s/@foo bar/@media print/),
since http://dev.w3.org/csswg/css3-syntax/#at-rule-mode0 doesn't
have any special handling for the } token conditional on whether the
at-rule being parsed is nested inside another rule.  (At the top
level of a style sheet, on the other hand,
  @foo bar } p { color: green }
would be a single at-rule.)
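
To make the nested-versus-top-level distinction concrete, here is a
minimal sketch in Python.  It is not the algorithm in the draft and not
what Gecko does; the names (tokenize, consume_block, consume_at_rule)
and the `nested` flag are made up for illustration, the tokenizer is a
toy, and () and [] matching is left out for brevity.

```python
import re

# Toy tokenizer (an assumption for this sketch): punctuation tokens are
# single characters, everything else is a run of non-space characters.
TOKEN = re.compile(r"[{}()\[\];]|[^\s{}()\[\];]+")

def tokenize(css):
    return TOKEN.findall(css)

def consume_block(tokens, i):
    """tokens[i] is '{'.  Return the index just past the matching '}',
    or len(tokens) if the block is closed by EOF."""
    depth = 0
    while i < len(tokens):
        if tokens[i] == "{":
            depth += 1
        elif tokens[i] == "}":
            depth -= 1
            if depth == 0:
                return i + 1
        i += 1
    return i

def consume_at_rule(tokens, i, nested):
    """tokens[i] is an at-keyword.  Return (rule_tokens, next_index).

    The at-rule ends at the first ';' or after its first {} block.
    When `nested` is true, an unmatched '}' also ends the at-rule and is
    NOT consumed, because it closes the enclosing block.
    """
    start = i
    while i < len(tokens):
        tok = tokens[i]
        if tok == ";":
            return tokens[start:i + 1], i + 1
        if tok == "{":
            end = consume_block(tokens, i)
            return tokens[start:end], end
        if tok == "}" and nested:
            return tokens[start:i], i
        i += 1
    return tokens[start:i], i

# Nested inside @media: the '}' ends "@foo bar" and is left for @media.
nested_tokens = tokenize("@media screen { @foo bar } p { color: green }")
nested_rule, nested_next = consume_at_rule(nested_tokens, 3, nested=True)

# Top level: the same text is swallowed as a single at-rule.
top_tokens = tokenize("@foo bar } p { color: green }")
top_rule, top_next = consume_at_rule(top_tokens, 0, nested=False)
```

The only difference between the two calls is the `nested` flag, which is
exactly the kind of condition that is easy to leave out when the rule is
written as a list of per-token steps.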

> > That said, I think the problems with this approach don't show up
> > much in the material currently specified; I think most of the
> > problems appear when describing how to parse the syntax of all the
> > property values, which is where the bulk of CSS parsing logic lives.
> 
> Hmm... I am intrigued. Do you have specific examples?

For example, the entirety of parsing CSS media queries in @media
rules happens inside the
  # anything else
  #   consume a primitive and append the returned value to the
  #   prelude of the current rule.  Remain in this mode.
in http://dev.w3.org/csswg/css3-syntax/#at-rule-mode0 .  So does the
entirety of parsing media queries in @import rules.  So does the
entirety of parsing the argument to an @supports rule.

If the parsing of all of these things were actually defined in this
specification, then it would be very easy to forget to mention that,
on encountering an unexpected token in the middle of a media query
(say, an opening '(' where there should have been a ':'), one needs
to reprocess the current token and then enter an error mode that
involves matching a ).  Or perhaps, if it's the containing mode that
needs to do the () matching rather than the current mode, then the
spec needs to say to return an error, return to the content mode,
*and reconsume the current token* (which matters quite a bit if the
unexpected token is a ')').
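
A small Python sketch of why the reconsume step matters.  Everything
here is hypothetical: the function names, the `reconsume` switch (which
exists only to show the bug), and the simplified "( name : value )"
shape of a media feature.  Tokens are given as a plain list of strings
rather than produced by a real tokenizer.

```python
PUNCT = {"(", ")", ":"}

def skip_to_matching_paren(tokens, i):
    """Error mode.  One '(' is already open and tokens[i] is the first
    token not yet processed.  Return the index just past the ')' that
    closes that '(' (or len(tokens) at EOF)."""
    depth = 1
    while i < len(tokens) and depth > 0:
        if tokens[i] == "(":
            depth += 1
        elif tokens[i] == ")":
            depth -= 1
        i += 1
    return i

def parse_media_feature(tokens, i, reconsume=True):
    """tokens[i] is '('.  Return ((name, value) or None, next_index)."""
    shape = [None, ":", None, ")"]  # name, ':', value, ')'
    pos = i + 1
    for want in shape:
        tok = tokens[pos] if pos < len(tokens) else None
        if want is None:
            ok = tok is not None and tok not in PUNCT
        else:
            ok = tok == want
        if not ok:
            # Correct behavior: hand the offending token to the error
            # mode.  With reconsume=False it is silently dropped.
            resume = pos if reconsume else pos + 1
            return None, skip_to_matching_paren(tokens, resume)
        pos += 1
    return (tokens[i + 1], tokens[i + 3]), pos

# Unexpected '(' where ':' should be.
bad_open = ["(", "min-width", "(", "100px", ")", ")", "and", "(", "color", ")"]
_, good_next = parse_media_feature(bad_open, 0, reconsume=True)
_, buggy_next = parse_media_feature(bad_open, 0, reconsume=False)

# Unexpected ')' where ':' should be.
bad_close = ["(", "min-width", ")", "and", "screen"]
_, good_close_next = parse_media_feature(bad_close, 0, reconsume=True)
_, buggy_close_next = parse_media_feature(bad_close, 0, reconsume=False)
```

With reconsume, parsing resumes at "and" in both inputs.  Without it,
the first input resumes at a stray ')' and the second runs to the end of
the input, because the ')' that should have ended the error mode was
thrown away.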

-David

-- 
𝄞   L. David Baron                         http://dbaron.org/   𝄂
𝄢   Mozilla                           http://www.mozilla.org/   𝄂
Received on Wednesday, 29 August 2012 01:03:51 UTC