Re: [CSS21] WD 4.1.6, 4.2: parsing of blocks from Peter Moulder on 2011-03-18 (www-style@w3.org from March 2011)

From: Peter Moulder <peter.moulder@monash.edu>
Date: Fri, 18 Mar 2011 12:35:17 +0000
To: www-style@w3.org
Message-id: <20110318123517.GA7727@bowman.infotech.monash.edu.au>

I acknowledge that the issues raised in this thread relate only to
invalid stylesheets, not valid ones, and so could reasonably be given a
lower priority than many other issues.

(Though concerning the importance of "only affects invalid stylesheets":
 evidently WG members place at least non-zero importance on uniform error
 recovery behaviour, which would be why these provisions were added in the
 first place (they weren't in CSS2.0), and is also in evidence in the
 discussion of this Issue in the minutes from day 1 of the F2F.)

One argument for devoting more attention to issues in syndata.html than
to the rest of the spec is that many other specs (including SVG) depend
on CSS 2.1 for parsing behaviour.  In addition, syndata.html is fairly
self-contained, which tends to make its issues easier to resolve than
most things in chapters 8 onwards.

Information relevant to how quick or safe it would be to address any of
the issues raised in this thread:

  Since the initial discussion, I've posted some test cases and some
  results and corresponding proposals.

  Some of the tests had uniform behaviour among the UAs I tested,
  suggesting that the proposals related to those tests should be 
  relatively safe.

  Other tests did have divergent behaviour (subject to confirmation
  with current development versions).  If the divergence is still
  present in current development versions of major UAs, then the
  proposals related to those tests would be less safe/quick to adopt
  now.

In the rest of this message I give reasons why I don't consider the
issue resolved, though I'll make no further comment on prioritization,
and I understand that there are reasons to want to advance CSS2.1
despite still having unresolved issues.


On 2011-01-07, Boris Zbarsky wrote:

> On 1/7/11 2:40 PM, Peter Moulder wrote:
> > The first paragraph of §4.1.6 claims that between the delimiting braces
> > of a block "there may be any tokens, except that [brackets must be
> > in nested matched pairs]".
> >
> > This conflicts with the grammar, which says that a block can't
> > contain BAD_STRING, BAD_URI or BAD_COMMENT tokens (i.e. not just "any
> > tokens")
> 
> It's not clear to me how one would even get any of those three tokens, 
> given the end-of-stylesheet rules (which are applied before the 
> tokenizer runs as far as I can tell; that's the only way they make any 
> sense at all).  Am I missing something?

According to the tests I ran some time after Boris wrote the above,
user agents appear to have settled on the following answer: BAD_COMMENT
tokens can never occur, whereas BAD_STRING and BAD_URI tokens can arise
depending on what characters occur in their "content", e.g. if an
unescaped [\r\n\f] occurs when reading a string.
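
A minimal sketch of that observed string behaviour follows.  It is an
illustration under stated assumptions, not the spec's tokenizer: the
function name read_string is hypothetical, and it assumes that reaching
the end of the stylesheet closes an open string (the end-of-stylesheet
rule Boris mentions) rather than producing BAD_STRING.

```python
def read_string(css, start):
    """Scan a quoted string that begins at css[start].

    Returns (token_type, end), where end is the index just past the token.
    """
    quote = css[start]
    i = start + 1
    while i < len(css):
        ch = css[i]
        if ch == quote:
            return ("STRING", i + 1)
        if ch in "\r\n\f":
            # Unescaped newline: the token is BAD_STRING and stops before
            # the newline, which is left for the next token.
            return ("BAD_STRING", i)
        if ch == "\\" and i + 1 < len(css):
            # A backslash escapes the next character, including a newline
            # (treating \r\n as a single newline).
            if css[i + 1] == "\r" and css[i + 2:i + 3] == "\n":
                i += 3
            else:
                i += 2
            continue
        i += 1
    # Assumption: end of stylesheet closes the open string.
    return ("STRING", i)
```

With this sketch, '"ab' followed by a raw newline gives BAD_STRING,
while the same input with a backslash before the newline, or with the
input simply ending, gives STRING.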

> > Relatedly, the direction to "[observe] the rules for matching pairs of
> > [bracketing and quotation characters]" is unclear on what should
> > occur when encountering the wrong closing bracketing character:
> > it isn't clear whether it should parse as if that closing bracketing
> > character were removed, or as if extra closing bracketing characters
> > were inserted.  For example, if the following illegal sequence counts
> > as "while parsing a statement", it isn't clear whether the "end of the
> > statement" occurs at the first or second ‘}’:
> >
> >    { ... ( ... } ... ) ... }
> 
> The second; I thought that was pretty clear.  Only the last open bracket 
> can be closed, so if you see a close bracket that doesn't match, you 
> just read it and don't close things.

I don't see where that's specified in the existing text.

The current text says to "[observe] the rules for matching pairs", but
that isn't possible when the input doesn't form matching pairs.

I see in www-style discussions that there was a proposal to add a
statement to make things behave as Boris describes above, but I don't
see it having been added; it looks as if the only outcome of that
thread for specification text was a clarification about "}" in style
attributes (in the css-style-attr spec, not CSS2.1 as such).
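
To make the two candidate readings concrete, here is a small sketch
that works on an already-tokenized input.  The function block_end and
its implicit_close flag are hypothetical names for illustration only.
With implicit_close=False a mismatched closer is read and ignored (the
behaviour Boris describes); with implicit_close=True a closer that
matches an outer opener closes the inner ones too, as if the missing
closers had been inserted.

```python
PAIRS = {"{": "}", "(": ")", "[": "]"}
CLOSERS = set(PAIRS.values())


def block_end(tokens, start=0, implicit_close=False):
    """Return the index of the token that ends the block opened at
    tokens[start], or None if the block is never closed."""
    stack = []  # closers we are still waiting for, innermost last
    for i in range(start, len(tokens)):
        tok = tokens[i]
        if tok in PAIRS:
            stack.append(PAIRS[tok])
        elif tok in CLOSERS:
            if stack and stack[-1] == tok:
                stack.pop()
            elif implicit_close and tok in stack:
                # Pop inner openers as if their closers had been inserted.
                while stack.pop() != tok:
                    pass
            else:
                # Mismatched closer: read it and close nothing.
                continue
            if not stack:
                return i
    return None


tokens = "{ a ( b } c ) d }".split()
print(block_end(tokens))                       # second '}' (index 8)
print(block_end(tokens, implicit_close=True))  # first '}' (index 4)
```

Both readings are arguably compatible with "observing the rules for
matching pairs", which is the ambiguity at issue.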

> > The following lines each contain a malformed statement, but it isn't
> > clear where the malformed statement ends, and hence whether the
> > p{color:blue} is to apply or not:
> >
> >    } p{color:blue}
> >    }} p{color:blue}
> >    }{} p{color:blue}
> 
> Just reading the spec without trying to read stuff into it, seems to me 
> that the last one of those should apply; the previous two will be syntax 
> errors due to failures to parse a selector.
> 
> For what it's worth, that's interoperably implemented in at least 
> Presto, Gecko, and Webkit, so apparently there wasn't much of an 
> understanding problem here at least on the part of implementors....

Justin Rogers of Microsoft apparently found it unclear (and indeed
apparently favoured a different interpretation) when he wrote to
www-style in
http://lists.w3.org/Archives/Public/www-style/2007Dec/0167.html;
and Anne van Kesteren of Opera replied to concur that "That [i.e. the
behaviour of an example similar to the one I gave above] does indeed
seem not very well defined in the specification".

[The text of §4.2 then wasn't exactly the same as in the current WD
 text, but the first sentence of the "Malformed declarations" rule was
 identical to now according to 0156.html written on the same day, so I
 don't expect that any subsequent clarifications are relevant to the
 clarity of the "matching" rule.]
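
For reference, the reading Boris gives for the three quoted lines can
be sketched as below.  This is a deliberately simplified illustration:
surviving_rules is a hypothetical helper, it works on characters rather
than tokens, and it ignores strings, comments, parentheses and at-rules.
It treats everything up to the next "{" as the selector, and drops the
rule if that selector contains a stray bracket character.

```python
def surviving_rules(css):
    """Split css into (selector, block) pairs, keeping only rules whose
    selector text contains no bracket characters."""
    rules = []
    i, n = 0, len(css)
    while i < n:
        open_at = css.find("{", i)
        if open_at == -1:
            break
        selector = css[i:open_at].strip()
        # Find the '}' that matches the '{' at open_at.
        depth = 0
        j = open_at
        while j < n:
            if css[j] == "{":
                depth += 1
            elif css[j] == "}":
                depth -= 1
                if depth == 0:
                    break
            j += 1
        block = css[open_at + 1:j].strip()
        if selector and not any(c in "{}()[]" for c in selector):
            rules.append((selector, block))
        i = j + 1
    return rules


for line in ["} p{color:blue}", "}} p{color:blue}", "}{} p{color:blue}"]:
    print(repr(line), "->", surviving_rules(line))
```

Under these assumptions only the third line keeps its p rule: in the
first two the stray "}" becomes part of the selector, whereas in the
third it forms its own (discarded) rule together with "{}".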

I don't think one can read much into any agreement in behaviour of
current major UAs on this point: I believe that they all predate the
addition of the error recovery text to the spec, in which case
their implementations would either predate the text, or would be
influenced by www-style discussions and/or the test suite and the like
rather than just the actual text.  E.g. the thread described above
shows Presto's current behaviour to be a result of testing rather than
the text of the specification.

(FWIW, I too am an implementor, and am a native English speaker, and
I found a number of the error recovery provisions in §4.2 at least
unclear, and in some cases my guess as to the "best" interpretation
differed from the behaviour I subsequently saw when testing.)

> > Similar comments apply to the corresponding phrases "while parsing a
> > declaration" and "end of the declaration", i.e. it isn't clear what
> > those phrases mean.
> 
> Agreed.
> 
> > As another example, consider:
> >
> >    p { margin:0; color: red -->  ; }
> >
> > When we encounter the -->  token, it isn't clear whether we are in fact
> > "parsing a statement" or "parsing a declaration"
> 
> "parsing a declaration".  Why is this unclear?

I see I wrote unclearly myself: what I meant is, it isn't clear whether
we're "parsing a declaration" (because one could argue that we're past
the end of the sequence of tokens that can form a declaration); and,
less importantly, it isn't clear whether we're "parsing a statement",
because: (i) in some sense the unexpected token means we evidently
aren't parsing a statement, so it isn't really clear what it is we're
parsing; and because (ii) by a literal reading, one might think that we
were parsing a statement even if we're also parsing a declaration.

(Bert Bos wrote in
 http://lists.w3.org/Archives/Public/www-style/2009May/0054.html that:

 : The intention is that the rule for malformed declarations takes
 : precedence over that for malformed statements, as it comes
 : first in the spec. Thus an unexpected token in a declaration
 : causes just the declaration to be ignored, not the whole statement.

 However, I imagine that most readers wouldn't see that intent in the
 existing text.  Granted, they would presumably guess the intended
 precedence from the fact that the two rules conflict with each other,
 though I think they would take it as loose wording rather than noticing
 the intended "precedence by order" reading.)
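
The precedence Bert describes, applied to the example above, can be
sketched as follows.  The helper keep_declarations and the DECLARATION
pattern are hypothetical and far narrower than the real grammar (simple
property names and values only, no strings or nested brackets); the
point is only that an unexpected token such as "-->" discards the one
declaration it appears in, not the whole statement.

```python
import re

# Assumed, deliberately narrow notion of a well-formed declaration:
# a property name, a colon, and a value built from simple characters.
DECLARATION = re.compile(r"^\s*[-a-zA-Z]+\s*:\s*[-\w#%.,\s]+$")


def keep_declarations(block):
    """Return the declarations in a block's contents that survive
    declaration-level error recovery."""
    kept = []
    for chunk in block.split(";"):
        if not chunk.strip():
            continue  # empty declaration, nothing to keep or drop
        if DECLARATION.match(chunk):
            kept.append(" ".join(chunk.split()))
        # Otherwise the declaration is malformed: skip to the next ';'.
    return kept


print(keep_declarations(" margin:0; color: red -->  ; "))
```

This keeps only the margin declaration, matching the
‘p { margin:0; }’ behaviour reported for gecko, konqueror and webkit.
Applying the malformed-statement rule first would instead drop the
whole rule set.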

> > The behaviour I see in gecko, konqueror and webkit is that it's
> > like ‘p { margin:0; }’, even though one could reasonably argue that
> > ‘color: red’ is not a malformed declaration and should not be
> > discarded.
> 
> How is it not a malformed declaration?

Boris may have misread the above; I'm sure he agrees that "color: red"
isn't by itself a malformed declaration, even if "color: red -->"
clearly isn't a well-formed declaration.  [As I think Boris is alluding
to below, the above isn't quite the same as saying "clearly is a
malformed declaration".]

> Is the issue just that the [string] "color: red -->" doesn't match
> the "declaration" production in the grammar?

That's one reason to question whether it's a malformed declaration,
yes: i.e. it isn't a declaration, so one might question whether the
"-->" is an error that happens "while parsing a declaration".

pjrm.
Received on Friday, 18 March 2011 12:35:49 UTC