Re: CSS parser recovery

Ian Hickson wrote to <www-style@w3.org> on 8 January 2003 in
"CSS parser recovery"
(<mid:Pine.LNX.4.21.0301050148140.4908-100000@dhalsim.dreamhost.com>):

> [...] there are two alternatives (as far as I can see):

>    1. UAs develop proprietary error recovery, and we end up with CSS being
>       the same mess as HTML, despite Hakon and Bert's efforts in making a
>       forward compatible grammar.

While I, too, would prefer to avoid a variety of error recovery
mechanisms, that variety is currently permitted. CSS2 specifies a
core grammar but neither regulates nor even comments on the handling
of input that fails to match the core grammar. Such error handling
is entirely at the discretion of CSS2 user agents.
CSS1, never having specified a fallback grammar, leaves the
possibility that everything be accepted as some mangled form of CSS.
Then again, the argument is equally viable that CSS1 subjects user
agents to minimal requirements regarding acceptance of input, so that
CSS1 user agents may reject any input that is not clearly within the
syntax of CSS.

>    2. We say that unparsable CSS must be totally ignored, which results in
>        (a) authors finding their CSS can go from totally working to
>            totally not working when they add one character (much like XML,
>            but with no immediate feedback since CSS is in support files)

I am willing to promise two things. First, authors will hate the
discovery that their style sheets suddenly and totally fail to work as
intended. Second, if the market-leading user agent implements such
strict error handling, unparsable CSS will be corrected, the errors
disappearing within months. (As Arjun Ray has noted, most authors
follow the implementation.) In the long run, then, unparsable CSS
becomes a negligible problem.

>        (b) incremental parsers having to throw out everything they've
>            parsed so far when finding unparsable material

If we follow your second alternative, this is indeed a consequence. I
would argue, though, that this second alternative is not the last
alternative. We can specify that the parser accept the maximal initial
sequence of tokens matching the 'stylesheet' production. This approach
offers some leniency to authors yet ensures that the constructs are
well defined, whether ultimately ignored or applied.
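
To make the "maximal initial sequence" idea concrete, here is a rough
sketch in Python. It is not a CSS parser and it does not implement the
'stylesheet' production. As a loudly simplified stand-in, it treats a
statement as any text followed by a balanced {...} block, and it
ignores strings, comments, escapes, and at-rules that end in ';'. The
function name maximal_prefix is my own invention for illustration.

```python
def maximal_prefix(css: str) -> str:
    """Return the longest leading part of `css` made of complete statements.

    ASSUMPTION (not from any CSS specification): a "statement" is any run
    of text followed by a balanced {...} block. Strings, comments, escapes
    and ';'-terminated at-rules are deliberately ignored in this sketch.
    """
    depth = 0      # current nesting depth of '{' ... '}'
    accepted = 0   # index just past the last complete top-level statement
    for i, ch in enumerate(css):
        if ch == "{":
            depth += 1
        elif ch == "}":
            if depth == 0:
                # A stray '}' cannot belong to any statement: stop here.
                break
            depth -= 1
            if depth == 0:
                # A top-level block just closed, so the accepted prefix grows.
                accepted = i + 1
    # Everything after `accepted` is dropped; everything before it is kept.
    return css[:accepted]


if __name__ == "__main__":
    print(maximal_prefix("p { color: red } q { color: blue"))
```

Under these assumptions, an incremental parser never has to discard
what it has already accepted: it only declines to go past the first
point at which the input stops matching.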

> [...] new syntax is being added to CSS all the time with the
> expectation that it can be mixed with older CSS and having
> it work predictably in all CSS UAs.

If the new syntax conforms to the core grammar defined in CSS2, it is
accepted and there is no problem. If the new syntax does not conform
to the core grammar, this is quite a problem.  As I have noted, CSS2
makes no provisions for input that does not conform to the core
grammar, so already "having it work predictably in all CSS UAs" is out
of the question.

I have to wonder why a person would choose to violate the core
grammar when designing new syntax.  I find the core grammar perfectly
adequate to express the constructs that CSS requires.

Moreover, I have always taken CSS2:4.1.1, "Tokenization"
(<http://www.w3.org/TR/1998/REC-CSS2-19980512/syndata.html#tokenization>),
at its word when it announced a certain future:

    All levels of CSS -- level 1, level 2, and any future levels --
    use the same core syntax. This allows UAs to parse (though not
    completely understand) style sheets written in levels of CSS that
    didn't exist at the time the UAs were created. Designers can use
    this feature to create style sheets that work with older user
    agents, while also exercising the possibilities of the latest
    levels of CSS.

Perhaps I was a fool to believe that statement, but I hope not.

-- 
Etan Wexler <mailto:ewexler@stickdog.com>

Received on Friday, 10 January 2003 00:54:47 UTC