- From: Tab Atkins Jr. <jackalmage@gmail.com>
- Date: Wed, 11 Apr 2012 17:33:10 -0700
- To: www-style list <www-style@w3.org>
Over the past week or so, I've been writing a spec for the CSS parsing algorithm on the model of the HTML parsing algorithm - that is, as an explicit state machine handling input character by character, rather than as a grammar. I've now finished v1.0 of the tokenizer, temporarily stored at <http://dev.w3.org/csswg/css3-syntax/parsing.html>. Next I'll start on the tree-builder, which actually produces stylesheets.

My primary reason for doing this was to eliminate the existing undefined behavior in the syntax. The grammar is designed to be pretty permissive and accept a lot of currently-invalid CSS, but it's still easy to construct documents that don't match the grammar, and the interpretation of these documents is currently undefined. There's no good reason for this to be undefined, except that it's hard to define grammars that match every possible bytestream.

My secondary reason was to make error-handling clearer and better-defined, and hopefully easier to extend. Kenny Lu has raised issues about error-handling being unclear in some cases. As well, every time we want to add something new to the Core Grammar, we have to be *extremely* careful, because adding to a complex grammar is a non-trivial process that can easily introduce errors by accident. A well-designed state machine can be much easier to extend.

I'm currently willfully violating the Core Grammar in one place - numbers that start with a + or - are parsed as a single NUMBER token, rather than a DELIM followed by a NUMBER. The only consequence of this is that you can no longer put a comment between the sign and the number. This was already not allowed by at least one major browser, plus it's a ridiculous thing to do, so there shouldn't be any compatibility concerns.

Are there any standing concerns with the tokenizing stage that I might be able to answer or resolve? Otherwise, I'll forge forward and start writing the tree-builder tomorrow.
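
To make the signed-number change concrete, here is a minimal Python sketch of a character-by-character tokenizer. It is not the algorithm from the draft: the function name `tokenize`, the tuple-based token shape, and the restriction to integers, comments, whitespace, and single-character DELIMs are all assumptions made for illustration.

```python
# Illustrative sketch only, not the draft's tokenizer. It handles a tiny
# subset of CSS: whitespace, /* comments */, integers, and DELIM characters.

DIGITS = "0123456789"


def tokenize(text):
    """Return a list of (type, value) tuples (hypothetical token shape)."""
    tokens = []
    i = 0
    n = len(text)
    while i < n:
        ch = text[i]
        if ch.isspace():
            # Whitespace is skipped in this sketch.
            i += 1
        elif text.startswith("/*", i):
            # Comments produce no token; an unterminated comment
            # simply runs to the end of the input.
            end = text.find("*/", i + 2)
            i = n if end == -1 else end + 2
        elif ch in DIGITS or (
            ch in "+-" and i + 1 < n and text[i + 1] in DIGITS
        ):
            # A sign immediately followed by a digit is consumed as part
            # of the NUMBER token instead of being emitted as a DELIM.
            start = i
            i += 1
            while i < n and text[i] in DIGITS:
                i += 1
            tokens.append(("NUMBER", text[start:i]))
        else:
            # Anything else becomes a single-character DELIM token.
            tokens.append(("DELIM", ch))
            i += 1
    return tokens


if __name__ == "__main__":
    print(tokenize("-5"))      # sign and digits form one NUMBER
    print(tokenize("-/**/5"))  # a comment after the sign splits them
```

Under these assumptions, `-5` yields a single NUMBER token, while `-/**/5` yields a DELIM followed by a NUMBER, which is the one observable difference from the Core Grammar described above.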
Once I finish both of these, I'll move them into the Syntax spec and start updating/maintaining that. This is necessary work, since Syntax is *supposed* to be one of our bedrock specs.

~TJ
Received on Thursday, 12 April 2012 00:33:59 UTC