[css3-syntax] Reviving the spec, starting with the parser

Over the past week or so, I've been writing a spec for the CSS parsing
algorithm modeled on the HTML parsing algorithm - that is, as an
explicit state machine handling input character by character, rather
than as a grammar.
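To make the contrast with a grammar concrete, here is a toy sketch
(Python, not taken from the draft) of an explicit state machine that
handles input character by character. The states, token kinds and
character classes are simplified assumptions and cover far less than
the real CSS tokenizer. The point is that each state says what to do
with every possible next character, including end of input, so no
input is left with an undefined result.

```python
from dataclasses import dataclass


@dataclass
class Token:
    kind: str   # "WS", "IDENT", "NUMBER" or "DELIM" (a hypothetical subset)
    value: str


def tokenize(text: str) -> list[Token]:
    tokens: list[Token] = []
    state = "data"
    start = 0
    i = 0
    while True:
        c = text[i] if i < len(text) else ""  # "" marks end of input
        if state == "data":
            if c == "":
                return tokens
            start = i
            if c.isspace():
                state = "ws"
            elif c.isalpha() or c == "_":
                state = "ident"
            elif c.isdigit():
                state = "number"
            else:
                # No input is "invalid": unrecognized characters become DELIM.
                tokens.append(Token("DELIM", c))
            i += 1
        elif state == "ws":
            if c != "" and c.isspace():
                i += 1
            else:
                # Emit, then reconsume c in the data state (i is not advanced).
                tokens.append(Token("WS", text[start:i]))
                state = "data"
        elif state == "ident":
            # The c != "" guard matters: "" in "-_" is True in Python.
            if c != "" and (c.isalnum() or c in "-_"):
                i += 1
            else:
                tokens.append(Token("IDENT", text[start:i]))
                state = "data"
        elif state == "number":
            if c != "" and c.isdigit():
                i += 1
            else:
                tokens.append(Token("NUMBER", text[start:i]))
                state = "data"


print([(t.kind, t.value) for t in tokenize("a:1")])
# [('IDENT', 'a'), ('DELIM', ':'), ('NUMBER', '1')]
```

Garbage such as "@#" simply yields two DELIM tokens rather than an
error, which is the property a grammar has trouble guaranteeing.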

I've finished v1.0 of the tokenizer, which is temporarily stored at
<http://dev.w3.org/csswg/css3-syntax/parsing.html>.  Next I'll start
on the tree-builder, which is the part that actually produces
stylesheets.

My primary reason for doing this was to eliminate the existing
undefined behavior in the syntax - the grammar is designed to be
pretty permissive and accept a lot of currently-invalid CSS, but it's
still easy to construct documents that don't match the grammar, and
the interpretation of these documents is currently undefined.  There's
no good reason for this to be undefined, except that it's hard to
define grammars that match every possible bytestream.

My secondary reason was to make error-handling clearer and
better-defined, and hopefully easier to extend.  Kenny Lu has raised
issues about error-handling being unclear in some cases.  As well,
every time we want to add something new to the Core Grammar, we have
to be *extremely* careful, because adding to a complex grammar is a
non-trivial process that can easily introduce errors by accident.  A
well-designed state machine can be much easier to extend.

I'm currently willfully violating the Core Grammar in one place -
numbers that start with a + or - are parsed as a single NUMBER token,
rather than a DELIM followed by a NUMBER.  The only consequence of
this is that you can't put a comment between the sign and the number
any longer.  At least one major browser already disallows this, and
it's a ridiculous thing to do anyway, so there shouldn't be any
compatibility concerns.
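A rough illustration of the signed-number change, again as a
hypothetical Python sketch rather than the draft's actual states: a +
or - immediately followed by a digit is consumed into the NUMBER token,
while a comment in between leaves the sign as a standalone DELIM.
Only integers, comments, whitespace and delimiters are modeled;
decimals, exponents and units are omitted, and letting an unterminated
comment run to end of input is an assumption of the sketch.

```python
def tokenize(text: str) -> list[tuple[str, str]]:
    """Hypothetical sketch: only NUMBER (integers), WS and DELIM tokens."""
    tokens: list[tuple[str, str]] = []
    i, n = 0, len(text)
    while i < n:
        c = text[i]
        if text.startswith("/*", i):
            # Comments produce no token; an unterminated one runs to EOF.
            end = text.find("*/", i + 2)
            i = n if end == -1 else end + 2
        elif c.isdigit() or (c in "+-" and text[i + 1:i + 2].isdigit()):
            # A sign directly followed by a digit becomes part of one NUMBER.
            j = i + 1
            while j < n and text[j].isdigit():
                j += 1
            tokens.append(("NUMBER", text[i:j]))
            i = j
        elif c.isspace():
            # Collapse a run of whitespace into a single WS token.
            j = i + 1
            while j < n and text[j].isspace():
                j += 1
            tokens.append(("WS", text[i:j]))
            i = j
        else:
            # A lone sign (e.g. one followed by a comment) is just a DELIM.
            tokens.append(("DELIM", c))
            i += 1
    return tokens


print(tokenize("-5"))      # [('NUMBER', '-5')]
print(tokenize("-/**/5"))  # [('DELIM', '-'), ('NUMBER', '5')]
```

Under the Core Grammar, "-/**/5" could still be read as a signed
number, since comments may appear between tokens; here it becomes a
DELIM followed by an unsigned NUMBER, which is exactly the one case
that stops working.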

Are there any standing concerns with the tokenizing stage that I might
be able to answer or resolve?

Otherwise, I'll forge ahead and start writing the tree-builder
tomorrow.  Once I finish both of these, I'll move them into the Syntax
spec and start updating/maintaining that.  This is necessary work,
since Syntax is *supposed* to be one of our bedrock specs.

~TJ

Received on Thursday, 12 April 2012 00:33:59 UTC