Re: Error Parsing from David Woolley on 2004-10-23 (www-style@w3.org from October 2004)

From: David Woolley <david@djwhome.demon.co.uk>
Date: Sat, 23 Oct 2004 16:14:49 +0100 (BST)
To: www-style@w3.org
Message-Id: <200410231514.i9NFEor00268@djwhome.demon.co.uk>

> Since tokenization is normatively defined [1] and the first step in
> parsing the CSS, there is not much to do at higher levels except for

OK, I see what you mean.  Normal table-driven lexical analyzers would
detect a comment-start token, then use ad hoc code to eat the rest
of the comment.
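
A minimal sketch of that split, assuming a hypothetical hand-written scanner in which the "table" is reduced to an if/elif dispatch on the next characters. Once the dispatcher sees `/*`, a separate routine eats everything up to the first `*/`. The token names `S` and `DELIM` are borrowed loosely from CSS. Labelling an unterminated comment `BAD_COMMENT` and running it to end of input is an illustrative policy I chose, not anything the spec says.

```python
from typing import List, Tuple


def eat_comment(text: str, pos: int) -> Tuple[str, int]:
    """Ad hoc routine called once the dispatcher has seen "/*" at pos.

    Returns (kind, end), where end is the index just past the token.
    """
    close = text.find("*/", pos + 2)  # search only after the opener
    if close == -1:
        # No terminator: this sketch swallows the rest of the input and
        # labels it BAD_COMMENT (a hypothetical policy, not the spec's).
        return "BAD_COMMENT", len(text)
    return "COMMENT", close + 2


def tokenize(text: str) -> List[Tuple[str, str]]:
    """Tiny scanner: comments, runs of whitespace, and one-char DELIMs."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text.startswith("/*", pos):
            # Comment-start detected by the dispatcher; hand off to ad hoc code.
            kind, end = eat_comment(text, pos)
        elif text[pos].isspace():
            end = pos
            while end < len(text) and text[end].isspace():
                end += 1
            kind = "S"
        else:
            kind, end = "DELIM", pos + 1
        tokens.append((kind, text[pos:end]))
        pos = end
    return tokens
```

For example, `tokenize("a /* x */b")` yields a `COMMENT` token for `/* x */` between the surrounding tokens, while `tokenize("a /* x")` ends with a single `BAD_COMMENT` token.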

If one really wants to fully specify using regular expressions, I think
you need to introduce a broken comment token.  I'm not sure if you
could specify that in egrep-type syntax.  In any case, the real problem
is that this currently needs indefinite lookahead, whereas most
of the other constructs can be resolved with one character of lookahead:
e.g. "url" is the start of an identifier, but also the start of "url(",
and you only need to look at one more character to decide which way to
go.
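
To make the one-character decision concrete, here is a minimal sketch assuming a simplified name rule (letters, digits, `-` and `_`; escapes ignored) and hypothetical token kinds `IDENT` and `URI_START`. `URI_START` stands only for the `url(` prefix; scanning the rest of the URI, and the separate FUNCTION token for other `name(` forms, are left out.

```python
from typing import Tuple


def is_name_char(ch: str) -> bool:
    # Rough approximation of CSS name characters; escapes are not handled.
    return ch.isalnum() or ch in "-_"


def scan_ident_or_url(text: str, pos: int) -> Tuple[str, int]:
    """Scan a name starting at pos and decide between IDENT and URI_START.

    Returns (kind, end), where end is the index just past the token.
    """
    end = pos
    while end < len(text) and is_name_char(text[end]):
        end += 1
    # Exactly one character past the name decides which way to go.
    # text[end:end + 1] is "" at end of input, so no bounds check is needed.
    if text[pos:end] == "url" and text[end:end + 1] == "(":
        return "URI_START", end + 1
    return "IDENT", end
```

The scanner never has to back up: after `url` it inspects a single character and either consumes it as part of `url(` or stops, leaving it for the next token.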

> applying the Rules for handling parsing errors.
 
> replies), it's solely the specification that implementors can consult.

Surely there have to be two reference implementations for the specification
to be at the stage that it is at!

In practice, though, no future standard is going to overload /*, so
it won't really matter how invalid comments are handled.  (A future
standard may well start putting directives in comments, if past precedent
is anything to go by!)

> And as such, it should faithfully describe the expected behaviour and
> leave no room for interpretation.

Whilst it is probably the case here that the ambiguity requires resolving,
one doesn't want to define the expected behaviour beyond the point where
future extensions are protected.  Doing so would constrain implementation
details; for example, the current lexical rules, taken literally, would
force an indefinitely large backtrack buffer to exist (or seeking/refetching
of the input).
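
The backtracking cost can be illustrated with a maximal-munch scanner. This is a sketch under stated assumptions: the comment recognizer below is a small hand-written automaton intended to accept the same strings as a pattern of the shape `\/\*[^*]*\*+([^/*][^*]*\*+)*\/`, assumed here to mirror the `COMMENT` token in the CSS 2.1 tokenization table. When no comment matches, the hypothetical fallback emits a one-character `DELIM`. Because a comment only succeeds at a closing `*/`, an unterminated comment forces the scanner to read to end of input before it can give up. The optional `BAD_COMMENT` token (also hypothetical) lets it commit to what it has read instead of rescanning.

```python
from typing import Optional, Tuple


def comment_scan_extent(text: str, pos: int) -> Tuple[Optional[int], int]:
    """Run the comment recognizer from text[pos].

    Returns (end, furthest): end is the index just past a complete
    "/* ... */" comment, or None if there is none; furthest is how far
    into the input the recognizer had to read before deciding.
    """
    if not text.startswith("/*", pos):
        return None, pos
    i = pos + 2
    prev_star = False  # was the previous character inside the comment a '*'?
    while i < len(text):
        if prev_star and text[i] == "/":
            return i + 1, i + 1  # the first "*/" closes the comment
        prev_star = text[i] == "*"
        i += 1
    return None, len(text)  # ran off the end: no closing "*/"


def next_token(text: str, pos: int, use_bad_comment: bool) -> Tuple[str, int, int]:
    """Return (kind, end, furthest) for the token starting at pos.

    Only comments and the one-character fallback are modelled here.
    """
    end, furthest = comment_scan_extent(text, pos)
    if end is not None:
        return "COMMENT", end, furthest
    if use_bad_comment and text.startswith("/*", pos):
        # Hypothetical BAD_COMMENT token: commit to everything read so far.
        return "BAD_COMMENT", len(text), furthest
    # Fall back to a one-character DELIM (e.g. the '/').  Everything read
    # past pos + 1 had to be buffered and must now be rescanned.
    return "DELIM", pos + 1, furthest
```

With `text = "/* " + "p { color: red }" * 1000`, `next_token(text, 0, False)` returns `("DELIM", 1, len(text))`: the whole input was read just to produce one character of output, and the rest must be rescanned. With `use_bad_comment=True` the same read yields a single `BAD_COMMENT` token and nothing has to be rescanned.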

> 
> I'm really not raising these issues to annoy people on this list, but in
> an honest effort to identify possibly insufficiently specified corners or
> unexpected consequences of parts of the specification and have them

But I think you attacked the wrong part; the problem, as I see it, is
with the lexical rules.

Received on Saturday, 23 October 2004 17:50:00 UTC