- From: David Woolley <david@djwhome.demon.co.uk>
- Date: Sat, 23 Oct 2004 16:14:49 +0100 (BST)
- To: www-style@w3.org
> Since tokenization is normatively defined [1] and the first step in
> parsing the CSS, there is not much to do at higher levels except for

OK I see what you mean. Normal table driven lexical analyzers would
detect a comment start token then use ad hoc code to eat the rest of
the comment. If one really wants to fully specify using regular
expressions, I think you need to introduce a broken comment token.
I'm not sure if you could specify that in egrep type syntax.

In any case, the real problem is that this needs indefinite lookahead
at the moment, whereas most of the other constructs can be resolved
with one character of lookahead, e.g. u r l is the start of an
identifier, but also the start of u r l ( but you only need to look at
one more character to decide which way to go.

> applying the Rules for handling parsing errors.

> replies), it's solely the specification that implementors can consult.

Surely there have to be two reference implementations for the
specification to be at the stage that it is at!

In practice, though, no future standard is going to overload /*, so it
won't really matter how invalid comments are handled. (It may start
putting directives in comments, if past precedent is anything to go by!)

> And as such, it should faithfully describe the expected behaviour and
> leave no room for interpretation.

Whilst it is probably the case here that the ambiguity requires
resolving, one doesn't want to define the expected behaviour beyond the
point where future extensions are protected. Doing so would constrain
implementation details, e.g. as the current lexical rules, taken
literally, would force an indefinitely large backtrack buffer to exist
(or seeking/refetching).
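
To make the comment-eating point concrete, here is a minimal sketch of
the ad hoc approach: once the "/*" start has been seen, scan forward
for "*/" and report an unterminated comment explicitly rather than
backtracking. The function name eat_comment and its return shape are
hypothetical, chosen only for illustration; nothing here is taken from
the specification or from any particular implementation.

```python
from typing import Tuple


def eat_comment(text: str, start: int) -> Tuple[int, bool]:
    """Consume a comment that begins at text[start].

    Returns (index just past the comment, terminated flag).
    Hypothetical helper for illustration only.
    """
    if text[start:start + 2] != "/*":
        raise ValueError("not at a comment start")
    i = start + 2
    while i < len(text):
        # Only the current and the next character are ever inspected.
        if text[i] == "*" and i + 1 < len(text) and text[i + 1] == "/":
            return i + 2, True
        i += 1
    # End of input reached: this is the "broken comment" case. It is
    # reported to the caller, which decides how to recover.
    return len(text), False
```

Because the scan never re-reads earlier input, no backtrack buffer is
needed; what the caller does with the unterminated case is left open
here on purpose.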
>
> I'm really not raising these issues to annoy people on this list, but in
> an honest effort to identify possibly insufficiently specified corners or
> unexpected consequences of parts of the specification and have them

But I think you attacked the wrong part; the problem, as I see it, is
with the lexical rules.

Received on Saturday, 23 October 2004 17:50:00 UTC