Re: [css-syntax][selectors] Added a COLUMN token to Syntax from Zack Weinberg on 2013-04-05 (www-style@w3.org from April 2013)

From: Zack Weinberg <zackw@panix.com>
Date: Fri, 5 Apr 2013 09:27:33 -0400
To: Simon Sapin <simon.sapin@exyr.org>
Cc: "Kang-Hao (Kenny) Lu" <kanghaol@oupeng.com>, "Tab Atkins Jr." <jackalmage@gmail.com>, www-style list <www-style@w3.org>
Message-ID: <CAKCAbMhnZho8vBYy71UhugsvS6vRo=UO+-3L0LheCaou4LkfqQ@mail.gmail.com>

On Fri, Apr 5, 2013 at 3:24 AM, Simon Sapin <simon.sapin@exyr.org> wrote:
> I don’t see much harm either, but the underlying point is about similar
> differences that wouldn’t be detectable. For example, what if my parser has
> separate INTEGER and NUMBER tokens rather than having a type flag on NUMBER
> tokens? What if it represents percentages as DIMENSION tokens with '%' for
> the unit, rather than as a separate token?
>
> As long as tokens/component values are not exposed to the platform, these
> are only implementation details. I do believe that exposing them
> eventually (maybe only on variables) is the way to go to enable CSS
> polyfills, but that would effectively freeze the tokenizer.
>
> (This is also a concern for me, as author of a parsing library where CSS
> tokens are part of the public API.)

Gecko does have a few such divergences.  For instance, CDO and CDC are
merged.  I've also occasionally thought about giving all
syntactically-meaningful DELIMs their own token codes, or even perhaps
doing the same for all syntactically-meaningful identifiers.  It might
well make the parser go faster.  I don't see any huge difficulty in
hiding these internal divergences from a public API that exposed
tokens -- you just need a reverse mapping.  Of course it's nice to not
have to do that.
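
To make the "reverse mapping" idea concrete, here is a rough sketch in
Python.  The internal token codes and the to_public() helper are
hypothetical names made up for illustration, not Gecko's actual ones,
and the public token shapes are only an assumption about what an
exposed API might look like.

```python
from enum import Enum, auto


class Internal(Enum):
    # Hypothetical internal token codes (assumed for illustration only).
    HTML_COMMENT = auto()  # CDO and CDC merged into one code
    INTEGER = auto()       # kept separate from NUMBER internally
    NUMBER = auto()
    DIMENSION = auto()     # also used for percentages, with unit "%"


def to_public(kind, text, unit=None):
    """Map an internal token to an assumed public (type, detail) pair."""
    if kind is Internal.HTML_COMMENT:
        # The merged code is split again by looking at the token text.
        return ("CDO", None) if text == "<!--" else ("CDC", None)
    if kind is Internal.INTEGER:
        return ("NUMBER", "integer")
    if kind is Internal.NUMBER:
        return ("NUMBER", "number")
    if kind is Internal.DIMENSION:
        # A "%" unit is reported as a separate PERCENTAGE token.
        return ("PERCENTAGE", None) if unit == "%" else ("DIMENSION", unit)
    raise ValueError(f"unknown internal token kind: {kind!r}")
```

With this, to_public(Internal.DIMENSION, "50%", unit="%") reports a
PERCENTAGE token even though the parser never had one internally.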

I tend to think that the tokenizer should be considered mostly frozen,
but I don't see any harm in adding new "punctuators" (to borrow a term
from C) as necessary.  An alternative (a la Smalltalk) would be to
declare that any two-character sequence of DELIM characters -- that
is, ASCII punctuation excluding ,;:()[]{} -- is a single token.  That
would be future-proof, but we'd have to audit the existing grammar
carefully to make sure it doesn't do anything it shouldn't.
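
A minimal sketch of the two-character rule, in Python, might look like
the following.  It is only an illustration under simplifying
assumptions: pair_delims() and the PUNCTUATOR/DELIM/OTHER labels are
hypothetical, pairing is greedy from left to right, and a real
tokenizer would already have consumed strings, hashes, at-keywords,
identifiers and so on before any character reached this step.

```python
import string

# DELIM characters as described above: ASCII punctuation excluding ,;:()[]{}
# (Assumption: characters such as quotes, '#', '@', '-' and '_' are treated
# as plain DELIMs here, which a real tokenizer would not do in all contexts.)
STRUCTURAL = set(",;:()[]{}")
DELIM_CHARS = set(string.punctuation) - STRUCTURAL


def pair_delims(chars):
    """Greedily merge adjacent DELIM characters into two-character tokens."""
    tokens = []
    i = 0
    while i < len(chars):
        c = chars[i]
        nxt = chars[i + 1] if i + 1 < len(chars) else None
        if c in DELIM_CHARS and nxt is not None and nxt in DELIM_CHARS:
            tokens.append(("PUNCTUATOR", c + nxt))
            i += 2
        elif c in DELIM_CHARS:
            tokens.append(("DELIM", c))
            i += 1
        else:
            tokens.append(("OTHER", c))
            i += 1
    return tokens
```

Under these assumptions "a||b" yields a single "||" token between the
two letters, but "*|*" comes out as "*|" followed by "*", which is the
sort of existing construct the audit would have to look at.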

zw

Received on Friday, 5 April 2013 13:27:58 UTC