
Re: [css3-syntax] Preference for parser speccing?

From: Tab Atkins Jr. <jackalmage@gmail.com>
Date: Mon, 28 May 2012 08:08:47 -0700
Message-ID: <CAAWBYDCRYbZH1i71aeLQAkVS+F9e7NzcOBNSP8QYBm=f49wReg@mail.gmail.com>
To: Simon Sapin <simon.sapin@kozea.fr>
Cc: www-style@w3.org
On Sun, May 27, 2012 at 11:21 AM, Simon Sapin <simon.sapin@kozea.fr> wrote:
> On 26/05/2012 00:23, Tab Atkins Jr. wrote:
>> So the question is simply, as someone implementing or maintaining a
>> parser, which style is more useful to read?
> As an implementer for WeasyPrint and tinycss, I very much prefer to cleanly
> separate the various steps. Dividing a complex problem into smaller problems
> makes it easier to think about.
> This means that the tokenizer and parser only communicate through a
> well-defined API, and that API is as small as possible. In this case, the
> tokenizer turns a flat sequence of Unicode codepoints into a flat sequence
> of tokens. The parser turns these tokens into some higher-level data
> structure. The tokenizer does not know anything about the parser. (Turning
> bytes into codepoints is yet another step, which I keep separate from the
> tokenizer.)
> This does *not* mean that the tokenizer has to be finished and all the
> tokens in memory before the parser can start. There can be some kind of
> iterator where tokens are generated on demand. But this is only an
> implementation detail.
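
A minimal sketch of the on-demand pipeline described above, in Python.
The token set, the Token type, and the names tokenize() and
parse_declarations() are assumptions made for illustration; they are
not taken from tinycss or from any spec, and real CSS tokenization has
many more cases. The tokenizer is a generator that knows nothing about
the parser, and the parser pulls tokens only as it needs them.

```python
import re
from typing import Iterator, List, NamedTuple, Tuple


class Token(NamedTuple):
    type: str   # "S", "IDENT", "NUMBER" or "DELIM" (toy token set)
    value: str  # the source text of the token


# Toy grammar for illustration only; each alternative is one named group,
# so match.lastgroup gives the token type.
_TOKEN_RE = re.compile(
    r"(?P<S>\s+)"
    r"|(?P<IDENT>[A-Za-z_-][A-Za-z0-9_-]*)"
    r"|(?P<NUMBER>[0-9]+)"
    r"|(?P<DELIM>.)",
    re.DOTALL,
)


def tokenize(css: str) -> Iterator[Token]:
    """Turn a flat string of code points into a flat stream of tokens.

    This is a generator: no token is produced until the consumer asks.
    """
    for match in _TOKEN_RE.finditer(css):
        yield Token(match.lastgroup, match.group())


def parse_declarations(tokens: Iterator[Token]) -> List[Tuple[str, List[str]]]:
    """Consume tokens on demand and build (name, value-parts) pairs."""
    declarations = []
    name, value, seen_colon = None, [], False
    for tok in tokens:
        if tok.type == "S":
            continue  # whitespace is not significant in this toy parser
        if not seen_colon:
            if tok.type == "IDENT" and name is None:
                name = tok.value
            elif tok == Token("DELIM", ":") and name is not None:
                seen_colon = True
            # anything else is silently skipped in this sketch
        elif tok == Token("DELIM", ";"):
            declarations.append((name, value))
            name, value, seen_colon = None, [], False
        else:
            value.append(tok.value)
    if seen_colon:
        declarations.append((name, value))  # last declaration without ";"
    return declarations


result = parse_declarations(tokenize("color: red; margin: 0"))
```

The only interface between the two steps is the iterator of Token
values, so either side could be replaced without touching the other.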


> This leaves the problem of :nth-*(). I can’t find the reference, but I
> remember reading a suggestion on this list: the tokens between '(' and ')'
> could be serialized back to a Unicode string, and tokenized again by a
> different tokenizer.

Yes, that's one way to do it.  You can reconstruct the text of the
an+b from the tokens well enough to do this.  The only details
you'll lose are comments and exactly what sort of whitespace is used.
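
A rough sketch of that re-serialization approach, in Python. It assumes
each token between '(' and ')' is available as its source text (comments
already dropped, whitespace possibly normalized); serialize() and
parse_anb() are hypothetical names, and the regular expression is a
simplified reading of the an+b grammar, not a complete implementation.

```python
import re
from typing import Iterable, Optional, Tuple

# Simplified an+b grammar: "an", "an+b", a bare integer, or odd/even.
_ANB_RE = re.compile(
    r"^\s*(?:"
    r"(?P<a>[+-]?\d*)n(?:\s*(?P<sign>[+-])\s*(?P<b>\d+))?"
    r"|(?P<int>[+-]?\d+)"
    r"|(?P<kw>odd|even)"
    r")\s*$",
    re.IGNORECASE,
)


def serialize(token_texts: Iterable[str]) -> str:
    """Join the tokens' source text back into one string."""
    return "".join(token_texts)


def parse_anb(text: str) -> Optional[Tuple[int, int]]:
    """Return (a, b) for a valid an+b string, or None if it is invalid."""
    match = _ANB_RE.match(text)
    if match is None:
        return None
    if match.group("kw"):
        return (2, 1) if match.group("kw").lower() == "odd" else (2, 0)
    if match.group("int") is not None:
        return (0, int(match.group("int")))
    a_text = match.group("a")
    # "n", "+n" and "-n" mean a coefficient of 1, 1 and -1 respectively.
    a = int(a_text + "1") if a_text in ("", "+", "-") else int(a_text)
    b = int(match.group("sign") + match.group("b")) if match.group("b") else 0
    return (a, b)


# The main tokenizer might split "2n+1" into a dimension and a number.
anb = parse_anb(serialize(["2n", "+1"]))
```

Because the second pass only sees the rejoined string, it does not
matter how the main tokenizer happened to split the argument.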

Received on Monday, 28 May 2012 15:09:39 UTC
