
Re: [css3-syntax] Preference for parser speccing?

From: Simon Sapin <simon.sapin@kozea.fr>
Date: Sun, 27 May 2012 20:21:11 +0200
Message-ID: <4FC27097.8060308@kozea.fr>
To: www-style@w3.org
On 26/05/2012 00:23, Tab Atkins Jr. wrote:
> So the question is simply, as someone implementing or maintaining a
> parser, which style is more useful to read?

As an implementer of WeasyPrint and tinycss, I very much prefer to
cleanly separate the various steps. Dividing a complex problem into
smaller problems makes it easier to think about.

This means that the tokenizer and parser only communicate through a 
well-defined API, and that API is as small as possible. In this case, 
the tokenizer turns a flat sequence of Unicode codepoints into a flat 
sequence of tokens. The parser turns these tokens into some higher-level 
data structure. The tokenizer does not know anything about the parser. 
(Turning bytes into codepoints is yet another step, which I keep
separate from the tokenizer.)
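
To make the separation concrete, here is a minimal sketch of the three
steps as independent functions. It is not the tinycss API: the names
decode, tokenize, parse_declaration and Token are made up for this
example, and the token grammar is a toy that is far smaller than real
CSS.

```python
import re
from typing import Iterable, Iterator, List, NamedTuple, Tuple


class Token(NamedTuple):
    """Hypothetical token shape: a type name and the matched source text."""
    type: str
    value: str


# Toy token grammar for illustration only; real CSS has many more token types.
_TOKEN_RE = re.compile(r"""
    (?P<S>\s+)
  | (?P<IDENT>[A-Za-z_-][A-Za-z0-9_-]*)
  | (?P<NUMBER>[0-9]+)
  | (?P<DELIM>.)
""", re.VERBOSE | re.DOTALL)


def decode(data: bytes, encoding: str = "utf-8") -> str:
    """Step 1: bytes to codepoints. Knows nothing about tokens."""
    return data.decode(encoding)


def tokenize(text: str) -> Iterator[Token]:
    """Step 2: codepoints to a flat sequence of tokens.

    Knows nothing about the parser that will consume them.
    """
    for match in _TOKEN_RE.finditer(text):
        yield Token(match.lastgroup, match.group())


def parse_declaration(tokens: Iterable[Token]) -> Tuple[str, List[Token]]:
    """Step 3: tokens to a higher-level structure, here (name, value tokens)."""
    significant = [t for t in tokens if t.type != "S"]
    if len(significant) < 2:
        raise ValueError("expected 'name: value'")
    name, colon, *value = significant
    if name.type != "IDENT" or colon != Token("DELIM", ":"):
        raise ValueError("expected 'name: value'")
    return name.value, value


name, value = parse_declaration(tokenize(decode(b"margin: 10 auto")))
```

The only thing the three functions share is the Token type, which is
the whole API between tokenizer and parser in this sketch.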

This does *not* mean that the tokenizer has to be finished and all the 
tokens in memory before the parser can start. There can be some kind of 
iterator where tokens are generated on demand. But this is only an 
implementation detail.
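
As a rough illustration of tokens generated on demand, the sketch
below uses a Python generator as the iterator. The word-splitting
tokenizer and the first_two "parser" are stand-ins invented for this
example; the log list exists only to show how much work the tokenizer
has actually done.

```python
from typing import Iterator, List


def tokenize(text: str, log: List[str]) -> Iterator[str]:
    """Yield whitespace-separated words one at a time.

    Each word is appended to `log` at the moment it is produced, so the
    caller can observe how far the tokenizer has advanced.
    """
    word = ""
    for char in text:
        if char.isspace():
            if word:
                log.append(word)
                yield word
                word = ""
        else:
            word += char
    if word:
        log.append(word)
        yield word


def first_two(tokens: Iterator[str]) -> List[str]:
    """A stand-in parser that only needs the first two tokens."""
    return [next(tokens), next(tokens)]


produced: List[str] = []
result = first_two(tokenize("a b c d", produced))
# Both `result` and `produced` are now ["a", "b"]: the tokenizer never
# looked at "c" or "d" because the parser stopped asking.
```

The parser's code would be the same if it were handed a fully built
list wrapped in iter(), which is why the laziness stays an
implementation detail.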

This leaves the problem of :nth-*(). I can’t find the reference, but I 
remember reading a suggestion on this list: the tokens between '(' and 
')' could be serialized back to a Unicode string and tokenized again
by a different tokenizer.
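
A sketch of how that suggestion could look, under several assumptions:
tokens are plain (type, text) pairs that keep their original source
text, and the second "tokenizer" is a single regular expression for a
simplified an+b form. Neither is taken from a spec or from tinycss.
The example input assumes the generic tokenizer produced one DIMENSION
token for "2n-1".

```python
import re
from typing import Iterable, Tuple

Token = Tuple[str, str]  # (type, source text); a hypothetical token shape

# Simplified an+b grammar, used only after re-serializing the argument.
_AN_B_RE = re.compile(r"""
    ^\s*
    (?:
        (?P<keyword>odd|even)
      | (?P<a>[+-]?\d*)n \s* (?: (?P<sign>[+-]) \s* (?P<b>\d+) )?
      | (?P<int>[+-]?\d+)
    )
    \s*$
""", re.VERBOSE | re.IGNORECASE)


def serialize(tokens: Iterable[Token]) -> str:
    """Turn the tokens found between '(' and ')' back into a string."""
    return "".join(text for _type, text in tokens)


def parse_an_plus_b(text: str) -> Tuple[int, int]:
    """Parse the serialized argument and return the pair (a, b)."""
    match = _AN_B_RE.match(text)
    if match is None:
        raise ValueError("invalid an+b expression: %r" % text)
    keyword = match.group("keyword")
    if keyword:
        return (2, 1) if keyword.lower() == "odd" else (2, 0)
    if match.group("int") is not None:
        return 0, int(match.group("int"))
    a_text = match.group("a")
    # A bare "n", "+n" or "-n" means a coefficient of 1 or -1.
    a = int(a_text + "1") if a_text in ("", "+", "-") else int(a_text)
    b = int(match.group("sign") + match.group("b")) if match.group("b") else 0
    return a, b


# The generic tokenizer saw one DIMENSION token; the second pass sees a=2, b=-1.
a, b = parse_an_plus_b(serialize([("DIMENSION", "2n-1")]))
```

The main tokenizer stays unaware of :nth-*(), and the special case
lives entirely in the code that handles that one function.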

-- 
Simon Sapin
Received on Sunday, 27 May 2012 18:21:40 GMT
