- From: Simon Sapin <simon.sapin@kozea.fr>
- Date: Sun, 27 May 2012 20:21:11 +0200
- To: www-style@w3.org
On 26/05/2012 00:23, Tab Atkins Jr. wrote:
> So the question is simply, as someone implementing or maintaining a
> parser, which style is more useful to read?

As an implementer of WeasyPrint and tinycss, I very much prefer to cleanly separate the various steps. Dividing a complex problem into smaller problems makes it easier to think about.

This means that the tokenizer and parser only communicate through a well-defined API, and that API is as small as possible. In this case, the tokenizer turns a flat sequence of Unicode codepoints into a flat sequence of tokens. The parser turns these tokens into some higher-level data structure. The tokenizer does not know anything about the parser.

(Turning bytes into codepoints is yet another step, which I keep separate from the tokenizer.)

This does *not* mean that the tokenizer has to be finished and all the tokens in memory before the parser can start. There can be some kind of iterator where tokens are generated on demand. But this is only an implementation detail.

This leaves the problem of :nth-*(). I can’t find the reference, but I remember reading a suggestion on this list: the tokens between '(' and ')' could be serialized back to a Unicode string, and tokenized again by a different tokenizer.

-- 
Simon Sapin
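
To make the separation above concrete, here is a minimal Python sketch. It is an illustration only, not the tinycss code: the token grammar is a drastically simplified assumption (real CSS tokenization has many more token types), and the names Token, tokenize, parse_function_args and tokenize_nth are all hypothetical. It shows a tokenizer that yields tokens on demand, a parser that only sees the token iterator, and function arguments serialized back to a string and fed to a second tokenizer.

```python
import re
from typing import Iterable, Iterator, List, NamedTuple, Tuple


class Token(NamedTuple):
    type: str
    value: str


# Hypothetical, greatly simplified token grammar (NOT the real CSS one).
_TOKEN_RE = re.compile(
    r"(?P<S>\s+)"
    r"|(?P<FUNCTION>-?[A-Za-z_][\w-]*\()"
    r"|(?P<IDENT>-?[A-Za-z_][\w-]*)"
    r"|(?P<NUMBER>[0-9]+)"
    r"|(?P<DELIM>.)",
    re.DOTALL,
)

# A second, different grammar used only for the inside of :nth-*().
_NTH_TOKEN_RE = re.compile(
    r"(?P<S>\s+)"
    r"|(?P<INTEGER>[0-9]+)"
    r"|(?P<N>[nN])"
    r"|(?P<SIGN>[+-])"
    r"|(?P<OTHER>.)",
    re.DOTALL,
)


def _scan(pattern: "re.Pattern[str]", text: str) -> Iterator[Token]:
    # Generator: each token is produced only when the consumer asks for it.
    for match in pattern.finditer(text):
        yield Token(match.lastgroup, match.group())


def tokenize(css: str) -> Iterator[Token]:
    """Codepoints in, flat token stream out. Knows nothing about the parser."""
    return _scan(_TOKEN_RE, css)


def tokenize_nth(text: str) -> Iterator[Token]:
    """The 'different tokenizer' for serialized function arguments."""
    return _scan(_NTH_TOKEN_RE, text)


def parse_function_args(tokens: Iterable[Token]) -> List[Tuple[str, str]]:
    """Parser side: consume tokens and, for each function token, serialize
    the tokens up to the closing ')' back to a string.
    Nested parentheses are not handled in this sketch."""
    tokens = iter(tokens)  # both loops below share one iterator
    functions = []
    for token in tokens:
        if token.type == "FUNCTION":
            parts = []
            for inner in tokens:
                if inner == Token("DELIM", ")"):
                    break
                parts.append(inner.value)
            # Drop the trailing '(' from the function token's text.
            functions.append((token.value[:-1], "".join(parts)))
    return functions


if __name__ == "__main__":
    for name, args in parse_function_args(tokenize("li:nth-child(2n + 1) a")):
        print(name, [t for t in tokenize_nth(args) if t.type != "S"])
```

The parser never touches the input string, only the iterator, so swapping the generator for a list of tokens already in memory would change nothing on the parser side.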
Received on Sunday, 27 May 2012 18:21:40 UTC