- From: Kang-Hao (Kenny) Lu <kennyluck@csail.mit.edu>
- Date: Fri, 01 Jun 2012 03:43:24 +0800
- To: "Tab Atkins Jr." <jackalmage@gmail.com>
- CC: WWW Style <www-style@w3.org>
(12/05/26 6:08), Tab Atkins Jr. wrote:
> This is just a general preference question for implementors.
>
> When I pick up Syntax again, would it be better for me to write the
> parsing section as if it the tokenizing was already completely done,
> or interleaved with the tokenizing like the HTML parser is? What
> would be more useful? I can do either, but I'd rather not have to
> switch partway through, or even after I'm totally done.

Mind sharing an example about the choices here? Or is this a general
survey about how we should write the parser section?

(12/05/26 6:23), Tab Atkins Jr. wrote:
> HTML, for example, is defined with the tokenizing interleaved with the
> parsing. This is necessary for HTML, because some tags change the
> tokenizing rules - if you see a <script>, you stop parsing like HTML
> and instead just parse everything as text until you see the </script>,

Note that when you say "interleaved" here, the tokenizing section of
the HTML spec uses phrases like "Emit the XXX token" instead of "call
parser routine YYY" so it's still quite clean to me (though whether
that will make people think that the parser wouldn't change the state
of the tokenizer is another issue).

It is true that the HTML parser changes the state of the tokenizer
from time to time during tree construction, but given that the CSS
parser mostly (except :nth-*) doesn't change the state of the
tokenizer, I can't quite imagine how you could write this in an
interleaved way without a concrete example...

Cheers,
Kenny
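A toy sketch of the distinction discussed above, not taken from either the HTML or the CSS spec. The `Tokenizer` class, its `data`/`rawtext` states and the helper functions are all hypothetical. The tokenizer only ever emits tokens. When the consumer never touches its state, tokenizing everything up front and pulling tokens one at a time yield the same stream. When the consumer flips the state after a `<script>` start tag, the way HTML tree construction does, the two approaches diverge.

```python
from typing import List, Optional, Tuple

Token = Tuple[str, str]  # (kind, value); a hypothetical, simplified token shape


class Tokenizer:
    """Toy tokenizer whose only parser-visible knob is `state`."""

    def __init__(self, text: str):
        self.text = text
        self.pos = 0
        self.state = "data"  # "data" (tag-aware) or "rawtext" (text until </script>)

    def next_token(self) -> Optional[Token]:
        text, pos = self.text, self.pos
        if pos >= len(text):
            return None
        if self.state == "rawtext":
            # Everything is plain text until the literal end tag.
            if text.startswith("</script>", pos):
                self.state = "data"
                self.pos = pos + len("</script>")
                return ("end", "script")
            self.pos = pos + 1
            return ("char", text[pos])
        if text[pos] == "<":
            end = text.find(">", pos)
            if end != -1:
                tag = text[pos + 1:end]
                self.pos = end + 1
                return ("end", tag[1:]) if tag.startswith("/") else ("start", tag)
        # Ordinary character (or a '<' with no closing '>').
        self.pos = pos + 1
        return ("char", text[pos])


def tokenize_all(text: str) -> List[Token]:
    """Tokenize completely before any parsing happens."""
    tok = Tokenizer(text)
    out: List[Token] = []
    while (t := tok.next_token()) is not None:
        out.append(t)
    return out


def tokens_seen_interleaved(text: str, parser_switches_state: bool) -> List[Token]:
    """Pull tokens one at a time; the parser may change tokenizer state."""
    tok = Tokenizer(text)
    out: List[Token] = []
    while (t := tok.next_token()) is not None:
        out.append(t)
        if parser_switches_state and t == ("start", "script"):
            tok.state = "rawtext"  # HTML-style feedback from parser to tokenizer
    return out
```

With input `"<script>a<b</script>"`, the interleaved run without feedback matches `tokenize_all` exactly, which is the CSS-like case. With feedback, the `<` inside the script is treated as text, which no up-front tokenization could have produced.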
Received on Thursday, 31 May 2012 19:44:11 UTC