- From: Tab Atkins Jr. <jackalmage@gmail.com>
- Date: Thu, 31 May 2012 13:28:46 -0700
- To: "Kang-Hao (Kenny) Lu" <kennyluck@csail.mit.edu>
- Cc: WWW Style <www-style@w3.org>
On Thu, May 31, 2012 at 12:43 PM, Kang-Hao (Kenny) Lu <kennyluck@csail.mit.edu> wrote:
> (12/05/26 6:08), Tab Atkins Jr. wrote:
>> This is just a general preference question for implementors.
>>
>> When I pick up Syntax again, would it be better for me to write the
>> parsing section as if the tokenizing was already completely done,
>> or interleaved with the tokenizing like the HTML parser is? What
>> would be more useful? I can do either, but I'd rather not have to
>> switch partway through, or even after I'm totally done.
>
> Mind sharing an example about the choices here? Or is this a general
> survey about how we should write the parser section?
>
> (12/05/26 6:23), Tab Atkins Jr. wrote:
>> HTML, for example, is defined with the tokenizing interleaved with the
>> parsing. This is necessary for HTML, because some tags change the
>> tokenizing rules - if you see a <script>, you stop parsing like HTML
>> and instead just parse everything as text until you see the </script>,
>
> Note that when you say "interleaved" here, the tokenizing section of the
> HTML spec uses phrases like "Emit the XXX token" instead of "call parser
> routine YYY" so it's still quite clean to me (though whether that will
> make people think that the parser wouldn't change the state of the
> tokenizer is another issue).

Note that HTML explicitly defines that the parser must be reentered
after every emitted token. It doesn't need any explicit callbacks,
because it's a general rule.

> It is true that the HTML parser changes the state of the tokenizer from
> time to time during tree construction, but given that the CSS parser
> mostly (except :nth-*) doesn't change the state of the tokenizer, I
> can't quite imagine how you could write this in an interleaved way
> without a concrete example...

Same as HTML, basically, where the parser explicitly says "hey,
tokenizer, give me one more token using the XXX tokenizing mode".

~TJ
Received on Thursday, 31 May 2012 20:29:35 UTC
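The pull model described above, where the parser asks the tokenizer for one more token in a named mode, can be sketched as follows. This is a minimal illustration under stated assumptions, not the CSS Syntax spec: the `Tokenizer` class, its `next_token(mode)` method, the mode names `"normal"` and `"an+b"`, and the simplified tokenizing rules are all hypothetical. It shows why `:nth-*` is the exception Kenny mentions: tokenized up front in normal mode, `2n-1` comes out as a single dimension token with unit `n-1` (as it does under the CSS 2.1 tokenizer), whereas a parser that pulls tokens can request a different mode for the argument.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Token:
    kind: str   # "ident", "number", "dimension", "delim", or "eof"
    value: str


def _is_name_char(ch: str) -> bool:
    # Simplified name characters: letters, digits, "-" and "_".
    return ch.isalnum() or ch in "-_"


class Tokenizer:
    """Hypothetical pull-based tokenizer: the parser requests one token at
    a time and names the tokenizing mode to use for that request."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.pos = 0

    def _peek(self, offset: int = 0) -> str:
        i = self.pos + offset
        return self.text[i] if i < len(self.text) else ""

    def _read_while(self, pred: Callable[[str], bool]) -> str:
        start = self.pos
        while self.pos < len(self.text) and pred(self.text[self.pos]):
            self.pos += 1
        return self.text[start:self.pos]

    def next_token(self, mode: str = "normal") -> Token:
        self._read_while(str.isspace)  # whitespace is simply skipped here
        c = self._peek()
        if c == "":
            return Token("eof", "")
        if mode == "an+b":
            # Hypothetical mode: a run of digits, a lone letter, or one
            # punctuation character, so "n", "-" and "1" stay separate.
            if c.isdigit():
                return Token("number", self._read_while(str.isdigit))
            self.pos += 1
            return Token("ident" if c.isalpha() else "delim", c)
        # "normal" mode: simplified CSS-like rules.
        if c.isdigit():
            digits = self._read_while(str.isdigit)
            if self._peek().isalpha():
                # "2n-1" becomes ONE dimension token whose unit is "n-1".
                return Token("dimension", digits + self._read_while(_is_name_char))
            return Token("number", digits)
        if c.isalpha() or (c == "-" and self._peek(1).isalpha()):
            return Token("ident", self._read_while(_is_name_char))
        self.pos += 1
        return Token("delim", c)


def tokenize_all(text: str) -> List[Token]:
    """Tokenize everything up front, always in normal mode."""
    tok, out = Tokenizer(text), []
    while (t := tok.next_token()).kind != "eof":
        out.append(t)
    return out


def parse_anb(tok: Tokenizer) -> Tuple[int, int]:
    """Parse 'An+B' up to the closing ')', pulling tokens in an+b mode."""
    a, b, sign, num = 0, 0, 1, None  # type: int, int, int, Optional[int]
    while True:
        t = tok.next_token("an+b")
        if t.kind == "delim" and t.value == ")":
            break
        if t.kind == "number":
            num = int(t.value)
        elif t.kind == "ident" and t.value == "n":
            # Coefficient of n: pending sign times pending number (default 1).
            a, sign, num = sign * (1 if num is None else num), 1, None
        elif t.kind == "delim" and t.value in ("+", "-"):
            sign = 1 if t.value == "+" else -1
        else:
            raise SyntaxError(f"unexpected {t}")
    if num is not None:
        b = sign * num
    return a, b


def parse_nth(text: str) -> Tuple[str, int, int]:
    """Parse ':name(An+B)', switching tokenizer modes mid-stream."""
    tok = Tokenizer(text)
    colon, name, paren = tok.next_token(), tok.next_token(), tok.next_token()
    if (colon.value, name.kind, paren.value) != (":", "ident", "("):
        raise SyntaxError("expected ':name('")
    # The parser, not the tokenizer, decides that the argument needs a different mode.
    a, b = parse_anb(tok)
    return name.value, a, b
```

For example, `parse_nth(":nth-child(2n-1)")` yields `("nth-child", 2, -1)`, while `tokenize_all(":nth-child(2n-1)")` contains the single token `Token("dimension", "2n-1")`. Because the mode travels with each request, the tokenizer holds no grammar context of its own; the parser owns that decision, which is roughly how HTML's tree construction switches the tokenizer's state after a `<script>` start tag.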