- From: Thomas Broyer <t.broyer@gmail.com>
- Date: Fri, 22 Dec 2006 08:38:48 +0100
2006/12/22, Ian Hickson:
> On Thu, 21 Dec 2006, Thomas Broyer wrote:
> >
> > Why is the DOCTYPE marked "in error" in the former case?
>
> Because otherwise this document:
>
>    <!DOCTYPEH
>
> ...would emit a DOCTYPE that is not in error (since the token would be
> emitted before the bit at the end of the DOCTYPE name state).

Doh! Right.

> > In other words, why would <!DOCTYPE html> be "in error" while
> > <!DOCTYPE Html> wouldn't?
>
> Both would be not in error, because of the sentence at the end of the
> DOCTYPE name state.

OK, now I understand (thank you, Simon, for enlightening me).

> On Thu, 21 Dec 2006, Thomas Broyer wrote:
> >
> > But it also has this note, which is quite confusing: "Because lowercase
> > letters in the name are uppercased by the algorithm above, the "HTML"
> > letters are actually case-insensitive relative to the markup."
>
> How is it confusing? I would clarify it, but I don't know what is
> confusing.

Maybe there's no need to clarify it; it might just have been me.

> > It remains that the tokenization stage is a bit confusing?
>
> Yes. The tree construction stage is even worse. Just implement it exactly
> as written with no interpretation and you should be fine. ;-)

My "problem" is that I'm not implementing an "emitting" parser (à la SAX) but a "pulling" parser: I stop as soon as I've found a token and return true to say "hey, I've changed the TokenType, Name, Value, etc. properties to reflect a new token".

...so I'm interpreting ;-)

Re tree construction, I'm about to implement it in two parts: in the "pull parser" when possible (handling omitted tags and misnested formatting elements), and in a "tree fixer" otherwise (moving the <meta> and <link> elements into <head>, etc.).

--
Thomas Broyer
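For illustration, the pull-style tokenizer described above might look roughly like the following. This is a minimal Python sketch, not Thomas's actual code: the class `PullTokenizer`, its `read()` method and the `token_type`/`name`/`value`/`in_error` fields are hypothetical, chosen only to mirror the TokenType, Name and Value properties mentioned in the message. It recognizes only DOCTYPEs and runs of other text, and it assumes the DOCTYPE rule as discussed in the thread: lowercase ASCII letters in the name are uppercased, and the token is flagged in error unless the resulting name is exactly `HTML`. Because the check runs once the name has been read, whatever ends the token, a truncated `<!DOCTYPEH` is still flagged.

```python
import re
from enum import Enum, auto
from typing import Optional

# ASCII-only case-insensitive match for the "<!DOCTYPE" keyword.
_DOCTYPE = re.compile(r"<!DOCTYPE", re.IGNORECASE | re.ASCII)
_WHITESPACE = "\t\n\f "  # assumed whitespace set for this sketch


class TokenType(Enum):
    NONE = auto()
    DOCTYPE = auto()
    CHARACTERS = auto()


class PullTokenizer:
    """Hypothetical pull tokenizer: read() advances to the next token and
    updates the public fields instead of emitting to a callback (SAX-style).
    Only DOCTYPEs and plain text runs are handled; not a full HTML tokenizer."""

    def __init__(self, text: str) -> None:
        self._text = text
        self._pos = 0
        self.token_type = TokenType.NONE
        self.name: Optional[str] = None
        self.value: Optional[str] = None
        self.in_error = False

    def read(self) -> bool:
        """Move to the next token; return False once the input is exhausted."""
        if self._pos >= len(self._text):
            self.token_type = TokenType.NONE
            return False
        if _DOCTYPE.match(self._text, self._pos):
            self._read_doctype()
        else:
            # Everything up to the next "<!DOCTYPE" (or EOF) is one text run.
            m = _DOCTYPE.search(self._text, self._pos + 1)
            end = m.start() if m else len(self._text)
            self.token_type = TokenType.CHARACTERS
            self.name = None
            self.value = self._text[self._pos:end]
            self.in_error = False
            self._pos = end
        return True

    def _read_doctype(self) -> None:
        i = self._pos + len("<!DOCTYPE")
        while i < len(self._text) and self._text[i] in _WHITESPACE:
            i += 1
        chars = []
        while i < len(self._text) and self._text[i] not in _WHITESPACE + ">":
            c = self._text[i]
            # Lowercase ASCII letters are uppercased, so "html" and "Html" both become "HTML".
            chars.append(c.upper() if "a" <= c <= "z" else c)
            i += 1
        close = self._text.find(">", i)
        self._pos = len(self._text) if close == -1 else close + 1
        self.token_type = TokenType.DOCTYPE
        self.name = "".join(chars)
        self.value = None
        # Checked after the whole name is read, so EOF-terminated DOCTYPEs are covered too.
        self.in_error = self.name != "HTML"
```

Usage: `t = PullTokenizer("<!DOCTYPE Html>")`, then `while t.read(): ...` inspects `t.token_type`, `t.name` and `t.in_error` after each call. Both `<!DOCTYPE html>` and `<!DOCTYPE Html>` yield the name `HTML` and are not in error, while `<!DOCTYPEH` yields `H` and is flagged.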
Received on Thursday, 21 December 2006 23:38:48 UTC