[whatwg] Bug in "Before DOCTYPE name state"? from Ian Hickson on 2007-06-19 (public-whatwg-archive@w3.org from June 2007)

From: Ian Hickson <ian@hixie.ch>
Date: Tue, 19 Jun 2007 00:41:22 +0000 (UTC)
Message-ID: <Pine.LNX.4.64.0706190039240.26929@dhalsim.dreamhost.com>

On Fri, 22 Dec 2006, Thomas Broyer wrote:
> 2006/12/22, Ian Hickson:
> > On Thu, 21 Dec 2006, Thomas Broyer wrote:
> > >
> > > Why is the DOCTYPE marked "in error" in the former case?
> >
> > Because otherwise this document:
> >
> >    <!DOCTYPEH
> >
> > ...would emit a DOCTYPE that is not in error (since the token would be 
> > emitted before the bit at the end of the DOCTYPE name state).
> 
> Doh! right.

This changed recently, by the way. If someone could check that the spec 
is indeed still causing the right errors to be flagged, that would be 
great. (I think it is, though some errors moved from the tokeniser to the 
tree construction phase.)


> > > In other words, why would <!DOCTYPE html> be "in error" while 
> > > <!DOCTYPE Html> wouldn't?
> >
> > Both would be not in error, because of the sentence at the end of the 
> > DOCTYPE name state.
> 
> OK, now understood (thank you, Simon, for having enlightened me)

Note that this is now handled quite differently.


> > On Thu, 21 Dec 2006, Thomas Broyer wrote:
> > >
> > > But it also has this note, which is quite confusing: "Because 
> > > lowercase letters in the name are uppercased by the algorithm above, 
> > > the "HTML" letters are actually case-insensitive relative to the 
> > > markup."
> >
> > How is it confusing? I would clarify it, but I don't know what is 
> > confusing.
> 
> Maybe there's no need to clarify it, it might just have been me?

Ok.


> > > It remains that the tokenization stage is a bit confusing?
> >
> > Yes. The tree construction stage is even worse. Just implement it 
> > exactly as written with no interpretation and you should be fine. ;-)
> 
> My "problem" is that I'm not implementing an "emitting" parser (à la 
> SAX) but a "pulling" parser, so I'm stopping as soon as I've found a 
> token and return true to say "hey, I've changed the TokenType, Name, 
> Value, etc. properties to reflect a new token". ...so I'm interpreting 
> ;-)
> 
> Re tree construction, I'm about to implement it in two parts: in the 
> "pull parser" when possible (handling omitted tags and misnested 
> formatting elements) and in a "tree fixer" otherwise (move the <meta> 
> and <link> into <head>, etc.)

How has that worked for you? Is the spec ok for that approach?
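
For anyone comparing the two styles mentioned above, here is a rough 
sketch of an "emitting" (push) front end and a "pulling" front end over 
the same scanner. Everything in it is hypothetical: the names Token, 
scan, push_parse and PullParser are made up for illustration, and the 
scanner is a toy that only splits on "<" and ">", not the tokeniser 
from the spec.

```python
from typing import Callable, Iterator, NamedTuple, Optional


class Token(NamedTuple):
    type: str   # "tag" or "text" (hypothetical token kinds)
    value: str


def scan(source: str) -> Iterator[Token]:
    """Toy scanner: splits on '<' and '>' only. Not the spec's tokeniser."""
    pos = 0
    while pos < len(source):
        if source[pos] == "<":
            end = source.find(">", pos)
            if end == -1:            # unterminated tag: take the rest
                end = len(source)
            yield Token("tag", source[pos + 1:end])
            pos = end + 1
        else:
            end = source.find("<", pos)
            if end == -1:            # trailing text: take the rest
                end = len(source)
            yield Token("text", source[pos:end])
            pos = end


def push_parse(source: str, emit: Callable[[Token], None]) -> None:
    """Emitting style: the parser drives and calls back for each token."""
    for token in scan(source):
        emit(token)


class PullParser:
    """Pulling style: the caller drives; read() stops at each token."""

    def __init__(self, source: str) -> None:
        self._tokens = scan(source)
        self.token_type: Optional[str] = None
        self.value: Optional[str] = None

    def read(self) -> bool:
        """Advance to the next token; return False at end of input."""
        token = next(self._tokens, None)
        if token is None:
            return False
        self.token_type, self.value = token
        return True


if __name__ == "__main__":
    push_parse("<p>hi</p>", print)

    parser = PullParser("<p>hi</p>")
    while parser.read():
        print(parser.token_type, parser.value)
```

Because the scanner here is a generator, any steps that come after an 
"emit" point simply run when the next token is requested. A hand-written 
pull parser that returns at the emit point would need some other way to 
carry that state over.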

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Monday, 18 June 2007 17:41:22 UTC