- From: Ian Hickson <ian@hixie.ch>
- Date: Wed, 19 Jul 2006 00:20:44 +0000 (UTC)
On Mon, 17 Jul 2006, J. King wrote:
>
> The bogus DOCTYPE state consumes all characters until it gets to EOF or a '>'
> character. I presume this means that the following DOCTYPE:
>
> <!DOCTYPE html blah "http://some<invalid>URI">
>
> ...would finish at the first > and emit character tokens for 'URI">'.

Correct. That's compatible with the rendering that that DOCTYPE causes in
Safari, Opera, and Mozilla. (In Mozilla the DOCTYPE actually ends at the
"<", so you have an <invalid> element in the DOM too. In Safari the
DOCTYPE can end at a "<" only if it is preceded by a space. The spec
doesn't have any "<" magic for DOCTYPEs.)

> Similarly, I imagine this sequence:
>
> <!DOCTYPE html blah <html lang="en"><head>
>
> ...would not produce a start-tag token for 'html'.

Correct, although in Mozilla and Safari it actually does. I doubt this is
a big deal since in IE there is, as you propose, somewhat more complex
DOCTYPE parsing at work, and so the DOCTYPEs end up containing the
entirety of your examples. (Of course, IE then treats them as comments,
not as DOCTYPEs, in the DOM.)

> Is this what browsers do, or is this an oversight?

It's compatible with what some browsers do. It was intentional, at least.
I believe it's actually compatible with the SGML parsing rules, too,
though I may be mistaken about that and don't have a copy of Goldfarb
around to check.

> Even if it -is- what browsers do, this behaviour would lead conformance
> checkers to report the wrong kinds of errors; I would suggest a more
> complex parsing of DOCTYPEs is necessary.

Well, anything other than <!DOCTYPE HTML> is invalid, so there'll already
be at least one parse error -- the DOCTYPE being invalid. Conformance
checkers are, of course, allowed to go out of their way to make their
errors more understandable.

FWIW, my implementation, which has had very little work put into its error
handling, reported:

   16: Parse error: unexpected character while tokenising end of DOCTYPE.
   41: Parse error: errorneous document type declaration.

...on your first example, and:

   16: Parse error: unexpected character while tokenising end of DOCTYPE.
   36: Parse error: errorneous document type declaration.

...on your second (and no other errors). Those don't seem like the wrong
kinds of errors. :-)

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
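
For readers following along, the "consume until EOF or '>'" rule discussed above can be sketched in a few lines of Python. This is only an illustration of that one rule, not the spec's tokeniser and not the implementation mentioned in the message; the function name `consume_bogus_doctype` is hypothetical, and the sketch assumes the whole input is already known to be inside a bogus DOCTYPE.

```python
from typing import Tuple


def consume_bogus_doctype(source: str, start: int = 0) -> Tuple[str, str]:
    """Split `source` into (doctype_text, remainder).

    Hypothetical helper: models only the rule that a bogus DOCTYPE
    swallows every character up to and including the first '>', or up
    to EOF if there is no '>'. No "<" handling is attempted, matching
    the statement that the spec has no "<" magic for DOCTYPEs.
    """
    end = source.find(">", start)
    if end == -1:
        # EOF reached before any '>': everything belongs to the DOCTYPE.
        return source[start:], ""
    # The DOCTYPE ends at the first '>'; the rest goes back to the
    # normal tokeniser (here it is simply returned untouched).
    return source[start:end + 1], source[end + 1:]


first = '<!DOCTYPE html blah "http://some<invalid>URI">'
second = '<!DOCTYPE html blah <html lang="en"><head>'

first_doctype, first_rest = consume_bogus_doctype(first)
second_doctype, second_rest = consume_bogus_doctype(second)
```

Under these assumptions, the first example leaves `URI">` to be emitted as character tokens, and the second swallows `<html lang="en">` whole, so no start-tag token for `html` is produced and only `<head>` is left for normal tokenising.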
Received on Tuesday, 18 July 2006 17:20:44 UTC