[whatwg] [WebApps] Parsing: bogus DOCTYPE state

The bogus DOCTYPE state consumes all characters until it reaches EOF or a
'>' character.  I presume this means that the following DOCTYPE:

  <!DOCTYPE html blah "http://some<invalid>URI">

...would finish at the first '>' and emit character tokens for 'URI">'.
Similarly, I imagine this sequence:

  <!DOCTYPE html blah <html lang="en"><head>

...would not produce a start-tag token for 'html'.
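To make the two readings above concrete, here is a toy model of the bogus DOCTYPE state as described: skip everything up to the first '>' or EOF, then resume normal tokenizing. It is a simplified sketch and not the draft's actual tokenizer. It assumes the DOCTYPE is treated as bogus from its start (for these two inputs the first '>' is reached either way), and it uses a crude regex, purely hypothetical, for whatever follows.

```python
import re


def bogus_doctype_end(text, start):
    # Bogus DOCTYPE state, simplified: consume every character until '>' or EOF.
    # Returns the index just past the '>' (or len(text) at EOF).
    end = text.find(">", start)
    return len(text) if end == -1 else end + 1


def tokenize(text):
    tokens = []
    if text.startswith("<!DOCTYPE"):
        # Assumption: treat the whole DOCTYPE as bogus from here on.
        end = bogus_doctype_end(text, len("<!DOCTYPE"))
        tokens.append(("doctype", text[:end]))
        text = text[end:]
    # Crude handling of the remainder (hypothetical): start tags vs. character runs.
    # Character data is grouped into runs rather than one token per character.
    for m in re.finditer(r"<([A-Za-z][A-Za-z0-9]*)[^>]*>|[^<]+|<", text):
        if m.group(1):
            tokens.append(("start_tag", m.group(1).lower()))
        else:
            tokens.append(("chars", m.group(0)))
    return tokens


print(tokenize('<!DOCTYPE html blah "http://some<invalid>URI">'))
print(tokenize('<!DOCTYPE html blah <html lang="en"><head>'))
```

In the first case the DOCTYPE ends at the '>' after "invalid" and 'URI">' comes out as character data. In the second, the '<html lang="en">' start tag is swallowed into the DOCTYPE, so only 'head' is seen as a start tag.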

Is this what browsers do, or is this an oversight?  Even if it -is- what
browsers do, this behaviour would lead conformance checkers to report the
wrong kinds of errors (about the content following the DOCTYPE rather than
the malformed DOCTYPE itself); I would suggest that more complex parsing of
DOCTYPEs is necessary.

-- 
J. King
http://jking.dark-phantasy.com/

Received on Monday, 17 July 2006 09:16:15 UTC