[whatwg] [WebApps] Parsing: bogus DOCTYPE state

From: J. King <jking@dark-phantasy.com>
Date: Mon, 17 Jul 2006 12:16:15 -0400
Message-ID: <op.tcuctdylplbshj@briann>

The bogus DOCTYPE state consumes all characters until it reaches EOF or a
'>' character.  I presume this means that the following DOCTYPE:

  <!DOCTYPE html blah "http://some<invalid>URI">

...would finish at the first > and emit character tokens for 'URI">'.   
Similarly, I imagine this sequence:

  <!DOCTYPE html blah <html lang="en"><head>

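...would not produce a start-tag token for 'html', since everything up to
the '>' that closes '<html lang="en">' would be consumed as part of the
DOCTYPE.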
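
A minimal Python sketch of the rule as described above may make the two
cases concrete.  It is only an illustration of "consume until EOF or '>'",
not the spec's tokenizer: the function names are hypothetical, and it
assumes the tokenizer has already entered the bogus DOCTYPE state at the
unexpected 'blah'.

```python
def consume_bogus_doctype(text: str, pos: int) -> int:
    """Return the index just past the bogus DOCTYPE that starts at pos.

    Models only the rule quoted above: skip every character until the
    first '>' (inclusive) or until EOF.
    """
    end = text.find(">", pos)
    return len(text) if end == -1 else end + 1


def leftover_after_doctype(text: str) -> str:
    """Return the input left over for the tokenizer after the DOCTYPE."""
    # Assumption: the bogus state is entered at the unexpected 'blah'.
    pos = text.index("blah")
    return text[consume_bogus_doctype(text, pos):]


print(leftover_after_doctype('<!DOCTYPE html blah "http://some<invalid>URI">'))
# prints: URI">
print(leftover_after_doctype('<!DOCTYPE html blah <html lang="en"><head>'))
# prints: <head>
```

In the second case the leftover input begins at '<head>', so under this
reading the 'html' start tag is never seen by the tokenizer.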

Is this what browsers do, or is this an oversight?  Even if it -is- what
browsers do, this behaviour would lead conformance checkers to report the
wrong kinds of errors; I would suggest that more elaborate parsing of
DOCTYPEs is necessary.

J. King
Received on Monday, 17 July 2006 09:16:15 UTC

