- From: J. King <jking@dark-phantasy.com>
- Date: Mon, 17 Jul 2006 12:16:15 -0400
The bogus DOCTYPE state consumes all characters until it gets to EOF or a '>' character. I presume this means that the following DOCTYPE:

    <!DOCTYPE html blah "http://some<invalid>URI">

...would finish at the first > and emit character tokens for 'URI">'. Similarly, I imagine this sequence:

    <!DOCTYPE html blah <html lang="en"><head>

...would not produce a start-tag token for 'html'.

Is this what browsers do, or is this an oversight? Even if it -is- what browsers do, this behaviour would lead conformance checkers to report the wrong kinds of errors; I would suggest a more complex parsing of DOCTYPEs is necessary.

-- 
J. King
http://jking.dark-phantasy.com/
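
To make the two cases above concrete, here is a minimal sketch of the rule as I read it: once in the bogus DOCTYPE state, skip everything up to and including the next '>' (or stop at EOF). It is not a tokenizer and is not taken from the draft; the function name is made up for illustration, and I am assuming the bogus state is entered at the unexpected word 'blah'.

```python
def consume_bogus_doctype(text, start):
    """Return the index just past the point where the bogus DOCTYPE
    state would stop, scanning from `start` (hypothetical helper)."""
    end = text.find(">", start)
    if end == -1:
        # No '>' before EOF: the state swallows the rest of the input.
        return len(text)
    return end + 1


first = '<!DOCTYPE html blah "http://some<invalid>URI">'
second = '<!DOCTYPE html blah <html lang="en"><head>'

# Assumption: the tokenizer switches to the bogus state at 'blah'.
leftover_first = first[consume_bogus_doctype(first, first.index("blah")):]
leftover_second = second[consume_bogus_doctype(second, second.index("blah")):]

print(repr(leftover_first))   # text left over for the data state
print(repr(leftover_second))  # the <html ...> tag is already consumed
```

Under those assumptions the first input leaves URI"> behind as character data, and the second leaves only <head>, so the 'html' start tag is swallowed by the DOCTYPE.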
Received on Monday, 17 July 2006 09:16:15 UTC