[whatwg] Parsing processing instructions in HTML syntax: 10.2.4.44 Bogus comment state

The handling of processing instructions in the XHTML syntax seems
reasonably well-defined, but it feels a little off in the HTML syntax.
Briefly, it seems that <? causes the parser to go into the Bogus
comment state, which is fair enough. (I wouldn't really recommend that
anyone use processing instructions in HTML syntax anyway.) However, the
parser comes out of that state at the first >. Because processing
instructions can contain > and terminate only at the two-character
sequence ?>, this could end the processing instruction early. The
remainder of the instruction would then be parsed as ordinary content,
leading to more error handling and a confused parser state in the text
that follows.
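
To make the early termination concrete, here is a minimal Python sketch
of the "stop at the first >" behavior described above. It is not a real
tokenizer; the function name and the sample input are made up for
illustration.

```python
def bogus_comment_end(text: str, start: int) -> int:
    """Return the index just past the first '>' at or after `start`.

    Hypothetical helper: it only mimics the rule that the Bogus comment
    state ends at the first '>' (or at end of input if there is none).
    """
    end = text.find(">", start)
    return len(text) if end == -1 else end + 1


# Made-up input: a processing instruction whose data contains '>'.
doc = '<?example test="a > b"?><p>Hello</p>'

end = bogus_comment_end(doc, 0)
consumed = doc[:end]   # what the bogus comment swallows
leftover = doc[end:]   # what the parser then treats as ordinary content

print(repr(consumed))  # '<?example test="a >'
print(repr(leftover))  # ' b"?><p>Hello</p>'
```

The tail of the instruction, including its real ?> terminator, ends up
as character data in front of the <p> element.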

It might be wise to add a separate processing instruction state that
would consume all characters up to the first occurrence of ?> instead
of reusing the Bogus comment state. The parser could still emit a
comment token containing the processing instruction text. The goal here
is not to enable processing instructions in the HTML syntax. It's
simply an effort to ensure that if one does slip in by mistake, we more
accurately detect what the author or generator likely intended as the
end of the processing instruction.
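
As a rough sketch of what such a state might do, the Python below scans
for the first ?> rather than the first >. Everything here is an
assumption for illustration: the function name, the choice to use the
text between <? and ?> as the comment data, and the choice to consume
to end of input when no ?> is found. None of it comes from the spec.

```python
def consume_processing_instruction(text: str, start: int) -> tuple[str, int]:
    """Hypothetical processing instruction state.

    Assumes text[start:start + 2] == '<?'. Returns (comment_data,
    next_index), where comment_data is the text between '<?' and the
    first later '?>'. The search begins after the opening '<?', so the
    opener's own '?' is never reused as part of the terminator.
    """
    body_start = start + 2
    close = text.find("?>", body_start)
    if close == -1:
        # Assumed fallback: no terminator, so consume to end of input.
        return text[body_start:], len(text)
    return text[body_start:close], close + 2


# Made-up input: a processing instruction whose data contains '>'.
doc = '<?example test="a > b"?><p>Hello</p>'

data, next_index = consume_processing_instruction(doc, 0)
print(repr(data))              # 'example test="a > b"'
print(repr(doc[next_index:]))  # '<p>Hello</p>'
```

With this rule the whole instruction becomes one comment token, and
tokenization resumes cleanly at the <p> start tag.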

-- 
Elliotte Rusty Harold
elharo at ibiblio.org

Received on Tuesday, 2 March 2010 02:44:55 UTC