[whatwg] Parsing processing instructions in HTML syntax: 10.2.4.44 Bogus comment state

On 3/2/2010 6:54 PM, Ian Hickson wrote:
> On Tue, 2 Mar 2010, Elliotte Rusty Harold wrote:
>    
>> The handling of processing instructions in the XHTML syntax seems
>> reasonably well-defined; but it feels a little off in the HTML syntax.
>>      
> There's no such thing as processing instructions in text/html.
>
> There was such a thing in HTML4, because of its SGML heritage, though it
> was explicitly deprecated.
>
>    

It doesn't seem to be deprecated, per 
http://www.w3.org/TR/html401/appendix/notes.html#h-B.3.6

>> Briefly it seems that <? causes the parser to go into Bogus comment
>> state, which is fair enough. (I wouldn't really recommend that anyone
>> use processing instructions in HTML syntax anyway.) However the parser
>> comes out of that state at the first >. Because processing instructions
>> can contain > and terminate only at the two character sequence ?> this
>> could cause PI processing to terminate early and leave a lot more error
>> handling and a confused parser state in the text yet to come.
>>      
> In HTML4, PIs ended at the first >, not at ?>. "<?target data>" is the
> syntax of PIs when the SGML options used by HTML4 are applied.
>
> In any case, the parser in HTML5 is based on what browsers do, which is
> also to terminate at the first >. It's unlikely that we can change that,
> given backwards-compatibility needs.
>
> There's a simple workaround: don't use PIs in text/html, since they don't
> exist in HTML5 at all, and don't send XML as text/html, since XML and HTML
> have different syntaxes and aren't compatible.
>
>    

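To make the difference between the two termination rules concrete, here is a rough Python sketch. It is only an illustration, not taken from the spec or from any browser: the function names are made up, and the first function assumes (as I read the bogus comment state) that the comment data begins at the ? that triggered the state and runs up to the first >.

```python
from typing import Tuple


def html_bogus_comment(text: str) -> Tuple[str, str]:
    """Hypothetical helper: split input starting with '<?' the way the
    bogus comment state is described, ending at the first '>' (or EOF).
    Returns (comment_data, remaining_text)."""
    if not text.startswith("<?"):
        raise ValueError("expected input starting with '<?'")
    end = text.find(">", 2)
    if end == -1:
        # No '>' at all: everything up to end of input becomes the comment.
        return text[1:], ""
    # Assumption: comment data includes the '?' that triggered the state.
    return text[1:end], text[end + 1:]


def xml_style_pi(text: str) -> Tuple[str, str]:
    """Hypothetical helper: split input starting with '<?' the XML way,
    ending only at the two-character sequence '?>'.
    Returns (pi_content, remaining_text)."""
    if not text.startswith("<?"):
        raise ValueError("expected input starting with '<?'")
    end = text.find("?>", 2)
    if end == -1:
        raise ValueError("unterminated processing instruction")
    return text[2:end], text[end + 2:]


sample = "<?pi a > b ?>tail"
print(html_bogus_comment(sample))  # ('?pi a ', ' b ?>tail')
print(xml_style_pi(sample))        # ('pi a > b ', 'tail')
```

With the first rule, everything after the embedded > (here, " b ?>tail") falls back into ordinary content, which is the early termination Elliotte describes.
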
In http://dev.w3.org/html5/html4-differences/ , it says:

"HTML5 defines an HTML syntax that is compatible with HTML4 and XHTML1 
documents published on the Web, but is not compatible with the more 
esoteric SGML features of HTML4, such as processing instructions 
<http://www.w3.org/TR/1999/REC-html401-19991224/appendix/notes.html#h-B.3.6> 
and shorthand markup 
<http://www.w3.org/TR/1999/REC-html401-19991224/appendix/notes.html#h-B.3.7>."

This seems to me to suggest that backward compatibility can be broken as 
far as processing instructions are concerned (i.e., by requiring ?> and 
not merely > to close a processing instruction). If not, then it isn't 
clear from the specification that processing instructions are indeed 
disallowed: the parsing model does allow them, and since processing 
instructions are platform-specific by definition and apparently not 
explicitly prohibited by HTML5 (unless that is what you are saying 
here), HTML5 syntax does seem to be compatible with them. But if you 
are trying to prohibit them for any use whatsoever, yet still 
technically allow them to be ignored for compatibility, this would seem 
to contradict the statement at http://dev.w3.org/html5/html4-differences/ 
that "there is no longer a need for marking features "deprecated"". Or 
is the difference that these are forbidden from doing anything but will 
be allowed (and ignored) indefinitely in future versions of HTML?

Btw, I've added a talk section at the wiki page 
http://wiki.whatwg.org/wiki/Talk:HTML_vs._XHTML#Harmony to suggest 
covering XHTML<->HTML compatibility guidelines specifically. I would 
appreciate a reply there, so that I know whether we can begin edits in 
this vein on the page.

thanks,
Brett


Received on Wednesday, 17 March 2010 21:10:02 UTC