[whatwg] Parsing processing instructions in HTML syntax: 10.2.4.44 Bogus comment state

From: Ian Hickson <ian@hixie.ch>
Date: Mon, 22 Mar 2010 23:15:48 +0000 (UTC)
Message-ID: <Pine.LNX.4.64.1003222307490.4055@ps20323.dreamhostps.com>
On Thu, 18 Mar 2010, Brett Zamir wrote:
> On 3/2/2010 6:54 PM, Ian Hickson wrote:
> > On Tue, 2 Mar 2010, Elliotte Rusty Harold wrote:
> > >
> > > The handling of processing instructions in the XHTML syntax seems 
> > > reasonably well-defined; but it feels a little off in the HTML 
> > > syntax.
> >
> > There's no such thing as processing instructions in text/html.
> > 
> > There was such a thing in HTML4, because of its SGML heritage, though 
> > it was explicitly deprecated.
> 
> Doesn't seem deprecated per 
> http://www.w3.org/TR/html401/appendix/notes.html#h-B.3.6

Section B.3.3 says, speaking of SGML features with limited support, which 
at the time of that section's writing included PIs, that "We recommend 
that authors avoid using all of these features". Section 3.2 specifically 
says "The appendix lists some SGML features that are not widely supported 
by HTML tools and user agents and should be avoided".


> > > Briefly it seems that <? causes the parser to go into Bogus comment 
> > > state, which is fair enough. (I wouldn't really recommend that 
> > > anyone use processing instructions in HTML syntax anyway.) However 
> > > the parser comes out of that state at the first >. Because processing 
> > > instructions can contain > and terminate only at the two character 
> > > sequence ?> this could cause PI processing to terminate early and 
> > > leave a lot more error handling and a confused parser state in the 
> > > text yet to come.
> >
> > In HTML4, PIs ended at the first >, not at ?>. "<?target data>" is the 
> > syntax of PIs when the SGML options used by HTML4 are applied.
> > 
> > In any case, the parser in HTML5 is based on what browsers do, which 
> > is also to terminate at the first >. It's unlikely that we can change 
> > that, given backwards-compatibility needs.
> > 
> > There's a simple workaround: don't use PIs in text/html, since they 
> > don't exist in HTML5 at all, and don't send XML as text/html, since 
> > XML and HTML have different syntaxes and aren't compatible.
> 
> In http://dev.w3.org/html5/html4-differences/ , it says:
> 
> "HTML5 defines an HTML syntax that is compatible with HTML4 and XHTML1 
> documents published on the Web, but is not compatible with the more 
> esoteric SGML features of HTML4, such as processing instructions 
> <http://www.w3.org/TR/1999/REC-html401-19991224/appendix/notes.html#h-B.3.6> 
> and shorthand markup 
> <http://www.w3.org/TR/1999/REC-html401-19991224/appendix/notes.html#h-B.3.7>."
> 
> This seems to me to suggest that backward compatibility can be broken as 
> far as processing instructions (i.e., requiring ?> and not merely > to 
> close a processing instruction).

Backwards compatibility with legacy content can only be broken in extreme 
cases (e.g. for security reasons). That's one of the fundamental design 
goals of HTML5.
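
To make the termination rule discussed above concrete, here is a minimal 
Python sketch of how "<?" ends up as a bogus comment that closes at the 
first ">" rather than at "?>". It is only an illustration, not the spec's 
tokenizer: the function name is hypothetical, everything outside a "<?" 
construct is lumped into a single "text" token, and details such as NUL 
replacement are left out.

```python
def tokenize_bogus_comments(html):
    """Split input into ('text', ...) and ('comment', ...) tokens.

    Simplified assumption: only '<?' is recognised as markup here; all
    other input, including real tags, is passed through as text.
    """
    tokens = []
    pos = 0
    while pos < len(html):
        start = html.find("<?", pos)
        if start == -1:
            # No further '<?': the rest of the input is plain text.
            tokens.append(("text", html[pos:]))
            break
        if start > pos:
            tokens.append(("text", html[pos:start]))
        # The comment data begins at the '?' and runs to the first '>'.
        end = html.find(">", start + 1)
        if end == -1:
            # End of input inside the bogus comment: emit what we have.
            tokens.append(("comment", html[start + 1:]))
            break
        tokens.append(("comment", html[start + 1:end]))
        pos = end + 1
    return tokens


if __name__ == "__main__":
    # The '>' inside the quoted string ends the comment early, so the
    # tail of the would-be PI leaks out as text.
    for token in tokenize_bogus_comments('<?php echo "a > b"; ?>rest'):
        print(token)
```

With the sample input, the comment data is '?php echo "a ' and the 
remainder ' b"; ?>rest' comes out as text, which is the early termination 
Elliotte describes.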


> If not, then it doesn't seem clear from the specification that 
> processing instructions are indeed not allowed because the parsing model 
> does allow them, and with processing instructions being 
> platform-specific by definition and not apparently explicitly prohibited 
> by HTML5 (unless that is what you are trying to say here), HTML5 syntax 
> does seem to be compatible with them.

HTML5 prohibits PIs in text/html. See:

   http://www.whatwg.org/specs/web-apps/current-work/multipage/syntax.html#writing

...and notice how PIs are not listed as a possible syntax element.


> But if you are trying to prohibit them for any use whatsoever yet still 
> technically allow them to be ignored for compatibility, it seems this 
> would contradict the statement at 
> http://dev.w3.org/html5/html4-differences/ that "there is no longer a 
> need for marking features "deprecated"". Or is the difference that these 
> are forbidden from doing anything but will be allowed (and ignored) 
> indefinitely into the future in future versions of HTML?

They are forbidden but are ignored in this (and probably future) 
version(s) of HTML.


> Btw, I've added a talk section at the wiki page 
> http://wiki.whatwg.org/wiki/Talk:HTML_vs._XHTML#Harmony to suggest 
> covering XHTML<->HTML compatibility guidelines specifically, so would 
> appreciate a reply there, so I know whether we can begin edits in this 
> vein on the page.

Please feel free to edit the wiki or add new pages! Everyone is welcome to 
edit the wiki.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Monday, 22 March 2010 16:15:48 UTC