- From: Ian Hickson <ian@hixie.ch>
- Date: Wed, 7 Apr 2010 23:55:02 +0000 (UTC)
On Wed, 3 Mar 2010, Brett Zamir wrote:
> On 3/2/2010 6:54 PM, Ian Hickson wrote:
> > On Tue, 2 Mar 2010, Elliotte Rusty Harold wrote:
> > >
> > > Briefly it seems that <? causes the parser to go into Bogus comment
> > > state, which is fair enough. (I wouldn't really recommend that
> > > anyone use processing instructions in HTML syntax anyway.) However
> > > the parser comes out of that state at the first >. Because processing
> > > instructions can contain > and terminate only at the two character
> > > sequence ?> this could cause PI processing to terminate early and
> > > leave a lot more error handling and a confused parser state in the
> > > text yet to come.
> >
> > In HTML4, PIs ended at the first >, not at ?>. "<?target data>" is the
> > syntax of PIs when the SGML options used by HTML4 are applied.
> >
> > In any case, the parser in HTML5 is based on what browsers do, which
> > is also to terminate at the first >. It's unlikely that we can change
> > that, given backwards-compatibility needs.
>
> Are there really a lot of folks out there depending on old HTML4-style
> processing instructions not being broken?

Not knowingly, but I wouldn't at all be surprised if there were lots of
pages that triggered this, yes. People rely on all kinds of weird things.
(See for example the sample from Philip below.)

> Given that as I understand it such HTML4 processing instructions were
> not even used by any standard at that time, and with XHTML 1.0+
> processing instructions bringing into practice the XML form, and
> especially with all of the progress made in X/HTML5 on harmonizing HTML
> and XHTML, I'd think that it'd really be ideal if this issue would not
> get in the way (along with the unfortunate loss of external DTDs)...

In practice this issue shouldn't get in the way anyway, since PIs aren't
allowed in HTML.

> As long as website creators have the freedom to be sloppy

Authors don't have the freedom to be sloppy.

> why not go a little further to make XML compatibility better?

XML compatibility isn't a goal. There is a minor goal of making it
possible to transition easily from XHTML to HTML. PI-like syntax in XHTML
is only used for two purposes:

 - the XML declaration, which can simply be removed when publishing HTML,
   and which if not removed will just be ignored (since it never contains
   a ">" character, so ending on the first ">" is fine).

 - the XML Stylesheet PI, which needs to be converted to a <link> element
   anyway, so isn't a problem.

> It'd be a whole lot more appealing to work in both environments out of
> the box than deal with complex server-side conversion solutions...

I don't really understand why you would ever use a PI to be honest.


On Wed, 3 Mar 2010, Philip Taylor wrote:
>
> Yes, e.g. a load of pages like
> http://www.forex.com.cn/html/2008-01/821561.htm (to pick one example at
> random) say:
>
>   <?xml:namespace prefix = o ns = "urn:schemas-microsoft-com:office:office" />
>
> and don't have the string "?>" anywhere.

Indeed.


On Fri, 5 Mar 2010, Brett Zamir wrote:
>
> Ok, fair enough. But while it is great that HTML5 seeks to be
> transitional and backwards compatible, HTML5 (thankfully) already breaks
> compatibility for the sake of XML compatibility (e.g., localName or
> getElementsByTagNameNS).

This is actually just for implementation sanity, it's not about XML syntax
compatibility.

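[Illustration, not part of the original message.] The two termination rules discussed in this thread (end at the first ">" versus end at "?>") can be compared with a minimal Python sketch. This is not code from the HTML5 specification or from any browser. The helper name `consume_pi_like` and the trailing `<p>Hello</p>` markup are hypothetical, and consuming to the end of the input when no terminator is found is a simplifying assumption made only for this sketch.

```python
def consume_pi_like(text: str, start: int, terminator: str) -> tuple[str, str]:
    """Split text[start:] into (consumed construct, remaining markup).

    Assumption for this sketch: if the terminator never appears,
    everything up to the end of the input is consumed.
    """
    end = text.find(terminator, start)
    if end == -1:
        return text[start:], ""
    end += len(terminator)
    return text[start:end], text[end:]


# The construct from Philip's sample, followed by hypothetical page content.
office_page = (
    '<?xml:namespace prefix = o ns = "urn:schemas-microsoft-com:office:office" />'
    "<p>Hello</p>"
)
# An XML declaration, followed by the same hypothetical page content.
xhtml_page = '<?xml version="1.0" encoding="UTF-8"?><p>Hello</p>'

for label, page in (("office", office_page), ("xhtml", xhtml_page)):
    first_gt = consume_pi_like(page, 0, ">")   # rule: stop at the first ">"
    pi_close = consume_pi_like(page, 0, "?>")  # rule: stop only at "?>"
    print(label, "| first '>' rule leaves:", repr(first_gt[1]))
    print(label, "| '?>' rule leaves:", repr(pi_close[1]))
```

On the Office-style sample, the first-">" rule leaves the paragraph intact, while the "?>" rule finds no terminator and consumes the rest of the input. On the XML declaration, both rules stop at the same place.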
> It seems to me that there should still be a role of eventually
> transitioning into something more full-featured in a fundamental,
> language-neutral way (e.g., supporting a fuller subset of XML's features
> such as external entities and yes, XML-style processing instructions);
> extensible, including the ability to include XML from other namespaces
> which may also encourage or rely on using their own XML processing
> instructions, for those who wish to experiment or supplement the HTML
> standard behavior; and more harmonious and compatible with a simpler
> syntax (i.e., XML's)--even if the more complex syntax is more prominent
> and continues to be supported indefinitely.

People can use XML if they want, but I don't really see a path from
today's HTML to a generic language that doesn't break backwards
compatibility. If you're ok with breaking back-compat, though, there's no
need to worry about HTML at all. Just use XHTML.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Wednesday, 7 April 2010 16:55:02 UTC