- From: Henri Sivonen <hsivonen@iki.fi>
- Date: Wed, 29 Nov 2006 18:05:04 +0200
On Nov 28, 2006, at 23:20, Sam Ruby wrote:

> In HTML5, there are a number of elements with a content model of
> empty: area, base, br, col, command, embed, hr, img, link, meta,
> and param.
>
> If HTML5 were changed so that these elements -- and these elements
> alone -- permitted an optional trailing slash character, what
> percentage of the web would be parsed differently?

Obviously, 0% with parsers that opt to implement the HTML5 parsing
algorithm with error recovery as opposed to Draconian error handling
-- except for the detail of whether error-reporting parsers report an
error or not. (In theory, this is an issue for non-browser UAs that
opt to implement Draconian error handling. In practice, even my mostly
Draconian parser treats this particular error as non-fatal, because it
is so common and so easily recoverable.)

> The basis for my question is the observation that the web browsers
> that I am familiar with apparently already operate in this fashion,
> this usage seems to have crept into quite a number of diverse
> places, and all this is coupled with Lachlan's observations[3] on
> what it would take to change the popular WordPress application to
> produce HTML5 compliant output.

WordPress is a soup-in-soup-out system that shouldn't be trying to
produce the XML syntax in the first place. But now that WP is using
it, the question becomes: which is more costly, asking the WP
developers to change their system or adjusting the definition of
conformance so that WP looks conforming more easily?

Anyway, as Lachlan already pointed out, whether or not the useless
slash should be allowed on elements whose content model is empty is
not an issue of technical damage to parsing interoperability but of
damage to the mental model of confused authors. So the cost to
consider is the cost of the confusion.
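To make the "0% parsed differently, except for error reporting" point
concrete, here is a minimal sketch using Python's standard
html.parser module. It is not the HTML5 parsing algorithm and not the
parser mentioned above; the class name SlashTolerantParser, the
parse() helper and the sample markup are made up for illustration. It
only shows the general idea: recover from the trailing slash by
treating the tag as a plain start tag, and keep the complaint in a
separate warnings list.

```python
from html.parser import HTMLParser

# Elements with an empty content model, as listed in the quoted message.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "command", "embed",
    "hr", "img", "link", "meta", "param",
}


class SlashTolerantParser(HTMLParser):
    """Hypothetical helper: records parse events, ignoring a trailing slash."""

    def __init__(self):
        super().__init__()
        self.events = []    # what a consumer of the parse would see
        self.warnings = []  # what an error-reporting parser might flag

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag, attrs))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))

    def handle_startendtag(self, tag, attrs):
        # HTMLParser calls this for <foo/> syntax. Recover by treating it
        # as a plain start tag; only the warning list shows it happened.
        kind = "void" if tag in VOID_ELEMENTS else "non-void"
        self.warnings.append(f"trailing slash on {kind} element <{tag}>")
        self.handle_starttag(tag, attrs)


def parse(markup):
    """Feed markup to a fresh parser and return (events, warnings)."""
    parser = SlashTolerantParser()
    parser.feed(markup)
    parser.close()
    return parser.events, parser.warnings


plain_events, plain_warnings = parse('<p>a<br>b<img src="x.png" alt="">c</p>')
slash_events, slash_warnings = parse('<p>a<br/>b<img src="x.png" alt="" />c</p>')
```

Under these assumptions the two event lists compare equal, and the
only observable difference is that the second parse collects two
warnings while the first collects none.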
> As a side benefit of this change, I believe that I could modify my
> weblog to be simultaneously both HTML5 and XHTML5 compliant, modulo
> the embedded SVG content, something that would need to be
> discussed separately.

I am against blurring the distinction between the XML serialization
and the HTML serialization. The infamous Appendix C didn't bring about
good things. Having a text/html serialization that is also parseable
as XML doesn't work from the UA point of view, because reality
requires UAs to parse text/html using an HTML parser. Now, since UAs
can't use an XML parser for parsing text/html anyway, it becomes
useless for content providers to ensure that their text/html content
is XML-parseable.

Restricting the XML syntactic sugar, such as the use of CDATA sections
or <foo/> vs. <foo></foo> on the application/xhtml+xml side would be
wrong in principle, because it is wrong for a higher-layer spec to
micromanage lower-layer syntactic sugar or, worse, give differences in
syntactic sugar a difference in meaning. In practice, limiting XML
details of the application/xhtml+xml serialization would be useless,
because it is processed using XML processors which are required to
support full syntactic sugar anyway.

I think that your blog system is a special case. Considering that I
have seen the Yellow Screen of Death on your blog, it appears that you
aren't using an isolated serializer that could be swapped. However,
the reason why your site works is that it is built vastly more
competently than other systems that don't use an isolated serializer
*and* because you are both the developer and the deployer and you care
about these issues, you can and do fix bugs quickly. That just doesn't
work with systems that aren't constantly managed by the developer.

So no offense intended, but I think that what would work for you (or
Jacques Distler) isn't generalizable. Rather, a warning to the effect
of "professional driver on closed road" would be appropriate.
:-)

--
Henri Sivonen
hsivonen at iki.fi
http://hsivonen.iki.fi/
Received on Wednesday, 29 November 2006 08:05:04 UTC