[whatwg] Allow trailing slash in always-empty HTML5 elements?

On Nov 28, 2006, at 23:20, Sam Ruby wrote:

> In HTML5, there are a number of elements with a content model of  
> empty: area, base, br, col, command, embed, hr, img, link, meta,  
> and param.
>
> If HTML5 were changed so that these elements -- and these elements  
> alone -- permitted an optional trailing slash character, what  
> percentage of the web would be parsed differently?

Obviously, 0% with parsers that opt to implement the HTML5 parsing  
algorithm with error recovery as opposed to Draconian error handling --  
except for the detail of whether error-reporting parsers report an  
error or not. (In theory, this is an issue for non-browser UAs that opt to  
implement Draconian error handling. In practice, even my mostly  
Draconian parser treats this particular error as non-fatal, because  
it is so common and so easily recoverable.)
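
To illustrate the point about recoverable handling, here is a minimal sketch using Python's standard html.parser module. That module is not an implementation of the HTML5 parsing algorithm, and the class name, the slash_warnings list and the parse_events helper are all hypothetical names invented for this example. It only shows the idea: a trailing slash on an element with an empty content model yields the same parse events as the plain start tag, while the parser still notes the slash as a non-fatal condition.

```python
from html.parser import HTMLParser

# Elements with an empty content model, as listed in the quoted message.
VOID_ELEMENTS = {"area", "base", "br", "col", "command", "embed",
                 "hr", "img", "link", "meta", "param"}


class SlashTolerantParser(HTMLParser):
    """Hypothetical sketch: ignore a trailing slash on void elements."""

    def __init__(self):
        super().__init__()
        self.events = []          # recorded start/end tag events
        self.slash_warnings = []  # void elements seen with a trailing slash

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag, attrs))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_startendtag(self, tag, attrs):
        # html.parser calls this for <tag ... />.
        if tag in VOID_ELEMENTS:
            # Non-fatal: note the slash, then treat it as a plain start tag.
            self.slash_warnings.append(tag)
            self.handle_starttag(tag, attrs)
        else:
            # Fall back to html.parser's default (start tag + end tag).
            # This is the module's behavior, not a claim about HTML5.
            super().handle_startendtag(tag, attrs)


def parse_events(markup):
    """Return (events, slash_warnings) for a markup string."""
    parser = SlashTolerantParser()
    parser.feed(markup)
    parser.close()
    return parser.events, parser.slash_warnings


plain_events, plain_warnings = parse_events("<p>a<br>b</p>")
slash_events, slash_warnings = parse_events("<p>a<br/>b</p>")
```

Under these assumptions the two inputs produce identical event lists; only the warning list differs, which corresponds to the "report an error or not" detail above.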

> The basis for my question is the observation that the web browsers  
> that I am familiar with apparently already operate in this fashion,  
> this usage seems to have crept into quite a number of diverse  
> places, and all this is coupled with Lachlan's observations[3] on  
> what it would take to change the popular WordPress application to  
> produce HTML5 compliant output.

WordPress is a soup-in-soup-out system that shouldn't be trying to  
produce the XML syntax in the first place. But now that WP is using  
it, the question becomes which is more costly: asking the WP  
developers to change their system, or adjusting the definition of  
conformance so that WP more easily looks conforming?

Anyway, as Lachlan already pointed out, whether or not the useless  
slash should be allowed on elements whose content model is empty is  
not an issue of technical damage to parsing interoperability but  
about damage to the mental model of confused authors. So the cost to  
consider is the cost of the confusion.

> As a side benefit of this change, I believe that I could modify my  
> weblog to be simultaneously both HTML5 and XHTML5 compliant, modulo  
> the embedded SVG content, something that would need to be  
> discussed separately.

I am against blurring the distinction between the XML serialization  
and the HTML serialization. The infamous Appendix C didn't bring  
about good things.

Having a text/html serialization that is also parseable as XML  
doesn't work from the UA point of view, because reality requires UAs  
to parse text/html using an HTML parser. Now, since UAs can't use an  
XML parser for parsing text/html anyway, it becomes useless for  
content providers to ensure that their text/html content is XML- 
parseable.

Restricting the XML syntactic sugar, such as the use of CDATA  
sections or <foo/> vs. <foo></foo> on the application/xhtml+xml side  
would be wrong in principle, because it is wrong for a higher-layer  
spec to micromanage lower-layer syntactic sugar or, worse, give  
differences in syntactic sugar a difference in meaning. In practice,  
limiting XML details of the application/xhtml+xml serialization would  
be useless, because it is processed using XML processors which are  
required to support full syntactic sugar anyway.
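
As a small illustration of why this sugar carries no meaning at the XML layer, the sketch below uses Python's standard xml.etree.ElementTree to parse both spellings and compare the results. The helper name same_content is hypothetical, and comparing only tag, text and child count is a simplification chosen for this example, not a full infoset comparison.

```python
import xml.etree.ElementTree as ET


def same_content(doc_a, doc_b):
    """Hypothetical helper: compare root tag, text and child count."""
    root_a = ET.fromstring(doc_a)
    root_b = ET.fromstring(doc_b)
    # Normalize a missing text node (None) to the empty string.
    summary_a = (root_a.tag, root_a.text or "", len(root_a))
    summary_b = (root_b.tag, root_b.text or "", len(root_b))
    return summary_a == summary_b


# <foo/> and <foo></foo> are two spellings of the same empty element.
empty_equal = same_content("<foo/>", "<foo></foo>")

# A CDATA section and escaped character data yield the same text.
cdata_equal = same_content("<p><![CDATA[a < b]]></p>", "<p>a &lt; b</p>")
```

An application built on top of such a processor sees the same tree in both cases, so a higher-layer spec has nothing to hang a distinction on.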

I think that your blog system is a special case. Considering that I  
have seen the Yellow Screen of Death on your blog, it appears that  
you aren't using an isolated serializer that could be swapped.  
However, your site works because it is built vastly more competently  
than other systems that don't use an isolated serializer *and* because  
you are both the developer and the deployer and care about these  
issues, so you can and do fix bugs quickly.  
That just doesn't work with systems that aren't constantly managed by  
the developer.

So no offense intended, but I think that what would work for you (or  
Jacques Distler) isn't generalizable. Rather, a warning to the effect  
of "professional driver on closed road" would be appropriate. :-)

-- 
Henri Sivonen
hsivonen at iki.fi
http://hsivonen.iki.fi/

Received on Wednesday, 29 November 2006 08:05:04 UTC