Are new void elements really a good idea?

We recently discussed the problems of getting XSLT 1.0 processors to 
produce valid HTML5, with the well-known issue of producing the HTML5 
header.

While we were discussing this, Henri pointed out that there are more 
problems to solve, for instance the addition of new void elements in 
HTML5. I think we should discuss this separately, thus this mail.

I'll try to explain the problem first from the point of view of 
someone trying to create HTML *programmatically*. When I say 
*programmatically*, I don't mean printf-style code that just concatenates 
lines of output, but generic code that can serialize some kind of 
object representation of the document, such as (gasp) DOM, or code that 
is event-driven, such as a SAX output handler.

Perhaps the most important change from SGML to XML was that a producer 
doesn't need to know the DTD of the vocabulary in order to produce a 
parseable document: for instance, in XML there are no void elements. All 
elements are simply parsed the same way.
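
To make the XML side concrete, here is a minimal sketch using Python's 
standard xml.etree.ElementTree module (my choice for illustration only; 
the helper name same_element is made up). It shows that the empty-element 
form and the start-tag/end-tag form parse to the same thing, and that a 
generic serializer can write the element back out without any list of 
element names:

```python
import xml.etree.ElementTree as ET


def same_element(a, b):
    """Compare two childless elements by tag name and attributes."""
    return a.tag == b.tag and a.attrib == b.attrib and len(a) == len(b) == 0


# Both notations are well-formed XML; the parser needs no DTD to accept them.
short_form = ET.fromstring('<eventsource source="foo"/>')
long_form = ET.fromstring('<eventsource source="foo"></eventsource>')

# Serializing is equally vocabulary-agnostic: no per-element knowledge is used.
serialized = ET.tostring(long_form, encoding="unicode")
reparsed = ET.fromstring(serialized)
```

Whichever notation the serializer picks, an XML consumer ends up with 
the same element.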

In HTML (originally designed as an SGML vocabulary) things work 
differently. Elements can be void (such as <meta> or <br>), and 
producers need to know that. In theory, a producer could use the HTML 
DTD to find that out (is anybody doing that?). In practice, a common 
approach is to hardwire the set of void elements present in HTML4, and 
to assume that any other element can be serialized as non-void. This is 
the approach used in XSLT 1.0, and also in common Java serializers that 
just reuse the XSLT processor's serialization modules. I haven't checked, 
but I wouldn't be surprised if other libraries worked the same way.
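
For illustration, the hardwired approach looks roughly like the sketch 
below. It is written in Python rather than taken from any actual XSLT or 
Java implementation, and the names HTML4_VOID and serialize_html are 
hypothetical. The set lists the elements declared EMPTY in the HTML 4.01 
DTDs; everything else gets an explicit end tag:

```python
import xml.etree.ElementTree as ET
from html import escape

# Elements declared EMPTY in the HTML 4.01 DTDs, hardwired into the serializer.
HTML4_VOID = frozenset({
    "area", "base", "basefont", "br", "col", "frame", "hr",
    "img", "input", "isindex", "link", "meta", "param",
})


def serialize_html(elem, void_elements=HTML4_VOID):
    """Serialize an ElementTree element as HTML using a fixed void-element set."""
    attrs = "".join(
        f' {name}="{escape(value, quote=True)}"'
        for name, value in elem.attrib.items()
    )
    start = f"<{elem.tag}{attrs}>"
    if elem.tag.lower() in void_elements:
        # Known void element: start tag only, any content is dropped.
        out = start
    else:
        # Unknown or non-void element: always emit an explicit end tag.
        body = escape(elem.text or "", quote=False)
        body += "".join(serialize_html(child, void_elements) for child in elem)
        out = f"{start}{body}</{elem.tag}>"
    return out + escape(elem.tail or "", quote=False)
```

With this code, a <br> comes out as a bare start tag, while an element 
the serializer has never heard of, such as <eventsource>, comes out with 
an end tag. Producing it as void means changing the hardwired set.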

This means that anytime a new void element is introduced into HTML, all 
these implementations need to be updated. Not good. Thus, one would hope 
that all new elements simply are defined so that existing libraries can 
produce them.

So, coming to HTML5, why exactly are we introducing new void elements 
such as <eventsource>, knowing that existing code will need to be 
updated to produce them? The same way HTML5 tries to protect existing 
Web content, it *should* also try to protect existing code, avoiding 
totally needless updates.

Furthermore, what's the expectation for future iterations of HTML5, or 
HTML6? Will there be more void elements, again requiring changes in 
existing producers?

As far as I can tell, there are at least two ways to avoid the problem:

1) Do not introduce new void elements, and state, once and for all, that 
no void elements will be added beyond those in HTML4.

2) Keep introducing new void elements, but always allow non-void 
notation, such as

   <eventsource source="foo"></eventsource>

instead of

   <eventsource source="foo">
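
On the consuming side, option 2 only requires that an end tag on a void 
element is accepted and ignored. The following is a sketch of that idea, 
not a description of how any browser or the HTML5 parsing algorithm 
works. It uses Python's standard html.parser module; the class 
TolerantOutline, the function outline and the three-element VOID set are 
made up for the example:

```python
from html.parser import HTMLParser

# Illustrative subset only; treating eventsource as void is the assumption here.
VOID = frozenset({"br", "meta", "eventsource"})


class TolerantOutline(HTMLParser):
    """Record (depth, tag) per start tag; end tags on void elements are optional."""

    def __init__(self):
        super().__init__()
        self.stack = []    # currently open non-void elements
        self.outline = []  # (nesting depth, tag name) in document order

    def handle_starttag(self, tag, attrs):
        self.outline.append((len(self.stack), tag))
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID:
            return  # redundant end tag on a void element: accept and ignore
        if tag in self.stack:
            # Close everything up to and including the matching open element.
            while self.stack.pop() != tag:
                pass


def outline(markup):
    parser = TolerantOutline()
    parser.feed(markup)
    parser.close()
    return parser.outline
```

Feeding it either notation from above gives the same outline, which is 
all an existing producer would need.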


And yes, the other alternative is to always produce XHTML, which doesn't 
have any of these problems. But then, it doesn't work in IE.

BR, Julian

Received on Saturday, 30 August 2008 18:25:33 UTC