- From: Julian Reschke <julian.reschke@gmx.de>
- Date: Sat, 30 Aug 2008 20:24:41 +0200
- To: "public-html@w3.org" <public-html@w3.org>
We recently discussed the problems of getting XSLT 1.0 processors to produce valid HTML5, with the well-known issue of producing the HTML5 header. While we were discussing this, Henri pointed out that there are more problems to solve, for instance the addition of new void elements in HTML5. I think we should discuss this separately, thus this mail.

I'll try to explain the problem first from the point of view of someone trying to create HTML *programmatically*. When I say *programmatically*, I don't mean printf-style code that just concatenates lines of output, but generic code that can serialize some kind of object representation of the document, such as (gasp) DOM, or code that is event-driven, such as a SAX output handler.

Perhaps the most important change from SGML to XML was that a producer doesn't need to know the DTD of the vocabulary in order to produce a parseable document: for instance, in XML there are no void elements. All elements are simply parsed the same way.

In HTML (originally designed as an SGML vocabulary) things work differently. Elements can be void (such as <meta> or <br>), and producers need to know that. In theory, a producer could use the HTML DTD to find that out (is anybody doing that?). In practice, a common approach is to hardwire the set of void elements present in HTML4, and to assume that any other element can be serialized as non-void. This is the approach used in XSLT 1.0, and also in common Java serializers that just use the XSLT modules. I haven't checked, but I wouldn't be surprised if other libraries worked the same way.

This means that anytime a new void element is introduced into HTML, all these implementations need to be updated. Not good. Thus, one would hope that all new elements are simply defined so that existing libraries can produce them.

So, coming to HTML5, why exactly are we introducing new void elements such as <eventsource>, knowing that existing code will need to be updated to produce them?
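To make the hardwired approach concrete, here is a minimal sketch in Python of a serializer that knows only the HTML4 void elements. It is not taken from any real XSLT or Java library: the `Element` class and the `serialize` function are hypothetical names invented for this illustration, and the void set is the list of EMPTY elements from the HTML 4.01 DTDs.

```python
from html import escape

# Void (EMPTY) elements of HTML 4.01: the set a pre-HTML5 producer would hardwire.
HTML4_VOID_ELEMENTS = frozenset({
    "area", "base", "basefont", "br", "col", "frame",
    "hr", "img", "input", "isindex", "link", "meta", "param",
})


class Element:
    """Hypothetical minimal element node: a name, attributes, and children."""

    def __init__(self, name, attrs=None, children=None):
        self.name = name
        self.attrs = dict(attrs or {})
        self.children = list(children or [])


def serialize(node, void_elements=HTML4_VOID_ELEMENTS):
    """Serialize an Element (or a plain text string) as HTML."""
    if isinstance(node, str):
        return escape(node, quote=False)
    attrs = "".join(
        f' {key}="{escape(value, quote=True)}"' for key, value in node.attrs.items()
    )
    start_tag = f"<{node.name}{attrs}>"
    if node.name in void_elements:
        # Known void element: start tag only, no content and no end tag.
        return start_tag
    # Anything not in the hardwired set is assumed to be non-void.
    content = "".join(serialize(child, void_elements) for child in node.children)
    return f"{start_tag}{content}</{node.name}>"


print(serialize(Element("br")))
print(serialize(Element("eventsource", {"source": "foo"})))
```

With the HTML4 set, the second call emits an explicit end tag for <eventsource>, because the serializer has no way of knowing that the element is void. Getting the void form out of such a producer means changing its hardwired set, for example by passing `HTML4_VOID_ELEMENTS | {"eventsource"}` as the second argument here.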
In the same way that HTML5 tries to protect existing Web content, it *should* also try to protect existing code, avoiding totally needless updates.

Furthermore, what's the expectation for future iterations of HTML5, or HTML6? Will there be more void elements, again requiring changes in existing producers?

As far as I can tell, there are at least two ways to avoid the problem:

1) Do not introduce new void elements, and state, once and for all, that no new void elements will be added beyond those in HTML4.

2) Keep introducing new void elements, but always allow non-void notation, such as

   <eventsource source="foo"></eventsource>

instead of

   <eventsource source="foo">

And yes, the other alternative is to always produce XHTML, which doesn't have any of these problems. But then, it doesn't work in IE.

BR, Julian
Received on Saturday, 30 August 2008 18:25:33 UTC