Re: ISSUE-54 (html5-doctype-vs-xslt): XSLT 1.0 can not generate HTML5 documents [HTML 5 spec]

Some time ago, Henri wrote:

> ...
> The text output mode of XSLT is not meant for tree languages  
> serialized as tags and is, therefore, inappropriate for HTML5.


> The XML output mode of XSLT isn't suited for producing HTML5, because  
> in HTML5 elements with no content and void elements are distinct.
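
A small Python sketch can make the void/empty distinction concrete. Python is only used for illustration here, and `VOID` is a hypothetical subset of the void elements. An XML serializer writes every childless element the same way. text/html instead needs `<br>` for a void element but `<div></div>` for an empty non-void one, because an HTML5 parser ignores the trailing slash on `<div/>` and treats it as a bare start tag.

```python
# Hypothetical subset of HTML5 void elements, for illustration only.
VOID = {"br", "hr", "img", "input", "link", "meta"}


def xml_style_empty(tag: str) -> str:
    # An XML serializer only knows "has content" vs. "no content",
    # so every childless element gets the same self-closing form.
    return f"<{tag}/>"


def html5_style_empty(tag: str) -> str:
    # In text/html, a void element takes no end tag, while an empty
    # non-void element needs an explicit end tag.
    return f"<{tag}>" if tag in VOID else f"<{tag}></{tag}>"


for tag in ("br", "div"):
    print(xml_style_empty(tag), html5_style_empty(tag))
```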


> The HTML output mode of XSLT is seriously flawed when it comes to  
> divorcing the tree representation from the serialization in the  
> programming model:
> HTML5 defines HTML elements to go into the XHTML namespace in order
> to abstract away differences in serialization from programs that
> operate on a namespace-aware tree representation. HTML5 parsers that
> expose XML APIs, to allow unified application internals regardless
> of whether the data came in as text/html or application/xhtml+xml,
> put HTML elements in the XHTML namespace per spec. Moreover, with
> support for MathML and SVG, there can also be element nodes in those
> namespaces. Programs operating on trees shouldn't need different code
> throughout depending on whether the program is targeted at text/html
> or application/xhtml+xml.

I agree that this is a worthwhile goal.

> It should follow that the sane way to create XSLT programs that deal
> with HTML5 would be to write those programs to output HTML elements in
> the XHTML namespace, MathML elements in the MathML namespace and SVG
> elements in the SVG namespace, and not let the choice of text/html vs.
> application/xhtml+xml output mode permeate all the code of the XSLT
> program. Unfortunately, the HTML output mode of XSLT wants HTML
> elements to appear in no namespace.
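
The tree model Henri describes can be sketched with Python's standard ElementTree instead of XSLT. The namespace names below are the standard XHTML and SVG ones. `build_page` and `namespaces_used` are hypothetical helpers. The tree-building code uses only namespaced names and never asks which media type the result will be served as. That decision is left to whatever serializer runs at the end.

```python
import xml.etree.ElementTree as ET

# Standard namespace names from the XHTML and SVG specifications.
XHTML = "http://www.w3.org/1999/xhtml"
SVG = "http://www.w3.org/2000/svg"


def build_page() -> ET.Element:
    # Elements are created in their proper namespaces using Clark
    # notation "{uri}local"; nothing here depends on text/html vs.
    # application/xhtml+xml.
    html = ET.Element(f"{{{XHTML}}}html")
    body = ET.SubElement(html, f"{{{XHTML}}}body")
    p = ET.SubElement(body, f"{{{XHTML}}}p")
    p.text = "A circle:"
    svg = ET.SubElement(body, f"{{{SVG}}}svg", width="10", height="10")
    ET.SubElement(svg, f"{{{SVG}}}circle", cx="5", cy="5", r="4")
    return html


def namespaces_used(root: ET.Element) -> set:
    # Every element carries its namespace URI inside its tag.
    return {el.tag[1:].split("}")[0] for el in root.iter()}


print(sorted(namespaces_used(build_page())))
```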


The fix for this problem is either to generate XHTML first and then run
another XSLT-based post-processing step, or to auto-generate one XSLT
program from the other (for instance, the HTML-generating one from the
XHTML-generating one).
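
The first option (generate XHTML, then post-process) can be sketched in Python with the standard ElementTree module standing in for the second XSLT pass. `strip_xhtml_namespace` is a hypothetical helper. It moves XHTML elements into no namespace so that a classic HTML output method recognizes them, and it leaves SVG or MathML elements alone.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
PREFIX = "{" + XHTML + "}"


def strip_xhtml_namespace(root: ET.Element) -> ET.Element:
    """Move XHTML-namespaced elements into no namespace, in place.

    Elements in other namespaces (e.g. SVG or MathML) are left untouched.
    """
    for el in root.iter():
        # Comments/PIs have non-string tags; skip them.
        if isinstance(el.tag, str) and el.tag.startswith(PREFIX):
            el.tag = el.tag[len(PREFIX):]
    return root


doc = ET.fromstring(
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<body><p>Hi<br/></p></body></html>"
)
strip_xhtml_namespace(doc)
# With no namespaces left, ElementTree's "html" method writes <br> unclosed.
print(ET.tostring(doc, method="html", encoding="unicode"))
```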

This is a known issue, and people have been dealing with it for a long time.

> With Saxon 9, this has two practical problems even without SVG and
> MathML: void elements aren't serialized as void elements with
> method='html' when they are in the XHTML namespace, and prefixes
> aren't removed from elements that are in a namespace (and chances are
> you want to use prefixes with elements in the XSLT program).
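
A comparable effect can be reproduced outside Saxon with Python's standard `xml.etree.ElementTree`. Its `method="html"` output appears to recognize empty elements only by their unprefixed name. This sketch assumes current CPython behaviour and is an analogy, not a statement about Saxon's internals.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"

plain_br = ET.Element("br")              # no namespace
ns_br = ET.Element("{%s}br" % XHTML)     # XHTML namespace

# No namespace: recognized as an HTML empty element, so no end tag.
print(ET.tostring(plain_br, method="html", encoding="unicode"))
# XHTML namespace: written with a prefix and an end tag.
print(ET.tostring(ns_br, method="html", encoding="unicode"))
```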

Well, that's what you get for not using the right namespace :-).

> XSLT 2.0 method='xhtml' in Saxon 9 doesn't have the former problem for  
> old void elements but does have the latter problem.
> All in all, the XSLT built-in serialization modes as implemented in a  
> popular (the most popular?) server-side processor (on the browser  
> side, serialization doesn't matter as the output tree is a DOM  
> straight away) are pretty seriously broken as far as producing HTML5  
> output goes if you don't want to put all the HTML elements in the  
> wrong namespace throughout the XSLT program and pass-through input.

I totally disagree that this means it is broken. Use it the way it was 
designed, and the problem goes away.

> (If you put elements in the wrong namespace throughout your code base  
> of XSLT programs, upgrading to XHTML becomes even harder than it  
> already is for other reasons.)

Again, that is something that can be automated (particularly well with 
XSLT, btw).
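
As a rough illustration of that automation, here is a sketch in Python (ElementTree) rather than XSLT. It moves every literal result element that is in no namespace into the XHTML namespace and leaves XSLT instructions alone. `move_literals_to_xhtml` is a hypothetical helper. It does not handle names computed with `xsl:element`, and writing the result back out with ElementTree would produce generated prefixes unless those are registered.

```python
import xml.etree.ElementTree as ET

XSL = "http://www.w3.org/1999/XSL/Transform"
XHTML = "http://www.w3.org/1999/xhtml"


def move_literals_to_xhtml(stylesheet: ET.Element) -> ET.Element:
    """Put every no-namespace element of a stylesheet into the XHTML namespace."""
    for el in stylesheet.iter():
        # In Clark notation namespaced tags start with "{": XSLT
        # instructions are "{XSL}...", while no-namespace literal result
        # elements have a bare local name.
        if isinstance(el.tag, str) and not el.tag.startswith("{"):
            el.tag = "{%s}%s" % (XHTML, el.tag)
    return stylesheet


src = """<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/"><p><xsl:value-of select="title"/></p></xsl:template>
</xsl:stylesheet>"""
sheet = move_literals_to_xhtml(ET.fromstring(src))
print([el.tag for el in sheet.iter()])
```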

> I think the right way to deal with this is to define an HTML5 output  
> method for XSLT. In the interim, the right way is to take DOM or SAX  
> output from the XSLT processor and to run a DOM-to-HTML5 or SAX-to- 
> HTML5 serializer outside the XSLT processor. (I intend to ship a  
> foreign content-enhanced serializer with the next release of  

I agree that a new HTML5 output mode may be needed at some point in
time. The problem is that we won't be getting it anytime soon.
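
In the meantime, the interim approach Henri describes (serialize the processor's tree output outside XSLT) could look roughly like the following Python sketch. It is not the serializer Henri mentions. `serialize_html5`, `write_element` and `split_tag` are hypothetical names, and the void list is taken from the current HTML standard rather than the 2008 draft. The sketch assumes a tree of elements and text only, with no namespaced attributes. It writes HTML and foreign elements unprefixed, omits end tags for void elements, and self-closes empty SVG/MathML elements. It does not check for `</script` inside raw text, which is one of the non-round-trippable cases Henri mentions.

```python
import xml.etree.ElementTree as ET
from html import escape

XHTML = "http://www.w3.org/1999/xhtml"
SVG = "http://www.w3.org/2000/svg"

# Void elements as listed in the current HTML standard (assumption).
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}
# HTML elements whose text content is written without entity escaping.
RAW_TEXT = {"script", "style"}


def split_tag(tag):
    """Split Clark notation "{uri}local" into (uri, local); "" = no namespace."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return "", tag


def write_element(el, out):
    uri, local = split_tag(el.tag)
    is_html = uri in ("", XHTML)  # both count as HTML elements
    attrs = "".join(f' {k}="{escape(v, quote=True)}"' for k, v in el.attrib.items())
    out.append(f"<{local}{attrs}")  # never a prefix: text/html has none
    if is_html and local in VOID:
        out.append(">")  # void element: no end tag, any content dropped
    elif not is_html and el.text is None and len(el) == 0:
        out.append("/>")  # empty foreign element may use self-closing syntax
    else:
        out.append(">")
        if el.text:
            raw = is_html and local in RAW_TEXT
            out.append(el.text if raw else escape(el.text, quote=False))
        for child in el:
            write_element(child, out)
        out.append(f"</{local}>")
    if el.tail:
        out.append(escape(el.tail, quote=False))


def serialize_html5(root):
    out = ["<!DOCTYPE html>"]
    write_element(root, out)
    return "".join(out)


doc = ET.fromstring(
    f'<html xmlns="{XHTML}"><body><p>a &amp; b</p>'
    '<video><source src="x.ogg"/></video>'
    f'<svg xmlns="{SVG}"><circle r="4"/></svg></body></html>'
)
print(serialize_html5(doc))
```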

> Even with an HTML5-specific serializer, there's the problem that an  
> XSLT program can create trees that aren't round-trippably serializable  
> as text/html. However, this is not a problem introduced by HTML5. It's  
> a problem method='html' has regardless of the existence of HTML5.

So that's irrelevant for this discussion.

> Finally, in the bug report, interim workarounds based on disabling
> output escaping were countered with the argument that disabling
> output escaping is an optional feature. I'm tempted to point out that
> serialization as a whole is an optional feature in XSLT 2.0, so the
> whole premise of the bug report is that you are using an optional
> feature (a built-in output mode). (I do think that disabling output
> escaping isn't a proper solution.)

In yesterday's conference call, another problem was mentioned: HTML5
introduces new void elements, which of course the HTML output mode in
existing XSLT engines does not know about.
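
A short Python sketch shows what that means in practice. `HTML4_EMPTY` is HTML 4.01's list of EMPTY elements. `HTML5_VOID` is the current HTML standard's void list, used as an assumed stand-in for whatever list applies. `serialize_empty` is a hypothetical helper. An output method that only knows the HTML 4 list writes an end tag for the newer void elements. An HTML5 parser reports such a stray end tag as a parse error and, for most elements, ignores it, so the tree usually survives but the markup is non-conforming.

```python
# HTML 4.01's EMPTY elements: all an HTML-4-era output method knows about.
HTML4_EMPTY = {"area", "base", "basefont", "br", "col", "frame", "hr", "img",
               "input", "isindex", "link", "meta", "param"}
# Void elements of the current HTML standard (assumed list, for illustration).
HTML5_VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
              "link", "meta", "source", "track", "wbr"}


def serialize_empty(tag, empty_set):
    # A childless element gets no end tag only if the serializer's list has it.
    return f"<{tag}>" if tag in empty_set else f"<{tag}></{tag}>"


for tag in sorted(HTML5_VOID - HTML4_EMPTY):
    print(serialize_empty(tag, HTML4_EMPTY), "vs", serialize_empty(tag, HTML5_VOID))
```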

Thoughts on that:

- we have evidence that one HTML-producing framework (XSLT) is affected
by this, so it wouldn't come as a surprise if there are more --
wouldn't it be better to define these elements so that the non-empty
syntax is allowed as well?

- what's the extensibility story for future versions of HTML (HTML6)?
Can they introduce new empty elements, causing the same type of breakage?

BR, Julian

Received on Friday, 22 August 2008 16:41:18 UTC