Re: ISSUE-54 (html5-doctype-vs-xslt): XSLT 1.0 can not generate HTML5 documents [HTML 5 spec]

On Jun 26, 2008, at 11:33, HTML Issue Tracking Issue Tracker wrote:

> ISSUE-54 (html5-doctype-vs-xslt): XSLT 1.0 can not generate HTML5  
> documents [HTML 5 spec]

> Of XSLT's output modes (XML/HTML/TEXT), none can currently be used  
> to produce the HTML5 doctype string, as defined in < 
> >.

I disagree with the simplified framing of the issue, since it gives  
the wrong idea of how little fixing is needed and where the sensible  
place for the fix is. The doctype is the least of the problems with  

The text output mode of XSLT is not meant for tree languages  
serialized as tags and is, therefore, inappropriate for HTML5.

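The XML output mode of XSLT isn't suited for producing HTML5, because  
in HTML5, elements with no content and void elements are distinct. An  
XML serializer is free to write an empty script element as <script/>,  
which an HTML5 parser treats as a start tag only, so the markup that  
follows is swallowed as script text. A void element such as br, on the  
other hand, must not get an end tag.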
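
To make the void/empty distinction concrete, here is a minimal sketch  
using Python's xml.etree.ElementTree as a stand-in for a generic XML  
serializer (it is not an XSLT processor, and the element names are  
just examples). A serializer that only knows "empty vs. non-empty" can  
either self-close every empty element or give every element an end  
tag, and neither choice is right for both script and br:

```python
import xml.etree.ElementTree as ET

# ElementTree stands in for a generic XML serializer: it knows whether an
# element has content, but has no notion of HTML void elements.
script = ET.Element("script", src="app.js")  # empty, but not void in HTML
br = ET.Element("br")                        # void in HTML

# Default behaviour: every empty element gets the XML self-closing syntax.
self_closed = ET.tostring(script, encoding="unicode")

# Forcing explicit end tags suits <script> but also gives <br> an end tag.
explicit = ET.tostring(br, encoding="unicode", short_empty_elements=False)

print(self_closed)  # <script src="app.js" />
print(explicit)     # <br></br>
```

Served as text/html, the first string opens a script element that  
swallows whatever follows it, and the second is parsed as two br  
elements, because an HTML5 parser treats </br> as <br>.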

The HTML output mode of XSLT is seriously flawed when it comes to  
divorcing the tree representation from the serialization in the  
programming model:

HTML5 defines HTML elements to go into the XHTML namespace in order to  
abstract away the difference in serialization from programs that  
operate on a namespace-aware tree representation. HTML5 parsers that  
expose XML APIs, to allow unified application internals regardless of  
whether the data came in as text/html or application/xhtml+xml, put  
HTML elements in the XHTML namespace per spec. Moreover, with support  
for MathML and SVG, there can also be element nodes in those  
namespaces. Programs operating on trees shouldn't need different code  
throughout depending on whether the program is targeted at text/html  
or application/xhtml+xml.
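
A minimal sketch of what "the same code regardless of serialization"  
means, assuming a namespace-aware tree API (Python's ElementTree here).  
Python's standard library has no HTML5 tree builder, so the text/html  
tree is built by hand to mirror what an HTML5 parser would produce  
(simplified: no head element). The helper count_elements is  
hypothetical:

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def count_elements(root, ns, local):
    """Count elements by (namespace URI, local name); prefixes never matter."""
    return sum(1 for _ in root.iter("{%s}%s" % (ns, local)))


# Tree parsed from an application/xhtml+xml document (prefixed, even).
from_xhtml = ET.fromstring(
    '<h:html xmlns:h="http://www.w3.org/1999/xhtml">'
    "<h:body><h:p>one</h:p><h:p>two</h:p></h:body></h:html>"
)

# Hand-built stand-in for what an HTML5 parser would produce from the
# text/html input "<html><body><p>one<p>two</html>" (head omitted).
from_html = ET.Element("{%s}html" % XHTML_NS)
body = ET.SubElement(from_html, "{%s}body" % XHTML_NS)
for text in ("one", "two"):
    ET.SubElement(body, "{%s}p" % XHTML_NS).text = text

# The same application code works on both trees.
print(count_elements(from_xhtml, XHTML_NS, "p"))  # 2
print(count_elements(from_html, XHTML_NS, "p"))   # 2
```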

It follows that the sane way to create XSLT programs that deal with  
HTML5 would be to write those programs to output HTML elements in the  
XHTML namespace, MathML elements in the MathML namespace and SVG  
elements in the SVG namespace, and not to let the choice of text/html  
vs. application/xhtml+xml output mode permeate all the code of the  
XSLT program. Unfortunately, the HTML output mode of XSLT wants HTML  
elements to appear in no namespace. With Saxon 9, this has two  
practical problems even without SVG and MathML: void elements aren't  
serialized as void elements with method='html' when they are in the  
XHTML namespace, and prefixes aren't removed from elements that are in  
a namespace (and chances are you want to use prefixes with elements in  
the XSLT program).
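
The prefix problem is easy to reproduce with any generic  
namespace-aware XML serializer. A minimal sketch, again using Python's  
ElementTree rather than Saxon, so the prefix it picks is ElementTree's  
own choice and not what Saxon would emit:

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# An HTML br element in the XHTML namespace, as a namespace-aware program
# would create it.
br = ET.Element("{%s}br" % XHTML_NS)

# A generic XML serializer must express the namespace somehow; ElementTree
# maps this namespace to its built-in "html" prefix.
serialized = ET.tostring(br, encoding="unicode")
print(serialized)  # <html:br xmlns:html="http://www.w3.org/1999/xhtml" />
```

Served as text/html, an HTML5 parser reads <html:br ...> as an unknown  
element whose tag name is literally "html:br", not as a line break.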

XSLT 2.0 method='xhtml' in Saxon 9 doesn't have the former problem for  
old void elements but does have the latter problem.

All in all, the XSLT built-in serialization modes as implemented in a  
popular (the most popular?) server-side processor (on the browser  
side, serialization doesn't matter as the output tree is a DOM  
straight away) are pretty seriously broken as far as producing HTML5  
output goes if you don't want to put all the HTML elements in the  
wrong namespace throughout the XSLT program and pass-through input.  
(If you put elements in the wrong namespace throughout your code base  
of XSLT programs, upgrading to XHTML becomes even harder than it  
already is for other reasons.)

I think the right way to deal with this is to define an HTML5 output  
method for XSLT. In the interim, the right way is to take DOM or SAX  
output from the XSLT processor and to run a DOM-to-HTML5 or SAX-to- 
HTML5 serializer outside the XSLT processor. (I intend to ship a  
foreign content-enhanced serializer with the next release of  
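
For illustration only, here is a minimal sketch of the SAX-to-HTML5  
approach in Python, using xml.sax with namespace processing enabled.  
It is not the serializer mentioned above: the class HTML5Serializer and  
the helper to_html5 are hypothetical, the XML input stands in for the  
XSLT processor's SAX output, and comments, processing instructions and  
most attribute namespaces are ignored. It keys everything on (namespace,  
local name), drops prefixes, writes the HTML5 doctype, omits end tags  
only for void elements in the XHTML namespace, and leaves script/style  
text unescaped:

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces
from xml.sax.saxutils import escape, quoteattr

XHTML_NS = "http://www.w3.org/1999/xhtml"
XLINK_NS = "http://www.w3.org/1999/xlink"

# HTML void elements: a start tag only, never an end tag.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "param", "source", "track", "wbr"}
# Raw text elements: their text must not be entity-escaped.
RAW_TEXT = {"script", "style"}


class HTML5Serializer(ContentHandler):
    """Hypothetical, simplified SAX-to-HTML5 serializer."""

    def __init__(self, out):
        super().__init__()
        self.out = out
        self.stack = []  # (namespace URI, local name) of open elements

    def startDocument(self):
        self.out.write("<!DOCTYPE html>")

    def startElementNS(self, name, qname, attrs):
        ns, local = name
        self.stack.append((ns, local))
        # Prefixes are dropped: text/html identifies elements by local name,
        # and the namespace is implied by context (HTML, SVG or MathML).
        self.out.write("<" + local)
        for (a_ns, a_local), value in attrs.items():
            a_name = "xlink:" + a_local if a_ns == XLINK_NS else a_local
            self.out.write(" %s=%s" % (a_name, quoteattr(value)))
        self.out.write(">")

    def endElementNS(self, name, qname):
        ns, local = self.stack.pop()
        if ns == XHTML_NS and local in VOID:
            return  # void elements never get an end tag
        self.out.write("</%s>" % local)

    def characters(self, content):
        ns, local = self.stack[-1] if self.stack else (None, None)
        if ns == XHTML_NS and local in RAW_TEXT:
            self.out.write(content)  # raw text: written verbatim
        else:
            self.out.write(escape(content))


def to_html5(xml_text):
    """Parse XML with namespace processing on and re-serialize as HTML5."""
    out = io.StringIO()
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    parser.setContentHandler(HTML5Serializer(out))
    parser.parse(io.BytesIO(xml_text.encode("utf-8")))
    return out.getvalue()


print(to_html5(
    '<h:p xmlns:h="http://www.w3.org/1999/xhtml">a<h:br/>b</h:p>'))
# <!DOCTYPE html><p>a<br>b</p>
```

The prefixed input shows both Saxon issues going away at once: the h:  
prefix is gone, and br comes out as a void element even though it is  
in the XHTML namespace.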

Even with an HTML5-specific serializer, there's the problem that an  
XSLT program can create trees that aren't round-trippably serializable  
as text/html. However, this is not a problem introduced by HTML5. It's  
a problem method='html' has regardless of the existence of HTML5.
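
A rough sketch of the kind of check this implies, assuming the tree is  
an ElementTree with HTML elements in the XHTML namespace. The function  
roundtrip_problems is hypothetical and covers only two well-known  
cases, not the full set: a p nested directly in a p (on reparsing, the  
inner start tag implicitly closes the outer p) and script or style text  
containing its own end tag (raw text has no escaping mechanism):

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"
RAW_TEXT = ("script", "style")


def roundtrip_problems(root):
    """Report two known reasons a tree cannot survive text/html serialization
    followed by reparsing. Hypothetical helper; not an exhaustive check."""
    problems = []
    for elem in root.iter():
        # <p><p>x</p></p> reparses as sibling paragraphs, not nested ones.
        if elem.tag == XHTML + "p":
            if any(child.tag == XHTML + "p" for child in elem):
                problems.append("p nested directly in p")
        # Raw text cannot escape its own end tag, e.g. "</script" in a script.
        for name in RAW_TEXT:
            if elem.tag == XHTML + name:
                text = "".join(elem.itertext()).lower()
                if "</" + name in text:
                    problems.append(name + " text contains its own end tag")
    return problems


doc = ET.fromstring(
    '<body xmlns="http://www.w3.org/1999/xhtml">'
    "<p><p>x</p></p>"
    '<script>s = "&lt;/script&gt;"</script>'
    "</body>"
)
print(roundtrip_problems(doc))
# ['p nested directly in p', 'script text contains its own end tag']
```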

Finally, in the bug report, interim workarounds based on disabling  
output escaping were countered with the argument that disabling output  
escaping is an optional feature. I'm tempted to point out that  
serialization as a whole is an optional feature in XSLT 2.0, so the  
whole premise of the bug report is that you are using an optional  
feature (a built-in output mode). (I do think that disabling output  
escaping isn't a proper solution.)

Henri Sivonen

Received on Thursday, 26 June 2008 11:27:56 UTC