Re: ISSUE-54 (html5-doctype-vs-xslt): XSLT 1.0 can not generate HTML5 documents [HTML 5 spec] from Julian Reschke on 2008-08-22 (public-html@w3.org from August 2008)

From: Julian Reschke <julian.reschke@gmx.de>
Date: Fri, 22 Aug 2008 18:40:19 +0200
To: public-html@w3.org
Message-ID: <48AEEBF3.2010302@gmx.de>

Some time ago
(<http://lists.w3.org/Archives/Public/public-html/2008Jun/0320.html>),
Henri wrote:

> ...
> The text output mode of XSLT is not meant for tree languages  
> serialized as tags and is, therefore, inappropriate for HTML5.

Yes.

> The XML output mode of XSLT isn't suited for producing HTML5, because  
> in HTML5 elements with no content and void elements are distinct.

Yes.

> The HTML output mode of XSLT is seriously flawed when it comes to  
> divorcing the tree representation from the serialization in the  
> programming model:
> 
> HTML5 defines HTML elements to go into the
> "http://www.w3.org/1999/xhtml" namespace in order to abstract away the
> difference of serialization from programs that operate on a
> namespace-aware tree representation. HTML5 parsers that expose XML APIs
> to allow unified application internals regardless of whether the data
> came in as text/html or application/xhtml+xml put HTML elements in the
> "http://www.w3.org/1999/xhtml" namespace per spec. Moreover, with
> support for MathML and SVG, there can also be element nodes in those
> namespaces. Programs operating on trees shouldn't have to have
> different code throughout depending on whether the program is targeted
> at text/html or application/xhtml+xml.

I agree that this is a worthwhile goal.

> It should follow that the sane way to create XSLT programs that deal
> with HTML5 would be to write those programs to output HTML elements in
> the "http://www.w3.org/1999/xhtml" namespace, MathML elements in the
> "http://www.w3.org/1998/Math/MathML" namespace and SVG elements in the
> "http://www.w3.org/2000/svg" namespace, and not let the choice of
> text/html vs. application/xhtml+xml output mode permeate all the code
> of the XSLT program. Unfortunately, the HTML output mode of XSLT wants
> HTML elements to appear in no namespace.

Yes.

The fix for this problem is to either first generate XHTML, and then run 
another XSLT-based postprocessing step, or to auto-generate one XSLT 
from the other (for instance, the HTML-generating one from the 
XHTML-generating one).

This is a known issue, and people have been dealing with it for a long time.
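
To make the postprocessing idea concrete, here is a rough sketch of the
namespace-stripping step. It is written in Python with the standard
library rather than as a second XSLT stylesheet, purely for
illustration; the function name strip_xhtml_namespace and the sample
input are made up for this example. It only moves XHTML elements into
no namespace, which is what the HTML output method expects, and leaves
elements in other namespaces (SVG, MathML) alone.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def strip_xhtml_namespace(root):
    """Move every element in the XHTML namespace into no namespace (in place).

    Hypothetical helper for illustration; elements in other namespaces
    keep their namespace.
    """
    prefix = "{" + XHTML_NS + "}"
    for el in root.iter():
        # ElementTree stores namespaced tags as "{namespace-uri}localname".
        if isinstance(el.tag, str) and el.tag.startswith(prefix):
            el.tag = el.tag[len(prefix):]
    return root


# Sample input standing in for the output of an XHTML-generating transform.
doc = ET.fromstring(
    '<h:html xmlns:h="http://www.w3.org/1999/xhtml">'
    "<h:body><h:p>Hello<h:br/>world</h:p></h:body></h:html>"
)
strip_xhtml_namespace(doc)
tags = [el.tag for el in doc.iter()]
result = ET.tostring(doc, encoding="unicode")
```

The resulting tree has plain "html", "body", "p" and "br" elements and
can be handed to an HTML serializer. In XSLT itself the same thing is
usually done with a modified identity transform.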

> With Saxon 9, this has two practical problems even without SVG and  
> MathML: Void elements aren't serialized as void elements with  
> method='html' when they are in the "http://www.w3.org/1999/xhtml"  
> namespace and prefixes aren't removed on elements that aren't in no  
> namespace (and chances are you want to use prefixes with elements in  
> the XSLT program).

Well, that's what you get for not using the right namespace :-).

> XSLT 2.0 method='xhtml' in Saxon 9 doesn't have the former problem for  
> old void elements but does have the latter problem.
> 
> All in all, the XSLT built-in serialization modes as implemented in a  
> popular (the most popular?) server-side processor (on the browser  
> side, serialization doesn't matter as the output tree is a DOM  
> straight away) are pretty seriously broken as far as producing HTML5  
> output goes if you don't want to put all the HTML elements in the  
> wrong namespace throughout the XSLT program and pass-through input.

I totally disagree that this means it is broken. Use it the way it was 
designed, and the problem goes away.

> (If you put elements in the wrong namespace throughout your code base  
> of XSLT programs, upgrading to XHTML becomes even harder than it  
> already is for other reasons.)

Again, that is something that can be automated (particularly well with 
XSLT, btw).

> I think the right way to deal with this is to define an HTML5 output
> method for XSLT. In the interim, the right way is to take DOM or SAX
> output from the XSLT processor and to run a DOM-to-HTML5 or
> SAX-to-HTML5 serializer outside the XSLT processor. (I intend to ship a
> foreign content-enhanced serializer with the next release of
> XSLT4HTML5.)

I agree that a new HTML5 output mode may be needed at some point. The
problem is that we won't be getting it anytime soon.

> Even with an HTML5-specific serializer, there's the problem that an  
> XSLT program can create trees that aren't round-trippably serializable  
> as text/html. However, this is not a problem introduced by HTML5. It's  
> a problem method='html' has regardless of the existence of HTML5.

So that's irrelevant for this discussion.

> Finally, in the bug report, interim workarounds based on disabling  
> output escaping were countered on the grounds that disabling output  
> escaping is an optional feature. I'm tempted to point out that serialization as a  
> whole is an optional feature in XSLT 2.0, so the whole premise of the  
> bug report is that you are using an optional feature (a built-in  
> output mode). (I do think that disabling output escaping isn't a  
> proper solution.)

In yesterday's conference call, another problem was mentioned: HTML5 
introduces new void elements 
(<http://www.w3.org/html/wg/html5/#void-elements>), which of course the 
HTML output mode in existing XSLT engines does not know about.

Thoughts on that:

- we have evidence that one HTML-producing framework (XSLT) is affected 
by that, so it wouldn't come as a surprise if there are more -- 
wouldn't it be better to define these elements so that the non-empty 
syntax is allowed as well?

- what's the extensibility story for future versions of HTML (HTML6)? 
Can they introduce new empty elements, causing the same type of breakage 
again?
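
To illustrate the void element problem, here is a minimal sketch of a
serializer whose list of void elements is fixed, as it is in the HTML
output method. It is Python, not what any XSLT engine actually does
internally; serialize_html and the sample markup are invented for this
example, and it ignores details such as raw text in script and style.
HTML4_VOID is the list of empty elements that XSLT 1.0 gives for HTML
4.0; "source" is used as an example of a void element new in HTML5.

```python
import xml.etree.ElementTree as ET
from html import escape

# Empty elements named for HTML 4.0 by the XSLT 1.0 html output method.
HTML4_VOID = frozenset({
    "area", "base", "basefont", "br", "col", "frame", "hr",
    "img", "input", "isindex", "link", "meta", "param",
})


def serialize_html(el, void_elements):
    """Serialize a no-namespace element tree as HTML (illustrative only)."""
    attrs = "".join(
        f' {name}="{escape(value, quote=True)}"'
        for name, value in el.attrib.items()
    )
    out = f"<{el.tag}{attrs}>"
    if el.tag not in void_elements:
        # Non-void elements get their content and an explicit end tag.
        out += escape(el.text or "", quote=False)
        for child in el:
            out += serialize_html(child, void_elements)
            out += escape(child.tail or "", quote=False)
        out += f"</{el.tag}>"
    return out


tree = ET.fromstring('<div><video><source src="a.ogg"/></video><br/></div>')

# A serializer that only knows the HTML 4 list emits an end tag for <source>.
old_output = serialize_html(tree, HTML4_VOID)
# Teaching it about the new void element requires changing the serializer.
new_output = serialize_html(tree, HTML4_VOID | {"source"})
```

With the HTML 4 list, old_output contains "<source src="a.ogg"></source>";
only the extended list yields "<source src="a.ogg">" without an end tag.
Any future addition to the set of void elements needs the same kind of
serializer update.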

BR, Julian
Received on Friday, 22 August 2008 16:41:18 UTC