- From: Henri Sivonen <hsivonen@iki.fi>
- Date: Thu, 26 Jun 2008 14:27:13 +0300
- To: HTML Issue Tracking WG <public-html@w3.org>
On Jun 26, 2008, at 11:33, HTML Issue Tracking Issue Tracker wrote:

> ISSUE-54 (html5-doctype-vs-xslt): XSLT 1.0 can not generate HTML5
> documents [HTML 5 spec]
>
> Of XSLT's output modes (XML/HTML/TEXT), none can currently be used
> to produce the HTML5 doctype string, as defined in
> <http://www.w3.org/html/wg/html5/#the-doctype>.

I disagree with the simplified framing of the issue, since it gives the wrong idea of how little fixing is needed and where the sensible place for the fix is. The doctype is the least of the problems with XSLT and HTML5.

The text output mode of XSLT is not meant for tree languages serialized as tags and is, therefore, inappropriate for HTML5.

The XML output mode of XSLT isn't suited for producing HTML5, because in HTML5 elements with no content and void elements are distinct.

The HTML output mode of XSLT is seriously flawed when it comes to divorcing the tree representation from the serialization in the programming model:

HTML5 defines HTML elements to go into the "http://www.w3.org/1999/xhtml" namespace in order to abstract away the difference of serialization from programs that operate on a namespace-aware tree representation. HTML5 parsers that expose XML APIs to allow unified application internals regardless of whether the data came in as text/html or application/xhtml+xml put HTML elements in the "http://www.w3.org/1999/xhtml" namespace per spec. Moreover, with support for MathML and SVG, there can also be element nodes in those namespaces. Programs operating on trees shouldn't have to have different code throughout depending on whether the program is targeted at text/html or application/xhtml+xml.
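To make the point about the XML output mode concrete, here is a minimal Python sketch. It uses the standard library's xml.dom.minidom purely as a stand-in for an XML-mode serializer; it is not Saxon or any XSLT processor, and the sample markup is made up for illustration. An XML serializer has no reason to treat an empty div differently from a br, so both come out in the same self-closing form.

```python
from xml.dom import minidom

# A tiny tree with one empty non-void element (div) and one void element (br),
# both in the XHTML namespace. The markup is a made-up example.
doc = minidom.parseString(
    '<body xmlns="http://www.w3.org/1999/xhtml"><div></div><br></br></body>'
)

# XML serialization collapses every childless element to the same form,
# so the empty-vs-void distinction that text/html needs is lost.
xml_out = doc.documentElement.toxml()
print(xml_out)
```

In text/html, the trailing slash on a non-void element such as div is ignored, so a consumer would see an unclosed start tag and not an empty element.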
It should follow that the sane way to create XSLT programs that deal with HTML5 would be to write those programs to output HTML elements in the "http://www.w3.org/1999/xhtml" namespace, MathML elements in the "http://www.w3.org/1998/Math/MathML" namespace and SVG elements in the "http://www.w3.org/2000/svg" namespace, and not let the choice of text/html vs. application/xhtml+xml output mode permeate all the code of the XSLT program.

Unfortunately, the HTML output mode of XSLT wants HTML elements to appear in no namespace. With Saxon 9, this has two practical problems even without SVG and MathML: void elements aren't serialized as void elements with method='html' when they are in the "http://www.w3.org/1999/xhtml" namespace, and prefixes aren't removed from elements that are in a namespace (and chances are you want to use prefixes with elements in the XSLT program). XSLT 2.0 method='xhtml' in Saxon 9 doesn't have the former problem for old void elements but does have the latter problem.

All in all, the XSLT built-in serialization modes as implemented in a popular (the most popular?) server-side processor are pretty seriously broken as far as producing HTML5 output goes, if you don't want to put all the HTML elements in the wrong namespace throughout the XSLT program and pass-through input. (On the browser side, serialization doesn't matter, as the output tree is a DOM straight away.) If you put elements in the wrong namespace throughout your code base of XSLT programs, upgrading to XHTML becomes even harder than it already is for other reasons.

I think the right way to deal with this is to define an HTML5 output method for XSLT. In the interim, the right way is to take DOM or SAX output from the XSLT processor and to run a DOM-to-HTML5 or SAX-to-HTML5 serializer outside the XSLT processor. (I intend to ship a foreign content-enhanced serializer with the next release of XSLT4HTML5.)
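A rough idea of what such an external DOM-to-HTML5 serializer does can be sketched in Python. This is only an illustration under stated assumptions, not the XSLT4HTML5 serializer: the function name serialize, the VOID and RAW_TEXT sets, and the sample input are all hypothetical, the void element list follows the current HTML specification and not the 2008 draft, and nothing here checks that the tree is actually serializable as text/html.

```python
from xml.dom import Node, minidom
from xml.sax.saxutils import escape

XHTML_NS = "http://www.w3.org/1999/xhtml"
XMLNS_NS = "http://www.w3.org/2000/xmlns/"

# Assumed void element list (taken from the current HTML specification).
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "param", "source", "track", "wbr"}
# Elements whose text content is written without escaping.
RAW_TEXT = {"script", "style"}


def serialize(node):
    """Hypothetical helper: serialize a namespace-aware DOM as text/html."""
    if node.nodeType == Node.DOCUMENT_NODE:
        body = "".join(serialize(child) for child in node.childNodes)
        return "<!DOCTYPE html>\n" + body
    if node.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
        parent = node.parentNode
        if (parent is not None and parent.namespaceURI == XHTML_NS
                and parent.localName in RAW_TEXT):
            return node.data
        return escape(node.data)
    if node.nodeType == Node.COMMENT_NODE:
        return "<!--" + node.data + "-->"
    if node.nodeType != Node.ELEMENT_NODE:
        return ""  # doctype and processing instruction nodes are dropped

    name = node.localName  # the prefix is dropped on output
    parts = ["<" + name]
    attrs = node.attributes
    for i in range(attrs.length):
        attr = attrs.item(i)
        if attr.namespaceURI == XMLNS_NS:
            continue  # namespace declarations have no place in text/html
        parts.append(' %s="%s"' % (attr.name, escape(attr.value, {'"': "&quot;"})))
    parts.append(">")
    if node.namespaceURI == XHTML_NS and name in VOID:
        return "".join(parts)  # void elements get no end tag
    parts.extend(serialize(child) for child in node.childNodes)
    parts.append("</" + name + ">")
    return "".join(parts)


# Made-up input: prefixed elements in the XHTML namespace, as an XSLT
# program might produce them.
doc = minidom.parseString(
    '<h:html xmlns:h="http://www.w3.org/1999/xhtml"><h:body>'
    '<h:div></h:div><h:br/><h:p class="a">x &amp; y</h:p>'
    '</h:body></h:html>'
)
html = serialize(doc)
print(html)
```

The XSLT program keeps its elements in the proper namespaces, and the decision about void elements, prefixes and the doctype is made in one place at the end of the pipeline.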
Even with an HTML5-specific serializer, there's the problem that an XSLT program can create trees that aren't round-trippably serializable as text/html. However, this is not a problem introduced by HTML5. It's a problem method='html' has regardless of the existence of HTML5.

Finally, in the bug report, interim workarounds based on disabling output escaping were countered on the grounds that disabling output escaping is an optional feature. I'm tempted to point out that serialization as a whole is an optional feature in XSLT 2.0, so the whole premise of the bug report is that you are using an optional feature (a built-in output mode). (I do think that disabling output escaping isn't a proper solution.)

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
Received on Thursday, 26 June 2008 11:27:56 UTC