Re: ISSUE-54 (html5-doctype-vs-xslt): XSLT 1.0 can not generate HTML5 documents [HTML 5 spec] from Henri Sivonen on 2008-08-28 (public-html@w3.org from August 2008)

From: Henri Sivonen <hsivonen@iki.fi>
Date: Thu, 28 Aug 2008 13:08:31 +0300
To: Jirka Kosek <jirka@kosek.cz>
Cc: HTML Issue Tracking WG <public-html@w3.org>
Message-Id: <E6CB2E03-834A-44B3-82D9-FEDAB747C26B@iki.fi>
On Jul 5, 2008, at 00:44, Jirka Kosek wrote:

> Henri Sivonen wrote:
>
>> I disagree with the simplified framing of the issue, since it gives  
>> the wrong idea of how little fixing is needed and where the  
>> sensible place for the fix is. The doctype is the least of the  
>> problems with XSLT and HTML5.
>
> Hi Henri, actually there are two issues. One is very simple -- how to
> allow producing of HTML5 compliant output with *existing* XSLT language
> and its implementation. This issue is very important because it is very
> common approach for producing HTML content. Moreover even HTML WG
> charter explicitly states that "legacy implementation" of "classic HTML"
> should be taken into account. And XSLT could be considered as such
> legacy application.

To use the existing XSLT language and implementations, one needs to
use <xsl:text disable-output-escaping="yes">. It's the
document.write() of the XML community, so it's ugly. However, it can
be done. It is an optional feature, but then having a serializer at
all in an XSLT processor is optional. For the optionality to be a
problem, it would need to be shown that there are notable
implementations that do implement the optional
<xsl:output method="html"/> but don't implement the optional
disable-output-escaping="yes".
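
A minimal sketch of that workaround, assuming the default TrAX engine of a stock JDK (which honours disable-output-escaping; other processors may ignore it, since the feature is optional). The class name, the stylesheet and the sample input are hypothetical and only serve the illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class; names and sample data are illustrative only.
final class DoctypeViaDoeSketch {
    // XSLT 1.0 stylesheet: the doctype is written as raw text, because the
    // html output method cannot be asked for a bare <!DOCTYPE html>.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='html'/>"
      + "<xsl:template match='/'>"
      + "<xsl:text disable-output-escaping='yes'>&lt;!DOCTYPE html&gt;&#10;</xsl:text>"
      + "<html><head><title><xsl:value-of select='/doc/title'/></title></head>"
      + "<body><p><xsl:value-of select='/doc/para'/></p></body></html>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Runs the stylesheet over the given XML and returns the serialized output.
    static String transform(String inputXml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(inputXml)),
                              new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(
            transform("<doc><title>Hello</title><para>World</para></doc>"));
    }
}
```

If the processor does not support the optional feature, the doctype would come out escaped (as &lt;!DOCTYPE html&gt;) rather than as markup, so the output is worth checking once per engine.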

> Of course there is second issue on which you really elaborate in your
> email and this is how to extend some *future version* of XSLT language
> and its implementation to support all bits of HTML5. I almost agree with
> your analysis on this issue.

The issues can be fixed without changing the XSLT language. I released  
version 1.1.0 of the Validator.nu HTML Parser the other day. The  
package comes with a sample program that uses an unmodified XSLT  
engine (whatever you have set as the TrAX default) with an HTML5  
parser and an HTML5 serializer. There's running code for addressing  
the issues *today*.
http://about.validator.nu/htmlparser/
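
The shape of such a pipeline can be sketched with plain TrAX. This is not the sample program from the package: as a stand-in for the HTML5 parser it uses the JDK's XML parser, and as a stand-in for the HTML5 serializer it uses a hypothetical Recorder handler that only records what it receives. The point is that the XSLT engine sits unmodified between a SAX event source and a SAX event sink.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch; the parser and serializer here are stand-ins.
final class TraxSaxPipelineSketch {
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns:h='http://www.w3.org/1999/xhtml'"
      + " xmlns='http://www.w3.org/1999/xhtml'"
      + " exclude-result-prefixes='h'>"
      + "<xsl:template match='/'>"
      + "<html><head><title><xsl:value-of select='//h:title'/></title></head>"
      + "<body><h1><xsl:value-of select='//h:title'/></h1></body></html>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Stand-in for a serializer: records element names and text content.
    static final class Recorder extends DefaultHandler {
        final List<String> elements = new ArrayList<>();
        final StringBuilder text = new StringBuilder();

        @Override
        public void startElement(String uri, String local, String qName,
                                 Attributes atts) {
            elements.add(qName);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    static Recorder run(String xhtml) throws Exception {
        // Whatever engine is the TrAX default; it is used as-is.
        SAXTransformerFactory factory =
            (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler xslt = factory.newTransformerHandler(
            new StreamSource(new StringReader(STYLESHEET)));

        // Output side: SAX events from the transformation go to the sink.
        Recorder sink = new Recorder();
        xslt.setResult(new SAXResult(sink));

        // Input side: any namespace-aware XMLReader can feed the engine.
        SAXParserFactory parsers = SAXParserFactory.newInstance();
        parsers.setNamespaceAware(true);
        XMLReader reader = parsers.newSAXParser().getXMLReader();
        reader.setContentHandler(xslt);
        reader.parse(new InputSource(new StringReader(xhtml)));
        return sink;
    }

    public static void main(String[] args) throws Exception {
        Recorder r = run("<html xmlns='http://www.w3.org/1999/xhtml'>"
            + "<head><title>Hello</title></head><body/></html>");
        System.out.println(r.elements + " " + r.text);
    }
}
```

Swapping the XML parser for an HTML5 parser that exposes the XMLReader interface, and the Recorder for a real serializer, would leave the transformation step untouched.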

On the serialization side, it is up to the programmer of the XSLT  
transformation to make sure that the output tree is conforming XHTML5  
+ SVG 1.1 + MathML 2.0. If it isn't, the serialization results can be  
wildly wrong. However, this isn't worse than
<xsl:output method="html"/>, since it, too, produces wrong results if
the XSLT programmer doesn't make sure the output trees are sanely
shaped HTML 4 trees.

>> HTML5 defines HTML elements to go into the
>> "http://www.w3.org/1999/xhtml" namespace in order to abstract away
>> the difference of serialization from programs that operate on a
>> namespace-aware tree representation. HTML5 parsers that expose XML
>> APIs to allow unified application internals regardless of whether
>> the data came in as text/html or application/xhtml+xml put HTML
>> elements in the "http://www.w3.org/1999/xhtml" namespace per spec.
>> Moreover, with support for MathML and SVG, there can also be
>> element nodes in those namespaces. Programs operating on trees
>> shouldn't have to have different code throughout depending on
>> whether the program is targeted at text/html or
>> application/xhtml+xml.
>
> On the other hand, in past HTML (4 and previous) has not been using
> anything like namespaces while XHTML used this concept. If you have
> existing XSLT code that emits HTML and you want to use few new elements
> introduced in HTML5 why you should also start thinking about
> namespaces?

Starting to think about namespaces is not cool, but XSLT is on the XML  
side of the fence, and XML has namespaces, so XHTML5 has them.
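
In practice the extra work on the stylesheet side is small. As a rough sketch, assuming the JDK's default TrAX engine and a hypothetical stylesheet, declaring the XHTML namespace as the default namespace on xsl:stylesheet is enough to put every literal result element, including a new HTML5 element such as article, into that namespace:

```java
import java.io.StringReader;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical sketch; class name, stylesheet and input are illustrative.
final class XhtmlNamespaceSketch {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // The default namespace declaration applies to all literal result
    // elements below, so none of them needs a prefix.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns='" + XHTML_NS + "'>"
      + "<xsl:template match='/'>"
      + "<html><body><article><xsl:value-of select='/doc'/></article></body></html>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Transforms into a DOM tree and returns its root element.
    static Element transformToTree(String inputXml) throws Exception {
        DOMResult result = new DOMResult();
        TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(STYLESHEET)))
            .transform(new StreamSource(new StringReader(inputXml)), result);
        return ((Document) result.getNode()).getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        Element root = transformToTree("<doc>Some text</doc>");
        System.out.println("{" + root.getNamespaceURI() + "}" + root.getLocalName());
        System.out.println(
            root.getElementsByTagNameNS(XHTML_NS, "article").getLength());
    }
}
```

Code that consumes the resulting tree can then look elements up by namespace and local name without caring whether the data will eventually be serialized as text/html or as application/xhtml+xml.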

> You simply want to add those few new tags into your stylesheet and  
> modify public identifier to make it clear that you are using brand  
> new HTML5 language.

That works for people who know both XSLT and HTML really well.
However, for everyone who isn't a language lawyer, the bounds of
this approach are mysterious and arbitrary. That is, you hit the
limits of what the HTML output method of XSLT can do, and those limits
depend on historical details.

> So, your idea sounds perfectly reasonable and I think once there is  
> something like HTML5 output method in XSLT and HTML5 is widely  
> deployed everyone should use such approach. But we are not there  
> yet, we can propose such academically clean approach, but at the  
> same time we should pragmatically solve today's problems.

Within these constraints, there's
<xsl:text disable-output-escaping="yes">, which doesn't require us to
allow cruftier syntactic alternatives in HTML5 syntax.

If we allow a placeholder public id, cargo cultists will think that
the more complicated syntax is somehow better, because HTML 4 had
similar cruft and cruft exists for a *reason*. They will make up a
rationalization for it that doesn't even mention XSLT (something like
"it helps browsers better understand semantics") and will start
evangelizing the cruftier syntax to other people, who will end up
wasting their time looking up a public id that is useless if they
aren't using XSLT. Time is the most valuable resource people have, so
inflicting time-wasting cruft on Web authors isn't nice.

>> I think the right way to deal with this is to define an HTML5  
>> output method for XSLT.
>
> I agree, and I'm willing to manage that next version of XSLT will  
> have such method. Of course this means that serialization of HTML5  
> and other related issues are resolved before. Is this part of HTML5  
> stable or are there any changes expected?

The SVG stuff is still commented out. Also, Julian contested the new  
void elements.

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
Received on Thursday, 28 August 2008 10:09:14 UTC