Re: HTML5 output method

On Dec 22, 2010, at 04:07, James Clark wrote:

> On the telcon, we talked about an HTML5 output method for XSLT/XQuery.
> I was looking at HTML output method for XSLT 1.0 to see how well it would work with HTML5.
> Although it's far from ideal, in practice I think it would work reasonably well.
> HTML5 has added a few void elements that XSLT won't know about (command, embed, keygen, source, track, wbr), but I believe HTML5's error handling will ignore the end-tag that XSLT will generate in these cases.
> Elements with a non-null namespace URI will be handled like XML, so embedded MathML and SVG will work fine provided you do use a namespace for these (and don't for HTML).  The user would also have to be careful to use the default namespace rather than a prefix for SVG and MathML.
> XSLT can't output <!DOCTYPE html>, but HTML5 allows <!DOCTYPE html SYSTEM "about:legacy-compat"> as an alternative.
> Have I missed any critical problems?

 * The following elements in the http://www.w3.org/1999/xhtml namespace must not generate an end tag: "area", "base", "basefont", "bgsound", "br", "col", "command", "embed", "frame", "hr", "img", "input", "keygen", "link", "meta", "param", "source", "track", "wbr"

 * Elements in no namespace can't be represented in text/html. However, for compatibility with legacy XSLT programs that put elements in no namespace for the "html" output mode, it would be convenient to treat elements in no namespace as if they were in the http://www.w3.org/1999/xhtml namespace.

 * Elements in the SVG and MathML namespaces won't "work fine" if handled like XML in the sense that namespace prefixes are generated for them. The output method must simply output the local name for elements in the SVG and MathML namespaces and must not generate namespace prefixes.

 * Attributes in the XLink namespace must get serialized with the prefix hard-wired to "xlink". (Attributes in the XML namespace must get serialized with the prefix hard-wired to "xml" as in XML.)

 * Need to figure out what to do about elements and attributes in namespaces that aren't representable in text/html. The simplest solution would be to make the serializer ignore them. See also

 * Streams produced by the new output method must start with <!DOCTYPE html> even if the XSLT program doesn't specify a doctype (and maybe even if it does).

 * Output escaping needs to be turned off for text content in these elements in the http://www.w3.org/1999/xhtml namespace: "iframe", "noembed", "noframes", "noscript", "plaintext", "script", "style", "xmp"

 * When the output method writes a start tag for "pre", "listing" or "textarea" in the http://www.w3.org/1999/xhtml namespace, the output method must immediately write one line feed character into the output stream (to be swallowed by the parser; this way if a text node starting with a line feed follows, *that* line feed round-trips properly).

 * The HTML5 serialization algorithm escapes U+00A0 as &nbsp; for compatibility with existing innerHTML getter behavior. An XSLT output method doesn't need to do that, though.
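
To make the list above concrete, here is a rough Python sketch of how such an output method might behave. It is only an illustration under assumptions of mine, not a proposal for spec text: the Element class, the (namespace, local name) attribute keys and the function names are all hypothetical, elements in unrepresentable namespaces are dropped together with their descendants (one possible reading of "ignore them"), and html.escape escapes a little more than strictly necessary in attribute values.

```python
from dataclasses import dataclass, field
from html import escape
from typing import Optional

XHTML = "http://www.w3.org/1999/xhtml"
SVG = "http://www.w3.org/2000/svg"
MATHML = "http://www.w3.org/1998/Math/MathML"
XLINK = "http://www.w3.org/1999/xlink"
XML = "http://www.w3.org/XML/1998/namespace"

# XHTML elements that must not get an end tag.
VOID = {"area", "base", "basefont", "bgsound", "br", "col", "command",
        "embed", "frame", "hr", "img", "input", "keygen", "link", "meta",
        "param", "source", "track", "wbr"}
# XHTML elements whose text content is written without escaping.
RAW_TEXT = {"iframe", "noembed", "noframes", "noscript", "plaintext",
            "script", "style", "xmp"}
# XHTML elements whose start tag is followed by one extra line feed.
LEADING_LF = {"pre", "listing", "textarea"}
# Attribute namespaces that text/html can represent, with hard-wired prefixes.
ATTR_PREFIX = {None: "", XLINK: "xlink:", XML: "xml:"}


@dataclass
class Element:
    """Hypothetical tree node; children are Element instances or str."""
    ns: Optional[str]
    name: str
    attrs: dict = field(default_factory=dict)    # {(ns, local): value}
    children: list = field(default_factory=list)


def serialize(node, raw=False):
    if isinstance(node, str):
        # U+00A0 is left alone; only &, < and > are escaped in text.
        return node if raw else escape(node, quote=False)
    # Legacy stylesheets: treat no-namespace elements as XHTML.
    ns = XHTML if node.ns is None else node.ns
    if ns not in (XHTML, SVG, MATHML):
        return ""  # assumption: drop the element and its descendants
    out = ["<" + node.name]  # local name only, never a namespace prefix
    for (attr_ns, local), value in node.attrs.items():
        if attr_ns not in ATTR_PREFIX:
            continue  # assumption: drop unrepresentable attributes
        out.append(' %s%s="%s"' % (ATTR_PREFIX[attr_ns], local,
                                   escape(value, quote=True)))
    out.append(">")
    is_html = ns == XHTML
    if is_html and node.name in VOID:
        return "".join(out)
    if is_html and node.name in LEADING_LF:
        out.append("\n")  # swallowed by the HTML parser
    raw_children = is_html and node.name in RAW_TEXT
    for child in node.children:
        out.append(serialize(child, raw_children))
    out.append("</%s>" % node.name)
    return "".join(out)


def serialize_document(root):
    # Always start with the HTML5 doctype, whatever the stylesheet asked for.
    return "<!DOCTYPE html>\n" + serialize(root)
```

For example, serialize(Element(None, "br")) yields "<br>" with no end tag, and a "pre" element whose first text node starts with a line feed is written with two line feeds after the start tag, so the second one survives parsing.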

Henri Sivonen

Received on Monday, 3 January 2011 15:57:54 UTC