Request for Volunteers: Polyglot spec from Sam Ruby on 2010-03-26 (public-html@w3.org from March 2010)

From: Sam Ruby <rubys@intertwingly.net>
Date: Fri, 26 Mar 2010 16:52:48 -0400
To: HTML WG <public-html@w3.org>
CC: Technical Architecture Group <tag@w3.org>
Message-ID: <4BAD1EA0.2070808@intertwingly.net>

I took an action item from the TAG yesterday to convey the following 
request:

     The W3C TAG requests there should be in TR space a document
     which specifies how one can create a set of bits which can
     be served EITHER as text/html OR as application/xhtml+xml,
     which will work identically in a browser in both cases.
     (As Sam does on his web site.)

This request requires a lot of explanation.  To start, it is recognized 
up front that this will be a subset of the set of possible documents 
that can be expressed as HTML5.  This is entirely OK.  For example, if 
it were to be the case that such a subset were to entirely disallow 
scripts of any kind, that would be acceptable as there exists a 
substantial class of documents which do not require scripting of any kind.

As luck would have it, many scripts are possible even with these 
constraints.  But even here additional explanation is required.  The 
tbody element is optional even in XHTML, but the lack of such elements 
may affect the operation of scripts such as the jQuery Tablesorter[1] 
plugin.  This means that users who wish to conform to this subset have 
two choices: they can either treat the tbody element as required or they 
can forgo the use of those scripts that rely on this element.
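
To illustrate the tbody point: a text/html parser wraps tr children of a 
table in an implied tbody, while an XML parser leaves them where they 
are, so the two DOMs can differ.  The following is a minimal Python 
sketch, not part of the TAG request, that flags tables relying on the 
implied element.  The function name tables_missing_tbody and the sample 
markup are hypothetical.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical sample documents, kept tiny for illustration.
IMPLICIT = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    "<table><tr><td>1</td></tr></table>"
    "</body></html>"
)
EXPLICIT = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    "<table><tbody><tr><td>1</td></tr></tbody></table>"
    "</body></html>"
)


def tables_missing_tbody(source):
    """Count table elements that have a tr as a direct child.

    Such tables would gain an implied tbody when parsed as text/html,
    but not when parsed as XML.
    """
    root = ET.fromstring(source)
    table_tag = "{%s}table" % XHTML_NS
    tr_tag = "{%s}tr" % XHTML_NS
    count = 0
    for table in root.iter(table_tag):
        if any(child.tag == tr_tag for child in table):
            count += 1
    return count
```

Treating a non-zero count as an error corresponds to the first of the 
two choices above (treating tbody as required).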

Related work to draw upon:

(a) http://www.w3.org/TR/xhtml1/#guidelines - this is actually in the 
right spirit, but significantly incomplete and now a bit out of date.

(b) http://wiki.whatwg.org/wiki/HTML_vs._XHTML - quite a bit more up to 
date and more comprehensive (but still not complete); but identifies the 
problems and not the solutions.  What's desired is something prescriptive.

(c) http://www.la-grange.net/2009/07/05/html5-xhtml5/ - this is the 
converse of what is desired.  What's desired here is a description of a 
single serialization.  For example, in the serialization diagram[2], 
neither serialization depicted is a true polyglot.  Adding the HTML5 
doctype and html/@lang attribute to the XHTML5 serialization would be 
required.
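
As a rough sketch of what that adjustment amounts to, the Python below 
checks only the two items named here (the HTML5 doctype and a lang 
attribute on the html element), plus XML well-formedness and the XHTML 
namespace.  It is an assumption-laden illustration, not a conformance 
checker; check_polyglot_basics and both sample documents are 
hypothetical.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical samples: a plain XHTML5 document and an adjusted one.
XHTML5_ONLY = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>t</title></head><body><p>x</p></body></html>"
)
ADJUSTED = (
    "<!DOCTYPE html>\n"
    '<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">'
    "<head><title>t</title></head><body><p>x</p></body></html>"
)


def check_polyglot_basics(source):
    """Return a list of problems; an empty list means the basics are met."""
    problems = []
    # XML is case-sensitive, so the doctype is matched exactly.
    if not source.lstrip().startswith("<!DOCTYPE html>"):
        problems.append("missing HTML5 doctype")
    try:
        root = ET.fromstring(source)
    except ET.ParseError as exc:
        problems.append("not well-formed XML: %s" % exc)
        return problems
    if root.tag != "{%s}html" % XHTML_NS:
        problems.append("root is not html in the XHTML namespace")
    if "lang" not in root.attrib:
        problems.append("html element has no lang attribute")
    return problems
```

With these samples, XHTML5_ONLY is reported as lacking both the doctype 
and the lang attribute, while ADJUSTED yields no problems.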

Additional background:

There was a discussion as to whether or not application/xhtml+xml was 
sufficient for this use case, particularly in light of the Microsoft IE9 
Platform Preview's support for this.  The feeling of the TAG was that 
this was not sufficient as text/html would continue to be popular for 
the foreseeable future.

Additionally, it is known that there are tools in various stages of 
maturity for a number of programming languages which can take an HTML5 
document and produce an XML DOM.  The intent here is to document how to 
create a serialization which does not require such tools, i.e., the 
document can be parsed directly with an existing XML parser.
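
A small sketch of that intent, using only the Python standard library: 
the same string is fed to an ordinary XML parser and to html.parser, and 
the element names seen by each are compared.  Note that html.parser is a 
tokenizer rather than a full HTML5 tree builder, so this is only a rough 
approximation of browser behaviour.  TagCollector, xml_tag_names, 
html_tag_names and the sample document are hypothetical names chosen for 
illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical polyglot-style document.
DOCUMENT = (
    "<!DOCTYPE html>\n"
    '<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">\n'
    "<head><title>Polyglot</title></head>\n"
    "<body><p>Hello<br />world</p></body>\n"
    "</html>"
)


class TagCollector(HTMLParser):
    """Record the name of every start tag, in document order."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def xml_tag_names(source):
    """Element names as seen by a stock XML parser, namespace stripped."""
    root = ET.fromstring(source)
    return [el.tag.rsplit("}", 1)[-1] for el in root.iter()]


def html_tag_names(source):
    """Element names as seen by the standard library HTML tokenizer."""
    collector = TagCollector()
    collector.feed(source)
    collector.close()
    return collector.tags
```

For DOCUMENT both functions return the same list of names.  Replacing 
"<br />" with "<br>" makes xml_tag_names raise ET.ParseError, which is 
the kind of markup the requested document would need to rule out.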

Final notes:

I filled in a bit of context based on my understanding of the problem 
domain.  I'm copying the TAG in case any of these additions 
misrepresents what is being requested.  I personally don't have the 
bandwidth to edit such a spec, but I would be quite willing to review, 
comment, and contribute to such a spec.

- Sam Ruby

[1] http://tablesorter.com/docs/
[2] http://www.la-grange.net/2009/07/05/html5-xhtml5/image/html5-serializations.png
Received on Friday, 26 March 2010 20:53:13 UTC