Re: HTML/XML TF Minutes 4 Jan 2011 from Henri Sivonen on 2011-01-05 (public-html-xml@w3.org from January 2011)

From: Henri Sivonen <hsivonen@iki.fi>
Date: Wed, 5 Jan 2011 10:18:54 +0200
To: public-html-xml@w3.org
Message-Id: <89802A19-9655-40A6-9326-E8519B72165A@iki.fi>
On Jan 5, 2011, at 02:20, James Clark wrote:

> Sorry to miss this telcon; I was asleep.  Realistically, if the telcon is at this time every week (10pm for me), my attendance will be spotty.
> 
> 
>   Henri: Last time we talked about the possibility of a new mode that would
>   make the parser more XML-like, I'd like to point out that that would be
>   divergent from the legacy code path.
>   ... Convergence in one place may cause divergence in another place.
> 
> Didn't understand this. What does "another place" mean?

It means the code path(s) for consuming legacy text/html content.

That part of the telecon log was a voice rehash of a part of http://lists.w3.org/Archives/Public/public-html-xml/2011Jan/0041.html. Copied below.

On Dec 22, 2010, at 02:58, James Clark wrote:

> The backwards compatibility constraint is that you can't break (any significant amount of) existing content on the Web.  I appreciate and agree with that constraint.
> 
> However, this constraint alone does not require the parsing incompatibilities between HTML5 and XML.  The parsing incompatibilities only become required when you add in the design goal to eliminate modes, i.e. that standards mode will be made as close as possible to quirks mode.  Now I can certainly see the advantages of this design goal, but there are also significant costs, and I think reasonable people can disagree about the right tradeoff.
> 
> Let's take perhaps the most egregious example, that HTML5 requires that </br> be treated like <br>. As of only a year or so ago, both WebKit and Gecko had made the judgement that a different treatment of </br> was desirable in standards mode (i.e. ignore it).  This is something that informed people can have different opinions on.
> 
> I think presenting XML/HTML5 incompatibilities as a necessary consequence of backwards compatibility is deeply misleading.
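To make the </br> disagreement concrete, here is a minimal Python sketch of the two behaviours at the token level. It assumes a drastically simplified token model: the `Tag` class and the function names are hypothetical, not taken from any real parser, and only the "in body" insertion mode is modelled. Real HTML5 tree construction has many more cases.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tag:
    """Hypothetical simplified tag token: a name and whether it is an end tag."""
    name: str
    is_end: bool = False

def html5_in_body(tokens):
    """HTML5 'in body' rule: an end tag named br is a parse error and is
    then processed as if it were a <br> start tag."""
    out, parse_errors = [], 0
    for tok in tokens:
        if tok.is_end and tok.name == "br":
            parse_errors += 1
            out.append(Tag("br"))  # </br> becomes <br>
        else:
            out.append(tok)
    return out, parse_errors

def legacy_standards_mode(tokens):
    """Pre-HTML5 WebKit/Gecko standards-mode behaviour described above:
    silently ignore </br>."""
    return [t for t in tokens if not (t.is_end and t.name == "br")]

stream = [Tag("p"), Tag("br", is_end=True), Tag("p", is_end=True)]
print(html5_in_body(stream))
print(legacy_standards_mode(stream))
```

The same input yields a line break under one rule and nothing under the other, which is exactly the kind of per-mode divergence the thread is weighing.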

The HTML5 effort has tried to minimize divergence and to actively seek convergence between the code paths for legacy text/html content and for newly-authored text/html content. The HTML5 effort has also sought to minimize divergence on the DOM level between the tree created by parsing text/html and the tree created by parsing application/xhtml+xml. The HTML5 effort has also reduced syntactic divergence by making valid the syntactic talismans popularized by the infamous Appendix C of XHTML 1.0, such as <br />.

If you wanted to make the code path for processing HTML.next more like the code path for processing XML, you'd make it less like the code path(s) for processing legacy text/html. That is, you'd introduce convergence relative to one thing but divergence relative to another. To see only the convergence, you'd need to pretend to forget about the divergence that got introduced relative to the other point of reference. Indeed, this is the trick the W3C used for the past decade: The W3C pretended HTML had been end-of-lifed and didn't exist anymore, so there was only one glorious unified XML stack to observe. But this didn't change the reality that implementations still had to support HTML.

If you introduce HTML.next that's so unlike legacy HTML that a new mode is needed but not enough like XML to use the XML code path, you haven't created convergence or reduced the number of stacks. Instead, there'd be one more stack that's divergent from both legacy HTML *and* XML!

What about an HTML.next that's 100% convergent with XML and has a mode switch for opting into? It turns out that we already have that! It's called XHTML5 and the mode switch is the Content-Type: application/xhtml+xml HTTP header. Even better than some yet-to-be-defined HTML.next mode, it's already supported by the latest versions of the top browsers (if you count IE9 as the latest version of IE).
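As a rough illustration of the Content-Type header acting as the mode switch, the following Python sketch picks a parser from the media type and shows the draconian XML path rejecting markup that the forgiving HTML path accepts. It is a simplification under stated assumptions: `parser_for` and `accepts` are hypothetical names, the "+xml" rule is a condensed reading of the MIME Sniffing Standard's definition of an XML MIME type, and Python's `html.parser` merely stands in for a forgiving parser (it is not an HTML5 tree builder).

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Simplified set of XML MIME type essences; any "+xml" subtype also counts.
XML_ESSENCES = {"application/xhtml+xml", "application/xml", "text/xml"}

def parser_for(content_type: str) -> str:
    """Hypothetical helper: choose 'html', 'xml', or 'other' from a Content-Type value."""
    essence = content_type.split(";", 1)[0].strip().lower()  # drop parameters like charset
    if essence == "text/html":
        return "html"
    if essence in XML_ESSENCES or essence.endswith("+xml"):
        return "xml"
    return "other"

def accepts(content_type: str, markup: str) -> bool:
    """Hypothetical helper: True if the selected parser accepts the markup without a fatal error."""
    kind = parser_for(content_type)
    if kind == "xml":
        try:
            ET.fromstring(markup)  # draconian: any well-formedness error is fatal
        except ET.ParseError:
            return False
        return True
    if kind == "html":
        HTMLParser().feed(markup)  # forgiving: malformed tag soup does not raise
        return True
    return False  # not a markup type this sketch handles

print(accepts("text/html; charset=utf-8", "<p>line<br></p>"))   # True
print(accepts("application/xhtml+xml", "<p>line<br></p>"))     # False
print(accepts("application/xhtml+xml", "<p>line<br/></p>"))    # True
```

The same bytes land on two different code paths depending only on the header, which is why XHTML5 needs no additional in-document mode switch.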

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
Received on Wednesday, 5 January 2011 08:20:00 UTC