Re: Re-registration of text/html from Henri Sivonen on 2010-03-11 (public-html@w3.org from March 2010)

From: Henri Sivonen <hsivonen@iki.fi>
Date: Thu, 11 Mar 2010 02:47:21 -0800 (PST)
To: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
Cc: Julian Reschke <julian.reschke@gmx.de>, Ian Hickson <ian@hixie.ch>, HTMLwg <public-html@w3.org>
Message-ID: <1862955924.43410.1268304441661.JavaMail.root@cm-mail03.mozilla.org>

"Leif Halvard Silli" <xn--mlform-iua@målform.no> wrote:

> Henri Sivonen, Wed, 10 Mar 2010 07:12:54 -0800 (PST):
> > It doesn't follow that the HTML5 spec or now-current tools should
> > make any particular effort to support authoring according to a
> > legacy spec.
> 
> When I first read about HTML5 and during the discussions after this WG
> was started, I often heard that one goal was to preserve HTML for the
> future. To make sure that we will be able to parse today's documents
> in the future. 
> 
> What is behind the hype slowly dawns on me ... 

It's a goal to preserve HTML for the future by supporting the consumption of existing content by new software.

It's not a goal to preserve old HTML specs or to make an effort to facilitate the continued use of the features of old HTML specs when authoring new content.

> If we had simply said that "when using text/html, then the syntax must
> be text/html parser compatible", then OK. This is what the text/html
> RFC says *now*.
> 
> But to indicate that anything served as text/html *is* HTML5, is 
> confusing.

The point is that content labeled as text/html is to be processed according to HTML5. What content labeled as text/html "is" is an ontological rathole. I believe it's more productive to discuss how content labeled as text/html must be processed by new software and how authors should use the text/html label in new authoring than to discuss what content labeled as text/html "is".

New software must process text/html content according to HTML5 (until HTML5 is superseded in the future by a newer backwards-compatible spec at which point then-new software must process text/html content according to that spec instead).

For newly-authored Web hypertext markup content, authors have the choice of requesting processing according to the HTML rules or according to the XHTML rules. The latest definition of both rules is the HTML5 spec. To request processing according to the HTML rules (with bugs), the author must use the text/html label. To request processing according to the XHTML rules (with bugs), the author must use an XML content type (preferably application/xhtml+xml). There are no other choices available to authors in practice.
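A minimal sketch of the choice described above, assuming a consumer that picks its parsing rules purely from the Content-Type label. The function name `processing_rules` is hypothetical, and the MIME handling is deliberately simplified (real browsers follow the more detailed WHATWG MIME Sniffing rules):

```python
def processing_rules(content_type: str) -> str:
    """Return "html" or "xml" for a Content-Type value (simplified sketch)."""
    # Drop parameters such as "; charset=utf-8" and normalize case,
    # since MIME type and subtype names are case-insensitive.
    essence = content_type.split(";", 1)[0].strip().lower()
    if essence == "text/html":
        # The text/html label requests processing under the HTML rules.
        return "html"
    if essence in ("application/xml", "text/xml") or essence.endswith("+xml"):
        # An XML content type (preferably application/xhtml+xml)
        # requests processing under the XML rules.
        return "xml"
    # Anything else is not a request for either kind of processing.
    raise ValueError(f"not an HTML or XML content type: {essence!r}")
```

For example, `processing_rules("text/html; charset=utf-8")` selects the HTML rules and `processing_rules("application/xhtml+xml")` selects the XML rules; there is no third label that would ask for something else.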

> >> Why does Validator.nu offer to validate HTML4 documents as HTML5
> >> documents? Why does it offer to validate text/html XHTML1 documents
> >> as HTML5? Etc.
> > 
> > To ease migration.
[...]
> Why not accept any doctype that triggers standards mode?

To comply with the Degrade Gracefully Design Principle, it wouldn't be sufficient to allow doctypes that trigger the standards mode per HTML5. Instead, the allowed set would need to be the intersection of the standards-mode-triggering doctypes from HTML5, legacy Trident, legacy Gecko, legacy WebKit and legacy Presto. Working out what that intersection is exactly is too much work compared to the benefit.

The current set of doctypes allowed by the HTML5 spec and Validator.nu covers the doctypes that both trigger the standards mode and have been *previously* approved for text/html use by W3C REC track documents. Except for the XHTML 1.1 doctype, those happen to be the common cases, too. After the common cases, diminishing marginal benefit kicks in. (Diminishing marginal benefit always tends to kick in in purity exercises.)

("Previously" is a key word in the previous paragraph. I think this WG shouldn't change its deliverables to match subsequent publications of the XHTML2 WG that's now even out-of-charter or stipulations the XHTML2 WG has made in WG Note publications that bypassed the W3C REC track.)
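A rough sketch of what checking against such a small, fixed set of doctypes looks like. The list below is based on my reading of the "obsolete permitted DOCTYPE" strings in the current WHATWG HTML Standard and may differ in detail from the March 2010 draft; `is_permitted_doctype` is a hypothetical helper, not Validator.nu's actual code, and it assumes the doctype has already been tokenized into a name, public identifier and system identifier:

```python
from typing import Optional

# Obsolete permitted doctypes: public identifier -> allowed system identifiers.
# None in a set means the system identifier may be omitted.
_OBSOLETE_PERMITTED = {
    "-//W3C//DTD HTML 4.0//EN": {None, "http://www.w3.org/TR/REC-html40/strict.dtd"},
    "-//W3C//DTD HTML 4.01//EN": {None, "http://www.w3.org/TR/html4/strict.dtd"},
    "-//W3C//DTD XHTML 1.0 Strict//EN": {"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"},
    "-//W3C//DTD XHTML 1.1//EN": {"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd"},
}


def is_permitted_doctype(name: str, public_id: Optional[str],
                         system_id: Optional[str]) -> bool:
    """Return True if a tokenized doctype is in the (assumed) permitted set."""
    # The doctype name is compared case-insensitively.
    if name.lower() != "html":
        return False
    if public_id is None:
        # <!DOCTYPE html> or <!DOCTYPE html SYSTEM "about:legacy-compat">
        return system_id in (None, "about:legacy-compat")
    # Exact match against the short list; everything else
    # (e.g. Transitional doctypes) is rejected.
    return system_id in _OBSOLETE_PERMITTED.get(public_id, set())
```

The point of a fixed lookup table like this is that it stays small: it enumerates the few common, previously approved cases instead of trying to compute the cross-engine intersection of standards-mode doctypes.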

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/

Received on Thursday, 11 March 2010 10:47:54 UTC