Backward-compatibility of text/html media type (ACTION-334)

I took an action [1] to "Start an email thread regarding the treatment
of pre-HTML5 versions in the media type registration text of HTML5".

The relevant part of the HTML5 spec, namely the proposed text/html
media type registration [2], currently reads:

  Published specification:
    This document is the relevant specification. Labeling a resource
    with the text/html type asserts that the resource is an *HTML
    document* using *the HTML syntax*

The terms "HTML document" and "the HTML syntax" are defined in such a
way as to rule out the use of text/html for a wide range of documents
currently served (legitimately) as such.

In contrast, the current registration [3] describes the history of the
development of HTML, and identifies five distinct versions for which
the text/html media type is appropriate, with references to their
official definitions.

RFC 4288, which governs media type registration, says with respect to
updating an existing registration [4]:

  Changes should be requested only when there are serious omissions or
  errors in the published specification.  When review is required, a
  change request may be denied if it renders entities that were valid
  under the previous definition invalid under the new definition.

The relevant HTML5 issue [5] has been put into suspended animation on
the grounds that it is not a showstopper until HTML5 approaches
Proposed Recommendation, but I am not 100% sure this is the case.

Crucially, I am _not_ satisfied with the following hypothetical response
to this issue:

 "No worries.  All those old documents will be processed just fine by
  the processing rules of the HTML5 specification.  The fact that they
  are not conforming documents is irrelevant."

because it amounts to asking people to lie about their documents. If
I have, for example, an HTML 4.01 document which begins

 <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">

or has no DOCTYPE, or has an HTML5 DOCTYPE (<!DOCTYPE html>) and 
uses any of the "12.2 Non-conforming features" [6], and I continue to
serve it as text/html, then I am, according to the proposed media
type registration, asserting that it "uses the HTML syntax" when I
know full well that it does not.
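
To make the three cases concrete, here is a minimal Python sketch that
sorts a document by its DOCTYPE. The function name and the returned
labels are hypothetical, chosen only for illustration, and the matching
is deliberately simplified (it ignores a leading BOM or comments). Note
that it cannot detect the third case in full: a document with the HTML5
DOCTYPE may still use non-conforming features.

```python
import re

# Simplified patterns; real-world DOCTYPE sniffing is more involved.
HTML5_DOCTYPE = re.compile(r"<!DOCTYPE\s+html\s*>", re.IGNORECASE)
PUBLIC_DOCTYPE = re.compile(
    r'<!DOCTYPE\s+html\s+PUBLIC\s+"([^"]*)"', re.IGNORECASE)


def classify_doctype(document: str) -> str:
    """Return a hypothetical label for the DOCTYPE at the start of document."""
    head = document.lstrip()
    public = PUBLIC_DOCTYPE.match(head)
    if public:
        # A pre-HTML5 document, identified by its public identifier.
        return "legacy: " + public.group(1)
    if HTML5_DOCTYPE.match(head):
        # HTML5 DOCTYPE; says nothing about non-conforming features.
        return "html5"
    # No recognised DOCTYPE at all.
    return "none"
```

Under the proposed registration, only some documents in the "html5"
bucket could honestly be labelled text/html; everything in the other
two buckets could not.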

The obvious way around this would simply be to modify the proposed
text/html registration section to explicitly grandfather in the
historical language definitions.

Given that HTML 4.01 summarily dispensed with various legacy tags [7]
and deprecated a number of other tags which are being outlawed in
HTML5, this does seem like the right thing to do -- there was in HTML
4.01 no claim to backward compatibility with HTML 3.2, so there is no
problem with HTML5 taking a similar approach.

But I'm worried that if this issue is delayed, the combination of
an eagerness to get to PR and an inclination in at least some quarters
to offer the hypothetical response above will lead to no action, and
then a messy confrontation with the IETF. . .

So, two questions:

 1) Is my analysis of the facts correct?
 2) Is it necessary to press on this issue _now_, or can it wait until
    after Last Call?


-- 
       Henry S. Thompson, School of Informatics, University of Edinburgh
                         Half-time member of W3C Team
      10 Crichton Street, Edinburgh EH8 9AB, SCOTLAND -- (44) 131 650-4440
                Fax: (44) 131 651-1426, e-mail:
[mail really from me _always_ has this .sig -- mail without it is forged spam]


Received on Wednesday, 2 December 2009 15:24:11 UTC