Re: TAG ACTION-407 -- text/html media type and legacy

On Tue, 20 Apr 2010, Henry S. Thompson wrote:
> Ian Hickson writes:
> > To help evaluate the above suggestion, could I ask for a brief summary of 
> > the problems that the above changes are intended to address?
> Section 12.1 is in effect a media type registration, and as such
> should follow the guidelines for such from the IETF.  In particular,
> it is an update of an existing registration, and the IETF has
> guidelines for that case as well.  The proposed changes are directed
> at bringing the section into line with those guidelines.
> In particular, the IETF guidelines are intended to guarantee that the
> status of legacy documents doesn't change _with respect to the media
> type_.  That is, it is perfectly OK for a new version of a language
> specification to change what is and is not allowed -- HTML 4 did this,
> and so does HTML5, and that's just fine.  What is _not_ OK is for a
> media type registration to, as it were, blacklist the _serving_ of
> legacy content as the same media type it has been served with in the
> past.
> Against that background, the proposed changes basically try to make
> the section more like an update to its predecessor, RFC 2854, by
>  1) Incorporating an updated version of the RFC's historical
>     recapitulation;
>  2) Making clear that continuing to serve documents as text/html is OK
>     even if the documents concerned are not conformant to this
>     specification, but that they will be interpreted according to this
>     specification.
> Hope this helps,

Indeed, thanks, that's very helpful.

Looking back at the proposal with this in mind:

On Thu, 15 Apr 2010, Henry S. Thompson wrote:
> In section 12.1 [1]
> Add
>     *Introduction and background*
>       HTML has been in use in the World Wide Web information
>       infrastructure since 1990, and specified in various informal
>       documents.  The text/html media type was first officially
>       defined by the IETF HTML working group in 1995 in [HTML20].
>       Subsequent standardization work at the W3C relevant to this
>       media type was published in [HTML32], [HTML40] and [HTML401].
>       This registration updates [RFC2854] by identifying this
>       specification as the relevant specification, without ruling out
>       continued use of the text/html media type for older documents.

Adding an "introduction and background" entry in the registration template 
seems to be at odds with RFC4288. Would it be acceptable to add material 
that satisfies the rationale described above (such as the text proposed 
above) to the Introduction and History sections of the spec? That seems 
like it would be more consistent with how RFC2854 and other MIME
registrations are written, with the introductory text in a section
separate from the MIME registration.

> and replace
>     *Interoperability considerations:*
>         Rules for processing both conforming and non-conforming
>         content are defined in this specification.
> with
>     *Interoperability considerations:*
>         This specification defines rules for processing not only
>         conforming but also non-conforming documents, including those
>         which conform to the early specifications listed above.

It also has rules (the same rules) for processing documents that try but
fail to conform to earlier specifications; the rules cover any arbitrary
bit stream. However, the rules don't cover how to handle features that
have never been (widely) supported. For example, they don't say how to
handle <HP1> from Tim's earliest drafts, nor how to handle obscure
SGMLisms allowed in HTML4. Therefore I'm not sure the suggested text
above is completely accurate.

Would it be acceptable to merely add a sentence such as the following?:

   These rules also define how to process legacy documents written for 
   earlier versions of HTML.

This would avoid implying that any document created today that complies with
the old specs would work, but does mention that the rules are intended to 
be compatible with how the old specs were used.

> [and replace]
>     *Published specification:*
>         This document is the relevant specification. Labeling a
>         resource with the text/html type asserts that the resource is
>         an HTML document using the HTML syntax.
> [with]
>     *Published specification:*
>         This document is the relevant specification. Labeling a
>         document with the text/html type asserts that the document is
>         a member of the HTML family, as defined by this specification
>         or those listed above [ref Introduction and background], and
>         licenses its interpretation according to this specification.

I hesitate to use this exact text because the term "HTML family" is rather 
unclear. It also removes the mention of the carefully-defined term "HTML 
document", which I think is important.

Would the following be an acceptable compromise?:

         This document is the relevant specification. Labeling a
         resource with the text/html type asserts that the resource is
         to be interpreted as an HTML document using the HTML syntax, and 
         that it conforms either to this specification or to an earlier 
         HTML specification.

Ian Hickson
Things that are impossible just take longer.

Received on Tuesday, 20 April 2010 21:06:41 UTC