Re: TAG ACTION-407 -- text/html media type and legacy from Leif Halvard Silli on 2010-04-20 (public-html@w3.org from April 2010)

From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
Date: Wed, 21 Apr 2010 00:29:22 +0200
To: Ian Hickson <ian@hixie.ch>
Cc: "Henry S. Thompson" <ht@inf.ed.ac.uk>, public-html@w3.org, Paul Cotton <Paul.Cotton@microsoft.com>, Maciej Stachowiak <mjs@apple.com>, "noah_mendelsohn@us.ibm.com" <noah_mendelsohn@us.ibm.com>, "www-tag@w3.org" <www-tag@w3.org>
Message-ID: <20100421002922873090.e6bdda98@xn--mlform-iua.no>
Ian Hickson, Tue, 20 Apr 2010 21:06:09 +0000 (UTC):
> On Tue, 20 Apr 2010, Henry S. Thompson wrote:
>> Ian Hickson writes:
>> 
>>> To help evaluate the above suggestion, could I ask for a brief summary of 
>>> the problems that the above changes are intended to address?
>> 
>> Section 12.1 is in effect a media type registration,
  ....
> Looking back at the proposal with this in mind:
> 
> On Thu, 15 Apr 2010, Henry S. Thompson wrote:
>> 
>> In section 12.1 [1]
>> 
>> Add
>> 
>>     *Introduction and background*

> Adding an "introduction and background" entry in the registration template 
> seems to be at odds with RFC4288.

But RFC4288 itself has a 'Historical Note' as the main bulk of its 
first section?

> Would it be acceptable to add material 
> that satisfies the rationale described above (such as the text proposed 
> above) to the Introduction and History sections of the spec? That seems 
> like it would be more consistent with how RFC2854 and other MIME 
> registrations are written, with the introductory text being in a separate 
> section than the MIME registration.

And here you acknowledge that RFCs have an introductory text. FWIW, 
such a change would make the entire spec, instead of merely 12.1, the 
MIME registration.

>> and replace
>> 
>>     *Interoperability considerations:*
>>         Rules for processing both conforming and non-conforming
>>         content are defined in this specification.
>> 
>> with
>> 
>>     *Interoperability considerations:*
>> 
>>         This specification defines rules for processing not only
>>         conforming also non-conforming documents, including those
>>         which conform to the early specifications listed above.
> 
> It also has rules (the same rules) for processing documents that try but 
> fail to conform to earlier specifications -- the rules cover any arbitrary 
> bit stream. However, the rules don't cover how to handle features that 
> have never been (widely) supported, for example it doesn't say how to 
> handle <HP1> from Tim's earliest drafts, nor does it say how to handle 
> obscure SGMLisms allowed in HTML4. Therefore I'm not sure the suggested 
> text above is completely accurate.

It anyhow defines *the* rules for processing such documents. 

Btw RFC2854 doesn't speak about SGML and HTML4 doesn't expect SGML UAs. 
The actual confusion about the so-called obscure SGML-isms is low. (The 
confusing thing is rather that HTML5 forbids even SGML syntax, such as 
PIs, for which it defines the UA handling.)

> Would it be acceptable to merely add a sentence such as the following?:
> 
>    These rules also define how to process legacy documents written for 
>    earlier versions of HTML.
> 
> This would avoid implying that any document created today that complies to 
> the old specs would work, but does mention that the rules are intended to 
> be compatible with how the old specs were used.

The expression "These rules also define" makes it sound as if the spec 
has a special section where it deals with the issues of how to process 
legacy documents - which is not the case. 

The text you object to states more clearly that HTML5 defines a set of 
processing rules which are to be followed for both legacy and 
HTML5-compatible documents. And as for "how the old specs were used", I 
don't see anything of that in your replacement text. 

The text from Henry also defines documents which conform to earlier 
specs as non-conforming: "not only conforming [but] also non-conforming 
documents, including those which conform to the early specifications 
listed above". Maybe it should also add that non-conforming features 
are handled as described in the HTML5 spec. That, I think, should 
remove any doubt about how things are expected to be handled.

(But here I must wonder: why does the text you object to not mention 
XHTML1? After all HTML defines how to handle a lot of XML-isms, and 
even makes XML/XHTML syntax valid as HTML syntax. See more on this 
below.)

>> [and replace]
>> 
>>     *Published specification:*
>>         This document is the relevant specification. Labeling a
>>         resource with the text/html type asserts that the resource is
>>         an HTML document using the HTML syntax.
>> 
>> [with]
>> 
>>     *Published specification:*
>> 
>>         This document is the relevant specification. Labeling a
>>         document with the text/html type asserts that the document is
>>         a member of the HTML family, as defined by this specification
>>         or those listed above [ref Introduction and background], and
>>         licenses its interpretation according to this specification.
> 
> I hesitate to use this exact text because the term "HTML family" is rather 
> unclear. 

It is not unclear in the context: "the HTML family, as defined by this 
specification [aka HTML5] and those listed above" [namely HTML20, 
HTML32, HTML40, HTML401].

That said, I agree that *outside* this context, "HTML family" is 
unclear. Many see HTML and XHTML as a family. Not only that: RFC2854 
talks about an HTML401-compatible version, 'XHTML1', whereas I see no 
mention of XHTML1 in any of these introductory notes. However, if no 
one else has an issue with that, then OK. RFC2854 anyhow allows 
XHTML1 on the terms of HTML401. But still, I think the text that Henry 
sent us could perhaps seek to clarify the issue w.r.t. XHTML1.

> It also removes the mention of the carefully-defined term "HTML 
> document", which I think is important.

It also removes your reference to "HTML syntax", which I think is good, 
because your text points, via a link, to a definition of the syntax as 
found in HTML5 itself. Whereas HTML2, HTML32, HTML40 and HTML401 have a 
different syntax. For that reason, I find the text coming from Henry 
clearer.

As for "HTML document": in the current draft, the term links to a 
section of the spec whose opening words speak about "XML documents" 
before they speak about "HTML documents":

[[
Document objects are assumed to be XML documents unless they are 
flagged as being HTML documents when they are created.
]]

Is this "carefully-defined"? With your words about the unclarity of 
"HTML family" in mind, the link from "HTML document" to the above 
quote does bring to mind that HTML and XHTML are actually family ...

> Would the following be an acceptable compromise?:
> 
>    This document is the relevant specification. Labeling a
>    resource with the text/html type asserts that the resource is
>    to be interpreted as an HTML document using the HTML syntax, and 
>    that it conforms either to this specification or to an earlier 
>    HTML specification.

Without endorsing it: If "HTML syntax" *still* is intended to link to 
the HTML syntax as defined in HTML5, 
<http://dev.w3.org/html5/spec/syntax.html#syntax>, then the text 
remains confusing. Saying instead something like "and that 
it conforms either to the HTML syntax, as defined by this 
specification, or to an earlier specification of the HTML syntax" 
would IMHO be clearer.
-- 
leif halvard silli
Received on Tuesday, 20 April 2010 22:30:08 UTC