Re: Doctype on text/html Pages (Was: [author-guide] Character Entity References Chart) from Robert J Burns on 2008-07-22 (public-html@w3.org from July 2008)

From: Robert J Burns <rob@robburns.com>
Date: Tue, 22 Jul 2008 18:58:00 +0300
To: Smylers <Smylers@stripey.com>
Cc: public-html WG <public-html@w3.org>
Message-Id: <C53B9197-8135-477D-B9B7-CDB0947EAEC4@robburns.com>

Hi Smylers,

On Jul 22, 2008, at 4:17 PM, Smylers wrote:

>
> Robert J Burns writes:
>
>> ... section 2.1 says that a Doctype is not required by the draft  
>> while
>> section 8.1 says that it is required by the draft: both  
>> normative. ...
>> a text/html serialized HTML5 document either does or doesn't  
>> require a
>> doctype depending on whether the author follows section 2.1 or  
>> section
>> 8.1.
>
> The only mention of "doctype" I can see in section 2.1 is:
>
>  Such XML documents may contain a DOCTYPE if desired, but this is not
>  required to conform to this specification.
>
> Clearly that's talking about XML, so isn't relevant to text/html.

I'm not sure that's clear at all. Why would we be discussing the  
optional use of a doctype other than HTML5's doctype? That's not our  
business. Obviously anyone using XML is permitted to use any doctype  
they want. They do not need the HTML5 recommendation to tell them  
that. But why not just make this simple and say the doctype for HTML5  
is "<!DOCTYPE html>" (case insensitive for text/html and case  
sensitive for XML) and be done with it? Why are we introducing all of  
this confusion into the recommendation? What problem statements do  
these complications address?
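
To make the rule I'm suggesting concrete, here is a minimal sketch in
Python. The helper name and the regular expressions are my own
illustration and are not taken from any draft; it only shows what "case
insensitive for text/html, case sensitive for XML" would mean in
practice.

```python
import re

# Hypothetical patterns for the proposed rule (an assumption, not spec text).
# text/html: the whole doctype is matched without regard to case.
_TEXT_HTML_DOCTYPE = re.compile(r"<!DOCTYPE\s+html\s*>", re.IGNORECASE)
# XML: the keyword and the name must appear exactly as written here.
_XML_DOCTYPE = re.compile(r"<!DOCTYPE\s+html\s*>")


def has_html5_doctype(source: str, xml: bool) -> bool:
    """Return True if `source` contains the short HTML5 doctype.

    Searches anywhere in the string; a real checker would also verify
    that the doctype appears before the root element.
    """
    pattern = _XML_DOCTYPE if xml else _TEXT_HTML_DOCTYPE
    return pattern.search(source) is not None
```

Under this sketch "<!doctype HTML>" is accepted for text/html and
rejected for XML, while "<!DOCTYPE html>" is accepted for both.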

In any event, if we require the doctype for the text/html
serialization and the XML recommendation requires it for our HTML5 XML
serialization, then why even mention it as not required to conform to
the HTML5 specification? Doing so simply adds obscure, esoteric
language and introduces an unnecessary difference between the two
serializations.

Others off-list argued that section 8.1 requiring the doctype only
applies to the text/html serialization, but it doesn't make that
explicit. Section 8.2 specifically starts: "This section only applies
to user agents, data mining tools, and conformance checkers." If a
similar restriction of scope holds for all of chapter 8, why wouldn't
it appear at the top of chapter 8, or be repeated for each section to
which it applies? It also would be clearer if we didn't try to use the
term HTML (a fairly central term in our endeavor) to mean both HTML
(our specification) and the text/html serialization (one of the
serializations of our specification).

To recap: if HTML5 requires a doctype for the text/html serialization,
and XML requires one for our XML serialization (and the same one would
do just fine), then why introduce this difference? What use case does
that difference address? Are we hoping that the next XML rev drops the
doctype requirement?

Again, this has taken us away from the original discussion of named
character references, which at first glance have nothing to do with
this doctype discussion. They are related in one respect, though:
we're inviting authors to use a doctype that is not specific to HTML5
(or HTML at all) to declare their HTML document type, which strikes me
as very wrong.

For an XML implementation to process HTML5 it has to know when it is  
encountering HTML5 (or any similar HTML serialized as XML). Without a  
mechanism to do that, the processor will not know it is HTML. In  
documents conforming to the Namespaces in XML recommendation, the HTML  
namespace URI provides that information to the processor. In non- 
namespaced documents or in non-namespace aware processing  
applications, there has to be another mechanism. One mechanism might
be that such processors will always encounter the doctype "<!DOCTYPE
html>". The XML Schema Definition recommendation provides another
means, but it is somewhat dependent on namespaces already. That leaves
only the doctype to indicate that the processor is processing HTML. So  
how can we leave that as something for other recommendations to worry  
about? How does an XML processing application know that it is  
processing an HTML (specifically an HTML5) document (for the authors  
who leave off the doctype because we allowed them to do so)?
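
As a rough illustration of the two signals discussed above, here is a
small Python sketch using the standard library's expat bindings. The
function name and the returned dictionary are hypothetical; the check
on the doctype looks only at the declared name "html", which is an
assumption about what a processor might test for, not something the
draft specifies.

```python
import xml.parsers.expat

# The XHTML namespace URI defined by the W3C.
HTML_NS = "http://www.w3.org/1999/xhtml"


def html_signals(xml_text: str) -> dict:
    """Report which signals identify an XML document as HTML.

    "namespace" is True when the root element is `html` in the HTML
    namespace; "doctype" is True when a doctype named `html` is present.
    """
    signals = {"namespace": False, "doctype": False}
    seen_root = False

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Only the doctype name is inspected in this sketch.
        signals["doctype"] = (name == "html")

    def on_start(name, attrs):
        nonlocal seen_root
        if not seen_root:
            seen_root = True
            # With a namespace separator set, expat reports namespaced
            # element names as "<uri><separator><local name>".
            signals["namespace"] = (name == HTML_NS + " html")

    parser = xml.parsers.expat.ParserCreate(namespace_separator=" ")
    parser.StartDoctypeDeclHandler = on_doctype
    parser.StartElementHandler = on_start
    parser.Parse(xml_text, True)
    return signals
```

A document with neither the namespace declaration nor the doctype
yields False for both signals, which is exactly the case I am asking
about: nothing in the document itself tells the processor it is HTML.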

Take care,
Rob
Received on Tuesday, 22 July 2008 15:58:57 UTC