Re: Semicolon after entities from Philip & Le Khanh on 2007-04-26 (www-html@w3.org from April 2007)

From: Philip & Le Khanh <Philip-and-LeKhanh@royal-tunbridge-wells.org>
Date: Thu, 26 Apr 2007 10:41:00 +0200
To: "W3C HTML Mailing List" <www-html@w3.org>
Cc: "Lachlan Hunt" <lachlan.hunt@lachy.id.au>
Message-ID: <46306594.5020409@Royal-Tunbridge-Wells.Org>

Lachlan Hunt wrote:

> The HTML5 spec is attempting to define how to handle all HTML now and in 
> the future.  With the unfortunate exception of IE, browsers will not be 
> adding additional DOCTYPE sniffing to distinguish between HTML5 and other
> revisions.

That is, I think, at the very centre of this debate/argument/w-h-y, although
this is the first explicit mention that I have seen.  Web Apps 1 (I avoid
calling it HTML5, since there is by no means universal agreement that
Web Apps 1 should become HTML5) appears to be defining (amongst other
things) a processing model that will allow all HTML pages to be
processed in the same way (including an attempt to define the behaviour
if a document is ill-formed).  What I believe is really needed is
about as diametrically opposed to this as can be imagined : a processing
model which varies with the DOCTYPE.  I have little objection to it
defining a processing model which treats HTML 3.2 and earlier as tag
soup.  HTML 4.0 was a mistake, HTML 4.01 corrected the error and -- if
it had been properly used in the wild -- could have been parsed and
processed more rigorously : as it is, there is such a corpus of
ill-formed legacy documents that one has little choice but to once
again allow the tag-soup model.

But HTML5 should be different.  This is surely the time at which to
say "enough is enough" : either a document is well-formed (in which
case its processing is well-defined) or it is not, in which case
the browser can process it as it will.  There is <shout>no need</>
for all browsers to handle something that /alleges/ to be HTML5
consistently if the document is defective (poorly formed).  Indeed,
if browsers /do/ vary wildly in their treatment of ill-formed
HTML5 documents, there will be far greater pressure on /hoi polloi/
to write good, well-formed, HTML5 if they wish their offerings to
be seen consistently.  Thus, IMHO, HTML5 can be processed quite
differently to earlier, legacy, DTDs and it should be quite correct
for a conforming browser to switch processing models (from "lax"
to "strict") when an HTML5 DTD is detected.

To summarise, I think the following statement, taken from Web Apps 1,
is fundamentally flawed and requires radical thinking if sanity
is to prevail :

	8.1.1. The DOCTYPE

	A DOCTYPE is a mostly useless, but required, header.

	DOCTYPEs are required for legacy reasons. When omitted,
	browsers tend to use a different rendering mode that is
	incompatible with some specifications. Including the
	DOCTYPE in a document ensures that the browser makes
	a best-effort attempt at following the relevant specifications.

I would re-cast this along the lines of the following :

	8.1.1. The DOCTYPE

	A DOCTYPE is a much abused, but required, header.

	Until the introduction of HTML5, DOCTYPEs have -- in the
	main -- been mere eye-candy at the start of a putative
	HTML document.  With the introduction of HTML5, the
	DOCTYPE plays a vital role in determining the processing
	model for HTML documents.  If a well-formed HTML5 DOCTYPE
	is found (in the syntactically correct position), a
	conforming browser is REQUIRED to adopt the strict processing
	model described elsewhere in this specification.  If such
	a DOCTYPE is NOT found (or is found but in a position where
	its semantics are undefined), then a conforming browser is
	entitled to adopt any processing model that it deems fit.
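
The switch described in the re-cast text above could be sketched roughly as
follows, in Python. This is only an illustration under stated assumptions,
not a proposed implementation: the function name and the "strict"/"lax"
labels are hypothetical, the HTML5 DOCTYPE is assumed to be the short form
<!DOCTYPE html>, and "the syntactically correct position" is assumed to mean
the very start of the document, preceded by nothing but whitespace.

```python
import re

# Assumption: the "HTML5 DOCTYPE" is the short form <!DOCTYPE html>,
# matched case-insensitively. The message itself does not spell it out.
_HTML5_DOCTYPE = re.compile(r"<!DOCTYPE\s+html\s*>", re.IGNORECASE)


def choose_processing_model(document: str) -> str:
    """Return "strict" or "lax" (hypothetical labels) for a document."""
    # Assumption: the syntactically correct position is the very start of
    # the document, preceded only by whitespace.
    leading = document.lstrip()
    if _HTML5_DOCTYPE.match(leading):
        # A well-formed HTML5 DOCTYPE in the right place: strict model.
        return "strict"
    # Anything else (legacy DOCTYPE, misplaced DOCTYPE, none at all):
    # the browser may adopt whatever processing model it deems fit.
    return "lax"
```

Under these assumptions a legacy DOCTYPE such as the HTML 4.01 one, or an
HTML5 DOCTYPE that appears after the <html> start tag, falls through to the
"lax" branch.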

Philip Taylor

Received on Thursday, 26 April 2007 10:02:19 UTC