Re: Semicolon after entities

Lachlan Hunt wrote:

 > > The HTML5 spec is attempting to define how to handle all HTML now and in
 > > the future.  With the unfortunate exception of IE, browsers will not be
 > > adding additional DOCTYPE sniffing to distinguish between HTML5 and
 > > other revisions.

That, I think, is at the very centre of this debate (or argument, or
what-have-you), although this is the first explicit mention of it that
I have seen.  Web Apps 1 (I avoid calling it HTML5, since there is by
no means universal agreement that Web Apps 1 should become HTML5)
appears to be defining, amongst other things, a processing model under
which all HTML pages are processed in the same way, including an
attempt to define the behaviour when a document is ill-formed.  What I
believe is really needed is as diametrically opposed to this as can be
imagined: a processing model that varies with the DOCTYPE.  I have
little objection to it defining a processing model that treats HTML 3.2
and earlier as tag soup.  HTML 4.0 was a mistake; HTML 4.01 corrected
the error and -- had it been properly used in the wild -- could have
been parsed and processed more rigorously.  As it is, there is such a
corpus of ill-formed legacy documents that one has little choice but
to allow the tag-soup model once again.

But HTML5 should be different.  This is surely the time at which to
say "enough is enough": either a document is well-formed (in which
case its processing is well-defined) or it is not, in which case
the browser can process it as it will.  There is <shout>no need</>
for all browsers to handle a document that /alleges/ to be HTML5 in
the same way if that document is defective (poorly formed).  Indeed,
if browsers /do/ vary wildly in their treatment of ill-formed
HTML5 documents, there will be far greater pressure on /hoi polloi/
to write good, well-formed HTML5 if they wish their offerings to
be seen consistently.  Thus, IMHO, HTML5 can be processed quite
differently from documents that use earlier, legacy DTDs, and it
should be quite correct for a conforming browser to switch processing
models (from "lax" to "strict") when an HTML5 DOCTYPE is detected.

To summarise, I think the following statement, taken from Web Apps 1,
is fundamentally flawed and requires radical re-thinking if sanity
is to prevail:

 8.1.1. The DOCTYPE

 A DOCTYPE is a mostly useless, but required, header.

 DOCTYPEs are required for legacy reasons. When omitted,
 browsers tend to use a different rendering mode that is
 incompatible with some specifications. Including the
 DOCTYPE in a document ensures that the browser makes
 a best-effort attempt at following the relevant specifications.

I would recast this along the following lines:

 8.1.1. The DOCTYPE

 A DOCTYPE is a much abused, but required, header.

 Until the introduction of HTML5, DOCTYPEs had -- in the
 main -- been mere eye-candy at the start of a putative
 HTML document.  With the introduction of HTML5, the
 DOCTYPE plays a vital role in determining the processing
 model for HTML documents.  If a well-formed HTML5 DOCTYPE
 is found (in the syntactically correct position), a
 conforming browser is REQUIRED to adopt the strict processing
 model described elsewhere in this specification.  If such
 a DOCTYPE is NOT found (or is found but in a position where
 its semantics are undefined), then a conforming browser is
 entitled to adopt any processing model that it deems fit.
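To make the proposed rule concrete, here is a minimal sketch of how a browser might pick a processing model under it. It assumes that the HTML5 DOCTYPE is the short `<!DOCTYPE html>` form (matched case-insensitively) and that "the syntactically correct position" means the very start of the document after optional whitespace; the function name `choose_processing_model` and the "strict"/"lax" labels are hypothetical, and neither processing model itself is implemented here.

```python
import re

# Hypothetical reading of "a well-formed HTML5 DOCTYPE in the syntactically
# correct position": optional leading whitespace, then <!DOCTYPE html>
# (case-insensitive) as the first markup in the document. This is an
# assumption for illustration only; comments before the DOCTYPE are not
# allowed for in this simplified pattern.
_HTML5_DOCTYPE = re.compile(r"\A\s*<!DOCTYPE\s+html\s*>", re.IGNORECASE)


def choose_processing_model(document: str) -> str:
    """Return "strict" if the document opens with an HTML5 DOCTYPE, else "lax"."""
    if _HTML5_DOCTYPE.match(document):
        # Proposed rule: a conforming browser is REQUIRED to use the strict model.
        return "strict"
    # No HTML5 DOCTYPE, or one in the wrong position: the proposal leaves the
    # browser free to choose; tag-soup ("lax") handling is one such choice.
    return "lax"


print(choose_processing_model("<!DOCTYPE html><title>t</title>"))
print(choose_processing_model('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN">'))
```

A legacy HTML 4.01 DOCTYPE, or an HTML5 DOCTYPE that appears after other content, falls through to "lax", which is only one of the behaviours the proposed wording would permit.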

Philip Taylor

Received on Thursday, 26 April 2007 08:58:58 UTC