Re: Doctype detection from Jan Roland Eriksson on 2000-07-26 (www-html@w3.org from July 2000)

From: Jan Roland Eriksson <jrexon@newsguy.com>
Date: Wed, 26 Jul 2000 19:35:04 +0200
To: webmaster@richinstyle.com
Cc: www-html@w3.org, www-style@w3.org
Message-ID: <hr1unssvg93uh8avj3gtcpb0uo7u0e2nkv@4ax.com>
On Wed, 26 Jul 2000 10:19:00 +0100, Matthew Brealey
<webmaster@richinstyle.com> wrote:

(I usually don't make long posts, but this one is important)

>Ian Hickson wrote:
>> 
>> On Tue, 25 Jul 2000, Matthew Brealey wrote:

>> > 2. Nowhere in the spec does it say: 'if you don't include a DOCTYPE, the
>> > browser can screw things up': IT SHOULDN'T MATTER.

I second that...

>> Just for the record, you are wrong.
>> 
>> Section B.1 of HTML 4.01 states:
>> # This specification does not define how conforming user agents handle
>> # general error conditions [...]
>> 
>> Since error handling is undefined, screwing things up is a perfectly
>> valid response. As is cooking some toast, dialling the FBI or starting
>> up the screensaver.

That's not "the whole truth and nothing but the truth", so to speak.

UA's designed for "mass market" use in the form of e.g. www clients
are _not_ designed to be "validating SGML systems"; we have to move
to clients like "Panorama", or even better, "MultiDoc-Pro", to find
that level of elegance in a client system.

What this means is that in the processing characteristics of a
"non-validating SGML system", a <!DOCTYPE... declaration shall
never be an issue...

>Is this really a valid response? Is it:
>(a) legitimate for browsers to employ doctype-detection
>    in order to trigger more (CSS) bugs?
>(b) really an error response to HTML to break CSS?

Good questions, and the correct answer to both of them is: No!

And the following line of reasoning is my support for that...

From RFC1866 (HTML2)

3.3. HTML Public Text Identifiers

   To identify information as an HTML document conforming to
   this specification, each document must start with one of
   the following document type declarations.

   <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">

   This document type declaration refers to the
   HTML DTD in 9.1, "HTML DTD".

 ->   NOTE - If the body of a `text/html' message entity does
 ->   not begin with a document type declaration, an HTML user
 ->   agent should infer the above document type declaration.

And that "NOTE" uses the word 'should' as in...

  should
          If a document or user agent conflicts with this
          statement, undesirable results may occur in practice
          even though it conforms to this specification.

That "NOTE" in RFC1866 section 3.3 is a _normative_ part of
the HTML2 specification.
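
To make the inference in that NOTE concrete, here is a minimal sketch
in Python. It is an illustration only and not taken from RFC1866: the
function name effective_doctype and the simplified regular expression
are assumptions made for this example, and a real SGML parser would
have to handle far more than a declaration at the start of the body.

```python
import re

# Document type declaration quoted from RFC1866 section 3.3.
HTML2_DOCTYPE = '<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">'

# Hypothetical, simplified pattern: a declaration at the very start of
# the body, after optional whitespace. Internal subsets are not handled.
_DOCTYPE_RE = re.compile(r'\s*(<!DOCTYPE\b[^>]*>)', re.IGNORECASE)


def effective_doctype(body: str) -> str:
    """Return the declared doctype, or infer HTML 2.0 when none is given."""
    match = _DOCTYPE_RE.match(body)
    if match:
        # The document begins with its own declaration; use it as written.
        return match.group(1)
    # No declaration at the start: infer the HTML 2.0 one, as the NOTE says.
    return HTML2_DOCTYPE
```

With this sketch, a body that starts directly with <html> is treated
as if it carried the HTML 2.0 declaration, while a body that begins
with its own <!DOCTYPE... declaration keeps that declaration unchanged.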

Further, I have re-checked HTML3.2 (Wilbur) trying to find any
kind of entry there that would directly invalidate RFC1866
sect. 3.3, and found none. So I suggest that throughout the
history up to HTML4, RFC1866 sect. 3.3 is still a _normative_
part of HTML.

Now for the interesting parts of HTML4.01

In section 4. we can find this to start with...

 'At times, the authors of this specification recommend good
  practice for authors and user agents. These recommendations
  are not normative and conformance with this specification
  does not depend on their realization. These recommendations
  contain the expression "We recommend ...", "This specification
  recommends ...", or some similar wording.'

Note that "recommendations" in HTML4.01 are _not_normative_
parts of the HTML4.01 specification.

Further, in section B.1 of HTML4.01 we find...

 "The HTML 2.0 specification ([RFC1866]) observes that many
  HTML 2.0 user agents assume that a document that does not
  begin with a document type declaration refers to the
  HTML 2.0 specification. As experience shows that this is a
  poor assumption, the current specification does not recommend
  this behavior."

That is a _non_normative_ recommendation not to infer an HTML2
<!DOCTYPE... declaration for documents served with an
HTTP Content-Type of text/html, but without the <!DOCTYPE...
declaration.
 
RFC1866 is still a "winner" by having the only _normative_
part of a spec on this so far; let's go on...

From HTML4.01 section 4.1 Definitions...

 "For reasons of backwards compatibility, we recommend that
  tools interpreting HTML 4 continue to support HTML 3.2
  (see [HTML32]) and HTML 2.0 (see [RFC1866])."

A _non_normative_ recommendation to still support older
HTML specs.

Summary:

1) The only _normative_ statement we have in the history of
   HTML specs so far is from RFC1866, which "orders" conforming
   UA's to infer a strict HTML2 DTD reference for documents
   with an HTTP Content-Type header of 'text/html', if the DTD
   ref is missing in the data stream.
   "Undesirable results" may be the effect of not following
   this "order"...

2) HTML4.01 has a _non_normative_ statement that says...
  "break the order" given in 1) above...

3) HTML4.01 has a _non_normative_ statement that says...
  "it's not absolutely required by UA's to be backwards
   compatible with older HTML specs"

So if 3) is to be followed, then 2) should be followed too.
The result of that is that UA's should stop supporting the
"Transitional" and "Frameset" DTD references altogether
and only make a "strict" processing mode available.

The mere existence of a "quirks processing mode" strongly
indicates that UA's really want to be "backwards compatible".
If so, the recommendation in 2) shall not be followed either,
for consistency, and then we have RFC1866 as a "winner" again.

And there's no problem whatsoever in designing an excellent stylesheet
suggestion, using contextual selectors, for a strict HTML2 doc.
(And as I said earlier, no "mass market" UA today cares about the
real purpose of a DTD reference, since none of them are designed to
be "validating SGML systems" in the first place.)

Don't use "doctype-sniffing" for the wrong purpose; doing that
will only create a new set of problems that we will need to discuss
again some years from now.

-- 
Jan Roland Eriksson <jrexon@newsguy.com>
<URL:http://member.newsguy.com/%7Ejrexon/>
Received on Wednesday, 26 July 2000 13:35:04 UTC