Re: Validator errors from Harold A. Driscoll on 2000-01-31 (www-validator@w3.org from January 2000)

From: Harold A. Driscoll <harold@driscoll.chi.il.us>
Date: Mon, 31 Jan 2000 11:04:26 -0600
To: Dan Connolly <connolly@w3.org>
Cc: www-validator@w3.org
Message-Id: <Version.32.20000131000622.03540430@pop.interaccess.com>
At 23:09 30-01-00, Dan Connolly wrote:
>> You can't make XHTML the default for documents without 
>> a DOCTYPE; it'll break just about anything out there.

While I acknowledge that many documents lack a DOCTYPE statement, I'm
far from convinced that anything near "just about anything out there"
is missing one.

>> I thought the idea of serving XHTML as
>> text/html was pure idiocy to start with, but if you start assuming it's
>> XHTML in the validator you've thoroughly broken backwards compatibility.

I think a lot of us would be hard-pressed to defend (or even agree with)
many XML and XHTML design decisions... but that is beside the point here.
By definition, the validator validates against a set of specification
rules. Period.

>Was doctype-sniffing a documented feature of the validator? If so,
>I think Gerald's idea makes sense:
>
>	"I'm assuming XHTML; if you don't want that, here's info on adding
>	an HTML doctype..."
>
>If you're talking about backwards compatibility with HTML specs, none
>was promised for documents with no <!DOCTYPE...>:

My reading of the HTML 2.0 specification (RFC 1866), at 3.3, suggests otherwise:

|| NOTE - If the body of a `text/html' message entity does not begin
|| with a document type declaration, an HTML user agent should infer
|| the above document type declaration.
||
|| <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0 Level 2//EN">

True, this inference is of essentially no practical value, and as you point
out, the HTML 3.2 specification doesn't really respect this note anyway.
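For what it's worth, the RFC 1866 note boils down to a simple rule. Here is a minimal Python sketch of it under my own assumptions: the function and constant names are hypothetical, the public identifier is copied exactly as quoted above, and the DOCTYPE check is a simplified regular expression (it won't skip SGML comments or processing instructions that might precede the declaration).

```python
import re
from typing import Optional

# Public identifier exactly as quoted from the RFC 1866 note above.
RFC1866_DEFAULT_DOCTYPE = '<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0 Level 2//EN">'

# Simplified check: optional leading whitespace, then a declaration up to the first '>'.
_DOCTYPE_RE = re.compile(r'\A\s*<!DOCTYPE\s[^>]*>', re.IGNORECASE)


def effective_doctype(content_type: str, body: str) -> Optional[str]:
    """Return the DOCTYPE a user agent following the RFC 1866 note would use."""
    match = _DOCTYPE_RE.match(body)
    if match:
        # The body declares its own document type; use it as-is.
        return match.group(0).strip()
    # Ignore parameters such as '; charset=...' when comparing the media type.
    media_type = content_type.split(';')[0].strip().lower()
    if media_type == 'text/html':
        # No declaration: infer the HTML 2.0 default, per the note.
        return RFC1866_DEFAULT_DOCTYPE
    return None


print(effective_doctype('text/html', '<HTML><BODY>Hi</BODY></HTML>'))
# prints <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0 Level 2//EN">
```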

>Earlier in this thread, Kynn Bartlett wrote:
>
>>>1.  Your page does not specify a DTD with a doctype statement.
>>>     This means that your level/flavor of HTML is undefined.
>
>No, that means the document doesn't conform to any of the HTML 2.0,
>HTML 3.2, HTML 4.0, nor HTML 4.01 specs, and it's not a strictly
>conforming XHTML document. The validator is testing to see
>if it's an XML document with the XHTML namespace.

I'd agree, unless one were claiming full HTML 2.0 compliance... such an
argument being more theoretical than practical, of course. <g>

>> The only way to handle this that won't break badly is to assume that
>> text/xml is XML, text/xhtml is XHTML, text/html is HTML 4.01[0], unless a
>> DOCTYPE is given in which case the DOCTYPE is used.

Huh? I too am lost on this statement.
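If I read it right, the quoted proposal amounts to this: a DOCTYPE, when present, always wins; otherwise the media type picks the default. Here is a minimal Python sketch of that reading. The names and labels are hypothetical, the media types are taken exactly as written in the quote, and the DOCTYPE check is deliberately simplified (an optional XML declaration, then the declaration itself; no internal subset or leading comments).

```python
import re
from typing import Optional, Tuple

# Simplified: optional XML declaration, then '<!DOCTYPE' and everything up to the first '>'.
_DOCTYPE_RE = re.compile(r'\A\s*(?:<\?xml[^>]*\?>\s*)?<!DOCTYPE\s+([^>]*)>', re.IGNORECASE)

# Defaults per media type, as listed in the quoted proposal (labels are illustrative).
DEFAULT_BY_MEDIA_TYPE = {
    'text/xml': 'XML',
    'text/xhtml': 'XHTML',
    'text/html': 'HTML 4.01',
}


def choose_validation_mode(content_type: str, body: str) -> Optional[Tuple[str, str]]:
    """Return (source, value): the DOCTYPE if present, else the media-type default."""
    match = _DOCTYPE_RE.match(body)
    if match:
        # An explicit DOCTYPE always takes precedence over the media type.
        return ('doctype', match.group(1).strip())
    media_type = content_type.split(';')[0].strip().lower()
    default = DEFAULT_BY_MEDIA_TYPE.get(media_type)
    if default is None:
        return None  # The proposal gives no rule for this media type.
    return ('media-type', default)
```

For example, choose_validation_mode('text/html', '<html><body></body></html>') returns ('media-type', 'HTML 4.01').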

>> I was afraid this was due to bugs in my DOCTYPE 
>> guessing code,

>The whole idea of DOCTYPE guessing was pretty goofy, 
>if you ask me. It just seems to encourage folks to put 
>documents on the web that don't match the specs, and 
>there's plenty of tools to help you do that without adding
>the validator to the list ;-)

I'll come down somewhere between these two views...

A validator serves two important functions, as a teaching (learning) tool,
and as a compliance and quality assurance test.

DOCTYPE guessing could well be useful as a teaching tool, since it lets
the validator make more specific suggestions: a bit of artificial
intelligence of the sort we take for granted in a human tutor or instructor.

Similarly, there is value in allowing a DOCTYPE to be specified via the
submission form. In either case, the validation result should only be a
provisional pass, subject to actually adding the requisite DOCTYPE
statement to the file.
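To make the "provisional pass" idea concrete, here is a hypothetical Python sketch of how a result might be graded. All names are my own, and I've assumed that a DOCTYPE chosen on the form overrides the one in the file, followed by the file's own DOCTYPE, then a guess. Only a pass that rests on the file's own DOCTYPE counts as a full pass.

```python
from typing import NamedTuple, Optional


class Outcome(NamedTuple):
    status: str                  # 'pass', 'provisional pass', 'fail', or 'no doctype'
    doctype_used: Optional[str]  # the DOCTYPE the document was checked against


def grade(file_doctype: Optional[str], form_doctype: Optional[str],
          guessed_doctype: Optional[str], error_count: int) -> Outcome:
    """Grade a validation run; only the file's own DOCTYPE can earn a full pass."""
    # Assumed precedence: form override, then the file's DOCTYPE, then a guess.
    used = form_doctype or file_doctype or guessed_doctype
    if not used:
        return Outcome('no doctype', None)
    if error_count > 0:
        return Outcome('fail', used)
    if used == file_doctype:
        return Outcome('pass', used)
    # Valid, but only against a DOCTYPE the file does not yet contain.
    return Outcome('provisional pass', used)
```

Treating a form-supplied DOCTYPE that matches the file's own as a full pass is a judgment call; one could just as well flag every override.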


I stumbled upon the change this afternoon when I ran the output of a CGI
script through the tool. I'd cut and pasted from another script and lost
the DOCTYPE in the process. [That LWP does not make including one easy,
let alone make leaving one out more trouble than putting it in, is hardly
something of which LWP has any reason to be proud, alas.]

I did a double take at the result, and I agree completely that the
presentation and explanation could be improved, particularly for the
novice. My sense is that Gerald agrees and plans to improve things.

As for the default behavior, in the short term an argument can be made
that HTML 4.01 is the more pragmatic default. How much so will depend on
the situation... but will one more pass through the validator have that
big an impact on a project? I very much doubt it.

Will an XHTML assumption make a bad first impression and discourage people
from using the tool? Perhaps. Often? I very much doubt it. Given helpful
guidance, I see it as little of an issue, quite frankly.

I've always understood Kinder Gentler's value to come from two things.
First, being strictly compliant with the specification meant that it rarely
gave bum advice [the exception, of course, being the SGML-valid but
HTML-invalid constructs it would pass]. Second, serious effort was made to
provide helpful output, error display, and explanation.

The bottom line is that a compliant HTML 4 document (at least one of any
pragmatic value) _must_ have a DOCTYPE. When one is missing, guidance on
what is needed is an important part of the tool. If it reports that one is
missing, it has done its job.
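As a sketch of that guidance, assuming nothing about how the validator is actually implemented: check for a DOCTYPE before anything else and, if none is found, say so up front with a few concrete declarations to choose from. The function name and message wording are hypothetical; the three declarations are the standard W3C ones for HTML 4.01 Strict, HTML 4.01 Transitional, and XHTML 1.0 Strict.

```python
import re
from typing import Optional

# Simplified: optional XML declaration, then the start of a DOCTYPE declaration.
_DOCTYPE_START_RE = re.compile(r'\A\s*(?:<\?xml[^>]*\?>\s*)?<!DOCTYPE\s', re.IGNORECASE)

# Standard W3C declarations offered as suggestions.
SUGGESTIONS = [
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">',
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">',
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">',
]


def missing_doctype_notice(body: str) -> Optional[str]:
    """Return a friendly notice if the document lacks a DOCTYPE, else None."""
    if _DOCTYPE_START_RE.match(body):
        return None  # A declaration is present; carry on with normal validation.
    lines = ['No DOCTYPE found. Please add one at the top of the file, for example:']
    lines += ['  ' + decl for decl in SUGGESTIONS]
    return '\n'.join(lines)
```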

While it is desirable to identify more rather than fewer errors per
validation iteration, I'm also skeptical that it really matters that much
in the big picture... is it really a big deal that one needs the right
DOCTYPE up front, rather than later in the process? Frankly, I don't think so.

So, we just need a tactful way for the validator to say "you goofed (lack
of DOCTYPE) so fix it NOW" (rather than possibly waiting until later in the
validation process). Hardly a big deal, methinks.

Safe computing,  /Harold
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Harold A. Driscoll                 mailto:Harold@Driscoll.Chi.IL.US
#include <std/disclaimer>                 http://Driscoll.Chi.IL.US
Received on Monday, 31 January 2000 12:16:27 UTC