Re: Validator errors from Bless Terje on 2000-01-31 (www-validator@w3.org from January 2000)

From: Bless Terje <link@rito.no>
Date: Mon, 31 Jan 2000 21:25:58 +0100
To: "'W3C Validator'" <www-validator@w3.org>
Message-ID: <22FD5BD2DBC5D211BE0D0008C7A4E87FD9B41B@odin4.rito.no>
Harold A. Driscoll <harold@driscoll.chi.il.us> wrote:

>Thank you, sir.

Gee, thanks, that /really/ helped my self-esteem. :-)



>Relevant here is the actual subset of documents being presented
>to a validator, either those presumed to be nearly valid, or
>those learning in the process.

Hmm... Good point! I hadn't taken that into account.


>Being a validator, we can expect (hope) for valid input.

Interesting point of view.

Methinks I shall have to ponder these issues further and render my
thoughts to you upon the morrow.  I'm a bit too tired to make heads or
tails of them at the moment, but the points you raise regarding the
way the validator is used, and the kind of input we can expect, do
need to be considered fairly carefully.  I based my reasoning on the
assumption that the relevant metric is the ratio of the two kinds of
documents on the Internet as a whole, and failed to take into account
that this may differ from what is actually submitted to The W3C HTML
Validation Service.  Any thoughts?


>>I would claim the most reasonable assumption would 
>>be that it is in fact some form of HTML, 
>
>Hmmm, am I missing something, or are you suggesting that the 
>~validator~, when given a valid XHTML document (without a DOCTYPE),
>should process it as HTML 4.0, and reject it as invalid?

Yes. Suppose you are presented with a document where, because there is
no DOCTYPE present, the only information you have about the content is
that it is something called "text/html". Isn't it then reasonable to
assume that that is exactly what it is? If it had been served as
text/xhtml there'd be no problem telling that it was XHTML, but when
it's labeled as HTML it should be validated as HTML, IMO.

To put it another way: you have a file called "index.html", the web
server says that it is of type "text/html", and it has no DOCTYPE.
Isn't it reasonable to assume that it is HTML? Why would you assume it
was XHTML when nothing points in that direction and, in fact, every
available sign points to it being HTML?


The problem arises when you both allow XHTML to be served as text/html
_and_ apply the logic that, since all valid HTML documents have a
DOCTYPE, any document served as text/html which lacks a DOCTYPE must
therefore be XHTML. The effect is that no HTML file lacking a DOCTYPE
can be validated, because the validator checks it against the XHTML DTD
with the XML processor instead of against the HTML DTD with the SGML
parser. Since the DOCTYPE isn't required to make documents render more
or less as intended on user systems, it's _extremely_ easy to leave it
out (for some reason).
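The two competing rules can be sketched as a small dispatch function.
This is a hypothetical illustration only, not the W3C validator's actual
code: the function name `choose_parser`, the policy names "assume-html"
and "assume-xhtml", and the crude regex-based DOCTYPE check are all
assumptions made for the sketch.

```python
import re

# Hypothetical sketch of how a validator might pick a parser. The name
# choose_parser and the policy strings are illustrative assumptions,
# not taken from the W3C validator.
DOCTYPE_RE = re.compile(r"<!DOCTYPE[^>]*>", re.IGNORECASE)

def choose_parser(body: str, media_type: str, policy: str) -> str:
    """Return "sgml" (HTML DTD) or "xml" (XHTML DTD)."""
    doctype = DOCTYPE_RE.search(body)
    if doctype:
        # A DOCTYPE settles the question; crude test for an XHTML public ID.
        return "xml" if "XHTML" in doctype.group(0).upper() else "sgml"
    if media_type != "text/html":
        # Assumption: any label other than text/html is treated as XML.
        return "xml"
    # No DOCTYPE and labeled text/html: the case under dispute.
    if policy == "assume-html":
        return "sgml"  # take the "text/html" label at its word
    if policy == "assume-xhtml":
        return "xml"   # valid HTML always has a DOCTYPE, so "must be" XHTML
    raise ValueError(f"unknown policy: {policy}")

legacy_page = "<html><head><title>Hi</title></head><body></body></html>"
print(choose_parser(legacy_page, "text/html", "assume-html"))   # sgml
print(choose_parser(legacy_page, "text/html", "assume-xhtml"))  # xml
```

Under the second policy the DOCTYPE-less HTML page is routed to the XML
processor and checked against the XHTML DTD, which is the failure mode
described above.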


However, in light of the new perspective you gave me above, I think
I'll have to reconsider whether this is a problem in practice;
the lesser evil and all that.


>Oh, but US government officials assure us repeatedly that both 
>are totally safe

Actually, fertilizer is likely to be on the FBI Watch List given how
those Internet Hacker types sit around making recipes for bombs and
plans to overthrow the government to start a pornographic spam-factory
(no, not the kind trademarked by Hormel; the other kind) on the White
House lawn. :-(


>A validator should play by the rules, presume good behavior, and be
>courteous with less than ideal behavior. I find the alternative ~you
>probably screwed-up, so we'll act accordingly~  distasteful, 
>whether done by a validator or by a prejudicial police official.

Yeah, but the majority screw up; not out of incompetence, but because
they've been misinformed. Assume HTML and you're rude to the 1% of
users whose XHTML 1.0 is served as text/html _and_ has no DOCTYPE.
Assume XHTML and you're rude to the 99% of users with /some/ form of
HTML without a DOCTYPE. Simple numbers: 99% vs. 1%.

The numbers are of course open to discussion, as they are based not on
solid evidence but on a general feel for, and opinion about, the subject
matter. No hard data.

Of course, I would argue that we should alienate all 100% of them. :-)
No defaults, and "no DOCTYPE == no validation".
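Taking the admittedly made-up 99%/1% split at face value, the three
options can be compared mechanically. The shares below are the
illustrative figures from the paragraph above, not measurements, and the
policy names and the helper `share_alienated` are hypothetical.

```python
# Share of DOCTYPE-less text/html documents of each kind, using the
# illustrative (not measured) figures above.
SHARES = {"html": 0.99, "xhtml": 0.01}

# Which kind each hypothetical policy validates with the right parser.
HANDLED = {
    "assume-html": {"html"},
    "assume-xhtml": {"xhtml"},
    "refuse": set(),  # "no DOCTYPE == no validation"
}

def share_alienated(policy: str) -> float:
    """Fraction of DOCTYPE-less documents the policy misparses or rejects."""
    return sum(share for kind, share in SHARES.items()
               if kind not in HANDLED[policy])

for policy in HANDLED:
    print(f"{policy}: {share_alienated(policy):.0%}")
```

This only restates the numbers above; it says nothing about how often
each case actually reaches the validator, which is the open question
raised earlier.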



Anyways, gotta go catch some Zzzzzz......
Received on Monday, 31 January 2000 15:25:49 UTC