Re: Beta: Fatal Error: No DOCTYPE specified! from Bjoern Hoehrmann on 2002-10-25 (www-validator@w3.org from October 2002)

From: Bjoern Hoehrmann <derhoermi@gmx.net>
Date: Fri, 25 Oct 2002 20:06:52 +0200
To: Terje Bless <link@pobox.com>
Cc: W3C Validator <www-validator@w3.org>
Message-ID: <3dc879d2.12901010@smtp.bjoern.hoehrmann.de>
* Terje Bless wrote:
>>Without the public being able to keep track of these bugs, it's quite
>>useless to inform it of bug numbers...
>
>The bug numbers will be referenced in release notes for future versions so
>you'll be able to check whether your bug was addressed.

It would be better to point to the bug report itself, e.g. by linking
to it, rather than just citing its number...

>[Validator refuses to validate documents without doctypedecl]
>
>Ok. But then let's try to restate the problem. The problem isn't that we
>refuse to validate in the absence of a DOCTYPE, the problem is that the
>current user interface and error messages aren't informative enough, or
>they are confusing and/or obscure.

No, the problem is that most web pages don't have a correct document
type declaration and by refusing to validate such documents, the
validator refuses to help most web page authors -- and that's bad.

If I enter my web page's address and click on "validate", I expect the
validator to give me a report on what's potentially wrong with my page,
not to tell me my web page just sucks.

It may sound unlikely, but there are people out there who care about the
quality of their web pages but don't know anything about this document
type declaration, FPI, public identifier, system identifier, SGML short
tags, ISO-8859-1 and whatever-geek-speech-mumbo-jumbo-stuff. Have you
never come across people asking what to choose from this "document type"
drop-down control? People who don't know the difference between HTML 2.0
and HTML 3.2, or between HTML 4.01 Transitional and XHTML 1.0 Strict? I
don't like telling them they are too dumb to use the validator.

>Now how can we improve on that?

If the validator is unable to detect a document type declaration, it
should use the most likely and loosest document type for validation. It
should report the missing declaration as an error, state that it used a
default document type, and list the errors in the document according to
that default. It should also allow the user to choose a different
document type for validation.
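A minimal sketch of the fallback described above, in Python and purely
for illustration: the helper name `select_doctype`, the regular
expression and the choice of HTML 4.01 Transitional as the "most likely
and loosest" default are all assumptions, not anything the validator
actually implements. It only decides which document type to validate
against and which messages to put in the report; validation itself is
left out.

```python
import re
from typing import List, Optional, Tuple

# Assumed default: HTML 4.01 Transitional stands in for the
# "most likely and loosest" document type (an illustrative choice).
DEFAULT_FPI = "-//W3C//DTD HTML 4.01 Transitional//EN"

# Hypothetical detector: finds a DOCTYPE declaration and captures its
# public identifier (FPI), if one is given.
DOCTYPE_RE = re.compile(
    r"""<!DOCTYPE\s+\w+(?:\s+PUBLIC\s+["']([^"']*)["'])?""", re.IGNORECASE
)


def select_doctype(
    document: str, user_choice: Optional[str] = None
) -> Tuple[str, List[str]]:
    """Return the FPI to validate against plus messages for the report."""
    messages: List[str] = []
    if user_choice is not None:
        # An explicit choice made by the user always wins.
        messages.append(f"Validating against user-selected type: {user_choice}")
        return user_choice, messages
    match = DOCTYPE_RE.search(document)
    if match is None:
        messages.append("Error: no document type declaration found.")
    elif match.group(1) is None:
        messages.append("Error: document type declaration has no public identifier.")
    else:
        # A declaration with an FPI was found: use it as-is.
        return match.group(1), messages
    # Report the fallback instead of refusing to validate.
    messages.append(f"Validated against default document type: {DEFAULT_FPI}")
    return DEFAULT_FPI, messages
```

Called on a page without any declaration, the sketch returns the default
FPI together with an error message, so the report can go on to list the
markup errors instead of stopping at a fatal error page.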

I really don't see why this could be any worse than refusing to work at
all. Refusing to validate is annoying not only to Joe Average, but also
to experts who are trying to help Joe with problems on his page. They
will first go to the validator to check how bad his markup actually is,
and then either take a closer look at it or tell Joe he needs to fix
many errors before they can look at anything specific. Even if the
"fatal error" page allows them to choose a document type to validate
against, they still need several steps to get any results at all, and
that's annoying, unnecessarily annoying.

>We spent a considerable amount of effort on dispensing with the need to
>"guess" DOCTYPEs for a reason. For a lint it would be a very good thing,
>but for something that aspires to "formal validator" status, it just
>doesn't wash.

So you *do* want a formal validation tool for experts,
not a helpful tool for everyone?

>Perhaps if it hadn't been tried and found to be an utter
>unmaintainable mess you would have stood a chance at convincing me, but
>right now I think it highly unlikely that this will change.

Just ignore what I've written before and answer me this: why, and for
whom, is it better if the validator refuses to validate instead of
defaulting to the loosest document type for validation?

>>>>2) the page should display the revalidate form
>>>Agreed, but it may be tricky to implement.
>>
>>Not my fault :-)
>
>Are you sure? I /need/ someone to blame it on. Maybe the dog did it?

Sure it wasn't Microsoft? :-)

>The phrase "prior to the root element" is friendly?!?!? :-)

If you replace "root" with "<html>" and provide an example of what is
meant, then yes, I consider this a good compromise between friendliness
and correctness. But I guess the former is hard to implement, because
without a document type declaration you are unable to tell whether a
document is an (X)HTML document at all...

>>[...] I don't see a good reason to ignore XHTML 1.0's "Use both the
>>lang and xml:lang attributes when specifying the language of an element"
>>and thereby advertising "incompatible" markup.
>
>Hmmm. Do me a favour and play devil's advocate for a moment. What are the
>_downsides_ to including both?

Downsides of redundancy? Well, a waste of octets? If you care about
that, you could send the "Content-Language" HTTP header instead, but
that has an actual downside: it is unspecified whether XML documents
inherit the language from the HTTP header or not.
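For illustration only, the "use both lang and xml:lang" guideline is
easy to check mechanically. The Python sketch below uses the standard
library's ElementTree; the helper name `check_lang_attributes` is
hypothetical and not part of the validator. It flags elements that
specify only one of the two attributes, or give them different values
(compared case-insensitively, since language tags are case-insensitive).

```python
import xml.etree.ElementTree as ET
from typing import List

# The xml: prefix is bound to this namespace by the XML specification,
# so ElementTree reports xml:lang under this expanded name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def check_lang_attributes(xhtml: str) -> List[str]:
    """Flag elements that set only one of lang/xml:lang, or set them differently."""
    problems: List[str] = []
    root = ET.fromstring(xhtml)
    for elem in root.iter():  # iter() includes the root element itself
        lang = elem.get("lang")
        xml_lang = elem.get(XML_LANG)
        if lang is None and xml_lang is None:
            continue
        # Strip the XHTML namespace from the tag name for readable messages.
        tag = elem.tag.split("}")[-1]
        if lang is None or xml_lang is None:
            problems.append(f"<{tag}>: only one of lang/xml:lang is specified")
        elif lang.lower() != xml_lang.lower():
            problems.append(f"<{tag}>: lang={lang!r} differs from xml:lang={xml_lang!r}")
    return problems
```

A document following the guideline, such as
`<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">`,
produces no problems; dropping either attribute produces one.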
Received on Friday, 25 October 2002 14:06:15 UTC