Re: Beta: Fatal Error: No DOCTYPE specified!

From: Ian Hickson <ian@hixie.ch>
Date: Thu, 31 Oct 2002 11:44:59 +0000 (GMT)
To: Bjoern Hoehrmann <derhoermi@gmx.net>
Cc: Terje Bless <link@pobox.com>, W3C Validator <www-validator@w3.org>
Message-ID: <Pine.LNX.4.21.0210311130150.32717-100000@dhalsim.dreamhost.com>

On Thu, 31 Oct 2002, Bjoern Hoehrmann wrote:
> 
> "Why do I need a document type declaration?"
> "The Validator won't validate without."

No, it's:

   "Why do I need a document type declaration?"
   "Your document is invalid without one."

If the answer to that is "I don't care about validation", then it really
doesn't matter what the validator does, since the author has just said
that validation doesn't matter to them. If they _do_ care about
validation, which would make sense if they are asking the question, then
they'll add a DOCTYPE.


> Let me repeat:
> 
>   if element html has attribute xmlns='http://www.w3.org/1999/xhtml'
>     default to XHTML 1.0 Transitional
>   else
>     default to HTML 4.01 Transitional

No offense, but that is _hopelessly_ naive.

You can't know whether the <html> element has a particular attribute until
after you've parsed the document, and you can't parse the document until
after you've decided whether it's HTML or XHTML.

Not only that, but depending on whether you start parsing as HTML or as
XHTML, you will get different results from your heuristic. See, for
example, these pathological cases which demonstrate the problem:

   http://www.damowmow.com/playground/html-not-xml.html
   http://www.damowmow.com/playground/html-not-xml-2.html
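
The circularity can be sketched in a few lines of Python. This is only an
illustration under stated assumptions: it does not reproduce the two test
pages linked above, the function names are hypothetical, and Python's
html.parser and xml.etree stand in for whatever HTML and XML parsers a
validator would really use. The sample document uses an unquoted attribute
value, which a lenient HTML parser accepts but XML forbids.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


class _RootAttrs(HTMLParser):
    """Record the attributes of the first <html> start tag seen."""

    def __init__(self):
        super().__init__()
        self.attrs = None

    def handle_starttag(self, tag, attrs):
        if tag == "html" and self.attrs is None:
            self.attrs = dict(attrs)


def sniff_as_html(text):
    """Apply the xmlns heuristic after parsing the text as HTML."""
    parser = _RootAttrs()
    parser.feed(text)
    parser.close()
    return (parser.attrs or {}).get("xmlns") == XHTML_NS


def sniff_as_xml(text):
    """Apply the xmlns heuristic after parsing the text as XML.

    Returns None when the text is not well-formed XML, in which case
    the heuristic cannot be evaluated at all.
    """
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        return None
    # ElementTree folds the default namespace into the element name.
    return root.tag == "{%s}html" % XHTML_NS


# Hypothetical sample: the attribute value is unquoted.
doc = ("<html xmlns=http://www.w3.org/1999/xhtml>"
       "<head><title>t</title></head><body><p>x</p></body></html>")

print(sniff_as_html(doc))  # heuristic result when parsed as HTML
print(sniff_as_xml(doc))   # heuristic result when parsed as XML
```

For this sample the HTML-mode check reports the XHTML namespace, while the
XML-mode check cannot even reach the <html> element, so the answer depends
on the very decision the heuristic was supposed to make.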

Furthermore, the HTML working group has stated that user agents (and the
validator is a user agent) should not attempt to detect XHTML in text/html
documents. To wit:

| There should be no sniffing of text/html documents to see if they are
| really XHTML.
 -- http://lists.w3.org/Archives/Public/www-html/2000Sep/0024.html


> You are talking about a best guess, I just want a reasonable default.

Actually, simply assuming HTML 4.01 Strict (for text/html content) might
not be a bad idea. But anything more than a simple default (and sniffing
for XHTML is more than a simple default) would be bad, since it brings
back the parse-before-you-know-how-to-parse problem.
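
A "simple default" in this sense could look at nothing but the media type.
A minimal sketch, assuming a hypothetical default_doctype() helper that is
called only when the document has no DOCTYPE; it is not the validator's
actual code, and the return value for other media types is an assumption:

```python
def default_doctype(content_type):
    """Pick a fallback DOCTYPE from the Content-Type header value alone.

    The document body is never inspected, so no sniffing takes place.
    Returns None when no default is defined for the media type.
    """
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "text/html":
        return "HTML 4.01 Strict"
    return None


print(default_doctype("text/html; charset=utf-8"))
print(default_doctype("application/xhtml+xml"))
```

Because the choice depends only on the header, it gives the same answer no
matter what the markup contains.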

-- 
Ian Hickson                                      )\._.,--....,'``.    fL
"meow"                                          /,   _.. \   _\  ;`._ ,.
http://index.hixie.ch/                         `._.-(,_..'--(,_..'`-.;.'
Received on Thursday, 31 October 2002 06:45:01 GMT
