
Should text/html be parsed as SGML or XML?

From: Nick Kew <nick@webthing.com>
Date: Mon, 8 Oct 2001 07:28:16 +0100 (BST)
To: www-validator@w3.org
Message-ID: <Pine.BSF.4.21.0110080639500.1366-100000@fenris.webthing.com>

[ if these URLs get wrapped, you'll need to unwrap them ]

In the course of investigating error reports, I've just looked at

contrasted with

The offending document looks like:

<!doctype html>
<p> [ several mistyped links, but content that would be valid in
      an HTML <body> ]

The first error generated is, predictably, "no internal or external document
type declaration subset; will parse without validation".  So the report
that follows depends on the default SGML declaration used in such cases.

w3-validator (in common with the WDG validator) generates a longish list
of errors, from which it appears to be checking XML well-formedness.
Yet the page in question is served as text/html, so in my book (and in
particular in those Site Valet tools that don't make this a user option)
it should still be parsed as SGML, not XML.
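A minimal sketch of the rule argued for above, assuming the parse mode is chosen purely from the HTTP Content-Type header. The function name `choose_parser` and the returned labels are hypothetical, and the list of XML media types is illustrative rather than exhaustive:

```python
def choose_parser(content_type: str) -> str:
    """Pick a parse mode ("sgml", "xml" or "unknown") from a Content-Type value.

    Hypothetical helper: the return values are labels, not real parser objects.
    """
    # Drop parameters such as "; charset=iso-8859-1" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "text/html":
        # Served as HTML: parse as SGML, never as XML.
        return "sgml"
    # Illustrative (assumed) set of XML media types.
    if media_type in ("application/xml", "text/xml") or media_type.endswith("+xml"):
        return "xml"
    return "unknown"
```

For example, `choose_parser("text/html; charset=iso-8859-1")` returns `"sgml"` regardless of what the document body looks like.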

I recollect reading some years ago, in what I think was an official
W3C spec (probably for HTML 3.2 or 4.0), that for backward compatibility,
legacy documents should be parsed as HTML 2.0 in the absence of an
FPI (Formal Public Identifier).  Am I going senile, or has this been
completely abandoned?
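A rough sketch of the fallback recalled above, assuming a validator simply substitutes the HTML 2.0 FPI (as given in RFC 1866) when the doctype carries no public identifier. `effective_fpi` is a hypothetical name, and the regular expression is a simplification: it ignores SGML comments and other legal doctype variations.

```python
import re

# FPI of HTML 2.0 (RFC 1866), used here as the assumed default.
HTML_2_0_FPI = "-//IETF//DTD HTML 2.0//EN"

# Matches <!DOCTYPE name PUBLIC "fpi" ...> with either quote style;
# matched case-insensitively since HTML doctype keywords are case-insensitive.
_DOCTYPE_PUBLIC = re.compile(
    r'<!DOCTYPE\s+\w+\s+PUBLIC\s+(?:"([^"]*)"|\'([^\']*)\')',
    re.IGNORECASE,
)


def effective_fpi(document: str) -> str:
    """Return the declared FPI, or the HTML 2.0 FPI if none is declared."""
    match = _DOCTYPE_PUBLIC.search(document)
    if match:
        # Group 1 holds a double-quoted FPI, group 2 a single-quoted one.
        return match.group(1) if match.group(1) is not None else match.group(2)
    return HTML_2_0_FPI
```

Under this assumption, the offending document's bare `<!doctype html>` would be validated against HTML 2.0 rather than being checked for XML well-formedness.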

Nick Kew

Site Valet - the essential service for anyone with a website.
Received on Monday, 8 October 2001 02:28:54 UTC
