Re: flakey charset detection from Terje Bless on 2002-12-04 (www-validator@w3.org from December 2002)

From: Terje Bless <link@pobox.com>
Date: Wed, 4 Dec 2002 23:11:45 +0100
To: W3C Validator <www-validator@w3.org>
cc: Karl Dubost <karl@w3.org>, David Brownell <david-b@pacbell.net>
Message-ID: <a01060007-1022-60E5293807D511D7987A00039300CF5C@[193.157.66.10]>

David Brownell <david-b@pacbell.net> wrote:

>It does more than "mention" it!  This looks like one of those "from
>false premises, you can deduce anything" results.  Perhaps somebody was
>unwilling to fix IE to obey that web standard, and was wielding the
>editorial pen at a key point in the HTML4 process?  :)

I suspect the problem here was that HTML 4.01 was trying to fix something
that was not in its purview to fix; namely the poor suitability of
ISO-8859-1 as a default for many web documents.

It is highly unfortunate IMO that they chose to do this by overriding HTTP
instead of adding an additional requirement that HTML 4.01 served over HTTP
must explicitly set a character encoding; or by simply punting the issue
back to where it belongs, namely the HTTP specification.

>>]]]-- http://www.w3.org/TR/html401/charset.html#h-5.2.2
>
>Curious.  But still, this page _used to validate_ just fine using the
>W3C validator.  Issue a warning if you must, but I can't see a way an
>XHTML validator should count this as an error.

It doesn't.

It's telling you that it can't find the character encoding of the document
and that without that information it is impossible to Validate it. It's
stating its own inability to deal with the lack of information, not saying
anything about the validity or lack of it in the document itself.

Of course, there is the strong implication that a document that does not
explicitly specify its encoding is invalid and unparseable, but this is
wholly intentional given the state of character encoding issues.
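
To make the behaviour concrete, here is a rough sketch in Python of the
kind of logic described above. It is not the Validator's actual code; the
names find_charset and UnknownEncodingError are made up for illustration,
and the lookup order (HTTP header, then XML declaration, then META) is an
assumption. The point is only that there is no silent fallback: if nothing
states the encoding explicitly, the checker gives up rather than guessing.

```python
import re


class UnknownEncodingError(Exception):
    """Hypothetical error: no explicit character encoding was found."""


# charset parameter in an HTTP Content-Type header value
_CHARSET_PARAM = re.compile(
    r'charset\s*=\s*["\']?([A-Za-z0-9._:-]+)', re.IGNORECASE)
# encoding pseudo-attribute in an XML declaration at the start of the body
_XML_DECL = re.compile(
    rb'^\s*<\?xml[^>]*encoding\s*=\s*["\']([A-Za-z0-9._:-]+)["\']')
# charset inside a META element (http-equiv style)
_META = re.compile(
    rb'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9._:-]+)', re.IGNORECASE)


def find_charset(content_type, body):
    """Return the explicitly declared charset, lowercased.

    content_type: the HTTP Content-Type header value (str) or None.
    body: the raw document bytes.
    Raises UnknownEncodingError instead of assuming a default.
    """
    if content_type:
        match = _CHARSET_PARAM.search(content_type)
        if match:
            return match.group(1).lower()

    # Only look near the top of the document; assumes the declarations
    # are written in an ASCII-compatible encoding.
    head = body[:1024]
    for pattern in (_XML_DECL, _META):
        match = pattern.search(head)
        if match:
            return match.group(1).decode("ascii").lower()

    raise UnknownEncodingError(
        "no character encoding specified; cannot validate")
```

Note that the failure here says nothing about the document's markup. It
only reports that the checker lacks the information it needs to begin.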

-- 
We've gotten to a point where a human-readable,   human-editable text format
for structured data has become a complex nightmare where somebody can safely
say  "As many threads on xml-dev have shown, text-based processing of XML is
hazardous at best" and be perfectly valid in saying it.      -- Tom Bradford

Received on Wednesday, 4 December 2002 17:11:53 UTC