Re: w3c and apache team from Martin Duerst on 2002-08-14 (www-validator@w3.org from August 2002)

From: Martin Duerst <duerst@w3.org>
Date: Wed, 14 Aug 2002 09:20:16 +0900
To: carlo@linux.it, www-validator@w3.org
Message-Id: <4.2.0.58.J.20020814090135.02803538@localhost>

Hello Carlo,

At 23:11 02/08/13 +0200, Carlo Perassi wrote:

>As I explain to the Apache developers
>(
>see
>http://marc.theaimsgroup.com/?l=apache-httpd-dev&m=102918549709592&w=2
>and
>http://marc.theaimsgroup.com/?l=apache-httpd-dev&m=102925143132691&w=2
>)
>it's trivial to change the Apache C code to generate W3C pages but they have
>technical reasons which don't permit to define a meta tag with charset
>definition... so some minutes ago, on the Apache CVS tree it's appeared a fix
>for a header problem, and as Greg Ames <gregames@apache.org> said
>"I would hope that if (the Validator) saw a good http Content-Type header,
>it wouldn't need the stuff in the html meta line."

Yes, this is true. A Content-Type with a charset parameter is
of course sufficient.
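
As a rough illustration of that point, here is a minimal Python sketch
of how a checker might read the encoding from the HTTP header, so that
a meta declaration only matters when the header carries no charset
parameter. The function name charset_from_content_type is hypothetical
and is not taken from the Validator's code.

```python
from email.message import Message


def charset_from_content_type(header_value):
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() returns None when no charset parameter is present.
    return msg.get_content_charset()


if __name__ == "__main__":
    print(charset_from_content_type("text/html; charset=ISO-8859-1"))
    print(charset_from_content_type("text/html"))
```

If the function returns None, a checker would have to fall back to
other sources such as the meta element.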

But there may be an additional complication: Some 404s may be in
encodings other than iso-8859-1. In that case, a header saying
iso-8859-1 would be wrong. As long as the header is only sent with
the built-in 'last resort' error message that doesn't change, it's
okay. But if it's tagged onto any arbitrary error message, it's a
problem.

BTW, a related problem is the directive 'AddDefaultCharset'.
This adds a 'charset' parameter to *every* Content-Type that
doesn't already have one. This means that if you have some
gifs, they get served as Content-Type: image/gif; charset=foo.
This is of course quite useless.
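
To make the complaint concrete, here is a small Python sketch of the
narrower behaviour one might expect instead: a default charset is
added only to text/* types that don't already have one. This is an
illustration only, not how Apache implements AddDefaultCharset; the
function name with_default_charset and the default value are
assumptions.

```python
def with_default_charset(content_type, default="iso-8859-1"):
    """Append a default charset to text/* types that lack one (illustrative)."""
    parts = content_type.split(";")
    media_type = parts[0].strip().lower()
    has_charset = any(
        p.strip().lower().startswith("charset=") for p in parts[1:]
    )
    if media_type.startswith("text/") and not has_charset:
        return f"{content_type}; charset={default}"
    # Non-text types and already-labelled types are left untouched.
    return content_type


if __name__ == "__main__":
    print(with_default_charset("text/html"))
    print(with_default_charset("image/gif"))
```

With this rule, image/gif stays image/gif, while a bare text/html
gets the default charset appended.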

>Before trying the new Apache CVS code... I found a "problem": when your
>Validator found a "404" on the response header of the server, it doesn't
>parse the HTML provided anymore.

>My question is: why don't you drive the Validator to parse the html code, even
>when the return code is different from 200?
>If you do like this, Apache team will be able to check if the fix on the code
>which produces the header of the response is enough to pass the test.

I think in general, it's a bad idea to validate responses whose
return code is anything other than 200.
Assume the following scenario: Somebody wants to validate a bunch
of pages. Somehow they get the URIs a bit wrong. They get a bunch
of 404s, which all validate correctly. They think everything is
fine, while it's absolutely not.

But I think having an option 'validate error messages' is a good idea,
because we want to be able to validate all html.
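
A minimal Python sketch of how such an option could work: by default
only a 200 response is validated, and other responses are validated
only when the user explicitly asks for it. The names should_validate
and validate_error_pages are hypothetical and do not describe an
existing Validator option.

```python
def should_validate(status_code, validate_error_pages=False):
    """Decide whether the body of a fetched response should be validated."""
    if status_code == 200:
        return True
    # Anything else (404 and so on) is validated only on explicit request,
    # so a mistyped URI cannot silently 'pass' via a valid error page.
    return validate_error_pages


if __name__ == "__main__":
    print(should_validate(200))
    print(should_validate(404))
    print(should_validate(404, validate_error_pages=True))
```

Keeping the option off by default preserves the protection against
the wrong-URI scenario above, while still letting server developers
check their error pages.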

Regards,    Martin.

Received on Tuesday, 13 August 2002 22:30:53 UTC