Checking HTTP Headers

Following the discussion at
http://lists.w3.org/Archives/Public/www-validator/2004Aug/0241.html
here are some thoughts on HTTP headers.  Dealing with all of these
will bring the validator ahead of where Valet currently stands :-)


(0) Redirects

We should report any HTTP redirections.
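
To make that concrete, here is a rough Python sketch of the reporting
step.  It assumes the fetcher has already recorded each hop as a
(status, Location) pair; the function name and that data shape are
made up for illustration and are not existing validator code.

```python
from urllib.parse import urljoin

REDIRECT_CODES = {301, 302, 303, 307, 308}

def describe_redirects(start_url, hops):
    """Return (final_url, messages) for a recorded redirect chain.

    hops is a list of (status, location) pairs in the order received.
    """
    messages = []
    current = start_url
    for status, location in hops:
        if status not in REDIRECT_CODES or not location:
            break
        # Location may be relative, so resolve it against the current URL.
        target = urljoin(current, location)
        messages.append(f"{status}: {current} -> {target}")
        current = target
    return current, messages
```

The final URL returned here is also the one later checks (such as
Content-Location) would want to compare against.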

(1) Content-Type

We already check content-type to determine whether to validate
a document.  We could also report when we determine the charset
from the HTTP header, saving potential confusion when someone
thinks they've set it in the document.

More generally, we should report exactly where we get the charset from:
  * from HTTP headers
  * from XML rules (BOM or xmldecl)
  * from HTML rules (<meta> hack)

We could check all three, and warn if a conflict is found.
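
A minimal Python sketch of checking all three sources and flagging a
conflict might look like the following.  The regular expressions are
deliberately simplistic, only the first 1024 bytes are inspected, and
UTF-32 BOMs are ignored; all names here are hypothetical.

```python
import codecs
import re

# Illustrative patterns only; a real implementation would follow the
# XML and HTML specifications more closely.
XMLDECL = re.compile(rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)')
META = re.compile(rb'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9._-]+)', re.I)
HTTP = re.compile(r'charset\s*=\s*"?([^\s;"]+)', re.I)
BOMS = [(codecs.BOM_UTF8, "utf-8"),
        (codecs.BOM_UTF16_BE, "utf-16"),
        (codecs.BOM_UTF16_LE, "utf-16")]

def charset_sources(content_type, body):
    """Map each source that declares a charset to the label it gives."""
    head = body[:1024]
    found = {}
    m = HTTP.search(content_type or "")
    if m:
        found["HTTP header"] = m.group(1)
    for bom, name in BOMS:
        if head.startswith(bom):
            found["BOM"] = name
            break
    m = XMLDECL.search(head)
    if m:
        found["XML declaration"] = m.group(1).decode("ascii")
    m = META.search(head)
    if m:
        found["HTML meta"] = m.group(1).decode("ascii")
    return found

def normalise(label):
    """Fold aliases such as latin1 and ISO-8859-1 to one name."""
    try:
        return codecs.lookup(label).name
    except LookupError:
        return label.lower()

def charset_conflict(found):
    """True if the sources disagree after alias folding."""
    return len({normalise(v) for v in found.values()}) > 1
```

Reporting the keys of the returned dict tells the user where the
charset came from; charset_conflict() decides whether to warn.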

(2) Content-Length

We should warn if a malformed or zero content-length is encountered,
or if the content-length differs from the length of the document
validated.
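
As a sketch, the three conditions could be checked like this in
Python.  The function name is hypothetical, and it assumes the body
is compared as received, before any Content-Encoding is undone.

```python
import re

def check_content_length(header_value, body):
    """Return a list of warnings about the Content-Length header.

    header_value is the raw header string (or None if absent);
    body is the entity body as bytes, exactly as received.
    """
    if header_value is None:
        return []
    value = header_value.strip()
    if not re.fullmatch(r"[0-9]+", value):
        return [f"Malformed Content-Length: {header_value!r}"]
    declared = int(value)
    warnings = []
    if declared == 0:
        warnings.append("Content-Length is zero")
    if declared != len(body):
        warnings.append(
            f"Content-Length is {declared} but {len(body)} bytes were received"
        )
    return warnings
```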

(3) Content-Location

If the content-location is set and differs from what we requested
(after redirections, if applicable), then report it.
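
One way to sketch the comparison in Python, assuming we already know
the final URL after redirects.  Content-Location may be relative, so
it is resolved first; ignoring fragments is my own assumption, and
the function name is illustrative.

```python
from urllib.parse import urldefrag, urljoin

def content_location_note(final_url, content_location):
    """Return a message if Content-Location differs from final_url."""
    if not content_location:
        return None
    # Resolve a relative Content-Location against the URL we fetched.
    resolved = urljoin(final_url, content_location)
    if urldefrag(resolved).url == urldefrag(final_url).url:
        return None
    return (f"Content-Location ({resolved}) differs from "
            f"the requested URL ({final_url})")
```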


(4) Vary

If there is a Vary header, we should warn that the document is marked
as negotiated:

Warning:
This document is negotiated and may vary according to browser preferences
(such as the reader's language).  The response indicated the following
values of the negotiated headers:
  Content-Language: en
  Content-Type: text/html;charset=iso-8859-1
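
The indented lines of that warning could be generated along these
lines.  This is only a sketch: the mapping from request headers named
in Vary to the response headers that carry the negotiated value is my
assumption, and the function name is hypothetical.

```python
# Assumed mapping from request headers named in Vary to the response
# headers that carry the negotiated value.
NEGOTIATED = {
    "accept": "Content-Type",
    "accept-charset": "Content-Type",
    "accept-language": "Content-Language",
    "accept-encoding": "Content-Encoding",
}

def negotiated_values(headers):
    """headers: dict of response headers keyed by lower-cased name.

    Returns the indented "Name: value" lines for the warning, in the
    order the dimensions appear in Vary, without duplicates.
    """
    lines = []
    for name in headers.get("vary", "").split(","):
        response_name = NEGOTIATED.get(name.strip().lower())
        if response_name is None:
            continue
        value = headers.get(response_name.lower())
        line = f"  {response_name}: {value}"
        if value is not None and line not in lines:
            lines.append(line)
    return lines
```

An empty result with a Vary header present would still justify the
first sentence of the warning; it just means there are no values to
list.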

(5) Last-Modified

We might consider reporting Last-Modified headers.  This could be
of use to users who are struggling with their publishing software.

(6) Proxy headers

It might be of interest if a document has come through a proxy,
particularly a content-transforming or caching one.  We should
report if there's a "Via" response header, and in that case list
any Warning headers received.
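
A sketch of that rule in Python, assuming the response headers are
available as a list of (name, value) pairs so that repeated headers
are preserved.  The function name is made up for illustration.

```python
def proxy_report(headers):
    """Return report lines for Via and, if Via is present, Warning.

    headers is a list of (name, value) pairs as received.
    """
    via = [value for name, value in headers if name.lower() == "via"]
    if not via:
        # No Via header: nothing to report, even if Warning is set.
        return []
    report = [f"Via: {value}" for value in via]
    report += [f"Warning: {value}"
               for name, value in headers if name.lower() == "warning"]
    return report
```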

Question: should we stop proxies transforming documents to be validated,
by setting "Cache-Control: no-transform" in the validator's request
headers?
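
If the answer turns out to be yes, setting the header is a one-liner
in most HTTP clients.  Purely as an illustration, using Python's
urllib rather than whatever the validator itself uses (the helper
name is hypothetical):

```python
import urllib.request

def build_request(url):
    """Build a GET request that asks proxies not to transform the body."""
    return urllib.request.Request(
        url,
        headers={"Cache-Control": "no-transform"},
    )

# The request is only constructed here; nothing is sent until it is
# passed to urllib.request.urlopen().
req = build_request("http://example.org/")
```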

-- 
Nick Kew

Received on Sunday, 5 September 2004 16:58:40 UTC