Re: Link checker: 404 error for http://www.debian.org/support

On Wed, 2004-02-18 at 14:38, era+gmane@iki.fi wrote:
> I can access the page <http://www.debian.org/support> just fine, but
> <http://validator.w3.org/checklink?uri=http%3A//www.debian.org/support>
> displays a 404 error for this page (and also others on the site
> www.debian.org, but by no means all of them).
> 
> I can only speculate about the reasons for this.
> 
>  0. Weird robots.txt on http://www.debian.org/robots.txt but I get the
>     impression that the link checker presently does not consult this
>     file. (Just checked on my own site. It doesn't. Scratch this.)

Right.  The link checker does not support the robots exclusion standard
(yet).

>  1. www.debian.org administrators are blocking access from the user
>     agent or the host validator.w3.org for whatever reason. If this is
>     the case, I'll be happy to bring this up with them.

This does not seem to be the case.

>  2. Problems with content negotiation.

Bingo.  http://www.debian.org/support seems to result in a 404 for all
requests which have "Accept-Language: *".  For example:

  $ nc www.debian.org 80
  HEAD /support HTTP/1.0
  Host: www.debian.org
  Accept-Language: *

  HTTP/1.1 404 Not Found
  [...]
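
For anyone who wants to repeat the probe without nc, here is a minimal
Python sketch of the same request.  The function names
(build_head_request, status_line) are made up for this illustration and
are not part of the link checker, and no particular response is
guaranteed, since the server configuration may well have changed.

```python
import socket
from typing import Optional


def build_head_request(host: str, path: str,
                       accept_language: Optional[str] = None) -> bytes:
    """Build the raw HTTP/1.0 HEAD request shown in the transcript above."""
    lines = [f"HEAD {path} HTTP/1.0", f"Host: {host}"]
    if accept_language is not None:
        # Omit the header entirely when no value is given.
        lines.append(f"Accept-Language: {accept_language}")
    # A blank line terminates the request headers.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def status_line(host: str, path: str,
                accept_language: Optional[str] = None,
                port: int = 80, timeout: float = 10.0) -> str:
    """Send the request and return only the first line of the response."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_head_request(host, path, accept_language))
        response = sock.makefile("rb").readline()
    return response.decode("iso-8859-1").strip()
```

Comparing status_line("www.debian.org", "/support", "*") with
status_line("www.debian.org", "/support") contrasts the two cases
discussed here: the header set to "*" versus no header at all.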

By default, the link checker forwards the Accept-Language header sent by
the user's browser to the target links as-is.  If that header is not
present in the incoming request, the link checker sends
"Accept-Language: *" instead, as in the example above.  Sending the
header can also be disabled altogether; see the link checker "front
page".
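
The rule just described can be sketched roughly as follows.  This is an
illustration in Python, not the link checker's actual code; the
function name and the send_header flag are assumptions standing in for
whatever the real option is called.

```python
from typing import Mapping, Optional


def outgoing_accept_language(incoming_headers: Mapping[str, str],
                             send_header: bool = True) -> Optional[str]:
    """Return the Accept-Language value to send to target links.

    None means "do not send the header at all".
    """
    if not send_header:
        # The user asked for the header to be suppressed.
        return None
    for name, value in incoming_headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == "accept-language":
            return value  # forwarded as-is
    # No header in the incoming request: fall back to the wildcard.
    return "*"
```

With a browser that sends "en, sv" the target sees "en, sv"; with a
client that sends nothing, the target sees "*", which is the case that
triggers the 404 here.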

So, the server-side content negotiation configuration in effect for
http://www.debian.org/support seems to be broken.  As far as I can see,
"Accept-Language: *" should be treated the same way as a request with
no Accept-Language header at all (which works for the above URI).
RFC 2616 (section 12.2) also notes that instead of returning a 404, the
server could respond with a 300 or 406 to allow falling back to
agent-driven negotiation.
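
To make the expected behaviour concrete, here is a simplified Python
sketch of server-side language selection in which "*" and a missing
header both yield the default variant, and an unsatisfiable request
yields 406 rather than 404.  It is an assumption-laden illustration,
not how www.debian.org or any particular server implements negotiation:
the function names are invented, matching is by simple prefix, and the
"most specific range wins" rule is not implemented.

```python
from typing import List, Optional, Sequence, Tuple


def parse_accept_language(header: Optional[str]) -> List[Tuple[str, float]]:
    """Parse the header into (range, q) pairs, highest q first."""
    if header is None or not header.strip():
        # A missing header means every language is equally acceptable.
        return [("*", 1.0)]
    ranges = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        tag = parts[0].lower()
        if not tag:
            continue
        q = 1.0
        for param in parts[1:]:
            if param.lower().startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        ranges.append((tag, q))
    ranges.sort(key=lambda r: r[1], reverse=True)  # stable sort keeps order
    return ranges


def choose_variant(header: Optional[str], available: Sequence[str],
                   default: str) -> Tuple[int, Optional[str]]:
    """Return (status, language); default is assumed to be in available."""
    ranges = parse_accept_language(header)
    rejected = {tag for tag, q in ranges if q <= 0 and tag != "*"}

    def matches(lang: str, tag: str) -> bool:
        lang = lang.lower()
        return lang == tag or lang.startswith(tag + "-")

    for tag, q in ranges:
        if q <= 0:
            continue
        if tag == "*":
            # Wildcard: try the default first, then everything else.
            candidates = [default] + [l for l in available if l != default]
        else:
            candidates = [l for l in available if matches(l, tag)]
        for lang in candidates:
            if not any(matches(lang, r) for r in rejected):
                return 200, lang
    return 406, None
```

Under this sketch, choose_variant("*", ["en", "sv", "fi"], "en") and
choose_variant(None, ["en", "sv", "fi"], "en") give the same answer,
which is the behaviour argued for above.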

>     (When viewing these pages directly I get them in English. If I had
>     to fall back to some other language I would prefer Swedish, but my
>     reverse DNS obviously indicates that I am in Finland. Whoever is
>     responsible for conjecturing language preference from [apparent]
>     geographical location should be flogged.)

Yup.  But decent browsers let you configure language preferences as
well; you could set yours to "en, sv", and a sane service should then
prefer that over any DNS heuristics.

Received on Wednesday, 18 February 2004 13:44:12 UTC