4xx responses for bad query strings? from NESTING, DAVID M (SBCSI) on 2004-08-02 (www-talk@w3.org from July to August 2004)

From: NESTING, DAVID M (SBCSI) <dn3723@sbc.com>
Date: Mon, 2 Aug 2004 17:11:21 -0500
To: <www-talk@w3.org>
Message-ID: <D8A36B741FD7BF45ADE6DA228E05B1960185712E@mostls1msgusr06.itservices.sbc.com>

I'm trying to determine a proper way to indicate that a requested piece of information was not found, where the information is keyed not just on the URI components identifying the HTTP resource, but on the query string as well.

If I have a content display resource at, say, http://example.com/path/display, and that resource generates a different page depending upon the query string*, I might have URLs like:

http://example.com/path/display?page1

http://example.com/path/display?page2

If, however, one of those "logical" pages doesn't exist, with what HTTP response code should this resource respond?

My first thought was to use a 404 response, but it can be argued pretty convincingly that this response code indicates the HTTP resource itself (the "display" resource) is missing, when it's not. It's just that the *query* to that resource failed to return any content.

My second thought was to use a 403, since that could be interpreted as just a generic refusal to handle the request. We could put "not found" within the response body. This is a lot better, because there's no risk of user agent implementations (or users) thinking that the resource itself has gone MIA.

I don't think a 500-series error would be appropriate, because that implies a problem with the server, not with the request.

It could also be argued that the resource itself is operating correctly and the request was fine and is getting a valid response (of "no such content"), therefore it should use a 200 response code. The problem with this, though, is that search engines will end up indexing it as though it were legitimate, which is not desirable**.
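To make the options concrete, here is a minimal Python sketch of the situation, not a recommendation of any one answer. The names PAGES, CANDIDATES, and display() are hypothetical, as are the page contents; the status code used for a missing logical page is left as a parameter, since that choice is exactly what is in question.

```python
from urllib.parse import urlsplit

# Hypothetical content store: the "logical" pages the display resource knows.
PAGES = {
    "page1": "Contents of page 1",
    "page2": "Contents of page 2",
}

# Candidate status lines for a missing logical page, as weighed above.
CANDIDATES = {
    404: "404 Not Found",
    403: "403 Forbidden",
    200: "200 OK",
}


def display(url, missing_status=404):
    """Return (status_line, body) for a request to the display resource."""
    query = urlsplit(url).query  # "page1" for .../path/display?page1
    if query in PAGES:
        return "200 OK", PAGES[query]
    # The resource itself exists and ran correctly; only the content keyed
    # by the query string is absent. Which status fits is the open question.
    return CANDIDATES[missing_status], "not found"
```

Calling display("http://example.com/path/display?page3", 403) yields a 403 status line with "not found" in the body, which is the second option described above; passing 404 or 200 gives the other two.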

I'm a little curious to know if there is a recommended practice here. Many HTTP response codes tie themselves to the presence, absence, or abilities of the HTTP *resource* itself, without discussing resources that may change behavior based upon a query string.

Thanks for your help.

David Nesting

[*] - IMO, a proper solution would be to utilize some sort of URI translation to make the URI look more like /path/display/page1 or /path/page1, but I don't have that luxury here.
[**] - The use of "robots exclusion" information is not acceptable, because it either requires knowing in advance what query strings are valid and what are not (for use in the robots.txt file), or assumes the responses are HTML.

Received on Monday, 2 August 2004 18:13:24 UTC