Re: 4xx responses for bad query strings?

David Nesting ("NESTING, DAVID M (SBCSI)") wrote to  
<mailto:www-talk@w3.org> on 2 August 2004 in "4xx responses for bad  
query strings?"  
(<mid:D8A36B741FD7BF45ADE6DA228E05B1960185712E@mostls1msgusr06.itservices.sbc.com>):

> I'm trying to determine a proper way to indicate a requested piece of  
> information was not found, where the information is keyed not just on  
> the URI components identifying the HTTP resource, but upon the query  
> string as well.

First you need to eliminate the misunderstanding over "HTTP resource".  
As far as Web architecture is concerned, URIs like
  "http://example.com/path/display"
   and
  "http://example.com/path/display?page"
are nearly opaque and their respective resources have no inherent  
relationship. The second URI, with the query string, identifies a  
first-class HTTP resource.
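
To make the point concrete, here is a minimal sketch in Python using the  
standard urllib.parse module. The "resources" table and its values are  
hypothetical, invented only to show that a lookup keyed on the whole URI  
treats the two identifiers as unrelated entries.

```python
from urllib.parse import urlsplit

without_query = "http://example.com/path/display"
with_query = "http://example.com/path/display?page"

# The two identifiers share a path component...
same_path = urlsplit(without_query).path == urlsplit(with_query).path

# ...but compared as whole URIs they are different identifiers.
same_uri = without_query == with_query

# Hypothetical resource table keyed on the full URI, query included.
# Nothing in the table ties one entry to the other.
resources = {
    without_query: "resource A",
    with_query: "resource B",
}
```

Any relationship between the two entries is something the server author  
chooses to build, not something the URI syntax implies.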

> If, however, one of those "logical" pages doesn't exist, with what  
> HTTP response code should this resource respond?

In short, one of
  200 OK,
  204 No Content,
   or
  404 Not Found.

The choice depends on what you want to convey.

> My first thought was to use a 404 response, but it can be argued  
> pretty convincingly that this response code indicates the HTTP  
> resource itself (the "display" resource) is missing, when it's not.

You'll find at least this humble correspondent resistant to such  
argument. For more authoritative answers, direct your question to the  
World Wide Web Consortium's Technical Architecture Group (TAG), the  
authors of the HTTP/1.1 specification in RFC 2616, the authors of the  
URI specification in RFC 2396, or the contributors to the revision of  
RFC 2396. (Granted, there are plenty who belong to more than one of  
those groups.)

> It's just the *query* to that resource failed to return any content.

But why and how did it fail to return content? Details about your  
situation would help.

Imagine an English-language dictionary service that allows queries of  
the form http://example.info/lexicon?term=<input-term> . An HTTP  
request with a Request-URI of "http://example.info/lexicon?term=ant"  
should yield "200 OK" with definitions of the word "ant" in the  
response entity body.

If a word were recognized as belonging to the English language but had  
no definition available from the lexicon (just play along for the sake  
of example), a response of "204 No Content" would be best. A 204  
response signals to crawlers as well as to human users that there is no  
definition available there.

If the input term did not form a word recognized as belonging to the  
English language, a response of "404 Not Found" would be best.
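
The three cases above could be sketched as follows. This is only an  
illustration under assumptions of mine: the word set, the definitions  
table, and the lookup function are hypothetical, and a real service would  
sit behind an actual HTTP server rather than a bare function.

```python
from http import HTTPStatus

# Hypothetical data: words the service recognizes as English, and the
# subset of those for which the lexicon actually holds a definition.
RECOGNIZED_WORDS = {"ant", "bee", "zyzzyva"}
DEFINITIONS = {
    "ant": "a small social insect",
    "bee": "a flying insect that gathers nectar",
}


def lookup(term):
    """Return (status, body) for a request to /lexicon?term=<term>."""
    word = term.lower()
    if word not in RECOGNIZED_WORDS:
        # Not a recognized English word: nothing is identified by this URI.
        return HTTPStatus.NOT_FOUND, None
    definition = DEFINITIONS.get(word)
    if definition is None:
        # A recognized word, but the lexicon has nothing to say about it.
        return HTTPStatus.NO_CONTENT, None
    return HTTPStatus.OK, definition
```

With this table, lookup("ant") yields 200 with a definition,  
lookup("zyzzyva") yields 204 with no body, and lookup("qwxz") yields 404.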

> My second thought was to use a 403, since that could be interpreted as  
> just a generic refusal to handle the request.

This seems at the margins of acceptable practice, but I'm having  
difficulty explaining my position.

> I don't think a 500-series error would be appropriate, because that  
> implies a problem with the server, not with the request.

I agree.

> It could also be argued that the resource itself is operating  
> correctly and the request was fine and is getting a valid response (of  
> "no such content"), therefore it should use a 200 response code.  The  
> problem with this, though, is that search engines will end up indexing  
> it as though it were legitimate, which is not desirable**.

So would "204 No Content" suffice?

> I'm a little curious to know if there is a recommended practice here.

I don't know of any, but what I don't know could fill an encyclopedia.

> Many HTTP response codes tie themselves with the presence, absence or  
> abilities of the HTTP *resource* itself, without discussing resources  
> that may change behavior based upon a query string.

I refer to my opening position in this message that the URI with a  
query string identifies a separate and full-fledged resource.

-- 
Etan Wexler.

Received on Wednesday, 4 August 2004 02:38:26 UTC