- From: Roy Fielding via Datatracker <noreply@ietf.org>
- Date: Fri, 20 Jun 2025 15:08:32 -0700
- To: <ietf-http-wg@w3.org>
- Cc: draft-ietf-httpbis-safe-method-w-body.all@ietf.org, ietf-http-wg@w3.org
Document: draft-ietf-httpbis-safe-method-w-body
Title: The HTTP QUERY Method
Reviewer: Roy Fielding
Review result: Not Ready

HTTPDIR review of draft-ietf-httpbis-safe-method-w-body-11

I have mixed feelings about this draft. I know that the editors have worked hard to address everyone's concerns and reflect the very wide set of potential use cases. Aside from some vagueness, the draft is well written and would normally meet the needs of a standards track document.

However, the technology being described fails to meet the basic architectural requirements for the Web and HTTP. "All important resources are identified by a URI" is the primary design principle of the Web. The entire system depends on it for linkability and scale. Likewise, there is no opportunity to just "move the request content into the cache key" and call that cacheable. That's a security vulnerability, not a feature.

> Abstract
>
> This specification defines a new HTTP method, QUERY, as a safe,
> idempotent request method that can carry request content.

The above is too generic. It should say why it is being defined:

   This specification defines the QUERY method for HTTP. A QUERY requests that the request target process the enclosed content in a safe/idempotent manner and then respond with the result of that processing. This is similar to POST requests but can be automatically repeated or restarted without concern for partial state changes. The kind of processing desired to perform the QUERY is indicated by the Content-Type header field value enclosed with the request.

> 1. Introduction
>
> This specification defines the HTTP QUERY request method as a means
> of making a safe, idempotent request (Section 9.2 of [HTTP]) that
> contains content.

"that contains content" --> "containing content that describes how the request is to be processed by the target resource"

> Most often, this is desirable when the data conveyed in a request is
> too voluminous to be encoded into the request's URI.
> For example, this is a common query pattern:
>
> GET /feed?q=foo&limit=10&sort=-published HTTP/1.1
> Host: example.org
>
> However, for a query with parameters that are complex or large,
> encoding it in the request URI may not be the best option because
>
> * often size limits are not known ahead of time because a request
>   can pass through many uncoordinated systems (but note that
>   Section 4.1 of [HTTP] recommends senders and recipients to support
>   at least 8000 octets),
>
> * expressing certain kinds of data in the target URI is inefficient
>   because of the overhead of encoding that data into a valid URI,
>   and
>
> * encoding queries directly into the request URI effectively casts
>   every possible combination of query inputs as distinct resources.
>
> As an alternative to using GET, many implementations make use of the
> HTTP POST method to perform queries, as illustrated in the example
> below. In this case, the input to the query operation is passed as
> the request content as opposed to using the request URI's query
> component.
>
> A typical use of HTTP POST for requesting a query is:
>
> POST /feed HTTP/1.1
> Host: example.org
> Content-Type: application/x-www-form-urlencoded
>
> q=foo&limit=10&sort=-published
>
> This variation, however, suffers from the same basic limitation as
> GET in that it is not readily apparent -- absent specific knowledge
> of the resource and server to which the request is being sent -- that
> a safe, idempotent query is being performed.
>
> The QUERY method provides a solution that spans the gap between the
> use of GET and POST, with the example above being expressed as:
>
> QUERY /feed HTTP/1.1
> Host: example.org
> Content-Type: application/x-www-form-urlencoded
>
> q=foo&limit=10&sort=-published
>
> As with POST, the input to the query operation is passed as the
> content of the request rather than as part of the request URI.
> Unlike POST, however, the method is explicitly safe and idempotent,
> allowing functions like caching and automatic retries to operate.

I think it would be more useful to list the number of different reasons for needing to use QUERY instead of GET (length, complexity, query privacy) first, and then why to use QUERY instead of POST (safe and idempotent). The intro seems to lump them all together.

> Summarizing:
>
> +============+============+==================+==================+
> |            | GET        | QUERY            | POST             |
> +============+============+==================+==================+
> | Safe       | yes        | yes              | potentially no   |
> +------------+------------+------------------+------------------+
> | Idempotent | yes        | yes              | potentially no   |
> +------------+------------+------------------+------------------+
> | Cacheable  | yes        | yes              | yes, but only    |
> |            |            |                  | for future GET   |
> |            |            |                  | or HEAD requests |
> +------------+------------+------------------+------------------+
> | Content    | "no        | expected         | expected         |
> | (body)     | defined    | (semantics per   | (semantics per   |
> |            | semantics" | target resource) | target resource) |
> +------------+------------+------------------+------------------+
>
> Table 1: Summary of relevant method properties

A QUERY request is not cacheable because the request content is not available at the time cache decisions are made.

> 2. QUERY
>
> The QUERY method is used to initiate a server-side query. Unlike the
> HTTP GET method, which requests that a server return a representation
> of the resource identified by the target URI (as defined by
> Section 7.1 of [HTTP]), the QUERY method is used to ask the server to
> perform a query operation (described by the request content) over
> some set of data at the resource. The content returned in response
> to a QUERY cannot be assumed to be a representation of the resource
> identified by the target URI.

Let's say there are two resources on a server, "/a" and "/b".
Is the same QUERY sent to "/a" going to mean the same to "/b"? Are we expecting those two queries to result in the same response? If not, the QUERY is being sent to the targeted resource, which is doing all of those things above that are being described as "the server".

"HTTP GET" --> "GET"
"requests that a server return a representation" --> "requests a representation"
"to ask the server to perform a query operation" --> "to ask the target resource to perform a query operation"
"over some set of data at the resource" --> "within the scope of that target resource (as defined by its origin)."

"target URI" should be cited somewhere else [this looks like a citation for GET].

The last sentence "cannot be assumed" isn't useful. What is needed is a definition of what is returned for each kind of response. I believe that should at least include:

   A 200 (OK) response to a QUERY request indicates that the query was successfully processed and the results of that processing are enclosed as the response content.

   A 303 (See Other) response to a QUERY request indicates that the query is equivalent to the resource identified in the response's Location header field. The client can obtain a result of the query by performing a GET request on the referenced Location, which will then be cacheable and reusable like any other GET request.

[you may want to describe other common codes as well if they have a particular meaning unique to QUERY. I particularly suggest highlighting the difference for 200, 204, 206, 400, 406, 415, and 422.]

> The content of the request and it's media type define the query.
> Implementations MAY use a request content of any media type with the
> QUERY method, provided that it has appropriate query semantics.

This is ambiguous. For security filtering reasons, the request header field Content-Type MUST be consistent with the request content. IOW, if they differ the server MUST fail (400) the request.
As such, the media type defines the query processing and the request content MUST adhere to the media type requirements to be successfully processed.

> As for all HTTP methods in general, the target URI's query part takes
> part in identifying the resource being queried and therefore is not
> part of the actual query. Whether and how the URI's query part
> directly affects the result of the query is implementation specific
> and out of scope for this specification.

Strike "and therefore is not part of the actual query" because it contradicts the next sentence. I would say it is resource-specific, since the implementation doesn't limit such things.

> QUERY requests are both safe and idempotent with regard to the
> resource identified by the request URI. That is, QUERY requests do
> not alter the state of the identified resource. However, while
> processing a QUERY request, a server can be expected to allocate
> computing and memory resources or even create additional HTTP
> resources through which the response can be retrieved.

No, no, no. Do not even think about redefining https://www.rfc-editor.org/rfc/rfc9110.html#name-safe-methods here. Just reference it. Both the "That is" and "However" are wrong in very subtle ways. If you want to clarify a very specific point (like safe doesn't mean a server can't create a new resource if it wants to) then do that in a separate sentence after referencing the definition of "safe".

This should also mention the good thing about idempotent being that a client can retry or repeat the request after connection failure.

> A successful response to a QUERY request is expected to provide some
> indication as to the final disposition of the operation. For
> instance, a successful query that yields no results can be
> represented by a 204 (No Content, Section 15.3.5 of [HTTP]) response.
> If the response includes content, it is expected to describe the
> results of the operation.

I am not sure what this is attempting to clarify.
Maybe that an empty set of results is a 200, not a 404? But doesn't that depend on the query itself?

> 2.1. Content-Location and Location Fields
>
> A successful response (2xx, Section 15.3 of [HTTP]) can include a
> Content-Location header field containing an identifier for a resource
> corresponding to the results of the operation; see Section 8.7 of
> [HTTP] for details. This represents a claim from the server that a
> client can send a GET request for the indicated URI to retrieve the
> results of the query operation just performed. The indicated
> resource might be temporary.

That paragraph needs to be above in the description of a 200 response to QUERY, or at least referred to as such, since it is just repeating the definition of Content-Location.

> A server can create or locate a resource that identifies the query
> operation for future use. If the server does so, the URI of the
> resource can be included in the Location header field of the 2xx
> response (see Section 10.2.2 of [HTTP]). This represents a claim
> that a client can send a GET request to the indicated URI to repeat
> the query operation just performed without resending the query
> content. This resource might be temporary; if a future request
> fails, the client can retry using the original QUERY resource and the
> previously submitted content.

Again, this is part of the protocol definition and belongs above as generally applying for all 2xx responses (even 204).

> 2.2. Redirection
>
> In some cases, the server may choose to respond indirectly to the
> QUERY request by redirecting the user agent to a different URI (see
> Section 15.4 of [HTTP]). The semantics of the redirect response do
> not differ from other methods.
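As an illustration of what the redirect codes discussed in Section 2.2 mean for a user agent, here is a minimal Python sketch. It assumes the reading argued in this review: on a 303 the user agent completes the QUERY by sending a GET (without the query content) to the Location, while on a 307 or 308 it repeats the same QUERY, with the same content, at the new URI. `Request`, `next_request`, and `CONTENT_FIELDS` are hypothetical names, not part of any real client library. Other status codes, redirect limits, and security checks are deliberately out of scope.

```python
from dataclasses import dataclass
from typing import Optional, Tuple
from urllib.parse import urljoin

Header = Tuple[str, str]


@dataclass(frozen=True)
class Request:
    """Hypothetical minimal request model (not a real client library)."""
    method: str
    url: str
    headers: Tuple[Header, ...] = ()
    body: bytes = b""


# Fields that describe the enclosed content; they no longer apply once the
# follow-up request carries no content.
CONTENT_FIELDS = {"content-type", "content-length", "content-encoding"}


def next_request(original: Request, status: int,
                 location: str) -> Optional[Request]:
    """Return the request a user agent would send next after a redirect
    response to `original`, or None for status codes not handled here."""
    # Location may be a relative reference; resolve it against the request URI.
    target = urljoin(original.url, location)
    if status == 303:
        # 303 See Other: complete the original QUERY by retrieving the
        # referenced resource with GET; the query content is not resent.
        headers = tuple((name, value) for name, value in original.headers
                        if name.lower() not in CONTENT_FIELDS)
        return Request("GET", target, headers, b"")
    if status in (307, 308):
        # 307/308: repeat the same method with the same content at the new URI.
        return Request(original.method, target, original.headers, original.body)
    return None


if __name__ == "__main__":
    query = Request(
        "QUERY", "https://example.org/contacts",
        (("Content-Type", "application/x-www-form-urlencoded"),
         ("Accept", "application/json")),
        b"select=surname,givenname,email&limit=10",
    )
    print(next_request(query, 303, "/contacts/stored-queries/42"))
```

Because the GET issued after a 303 is an ordinary request on an identified resource, its response can be cached and reused like any other GET response. A 307/308 only moves the QUERY elsewhere.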
>
> For instance, a 303 (See Other, Section 15.4.4 of [HTTP]) response
> would indicate that the Location field identifies an alternate URI
> from which the results can be retrieved using a GET request (this use
> case is also covered by the use of the Location response field in a
> 2xx response).

No, that is incorrect. In a 303 response, the Location field identifies a replacement target resource that will perform the same query when it receives a GET request. To complete processing of the original QUERY, the user agent will need to perform a GET request on the resource referenced by Location. This allows the original query to be identified for reuse as a normal resource and for the results to be cached.

> On the other hand, response codes 307 (Temporary Redirect,
> Section 15.4.8 of [HTTP]) and 308 (Permanent Redirect, Section 15.4.9
> of [HTTP]) can be used to request the user agent to redo the QUERY
> request on the URI specified by the Location field.
>
> Various non-normative examples of successful QUERY responses are
> illustrated in Appendix A.
>
> 2.3. Conditional Requests
>
> A conditional QUERY requests that the selected representation (i.e.,
> the query results, after any content negotiation) be returned in the
> response only under the circumstances described by the conditional
> header field(s), as defined in Section 13 of [HTTP].

I think this is incorrect and requires more explanation and reference to 3.2 [HTTP]. You might even want to quote the last paragraph of 3.2 and specifically define that, for QUERY, the response content is influenced by content negotiation. I recommend a more extensive discussion of content negotiation along with the examples.

The conditional request mechanisms, however, are defined by the "selected representation" of GET semantics. Specifically, things like last-modified and etag comparisons are done before the query is processed, not after, since the condition says "do not proceed".
A resource that responds to QUERY is almost certain to also respond to GET (usually with an empty form/instructions). The existing conditional mechanisms will work on that empty form, which is probably not what is desired.

In theory, new condition fields could be defined that operate after the query has been processed, but that would be silly given the instructions for the query are already located within the request content and can include their own conditions for when to limit or what to exclude. Use the query's conditions instead of HTTP conditionals.

> 2.4. Caching
>
> The response to a QUERY method is cacheable; a cache MAY use it to
> satisfy subsequent QUERY requests as per Section 4 of
> [HTTP-CACHING]).

No, just no. A cache does not have access to the request content when making a hit/miss decision. Use the 303 response, as designed.

The reason why this is not allowed in HTTP is because routing decisions are based on the connection context, host, and entire target URI. A cache cannot know what parts may apply. The origin doesn't know either. The actual server recipient of a request containing query parameters might have been passed along a completely different internal routing path, with its own security filtering, from the same request with those parameters hidden within the request content. Allowing a cache to change the key by moving identifiers from the content would allow a generic resource to poison the cache for other, more specific resources.

[skipping other parts of draft that look fine]

> A.4. Content-Location, Location, and Indirect Responses
>
> As described in Section 2.1, the Content-Location and Location
> response fields in success responses (2xx, Section 15.3 of [HTTP])
> provide a way to identify alternate resources that will respond to
> GET requests, either for the received result of the request, or for
> future requests to perform the same operation. Going back to the
> example from Appendix A.1:

Why explain it that way?
These are great examples, but lumping them together in one sentence leaves the reader to guess which header field is for what purpose. These examples belong where the responses are defined.

>
> QUERY /contacts HTTP/1.1
> Host: example.org
> Content-Type: application/x-www-form-urlencoded
> Accept: application/json
>
> select=surname,givenname,email&limit=10&match=%22email=*@example.*%22
>
> Response:
>
> HTTP/1.1 200 OK
> Content-Type: application/json
> Content-Location: /contacts/stored-results/17
> Location: /contacts/stored-queries/42
> Last-Modified: Sat, 25 Aug 2012 23:34:45 GMT
> Date: Sun, 17 Nov 2024, 16:10:24 GMT
>
> [
>   { "surname": "Smith",
>     "givenname": "John",
>     "email": "smith@example.org" },
>   { "surname": "Jones",
>     "givenname": "Sally",
>     "email": "sally.jones@example.com" },
>   { "surname": "Dubois",
>     "givenname": "Camille",
>     "email": "camille.dubois@example.net" }
> ]
>
> A.4.1. Using Content-Location
>
> The Content-Location response field received above identifies a
> resource holding the result for the QUERY response it appeared on:
>
> GET /contacts/stored-results/17 HTTP/1.1
> Host: example.org
> Accept: application/json
>
> Response:
>
> HTTP/1.1 200 OK
> Last-Modified: Sat, 25 Aug 2012 23:34:45 GMT
> Date: Sun, 17 Nov 2024, 16:10:25 GMT
>
> [
>   { "surname": "Smith",
>     "givenname": "John",
>     "email": "smith@example.org" },
>   { "surname": "Jones",
>     "givenname": "Sally",
>     "email": "sally.jones@example.com" },
>   { "surname": "Dubois",
>     "givenname": "Camille",
>     "email": "camille.dubois@example.net" }
> ]
>
> A.4.2. Using Location
>
> The Location response field identifies a resource that will respond
> to GET with a fresh result for the QUERY response it appeared on.

It's not a "fresh result". It's a current result for the same process and parameters as the original QUERY. [For HTTP caching, "fresh" means it's okay to reuse the old result, not that the result is current.]
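The comment above turns on the difference between a stored result (Content-Location) and a stored query (Location). The following is a minimal Python sketch of that difference under stated assumptions. The class and method names, the glob-style match on email addresses that stands in for the query content, and the sequential identifiers are all hypothetical, and caching, validators, and expiry of the stored resources are ignored. It also follows the "generate or reuse" suggestion made later in this review by returning the same Location for an identical query.

```python
import fnmatch


class ContactsOrigin:
    """Hypothetical in-memory origin contrasting the two resources that a
    2xx response to QUERY can point at:
      - Content-Location: a stored copy of the results just returned;
      - Location: a stored query that is re-evaluated on every GET.
    """

    def __init__(self, contacts):
        self.contacts = list(contacts)
        self.stored_results = {}  # id -> snapshot of a result list
        self.stored_queries = {}  # id -> saved query parameters (a pattern)

    def _evaluate(self, pattern):
        # Run the query against the data as it is *now*.
        return [c for c in self.contacts
                if fnmatch.fnmatchcase(c["email"], pattern)]

    def query(self, pattern):
        """Handle a QUERY whose (simplified) content is an email glob."""
        results = self._evaluate(pattern)

        result_id = len(self.stored_results) + 1
        self.stored_results[result_id] = list(results)  # snapshot copy

        # "Generate or reuse": an identical query maps to the same resource.
        for query_id, saved in self.stored_queries.items():
            if saved == pattern:
                break
        else:
            query_id = len(self.stored_queries) + 1
            self.stored_queries[query_id] = pattern

        headers = {
            "Content-Location": f"/contacts/stored-results/{result_id}",
            "Location": f"/contacts/stored-queries/{query_id}",
        }
        return 200, headers, results

    def get(self, path):
        """Handle a GET on one of the stored resources."""
        collection, _, ident = path.rpartition("/")
        key = int(ident) if ident.isdigit() else None
        if collection == "/contacts/stored-results" and key in self.stored_results:
            # The result as originally returned, even if the data changed.
            return 200, self.stored_results[key]
        if collection == "/contacts/stored-queries" and key in self.stored_queries:
            # The current result of the same query, evaluated at GET time.
            return 200, self._evaluate(self.stored_queries[key])
        return 404, None


if __name__ == "__main__":
    origin = ContactsOrigin([
        {"surname": "Smith", "email": "smith@example.org"},
        {"surname": "Jones", "email": "sally.jones@example.com"},
        {"surname": "Dubois", "email": "camille.dubois@example.net"},
    ])
    _, headers, _ = origin.query("*@example.*")
    # One entry is removed after the QUERY was answered.
    origin.contacts = [c for c in origin.contacts if c["surname"] != "Jones"]
    print(len(origin.get(headers["Content-Location"])[1]))  # prints 3 (stored result)
    print(len(origin.get(headers["Location"])[1]))          # prints 2 (current result)
```

Whether either response may be reused from a cache is a separate question, governed by the usual freshness rules for GET.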
> GET /contacts/stored-queries/42 HTTP/1.1
> Host: example.org
> Accept: application/json
>
> In this example, one entry was removed at 2024-11-17T16:12:01Z (as
> indicated in the Last-Modified field), so the response only contains
> two entries:

Note that the text version isn't indenting examples, so it can be hard to discern when commentary is inserted in the middle. I would prefer that the protocol examples be placed in the protocol definition above, but that's editorial discretion. I agree that the longer query examples are better as an appendix.

> Note that there's no guarantee that the server will implement this
> resource indefinitely, so, after an error response, the client would
> need to redo the original QUERY request in order to obtain a new
> alternative location.

That is equally true for the QUERY request target, so why does it need to be said?

> A.4.3. Indirect Responses
>
> Servers can send "indirect" responses (Section 2.2) using the status
> code 303 (See Other, Section 15.4.4 of [HTTP]).
>
> Given the request at the beginning of Appendix A.4, a server might
> respond with:
>
> HTTP/1.1 303 See Other
> Content-Type: text/plain
> Date: Sun, 17 Nov 2024, 16:13:17 GMT
> Location: /contacts/stored-queries/42
>
> See stored query at "/contacts/stored-queries/42".
>
> This is similar to including Location on a direct response, except
> that no result for the query is returned. This allows the server to
> only generate an alternative resource. This resource could then be
> used as shown in Appendix A.4.2.

The protocol is so much clearer with this kind of example. It is small enough to be given where 303 is first mentioned.

"generate" --> "generate or reuse" (it may be a common query).

We have discussed most of this in the past. I don't understand why it wasn't corrected in the draft. Has the caching of QUERY (as a method) been implemented? Have such implementations detailed how they account and correct for cache poisoning?
How they use a cache key that requires the body to be read first? How they intend to secure this across protection boundaries and on different request paths? I am not seeing that in current practice.

What I see is sufficient justification for a QUERY method that is like GET with a body except it cannot be immediately cached, and like POST but with guarantees for safe and idempotent. That's enough to be useful. We should not be implying that this method is, in any way, suitable as a replacement for information retrieval of identified resources via GET.

...Roy T. Fielding, Senior Principal Scientist, Adobe
Received on Friday, 20 June 2025 22:08:36 UTC