Re: draft-ietf-httpbis-safe-method-w-body-11 early Httpdir review

On Fri, Jun 20, 2025 at 03:08:32PM -0700, Roy Fielding via Datatracker wrote:
> However, the technology being described fails to meet the
> basic architectural requirements for the Web and HTTP.
> "All important resources are identified by a URI" is the
> primary design principle of the Web. The entire system depends
> on it for linkability and scale.

Apps that use POST for queries already have this problem.  And the use
of the Location: response header in QUERY does provide a way to identify
the given query by a URI.  It's just that the URI returned in the
Location: will be like a shortened link: not one that a human could
write.

As a tongue-in-cheek design, consider a GET of a SQL statement where one
simply uses the SQL query as the URI local-part with all whitespace and
special characters percent-encoded...  That will always work, at least
if the query is small enough.
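To make the tongue-in-cheek design concrete, here is a minimal Python
sketch of the mapping.  It assumes a hypothetical endpoint
`https://db.example.org/a`, and it uses the 8000-octet figure that
RFC 9110 (Section 4.1) recommends as a minimum supported URI length as a
hard cutoff.  That cutoff is my assumption; real servers and
intermediaries may accept more or less.

```python
from urllib.parse import quote, unquote

# RFC 9110 recommends supporting URIs of at least 8000 octets; treating
# that figure as a hard cutoff is an assumption for illustration.
MAX_URI_OCTETS = 8000

def sql_to_uri(base: str, sql: str) -> str:
    """Encode a whole SQL statement as one percent-encoded path segment."""
    # safe="" also escapes "/", so the statement cannot add path segments.
    uri = base.rstrip("/") + "/" + quote(sql, safe="")
    if len(uri.encode("ascii")) > MAX_URI_OCTETS:
        raise ValueError("query too large for a GET; use a request body instead")
    return uri

def uri_to_sql(base: str, uri: str) -> str:
    """Server side: recover the SQL statement from the last path segment."""
    prefix = base.rstrip("/") + "/"
    if not uri.startswith(prefix):
        raise ValueError("URI is not under the expected base")
    return unquote(uri[len(prefix):])

BASE = "https://db.example.org/a"  # hypothetical database endpoint
sql = "SELECT surname, email FROM contacts WHERE email LIKE '%@example.%'"
uri = sql_to_uri(BASE, sql)
```

The "small enough" caveat is the only real limit: any statement round-trips,
but a long one trips the length check and has to go in a request body.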

> Likewise, there is no opportunity to just "move the request
> content into the cache key" and call that cacheable.
> That's a security vulnerability, not a feature.

Oh yes, and I called that out in the Hacker News and GitHub discussions.

> >    This specification defines the HTTP QUERY request method as a means
> >    of making a safe, idempotent request (Section 9.2 of [HTTP]) that
> >    contains content.
> 
> "that contains content" --> "containing content that describes
> how the request is to be processed by the target resource"

Or just "has a request body -- content that does not appear in the URI
local-part".

> >    Unlike POST, however, the method is explicitly safe and idempotent,
> >    allowing functions like caching and automatic retries to operate.
> 
> I think it would be more useful to list the number of different
> reasons for needing to use QUERY instead of GET (length, complexity,
> query privacy) first, and then why to use QUERY instead of POST
> (safe and idempotent). The intro seems to lump them all together.

+1

> A QUERY request is not cacheable because the request content is not
> available at the time cache decisions are made.

That is a subtlety that I missed in my commentary to the authors.
It seems like a pretty important one.

My own commentary was that query normalization is dangerous, infeasible,
and will never materialize.  Others pointed out other problems with
query normalization, especially @vidraj on GitHub.

> Let's say there are two resources on a server, "/a" and "/b".
> Is the same QUERY sent to "/a" going to mean the same to "/b"?
> Are we expecting those two queries to result in the same response?

I would expect the answer to be no to both questions.

Imagine a SQL RDBMS with an HTTP API using QUERY.  `/a` might be
"database a" and `/b` might be "database b", and a QUERY should have
Content-Type: application/sql (say), and any local-part query parameters
might be intended for binding any equally named parameters in the
request body SQL query.
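A minimal sketch of that arrangement, using Python's built-in sqlite3
in-memory databases as stand-ins for "database a" and "database b".  The
routing table, the handler name and the error strings are hypothetical.
It shows URI query parameters binding equally named `:param` placeholders
in the SQL body, and the same QUERY content meaning different things (or
failing) depending on the target URI.

```python
import sqlite3
from urllib.parse import urlsplit, parse_qs

# Hypothetical: each path on the server names a separate database.
DATABASES = {"/a": sqlite3.connect(":memory:"), "/b": sqlite3.connect(":memory:")}

def handle_query(target: str, content_type: str, body: str) -> list:
    """Process a QUERY whose content is SQL, scoped to the target's database."""
    if content_type != "application/sql":
        raise ValueError("415 Unsupported Media Type")
    parts = urlsplit(target)
    db = DATABASES.get(parts.path)
    if db is None:
        raise LookupError("404 Not Found")
    # URI query parameters bind the equally named :params in the SQL body.
    params = {name: values[0] for name, values in parse_qs(parts.query).items()}
    return db.execute(body, params).fetchall()

# Only database "a" has a contacts table.
DATABASES["/a"].execute("CREATE TABLE contacts (name TEXT, domain TEXT)")
DATABASES["/a"].executemany(
    "INSERT INTO contacts VALUES (?, ?)",
    [("Ana", "example.org"), ("Bo", "example.net")],
)

sql = "SELECT name FROM contacts WHERE domain = :domain"
rows = handle_query("/a?domain=example.org", "application/sql", sql)
```

Sending the same content to `/b?domain=example.org` fails with
`sqlite3.OperationalError` (no such table), which is the point: the target
resource is part of what defines the query.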

(The above example makes sense if the RDBMS is PostgreSQL since
connections are to "databases" and queries in one database cannot see
tables and such in another database even if both databases are served by
the same service.)

> If not, the QUERY is being sent to the targeted resource, which is
> doing all of those things above that are being described as "the server".

Yes.

> The last sentence "cannot be assumed" isn't useful.
> What is needed is a definition of what is returned for each kind
> of response.  I believe that should at least include:
> 
>    A 200 (OK) response to a QUERY request indicates that the query
>    was successfully processed and the results of that processing are
>    enclosed as the response content.

Since every distinct query can be expected to produce a distinct result,
maybe success should be 201, since each result would have a distinct
Location / Content-Location.

Oh, also, can one GET that Location / Content-Location to retrieve the
_query_ rather than the results, or both serialized in some way?  I
would hope so.

Perhaps there need to be two URIs returned, one for the query, and one
for the results of executing the query.  Or perhaps the Accept: of a GET
of the URI should distinguish which to return.
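One way to picture the two-URI idea is a hypothetical server-side store,
sketched below in Python.  The query URI is derived from the target URI,
media type and content (since together they define the query), each
execution gets its own results URI, and a GET on the query URI uses Accept
to choose between the query itself and a fresh execution.  The URI layout,
the hash truncation and the Accept rule are all my assumptions, not
anything the draft specifies.

```python
import hashlib

class QueryStore:
    """Hypothetical bookkeeping: one URI per query, one per execution."""

    def __init__(self, execute):
        self.execute = execute  # callable(target, media_type, body) -> results
        self.queries = {}       # query URI -> (target, media_type, body)
        self.results = {}       # results URI -> stored results
        self.runs = 0

    def submit(self, target: str, media_type: str, body: bytes):
        # The target URI, media type and body together define the query.
        key = b"\0".join([target.encode(), media_type.encode(), body])
        query_uri = "/queries/" + hashlib.sha256(key).hexdigest()[:16]
        self.queries[query_uri] = (target, media_type, body)
        self.runs += 1
        results_uri = f"{query_uri}/results/{self.runs}"
        self.results[results_uri] = self.execute(target, media_type, body)
        return query_uri, results_uri

    def get(self, uri: str, accept: str):
        if uri in self.results:
            return self.results[uri]  # re-fetch an earlier execution
        target, media_type, body = self.queries[uri]
        if accept == media_type:
            return body               # the query itself
        return self.execute(target, media_type, body)  # re-execute
```

With this split, a 201 could carry one URI in Location and the other in
Content-Location, though that assignment is just one possible choice.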

> >    The content of the request and it's media type define the query.

And as you note the URI local-part must also be part of the equation
that "defines the query".

> >    A successful response to a QUERY request is expected to provide some
> >    indication as to the final disposition of the operation.  For
> >    instance, a successful query that yields no results can be
> >    represented by a 204 (No Content, Section 15.3.5 of [HTTP]) response.
> >    If the response includes content, it is expected to describe the
> >    results of the operation.
> 
> I am not sure what this is attempting to clarify. Maybe that an
> empty set of results is a 200, not a 404? But doesn't that depend
> on the query itself?

I think it means that if a query produces no data then you can get a 204,
although a 200 with an empty response body would surely work just as well?

> >    For instance, a 303 (See Other, Section 15.4.4 of [HTTP]) response
> >    would indicate that the Location field identifies an alternate URI
> >    from which the results can be retrieved using a GET request (this use
> >    case is also covered by the use of the Location response field in a
> >    2xx response).
> 
> No, that is incorrect. In a 303 response, the Location field identifies
> a replacement target resource that will perform the same query when it
> receives a GET request. To complete processing of the original QUERY, the
> user agent will need to perform a GET request on the resource referenced
> by Location. This allows the original query to be identified for reuse as
> a normal resource and for the results to be cached.

This makes me wonder if there is a way to re-execute a query given the
URI it returned as opposed to re-fetching the results of the earlier
query.  There should be a way to do one vs. the other.
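The 303 flow Roy describes can be sketched from the user agent's side.
The `send` callable below is a hypothetical stand-in for an HTTP transport
(headers as a plain dict with canonical names, a simplification).  The
part that follows the HTTP semantics is that the follow-up is a GET, with
no content, to the Location URI, which the client can then keep as the
identifier of the query.

```python
from urllib.parse import urljoin

def complete_query(send, target: str, media_type: str, body: bytes):
    """Issue a QUERY and, on 303, finish it with a GET on Location.

    `send(method, uri, headers, body)` -> (status, headers, content) is a
    hypothetical transport callable, not a real library API.
    Returns (uri_identifying_query, status, content).
    """
    status, headers, content = send("QUERY", target, {"Content-Type": media_type}, body)
    if status == 303 and "Location" in headers:
        # Location may be relative; resolve it against the target URI.
        replacement = urljoin(target, headers["Location"])
        # The replacement resource performs the same query on GET, so this
        # URI can be linked, reused, and its GET responses cached.
        status, headers, content = send("GET", replacement, {}, None)
        return replacement, status, content
    return target, status, content
```

In this model, a later GET on the returned URI re-executes the query
(Roy's "replacement target resource"), while re-fetching earlier results
would need a separate URI.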

> > 2.3.  Conditional Requests
> > 
> >    A conditional QUERY requests that the selected representation (i.e.,
> >    the query results, after any content negotiation) be returned in the
> >    response only under the circumstances described by the conditional
> >    header field(s), as defined in Section 13 of [HTTP].
> 
> I think this is incorrect and requires more explanation and reference
> to 3.2 [HTTP].  You might even want to quote the last paragraph of 3.2
> and specifically define that, for QUERY, the response content is
> influenced by content negotiation. I recommend a more extensive
> discussion of content negotiation along with the examples.
> 
> The conditional request mechanisms, however, are defined by the
> "selected representation" of GET semantics. Specifically, things like
> last-modified and etag comparisons are done before the query
> is processed, not after, since the condition says "do not proceed". 

For something like a SQL RDBMS, HTTP conditional requests don't really
make sense.  I don't know how to construct an ETag for a complex SELECT
that processes many rows in order to produce a result.  Nor do I know
how to handle If-Modified-Since: if the application is a SQL RDBMS.

So at the very least conditional requests need to be optional.  They
already are, but I think some discussion of why an application might not
support them for QUERY is warranted.
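Roy's point about ordering can be sketched as follows: preconditions are
evaluated against the target resource's own validator before any query
work happens, so a failed If-Match yields 412 (Precondition Failed)
without executing the query.  The resource-level ETag (say, a hypothetical
database version counter) and the handler shape are assumptions.  Note
that such an ETag says nothing about whether a particular SELECT's results
changed, which is exactly my concern above.

```python
def handle_conditional_query(resource_etag: str, headers: dict, run_query):
    """Evaluate If-Match before running the query (hypothetical handler).

    resource_etag is the target resource's strong ETag, e.g. '"v42"'.
    Returns (status, content).
    """
    if_match = headers.get("If-Match")
    if if_match is not None and if_match.strip() != "*":
        # If-Match uses strong comparison: weak tags (W/"...") never match.
        # Naive comma split; assumes no commas inside entity-tags.
        tags = [t.strip() for t in if_match.split(",")]
        if resource_etag not in tags:
            return 412, None  # condition false: do not proceed
    return 200, run_query()
```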

> A resource that responds to QUERY is almost certain to also respond
> to GET (usually with an empty form/instructions). The existing
> conditional mechanisms will work on that empty form, which is
> probably not what is desired.

I would expect a resource that responds to QUERY to also respond to GET
with similar queries encoded in the URI local-part as well, though
obviously subject to URI length limits that QUERY request bodies would
not face (or would face only at much larger sizes).

> In theory, new condition fields could be defined that operate after the
> query has been processed, but that would be silly given the instructions
> for the query are already located within the request content and
> can include their own conditions for when to limit or what to exclude.
> Use the query's conditions instead of HTTP conditionals.

Quite.

> > 2.4.  Caching
> > 
> >    The response to a QUERY method is cacheable; a cache MAY use it to
> >    satisfy subsequent QUERY requests as per Section 4 of
> >    [HTTP-CACHING]).
> 
> No, just no. A cache does not have access to the request content when
> making a hit/miss decision. Use the 303 response, as designed.
> 
> The reason why this is not allowed in HTTP is because routing decisions
> are based on the connection context, host, and entire target URI.
> A cache cannot know what parts may apply. The origin doesn't know either.

Worse, the cache doesn't even know the request body's length if it's
indeterminate (and it would be indeterminate in HTTP/2 and HTTP/3), thus
the cache can't know whether it has the resources to even read the whole
request.  And what happens if the request is too large for the cache?
It will have to give up, having buffered what it consumed so that it can
regurgitate it upstream.  That's all a bit much.
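A minimal sketch of the bind a cache would be in, assuming a hypothetical
per-request buffering limit: with no Content-Length it must read the body
just to learn its size, and if the limit is exceeded it can only give up
on caching and forward what it already consumed, followed by the rest of
the stream, to the origin.

```python
import io

def read_for_cache(stream, limit: int, chunk_size: int = 4096):
    """Try to buffer a request body of unknown length for use as a cache key.

    Returns (body, None) if the whole body fit within `limit` bytes, or
    (prefix, stream) if it did not: the cache must then bypass itself and
    send prefix followed by the rest of stream upstream.
    """
    buf = bytearray()
    while True:
        # Never read more than one byte past the limit.
        chunk = stream.read(min(chunk_size, limit + 1 - len(buf)))
        if not chunk:
            return bytes(buf), None      # end of body: cacheable size
        buf += chunk
        if len(buf) > limit:
            return bytes(buf), stream    # too large: give up, replay prefix

small, rest = read_for_cache(io.BytesIO(b"x" * 10), limit=16)
prefix, rest2 = read_for_cache(io.BytesIO(b"y" * 100), limit=16, chunk_size=8)
```

Either way the cache has already spent memory and latency before it can
even decide hit or miss.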

> The actual server recipient of a request containing query parameters
> might have been passed along a completely different internal routing
> path, with its own security filtering, from the same request with
> those parameters hidden within the request content.
> 
> Allowing a cache to change the key by moving identifiers from the
> content would allow a generic resource to poison the cache for other,
> more specific resources.
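The poisoning risk can be shown with a deliberately naive, hypothetical
normalizing cache key that merges form parameters from the URI and from
the request content.  Two requests that an origin may route differently
(say, a tenant-specific filter keyed on the URI versus a generic handler
that reads `tenant` from the content) collapse to the same key, so
whichever response is stored first gets served for both.  The `tenant`
parameter is illustrative only.

```python
from urllib.parse import urlsplit, parse_qsl

def naive_normalized_key(target: str, body: str):
    """A flawed cache key: URI query and body parameters merged and sorted."""
    parts = urlsplit(target)
    params = parse_qsl(parts.query) + parse_qsl(body)
    return ("QUERY", parts.path, tuple(sorted(params)))

# The tenant is in the URI (and visible to URI-based routing/filtering) ...
key_uri = naive_normalized_key("/contacts?tenant=acme", "limit=10")
# ... versus hidden in the content, where routing never looks.
key_body = naive_normalized_key("/contacts", "tenant=acme&limit=10")
```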

Thank you.  I made this point, and the authors added Security
Considerations text.  My preference was to remove all query
normalization by caches.

> >    QUERY /contacts HTTP/1.1
> >    Host: example.org
> >    Content-Type: application/x-www-form-urlencoded
> >    Accept: application/json
> > 
> >    select=surname,givenname,email&limit=10&match=%22email=*@example.*%22

I don't see why the request should be encoded so awkwardly.  The whole
point of QUERY is that one need not do so.
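Decoding the example's form-encoded content shows how much escaping it
carries.  The JSON rendering after it is only one hypothetical
alternative that a QUERY body could use directly; the field names and
structure are mine, not the draft's.

```python
import json
from urllib.parse import parse_qs

form_body = "select=surname,givenname,email&limit=10&match=%22email=*@example.*%22"

# parse_qs splits on "&", then on the first "=", then percent-decodes.
decoded = {name: values[0] for name, values in parse_qs(form_body).items()}
# decoded == {'select': 'surname,givenname,email', 'limit': '10',
#             'match': '"email=*@example.*"'}

# Hypothetical JSON query document carrying the same information unescaped.
json_body = json.dumps({
    "select": decoded["select"].split(","),
    "limit": int(decoded["limit"]),
    "match": decoded["match"].strip('"'),
})
```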

> We have discussed most of this in the past. I don't understand why it wasn't
> corrected in the draft. Has the caching of QUERY (as a method) been implemented?
> Have such implementations detailed how they account and correct for cache
> poisoning? How they use a cache key that requires the body to be read first?
> How they intend to secure this across protection boundaries and on different
> request paths? I am not seeing that in current practice.
> 
> What I see is sufficient justification for a QUERY method that is like GET
> with a body except it cannot be immediately cached, and like POST but with
> guarantees for safe and idempotent. That's enough to be useful. We should not
> be implying that this method is, in any way, suitable as a replacement for
> information retrieval of identified resources via GET.

+1

Nico
-- 

Received on Friday, 20 June 2025 22:56:49 UTC