Re: ETags, If-Match and database versions from Graham Cox on 2024-12-28 (ietf-http-wg@w3.org from October to December 2024)

From: Graham Cox <graham@grahamcox.co.uk>
Date: Sat, 28 Dec 2024 12:08:02 +0000
To: "Roy T. Fielding" <fielding@gbiv.com>
Cc: Mark Nottingham <mnot@mnot.net>, ietf-http-wg@w3.org
Message-ID: <CAPBurBsA-1qfWuvEptCdwb9uQEnmm2_Bd0snmFOuaCur7GOSwQ@mail.gmail.com>
>
> The good news is that unless you deliberately coded the
> representation-generating algorithm to be random, it is probably already
> consistent over time and would have no problem using strong etags. The
> backend doesn't have an opinion -- only the HTTP interface matters.


So, this is the bit that really sparked things for me here. We've always
done exactly this - assumed that the same input data would always produce
the same JSON, and therefore could just use the database version as a
strong etag. And, it turned out, we were proven wrong. We did a *minor*
version update of our underlying framework (Spring Boot 3.3 to 3.4), which
it seems had some knock-on effect down the line that changed the order in
which the fields were rendered into the JSON object - which would therefore
require a different strong etag for the exact same input data.
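
To make that failure mode concrete, here is a minimal Python sketch. It is not the Spring Boot code in question. The record mirrors the article example quoted later in this thread, and `content_etag` is a hypothetical helper that derives a strong etag from the exact response bytes.

```python
import hashlib
import json

# Hypothetical record, mirroring the article example quoted later in this thread.
record = {"title": "Some Article Title", "content": "Some Article Content"}

# Two renderings of the same data; only the field order differs.
body_before = json.dumps(record).encode("utf-8")
body_after = json.dumps(
    {"content": record["content"], "title": record["title"]}
).encode("utf-8")


def content_etag(body: bytes) -> str:
    """Hypothetical helper: strong etag derived from the exact response bytes."""
    return '"' + hashlib.sha256(body).hexdigest() + '"'


# Semantically identical JSON, but not the same sequence of bytes.
same_meaning = json.loads(body_before) == json.loads(body_after)
same_bytes = body_before == body_after

# A byte-derived etag therefore differs, even though the stored data
# (and its database version) did not change.
etags_equal = content_etag(body_before) == content_etag(body_after)
```

Under these assumptions, a database version of 1 would label both bodies, while a byte-derived etag would tell them apart.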

I've also seen the suggestion to append a server version to the etag to
avoid this, but there are significantly more reasons to change the server
version that *wouldn't* change the JSON representation. Doing that would
mean that every time the server version changes, all the etags change too,
which breaks caching and optimistic locking - often needlessly...
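
A small sketch of that suggestion, assuming a hypothetical scheme where the etag is the database version joined to a server build label (the `build-41` / `build-42` labels and the `versioned_etag` helper are invented for illustration):

```python
def versioned_etag(record_version: int, server_build: str) -> str:
    """Hypothetical scheme: database version plus server build label."""
    return f'"{record_version}-{server_build}"'


# The same record (database version 1) served before and after a deployment.
etag_before_deploy = versioned_etag(1, "build-41")
etag_after_deploy = versioned_etag(1, "build-42")

# A client that fetched the article before the deployment sends the old
# etag in If-Match. The record itself is untouched, yet the tags differ,
# so the precondition fails and cached copies are invalidated.
precondition_passes = etag_before_deploy == etag_after_deploy
```

Every deployment invalidates every etag in this scheme, whether or not the rendered JSON actually changed.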

> As the origin server developer, you define the consistency of your
> resources. It doesn't matter how they are stored. It matters how
> consistently you represent them across the interface. If you are consistent
> for a given resource, that resource can use If-Match with strong etags. If
> you are inconsistent, then those features of HTTP have to be disabled.
> Hence, If-Match is not allowed with weak etags because no match can be
> guaranteed.


This is the crux of it. And maybe I'm being over-cautious with it all -
which wouldn't exactly be unusual for me! 🙂 - but following that through,
it would seem that if I *can't* 100% guarantee that the same resource will
always be rendered byte-for-byte into the same response body - e.g. because
an underlying framework upgrade has changed something that I didn't realise
- then I should be using weak etags.
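
For reference, the two entity-tag comparison functions in RFC 9110 can be sketched as below. The function names are hypothetical, and the sketch assumes the etags are already well-formed strings such as `"1"` or `W/"1"`.

```python
def _opaque(etag: str) -> str:
    """Strip the weakness indicator, leaving the quoted opaque tag."""
    return etag[2:] if etag.startswith("W/") else etag


def is_weak(etag: str) -> bool:
    return etag.startswith("W/")


def strong_match(a: str, b: str) -> bool:
    """Strong comparison: neither tag is weak and the opaque tags are identical."""
    return not is_weak(a) and not is_weak(b) and a == b


def weak_match(a: str, b: str) -> bool:
    """Weak comparison: opaque tags are identical, weakness is ignored."""
    return _opaque(a) == _opaque(b)


# If-Match is evaluated with the strong comparison, so a weak etag built
# from the database version can never satisfy it.
weak_version_tag = 'W/"1"'
if_match_would_pass = strong_match(weak_version_tag, weak_version_tag)
if_none_match_would_match = weak_match(weak_version_tag, weak_version_tag)
```

This is why switching to a weak etag, as described above, takes If-Match off the table while leaving If-None-Match revalidation usable.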

And I fully understand that certain features can't reliably be used with
weak etags - such as byte-range requests. I'm just not sure why If-Match
preconditions, and therefore optimistic locking, should be on that list. If
anything, it seems to me that this is potentially a *better* candidate for
allowing weak etags than caching is - the backend server can safely perform
the update as long as the original data hasn't changed, regardless of the
byte-for-byte representation that was used.
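
The check being argued for can be sketched as follows. This is only an illustration of the optimistic-locking logic, not something HTTP currently permits with weak etags. The `Article` class, `etag_for` and `put_article` are hypothetical names, and the handling of If-Match is simplified to a single tag (no lists, no `*`).

```python
from dataclasses import dataclass


@dataclass
class Article:
    id: str
    version: int
    title: str
    content: str


def etag_for(article: Article) -> str:
    """Etag taken directly from the database version (hypothetical scheme)."""
    return f'"{article.version}"'


def put_article(stored: Article, if_match: str, title: str, content: str) -> int:
    """Apply an update only if the client's etag matches the stored version.

    Returns an HTTP status code: 412 if the precondition fails, 204 otherwise.
    """
    if if_match != etag_for(stored):
        return 412  # Precondition Failed: the record changed since the client read it
    stored.title = title
    stored.content = content
    stored.version += 1  # every successful write bumps the version
    return 204


article = Article("abc123", 1, "Some Article Title", "Some Article Content")
first_status = put_article(article, '"1"', "New Title", "New Content")
# A second client still holding the old etag is rejected.
second_status = put_article(article, '"1"', "Other Title", "Other Content")
```

Nothing in this check looks at the bytes of any response body. It depends only on whether the stored version has moved on.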

Cheers

On Sat, 28 Dec 2024 at 00:51, Roy T. Fielding <fielding@gbiv.com> wrote:

> On Dec 27, 2024, at 3:18 PM, Graham Cox <graham@grahamcox.co.uk> wrote:
>
> This is almost exactly the scenario I'm talking about though. The only
> major difference is that I'm not storing the document as-is but instead
> storing it as data fields in a database.
>
> So, for a worked example, say I've got a database record of:
>
>    - id: abc123
>    - version: 1
>    - title: Some Article Title
>    - content: Some Article Content
>
> I can then have a call to "GET /articles/abc123" that returns this as JSON
> - for example:
> {
>     "title": "Some Article Title",
>     "content": "Some Article Content"
> }
>
> And I can also have a call to "PUT /articles/abc123" that updates this.
>
> As best I can tell, that's pretty much exactly the same as the "Editing
> the Web" spec is describing. In fact, from the point of view of the client,
> it's indistinguishable from it...
>
> The only major difference - which is only discernable within the backend
> and not outside of it - is the fact that I'm not storing the JSON directly.
> Instead, I'm generating it from the stored data. However, that in turn
> means that - by my understanding of the specs - the "version" field from
> the database shouldn't directly be a *strong* etag. I'm saying this
> because, theoretically, a minor update to the server could change the exact
> way the JSON is generated in a way that's semantically the same but isn't
> the same string of bytes - e.g. it might stop pretty-printing it, or it
> might change the order the fields are returned. None of those changes have
> any impact on the meaning of the JSON returned, but they do have an impact
> on the exact bytes returned for this representation, and thus are
> "observable in the content of a 200 (OK) response to GET".
>
> As such, it seems that it's intended that either a strong etag is
> generated from the generated bytes returned for the response - which is
> hard to do, especially when your backend architecture has a strict
> separation between the presentation layer that generates the JSON and the
> application layer that does the version checking - or else a weak etag can
> be returned based on the "version" field in the database - which is much
> easier to do, makes much more sense, but then breaks the rules around
> If-Match...
>
> Or, is the "Editing the Web" spec simply not designed for this kind of
> scenario? In which case, is there an alternative that is intended for it?
> Because it seems a real shame to have a spec that so nearly, but not quite,
> works perfectly for this case...
>
>
> It has nothing to do with the backend implementation.
>
> It is easier to think of it as a consistency requirement. A strong etag is
> just a value that is guaranteed to change when the representation's data
> changes. A weak etag has no such guarantee. Therefore, when the interface
> is making a comparison between two strong etags, the interface (HTTP) can
> make the corresponding assumption that, if they are the same, then the
> corresponding representation data will also be the same. This is important
> so that the recipient clients (user agent and caches) can perform partial
> updates and patches using the response, or prevent such from being
> performed by recipients with a representation that has a different etag.
>
> As the origin server developer, you define the consistency of your
> resources. It doesn't matter how they are stored. It matters how
> consistently you represent them across the interface. If you are consistent
> for a given resource, that resource can use If-Match with strong etags. If
> you are inconsistent, then those features of HTTP have to be disabled.
> Hence, If-Match is not allowed with weak etags because no match can be
> guaranteed.
>
> The good news is that unless you deliberately coded the
> representation-generating algorithm to be random, it is probably already
> consistent over time and would have no problem using strong etags. The
> backend doesn't have an opinion -- only the HTTP interface matters.
>
> ....Roy
>
>
Received on Saturday, 28 December 2024 12:08:19 UTC