Re: ETags, If-Match and database versions from Roy T. Fielding on 2024-12-28 (ietf-http-wg@w3.org from October to December 2024)

From: Roy T. Fielding <fielding@gbiv.com>
Date: Fri, 27 Dec 2024 16:51:26 -0800
To: Graham Cox <graham@grahamcox.co.uk>
Cc: Mark Nottingham <mnot@mnot.net>, ietf-http-wg@w3.org
Message-Id: <ED23534F-1AE4-40A1-A796-271FBB095117@gbiv.com>
> On Dec 27, 2024, at 3:18 PM, Graham Cox <graham@grahamcox.co.uk> wrote:
> 
> This is almost exactly the scenario I'm talking about though. The only major difference is that I'm not storing the document as-is but instead storing it as data fields in a database.
> 
> So, for a worked example, say I've got a database record of:
> id: abc123
> version: 1
> title: Some Article Title
> content: Some Article Content
> I can then have a call to "GET /articles/abc123" that returns this as JSON - for example:
> {
>     "title": "Some Article Title",
>     "content": "Some Article Content"
> }
> 
> And I can also have a call to "PUT /articles/abc123" that updates this.
> 
> As best I can tell, that's pretty much exactly the same as the "Editing the Web" spec is describing. In fact, from the point of view of the client, it's indistinguishable from it...
> 
> The only major difference - which is only discernible within the backend and not outside of it - is the fact that I'm not storing the JSON directly. Instead, I'm generating it from the stored data. However, that in turn means that - by my understanding of the specs - the "version" field from the database shouldn't directly be a *strong* etag. I'm saying this because, theoretically, a minor update to the server could change the exact way the JSON is generated in a way that's semantically the same but isn't the same string of bytes - e.g. it might stop pretty-printing it, or it might change the order the fields are returned. None of those changes have any impact on the meaning of the JSON returned, but they do have an impact on the exact bytes returned for this representation, and thus are "observable in the content of a 200 (OK) response to GET".
> 
> As such, it seems that it's intended that either a strong etag is generated from the generated bytes returned for the response - which is hard to do, especially when your backend architecture has a strict separation between the presentation layer that generates the JSON and the application layer that does the version checking - or else a weak etag can be returned based on the "version" field in the database - which is much easier to do, makes much more sense, but then breaks the rules around If-Match...
> 
> Or, is the "Editing the Web" spec simply not designed for this kind of scenario? In which case, is there an alternative that is intended for it? Because it seems a real shame to have a spec that so nearly, but not quite, works perfectly for this case...

It has nothing to do with the backend implementation.

It is easier to think of it as a consistency requirement. A strong etag is just a value that is guaranteed to change when the representation's data changes. A weak etag has no such guarantee. Therefore, when the interface is making a comparison between two strong etags, the interface (HTTP) can make the corresponding assumption that, if they are the same, then the corresponding representation data will also be the same. This is important so that the recipient clients (user agent and caches) can perform partial updates and patches using the response, or prevent such from being performed by recipients with a representation that has a different etag.
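
As an illustration of the comparison described above, here is a minimal sketch of the strong and weak entity-tag comparison functions as RFC 9110 defines them. The function names (parse_etag, strong_match, weak_match) are made up for this example and are not taken from any library; the parsing is deliberately simplistic.

```python
def parse_etag(raw: str) -> tuple:
    """Split an entity-tag into (is_weak, opaque_tag).

    Hypothetical helper: assumes `raw` is a single well-formed entity-tag
    such as '"1"' or 'W/"1"'.
    """
    raw = raw.strip()
    if raw.startswith("W/"):
        return True, raw[2:]
    return False, raw


def strong_match(a: str, b: str) -> bool:
    """Strong comparison: both tags must be strong and character-identical."""
    weak_a, tag_a = parse_etag(a)
    weak_b, tag_b = parse_etag(b)
    return (not weak_a) and (not weak_b) and tag_a == tag_b


def weak_match(a: str, b: str) -> bool:
    """Weak comparison: opaque tags must be identical; weakness is ignored."""
    return parse_etag(a)[1] == parse_etag(b)[1]
```

Only a strong match lets the recipient assume the representation data is identical, which is why the weak form is limited to cache validation.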

As the origin server developer, you define the consistency of your resources. It doesn't matter how they are stored. It matters how consistently you represent them across the interface. If you are consistent for a given resource, that resource can use If-Match with strong etags. If you are inconsistent, then those features of HTTP have to be disabled. Hence, If-Match is not allowed with weak etags because no match can be guaranteed.
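
To make the If-Match rule concrete, the following is a rough sketch of how an origin server might evaluate the precondition before applying a PUT. The names if_match_passes and put_status are hypothetical, the regular expression is a simplified entity-tag matcher, and the 204 success status is just one plausible choice.

```python
import re
from typing import Optional

# Simplified entity-tag pattern: optional W/ prefix, then a quoted opaque tag.
ETAG_RE = re.compile(r'(W/)?"([^"]*)"')


def if_match_passes(header: str, current_etag: Optional[str]) -> bool:
    """Evaluate an If-Match header against the current representation's etag.

    `current_etag` is None when the resource has no current representation.
    """
    if current_etag is None:
        return False
    if header.strip() == "*":
        # "*" matches any current representation.
        return True
    current = ETAG_RE.fullmatch(current_etag.strip())
    if current is None or current.group(1):
        # A weak current etag can never satisfy the strong comparison.
        return False
    for candidate in ETAG_RE.finditer(header):
        is_weak = candidate.group(1) is not None
        if not is_weak and candidate.group(2) == current.group(2):
            return True
    return False


def put_status(header: str, current_etag: Optional[str]) -> int:
    """Return 412 (Precondition Failed) when If-Match fails, else 204."""
    return 204 if if_match_passes(header, current_etag) else 412
```

With a database version of 1 exposed as the strong etag "1", a request carrying If-Match: "1" would proceed, while one carrying a stale "0" or a weak W/"1" would be rejected with 412.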

The good news is that unless you deliberately coded the representation-generating algorithm to be random, it is probably already consistent over time and would have no problem using strong etags. The backend doesn't have an opinion -- only the HTTP interface matters.
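
For the article example from the quoted message, a consistent representation might be sketched as below. This is only an illustration under stated assumptions: render_article, strong_etag and SERIALIZER_REV are invented names, and folding a serializer revision into the etag is one possible way to keep the tag changing if a server update ever alters the output bytes. It is not something the specifications require.

```python
import json

# Hypothetical constant: bump it whenever a code change would alter the
# bytes produced for the same stored data (e.g. dropping pretty-printing).
SERIALIZER_REV = 1


def render_article(record: dict) -> bytes:
    """Render the stored fields as JSON with a fixed key order and spacing."""
    body = {"title": record["title"], "content": record["content"]}
    text = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return text.encode("utf-8")


def strong_etag(record: dict) -> str:
    """Derive a strong etag from the database version and serializer revision."""
    return '"{}-{}"'.format(record["version"], SERIALIZER_REV)


record = {
    "id": "abc123",
    "version": 1,
    "title": "Some Article Title",
    "content": "Some Article Content",
}
```

Because the rendering is deterministic, the same record always yields the same bytes, so the version-derived tag changes exactly when the representation data does.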

....Roy
Received on Saturday, 28 December 2024 00:51:42 UTC