Re: Etag-on-write, draft -04 from Julian Reschke on 2006-12-03 (ietf-http-wg@w3.org from October to December 2006)

From: Julian Reschke <julian.reschke@gmx.de>
Date: Sun, 03 Dec 2006 14:33:04 +0100
To: Henrik Nordstrom <henrik@henriknordstrom.net>
CC: HTTP Working Group <ietf-http-wg@w3.org>
Message-ID: <4572D210.1000308@gmx.de>
Henrik Nordstrom wrote:
> ...
> I'd say that no matter if a strong or weak etag is returned the client
> can assume that the result is based on what it sent in its PUT request
> for as long as the etags compare true, and that under those
> conditions there usually is no need to refetch the object to continue
> authoring. But returning a weak etag does signal that quite likely
> something will differ on a GET for the object.

Yes.

> Also, since the weak etag can not be used in a future conditional PUT, the
> agent had better fetch a new copy if a weak etag was returned in response to
> PUT and the actual details of the content are important. This makes the weak etag
> mainly suitable for detecting whether there have been important edits
> modifying the actual content between the PUT and a subsequent GET/HEAD.

Yes. So that solves the lost update problem (refetching is safe: it will 
not lose somebody else's update), but it still requires the client to refetch.
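A minimal sketch of that detection use, assuming RFC 2616's weak comparison function (section 13.3.3): the client keeps the weak ETag from the PUT response and later compares it with the ETag from a GET or HEAD. The helper names are hypothetical, and the parser only handles a single entity-tag.

```python
from typing import Tuple

def parse_etag(value: str) -> Tuple[bool, str]:
    """Split one entity-tag such as W/"abc" or "abc" into (is_weak, opaque-tag)."""
    value = value.strip()
    weak = value.startswith("W/")
    opaque = value[2:] if weak else value
    if len(opaque) < 2 or opaque[0] != '"' or opaque[-1] != '"':
        raise ValueError(f"not an entity-tag: {value!r}")
    return weak, opaque

def weak_match(a: str, b: str) -> bool:
    """Weak comparison: opaque tags must be identical; either side may be weak."""
    return parse_etag(a)[1] == parse_etag(b)[1]

def edited_since_put(put_etag: str, current_etag: str) -> bool:
    """True if the tag now served no longer weakly matches the one our PUT received."""
    return not weak_match(put_etag, current_etag)
```

This only tells the client whether the resource changed after its PUT; it does not give it the exact stored octets, which is why the refetch is still needed.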

>> First of all, this argument seems to be based on the assumption that 
>> content-rewriting servers do not return ETags, which we know is not 
>> true, and is even required in XCAP. Are these servers non-compliant with 
>> RFC2616? I don't think so.
> 
> They are compliant in my view. And absolutely certainly so if returning
> a weak etag.

Yes.

>> Furthermore, I don't get the part about the intermediary. Even if the 
>> server didn't rewrite the content, and returned a strong ETag, an 
>> intermediary may have rewritten the content, right?
> 
> Right. Content-modifying intermediaries are on their own. The HTTP specs are
> not very detailed on what those may or may not do, but anyone who has
> grasped ETag would consider it a functional requirement that content
> rewriting intermediaries also remap ETags suitably or at least degrade
> them to weak. If not, serious confusion may arise.

+1. Contrary to what some say, this topic is already complicated enough. 
Let's get an understanding of how authoring works in the absence of 
these kinds of intermediaries first.

>> But returning the ETag is very useful, and completely harmless unless 
>> the client tries a byte range request. The final status of the result 
>> will be the same, no matter whether the client does a 
>> PUT/GET(refresh)/PUT or a PUT/PUT request, so it's pointless to refresh 
>> the local copy unless somebody is interested in the newly substituted 
>> keywords.
> 
> True, and in this scenario weak ETags map extremely well. But I would
> not argue that it's sane to return a strong ETag when doing keyword
> substitution on the PUT entity.

The approach using the weak ETag has the problem of requiring an additional 
GET.

In general, with APP, CALDAV and XCAP we have several protocols where 
content rewriting will be the *default* case. That's because servers 
implementing them will usually store content not as supplied by the client, 
but as the serialization of an internal representation. XCAP servers will run 
everything through an XML parser (as will APP servers, for many resources) and 
store the result as some kind of serialization of their object model (which may 
be the XML Infoset or something else). CALDAV servers in general will use 
ICS parsers.
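To make the point concrete, here is a minimal Python sketch, assuming a hypothetical server that keeps only the ElementTree re-serialization of an XML PUT body. The octets change although the parsed content does not, so an ETag computed over the stored bytes (here a truncated SHA-256, an arbitrary illustrative choice) cannot equal one computed over what the client sent.

```python
import hashlib
import xml.etree.ElementTree as ET

def store_as_server_would(body: bytes) -> bytes:
    """Hypothetical server: parse the PUT body and keep only a re-serialization of it."""
    return ET.tostring(ET.fromstring(body))

def octet_etag(data: bytes) -> str:
    """Strong-style ETag derived from the exact octets (illustrative choice)."""
    return '"' + hashlib.sha256(data).hexdigest()[:16] + '"'

# Extra whitespace and single-quoted attribute: legal XML, but not what the serializer emits.
submitted = b"<entry  id='42'><title>Hi</title></entry>"
stored = store_as_server_would(submitted)

print(stored)
print(octet_etag(submitted) == octet_etag(stored))
```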

I think in the end we should be able to come up with a solution where 
these resources on these servers can be written to without the need for a 
refetch. Ideally, that solution is compliant with RFC2616 (I would even 
say it's required to be, which is why it's essential to clarify what 
RFC2616 really is saying).

I'd also like to mention that in many cases, an authoring session looks like

1) Client application starts, gets the content (GET)
2) User does edits, saves (PUT)
3) User does more edits, saves (PUT)
...
n) User is done, final PUT

It would be good if a refetch GET could be avoided at least for every 
PUT but the last.
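The session above, with each save reusing the ETag from the previous PUT response in If-Match, can be sketched as follows. InMemoryResource is a hypothetical stand-in for an origin server that stores bodies unchanged and issues strong ETags, and PreconditionFailed stands in for a 412 response. Under those assumptions only the initial GET and an optional final one are needed.

```python
import hashlib
from typing import Callable, List, Tuple

class PreconditionFailed(Exception):
    """Stands in for a 412 (Precondition Failed) response."""

class InMemoryResource:
    """Hypothetical origin server holding one resource; stores PUT bodies unchanged."""

    def __init__(self, body: bytes):
        self.body = body
        self.gets = 0  # counts GET requests, to show how many refetches happen

    def etag(self) -> str:
        return '"' + hashlib.sha1(self.body).hexdigest() + '"'  # always a strong tag

    def get(self) -> Tuple[bytes, str]:
        self.gets += 1
        return self.body, self.etag()

    def put(self, body: bytes, if_match: str) -> str:
        # Both tags are strong here, so plain equality is the strong comparison.
        if if_match != self.etag():
            raise PreconditionFailed()
        self.body = body
        return self.etag()  # ETag returned on the PUT response

def authoring_session(server: InMemoryResource,
                      edits: List[Callable[[bytes], bytes]]) -> Tuple[bytes, str]:
    body, etag = server.get()                   # 1) initial GET
    for edit in edits:                          # 2) .. n) user edits, then saves
        body = edit(body)
        etag = server.put(body, if_match=etag)  # no refetch between saves
    return server.get()                         # optional refetch after the last PUT
```

With a server that rewrites content, the same loop only works if the ETag returned by PUT is one the server will accept in If-Match, which is exactly the point under discussion.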

>> Not returning an ETag surely will always work, because the source of 
>> confusion is eliminated. But instead, you now have to resolve the lost 
>> update problem, for which the ETag previously would have been the solution.
> 
> Yes, and a weak etag is sufficient for solving the lost update problem.
> If it wasn't for the stupid fact that a weak ETag can not be used in a
> PUT/PUT If-Match sequence if following the spec.

So, can we resolve all of this by relaxing that requirement?
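For reference, the requirement in question: RFC 2616 (section 14.24) says the server must use the strong comparison function for If-Match, and under that function (section 13.3.3) a weak tag never matches, not even itself. A minimal sketch of the server-side check follows; the function names are hypothetical, and the list parsing is simplified (it assumes no commas inside opaque tags).

```python
from typing import List, Tuple

def parse_etag_list(header: str) -> List[Tuple[bool, str]]:
    """Split a value like 'W/"a", "b"' into (is_weak, opaque-tag) pairs (simplified)."""
    tags = []
    for item in header.split(","):
        item = item.strip()
        weak = item.startswith("W/")
        tags.append((weak, item[2:] if weak else item))
    return tags

def strong_match(a: Tuple[bool, str], b: Tuple[bool, str]) -> bool:
    """Strong comparison: identical opaque tags and neither one weak."""
    return a[1] == b[1] and not a[0] and not b[0]

def if_match_passes(if_match: str, current_etag: str) -> bool:
    if if_match.strip() == "*":
        return True  # "*" matches any current entity (assumes the resource exists)
    current = parse_etag_list(current_etag)[0]
    return any(strong_match(tag, current) for tag in parse_etag_list(if_match))
```

Relaxing the requirement, as asked above, would mean replacing strong_match with a weak comparison for PUT, which RFC 2616 as written does not allow.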

>> Well, all of this isn't necessary if the server does return the strong 
>> ETag, and the client just keeps authoring that resource.
> 
> Well... let's return to the Subversion use case again.
> 
> The keyword substitution may or may not be important depending on how
> the data continues to be used on the client. If we only consider
> authoring then it's not very important as the substitution will be
> redone again on the next PUT so a PUT/PUT sequence should work out fine
> (unless subversion substitutes $Log$ like CVS does...). But it's not the
> case for any other use of the content where the substituted keywords may
> actually be important (embedded in version identifiers etc).

That would indicate there's a real use case for having more information, 
either by returning diff information as proposed by Roy, or by adding 
more detailed information to Entity-Transform.
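The difference between $Id$ and $Log$ can be sketched as follows, assuming a hypothetical server that expands keywords on every PUT (this is not Subversion's or CVS's actual code). Re-expanding an already expanded $Id$ gives the same result as expanding a fresh copy, so a PUT/PUT sequence converges; a CVS-style $Log$ accumulates history on each pass, so it does not.

```python
import re

# Matches both the collapsed "$Id$" and an expanded "$Id: ... $" form.
ID_KEYWORD = re.compile(r"\$Id(?::[^$\n]*)?\$")

def expand_id(text: str, revision: int) -> str:
    """Hypothetical $Id$ expansion: replaces any previous expansion entirely."""
    return ID_KEYWORD.sub(f"$Id: r{revision} $", text)

def expand_log(text: str, revision: int, message: str) -> str:
    """Hypothetical CVS-style $Log$: inserts a history line after the keyword, so it accumulates."""
    return text.replace("$Log$", f"$Log$\n  r{revision}: {message}", 1)
```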

>>> Yes.  A strong ETag in a PUT result means that the entity has not been
>>> transformed in an octet-significant way.  No strong ETag in a PUT result
>>> means that it may have been transformed and thus a refresh is necessary
>>> to obtain both the current form and the current ETag.
>> I don't think this is what RFC2616 says, or what (some) servers do 
>> today. And even if we could simply ignore that, this approach would make 
>> authoring of resources on content-rewriting servers hard, because you 
>> wouldn't be able to use ETags anymore (that is, you would always have to 
>> refresh the local copy, although it may be completely unneeded).
> 
> I don't think RFC2616 says either yes or no here. Actually I don't
> think automatic rewriting of the posted entity was considered at all in
> the specification of PUT other than "we don't specify how any of that is
> done".
> 
> The language used in 9.6 does not leave much room for automatic entity
> modifications as part of the PUT processing.

"HTTP/1.1 does not define how a PUT method affects the state of an 
origin server."
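The client-side rule Roy proposed (quoted above) is simple enough to state as code. This is a minimal sketch of that proposal, not of what RFC2616 currently requires; the function name and its return values are hypothetical.

```python
from typing import Optional

def action_after_put(response_etag: Optional[str]) -> str:
    """Decide what a client does after PUT under the proposed rule."""
    if response_etag is None:
        return "refresh"   # no ETag: GET to obtain the current form and ETag
    if response_etag.strip().startswith("W/"):
        return "refresh"   # weak ETag: entity may have been transformed octet-wise
    return "continue"      # strong ETag: not transformed; keep authoring the local copy
```

Julian's objection maps onto the "refresh" branches: every PUT to a content-rewriting server would end there, even when the client does not care about the rewritten octets.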

>> Many protocols have defined new response headers that aren't metadata, 
>> and I'm not sure what problem that causes in practice. This being said, 
>> the value of Entity-Transform could actually be seen as metadata, and 
>> also uses a format which makes it impossible for stale copies of the 
>> header to cause harm in the client.
> 
> Same here. To me this new header allows for reasonable signaling of
> server-side PUT entity processing where RFC2616 hasn't been very clear.
> 
> But I am also of the view that just making a better definition of weak
> etags would go pretty much as far by defining that a strong etag is
> octet equal to the PUT entity, while a weak etag may differ in
> non-important aspects as is always the case on weak etags.
> 
> So I can only second Roy here.

I do agree that upgrading weak ETags so they become usable for authoring 
would be a potential solution to that problem. We would still need to 
clarify RFC2616 with respect to returning ETags on PUT at all, and also 
with requirements on strong ETags. And when making that change, we need 
to take into account that we may break existing servers.
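Under the definition Henrik proposes (strong only if octet-equal to the PUT entity, weak otherwise), a server's choice of ETag for the PUT response could be sketched like this. The function name is hypothetical, and the hash-based opaque value is an arbitrary illustrative choice.

```python
import hashlib

def etag_for_put_response(submitted: bytes, stored: bytes) -> str:
    """Strong tag if the stored entity is octet-equal to the PUT body, weak otherwise."""
    opaque = '"' + hashlib.sha256(stored).hexdigest()[:16] + '"'
    return opaque if stored == submitted else "W/" + opaque
```

Whether such a weak tag can then be sent back in If-Match is exactly the "upgrading weak ETags" question; under RFC 2616 as written it cannot.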

>> As far as I can tell, this has the price of making it incompatible with 
>> RFC2616, and also defeats efficient ETag-based authoring on servers that 
>> do rewrite the content.
> 
> I don't agree.
> 
> Servers which do rewrite the content are free to exploit weak etags in
> their PUT responses, signaling that "yes, it means the same but may
> differ...", and that is frankly about the only way I can make content
> rewriting servers fit the 2616 definition of PUT, where no provision
> for content rewriting is provided.
> 
> And I can't see how using an ETag returned from PUT in subsequent
> conditional requests would be incompatible. The returned ETag is after
> all supposed to represent the entity created from the PUT request.

OK, let me rephrase this: RFC2616 allows servers to rewrite content on 
PUT, and it also allows them to return a strong ETag. Servers do that today. 
Now I hear that this may be a bad idea, and not what the authors of 
RFC2616 intended, but I don't see how the actual text of RFC2616 can 
be understood differently.

So if this is really a bug in RFC2616, it would be a good thing to add that 
to the errata list ASAP, and to tell the RFC-Editor not to publish XCAP, 
because it violates that requirement.

> ...
>>>     require that ETag in a response to PUT means that the client
>>>     can use that entity tag in future conditional requests (that
>>>     includes IMS, If-Match, and Range-based conditional requests).
>> And the ambiguity here was that it's not clear what "can use" actually 
>> means.
> 
> strong validator by definition means octet equality. I guess the
> question is "of what"?

Yes.

> ...

Best regards, Julian
Received on Sunday, 3 December 2006 13:33:19 UTC