Re: Etag-on-write, draft -04 from Henrik Nordstrom on 2006-12-02 (ietf-http-wg@w3.org from October to December 2006)

From: Henrik Nordstrom <henrik@henriknordstrom.net>
Date: Sun, 03 Dec 2006 00:32:19 +0100
To: Julian Reschke <julian.reschke@gmx.de>
Cc: HTTP Working Group <ietf-http-wg@w3.org>
Message-Id: <1165102339.3472.112.camel@henriknordstrom.net>
On Sat, 2006-12-02 at 22:00 +0100, Julian Reschke wrote:

> > Which, again, is why it is completely unnecessary.  If the entity being
> > edited had an etag, and the result from performing a PUT did not include
> > an etag (or included only a weak etag), then the client can conclude that
> > the data would look different.  Whether or not it does happen to be
> 
> ...I'd say *may* look different. Many servers never return an ETag upon PUT.

I'd say that whether a strong or a weak etag is returned, the client
can assume that the result is based on what it sent in its PUT request
for as long as the etags compare true, and that under those conditions
there is usually no need to refetch the object to continue authoring.
But returning a weak etag does signal that quite likely something will
differ on a GET for the object.

Also, since the weak etag cannot be used in a future conditional PUT,
the agent had better fetch a new copy if a weak etag was returned in
response to PUT and the actual details of the content are important.
This makes the weak etag mainly suitable for detecting whether there
have been important edits modifying the actual content between the PUT
and a subsequent GET/HEAD.
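
As a rough sketch of the client-side rule described above (not taken
from any real client): the function name, the action strings and the
"content_details_matter" flag are made up for illustration, and the
choice to refetch when no ETag is returned at all is an assumption.

```python
def after_put(put_etag, content_details_matter):
    """Decide what an authoring client does after a successful PUT.

    Returns (action, etag_for_if_match). Hypothetical helper: the action
    names are illustrative only.
    """
    if put_etag is None:
        # Assumption: with no validator at all, refetch to get one.
        return "refetch", None
    if put_etag.startswith("W/"):
        # A weak etag cannot be used in If-Match under RFC 2616, and it
        # hints that a GET would likely return something different.
        action = "refetch" if content_details_matter else "keep-local-copy"
        return action, None
    # Strong etag: keep authoring and use it in the next conditional PUT.
    return "keep-local-copy", put_etag
```

With a weak etag and content details that do not matter, the client
keeps its copy but has no etag it may send in If-Match, which is the
limitation complained about further down.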

> First of all, this argument seems to be based on the assumption that 
> content-rewriting servers do not return ETags, which we know is not 
> true, and is even required in XCAP. Are these servers non-compliant to 
> RFC2616? I don't think so.

They are compliant in my view. And absolutely certainly so if returning
a weak etag.

> Furthermore, I don't get the part about the intermediary. Even if the 
> server didn't rewrite the content, and returned a strong ETag, an 
> intermediary may have rewritten the content, right?

Right. Content-modifying intermediaries are on their own. The HTTP specs
are not too detailed on what those may or may not do, but anyone who has
grasped ETag would consider it a functional requirement that content
rewriting intermediaries also remap ETags suitably, or at least degrade
them to weak. If not, serious confusion may arise.
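
A minimal sketch of the "degrade to weak" option, assuming the
intermediary sees the ETag header value as a plain string. The function
name is hypothetical.

```python
def degrade_etag(etag):
    """Return the weak form of an entity tag; weak tags pass through."""
    if etag is None or etag.startswith("W/"):
        return etag
    # Prefixing W/ marks the tag as weak without changing the opaque part.
    return "W/" + etag
```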

> But returning the ETag is very useful, and completely harmless unless 
> the client tries a byte range request. The final status of the result 
> will be the same, no matter whether the client does a 
> PUT/GET(refresh)/PUT or a PUT/PUT request, so it's pointless to refresh 
> the local copy unless somebody is interested in the newly substituted 
> keywords.

True, and in this scenario weak ETags map extremely well. But I would
not argue that it's sane to return a strong ETag when doing keyword
substitution on the PUT entity.

> Not returning an ETag surely will always work, because the source of 
> confusion is eliminated. But instead, you now have to resolve the lost 
> update problem, for which the ETag previously would have been the solution.

Yes, and a weak etag would be sufficient for solving the lost update
problem, if it wasn't for the stupid fact that a weak ETag cannot be
used in a PUT/PUT If-Match sequence when following the spec.
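
To illustrate why, here is a sketch of the two comparison functions from
RFC 2616 section 13.3.3, and of If-Match being evaluated with the strong
one as section 14.24 requires. The function names are made up, and
splitting the header on commas is a simplification that ignores commas
inside the quoted tag.

```python
def parse_etag(value):
    """Split an entity tag into (is_weak, opaque_tag)."""
    value = value.strip()
    if value.startswith("W/"):
        return True, value[2:]
    return False, value


def strong_compare(a, b):
    """Strong comparison: identical tags and neither one weak."""
    weak_a, tag_a = parse_etag(a)
    weak_b, tag_b = parse_etag(b)
    return not weak_a and not weak_b and tag_a == tag_b


def weak_compare(a, b):
    """Weak comparison: opaque tags equal, weakness ignored."""
    return parse_etag(a)[1] == parse_etag(b)[1]


def if_match_passes(if_match_header, current_etag):
    """Evaluate If-Match against the current entity tag (strong compare)."""
    if if_match_header.strip() == "*":
        return current_etag is not None
    if current_etag is None:
        return False
    # Simplified list parsing; see the note above.
    candidates = [t.strip() for t in if_match_header.split(",")]
    return any(strong_compare(t, current_etag) for t in candidates)
```

Under these rules a client that echoes back a weak etag in If-Match
always fails the precondition, even when the tag is unchanged.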

> Well, all of this isn't necessary if the server does return the strong 
> ETag, and the client just keeps authoring that resource.

Well... let's return to the Subversion use case again.

The keyword substitution may or may not be important depending on how
the data continues to be used on the client. If we only consider
authoring then it's not very important as the substitution will be
redone again on the next PUT so a PUT/PUT sequence should work out fine
(unless subversion substitutes $Log$ like CVS does...). But it's not the
case for any other use of the content where the substituted keywords may
actually be important (embedded in version identifiers etc).

> > Yes.  A strong ETag in a PUT result means that the entity has not been
> > transformed in an octet-significant way.  No strong ETag in a PUT result
> > means that it may have been transformed and thus a refresh is necessary
> > to obtain both the current form and the current ETag.
> 
> I don't think this is what RFC2616 says, and what (some) servers do 
> today. And even if we could simply ignore that, this approach would make 
> authoring of resources on content-rewriting servers hard, because you 
> wouldn't be able to use ETags anymore (that is, you would always have to 
> refresh the local copy, although it may be completely unneeded).

I don't think RFC2616 says either yes or no here. Actually, I don't
think automatic rewriting of the posted entity was considered at all in
the specification of PUT, other than "we don't specify how any of that
is done".

The language used in 9.6 does not leave much room for automatic entity
modifications as part of the PUT processing.

> Many protocols have defined new response headers that aren't metadata, 
> and I'm not sure what problem that causes in practice. This being said, 
> the value of Entity-Transform could actually be seen as metadata, and 
> also uses a format which makes it impossible that stale copies of the 
> header cause harm in the client.

Same here. To me this new header allows for reasonable signaling of
server-side PUT entity processing where RFC2616 hasn't been very clear.

But I am also of the view that just making a better definition of weak
etags would go pretty much as far, by defining that a strong etag means
the stored entity is octet equal to the PUT entity, while with a weak
etag it may differ in non-important aspects, as is always the case with
weak etags.

So I can only second Roy here.

> As far as I can tell, this has the price of making it incompatible with 
> RFC2616, and also defies efficient ETag-based authoring on servers that 
> do rewrite the content.

I don't agree.

Servers which do rewrite the content are free to exploit weak etags in
their PUT responses, signaling that "yes, it means the same but may
differ". That is frankly about the only way I can make content
rewriting servers fit the 2616 definition of PUT, which makes no
provision for content rewriting.

And I can't see how using an ETag returned from PUT in subsequent
conditional requests would be incompatible. The returned ETag is after
all supposed to represent the entity created from the PUT request.


> The Xythos client always assumes that content isn't rewritten. And if
> no ETag is returned, it uses the Last-Modified date as cache key. So
> it's already broken with respect to servers that have to rewrite.

And no one would argue that using Last-Modified as a guarantee that the
content has not been rewritten is a good thing to do. A recent
Last-Modified is clearly defined as a weak validator which does not
guarantee octet equality.

But it would be unwise for a content rewriting server to preserve
Last-Modified if supplied by the client. A Last-Modified some time ago
may be considered a strong validator almost like a strong ETag.


> >     require that ETag in a response to PUT means that the client
> >     can use that entity tag in future conditional requests (that
> >     includes IMS, If-Match, and Range-based conditional requests).
> 
> And the ambiguity here was that it's not clear what "can use" actually 
> means.

A strong validator by definition means octet equality. I guess the
question is "of what"?

> In my server I can return a strong ETag although the content was 
> rewritten, and that ETag can be used both for PUT/If-Match (PUT will 
> succeed), GET/If-None-Match and in GET/If-Match+Range (server will 
> return the full content because it knows the ETag was sent back upon PUT).

I don't read 2616 in this way, but I understand that others do.

The PUT request entity is defined as the entity to be stored under the
URL. There is no mention of any possible rewriting of the content here,
but how the server stores the entity is intentionally left unspecified.

A strong ETag is defined to guarantee octet equality, plus imho
reasonable equality in entity headers. Note that the ETag refers to the
entity, not the entity-body.

> So if it does support that functionality, can it return the strong ETag 
> even though content was rewritten?

In my view of 2616 it should not, but it's not strongly spelled out so I
may be wrong.

But with the crippled definition of weak etags there is little other
choice available. Your proposed draft is a way forward which allows use
cases to continue to use If-Match based on an ETag from a rewritten PUT
in a well defined manner.

Perhaps an alternative would be to more clearly define the purpose of
weak etags, and also extend their scope to allow them to be used in most
operations where octet equality is not really a strict requirement (i.e.
most operations except for If-Range).  To make this fit nicely in HTTP
may require a split in three categories of validators:

 * very weak. Gives no strong guarantees at all. In this category one
finds "Last-Modified: justnow" (based on the Date of the same response,
not some clock). But this may be seen as a weak validator if one wants
to, or if it is known for certain by means outside of HTTP that two
edits will not occur within the same second (it may even be upgraded to
a strong validator in that case).

 * weak. Semantic equivalence. In this category one finds weak etags or
Last-Modified being on the correct side of time but not equal to what
was matched against.

 * strong. Full equivalence on octet level and, in the case of ETag,
semantic equivalence of important entity headers as well. Entity headers
are part of an entity and should not be ignored, but octet equivalence
is not defined or realistic for them. In this category one finds strong
etags and Last-Modified some time ago.
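
The three categories above could be sketched roughly as follows. This is
only an illustration: the function name and return strings are made up,
the 60 second threshold for "some time ago" is borrowed from RFC 2616
section 13.3.3, and treating a Last-Modified that is less than 60
seconds older than Date as plain "weak" is my assumption.

```python
from datetime import timedelta
from email.utils import parsedate_to_datetime

# Assumption: "some time ago" means the 60 seconds of RFC 2616 13.3.3.
STRONG_AGE = timedelta(seconds=60)


def classify_validator(etag=None, last_modified=None, date=None):
    """Classify the validator of a response as very weak, weak or strong.

    etag, last_modified and date are raw header values (or None).
    """
    if etag is not None:
        return "weak" if etag.startswith("W/") else "strong"
    if last_modified is None or date is None:
        return "none"
    age = parsedate_to_datetime(date) - parsedate_to_datetime(last_modified)
    if age >= STRONG_AGE:
        return "strong"      # Last-Modified some time ago
    if age <= timedelta(0):
        return "very weak"   # "Last-Modified: justnow", same as Date
    return "weak"            # assumption: recent but not "justnow"
```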

Regards
Henrik
Received on Saturday, 2 December 2006 23:32:31 UTC