Re: NEW ISSUE: weak validator: definition inconsistent from Robert Siemer on 2008-01-12 (ietf-http-wg@w3.org from January to March 2008)

From: Robert Siemer <Robert.Siemer-httpwg@backsla.sh>
Date: Sat, 12 Jan 2008 04:37:24 +0100
To: Lisa Dusseault <lisa@osafoundation.org>
Cc: Larry Masinter <LMM@acm.org>, 'Werner Baumann' <werner.baumann@onlinehome.de>, ietf-http-wg@w3.org
Message-ID: <20080112033724.GA1632@polar.elf12.net>
First, I want to say that I see the spirit of HTTP this way:
strong match --> octet identity
weak match --> semantic identity

So I agree with Werner's original interpretation of the issue.

>
> If we change the definition "Good enough" should be qualified whether  
> the spec means good enough for read-only caching, or good enough for  
> write-without-lost-update.
>

Both, of course (write-without-lost-update being something like
PUT /x  If-Match: W/"tag").

> 
> Are there servers that use weak validators besides Apache?
>

The assumption that
-current servers generate "wrong" weak etags (on their own) and
-these weak etags never match on these servers and
-the affected server does not handle resource-modifying requests all 
 alone based on RFC 2616 only (no scripts, no WebDAV)

makes the answer to that question irrelevant, as _reading_ twice doesn't 
hurt, does it? The assumption excludes any _writing_.

I assume you aren't asking about CGI scripts that try to take advantage 
of weak etags, are you?


>
> Should Apache's trick of switching from a weak to a strong validator  
> (with the ability to compare the strong to the weak) be documented?
>

Can you enlighten me on that?



My big picture so far is:
-"tag" matches W/"tag" under weak comparison (that solves issue #71, too)
-servers can't "automatically" generate weak etags for e.g. files, as 
 they have no idea about semantic equivalence
-servers could be configured to generate weak etags under some 
 circumstances, but that means that the administrator puts some external
 knowledge into the system (e.g. the admin knows that 
 same-second modifications are semantically equivalent)
-otherwise, same-second modifications produce no weak or strong match 
 at all
-if a client provides a weak etag, or has downgraded a strong etag to a 
 weak etag in a request, weak matches count as an effective match
 (otherwise the client shouldn't provide a weak etag)
-if a client provides a strong etag, or has upgraded a weak etag to a 
 strong etag, only strong matches count as an effective match (otherwise 
 the client should provide a weak etag if weak matches are acceptable)
-range and range-like requests are no exception: if the range is octet 
 based, the client should not put weak etags in requests if it has no
 means of combining weak match results (it may have ways to do so, with 
 overlapping parts or such things...)
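
The matching rules in this list can be sketched roughly as follows. This 
is only an illustration in Python, not anything from the spec: the names 
ETag, parse_etag, strong_match, weak_match and effective_match are made 
up here, and the parser handles only the simple "tag" and W/"tag" forms.

```python
from typing import NamedTuple


class ETag(NamedTuple):
    opaque: str  # the text between the double quotes
    weak: bool   # True if the tag carried the W/ prefix


def parse_etag(value: str) -> ETag:
    """Parse '"tag"' or 'W/"tag"' (hypothetical helper, simple forms only)."""
    value = value.strip()
    weak = value.startswith("W/")
    if weak:
        value = value[2:]
    if len(value) < 2 or value[0] != '"' or value[-1] != '"':
        raise ValueError(f"not an entity tag: {value!r}")
    return ETag(opaque=value[1:-1], weak=weak)


def strong_match(a: ETag, b: ETag) -> bool:
    # Strong comparison: neither tag is weak and the opaque parts are equal.
    return not a.weak and not b.weak and a.opaque == b.opaque


def weak_match(a: ETag, b: ETag) -> bool:
    # Weak comparison: only the opaque parts count, so "tag" matches W/"tag".
    return a.opaque == b.opaque


def effective_match(client_tag: str, server_tag: str) -> bool:
    # Assumption from the list above: the strength of the tag the client
    # sends decides which comparison counts as an effective match.
    client, server = parse_etag(client_tag), parse_etag(server_tag)
    if client.weak:
        return weak_match(client, server)
    return strong_match(client, server)
```

With this, a client that sends W/"tag" gets a match against a server-side 
"tag", while a client that sends "tag" does not match a server-side 
W/"tag".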

That raises a NEW practical ISSUE:
How can a response carry several etags at the same time to fully take 
advantage of semantic equivalence?

E.g. five documents with their strong etags and several weak ones:
"1": W/"1"
"2": W/"1", W/"2"
"3": W/"1", W/"2", W/"3"
"4": W/"1", W/"2", W/"3", W/"4", W/"5"
"5": W/"2", W/"3", W/"4", W/"5"

That means entity "5" is already too different from "1", but not from 
the others. Entity "4" is "forward semantic equivalent" with "5".
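
The table above can be written down as a lookup, as a small Python 
sketch. It assumes one particular reading of the table: a conditional 
request carrying W/"k" matches when the entity currently served lists 
W/"k". The names WEAK_TAGS and weakly_matches are invented for this 
illustration.

```python
# Strong etag of the current entity -> weak etags it is served with.
# Only the opaque parts are stored; the W/ prefix is implied.
WEAK_TAGS = {
    "1": {"1"},
    "2": {"1", "2"},
    "3": {"1", "2", "3"},
    "4": {"1", "2", "3", "4", "5"},
    "5": {"2", "3", "4", "5"},
}


def weakly_matches(current: str, client_weak: str) -> bool:
    """True if the client's weak tag is listed for the current entity."""
    return client_weak in WEAK_TAGS.get(current, set())
```

Under that reading, weakly_matches("5", "1") is false (entity "5" is too 
different from "1"), while weakly_matches("4", "5") is true (the 
"forward" equivalence).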

Within the current spec, I see only two solutions:

A) we care about strong match requests (for sensible range requests)
-servers provide the strong tag only
-on requests, clients downgrade strong tags to weak ones to take 
 advantage of weak matches (if they can, i.e. no ranges)
-caches may construct the equivalence list seen above by forwarding 
 conditional requests for unknown weak tags and associating them with 
 the strong etag they receive in the response
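
A rough Python sketch of option A, to make the moving parts concrete. 
It assumes that the origin answers a forwarded conditional request with 
304 and the strong ETag of the current entity when the weak tag is still 
acceptable; downgrade, EquivalenceCache, record and can_answer are 
hypothetical names, not part of any implementation.

```python
def downgrade(etag: str) -> str:
    """Client side: turn a strong etag into its weak form."""
    return etag if etag.startswith("W/") else "W/" + etag


class EquivalenceCache:
    """Learns which weak etags the origin accepted for a strong etag."""

    def __init__(self) -> None:
        self._weak_for: dict[str, set[str]] = {}

    def record(self, request_weak: str, status: int, response_strong: str) -> None:
        # Assumption: a 304 carrying a strong ETag means the origin found
        # the requested weak tag good enough for the current entity.
        if status == 304 and not response_strong.startswith("W/"):
            self._weak_for.setdefault(response_strong, set()).add(request_weak)

    def can_answer(self, stored_strong: str, request_weak: str) -> bool:
        # The downgraded form of the stored strong etag always matches;
        # other weak tags match only once the origin has confirmed them.
        if request_weak == downgrade(stored_strong):
            return True
        return request_weak in self._weak_for.get(stored_strong, set())
```

A cache holding entity "4" would forward the first request carrying 
W/"2", call record('W/"2"', 304, '"4"') on the reply, and could answer 
later requests with W/"2" itself.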

B) try to improve match hits in caches
-server sends the most promising weak etag from the weak etag list
-client uses that weak etag (or not)
-caches do their best
-no sensible range requests possible

A new header or something like that for providing a strong etag (or 
several etags?) while sending a standard ETag: W/"weak one" would leave 
room for good caching even with "old" implementations that don't 
match weak etags with strong etags at all.


Robert

> 
> Lisa
> 
> On Jan 2, 2008, at 1:11 PM, Larry Masinter wrote:
> 
> >to define it more carefully, perhaps to remove the notion of "semantic
> >equivalence" and replace it with "good enough, from the server's  
> >point of
> >view". That is, a server is free to report a "match" on a weak  
> >validator if
> >the server thinks an entity previously served with that validator  
> >is "good
> >enough", from the server's point of view. Whether that's semantically
> >equivalent doesn't need to come into the picture, except as an  
> >example of
> >one reason why, even if something has changed, you might be content  
> >to let
> >the client use old content.
> >
> >
> >
> >>-----Original Message-----
> >>From: ietf-http-wg-request@w3.org [mailto:ietf-http-wg- 
> >>request@w3.org]
> >>On Behalf Of Werner Baumann
> >>Sent: Saturday, December 29, 2007 8:57 AM
> >>To: ietf-http-wg@w3.org
> >>Subject: NEW ISSUE: weak validator: definition inconsistent
> >>
> >>
> >> From 13.3.3 Weak and Strong Validators:
> >>
> >>    Entity tags are normally "strong validators," but the protocol
> >>    provides a mechanism to tag an entity tag as "weak." One can  
> >>think
> >>    of a strong validator as one that changes whenever the bits of an
> >>    entity changes, while a weak value changes whenever the  
> >>meaning of
> >>    an entity changes. Alternatively, one can think of a strong
> >>validator
> >>    as part of an identifier for a specific entity, while a weak
> >>    validator is part of an identifier for a set of semantically
> >>    equivalent entities.
> >>
> >>      Note: One example of a strong validator is an integer that is
> >>        incremented in stable storage every time an entity is  
> >>changed.
> >>
> >>        An entity's modification time, if represented with one-second
> >>        resolution, could be a weak validator, since it is possible
> >>that
> >>        the resource might be modified twice during a single second.
> >>
> >>While in paragraph 1 "weak validator" is defined in terms of semantic
> >>equivalence, paragraph 3 qualifies modification time as "weak
> >>validator". But the second modification of a file within the same
> >>second
> >>may change the file into anything. There is no means to guarantee
> >>semantic equivalence in this case. These two paragraphs are mutually
> >>exclusive.
> >>
> >>The reason for this is the abstraction "weak validator" itself.
> >>While "validator" is a good abstraction from the details of
> >>Last-Modified and Etag, and also "strong validator" is quite clear,
> >>this
> >>can't work for "weak".
> >>
> >>"weak validator" tries to build a common abstraction from two
> >>different,
> >>completely unrelated kinds of "weakness".
> >>
> >>Weak etags: the weakness is not to guarantee byte-equivalence, but  
> >>they
> >>guarantee semantic equivalence. Of course, the server needs some
> >>concept
> >>of semantic equivalence built in, to use weak etags. (Oh, and it  
> >>would
> >>be fine, if the client would have the same idea about semantics.)
> >>
> >>Last-Modified date: the weakness is the limited time resolution.  
> >>It is
> >>*unreliable* (or not a validator at all), unless it meets some extra
> >>conditions. There is no concept of semantic equivalence whatsoever.
> >>
> >>One consequence is the strange restrictions on "weak validators".
> >>Clients must only use them in conditional (full body) GET requests.
> >>This
> >>is reasonable for Last-Modified (if it does not meet the additional
> >>restrictions), but not at all justified for weak etags.
> >>
> >>The only reasonable restriction on weak etags is not to use them in
> >>range requests. But a PUT with If-Match: W/"xxx" is perfectly ok.
> >>
> >>I suggest to remove the term "weak validator" from the spec.  
> >>Validator
> >>is either a Last-Modified Date or an Etag. Etags can be strong or  
> >>weak.
> >>It should be made clear that weak etags are only meant to validate
> >>semantic equivalence and it should be clear, that everything said  
> >>about
> >>semantic equivalence is related to weak etags.
> >>
> >>Practical issue:
> >>Apache misuses weak etags when it can not create a strong one, due to
> >>the limited time resolution (and mtime is the main component of
> >>Apache's
> >>etags). These etags will *never* match. (IIS seems to do something
> >>similar.) Although I'm sure, this is not what weak etags are intended
> >>for, one could use the inconsistent definition in the spec to justify
> >>this (one has to be either a lawyer or a programmer to do so).
> >>
> >>I don't know, if there is any application, that uses weak etags as  
> >>they
> >>are intended (for validating semantic equivalence). But if there  
> >>is, or
> >>will be, the above misuse will most likely create interoperability
> >>problems. WebDAV clients (e.g. davfs2) already have problems working
> >>around these wrong "weak etags".
> >>
> >>Werner
Received on Saturday, 12 January 2008 03:36:34 UTC