Re: ETags and concurrency control

Henrik Nordstrom wrote:
> Wait a minute here. The "weak" ETags generated by Apache aren't that
> weak. For Apache in its default configuration to generate the same weak ETag
> for two different versions, all of the following need to be true:
> 
> - The update was done in-place by overwriting parts of the file,
> preserving the same inode number.

Or the inode is recycled when the old file is deleted and a new one
created.

> - The update MUST be within the same sub-second as the previous update

Or the clock moves backwards due to a correction, or someone writes a
similar file using a timestamp-preserving copy (like cp -p, rsync -t).

NB: Both of these break strong ETags for updates _not_ in the last
second too - Apache's algorithm is not watertight.
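To make the timestamp-preserving case concrete, here is a small Python
sketch. The helper apache_style_etag is hypothetical: it combines inode,
size and mtime roughly the way Apache's default "FileETag INode MTime
Size" does, but the exact formatting is an assumption. The script
overwrites a file in place with different bytes of the same length, then
gives it back its old mtime, as a timestamp-preserving copy would if the
source carried that time. The tag does not change.

```python
import os
import tempfile

def apache_style_etag(path: str) -> str:
    """Hypothetical helper: an ETag built from inode, size and mtime in
    microseconds, loosely imitating FileETag INode MTime Size.
    The exact layout is an assumption, not Apache's code."""
    st = os.stat(path)
    return '"%x-%x-%x"' % (st.st_ino, st.st_size, st.st_mtime_ns // 1000)

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "page.html")
    with open(path, "wb") as f:
        f.write(b"version 1")
    before = os.stat(path)
    etag_v1 = apache_style_etag(path)

    # Overwrite in place: same inode, same length, different bytes.
    with open(path, "r+b") as f:
        f.write(b"version 2")

    # Give the new content the old mtime, as a timestamp-preserving copy
    # (cp -p, rsync -t) would if the source file carried that time.
    os.utime(path, ns=(before.st_atime_ns, before.st_mtime_ns))
    etag_v2 = apache_style_etag(path)
    with open(path, "rb") as f:
        content_v2 = f.read()

print(etag_v1 == etag_v2)  # True
print(content_v2)          # b'version 2'
```

Both versions get an identical tag, and nothing in the tag records that
the bytes changed.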

> - The update MUST NOT change the file size.

Quite common when a file is edited and saved under the same name, e.g.
fixing a one-character typo.

> - The inode change timestamp must also not change by the update.

If the modification time is in the same second, you can be quite
confident the change timestamp will be in the same second too.

> This can practically only happen if there are other processes updating
> the file content directly outside Apache.

That's probably the most common way files served by Apache are updated.

> The reason why Apache sends a weak ETag for content modified in the last
> second is that the default configuration assumes there will be other
> processes running on the server "randomly" overwriting parts of published
> files many times within the same second.

Or non-randomly.  Every way I've seen files updated and published
through Apache other than WebDAV (FTP, rsync, scp) can trigger these
problems occasionally.
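The "weak if modified within the last second" rule can be sketched like
this. make_etag is a hypothetical helper modelled loosely on the
behaviour described in this thread; the field order, hex formatting and
microsecond mtime are assumptions, not a copy of Apache's source.

```python
def make_etag(inode: int, size: int, mtime_us: int, request_time_us: int) -> str:
    """Sketch: build an inode-size-mtime tag and mark it weak when the
    file was modified less than one second before the request."""
    tag = '"%x-%x-%x"' % (inode, size, mtime_us)
    # The file may still be changing within the same timestamp tick,
    # so the tag cannot promise byte-for-byte identity yet.
    if request_time_us - mtime_us < 1_000_000:
        return "W/" + tag
    return tag

print(make_etag(0x1234, 512, 3_000_000, 3_400_000))  # W/"1234-200-2dc6c0"
print(make_etag(0x1234, 512, 3_000_000, 5_000_000))  # "1234-200-2dc6c0"
```

The opaque part of the tag is the same in both calls; only the W/ prefix
depends on when the request arrives. So two versions written in the same
tick share the value, and the later, strong tag does not tell them apart
either.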

Clearly, weak ETags do screw up range requests: validating a partial
response (If-Range) requires a strong validator.
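Range requests need the strong comparison function, because the client
splices the new bytes onto bytes it already holds. A minimal sketch,
assuming a server that only checks entity tags in If-Range (HTTP-dates
are ignored here); the function names are illustrative, not from any
library:

```python
def is_weak(tag: str) -> bool:
    return tag.startswith("W/")

def strong_match(a: str, b: str) -> bool:
    """Strong comparison as in RFC 2616 sec. 13.3.3: both tags must be
    strong and character-for-character identical."""
    return not is_weak(a) and not is_weak(b) and a == b

def range_status(if_range: str, current_etag: str) -> int:
    """Hypothetical helper: 206 sends only the requested byte range,
    200 falls back to sending the whole entity."""
    return 206 if strong_match(if_range, current_etag) else 200

print(range_status('"1234-200-2dc6c0"', '"1234-200-2dc6c0"'))      # 206
print(range_status('W/"1234-200-2dc6c0"', 'W/"1234-200-2dc6c0"'))  # 200
```

With a weak tag the server has to send the full entity, so resuming a
download of a file modified in the last second starts over from scratch.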

But also: if you do an update and then run a client to GET the file,
perhaps to verify it's serving the right content, you expect to get
the file you have just updated.  If weak ETags are used in caching,
this is no longer guaranteed.
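Here is a sketch of how that goes wrong. It assumes a cache that
revalidates with If-None-Match and an origin that answers 304 on a weak
match. The revalidate helper is hypothetical and simply returns the body
the client ends up seeing:

```python
def opaque(tag: str) -> str:
    # Strip the weak indicator, leaving the quoted opaque part.
    return tag[2:] if tag.startswith("W/") else tag

def weak_match(a: str, b: str) -> bool:
    """Weak comparison: equal once any W/ prefix is ignored."""
    return opaque(a) == opaque(b)

def revalidate(cached_etag: str, cached_body: bytes,
               server_etag: str, server_body: bytes) -> bytes:
    """Hypothetical cache: 304 on a weak match keeps the cached copy,
    otherwise a 200 delivers the server's current body."""
    if weak_match(cached_etag, server_etag):
        return cached_body   # 304 Not Modified
    return server_body       # 200 OK

# Version 2 was written in the same second as version 1, same size/inode.
now = revalidate('W/"1234-200-2dc6c0"', b"version 1",
                 'W/"1234-200-2dc6c0"', b"version 2")
# A second later the server's tag turns strong, but still matches weakly.
later = revalidate('W/"1234-200-2dc6c0"', b"version 1",
                   '"1234-200-2dc6c0"', b"version 2")
print(now, later)  # b'version 1' b'version 1'
```

The stale copy survives both revalidations, which is exactly the
"verify what I just published" case failing.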

> For normal HTTP use where updates are done using PUT, or for nearly all
> normal edits, the above isn't true and Apache may just as well send a
> strong ETag without any loss of guarantee.

That's right.  I believe it can be configured to do so, if you can
confirm all these guarantees.

Dynamically generated content from databases, blogs, wikis etc. can
also use strong ETags in the same way for precise cache validation, if
you can guarantee that the tag changes whenever the content does.  So
can backends serving ordinary files modified by other processes, if
you can use something like Linux's file leases (F_SETLEASE) or inotify
to be informed of file changes synchronously.  But these aren't the
simplest of configurations.
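For generated content, one simple way to make a tag precise is to derive
it from the bytes actually sent. A sketch, assuming the whole response
body is available before the headers go out (so it does not suit
streamed responses). Hashing is just one option; a row version number
works too, provided every change bumps it. strong_etag is a hypothetical
name:

```python
import hashlib

def strong_etag(body: bytes) -> str:
    """Strong ETag from the exact response bytes: any change to the body
    changes the tag (barring a SHA-256 collision)."""
    return '"%s"' % hashlib.sha256(body).hexdigest()

print(strong_etag(b"<p>rev 1</p>") == strong_etag(b"<p>rev 2</p>"))  # False
```

Unlike the inode/size/mtime scheme, this needs no assumptions about how
or when other processes touch the data, at the cost of hashing every
response.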

-- Jamie

Received on Friday, 2 May 2008 18:56:15 UTC