- From: Dan Brotsky <dbrotsky@adobe.com>
- Date: Wed, 16 Oct 2002 08:47:49 -0700
- To: w3c-dist-auth@w3.org
- Cc: Dan Brotsky <dbrotsky@adobe.com>
Sorry not to have weighed in on any of the interop issues since then.
Things have been a bit busy :^).
On Tuesday, October 8, 2002, at 12:25 AM, Julian Reschke wrote:
>> And if it has Etag support and it's the last PUT, you might not
>> care if you've lost the lock as long as the content has not
>> been updated. There's no point in checking for the lock in that
>> situation, and spec'ing that the check be done anyway just causes
>> needless delay.
>> ...
>
> Wouldn't that mean optimizing for a *really* uncommon case? How
> frequently does that happen? Does it really require special handling?
In the real world this happens constantly. Servers can make locks go
away whenever they want, and clients have no way of learning why or
what the implications are for edit state. Workflow administrators
remove them, thinking they're inactive. They expire because the client
is idle for reasons beyond its control (sleeping a machine). Server
administrators do lock cleanups because their database seems corrupt.
The fact of the matter is that, the way the spec is written, real-world
clients constantly have to rediscover what's happening on the server so
they can "convince" the server they know exactly what's going on, and
often they have to infer from a particular series of rejected requests
how that *particular* server models things.
(temperature rises)
Sorry if this sounds like a flame, but it's really just a heartfelt
complaint about the state of the spec from a client's point of view.
Until you've tried to write an expensive, production-quality client,
used by naive users, that interoperates in a wide range of authoring
situations with 20 different servers, each of which takes a defensible
but slightly different interpretation of the spec, and each of whose
"easy to handle edge cases" means something completely different coming
from another server, it's hard to appreciate how narrow the
guaranteed-successful path through the spec's garden of "defensible
interpretations" is (hint: how small is epsilon :^), or how hard it is
to discover a path that will work for a particular server.
I believe that all the client-side spec requests that came out of the
interop (detailed below) come from high-quality implementation teams
that have watched their code devolve into a series of spaghetti strands
each of which knows how to talk to one particular server. As only a
slight exaggeration, we might as well be supporting different protocols
against each one. I've even had one of my lead implementors seriously
suggest that we implement our client API as an abstract object which
first does an OPTIONS call to see (from the "Server:" header) which
server we're talking to and then loads the appropriate concrete
provider!
(temperature falls)
So let me go through the various proposals floating around with an eye
not towards clarifying their details or fixing them but just motivating
them from a client point of view. First, some basic requirements that
underlie this discussion, which I believe were in the original
requirements doc that Judy wrote a long time ago:
R1. distributed authoring clients are very concerned about the lost
update problem. (clients A and B read the same version of a doc, make
separate changes, and then each saves back not knowing the other has
done so. Whoever saved first loses.)
R2. distributed authoring clients are very concerned about their users
not wasting time working on something they can't save. (a variant of
the lost update problem, in which clients can discover that the doc has
been updated but not that it's going to be updated.)
R3. distributed authoring clients want to be able to offer their users
a consistent least-common-denominator model of the server's hierarchy,
even though the actual situation may be far more complex than that
model suggests.
R4. distributed authoring clients want to make the same sequence of
calls against all compliant servers, and know that they can rely on
differences in result to be due to server policy or differing
conditions, NOT on different server interpretations of the meaning of
the request.
In the light of these, here's the motivation for a number of the client
requests (no pun intended) that Lisa mentioned in her summary message
from the last interop:
1. require etag support (when feasible). Without strong (or at least
weak) validators, there's really no way to address the lost-update
problem. Mod dates are too randomly defined, which is why etags made
it into HTTP 1.1 in the first place, but even when a server's mod dates
really are weak validators, there's no reason for clients to have to
figure out whether to use mod dates or etags on a server-by-server
basis (see R4).
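For example (a sketch; the path and etag value here are invented), the
bare-HTTP version of the protection looks like this:

    GET /docs/spec.html HTTP/1.1
    Host: example.com

    HTTP/1.1 200 OK
    ETag: "v1"

    ...user edits locally; meanwhile someone else updates the resource...

    PUT /docs/spec.html HTTP/1.1
    Host: example.com
    If-Match: "v1"

    HTTP/1.1 412 Precondition Failed

The 412 tells me the document changed under me, before a single byte
gets overwritten; mod dates give me no comparably reliable test.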
2. Provide a lock-state-checking-less way of doing an operation on a
locked resource. In my experience with multiple server
implementations, LOCK is simply unreliable as anything other than a way
to satisfy R2: warn other users that I'm going to update the resource.
This is because, the way the spec is written, locks are simply a gating
factor on the success of certain client requests, not a reliable,
precise declaration of write capabilities on a resource. Locks
certainly don't guarantee that a resource hasn't been updated, because
there's absolutely nothing in the spec that keeps the server from doing
that behind my back (even leaving the lock in place!), and I can show
you servers that do this defensibly and routinely. Nor is loss of lock
a way to find out that my ability to update has changed, because locks
go away for any number of random reasons having nothing to do with me
(and I can show you servers that defensibly and routinely do that, as
well).
From a client's point of view, I do LOCK GET PUT UNLOCK so that YOU can
know not to start editing the same resource until the UNLOCK is done.
But I really don't care whether you do or not, and I really don't care
whether my lock goes away or not, because I'm going to use etags to
make sure to avoid lost updates (and so should you).
(By the way, from this point of view shared write locks are
interoperable, even when used by the same principal; consider a user of
two clients having them warn each other that they're both in use.)
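To make the sequence concrete (a sketch; the path, token, and etag
value are invented), note where the real protection sits:

    LOCK /docs/report.html HTTP/1.1     (server grants a lock token;
                                         other clients are warned off: R2)
    GET /docs/report.html HTTP/1.1      (response carries ETag: "v1")

    ...the lock silently evaporates: timeout, admin cleanup, whatever...

    PUT /docs/report.html HTTP/1.1
    If-Match: "v1"

    HTTP/1.1 204 No Content             (succeeds anyway: the content
                                         hasn't changed, which is all I
                                         actually needed to know)

The PUT is conditional on the etag, not on the lock's survival, so the
random death of the lock costs me nothing.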
3. provide a way of forcing authentication. If I'm a distributed
authoring client using a server, then I expect (in the course of an
update) to do a LOCK (if the server supports it), then a GET and/or
some PROPFINDs, then some PUTs and/or PROPPATCHs (with the PUT
protected by an If: etag condition, and yes, I wish there were a
similar thing for properties), and then finally (if supported)
an UNLOCK (or maybe a DELETE, which will do the UNLOCK very effectively
:^). I want to know when I start that I have the authority to do all
of these calls, and I want to know right at the beginning what identity
I will use to do them, so I can properly put some info about it in the
LOCK owner field. So I want some way of forcing the server to
challenge me to authenticate as someone who can do this sequence, and I
want it even before the LOCK starts (especially because I can't always
do a LOCK).
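Today the closest thing I know of (a workaround sketch, not anything
the spec blesses) is to lead with a cheap request against the resource
and let ordinary HTTP authentication do the challenging:

    PROPFIND /docs/report.html HTTP/1.1   (sent without credentials)
    Depth: 0

    HTTP/1.1 401 Unauthorized
    WWW-Authenticate: Digest realm="authoring"

Now I know what identity I'll be working as before I fill in the LOCK
owner field. But nothing guarantees that a server willing to answer an
anonymous PROPFIND will also accept my PUT, which is exactly why I want
a real mechanism in the spec.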
4. require servers to be consistent about specifying slashes in
responses to propfind requests that enumerate a tree. (and, by the
way, require clients always to use slashes when referring to what they
believe is a collection...) This is motivated by a combination of R3
and R4. I need to be able to show my users a consistent view of the
server's hierarchy, separating collections (which appear to contain
other resources) from non-collections (which don't). When the server
responds to a PROPFIND enumerating a tree, I shouldn't have to guess
why this collection had a slash on the end while that other one didn't,
and I really need to know which of the returned resources can
themselves be enumerated (which, by the way, is allowed to be a VERY
different piece of information from what I get back from asking about
the resource type, although I don't think it was meant to be - another
problem :^).
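Concretely (invented paths), after a Depth: 1 PROPFIND on /docs/ I want
every href spelled consistently:

    <D:multistatus xmlns:D="DAV:">
      <D:response><D:href>/docs/</D:href> ...</D:response>
      <D:response><D:href>/docs/drafts/</D:href> ...</D:response>
      <D:response><D:href>/docs/spec.html</D:href> ...</D:response>
    </D:multistatus>

Collections end in a slash, non-collections don't, and I never have to
cross-check the href spelling against the resourcetype property just to
decide what gets a folder icon.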
5. that the *same* pair of clients and servers interoperate over all
features before we believe we have interoperability. The motivation
for this should be obvious by now: you can get pairwise
interoperability between DIFFERENT pairs without any guarantee that you
can write a generic client which is anything other than the "union"
client my lead implementor was proposing.
6. only allow one kind of TIME in lock expiry. This is R4.
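(Concretely: pick one form, say

    Timeout: Second-3600

so that clients don't also need parsers for Infinite, or for whatever
extension TimeType a given server dreams up; the value here is
invented.)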
7. require PROPFIND results to use a single base URL, and that the URL
either be the returned content-location header URL or the one the
client used. This is R3 and R4. You may expect clients to be able to
keep referring to different elements of collections with different
preferred paths, but I can't expect my user to, and I can't hide that
from him.
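The failure mode, sketched with invented paths: one multistatus, two
spellings of the same tree.

    PROPFIND /docs/ HTTP/1.1
    Depth: 1

    ...multistatus containing both...
    <D:href>/docs/</D:href>
    <D:href>http://example.com/home/dav/docs/a.html</D:href>

Pick one base, either the URL I asked with or the Content-Location you
answered with, and use it for every href in the response.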
8. servers can't withhold lock owner info. R4: it's mine, don't force
me to work around you withholding it.
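That is, if my LOCK request body carried (a sketch, value invented)

    <D:owner>mailto:dan@example.com</D:owner>

then a later lockdiscovery, whether from me or from the other client
trying to figure out whom to call, should hand back exactly that, not
an empty element.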
That's it for rationale. I'll gradually be sending around proposals on
resolving a number of these specific issues, hopefully ones that
reflect the discussion so far and are acceptable to the group :^). If
you got this far, thanks for your attention to such a long message.
dan