RE: a critique of webdav-protocol (part 2) from Jim Whitehead on 1998-11-24 (w3c-dist-auth@w3.org from October to December 1998)

From: Jim Whitehead <ejw@ics.uci.edu>
Date: Mon, 23 Nov 1998 17:26:45 -0800
To: "Mark D. Anderson" <mda@discerning.com>, WEBDAV WG <w3c-dist-auth@w3.org>
Message-ID: <001801be1749$7f76dbe0$d115c380@galileo.ics.uci.edu>
This is part 2 of my response to Mark Anderson's comments on
draft-ietf-webdav-protocol-08, available at:
http://lists.w3.org/Archives/Public/w3c-dist-auth/1998OctDec/0099.html

> >5  Collections of Web Resources
>
> [CLARITY, OBJECTION] The entire section 5 is impenetrable.
> It seems to needlessly complicate something that should not
> be that obscure. Presumably this is a legacy from when the
> proposal was different. It is so bad that I'm going to
> annotate this as an OBJECTION as well.

The latest revision of the protocol specification has significantly changed
section 5 in response to comments by you and others.  In particular, the
issue you raise here (again) that the mapping between a URI and a resource
can be more than 1:1 has been substantively addressed in
draft-ietf-webdav-protocol-10.

> Let me attempt to articulate the issues which section 5
> is apparently trying to address.
>
> In the real world, URLs on web sites are not unique
> identifiers for underlying resources. A resource typically has
> multiple URLs, for reasons which include replication,
> compatibility with external links (bookmarks, other apps,
> etc), and user navigation.  This is over and beyond any
> "morning star"/"evening star" issues, such as "today.html" and
> "30oct1998.html" identifying the same resource.
>
> Furthermore, in many web sites the accessible URLs return
> documents which are generated from multiple underlying
> resources -- which are typically *not* accessible to random
> users, and perhaps to no one (over http). The simplest case is
> just document conversion (doc/txt/pdf/html) from one
> underlying document. But in general the result of a GET can be
> from a combination of such things as an html template file,
> queries to various resource managers, and program logic (whose
> source code is probably stored in files).
>
> Now, when it comes to authoring of resources, it is generally
> not appropriate -- perhaps not even meaningful -- to think
> of this as being able to perform a PUT on anything that
> a GET can succeed on. This is tantamount to the "updating
> a view" situation in RDBMS's.

This issue is discussed in section 5.4.  I appreciate you trying to
provide alternate wording for this very complex subject.  However, it is
too late to attempt a significant rewording of this section when there is
no demonstrable interoperability problem.

> Rather, the author typically wants to update the "underlying"
> resources -- the ones that may not even be accessible to end
> users via GET. In fact, the authorable resources will often
> reside on an entirely different server machine from the one which
> is "live" (there may even be multiple such deployed servers
> reliant on the master repository). The authorable resources will
> have URLs which likely appear under a different URL root, which
> might be a gateway to an SCM system or DMS. The URL hierarchy for
> the authorable resources may also be significantly different from
> that used for the public GET-able URLs, and in fact may have an
> entirely flat "hierarchy".
>
> Particularly where these "source resources" are managed by a
> SCM/DMS they will have URL's that function as identifiers.
> These identifiers will persist through the lifetime of the
> resource, even as it is changed, and they will be unique
> (individual versions will also be addressable).

So far I don't really see a problem here.  The source link can handle this
case quite easily.

>
> The concept of a "URN" (rfc2396) is highly relevant here.
> While the identifiers for these "source resources" may
> not have the same level of persistence, uniqueness, and
> location independence of an ISBN number, it is a difference
> more of degree than of type.

We wanted to ensure that WebDAV would work without having a dependency on
any URN resolution mechanism.

> Declaration and discovery of the association between
> GET-able public resources and authorable underlying resources
> is therefore important but is mostly uncharted territory.
> We propose the use of "source link" properties to declare
> the mapping from a derived resource to one or more
> "source resources". The act of finding the resource manager
> for a URN is called "resolution" (see rfc2276 and rfc2169).
>
> We need to be able to handle the scenarios above which hold for
> larger web sites -- and for larger projects generally, as
> sometimes the distributed authoring exercise may have nothing to
> do with web site publishing.
>
> However, we also need to handle the simplest scenario
> which might occur for example in a corporate intranet,
> where users want to GET and PUT in the same URL hierarchy.

OK, in these paragraphs you start using the royal "we", as if you were
suggesting replacement text.  But you never state that this is your
objective.  It's a bit confusing.

> It is up to the server to enforce any constraints on
> what methods are allowed on what URLs. A client can
> determine this proactively using the OPTIONS method
> and Allow http header field, as defined in HTTP 1.1
> and extended here.
>
> Depending on the user and the URL, either, both, or
> neither of a GET and a PUT might be allowed.
>
> In cases where both a GET and a PUT are allowed on a URL, the
> intended semantic of the PUT is certainly that if
> no other changes occur, an ensuing GET will recover the
> same resource that was sent with a preceding PUT.
> This is not required, but that is the usual semantic.
> Similarly, a GET following a DELETE should fail.
>
> Authoring methods should generally be forbidden by the
> server on the URLs for "read-only" derived resources.
> When a PUT or DELETE is allowed, the expectation is
> that it will have the normal consequences for those
> other derived resources (at least eventually).
> On the other hand, the expectation is that a PUT or
> DELETE on a particular resource will *not* have any
> side-effected consequences on unrelated resource --
> authorable or derived, in the same URL hierarchy or not.
>
> There is nothing to prevent a server from doing something
> else. This is all just to state that "DELETE" means delete,
> and "PUT" means put. Multiple GETs of the same URL should
> generally return the same thing (if no changes are made).
> And so on.
>
> None of this is really new with webdav; it is
> the situation with HTTP 1.1 as it stands.
>
> With this proposal, some http methods, both existing (like
> DELETE) and new (like MOVE) have an additional semantic
> concerning the URL hierarchy: they take a Depth
> parameter. Depth 0 means that the method is intended to apply
> to just the resource identified by the URL; Depth 1 includes
> its immediate children; and Depth "infinity" includes all its
> descendents. The intent with this proposal is that the "child"
> hierarchy of resources is directly tied to the slash-separated
> URL hierarchy of their identifiers. Other proposals may extend
> this proposal to act on other hierarchies or graphs of
> resources which do not correspond to the URL hierarchy and
> rely on some other declaration mechanism.
>
> If a server accepts a MOVE or DELETE request with a particular
> depth on a URL, then the expectation is that following the
> successful operation, none of the resources addressable via a
> URL which is within that depth of its "slash" hierarchy will
> exist -- in the sense that later methods on those URLs will
> fail. Similarly, the expectation after a MOVE or COPY is that
> resources formerly addressable within the specified depth in
> the "from" URL will subsequently be addressable via the
> corresponding descendent URL under the "to" URL.
>
> However, a server may choose to accept such a "recursive"
> request, and still not actually make it apply to all resources
> that are within the specified URL scope. It might return an
> error for some of those resources (say, for access control
> reasons), and it might just silently not do it for some (say,
> because some "child" resources are "invisible" to the
> particular user). Again, an OPTIONS request will allow a
> client to determine what capabilities are available ahead of
> time, and on what children.
>
> We refer to resources that are addressable with slash-terminated
> URLs as "collections". The intention is that a method with
> Depth > 0 is only meaningful for collections. We refer to a resource
> that is URL-addressable with a URL below a collection in the
> "slash hierarchy" as its "member" or "descendent". A descendent
> resource is "immediate" or "a child" of a collection resource if
> its URL has just one more path element.
>
> We refer to resources whose URLs have no trailing slash (and
> hence no children) as being "simple resources". There can be such
> a thing as an empty collection, which is not considered to be a
> "simple resource".  All resources are either collections or
> simple resources, corresponding to the spelling of their URL.
>
> This is all purely terminology based on URLs, regardless of
> whether a server implements webdav or not. Modifyable resources
> (ones which allow PUT, DELETE, etc.) should have unique URLs
> and in particular should not have a URL both as a collection
> (slash-terminated) and as a simple resource.
>
> (Deep breath.)


Again, I appreciate you trying to provide alternate wording, but it is too
late to attempt a significant rewording of this section when there is no
demonstrable interoperability problem.
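
For readers following the quoted proposal, its Depth rule over the
slash-separated URL hierarchy can be illustrated with a small sketch.  This
only illustrates the quoted wording (Depth 0 is the resource itself, Depth 1
adds immediate children, "infinity" adds all descendents); it is not the
draft's normative text, and the function name and sample URLs are made up
for the example.

```python
from urllib.parse import urlsplit


def within_depth(base_url, candidate_url, depth):
    """Return True if candidate_url is in scope of a request on base_url.

    depth is the string "0", "1", or "infinity", following the quoted
    description.  Scope is judged purely by URL spelling (hypothetical helper).
    """
    base = urlsplit(base_url)
    cand = urlsplit(candidate_url)
    if (base.scheme, base.netloc) != (cand.scheme, cand.netloc):
        return False
    if cand.path == base.path:
        return True  # the resource itself is always in scope
    # Only a collection (slash-terminated URL) has members.
    if depth == "0" or not base.path.endswith("/"):
        return False
    if not cand.path.startswith(base.path):
        return False
    # Path elements below the collection, ignoring one trailing slash.
    rest = cand.path[len(base.path):].rstrip("/")
    if not rest:
        return False
    levels = rest.count("/") + 1
    return depth == "infinity" or levels <= 1
```

Under these assumptions, http://foo.com/bar/blah is in scope of a Depth 1
request on http://foo.com/bar/, while http://foo.com/bar/sub/x is only in
scope at Depth "infinity".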

> Returning to the existing section 5, none of the introduced
> notions of "compliance", "consistency", or "non-null resource"
> seems to add anything -- not that I fully understand what is intended
> by any of those terms.

Compliance is necessary to differentiate between "resources which support
WebDAV" and "resources which don't support WebDAV but which do support
HTTP".  Of course, when one uses the term "support", it is helpful to know
what is meant by this.  Section 15 gives a detailed description of what is
meant by "compliance", and hence "supports".

Consistency is needed because it allows the postconditions of namespace
operations to be concisely specified.

Non-null resource is used to differentiate between "null resource" and
"non-null resource".

> I also prefer my terminology of "immediate"
> rather than "internal" and "simple resource" rather than "non-collection".

Again, I see little advantage to changing terms at this point now that the
community has shared knowledge organized around the existing terms.

> I see no reason for the notion of a "compliant resource".  A
> server may implement webdav or not. Its implementation might
> be said to be compliant with a spec or not.  For resources, a
> method on any URL might be allowed by a server or not. An
> OPTIONS request (which I hope can take a Depth parameter) can
> be used to determine ahead of time what operations will be
> allowed. The response to an OPTIONS Depth:1 request might
> indicate that some of the immediate members allow only GET,
> while others allow other methods.

The basic problem with your approach is there is no such thing as a
"server".  HTTP server programs, because they support ISAPI and NSAPI
plug-ins, as well as CGIs, may not know the exact set of functionality of
all their resources.  The only granularity over which it is possible to say
anything conclusively about compliance/non-compliance is an individual
resource.
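
A minimal sketch of what per-resource granularity means for a client: it
inspects the OPTIONS response of each resource separately rather than
assuming one answer for the whole host.  The sketch assumes a resource
advertises its compliance classes in a DAV response header and its methods
in Allow; the helper name and the sample header values are invented for
illustration.

```python
def describe_resource(options_headers):
    """Summarize one resource from the headers of its OPTIONS response.

    options_headers is a plain dict of header name -> value.  Header names
    are matched case-insensitively.  (Hypothetical helper, not from the spec.)
    """
    headers = {name.lower(): value for name, value in options_headers.items()}
    classes = {t.strip() for t in headers.get("dav", "").split(",") if t.strip()}
    methods = {t.strip().upper() for t in headers.get("allow", "").split(",") if t.strip()}
    return {
        "dav_classes": classes,
        "methods": methods,
        "webdav": "1" in classes,  # assumed: class 1 marks basic compliance
    }


# Two resources on the same host can give different answers, e.g. a
# WebDAV-aware document and a plain CGI script (sample values only).
doc = describe_resource({"DAV": "1, 2", "Allow": "GET, PUT, PROPFIND, LOCK"})
cgi = describe_resource({"Allow": "GET, POST"})
```

Here `doc["webdav"]` is true and `cgi["webdav"]` is false even though both
resources are served by the same HTTP server program.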

> It would be worthwhile to construct a MUST/SHOULD table
> indicating the interrelationships among methods over time,
> assuming that only one client is talking to the server. For
> example, if a server responds with an OPTIONS response
> indicating that children can be MOVE'd, then an ensuing MOVE
> SHOULD succeed. A GET after a successful DELETE MUST fail.
> A Depth > 1 method SHOULD behave the same as if the client had
> carried out the operation as separate requests working up from
> the bottom. And so on. This table might be extended to convert
> some of the "SHOULD"s to "MUST"s in the event that a resource
> has some specified property (via the independent property
> spec) which indicates that the server signs up to some greater
> commitment.

I think this would be very nice information to have in a design document, or
an implementor's guide.

One of the guidelines we followed in the WebDAV spec. was not trying to
specify the same requirement in multiple places.  I feel that the table you
suggest would specify the same information in multiple places.

> A *server* can be said to be "compliant" or not according
> to whether it implements the rules in that table.

There would be much more to server compliance than just these requirements.

> It would be easy to belabor the semantic portent being attached to
> the "URL namespace", attempting to specify precisely that there are
> no duplicates, no infinite URLs, no cycles, and so on, with some
> sort of suitably mathematical language.  This is probably not
> necessary beyond the MUST/SHOULD table suggested above. (It would
> in fact be quite a feat to successfully specify the "no duplicates"
> criterion precisely. Unfortunately Frege, Wittgenstein and Kripke
> didn't write RFCs.)

Again, you hit another tradeoff in the writing of the WebDAV spec.  While
the WebDAV specification might very well be more precise if expressed in
some mathematical notation or using some formal specification technique, we
decided to stick to English-language specifications for readability reasons.
Time and again, the experience of people applying formal methods is that
formal specifications are difficult to read (the one exception I hear is
StateCharts, which would be *very* difficult to express in ASCII text).

>
> As for "collections", it should just be as simple as what
> I specified above: collections are resources addressable with
> slash-terminated URLs. Period. Regardless of webdav.
>
> Just to belabor a couple of salient paragraphs from section 5....
>
> >   Any given internal member MUST only belong to the collection once,
> >   i.e., it is illegal to have multiple instances of the same URI in a
> >   collection.
>
> [CLARITY] This is an example of the kind of muddle about resources
> versus URI's that pervades the spec. Is a collection a resource
> or a URL? If it is a resource, then we shouldn't talk about a URI
> being in a collection. And what does "multiple instances of the
> same URI" mean? Does that mean multiple resources addressable
> by the very same URI?
>
> >   For all WebDAV compliant resources A and B for which B is the parent
> >   of A in the HTTP URL namespace hierarchy, B MUST be a collection
> >   which has A as an internal member. So, if http://foo.com/bar/blah is
> >   WebDAV compliant and if http://foo.com/bar/ is WebDAV compliant then
> >   http://foo.com/bar/ must be a collection and must contain
> >   http://foo.com/bar/blah as an internal member.
>
> [CLARITY] If the definition of collection is purely one of
> addressability within a URL hierarchy, then this is almost tautological.
> If the definition is something else, then the significance of this
> paragraph escapes me.

These points were addressed by a significant discussion on the mailing list.
If you believe these points raise new information in addition to the points
which were raised on the list, please identify those points.

See specifically threads beginning with:
http://lists.w3.org/Archives/Public/w3c-dist-auth/1998JulSep/0153.html
http://lists.w3.org/Archives/Public/w3c-dist-auth/1998JulSep/0227.html
http://lists.w3.org/Archives/Public/w3c-dist-auth/1998OctDec/0126.html


>
> >   In HTTP/1.1, the PUT method is defined to store the request body at
> >   the location specified by the Request-URI.  While a description
> >   format for a collection can readily be constructed for use with PUT,
> >   the implications of sending such a description to the server are
> >   undesirable.  For example, if a description of a collection that
> >   omitted some existing resources were PUT to a server, this might be
> >   interpreted as a command to remove those members.  This would extend
> >   PUT to perform DELETE functionality, which is undesirable since it
> >   changes the semantics of PUT, and makes it difficult to control
> >   DELETE functionality with an access control scheme based on methods.
>
> [CLARITY] I don't understand this argument. What is it about a
> PUT that entails a delete? How is this resolved with MKCOL?
> Are PUT and MKCOL thought to be methods which create only one
> resource (which might be a collection)? Or are you saying that
> one or both of them is able to also start populating the
> collection with children as part of the single request?

Some people had suggested that you should create a collection by performing
a PUT with a message body which lists the members of the collection, e.g.:

PUT /collection/ HTTP/1.1
Host: www.foo.com
Content-Length: xxxx
Content-Type: application/davcollection

http://www.foo.com/collection/mem1.html
http://www.foo.com/collection/mem2.html
...

So, if collection/mem1.html, collection/mem2.html, and collection/mem3.html
already exist, this might be interpreted as being an instruction to "remove
collection/mem3.html".
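
The ambiguity can be sketched in a few lines.  This assumes a hypothetical
server that treats the PUT body as the complete membership of the
collection; the function name is invented, and the URLs are the ones from
the example above.

```python
def implied_removals(existing_members, put_listing):
    """Members that would vanish if the PUT body were taken as the
    complete membership of the collection (hypothetical interpretation)."""
    return sorted(set(existing_members) - set(put_listing))


existing = [
    "http://www.foo.com/collection/mem1.html",
    "http://www.foo.com/collection/mem2.html",
    "http://www.foo.com/collection/mem3.html",
]
listing = [
    "http://www.foo.com/collection/mem1.html",
    "http://www.foo.com/collection/mem2.html",
]

print(implied_removals(existing, listing))
# prints ['http://www.foo.com/collection/mem3.html']
```

A PUT that only names mem1 and mem2 would thus act as a DELETE on mem3,
without a DELETE request ever being sent or checked by a method-based
access control scheme.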

>
> >   Note that
> >   the value of a source link is not guaranteed to point to the correct
> >   source.  Source links may break or incorrect values may be entered.
> >   Also note that not all servers will allow the client to set the
> >   source link value.  For example a server which generates source
> >   links on the fly for its CGI files will most likely not allow a
> >   client to set the source link value.
>
> [CLARITY] I'm not certain why the draft is so hesitant about
> requiring source links to be reliable. Presumably if they are
> provided, that represents some level of commitment? Since they
> point to the underlying source resources, which as discussed
> above can be nearly tantamount to URNs, I see no reason to be
> so lax about their being unreliable.

The source link can either be live or dead.  If it is dead, it raises the
issue that the property may not be consistent, since the client may not have
updated the source link since whatever event occurred which caused its value
to be incorrect.

If the destination of a source link is a URN, as you suggest, then an
inconsistent source link value is a very unlikely event indeed.

> Links and pointers are an important aspect of a full-fledged
> property proposal; most of the properties discussed in the
> draft are on only a single resource. Again, the webdav
> property framework should be isolated into its own proposal,
> and all this stuff about "dead properties" and "links" and so
> on can be fleshed out there.

See previous objection, on the grounds this would lead to reduced
interoperability.

> As stated above in my attempt to re-articulate section 5, the
> ability to map to source resources is an important issue. It
> deserves to be highlighted more in the draft, rather than
> mentioned here and then buried in section 13.10. In particular,
> so far as I can make out, this is the only property which is
> intended to be set on non-authorable (derived) resources.
> It would aid the reader to draw that out.

It is a pretty simple mechanism, which appears to be well-specified in
sections 5 and 13.10.  I'm not sure what, exactly, you mean by
"highlight more", or what this would necessarily achieve.  Would this make
the source link more interoperable?

- Jim
Received on Monday, 23 November 1998 20:49:04 UTC