RE: aliasing and other (primarily) editorial issues with -protocol-08 from Jim Whitehead on 1998-09-09 (w3c-dist-auth@w3.org from July to September 1998)

From: Jim Whitehead <ejw@ics.uci.edu>
Date: Wed, 9 Sep 1998 15:10:00 -0700
To: Larry Masinter <masinter@parc.xerox.com>, WEBDAV WG <w3c-dist-auth@w3.org>
Message-ID: <000201bddc3e$96649860$d115c380@galileo.ics.uci.edu>

Larry,

Thank you for taking the time to perform a detailed review of the DAV spec
on this issue -- it has helped me understand your concerns much better.

My comments are below.

> -----Original Message-----
> From: w3c-dist-auth-request@w3.org
> [mailto:w3c-dist-auth-request@w3.org]On Behalf Of Larry Masinter
> Sent: Tuesday, September 08, 1998 11:58 PM
> To: Webdav
> Subject: aliasing and other (primarily) editorial issues with
> -protocol-08
>
>
> I've owed it to the group to submit an analysis of -protocol-08 over
> the issue of URL aliases; I apologize for the delay.
>
> As you may recall, the issue I raised was one of whether the
> webdav draft required that a DAV-compliant resource had one, and
> only one, URI. I pointed out the various situations of aliasing,
> of which 'case sensitive URI' is only one instance, but other kinds
> of aliasing might also be possible.
>
>    A collection is a resource whose state consists of at least a list
>    of internal members and a set of properties, but which may have
>    additional state such as entity bodies returned by GET.  An internal
>    member resource MUST have a URI that is immediately relative to the
>    base URI of the collection.  That is, the internal member's URI is
>    equal to the parent collection's URI plus an additional segment
>    where segment is defined in section 3.2.1 of RFC 2068 [Fielding et
>    al., 1996].
>
> In this paragraph, it isn't clear whether "MUST have a URI that is ..."
> means "MUST have only one URI; that URI is ..." or
>       "MUST have at least one URI which is ..."
> that is, is the requirement of the member resource be that it
> has no aliases anywhere, or just that aliases are out of scope,
> or is it just that the requirement on member resources be that there
> is at least one way of accessing it.
>
> Now, if it is allowed that referent of "the internal member's URI"
> is "the URI that fits the description above", i.e., an internal
> member might also have other URIs too.
>

Let me create a small example which I hope will make my 1:1 URI to resource
mapping position clear, and show how it doesn't have the namespace
restrictions which are being attributed to it.

In my view, there is a 1:1 correspondence between a URI, and an instance of
a resource.  Since a resource is an abstraction, there is a mapping step
where this abstraction is concretely mapped onto the abstractions provided
by a particular repository.

For a file system repository, a resource is mapped onto a file.  If I
understand Larry's position correctly, for a given file "foo.html", there is
exactly one resource instance.  In my view, there can be multiple resource
instances associated with "foo.html".

But, we both recognize that there can be multiple paths by which "foo.html"
can be accessed.  There might be an FTP URL, and an HTTP URL.  As Larry has
pointed out, there can be multiple HTTP URLs (aliases) by which "foo.html"
can be accessed, and I agree that a GET on any of these HTTP URLs will
retrieve an entity based on the contents of "foo.html".

The point of divergence lies with the modeling of "foo.html" with the
resource abstraction.  My position is that there can be many instances of
the resource abstraction which map to the same physical file "foo.html".
Each instance corresponds to a different URI.

I believe that Larry models "foo.html" as having exactly one instance of the
resource abstraction associated with it, but that there can be multiple URIs
which identify this instance.

So, if the file "foo.html" can be retrieved from the following URLs:

http://www.web.com/foocollect/foo.html
http://www.web.com/FOOCOLLECT/foo.html
http://www.web.com/anothercollect/baz.html

Then I view these as being three URLs which identify three resources which
each represent the file "foo.html", while Larry views these as three URLs
which identify one resource which represents the file "foo.html".
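
To make the difference between the two views concrete, here is a minimal
sketch in Python. The class names (StoredFile, Resource, AliasedResource)
are invented for illustration and do not come from the DAV spec; the sketch
only counts how many resource instances each view produces for one file.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class StoredFile:
    """The underlying chunk of state, e.g. the file "foo.html"."""
    name: str


URLS = [
    "http://www.web.com/foocollect/foo.html",
    "http://www.web.com/FOOCOLLECT/foo.html",
    "http://www.web.com/anothercollect/baz.html",
]

foo = StoredFile("foo.html")


@dataclass(frozen=True)
class Resource:
    """1:1 view: exactly one URI per resource instance."""
    uri: str
    state: StoredFile


@dataclass
class AliasedResource:
    """Many-to-one view: one resource instance carrying all of its URIs."""
    state: StoredFile
    uris: set = field(default_factory=set)


# 1:1 view: three resources, all backed by the same file.
one_to_one = {url: Resource(url, foo) for url in URLS}

# Many-to-one view: a single resource identified by three URIs.
many_to_one = AliasedResource(foo, set(URLS))

print(len(one_to_one), "resources in the 1:1 view")
print(len({r.state for r in one_to_one.values()}), "underlying file")
print(len(many_to_one.uris), "URIs on the single many-to-one resource")
```

Both structures describe the same file and the same URLs; they differ only
in where the identity of "a resource" is placed.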

Based on this, I view a collection as containing exactly one URI for a
resource.  However, as the example above shows, this doesn't imply that the
file "foo.html" cannot appear in more than one collection.  This also means
I don't find the language, "MUST have at least one URI which is a member of
the collection", acceptable, since it isn't meaningful to me for a resource
to have more than one URI.  But, I know what you mean, and this language is
perfectly meaningful if there is only one resource for an underlying chunk
of state, such as the file "foo.html".

Note that by having a 1:1 correspondence between URI and resource, this
doesn't mean the underlying chunk of state (the file "foo.html") only has
one parent.  However, in the 1:1 mapping, each *resource* does have one
parent, and each resource's parent can be determined by evaluating the
relative URI ".." operator on the resource's URL.

So, getting back to the previous example, for the 1:1 mapping of URI to
resource:

Resource:
http://www.web.com/foocollect/foo.html
Has parent:
http://www.web.com/foocollect/

and Resource:
http://www.web.com/FOOCOLLECT/foo.html
Has parent:
http://www.web.com/FOOCOLLECT/

and Resource:
http://www.web.com/anothercollect/baz.html
Has parent:
http://www.web.com/anothercollect/
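
The parent computation in this listing can be sketched with Python's
standard urllib.parse module. The helper name parent_of is hypothetical.
One detail worth noting: under the relative-reference resolution that
urljoin implements, "." already yields the containing collection for a URL
without a trailing slash, and ".." is only needed when starting from a
collection URL, so the sketch chooses between the two.

```python
from urllib.parse import urljoin, urlsplit


def parent_of(url: str) -> str:
    """Return the parent collection of `url` using relative-reference resolution."""
    # A URL whose path ends in "/" names a collection: step up with "..".
    # Otherwise "." resolves to the collection that contains the member.
    step = ".." if urlsplit(url).path.endswith("/") else "."
    return urljoin(url, step)


for url in (
    "http://www.web.com/foocollect/foo.html",
    "http://www.web.com/FOOCOLLECT/foo.html",
    "http://www.web.com/anothercollect/baz.html",
):
    print(url, "->", parent_of(url))
```

The computation only ever sees the URL it is given, which is why it yields
one parent per URL and cannot discover parents reachable through other URLs.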

For the many-to-one mapping of URI to resource:

The resource which models file "foo.html" has three parents:
http://www.web.com/foocollect/
http://www.web.com/FOOCOLLECT/
http://www.web.com/anothercollect/

It is currently impossible to determine two of the three parents using URL
manipulations.

As for how state-changing operations such as PUT, DELETE, and PROPPATCH
behave when issued against one of many resources which map to the same
underlying chunk of state (the file "foo.html"), this is a server policy
issue.  If the server wishes to delete all of the resources, along with the
underlying chunk of state, when a DELETE is issued against one of the URLs
for that chunk of state, the server is free to do so.

So, following the example some more, if the file "foo.html" can be retrieved
from the following URLs:

http://www.web.com/foocollect/foo.html
http://www.web.com/FOOCOLLECT/foo.html
http://www.web.com/anothercollect/baz.html

The 1:1 mapping of URL to resource would mean that a DELETE issued against
one of these URLs, say, http://www.web.com/foocollect/foo.html, would at
minimum mean future GET requests against
http://www.web.com/foocollect/foo.html would not retrieve an entity based on
"foo.html".  The server could also decide to remove file "foo.html" and also
ensure that accesses via http://www.web.com/FOOCOLLECT/foo.html and
http://www.web.com/anothercollect/baz.html do not retrieve an entity based
on file "foo.html", but this is up to the server.
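
The two DELETE outcomes described above can be sketched as a toy in-memory
server. The Server class and its remove_state flag are assumptions made up
for this illustration; they are not part of the DAV protocol, which leaves
the choice to server policy.

```python
BINDINGS = {
    "http://www.web.com/foocollect/foo.html": "foo.html",
    "http://www.web.com/FOOCOLLECT/foo.html": "foo.html",
    "http://www.web.com/anothercollect/baz.html": "foo.html",
}


class Server:
    def __init__(self, bindings):
        # URL -> name of the underlying file; one entry per resource (1:1 view).
        self.bindings = dict(bindings)

    def get(self, url):
        """Return the backing file name, or None if the URL no longer resolves."""
        return self.bindings.get(url)

    def delete(self, url, remove_state=False):
        # Minimum effect: the URL that was deleted stops resolving.
        state = self.bindings.pop(url)
        if remove_state:
            # Optional policy: also drop every other resource backed by the same file.
            for other in [u for u, s in self.bindings.items() if s == state]:
                del self.bindings[other]


# Policy A: only the targeted resource goes away.
minimal = Server(BINDINGS)
minimal.delete("http://www.web.com/foocollect/foo.html")

# Policy B: the file and every resource mapped to it go away.
thorough = Server(BINDINGS)
thorough.delete("http://www.web.com/foocollect/foo.html", remove_state=True)

print(sorted(minimal.bindings))
print(sorted(thorough.bindings))
```

Under either policy a later GET on the deleted URL finds nothing; the
policies differ only in what happens to the other two URLs.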


>    Any given internal member MUST only belong to the collection once,
>    i.e., it is illegal to have multiple instances of the same URI in a
>    collection.
>
> Perhaps you mean "e.g." instead of "i.e." here; that is, the
> example of 'the same URI' is one way of having 'the same resource'
> in a collection; but it is probably intended that it be illegal
> to have "Ford" and "FORD" as elements of the "Cars" collection,
> if the server is case insensitive, even though their URIs aren't
> 'the same'.

Based on the 1:1 mapping of URI to resource, this language is internally
consistent, and the "i.e." is seen to mean "that is".

> I will note the curious phrase under 'locking':
>
>    Some servers automatically replicate resources across multiple URLs.
>    In such a circumstance the server MUST only accept a lock on one of
>    the URLs if the server can guarantee that the lock will be honored
>    across all the URLs.
>
> and note that this might be considered a kind of aliasing, but seems
> to also allow for other operations ("replication").

This language is inconsistent with the 1:1 mapping of URI to resource.  I
would rewrite it as:

Some servers allow the same chunk of persistent state to be accessed via
multiple resources, hence URLs, sometimes by replicating and distributing
the persistent state. In such a circumstance the server MUST only accept a
lock on one of the resources if the server can guarantee that the lock will
be honored across all the resources.
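
One way a server might satisfy the rewritten requirement is to key locks on
the underlying state rather than on the URL. The sketch below is only an
illustration under that assumption; LockManager and its can_coordinate flag
are hypothetical names, and a real server would also need lock tokens,
owners, and timeouts.

```python
class LockManager:
    def __init__(self, bindings, can_coordinate):
        self.bindings = dict(bindings)        # URL -> id of the underlying state
        self.can_coordinate = can_coordinate  # can locks be enforced on every URL?
        self.locked_states = set()

    def lock(self, url):
        """Grant a lock only if it can be honored across all sibling resources."""
        state = self.bindings[url]
        siblings = [u for u, s in self.bindings.items() if s == state]
        if len(siblings) > 1 and not self.can_coordinate:
            return False  # refuse: the lock could not be honored on the other URLs
        if state in self.locked_states:
            return False  # already locked through this or another resource
        self.locked_states.add(state)
        return True

    def is_locked(self, url):
        return self.bindings[url] in self.locked_states


bindings = {
    "http://www.web.com/foocollect/foo.html": "foo.html",
    "http://www.web.com/FOOCOLLECT/foo.html": "foo.html",
}

coordinated = LockManager(bindings, can_coordinate=True)
granted = coordinated.lock("http://www.web.com/foocollect/foo.html")
print(granted, coordinated.is_locked("http://www.web.com/FOOCOLLECT/foo.html"))

uncoordinated = LockManager(bindings, can_coordinate=False)
print(uncoordinated.lock("http://www.web.com/foocollect/foo.html"))
```

Because the lock is recorded against the shared state, a lock taken through
one resource is visible through every other resource that maps to it.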

> In any case, these are, as far as I can tell, the only places in the
> entire WebDAV protocol document anything that hint at requirement that
> a resource have only one URI. Adding that requirement in order
> to be WebDAV compliant would seem to eliminate popular features of
> web servers; eliminating the confusion and clarifying that a resource
> might have multiple URIs would improve the spec.
>
> Having done so, there seems to be no good reason to require that
> the multiple URIs for a resource have the same parentage property.
>
> In some ways, this just means that clients cannot easily be interoperable
> if they have some need to perform a (currently unsupported) DAV operation
> of "going up the parent tree". However, it's not clear, at this point,
> whether "going up the parent tree" is a necessary operation, so my
> original call for adding it may be moot. Mainly all we need is to
> remove the presumption that one can "go up the parent tree" merely by
> manipulating the URI used to originally access the resource.

This is dependent on your view of how a resource maps to a URI.

- Jim
Received on Wednesday, 9 September 1998 18:21:18 UTC