Hierarchical URLs and Collections (was: Docushare and WebDAV model)

I'd like to state what I believe should be done regarding collections
and HTTP URLs.  I do this without having studied WebDAV recently,
although I have followed most of the discussion on the list.  This does
address some of the founding ideas of WebDAV, and so perhaps these
issues were addressed early on or off-line.  But it seems there are some
new questions, or perhaps new requirements from potential users.

First of all, I think it should be fine to *allow* a collection and its
components to be identified with non-hierarchically related URLs.  By
that I mean the form of URL that these Docushare people want, where each
object or collection has a URL like
"HTTP://<dms-namespace>/<object-handle>" with no explicit relationship
between the URLs encoded in the path.  However, this *permission* to use
unrelated URLs is not the end of the story.

I would also like to see hierarchically related URLs as alternatives, as
the WebDAV people desire.  This means that the very same collection and
components could have *both* hierarchically related URLs and unrelated
URLs.  I would, in fact, like to *require* the hierarchically related
URLs, but I don't have a justification for that requirement.  It would
allow the client to predict a URL for a component given the URL of a
collection and an identifier for the component within the collection.
This may be a convenience, but a component identifier could also be some
other relative or absolute URL if it happened that the URL of the
component was unrelated to that of the collection.  The client would
have to ask for the component identifier either way, and it may be a
relative identifier or a full URL.  I don't see any other benefit for a
requirement, but I'll have to think about it some more.
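To make the "predict a URL" point concrete: under hierarchical naming, deriving a component's URL from its collection's URL is ordinary relative URL resolution. If the component identifier the client asks for turns out to be a full URL, the same resolution step simply returns that URL. Below is a minimal Python sketch using the standard library's `urljoin`. The helper `component_url` and all example hosts and names are hypothetical illustrations, and it assumes collection URLs end in '/'.

```python
from urllib.parse import urljoin


def component_url(collection_url: str, component_id: str) -> str:
    """Resolve a component identifier against its collection's URL.

    The identifier may be a relative reference (hierarchically related
    URLs) or a full URL (unrelated URLs); urljoin handles both cases.
    """
    # Assumption: collection URLs end in '/'. Append one if missing so that
    # urljoin resolves *inside* the collection instead of replacing its
    # last path segment with the component name.
    if not collection_url.endswith("/"):
        collection_url += "/"
    return urljoin(collection_url, component_id)


# Hierarchically related: the component URL extends the collection URL.
print(component_url("http://dav.example.com/reports/", "q3.html"))
# -> http://dav.example.com/reports/q3.html

# Unrelated (Docushare-style handle): a full URL comes back unchanged.
print(component_url("http://dav.example.com/reports/", "http://dms.example.com/obj-4711"))
# -> http://dms.example.com/obj-4711
```

Either way the client still has to obtain the component identifier first; the hierarchical form only saves it from needing a separate lookup to turn that identifier into a URL.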

But allowing and supporting *multiple* URLs per resource is a
significant change that has other benefits, regardless of the
hierarchically related URL issue.  It recognizes that resources may be
replicated and may move.  Multiple URLs for the same resource will, in
fact, exist whether or not HTTP acknowledges it.  But HTTP does have
mechanisms to specify the URI of a response (Content-Location) and to
redirect to other URIs (3xx responses with Location), so that's a
start.  This also has implications for URNs where there
could well be multiple URNs for the same resource over the life of the
resource (this is not a conflict with the uniqueness requirement, by the
way).

Furthermore, I would like to see every HTTP URL that ends in a '/'
correspond to a collection, where the URLs of its components are those
that extend the URL (except for '..' extensions).  There are already
several ways in which the hierarchical path has meaning within HTTP;
consider security realms, cookie paths, redirection specifications
(within the server configuration), and relative URL resolution.
Formalizing these notions to talk about collections instead of some
fuzzy path thingie that may or may not correspond to a resource would
help HTTP, I believe.  This doesn't necessarily mean that one could
create URLs for collections by repeatedly chopping off the tails,
although I would like it to mean that too.  It also doesn't mean that if
a user has access to a component of a collection they also have access
to the collection.
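The proposed rule can be stated as two small predicates: a URL whose path ends in '/' names a collection, and a component URL is any URL that extends it, with extensions containing a '..' segment excluded. The sketch below is a hypothetical Python reading of that rule, not an existing API. It assumes comparison on scheme, host, and path only (query and fragment ignored) and counts deeper extensions, not just direct children, as components. It deliberately says nothing about access control.

```python
from urllib.parse import urlsplit


def is_collection_url(url: str) -> bool:
    """Proposed rule: an HTTP URL whose path ends in '/' names a collection."""
    return urlsplit(url).path.endswith("/")


def is_component_of(collection_url: str, candidate_url: str) -> bool:
    """True if candidate_url extends collection_url without a '..' segment.

    Sketch only: compares scheme, host, and path; ignores query and
    fragment; treats any deeper extension as a component.
    """
    if not is_collection_url(collection_url):
        return False
    coll = urlsplit(collection_url)
    cand = urlsplit(candidate_url)
    # Scheme and host are case-insensitive, so compare them lowercased.
    if (coll.scheme.lower(), coll.netloc.lower()) != (cand.scheme.lower(), cand.netloc.lower()):
        return False
    # The candidate path must strictly extend the collection path.
    if not cand.path.startswith(coll.path) or cand.path == coll.path:
        return False
    # Exclude extensions that climb back out of the collection via '..'.
    extension = cand.path[len(coll.path):]
    return ".." not in extension.split("/")
```

For example, `http://h/a/b` is a component of `http://h/a/`, while `http://h/ab` and `http://h/a/../x` are not.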

By the way, contrary to popular belief, the hierarchical path of HTTP
URLs has only coincidental correspondence with file system hierarchies.
All of us know, of course, that the hierarchical path could map to any
resolution process within the server that may or may not use filesystem
hierarchies.  The fact that the path is visible to clients means they
can treat it hierarchically, and that is as it should be.
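As an illustration of a resolution process that involves no filesystem, here is a hypothetical Python sketch in which a server walks the hierarchical path through an in-memory document store whose leaves are opaque object handles, much like the Docushare handles mentioned earlier. The store layout, the handle strings, and the `resolve` function are all invented for the example.

```python
# Hypothetical document store: a collection is a dict of member names;
# a leaf is an opaque object handle (a string), not a file path.
STORE = {
    "reports": {
        "q3.html": "obj-4711",
        "drafts": {"q4.html": "obj-4712"},
    },
}


def resolve(path, root=STORE):
    """Walk a hierarchical URL path through the store, one segment at a time.

    Returns the object handle for a leaf, the dict for a collection,
    or None if some segment has no matching member.
    """
    node = root
    # Empty segments (leading, trailing, or doubled '/') are skipped.
    for segment in (s for s in path.split("/") if s):
        if not isinstance(node, dict) or segment not in node:
            return None
        node = node[segment]
    return node


print(resolve("/reports/drafts/q4.html"))  # -> obj-4712
```

The client sees only the path; whether the server answers it from directories, a database, or handle tables like this one is invisible to it.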

--
Daniel LaLiberte
 dlaliberte@gte.com  (was: liberte@ncsa.uiuc.edu)
 liberte@hypernews.org

Received on Monday, 10 August 1998 14:47:54 UTC