Re: Repository -- do we need it?

Geoffrey M. Clemm (gclemm@tantalum.atria.com)
Wed, 5 May 1999 22:53:35 -0400


Message-Id: <9905060253.AA07899@tantalum>
To: sv@hunchuen.crystaliz.com
Cc: ietf-dav-versioning@w3.org
In-Reply-To: <004e01be971d$ab25a790$d0acddcf@crystaliz.com>

   From: "Sankar Virdhagriswaran" <sv@hunchuen.crystaliz.com>

   > I believe that "collections" are the only namespace service providers.
   > Activities, configurations, and workspaces group things, but not for
   > the purpose of giving them names.

   Why not? (See below, where I expand on my twist on the repository concept.)

Collections, activities, configurations, and workspaces all refer to
the same set of revisions.  If more than one of them tries to give the
same revision a name, the names they give it will inevitably conflict.
Which one wins?

In contrast, if only one resource type is responsible for naming
(i.e. the collection), then it is unambiguous what the name is, and
where to get the name from.  You go to the collection containing a
resource, and that tells you one segment of the name.  You go to the
collection containing that collection, and that gives you the
preceding segment, and so on until you get to the root of the URL tree.
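As a minimal sketch of this naming rule, here is some Python that computes a resource's URL by asking each containing collection for the one segment it assigns, walking up until it reaches the root.  The class and function names are hypothetical, and the sketch assumes each resource is bound in exactly one collection.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Resource:
    # Hypothetical model: a resource only knows which collection contains it.
    parent: Optional[Collection] = None


@dataclass(eq=False)
class Collection(Resource):
    # Segment name -> member.  Only collections assign names.
    members: dict[str, Resource] = field(default_factory=dict)

    def bind(self, segment: str, member: Resource) -> None:
        self.members[segment] = member
        member.parent = self


def segment_of(resource: Resource) -> str:
    # Ask the containing collection which name it gives this resource.
    parent = resource.parent
    if parent is None:
        raise LookupError("the root has no segment")
    for segment, member in parent.members.items():
        if member is resource:
            return segment
    raise LookupError("resource is not bound in its parent")


def url_of(resource: Resource) -> str:
    # Collect one segment per containing collection until we hit the root
    # (a collection with no parent), then read the segments root-first.
    segments = []
    while resource.parent is not None:
        segments.append(segment_of(resource))
        resource = resource.parent
    return "/" + "/".join(reversed(segments))
```

For example, binding `docs` into the root and `spec.html` into `docs` makes `url_of(spec)` return `/docs/spec.html`.  There is exactly one place to look for each segment, so no conflict can arise.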

   ... However, for those cases where this type of caching will actually
   create problems, and for those cases where the workspace concept actually
   gets in the way of designing appropriate clients, I would like to have
   the ability to not use workspaces to perform the kind of selections.

Can you elaborate on situations where caching would create problems,
and on cases where the workspace concept would get in the way of
designing appropriate clients?  A workspace in this context is just a
resource that can contain a "revision-selection-rule".  The only
difference between not using a workspace and using a workspace is that
in the former case you pass the revision-selection-rule in a header,
while in the latter case you PROPPATCH the revision-selection-rule into
a workspace and then pass the name of the workspace in the header.

Other than occasionally doing two method calls instead of one, how
does this get in the way of designing a client?
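To make the "two calls instead of one" comparison concrete, here is a Python sketch that only builds the requests, without sending them.  The header names `Revision-Selection-Rule` and `Workspace` and the property namespace are hypothetical placeholders, not names fixed by the draft.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from xml.sax.saxutils import escape


@dataclass
class Request:
    method: str
    url: str
    headers: dict[str, str] = field(default_factory=dict)
    body: str = ""


# Hypothetical header names, for illustration only.
RSR_HEADER = "Revision-Selection-Rule"
WORKSPACE_HEADER = "Workspace"


def without_workspace(url: str, rule: str) -> list[Request]:
    # One call: the rule travels in a header on every request.
    return [Request("GET", url, {RSR_HEADER: rule})]


def with_workspace(url: str, rule: str, workspace_url: str) -> list[Request]:
    # Call 1: store the rule on the workspace (hypothetical property name).
    body = (
        '<D:propertyupdate xmlns:D="DAV:" xmlns:X="http://example.com/hypothetical">'
        "<D:set><D:prop><X:revision-selection-rule>"
        f"{escape(rule)}"
        "</X:revision-selection-rule></D:prop></D:set></D:propertyupdate>"
    )
    set_rule = Request("PROPPATCH", workspace_url, {"Content-Type": "text/xml"}, body)
    # Call 2: name the workspace instead of repeating the rule.
    fetch = Request("GET", url, {WORKSPACE_HEADER: workspace_url})
    return [set_rule, fetch]
```

After the first PROPPATCH, later requests need only the workspace header, so the extra call is paid once per rule, not once per request.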

   > What is "an implementation that is based on an artifactual system"?

   Basically, each new state of any versioned entity gets a new globally
   unique id. In other words, each state is considered to be an
   'artifact' (in the archeological sense of that word) - i.e., it is
   immutable. Version names, etc. are all overlaid on top of this
   unique-id scheme as a 'namespace'.

This is certainly an implementation that will be supported by the
protocol (since many of us use exactly that implementation, or one
very like it).  If it isn't, please do let us know so that we can
fix it!
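A minimal Python sketch of such an artifact store might look like this.  It assumes UUIDs as the globally unique ids and an in-memory dictionary as the name overlay; both choices are illustrative, not part of the protocol.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class Artifact:
    # One immutable state of a versioned entity.
    uid: str
    content: bytes


class ArtifactStore:
    def __init__(self) -> None:
        self._artifacts: dict[str, Artifact] = {}
        # Namespace overlay: human-readable names -> unique ids.
        self._names: dict[str, str] = {}

    def commit(self, content: bytes) -> str:
        # Every new state gets a fresh, globally unique id; nothing is overwritten.
        uid = str(uuid.uuid4())
        self._artifacts[uid] = Artifact(uid, content)
        return uid

    def name(self, label: str, uid: str) -> None:
        # Rebinding a name changes only the overlay, never an artifact.
        if uid not in self._artifacts:
            raise KeyError(uid)
        self._names[label] = uid

    def resolve(self, label: str) -> Artifact:
        return self._artifacts[self._names[label]]
```

Moving a name such as "release" from one id to another leaves both artifacts intact, which is what lets version names be layered on top without disturbing the immutable states underneath.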

   > I'd have to see more specifically what "extending a repository to
   > be a set of namespaces" means, but if it means that a repository performs
   > all the functions of a workspace, a configuration, and an activity, then
   > that would be a very different proposal.

   Yup ;-). Just to explain about this notion of a collection of
   namespaces and the notion of namespace service provider, let me use
   the example of Java Naming and Directory Interface (JNDI). JNDI
   provides a common API for querying and navigating different
   (typically) graph-structured namespaces, which can be federated. JNDI
   also provides a way to plug in particular implementations of this API,
   called 'service providers'. Its designers have implemented service providers for
   navigating file systems, CORBA name space service objects, LDAP
   directories, etc. One can imagine implementing such a service provider
   for configurations, activities, and workspaces. In particular,
   workspaces are similar to LDAP service providers because LDAP service
   providers actually have to support sophisticated querying on the data
   they maintain.

Actually, I believe it is a "collection" that is equivalent to a JNDI
directory, not a workspace.  A workspace is just a mechanism for
mapping a versioned-resource to a revision or working-resource of that
versioned-resource.  A server uses the workspace in conjunction with
an initial versioned-resource to successively map segments of a
hierarchical name to versioned-collection revisions and then finally
to a versioned-resource revision.

So it is not the workspace, but rather the versioned-collection
revisions that provide the name mapping.  The workspace just provides
the version-selection service.
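Here is a minimal Python sketch of that division of labor.  All names are hypothetical.  The point is that each segment lookup comes from the bindings of the selected versioned-collection revision, while the workspace only chooses which revision of each versioned-resource to use.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class VersionedResource:
    vid: str  # hypothetical identifier


@dataclass
class Revision:
    resource: VersionedResource
    number: int
    # For versioned-collection revisions: segment name -> versioned-resource.
    bindings: dict[str, VersionedResource] = field(default_factory=dict)


class Workspace:
    """Version selection only: versioned-resource -> selected revision."""

    def __init__(self, selections: dict[VersionedResource, Revision]) -> None:
        self.selections = selections

    def select(self, vr: VersionedResource) -> Revision:
        return self.selections[vr]


def resolve(path: str, root: VersionedResource, ws: Workspace) -> Revision:
    current = ws.select(root)
    for segment in filter(None, path.split("/")):
        # The collection revision supplies the name mapping ...
        child = current.bindings[segment]
        # ... and the workspace supplies the version selection.
        current = ws.select(child)
    return current
```

Because different revisions of a versioned-collection can bind different names, a path that resolves in one workspace may raise `KeyError` in another.  The naming still comes entirely from the collection revisions.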

   I have not been tracking the advanced collections spec. development as
   closely as you have been, so maybe I am way off the mark.  However, I
   thought the advanced collections spec. as it evolves looked more and
   more like JNDI (given the discussion about resources and how resources
   are actually mapped to different things and given the discussion about
   bind and unbind). So, I was making the suggestion about
   'implementation' in the sense of JNDI service providers.

Yes, advanced collections are very much like JNDI directories.
(Although interestingly enough, JNDI only lets you apply properties to
the directories, not to non-directory resources registered in those
directories).

   That is, in my mind, the advanced collection spec. would provide a
   general set of protocols to create/modify/delete collections and to
   navigate them. One could then 'implement service providers' that
   implement this protocol for different types of specific collections we
   care about (compound document collections, configurations, activities,
   etc.). Hope this helps in clarifying.

Yes, for resources for which "add-member", "move-member",
"delete-member", and "share-member" all are required, using the
advanced collection protocol is the only sensible thing to do.  But my
position is that these methods should *not* be required for specifying
the revisions selected by a configuration, so modeling a configuration
as a collection of revisions would not allow the server to make the
implementation choices essential for efficient configuration creation.
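To illustrate the burden, here is a hypothetical Python rendering of a generic collection "service provider" interface built from the four member operations named above.  The method signatures are assumptions made for the sketch.  A provider that cannot meaningfully support mutation, such as a read-only configuration, still has to implement all four before it can even be instantiated.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class CollectionProvider(ABC):
    """Hypothetical generic collection protocol, in the spirit of a JNDI SPI."""

    @abstractmethod
    def add_member(self, name: str, target: str) -> None: ...

    @abstractmethod
    def move_member(self, old: str, new: str) -> None: ...

    @abstractmethod
    def delete_member(self, name: str) -> None: ...

    @abstractmethod
    def share_member(self, name: str, other: CollectionProvider, new_name: str) -> None: ...

    @abstractmethod
    def members(self) -> dict[str, str]: ...


class InMemoryCollection(CollectionProvider):
    def __init__(self) -> None:
        self._members: dict[str, str] = {}

    def add_member(self, name, target):
        self._members[name] = target

    def move_member(self, old, new):
        self._members[new] = self._members.pop(old)

    def delete_member(self, name):
        del self._members[name]

    def share_member(self, name, other, new_name):
        # Bind the same target under a second name in another collection.
        other.add_member(new_name, self._members[name])

    def members(self):
        return dict(self._members)


class ReadOnlyConfiguration(CollectionProvider):
    # Supports listing only; the four member methods are missing,
    # so the ABC machinery refuses to instantiate this class.
    def members(self):
        return {}
```

`ReadOnlyConfiguration()` raises `TypeError`, which is the point: the generic interface forces every provider to carry operations that some resource types have no efficient way to support.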

Let's take a specific example.  Suppose I were a branch-based server
implementor.  One of the most efficient ways to implement a "snapshot"
operation is to just store the time-of-day and current branch as
immutable properties of a snapshot resource.  This works great if the
only operation I need to support is "snapshot the contents of this
workspace" (I can get the current branch from the workspace RSR, and
the time-of-day from a system clock).  But if I need to support
"add/move/delete" configuration member, I'm out of luck.  I don't want
to be out of luck (:-).
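Here is a Python sketch of that snapshot.  It assumes integer timestamps, revisions tagged with the branch they were created on, and hypothetical class names.  Selection picks the latest revision on the recorded branch that already existed at the recorded time.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BranchRevision:
    resource: str  # versioned-resource id (hypothetical)
    branch: str
    created: int   # creation time, e.g. seconds since the epoch
    number: int


@dataclass(frozen=True)
class Snapshot:
    # The entire configuration is two immutable properties.
    branch: str
    taken_at: int

    def select(self, history: list[BranchRevision], resource: str) -> Optional[BranchRevision]:
        # Latest revision of `resource` on the snapshot's branch that
        # already existed when the snapshot was taken.
        candidates = [
            r for r in history
            if r.resource == resource
            and r.branch == self.branch
            and r.created <= self.taken_at
        ]
        return max(candidates, key=lambda r: (r.created, r.number), default=None)


def snapshot_workspace(current_branch: str, now: int) -> Snapshot:
    # Constant cost regardless of workspace size: nothing is copied.
    # In a real server, `now` would come from the system clock and
    # `current_branch` from the workspace's revision-selection-rule.
    return Snapshot(current_branch, now)
```

`Snapshot` has no `add_member`, `move_member`, or `delete_member`.  Membership is computed from (branch, time) rather than stored, so there is nowhere to record such a change without giving up this representation.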

   >This means that the
   > server will need a separate configuration implementation, at which
   > point any methods required by the collection protocol that are
   > not really needed for the configuration protocol are actually a
   > *burden* on the server implementor, not a benefit.

   This I don't agree with. In general, what you say is true: generality
   imposes burdens on specific implementations.

I only care about specific cases where it would cause a problem.  In
general, I support the re-use of the collection protocol wherever
possible.  (That's the basis for my "property-collection" proposal,
i.e. whenever you have a property whose value acts like a collection,
just use the collection protocol as the means to update it.)

   However, there are advantages to going with the approach I am
   proposing. Client writers have to learn only one 'regime'. We found
   this to be very useful when we did our implementation. Our client
   writers (who were not as sophisticated as our server writers) had to
   learn only one way of creating/modifying/navigating different
   namespaces. Also, APIs such as JNDI, the CORBA collections spec., and
   the Java 1.2 (i.e., Java 2) collections API actually show different
   ways of achieving our objectives.

Yes, that all sounds right to me.

   Imagine the other case. I need to educate our client writers with the
   basic DAV protocol methods, the advanced collections protocol methods,
   the configuration (collection) protocol methods, the activities
   (collection) protocol methods, the workspace (collection) protocol
   methods (and DASL). Folks won't be able to swallow all the subtle
   differences between each of these methods.

Yes, if there were a complex set of methods associated with each
of these, that would be a mess for exactly the reasons you state.
But I'm just concerned with one specific case (i.e. a configuration)
for which supporting the collection protocol would make certain
desirable implementations infeasible.

   Even after implementing a system such as the one that we are
   developing in DAV, I (personally) cannot keep all the different
   variations in my mind. This is partly due to terminology and partly
   due to spending only part of my time on the WebDAV activity.  Still, I hope
   you see my point.

Yes, I completely agree with you in principle, i.e. layer above existing
protocol whenever possible.  My objection applies only to very specific
cases (or here, just one very specific case) where a particular existing
protocol is inappropriate for a particular resource type.

   thanks for listening

As always, thank *you* for your time and interest!

Cheers,
Geoff