Re: Repository -- do we need it?

Geoffrey M. Clemm (gclemm@tantalum.atria.com)
Wed, 5 May 1999 10:11:26 -0400


Date: Wed, 5 May 1999 10:11:26 -0400
Message-Id: <9905051411.AA07654@tantalum>
From: "Geoffrey M. Clemm" <gclemm@tantalum.atria.com>
To: sv@hunchuen.crystaliz.com
Cc: ietf-dav-versioning@w3.org
In-Reply-To: <007501be963d$36673ee0$d0acddcf@crystaliz.com>
Subject: Re: Repository -- do we need it?

   From: "Sankar Virdhagriswaran" <sv@hunchuen.crystaliz.com>

   > A workspace provides a "version-selection" mechanism that
   > maps from versioned-resources to revisions and working-resources.  A
   > repository holds the versioned-resources, activities, configurations,
   > and workspaces.

   ... Ignoring versioned resources for the moment, I think of
   activities, configurations, and workspaces as 'namespace service
   providers'. They provide particular kinds of organization on top of
   'data'.

I believe that "collections" are the only namespace service providers.
Activities, configurations, and workspaces group things, but not for
the purpose of giving them names.

   If we allow clients to access the repository through these
   'namespace service providers', in an orthogonal fashion, then we may
   be able to achieve the orthogonality I was hoping for. For example, a
   sophisticated client can use the configuration namespace service
   provider and the activities namespace service provider to implement a
   change-set based consistency management system and can ignore
   workspaces.

The resources being defined are designed to be orthogonal, but not
redundant.  So activities, configurations, and workspaces are not
different flavors of the same thing, but rather very different things
that serve different purposes.

A workspace creates a namespace within which appropriate revisions
appear with appropriate names.

An activity defines a "logical change" to a set of resources.  It
is semantically a set of change arcs, but for convenience we represent
it as the set of revisions that are the destination of those arcs.

A configuration defines a set of revisions for historical recreation.
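
To make the three definitions above concrete, here is a minimal sketch
of them as data structures.  This is only an illustration, not part of
the proposed protocol: the class names, field names, and the use of
Python are all my assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Optional, Set


@dataclass(frozen=True)
class Revision:
    resource: str  # name of the versioned resource (hypothetical field)
    rev_id: str    # identifier of this revision (hypothetical field)


@dataclass
class Workspace:
    # Version selection: the revision that appears under each resource name.
    selection: Dict[str, Revision] = field(default_factory=dict)

    def resolve(self, resource: str) -> Revision:
        return self.selection[resource]


@dataclass(frozen=True)
class ChangeArc:
    source: Optional[Revision]  # None when the arc creates a first revision
    destination: Revision


@dataclass
class Activity:
    # Semantically a set of change arcs ...
    arcs: List[ChangeArc] = field(default_factory=list)

    def revisions(self) -> Set[Revision]:
        # ... represented for convenience by the arcs' destination revisions.
        return {arc.destination for arc in self.arcs}


@dataclass(frozen=True)
class Configuration:
    # A fixed set of revisions, kept for historical recreation.
    revisions: FrozenSet[Revision]
```

Note that each type exposes only the operation it exists for, which is
the point of the next paragraph.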

If you carefully limit the set of methods that each of these objects
is required to perform, the server can choose efficient implementations
of them.  If you blur the distinction between them by requiring that
they each perform the function that the others provide, a server
can no longer perform the optimizations that are essential if this
protocol is going to be useful for large-scale applications.

In particular, a "workspace" is currently proposed as the artifact
the server can use to do efficient version selection on groups of
resources.  The fact that a client uses a workspace for a series
of requests is the key characteristic that allows a server to cache
information for re-use between requests.  A server can choose to just
implement a very lightweight workspace, and not bother to do any
caching, but unless the protocol requires this workspace argument
to operations, a server *cannot* effectively optimize for large scale
applications by caching information between requests.
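
As a rough sketch of why the workspace argument matters, consider a
server that keys its cache on the workspace named in each request.
Everything here is hypothetical (the class, the `rules` table, the
`select` method); it only illustrates the caching argument and is not
a proposed implementation.

```python
class SelectionServer:
    """Toy server that caches version selection per workspace."""

    def __init__(self, rules):
        # rules: workspace id -> function mapping a resource name to a
        # revision id.  Stands in for an expensive selection computation.
        self._rules = rules
        self._cache = {}      # workspace id -> {resource name: revision id}
        self.evaluations = 0  # how many times a rule was actually evaluated

    def select(self, workspace_id, resource):
        # The workspace id arrives with every request, so results computed
        # for one request can be reused by later requests in that workspace.
        cache = self._cache.setdefault(workspace_id, {})
        if resource not in cache:
            self.evaluations += 1
            cache[resource] = self._rules[workspace_id](resource)
        return cache[resource]
```

Without a workspace in the request there is no stable key for such a
cache, so every request would have to redo the selection from scratch.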

   Similarly, (as Geoff mentions) a simple browsing client
   can just allow 'managers' to browse the various namespaces for
   administrative purposes or build a way of navigating related changes
   (i.e., activities) and user tasks (i.e., workspaces) for what is often
   referred to as 'change tracking and management'.

Yes, the metadata being proposed (especially activities) were carefully
designed to facilitate key out-of-scope tasks like "change tracking".

   NOTE: some argue that even versioned resources should be implemented
   as based on an 'artifactual' system.

What is "an implementation that is based on an artifactual system"?

   If that implementation strategy
   was chosen by some of the implementers, even versioned-resources data
   in the repository becomes a namespace.

Versioned resources have URLs, and revisions have URLs, so they are
visible in the URL namespace.  But this is a protocol commitment, not an
implementation strategy.  Probably I misunderstood your point.

   From watching the discussion
   in the DAV list I have a feeling that only a few implementers think
   this way about versioned resources, therefore I mention this point in
   a note rather than to substantiate the main point of supporting having
   a repository specification just based on orthogonality and
   extensibility.

Just to reinforce my earlier point, orthogonality does not imply that
you can have just one without the other.  Sleeping and eating
are orthogonal issues, but that doesn't mean that you
can choose to just sleep and never eat, or vice versa.

   I think that my proposal is different from Geoff's (am I right
   Geoff?). I think he wanted to have a repository (as in a database)
   specified as part of the protocol for mostly 'administrative' uses. I
   think I am actually extending his idea to have the repository as a set
   of namespaces that can be used by DELTA-V clients in an orthogonal
   fashion to perform their function, not just use it for administrative
   UI purposes.

I'd have to see more specifically what "extending a repository to
be a set of namespaces" means, but if it means that a repository performs
all the functions of a workspace, a configuration, and an activity, then
that would be a very different proposal.

   PS: Also, if the advanced collections is specified correctly, these
   'namespace service providers' can be implemented using advanced
   collections. Our implementation is based on such an architecture.

We need to be a bit careful with the term "implementation".  I believe
you are talking about how the protocol is layered above existing
protocols, and I definitely agree that collections (and especially,
advanced collections) are the constructs that should be used to define
the namespace protocol.

But we need to be very careful to distinguish how the protocol is
layered above existing protocols from how the protocol is
implemented on a server.  In particular, the chance that a server
that is designed to scale will be able to reuse its general
collection implementation for a more sophisticated construct like a
"configuration" is vanishingly small.  This means that the
server will need a separate configuration implementation, at which
point any methods required by the collection protocol that are
not really needed for the configuration protocol are actually a
*burden* on the server implementor, not a benefit.

Cheers,
Geoff