Message-ID: <65B141FB11CCD211825700A0C9D609BC01D4D719@chef.lex.rational.com>
From: "Clemm, Geoff" <gclemm@Rational.Com>
To: ietf-dav-versioning@w3.org
Date: Thu, 24 Feb 2000 10:30:19 -0500
Subject: RE: Units of Work
One of the problems with attempting a "bulk update" of a collection
is that you cannot distinguish a "MOVE" from a "COPY/DELETE". A MOVE
just gives a new name to a versioned resource, and changes made at
the new location are added to the history of that same versioned
resource. A COPY creates a new versioned resource with an empty history.
So I agree that the problem is real, but coming up with an interoperable
protocol to deal with the problem is hard.
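To make the ambiguity concrete, here is a minimal Python sketch. The in-memory model (`VersionedResource`, `move`, `copy_delete`, `bulk_view`) is hypothetical and not part of any Delta-V draft. It only shows that a bulk update carrying member names and contents looks identical after either operation, while the histories on the server differ.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedResource:
    content: str                                  # current content
    history: list = field(default_factory=list)   # earlier versions, oldest first


def move(coll, src, dst):
    # MOVE: the same versioned resource under a new name, so its history survives.
    coll[dst] = coll.pop(src)


def copy_delete(coll, src, dst):
    # COPY then DELETE: a new versioned resource with the same content
    # but an empty history; the original (and its history) is removed.
    coll[dst] = VersionedResource(coll[src].content)
    del coll[src]


def bulk_view(coll):
    # All a hypothetical "bulk update" of the collection carries: names and content.
    return {name: res.content for name, res in coll.items()}


a = {"old.txt": VersionedResource("v2", ["v1"])}
b = {"old.txt": VersionedResource("v2", ["v1"])}
move(a, "old.txt", "new.txt")
copy_delete(b, "old.txt", "new.txt")

print(bulk_view(a) == bulk_view(b))                 # True: same on the wire
print(a["new.txt"].history, b["new.txt"].history)   # ['v1'] []
```

A server receiving only the final name/content map has no way to choose between the two outcomes, which is why an interoperable bulk update is hard.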
Just as an aside, the user-visible notion of a "unit of work" is captured
by the notion of an activity. This is very different from the
user-invisible implementation issues that David is raising.
Cheers,
Geoff
-----Original Message-----
From: David.Goodenough@dga.co.uk [mailto:David.Goodenough@dga.co.uk]
Sent: Thursday, February 24, 2000 6:51 AM
To: ietf-dav-versioning@w3.org
Subject: Units of Work
I am new to this list, so please forgive me if I charge in with only a
cursory read of all that has gone before (there is a lot of it to read).
I come from what is probably a slightly odd background compared to most of
you, in that I am not from a vendor of document management or version
control software, and I have not been involved in efforts like RCS and CVS
or any of their free software descendants. I do work in a software house,
but we are users of such code. We recently hit the problem of integrating
two IBM products, VisualAge for Java (VAJ) and Team Connection (TC). These
can talk to each other using the SCC API, but we are not a Windows shop (we
used to use OS/2 and are moving to Linux) and so that option was not open
to us. TC had just introduced an XML interface to allow access into the
repository, and as VAJ allows me to build tools which access its repository
I built a bridge between the two environments. This bridge is available
(for free) from our web site should anyone be interested. It is my
intention to build a new bridge which will connect VAJ to Delta-V/WebDAV.
Coming at the problem afresh, and to a degree influenced by what the TC and
VAJ interfaces allowed, I constructed a rather different bridge from those
that seemed to exist before, in that it concentrates on what are in
Delta-V terms Workspaces, rather than on individual items such as source or
object files. Thus the user only had to concentrate on the work item in
hand (the workspace), and all the relevant objects were fetched or saved for
them. I take a rather simple approach to users (including myself), in that
I do not trust them to do anything other than simple things (just as I do
not trust them to back up their PCs; I arrange to have the servers backed
up for them), and frankly, as a user, I only want to do simple things; that
way I do not make mistakes.
TC provided a means whereby I could fetch and save multiple related items
with a single request. I think this is important not only because it
corresponds to the unit that the user is working with, but also because
HTTP is stateless, so the only unit of work it recognises is an individual
request. This was done using a compound XML datastream, not unlike the
"multistatus" sets that Delta-V already has. Two compound objects,
"multistore" and "resultset", are available. These allow all the members of
a workspace (a workarea in TC terms) to be fetched or saved as a single
unit. Such an approach not only works with stateless HTTP, but also matches
the user's perception of what they are doing. Additionally, it solves
several "Unit of Work" problems when failures occur.
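A minimal sketch of what such a compound stream could look like, using Python's standard `xml.etree.ElementTree`. Only the element name "multistore" comes from TC. The `workarea` attribute, the `part` element, its `name` attribute, and both helper functions are assumptions for illustration, not TC's actual schema.

```python
import xml.etree.ElementTree as ET


def build_multistore(workarea, parts):
    # Hypothetical layout: one <part> per member, so the whole workarea
    # travels in a single request body.
    root = ET.Element("multistore", {"workarea": workarea})
    for name, body in sorted(parts.items()):
        part = ET.SubElement(root, "part", {"name": name})
        part.text = body
    return ET.tostring(root, encoding="unicode")


def parse_multistore(xml_text):
    # Inverse of build_multistore: recover the workarea name and its parts.
    # An empty part serialises as <part ... />, which parses back with text None.
    root = ET.fromstring(xml_text)
    parts = {p.get("name"): p.text or "" for p in root.findall("part")}
    return root.get("workarea"), parts


body = build_multistore("wa-42", {"Main.java": "class Main {}", "README": ""})
print(body)
print(parse_multistore(body))
```

Because the server receives every member in one body, it can treat the save as a single unit of work rather than a series of independent requests.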
The Unit of Work problem only really arises when you are considering
bridge code like mine; it does not arise in the same way when there is a
user at the client who can spot what is going on and take remedial action.
Recovering from a situation where some updates have occurred and others
have not is a seriously non-trivial problem, and will involve lots of
queries to the server to find out the state of the system so that the
difference can be worked out and the relevant retries built. If the request
is a single unit, it either happens or it does not, and life is so much
easier. This of course assumes (as I believe I should) that the weak link
in the chain is the communications part, not the client or the server.
Even if the failure is in the server, databases that might be used as
repositories offer transaction commit/rollback, and journaling file systems
are now available too, so those problems are easily contained.
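On the server side, a single request can map onto a single database transaction. This is a minimal sketch with Python's built-in `sqlite3`. The `parts` table and the `store_all` function are hypothetical stand-ins for a repository, and the failure part-way through is simulated with a missing part body.

```python
import sqlite3


def store_all(conn, parts):
    """Store every part of a workspace, or none of them."""
    # Used as a context manager, the connection commits if the block
    # succeeds and rolls back if an exception escapes it.
    with conn:
        for name, body in parts.items():
            if body is None:
                raise ValueError(f"no content for {name}")
            conn.execute(
                "INSERT OR REPLACE INTO parts(name, body) VALUES (?, ?)",
                (name, body),
            )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parts(name TEXT PRIMARY KEY, body TEXT)")

store_all(conn, {"A.java": "a1", "B.java": "b1"})
try:
    # A.java is written first, then B.java fails: the whole request is undone.
    store_all(conn, {"A.java": "a2", "B.java": None})
except ValueError:
    pass

print(conn.execute("SELECT name, body FROM parts ORDER BY name").fetchall())
# [('A.java', 'a1'), ('B.java', 'b1')]
```

The client never has to work out which of its updates landed: after a failed request the repository is exactly as it was before.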
Currently the WebDAV/Delta-V protocols do not define GET and PUT/POST
operations that transfer the contents of a collection, and it strikes me
that adding such functionality would solve this problem. For a collection
a "partset" compound XML stream would be required, and it would always work
with the whole of the list of parts in a workspace. This implies that a
part which no longer exists is implicitly deleted, and that new parts are
created dynamically using information in the datastream. The datastream
would therefore have to include enough information to allow objects to be
created with the proper attributes should the need arise. I believe that
individual part creation may not be obvious to users (say they add a new
picture to an HTML document; it is an oddity of HTML compared with other
word processing formats that the picture is actually a separate file, not
part of the single document), and that such details must be masked from
them.
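The implicit create/delete rule can be sketched as a simple reconciliation. In this hypothetical Python sketch a "partset" is just a dict from part name to a (content type, body) pair, standing in for whatever attributes the real datastream would carry; `apply_partset` is an illustrative name, not a defined operation.

```python
def apply_partset(workspace, partset):
    """Make `workspace` match `partset` exactly.

    Returns the names that were created, updated and (implicitly) deleted.
    """
    old, new = set(workspace), set(partset)
    created = sorted(new - old)
    deleted = sorted(old - new)
    updated = sorted(n for n in old & new if workspace[n] != partset[n])
    for name in deleted:
        del workspace[name]           # absent from the partset: delete it
    workspace.update(partset)         # create new parts, overwrite changed ones
    return created, updated, deleted


ws = {
    "index.html": ("text/html", "<p>Hello</p>"),
    "notes.txt": ("text/plain", "draft"),
}
ps = {
    "index.html": ("text/html", '<p>Hello <img src="pic.png"></p>'),
    "pic.png": ("image/png", "...image bytes..."),
}
print(apply_partset(ws, ps))  # (['pic.png'], ['index.html'], ['notes.txt'])
```

The user only edited the HTML page; the new picture file is created for them as a side effect. Note that a renamed part shows up here only as one deletion plus one creation, which is the MOVE versus COPY/DELETE ambiguity raised in the reply above.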
Such a system could be implemented as a filter servlet in front of a
WebDAV/Delta-V server, but then you would lose some of the "Unit of Work"
benefits: even with a DB-based repository, the filter could not back out
partially completed work. This is only a problem in the case of server
failure, however; you still gain the communications resilience.
I realise that replication of WebDAV/Delta-V servers is not currently being
addressed, but I feel that it is needed and that a solution of this form
will be useful. In particular, even if full replication is not attempted,
disconnected working is required. I work from home much of the time, and
the ability to check out the whole of a job of work in one go, and then to
replace it when I have finished, would be extremely useful; the whole world
is not (yet) permanently connected. This means that this kind of solution
is required even if integrated products such as VAJ are not being used.
As an aside, and I realise that it is complicating this note and bringing
in items from other threads, one of the other things TC used the resultset
object for was queries. The one I use most frequently is (in logical terms):
give me a list of all the workareas that are assigned to me and are open.
This gives me a resultset which describes each workarea, so that I can
present a list to the user and ask which they want fetched (saving is done
using information that I saved when I fetched the workarea). This is a
very simple user interface, and one which precisely answers the problem in
hand, as that is what the user is concerned with.
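The workarea query reduces to a filter over a result set followed by a simple menu. A minimal sketch, assuming a hypothetical `Workarea` record with `owner` and `state` fields; TC's real resultset format is not shown in this note, and the state values used here are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Workarea:
    name: str
    owner: str
    state: str   # e.g. "open" or "closed"; values are illustrative only


def my_open_workareas(resultset, user):
    # "All the workareas that are assigned to me and are open."
    return [w for w in resultset if w.owner == user and w.state == "open"]


resultset = [
    Workarea("fix-login", "alice", "open"),
    Workarea("new-report", "bob", "open"),
    Workarea("old-bug", "alice", "closed"),
]
# Present the matching workareas as a numbered menu for the user to pick from.
for i, w in enumerate(my_open_workareas(resultset, "alice"), start=1):
    print(f"{i}. {w.name}")
```

The user is shown only the work items they care about; the chosen entry would then drive a single fetch of that whole workarea.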
In summary, I believe that this is a real problem, and that connecting
products such as VAJ, which (currently) has its own repository, to a
WebDAV/Delta-V server will need to be done, along with supporting
disconnected working. This proposal would make such integration easier to
use, easier to implement and much more resilient.