RE: Units of Work

From: David.Goodenough@dga.co.uk
Date: Thu, Feb 24 2000

Next message: jamsden@us.ibm.com: "RE: Enumerating repositories and workspaces"

Previous message: Tim Ellison OTT: "RE: Enumerating repositories and worksp"
Maybe in reply to: David.Goodenough@dga.co.uk: "Units of Work"
Next in thread: Clemm, Geoff: "RE: Units of Work"

From: David.Goodenough@dga.co.uk
To: ietf-dav-versioning@w3.org
Message-ID: <8025688F.006057E7.00@mail.dga.co.uk>
Date: Thu, 24 Feb 2000 17:44:37 +0000
Subject: RE: Units of Work

I am not sure that the MOVE versus COPY/DELETE problem is as bad as you
say.  With the TC system that I implemented (sorry, it's the only one I have
to go on), when I fetched the parts I also fetched their attributes.  These
included an internal object ID and type information, along with last update
times etc.  When I return the objects this information is returned with
them.  For new objects, where I do not have this information, a basic set
is returned which indicates the type (text or binary in this case, decided
by inspecting the data of the object) and asks the server to allocate it an
ID.  At that point the object's user-visible name becomes irrelevant, because
the server identifies it by its ID.

Now I appreciate that this is not required when handling individual object
GET/PUT/POST operations, because there you can now have the side
information, and the user has to go through the individual operations.
However, for bulk operations, or for versioning-aware clients, this is no
problem to handle.
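A minimal Python sketch of the ID-based reconciliation described above, assuming the client recorded a mapping from server-allocated object ID to name at fetch time. The `Part` type, `classify_changes` and `guess_kind` are hypothetical names for illustration, not part of TC or Delta-V, and the text/binary test is only a simple heuristic.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Part:
    object_id: Optional[str]  # server-allocated ID; None for a part created locally
    name: str
    kind: str                 # "text" or "binary"


def guess_kind(data: bytes) -> str:
    """Heuristic (an assumption): NUL bytes or invalid UTF-8 mean binary."""
    if b"\x00" in data:
        return "binary"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "binary"
    return "text"


def classify_changes(fetched: Dict[str, str], returned: List[Part]) -> dict:
    """Compare the ID -> name map saved at fetch time with the parts returned.

    Same ID under a new name is a MOVE (history is kept); a part without an
    ID is a new resource; an ID that no longer appears is a DELETE.
    """
    moves: List[Tuple[str, str]] = []
    creates: List[str] = []
    seen = set()
    for part in returned:
        if part.object_id is None:
            creates.append(part.name)  # the server must allocate an ID for it
            continue
        if part.object_id not in fetched:
            raise ValueError(f"unknown object ID {part.object_id!r}")
        seen.add(part.object_id)
        old_name = fetched[part.object_id]
        if old_name != part.name:
            moves.append((old_name, part.name))
    deletes = [name for oid, name in fetched.items() if oid not in seen]
    return {"moves": moves, "creates": creates, "deletes": deletes}
```

With `fetched = {"id1": "a.java", "id2": "b.java"}` and a returned set of `Part("id2", "b2.java", "text")` plus one part with no ID, `b.java` is reported as moved to `b2.java` rather than as a delete plus a create, so its history can be kept; `a.java`, now absent, is reported as deleted.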

In the case of VAJ I kept the information with each project (which
corresponded to a WorkArea in TC terms) in a space provided by VAJ for side
information.  I also implemented a directory fetch/save system, so that I
could use the same WorkArea-based approach for directories of files.  There
I created two additional files: one holding this side information in binary
form, and the other holding the history of the WorkArea.  In the TC case
that history included the text of the defect/feature request that had
created the area, the history of who had done what to this request, and any
notes that had been attached to it along the way.

I am sorry about the workspace/activity misunderstanding.  I suspect that
the documentation is going to need to be a lot clearer about the user's view
of the system versus the client and server internal views.

David

"Clemm, Geoff" <gclemm@rational.com> on 24-02-2000 03:30:19 PM

To:   ietf-dav-versioning@w3.org
cc:    (bcc: David Goodenough/DGA/GB)
Subject:  RE: Units of Work

One of the problems with attempting a "bulk update" of a collection
is that you cannot distinguish a "MOVE" from a "COPY/DELETE".  A MOVE
simply gives a new name to a versioned resource, and changes made at
the new location are simply added to the history of that versioned
resource.  A COPY creates a new versioned resource with an empty history.

So I agree that the problem is real, but coming up with an interoperable
protocol to deal with the problem is hard.

Just as an aside, the user-visible notion of a "unit of work" is captured
by the notion of an activity.  This is very different from the
user-invisible implementation issues that David is raising.

Cheers,
Geoff

-----Original Message-----
From: David.Goodenough@dga.co.uk [mailto:David.Goodenough@dga.co.uk]
Sent: Thursday, February 24, 2000 6:51 AM
To: ietf-dav-versioning@w3.org
Subject: Units of Work

I am new to this list, so please forgive me if I charge in with only a
cursory read of all that has gone before (there is a lot of it to read).

I come from what is probably a slightly odd background compared to most of
you, in that I am not from a vendor of document management or version
control software, and I have not been involved in efforts like RCS and CVS
or any of their free software descendants.  I do work in a software house,
but we are users of such code.  We recently hit the problem of integrating
two IBM products, VisualAge for Java (VAJ) and Team Connection (TC).  These
can talk to each other using the SCC API, but we are not a Windows shop (we
used to use OS/2 and are moving to Linux) and so that option was not open
to us.  TC had just introduced an XML interface to allow access into the
repository, and as VAJ allows me to build tools which access its repository
I built a bridge between the two environments.  This bridge is available
(for free) from our web site should anyone be interested.  It is my
intention to build a new bridge which will connect VAJ to Delta-V/WebDAV.

Coming at the problem afresh, and to a degree influenced by what the TC and
VAJ interfaces allowed, I constructed a rather different bridge to those
that seemed to exist before, in that it concentrates on what are in
Delta-V terms Workspaces, rather than individual items such as source or
object files.  Thus the user only had to concentrate on the work item in
hand (the workspace), and all the relevant objects were fetched or saved for
them.  I take a rather simple approach to users (including myself), in that
I do not trust them to do other than simple things (for example, I do not
trust them to back up their PCs, so I arrange to have the servers backed up
for them), and frankly as a user I only want to do simple things; that way
I do not make mistakes.

TC provided a means where I could fetch and save multiple related items
with a single request, which I think is important not only because it
corresponds to the unit that the user is using, but also as HTTP is
stateless the only unit of work it recognises is an individual request.
This was done by using a compound XML datastream, not unlike the
"multistatus" sets that Delta-V already has.  Two compound objects,
"multistore" and "resultset", are available.  These allow all the members of
a workspace (workarea in TC terms) to be fetched or saved as a single unit.

Such an approach not only works with the stateless nature of HTTP, but
also matches the users' perception of what they are doing.  Additionally,
it solves several "Unit of Work" problems when failures occur.

The Unit of Work problem only really arises when you are considering
bridge code like mine; it obviously does not arise if you have a user
client who can spot what is going on and take remedial action.  Recovering
from a situation where some updates have occurred and others have not is a
seriously non-trivial problem, and will involve lots of queries to the
server to find out the state of the system so that the difference can be
worked out and the relevant retries built.  If the request is a single
unit, it either happens or it does not, and life is so much easier.  This
of course assumes (as I believe I should) that the weak link in the chain
is the communications part, not the client or the server.  Even if it is
the server, databases which might be used as repositories have transaction
commit/rollback, and there are now journaling file systems too, so those
problems are easily contained.
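On the server side, treating a whole request as one unit can be delegated to a database transaction, as suggested above. A minimal sketch, using Python's built-in `sqlite3` module as a stand-in repository; the `parts` table, `make_repo` and `save_workspace` are hypothetical. The connection's context manager commits if the block succeeds and rolls back if an exception escapes, so a failed save leaves nothing half-applied.

```python
import sqlite3
from typing import Dict, List


def make_repo() -> sqlite3.Connection:
    """Hypothetical repository: one table of named parts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE parts (name TEXT PRIMARY KEY, body BLOB NOT NULL)")
    conn.commit()
    return conn


def save_workspace(conn: sqlite3.Connection,
                   puts: Dict[str, bytes], deletes: List[str]) -> None:
    """Apply all puts and deletes from one request, or none of them."""
    with conn:  # commit on success, rollback if an exception escapes
        for name, body in puts.items():
            conn.execute(
                "INSERT OR REPLACE INTO parts (name, body) VALUES (?, ?)",
                (name, body),
            )
        for name in deletes:
            cur = conn.execute("DELETE FROM parts WHERE name = ?", (name,))
            if cur.rowcount == 0:
                # Any failure aborts the whole unit, including earlier puts.
                raise KeyError(f"cannot delete missing part {name!r}")
```

If a delete in a call fails, the puts made earlier in that same call are undone too, so the client only ever has to decide whether to retry the whole request.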

Currently in the WebDAV/Delta-V protocols the GET and PUT/POST operations
are explicitly forbidden for collections, and it strikes me that adding
such functionality would solve this problem.  For a collection a "partset"
compound XML stream would be required, and it would always work with the
whole of the list of parts in a workspace.  This implies that if a part no
longer exists implicit deletions occur, and that new parts are created
dynamically using information in the datastream.  This says that the
datastream would have to include enough information to allow objects to be
created with the proper attributes should the need arise.  I believe that
individual part creation may not be obvious to users (say they add a new
picture to an HTML document - it is an oddity of HTML compared to other
word processing formats that this is actually a separate file, not part of
the single document) and that such details must be masked from them.
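A minimal sketch of the proposed "partset" stream and its whole-list semantics, using Python's `xml.etree.ElementTree`. The element and attribute names (`partset`, `part`, `name`, `kind`) and the base64 encoding of part bodies are assumptions for illustration; no such format is defined by WebDAV or Delta-V. Because the stream always lists the whole workspace, anything on the server but absent from the stream is implicitly deleted, and anything new is created using the attributes carried in the stream.

```python
import base64
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# name -> (kind, body); the kind is what the server needs to create a new part
Workspace = Dict[str, Tuple[str, bytes]]


def build_partset(parts: Workspace) -> str:
    """Serialise the complete part list of a workspace (hypothetical format)."""
    root = ET.Element("partset")
    for name, (kind, body) in sorted(parts.items()):
        el = ET.SubElement(root, "part", name=name, kind=kind)
        el.text = base64.b64encode(body).decode("ascii")
    return ET.tostring(root, encoding="unicode")


def apply_partset(current: Workspace,
                  xml_text: str) -> Tuple[Workspace, List[str], List[str]]:
    """Return the new workspace plus the parts implicitly created and deleted."""
    root = ET.fromstring(xml_text)
    new_state: Workspace = {}
    for el in root.findall("part"):
        body = base64.b64decode(el.text or "")
        new_state[el.get("name")] = (el.get("kind"), body)
    created = sorted(set(new_state) - set(current))
    deleted = sorted(set(current) - set(new_state))
    return new_state, created, deleted
```

In the picture example, the new image simply shows up in `created`: the user saved the document as a whole, and the extra part was created from the information in the stream.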

Such a system could be implemented as a filter servlet in front of a
WebDAV/Delta-V server, but then you would lose some of the "Unit of Work"
benefits: in the case of a DB-based repository, for example, the filter
could not back out partially completed work.  This is only a problem in the
case of server failure, however; you still gain the communications resilience.

I realise that replication of WebDAV/Delta-V servers is not currently being
addressed, but I feel that it is needed and that a solution of this form
will be useful.  In particular even if full replication is not attempted,
disconnected working is required.  I work from home much of the time, and
the ability to check out the whole of a job of work in one go, and then to
replace it when I have finished would be extremely useful - the whole world
is not (yet) permanently connected.  This means that this kind of solution
is required even if integrated products such as VAJ are not being used.

As an aside, and I realise that it is complicating this note and bringing
in items from other threads, one of the other things TC used the resultset
object for was queries.  The one I use most frequently is (in logical terms):
give me a list of all the workareas that are assigned to me and are open.
This gives me a resultset which describes each workarea, so that I can
present a list to the user and ask which they want fetched (saving is done
using information that I saved when I fetched the workarea).  This is a
very simple user interface, and one which precisely answers the problem in
hand as that is what the user is concerned with.
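The logical query above can be sketched as a predicate over a resultset. The XML shape (a `resultset` containing `workarea` elements with `name`, `owner` and `state` attributes) is hypothetical and is not TC's actual format; it only illustrates that the client receives one description per workarea and can present the matching names to the user.

```python
import xml.etree.ElementTree as ET
from typing import List


def open_workareas_for(resultset_xml: str, user: str) -> List[str]:
    """Names of workareas assigned to `user` that are still open."""
    root = ET.fromstring(resultset_xml)
    return [
        wa.get("name")
        for wa in root.iter("workarea")
        if wa.get("owner") == user and wa.get("state") == "open"
    ]


# Hypothetical sample resultset for illustration only.
SAMPLE = """<resultset>
  <workarea name="defect-101" owner="david" state="open"/>
  <workarea name="feature-7" owner="david" state="closed"/>
  <workarea name="defect-102" owner="geoff" state="open"/>
</resultset>"""
```

Here `open_workareas_for(SAMPLE, "david")` returns `["defect-101"]`, the list the user would be asked to choose from before the fetch.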

In summary, I believe that this is a real problem, and that connecting
products such as VAJ, which (currently) has its own repository, to a
WebDAV/Delta-V server will need to be done, along with disconnected working.
This proposal would make such integration easier to use, easier to
implement and much more resilient.
