RE: Units of Work

From: Clemm, Geoff (gclemm@Rational.Com)
Date: Thu, Feb 24 2000


    Message-ID: <65B141FB11CCD211825700A0C9D609BC0205B014@chef.lex.rational.com>
    From: "Clemm, Geoff" <gclemm@Rational.Com>
    To: ietf-dav-versioning@w3.org
    Date: Thu, 24 Feb 2000 21:57:14 -0500
    Subject: RE: Units of Work
    
    I agree that the inclusion of resource-id's with the
    bulk update would address the issue of MOVE vs. COPY/DELETE.
    If you could write up what extensions you would like to
    see to the protocol (i.e. methods, properties, headers, etc.)
    to support what you have in mind, that would be great.
    
    Cheers,
    Geoff
    
    > -----Original Message-----
    > From: David.Goodenough@dga.co.uk [mailto:David.Goodenough@dga.co.uk]
    > Sent: Thursday, February 24, 2000 12:45 PM
    > To: ietf-dav-versioning@w3.org
    > Subject: RE: Units of Work
    > 
    > 
    > I am not sure that the MOVE versus COPY/DELETE problem is as 
    > bad as you
    > say.  With the TC system that I implemented (sorry, it's the 
    > only one I have
    > to go on) when I fetched the parts I also fetched their 
    > attributes.  This
    > included an internal object ID, and type information along 
    > with last update
    > times etc.  When I return the objects this information is 
    > returned with
    > them, or in the case of new objects where I do not have the 
    > information a
    > basic set is returned which indicates the type (text or 
    > binary in this case
    > by inspecting the data of the object) and requesting the 
    > server to allocate
    > it an ID.  Then its user name becomes irrelevant.
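A minimal sketch of the approach described above, assuming each returned part carries the object ID it was fetched with and that a new part carries no ID. The function name `classify_bulk_update` and the data shapes are hypothetical and are not part of TC, WebDAV, or Delta-V.

```python
def classify_bulk_update(fetched, returned):
    """Compare a fetched snapshot with the parts a client sends back.

    fetched:  dict mapping object ID -> name at fetch time
    returned: list of (object_id_or_None, name) pairs
    """
    moved, unchanged, created = [], [], []
    seen = set()
    for object_id, name in returned:
        if object_id is None:
            # No ID supplied: the server is asked to allocate one.
            created.append(name)
            continue
        if object_id not in fetched:
            raise ValueError(f"unknown object ID: {object_id!r}")
        seen.add(object_id)
        if fetched[object_id] == name:
            unchanged.append(name)
        else:
            # Same ID under a new name: a MOVE, so the history is kept.
            moved.append((fetched[object_id], name))
    # IDs that were fetched but never returned are implicit deletions.
    deleted = sorted(n for oid, n in fetched.items() if oid not in seen)
    return {"moved": moved, "unchanged": unchanged,
            "created": created, "deleted": deleted}
```

With the ID present, a rename shows up under `moved`. If the client drops the ID, the same change shows up as one entry in `created` plus one in `deleted`, which is the COPY/DELETE reading.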
    > 
    > Now I appreciate that this is not required when handling 
    > individual object
    > Get/Put-Post operations, because there you cannot have the side
    > information, and the user has to go through the individual operations.
    > However for bulk operations, or for versioning aware clients 
    > this is no
    > problem to handle.
    > 
    > In the case of VAJ I kept the information with each project (which
    > corresponded to a WorkArea in TC terms) in a space provided 
    > by VAJ for side
    > information.  In the case of a directory fetch/save system that I also
    > implemented so that I could use this same WorkArea based approach for
    > directories of files I created two additional files, one with binary
    > information that was this side information, and the other 
    > with the history
    > of the WorkArea - which in the TC case included the text of the
    > defect/feature request that had created the area and the 
    > history of who had
    > done what to this request and any notes that had been 
    > attached to it along
    > the way.
    > 
    > I am sorry about the workspace/activity misunderstanding - I 
    > suspect that
    > the documentation is going to need to be a lot clearer about the 
    > user's view
    > of the system versus the client and server internal views.
    > 
    > David
    > 
    > 
    > 
    > 
    > 
    > "Clemm, Geoff" <gclemm@rational.com> on 24-02-2000 03:30:19 PM
    > 
    > To:   ietf-dav-versioning@w3.org
    > cc:    (bcc: David Goodenough/DGA/GB)
    > Subject:  RE: Units of Work
    > 
    > 
    > 
    > 
    > One of the problems with attempting a "bulk update" of a collection
    > is that you cannot distinguish a "MOVE" from a "COPY/DELETE".  A MOVE
    > simply gives a new name to a versioned resource, and changes made at
    > the new location are simply added to the history of that versioned
    > resource.  A COPY creates a new versioned resource with an 
    > empty history.
    > 
    > So I agree that the problem is real, but coming up with an 
    > interoperable
    > protocol to deal with the problem is hard.
    > 
    > Just as an aside, the user visible notion of a "unit of work" 
    > is captured
    > by the notion of an activity.  This is very different from the user
    > invisible
    > implementation issues that David is raising.
    > 
    > Cheers,
    > Geoff
    > 
    > -----Original Message-----
    > From: David.Goodenough@dga.co.uk [mailto:David.Goodenough@dga.co.uk]
    > Sent: Thursday, February 24, 2000 6:51 AM
    > To: ietf-dav-versioning@w3.org
    > Subject: Units of Work
    > 
    > 
    > I am new to this list, so please forgive me if I charge in with only a
    > cursory read of all that has gone before (there is a lot of 
    > it to read).
    > 
    > I come from what is probably a slightly odd background 
    > compared to most of
    > you, in that I am not from a vendor of document management or version
    > control software, and I have not been involved in efforts 
    > like RCS and CVS
    > or any of their free software descendants.  I do work in a 
    > software house,
    > but we are users of such code.  We recently hit the problem 
    > of integrating
    > two IBM products, VisualAge for Java (VAJ) and Team 
    > Connection (TC).  These
    > can talk to each other using the SCC API, but we are not a 
    > Windows shop (we
    > used to use OS/2 and are moving to Linux) and so that option 
    > was not open
    > to us.  TC had just introduced an XML interface to allow 
    > access into the
    > repository, and as VAJ allows me to build tools which access 
    > its repository
    > I built a bridge between the two environments.  This bridge 
    > is available
    > (for free) from our web site should anyone be interested.  It is my
    > intention to build a new bridge which will connect VAJ to 
    > Delta-V/WebDAV.
    > 
    > Coming at the problem afresh, and to a degree influenced by 
    > what the TC and
    > VAJ interfaces allowed, I constructed a rather different 
    > bridge to those
    > that seemed to exist before, in that they concentrate on what are in
    > Delta-V terms Workspaces, rather than individual items such 
    > as source or
    > object files.  Thus the user only had to concentrate on the 
    > work item in
    > hand (the workspace) and all the relevant objects were 
    > fetched or saved for
    > them.  I take a rather simple approach to users (including 
    > myself) in that
    > I do not trust them to do other than simple things (like I do 
    > not trust
    > them to back up their PCs, I arrange to have the servers backed 
    > up for them)
    > and frankly as a user I only want to do simple things, that 
    > way I do not
    > make mistakes.
    > 
    > TC provided a means where I could fetch and save multiple 
    > related items
    > with a single request, which I think is important not only because it
    > corresponds to the unit that the user is using, but also as HTTP is
    > stateless the only unit of work it recognises is an 
    > individual request.
    > This was done by using a compound XML datastream, not unlike the
    > "multistatus" sets that Delta-V already has.  Two compound objects,
    > "multistore" and "resultset" are available.  These allow all 
    > the members of
    > a workspace (workarea in TC terms) to be fetched or saved as 
    > a single unit.
    > 
    > Such an approach not only works with the HTTP stateless 
    > approach, but also
    > matches the user's perception of what they are doing.  Additionally it
    > solves several "Unit of Work" problems when failures occur.
    > 
    > The Unit of Work problem only really arises when you are considering
    > bridge code like mine, it obviously does not arise if you have a user
    > client who can spot what is going on and take remedial 
    > action.  Recovering
    > from a situation where some updates have occurred and others 
    > have not is a
    > seriously non-trivial problem, and will involve lots of queries to the
    > server to find out the state of the system so that the 
    > difference can be
    > worked out and the relevant retries built.  If the request is a single
    > unit, it either happens or it does not, and life is so much 
    > easier.  This
    > of course assumes (as I believe I should) that the weak link 
    > in the chain
    > is the communications part, not the client or the server.  
    > Even if it is
    > the server, we have transaction commit/rollback on databases 
    > which might be
    > used as repositories, and even journaling file systems now, and those
    > problems are easily containable.
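The all-or-nothing behaviour described above can be illustrated with a database transaction. This is only a sketch: it assumes a repository backed by SQLite, and the `parts` table and the `save_workspace` function are hypothetical names chosen for illustration.

```python
import sqlite3


def save_workspace(conn, parts):
    """Store every (name, content) pair, or none of them."""
    try:
        # The connection context manager commits if the block succeeds
        # and rolls back if any statement raises.
        with conn:
            for name, content in parts:
                conn.execute(
                    "INSERT INTO parts (name, content) VALUES (?, ?)",
                    (name, content),
                )
        return True
    except sqlite3.Error:
        return False


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE parts (name TEXT PRIMARY KEY, content TEXT NOT NULL)"
)
```

If one part in the request is rejected, none of the parts from that request remain stored, so the client never has to query the server to work out which updates went through.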
    > 
    > Currently in the WebDAV/Delta-V protocols the GET and 
    > PUT/POST operations
    > are explicitly forbidden for collections, and it strikes me 
    > that adding
    > such functionality would solve this problem.  For a 
    > collection a "partset"
    > compound XML stream would be required, and it would always 
    > work with the
    > whole of the list of parts in a workspace.  This implies that 
    > if a part no
    > longer exists implicit deletions occur, and that new parts are created
    > dynamically using information in the datastream.  This says that the
    > datastream would have to include enough information to allow 
    > objects to be
    > created with the proper attributes should the need arise.  I 
    > believe that
    > individual part creation may not be obvious to users (say 
    > they add a new
    > picture to an HTML document - it is an oddity of HTML 
    > compared to other
    > word processing formats that this is actually a separate 
    > file, not part of
    > the single document) and that such details must be masked from them.
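As a rough sketch of what such a "partset" stream might look like, the code below builds and applies one with Python's standard XML library. The element and attribute names (`partset`, `part`, `id`, `name`, `type`) and both function names are assumptions made for illustration; the message does not define a format.

```python
import xml.etree.ElementTree as ET


def build_partset(parts):
    """Serialise (object_id_or_None, name, content_type, text) tuples."""
    root = ET.Element("partset")
    for object_id, name, content_type, text in parts:
        part = ET.SubElement(root, "part", name=name, type=content_type)
        if object_id is not None:
            part.set("id", object_id)
        part.text = text
    return ET.tostring(root, encoding="unicode")


def apply_partset(workspace, xml_text):
    """Replace the whole workspace (name -> text) with the stream's parts.

    Parts absent from the stream are deleted implicitly, and parts that
    are new to the workspace are created from the stream's contents.
    """
    incoming = {p.get("name"): (p.text or "")
                for p in ET.fromstring(xml_text)}
    deleted = sorted(set(workspace) - set(incoming))
    created = sorted(set(incoming) - set(workspace))
    workspace.clear()
    workspace.update(incoming)
    return created, deleted
```

Because the stream always describes the complete list of parts, the server can derive deletions and creations by comparing names, without the user issuing a separate request for each file.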
    > 
    > Such a system could be implemented as a filter servlet in front of a
    > WebDAV/Delta-V server, but then you would lose some of the 
    > "Unit of Work"
    > benefits, as with a DB based repository the filter could not 
    > achieve backout
    > of the partially completed work.  This is only a problem 
    > in the case of
    > server failure, though; you still gain the communications resilience.
    > 
    > I realise that replication of WebDAV/Delta-V servers is not 
    > currently being
    > addressed, but I feel that it is needed and that a solution 
    > of this form
    > will be useful.  In particular even if full replication is 
    > not attempted,
    > disconnected working is required.  I work from home much of 
    > the time, and
    > the ability to check out the whole of a job of work in one 
    > go, and then to
    > replace it when I have finished would be extremely useful - 
    > the whole world
    > is not (yet) permanently connected.  This means that this 
    > kind of solution
    > is required even if integrated products such as VAJ are not 
    > being used.
    > 
    > As an aside, and I realise that it is complicating this note 
    > and bringing
    > in items from other threads, one of the other things TC used 
    > the resultset
    > object for was queries.  The one I use most frequently is (in 
    > logical terms) -
    > give me a list of all the workareas that are assigned to me 
    > and are open.
    > This gives me a resultset which describes each workarea, so that I can
    > present a list to the user and ask which they want fetched 
    > (saving is done
    > using information that I saved when I fetched the workarea).  
    > This is a
    > very simple user interface, and one which precisely answers 
    > the problem in
    > hand as that is what the user is concerned with.
    > 
    > In summary I believe that this is a real problem, and that connecting
    > products such as VAJ which has (currently) its own repository to a
    > WebDAV/Delta-V server will need to be done along with 
    > disconnected working.
    > This proposal would make such integration easier to use, easier to
    > implement and much more resilient.
    > 
    > 
    > 
    > 
    >