RE: Units of Work

From: David.Goodenough@dga.co.uk
Date: Thu, Feb 24 2000

    From: David.Goodenough@dga.co.uk
    To: ietf-dav-versioning@w3.org
    Message-ID: <8025688F.006057E7.00@mail.dga.co.uk>
    Date: Thu, 24 Feb 2000 17:44:37 +0000
    Subject: RE: Units of Work
    
    I am not sure that the MOVE versus COPY/DELETE problem is as bad as you
    say.  With the TC system that I implemented (sorry, it's the only one I have
    to go on) when I fetched the parts I also fetched their attributes.  This
    included an internal object ID, and type information along with last update
    times etc.  When I return the objects this information is returned with
    them, or in the case of new objects where I do not have the information a
    basic set is returned which indicates the type (text or binary in this case
    by inspecting the data of the object) and requesting the server to allocate
    it an ID.  Then its user name becomes irrelevant.
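
A minimal sketch of the ID-based matching described above, under stated assumptions: the `Part` record, the `classify` function and the example names are hypothetical and are not taken from TC. It only illustrates how a server-allocated object ID lets a bulk return tell a MOVE (same ID, new name) apart from a COPY/DELETE (no ID, so a new object).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Part:
    name: str                 # user-visible name within the work area
    object_id: Optional[int]  # server-allocated ID; None for a brand-new part


def classify(fetched: list[Part], returned: list[Part]) -> dict[str, list]:
    """Compare the parts fetched with the parts being returned, matching by ID."""
    fetched_by_id = {p.object_id: p for p in fetched}
    seen_ids = set()
    result = {"moved": [], "created": [], "deleted": [], "unchanged": []}

    for part in returned:
        if part.object_id is None:
            # No ID yet: the server must allocate one, so this is a new object.
            result["created"].append(part.name)
            continue
        seen_ids.add(part.object_id)
        original = fetched_by_id.get(part.object_id)
        if original is None:
            # Assumed policy: an unknown ID is treated as a new object.
            result["created"].append(part.name)
        elif original.name != part.name:
            # Same ID under a new name: a MOVE, so the history carries over.
            result["moved"].append((original.name, part.name))
        else:
            result["unchanged"].append(part.name)

    # Anything fetched whose ID did not come back has been deleted.
    for part in fetched:
        if part.object_id not in seen_ids:
            result["deleted"].append(part.name)
    return result


fetched = [Part("a.java", 1), Part("b.java", 2), Part("c.java", 3)]
returned = [Part("a.java", 1), Part("renamed.java", 2), Part("b.java", None)]
changes = classify(fetched, returned)
```

Here the part with ID 2 is reported as moved even though a new part has taken over its old name, which is the case that name-based comparison cannot resolve.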
    
    Now I appreciate that this is not required when handling individual object
    Get/Put-Post operations, because there you cannot have the side
    information, and the user has to go through the individual operations.
    However for bulk operations, or for versioning aware clients this is no
    problem to handle.
    
    In the case of VAJ I kept the information with each project (which
    corresponded to a WorkArea in TC terms) in a space provided by VAJ for side
    information.  In the case of a directory fetch/save system that I also
    implemented so that I could use this same WorkArea based approach for
    directories of files I created two additional files, one with binary
    information that was this side information, and the other with the history
    of the WorkArea - which in the TC case included the text of the
    defect/feature request that had created the area and the history of who had
    done what to this request and any notes that had been attached to it along
    the way.
    
    I am sorry about the workspace/activity misunderstanding - I suspect that
    the documentation is going to need to be a lot clearer about the user's view
    of the system versus the client and server internal views.
    
    David
    
    
    
    
    
    "Clemm, Geoff" <gclemm@rational.com> on 24-02-2000 03:30:19 PM
    
    To:   ietf-dav-versioning@w3.org
    cc:    (bcc: David Goodenough/DGA/GB)
    Subject:  RE: Units of Work
    
    
    
    
    One of the problems with attempting a "bulk update" of a collection
    is that you cannot distinguish a "MOVE" from a "COPY/DELETE".  A MOVE
    simply gives a new name to a versioned resource, and changes made at
    the new location are simply added to the history of that versioned
    resource.  A COPY creates a new versioned resource with an empty history.
    
    So I agree that the problem is real, but coming up with an interoperable
    protocol to deal with the problem is hard.
    
    Just as an aside, the user-visible notion of a "unit of work" is captured
    by the notion of an activity.  This is very different from the
    user-invisible implementation issues that David is raising.
    
    Cheers,
    Geoff
    
    -----Original Message-----
    From: David.Goodenough@dga.co.uk [mailto:David.Goodenough@dga.co.uk]
    Sent: Thursday, February 24, 2000 6:51 AM
    To: ietf-dav-versioning@w3.org
    Subject: Units of Work
    
    
    I am new to this list, so please forgive me if I charge in with only a
    cursory read of all that has gone before (there is a lot of it to read).
    
    I come from what is probably a slightly odd background compared to most of
    you, in that I am not from a vendor of document management or version
    control software, and I have not been involved in efforts like RCS and CVS
    or any of their free software descendants.  I do work in a software house,
    but we are users of such code.  We recently hit the problem of integrating
    two IBM products, VisualAge for Java (VAJ) and Team Connection (TC).  These
    can talk to each other using the SCC API, but we are not a Windows shop (we
    used to use OS/2 and are moving to Linux) and so that option was not open
    to us.  TC had just introduced an XML interface to allow access into the
    repository, and as VAJ allows me to build tools which access its repository
    I built a bridge between the two environments.  This bridge is available
    (for free) from our web site should anyone be interested.  It is my
    intention to build a new bridge which will connect VAJ to Delta-V/WebDAV.
    
    Coming at the problem afresh, and to a degree influenced by what the TC and
    VAJ interfaces allowed, I constructed a rather different bridge to those
    that seemed to exist before, in that it concentrates on what are in
    Delta-V terms Workspaces, rather than individual items such as source or
    object files.  Thus the user only had to concentrate on the work item in
    hand (the workspace) and all the relevant objects were fetched or saved for
    them.  I take a rather simple approach to users (including myself) in that
    I do not trust them to do other than simple things (for example, I do not
    trust them to back up their PCs; I arrange to have the servers backed up
    for them), and frankly as a user I only want to do simple things, so that
    I do not make mistakes.
    
    TC provided a means where I could fetch and save multiple related items
    with a single request, which I think is important not only because it
    corresponds to the unit that the user is using, but also because, HTTP
    being stateless, the only unit of work it recognises is an individual request.
    This was done by using a compound XML datastream, not unlike the
    "multistatus" sets that Delta-V already has.  Two compound objects,
    "multistore" and "resultset" are available.  These allow all the members of
    a workspace (workarea in TC terms) to be fetched or saved as a single unit.
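
As a rough illustration of carrying a whole work area in one request body, here is a sketch using Python's standard `xml.etree.ElementTree`. Only the element name `multistore` comes from the text above; the `part` element, the `workarea` and `name` attributes, and the sample data are assumptions made for the example and do not reproduce the real TC schema.

```python
import xml.etree.ElementTree as ET


def build_multistore(workarea: str, parts: dict[str, str]) -> str:
    """Bundle every part of a work area into one XML document (assumed layout)."""
    root = ET.Element("multistore", {"workarea": workarea})
    for name, text in sorted(parts.items()):
        part = ET.SubElement(root, "part", {"name": name})
        part.text = text
    return ET.tostring(root, encoding="unicode")


def parse_multistore(document: str) -> tuple[str, dict[str, str]]:
    """Recover the work area name and its parts from the single request body."""
    root = ET.fromstring(document)
    parts = {p.get("name"): p.text or "" for p in root.findall("part")}
    return root.get("workarea"), parts


original = {"Main.java": "class Main {}", "notes.txt": "fix null check"}
body = build_multistore("defect-1234", original)
workarea, parts = parse_multistore(body)
```

Because everything travels in one document, a single HTTP request is enough to move the whole unit in either direction.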
    
    Such an approach not only works with the HTTP stateless approach, but also
    matches the user's perception of what they are doing.  Additionally it
    solves several "Unit of Work" problems when failures occur.
    
    The Unit of Work problem only really arises when you are considering
    bridge code like mine, it obviously does not arise if you have a user
    client who can spot what is going on and take remedial action.  Recovering
    from a situation where some updates have occurred and others have not is a
    seriously non-trivial problem, and will involve lots of queries to the
    server to find out the state of the system so that the difference can be
    worked out and the relevant retries built.  If the request is a single
    unit, it either happens or it does not, and life is so much easier.  This
    of course assumes (as I believe I should) that the weak link in the chain
    is the communications part, not the client or the server.  Even if it is
    the server, we have transaction commit/rollback on databases which might be
    used as repositories, and even journaling file systems now, so those
    problems are easily containable.
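
The "it either happens or it does not" behaviour can be sketched with a database transaction. This example uses Python's standard `sqlite3` module purely as a stand-in for a database-backed repository; the `parts` table and the `save_unit` function are hypothetical, and a missing part body is used as an artificial failure.

```python
import sqlite3
from typing import Optional


def save_unit(conn: sqlite3.Connection, parts: dict[str, Optional[str]]) -> bool:
    """Store every part in one transaction; on any failure, store none."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            for name, data in parts.items():
                if data is None:
                    raise ValueError(f"part {name!r} has no content")
                conn.execute(
                    "INSERT OR REPLACE INTO parts (name, data) VALUES (?, ?)",
                    (name, data),
                )
        return True
    except (ValueError, sqlite3.Error):
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parts (name TEXT PRIMARY KEY, data TEXT)")
conn.commit()

ok = save_unit(conn, {"a.txt": "one", "b.txt": "two"})
failed = save_unit(conn, {"a.txt": "changed", "c.txt": None})
rows = dict(conn.execute("SELECT name, data FROM parts ORDER BY name"))
```

The second call fails part-way through, so its change to `a.txt` is rolled back and the client has nothing to untangle: it simply retries the whole unit.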
    
    Currently in the WebDAV/Delta-V protocols the GET and PUT/POST operations
    are explicitly forbidden for collections, and it strikes me that adding
    such functionality would solve this problem.  For a collection a "partset"
    compound XML stream would be required, and it would always work with the
    whole of the list of parts in a workspace.  This implies that if a part no
    longer exists implicit deletions occur, and that new parts are created
    dynamically using information in the datastream.  This says that the
    datastream would have to include enough information to allow objects to be
    created with the proper attributes should the need arise.  I believe that
    individual part creation may not be obvious to users (say they add a new
    picture to an HTML document - it is an oddity of HTML compared to other
    word processing formats that this is actually a separate file, not part of
    the single document) and that such details must be masked from them.
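
A sketch of what "always work with the whole of the list of parts" implies on the server side, assuming a collection can be modelled as a simple name-to-content mapping. The `apply_partset` function and the sample file names are hypothetical; attributes and versioning are left out.

```python
def apply_partset(collection: dict[str, bytes], partset: dict[str, bytes]) -> dict[str, list[str]]:
    """Make the collection match the part set exactly, reporting what changed."""
    report = {"created": [], "updated": [], "deleted": []}

    # A part missing from the incoming set is implicitly deleted.
    for name in sorted(set(collection) - set(partset)):
        del collection[name]
        report["deleted"].append(name)

    # A part not yet in the collection is implicitly created.
    for name, data in sorted(partset.items()):
        if name not in collection:
            report["created"].append(name)
        elif collection[name] != data:
            report["updated"].append(name)
        collection[name] = data
    return report


collection = {"index.html": b"<p>old</p>", "old.gif": b"GIF"}
partset = {"index.html": b"<p>new</p>", "logo.png": b"PNG"}
report = apply_partset(collection, partset)
```

The user who added a picture to the HTML document never issues a separate create for `logo.png`; it appears because it is in the part set.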
    
    Such a system could be implemented as a filter servlet in front of a
    WebDAV/Delta-V server, but then you would lose some of the "Unit of Work"
    benefits, as in the case of a DB-based repository it could not achieve backout
    of the partially completed work, but this is only a problem in the case of
    server failure, you still gain the communications resilience.
    
    I realise that replication of WebDAV/Delta-V servers is not currently being
    addressed, but I feel that it is needed and that a solution of this form
    will be useful.  In particular even if full replication is not attempted,
    disconnected working is required.  I work from home much of the time, and
    the ability to check out the whole of a job of work in one go, and then to
    replace it when I have finished would be extremely useful - the whole world
    is not (yet) permanently connected.  This means that this kind of solution
    is required even if integrated products such as VAJ are not being used.
    
    As an aside, and I realise that it is complicating this note and bringing
    in items from other threads, one of the other things TC used the resultset
    object for is queries.  The one I use most frequently is (in logical terms) -
    give me a list of all the workareas that are assigned to me and are open.
    This gives me a resultset which describes each workarea, so that I can
    present a list to the user and ask which they want fetched (saving is done
    using information that I saved when I fetched the workarea).  This is a
    very simple user interface, and one which precisely answers the problem in
    hand as that is what the user is concerned with.
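
In logical terms the query above is a filter over work areas. A small sketch, assuming a hypothetical `WorkArea` record with `owner` and `state` fields (the field names and state values are illustrative, not TC's):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkArea:
    name: str
    owner: str
    state: str  # assumed values: "open" or "closed"


def open_workareas_for(user: str, workareas: list[WorkArea]) -> list[WorkArea]:
    """Return the work areas assigned to `user` that are still open."""
    return [w for w in workareas if w.owner == user and w.state == "open"]


all_areas = [
    WorkArea("defect-1", "david", "open"),
    WorkArea("feature-2", "david", "closed"),
    WorkArea("defect-3", "geoff", "open"),
]
mine = open_workareas_for("david", all_areas)
```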
    
    In summary I believe that this is a real problem, and that connecting
    products such as VAJ which has (currently) its own repository to a
    WebDAV/Delta-V server will need to be done along with disconnected working.
    This proposal would make such integration easier to use, easier to
    implement and much more resilient.