From: David.Goodenough@dga.co.uk
To: ietf-dav-versioning@w3.org
Message-ID: <8025688F.006057E7.00@mail.dga.co.uk>
Date: Thu, 24 Feb 2000 17:44:37 +0000
Subject: RE: Units of Work

I am not sure that the MOVE versus COPY/DELETE problem is as bad as you say. With the TC system that I implemented (sorry, it's the only one I have to go on), when I fetched the parts I also fetched their attributes. These included an internal object ID and type information, along with last update times etc. When I return the objects, this information is returned with them. For new objects, where I do not have the information, a basic set is returned which indicates the type (text or binary in this case, determined by inspecting the object's data) and requests the server to allocate it an ID. Its user-visible name then becomes irrelevant, because the object is identified by its ID.

Now I appreciate that this is not required when handling individual object GET/PUT-POST operations, because there you can now have the side information, and the user has to go through the individual operations. However, for bulk operations, or for versioning-aware clients, this is no problem to handle.

In the case of VAJ, I kept the information with each project (which corresponded to a WorkArea in TC terms) in a space provided by VAJ for side information. I also implemented a directory fetch/save system, so that I could use the same WorkArea-based approach for directories of files. There I created two additional files: one holding this side information in binary form, and the other holding the history of the WorkArea. In the TC case, that history included the text of the defect/feature request that had created the area, the history of who had done what to that request, and any notes attached to it along the way.

I am sorry about the workspace/activity misunderstanding. I suspect that the documentation is going to need to be a lot clearer about the user's view of the system versus the client's and server's internal views.
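To illustrate the point, here is a minimal Python sketch (not TC's actual mechanism) of how a client that keeps the server-assigned object ID for each part can tell a rename (MOVE) from a delete plus create. The snapshot layout (a dict from object ID to name) and the function name `classify_changes` are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Move = Tuple[str, str, str]  # (object ID, old name, new name)


def classify_changes(
    fetched: Dict[str, str],
    saved: Dict[str, str],
    new_names: List[str],
) -> Tuple[List[Move], List[str], List[str]]:
    """Compare a workspace save with what was originally fetched.

    fetched:   object ID -> name, recorded by the client at fetch time
    saved:     object ID -> name, for returned parts that still carry an ID
    new_names: names of returned parts with no ID (server must allocate one)

    Returns (moves, deletes, creates):
      moves:   same object under a new name, so its history continues (MOVE)
      deletes: IDs that were fetched but not returned
      creates: brand-new objects, which start with an empty history
    """
    unknown = sorted(set(saved) - set(fetched))
    if unknown:
        # An ID the client never fetched cannot be matched to a history.
        raise ValueError(f"unknown object IDs: {unknown}")
    moves = [
        (oid, fetched[oid], name)
        for oid, name in sorted(saved.items())
        if fetched[oid] != name
    ]
    deletes = sorted(set(fetched) - set(saved))
    creates = sorted(new_names)
    return moves, deletes, creates


# Hypothetical example: Main.java was renamed, Util.java dropped, Helper.java added.
fetched = {"17": "src/Main.java", "18": "src/Util.java"}
saved = {"17": "src/app/Main.java"}
print(classify_changes(fetched, saved, ["src/Helper.java"]))
# prints ([('17', 'src/Main.java', 'src/app/Main.java')], ['18'], ['src/Helper.java'])
```

Keyed by name alone, the renamed file would look like one deletion plus one creation, which is exactly the MOVE versus COPY/DELETE ambiguity; carrying the ID is what lets the save be recorded as a MOVE with its history intact.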
David

"Clemm, Geoff" <gclemm@rational.com> on 24-02-2000 03:30:19 PM
To: ietf-dav-versioning@w3.org
cc: (bcc: David Goodenough/DGA/GB)
Subject: RE: Units of Work

One of the problems with attempting a "bulk update" of a collection is that you cannot distinguish a "MOVE" from a "COPY/DELETE". A MOVE simply gives a new name to a versioned resource, and changes made at the new location are simply added to the history of that versioned resource. A COPY creates a new versioned resource with an empty history. So I agree that the problem is real, but coming up with an interoperable protocol to deal with the problem is hard.

Just as an aside, the user-visible notion of a "unit of work" is captured by the notion of an activity. This is very different from the user-invisible implementation issues that David is raising.

Cheers,
Geoff

-----Original Message-----
From: David.Goodenough@dga.co.uk [mailto:David.Goodenough@dga.co.uk]
Sent: Thursday, February 24, 2000 6:51 AM
To: ietf-dav-versioning@w3.org
Subject: Units of Work

I am new to this list, so please forgive me if I charge in with only a cursory read of all that has gone before (there is a lot of it to read).

I come from what is probably a slightly odd background compared to most of you: I am not from a vendor of document management or version control software, and I have not been involved in efforts like RCS and CVS or any of their free software descendants. I do work in a software house, but we are users of such code.

We recently hit the problem of integrating two IBM products, VisualAge for Java (VAJ) and Team Connection (TC). These can talk to each other using the SCC API, but we are not a Windows shop (we used to use OS/2 and are moving to Linux), so that option was not open to us. TC had just introduced an XML interface to allow access to the repository, and as VAJ allows me to build tools which access its repository, I built a bridge between the two environments.
This bridge is available (for free) from our web site, should anyone be interested. It is my intention to build a new bridge which will connect VAJ to Delta-V/WebDAV.

Coming at the problem afresh, and to a degree influenced by what the TC and VAJ interfaces allowed, I constructed a rather different bridge from those that seemed to exist before, in that it concentrates on what are, in Delta-V terms, workspaces, rather than on individual items such as source or object files. Thus the user only had to concentrate on the work item in hand (the workspace), and all the relevant objects were fetched or saved for them.

I take a rather simple approach to users (including myself): I do not trust them to do anything other than simple things. For example, I do not trust them to back up their PCs; instead I arrange to have the servers backed up for them. Frankly, as a user I only want to do simple things; that way I do not make mistakes.

TC provided a means by which I could fetch and save multiple related items with a single request. I think this is important not only because it corresponds to the unit the user is working with, but also because HTTP is stateless, so the only unit of work it recognises is an individual request. This was done using a compound XML datastream, not unlike the "multistatus" sets that Delta-V already has. Two compound objects, "multistore" and "resultset", are available. These allow all the members of a workspace (workarea in TC terms) to be fetched or saved as a single unit. Such an approach not only works with stateless HTTP, but also matches the user's perception of what they are doing. Additionally, it solves several "Unit of Work" problems when something goes wrong.

The Unit of Work problem only really arises when you are considering bridge code like mine; it obviously does not arise if there is a user at the client who can spot what is going on and take remedial action.
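The compound XML datastream described above can be sketched in Python with the standard library's ElementTree: a whole workarea travels as one document, so it crosses HTTP as a single request. The element and attribute names (`multistore`, `part`, `workarea`, `allocate-id`) and the function names are invented for illustration; this is not TC's actual XML schema. Binary content is assumed to be base64-encoded text.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

Part = Dict[str, Optional[str]]  # keys: name, type, content, and optionally id


def build_multistore(workarea: str, parts: List[Part]) -> bytes:
    """Pack every part of a workarea into one compound XML document.

    Parts that already carry a server ID send it back; new parts ask the
    server to allocate one, so identity never depends on the part's name.
    """
    root = ET.Element("multistore", {"workarea": workarea})
    for part in parts:
        attrs = {"name": part["name"], "type": part["type"]}
        if part.get("id"):
            attrs["id"] = part["id"]
        else:
            attrs["allocate-id"] = "true"
        ET.SubElement(root, "part", attrs).text = part["content"]
    return ET.tostring(root, encoding="utf-8")


def parse_multistore(data: bytes) -> List[Part]:
    """Unpack the document on the receiving side, one dict per part."""
    root = ET.fromstring(data)
    return [
        {
            "name": elem.get("name"),
            "type": elem.get("type"),
            "id": elem.get("id"),  # None for parts awaiting an ID
            "content": elem.text or "",
        }
        for elem in root.findall("part")
    ]


# Hypothetical workarea with one existing part and one new binary part.
doc = build_multistore("wa-42", [
    {"id": "17", "name": "Main.java", "type": "text", "content": "class Main {}"},
    {"name": "logo.gif", "type": "binary", "content": "R0lGODlhAQABAAAAACw="},
])
print(parse_multistore(doc))
```

Because the request body is the whole workarea, a dropped connection leaves the server with either all of it or none of it, which is the property the Unit of Work argument relies on.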
Recovering from a situation where some updates have occurred and others have not is a seriously non-trivial problem. It involves lots of queries to the server to find out the state of the system, so that the difference can be worked out and the relevant retries constructed. If the request is a single unit, it either happens or it does not, and life is so much easier. This of course assumes (as I believe I should) that the weak link in the chain is the communications, not the client or the server. Even if it is the server, databases that might be used as repositories have transaction commit/rollback, and there are now even journaling file systems, so those problems are easily containable.

Currently in the WebDAV/Delta-V protocols the GET and PUT/POST operations are explicitly forbidden for collections, and it strikes me that adding such functionality would solve this problem. For a collection, a "partset" compound XML stream would be required, and it would always carry the whole list of parts in a workspace. This implies that if a part no longer exists it is implicitly deleted, and that new parts are created dynamically using information in the datastream. The datastream would therefore have to include enough information to allow objects to be created with the proper attributes should the need arise.

I believe that individual part creation may not be obvious to users (say they add a new picture to an HTML document: it is an oddity of HTML, compared to other word processing formats, that the picture is actually a separate file rather than part of the single document), and that such details must be masked from them.

Such a system could be implemented as a filter servlet in front of a WebDAV/Delta-V server, but then you would lose some of the "Unit of Work" benefits, since even with a DB-based repository the filter could not back out partially completed work. However, this is only a problem in the case of server failure; you still gain the communications resilience.
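The partset semantics proposed above (the saved list is the whole workspace, unlisted parts are implicitly deleted, unidentified parts are created with a server-allocated ID, and either everything is applied or nothing is) can be sketched as a toy in-memory store in Python. `Repository` and `apply_partset` are hypothetical names; a real server would use a database transaction at the commit step and would also record version history, which this sketch does not.

```python
from typing import Dict, List, Optional, Tuple

# (object ID, or None for a new part; name; content)
Part = Tuple[Optional[str], str, bytes]


class Repository:
    """Toy in-memory store mapping object ID -> (name, content).

    Keeps only current state; a real Delta-V server would also record
    each change in the resource's version history.
    """

    def __init__(self) -> None:
        self.objects: Dict[str, Tuple[str, bytes]] = {}
        self._next_id = 1

    def apply_partset(self, parts: List[Part]) -> Dict[str, str]:
        """Make the workspace contain exactly `parts`, all or nothing.

        Objects missing from `parts` are implicitly deleted, and parts
        without an ID are created with a newly allocated one. Returns a
        map from part name to object ID. On any error a ValueError is
        raised and the repository is left exactly as it was.
        """
        staged: Dict[str, Tuple[str, bytes]] = {}
        next_id = self._next_id
        names = set()
        for oid, name, content in parts:
            if name in names:
                raise ValueError(f"duplicate part name: {name}")
            names.add(name)
            if oid is None:
                oid = str(next_id)  # allocated now, committed only on success
                next_id += 1
            elif oid not in self.objects:
                raise ValueError(f"unknown object ID: {oid}")
            elif oid in staged:
                raise ValueError(f"object ID listed twice: {oid}")
            staged[oid] = (name, content)
        # Commit point: nothing above touched self, so a failure leaves the
        # old state intact. A DB-backed server would COMMIT a transaction here.
        self.objects = staged
        self._next_id = next_id
        return {name: oid for oid, (name, _) in staged.items()}


repo = Repository()
repo.apply_partset([(None, "page.html", b"<img src='pic.gif'>"), (None, "pic.gif", b"GIF89a")])
# returns {'page.html': '1', 'pic.gif': '2'}
repo.apply_partset([("1", "page.html", b"<img src='photo.jpg'>"), (None, "photo.jpg", b"\xff\xd8")])
# returns {'page.html': '1', 'photo.jpg': '3'}; pic.gif (ID 2) is implicitly deleted
```

Staging into a fresh dict and swapping it in at the end is what makes the save atomic here; the same shape maps onto BEGIN/COMMIT in a database-backed repository, which is the backout that a filter servlet in front of the server could not provide.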
I realise that replication of WebDAV/Delta-V servers is not currently being addressed, but I feel that it is needed and that a solution of this form will be useful. In particular, even if full replication is not attempted, disconnected working is required. I work from home much of the time, and the ability to check out the whole of a job of work in one go, and then to replace it when I have finished, would be extremely useful: the whole world is not (yet) permanently connected. This means that this kind of solution is required even if integrated products such as VAJ are not being used.

As an aside, and I realise that this complicates the note and brings in items from other threads, one of the other things TC used the resultset object for was queries. The one I use most frequently is (in logical terms): give me a list of all the workareas that are assigned to me and are open. This gives me a resultset which describes each workarea, so that I can present a list to the user and ask which they want fetched (saving is done using information that I saved when I fetched the workarea). This is a very simple user interface, and one which precisely answers the problem in hand, as that is what the user is concerned with.

In summary, I believe that this is a real problem, and that connecting products such as VAJ, which (currently) has its own repository, to a WebDAV/Delta-V server will need to be done, along with supporting disconnected working. This proposal would make such integration easier to use, easier to implement and much more resilient.