RE: Versioning goals doc

Jim,

There are two discussions going on here, and it will be useful to separate
them. One is the notion of "identifying" changes and change sets. The
second is the mechanism for detecting changes or change sets.
Content-specific diffs are useful for the second problem. In the worst
case, detection of changes can be implemented by creating a new copy of
the changed original and assigning a new identity to the change (the same
idea applies to change sets).

Since I have not been personally involved in the face-to-face debates, it
is unclear to me whether this distinction has been discussed. It is also
unclear to me which usage scenarios could be used to rule one approach in
or out.

For example, it seems that your objection is based on the fact that Web
servers are content-blind and hence WebDAV servers also ought to be. That
goal is met, in the worst case, by keeping full copies around, assigning
new identities, and having a principled way of dealing with changes and
change sets based on those identities. This is very similar to what a
basic version control system would do (sort of like a Web server that
implements only a GET method).
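
To make the worst case concrete, here is a minimal sketch (illustrative
only; the names and the use of a content hash as an identity are my
assumptions, nothing here is proposed protocol machinery):

    import hashlib

    class ContentBlindStore:
        """Worst-case versioning: keep full copies, assign a fresh
        identity to every revision, and never look inside the octets."""

        def __init__(self):
            self.versions = {}   # identity -> full copy of the octets
            self.changes = []    # (old identity, new identity) pairs

        def check_in(self, old_id, octets):
            # Detection is content-blind: the resource "changed" iff
            # its octets differ from the previously stored copy.
            if old_id is not None and self.versions[old_id] == octets:
                return old_id                          # nothing to record
            new_id = hashlib.sha1(octets).hexdigest()  # mint a new identity
            self.versions[new_id] = octets             # store a full copy
            self.changes.append((old_id, new_id))      # the identified change
            return new_id

A change set would then just be a named collection of such (old, new)
identity pairs; nothing in the store ever parses the content.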

However, as deployment experience has shown, the deployed Web is not
really content-blind. Vendors have developed content-specific extensions
on both the server and client side (e.g., Microsoft, Adobe, etc.). You can
hope that this won't happen, but it will. So the question is whether you
want to provide a principled way to let this evolution happen. By
separating the way one identifies changes and change sets from the
mechanism one uses to detect them, we can achieve that goal.
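
Purely as an illustration of that separation (hypothetical names, not a
proposal for the protocol): identification stays uniform, while detection
dispatches on content type, with octet comparison as the content-blind
fallback.

    # Sketch: content-specific detectors can be registered per media
    # type; everything else falls back to content-blind comparison.
    detectors = {}

    def register(content_type, detect_fn):
        detectors[content_type] = detect_fn

    def changed(content_type, old_octets, new_octets):
        # Vendor-supplied, content-specific detection if available...
        detect = detectors.get(content_type)
        if detect is not None:
            return detect(old_octets, new_octets)
        # ...otherwise the worst case: compare the raw octets.
        return old_octets != new_octets

    # Example: a text detector that ignores line-ending differences.
    register("text/plain",
             lambda a, b: a.replace(b"\r\n", b"\n") != b.replace(b"\r\n", b"\n"))

Either way, a detected change gets identified the same way (a new
identity), so content-blind and content-aware servers can interoperate.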

Binary diffs are a rat hole that I don't want to get into. The broader
discussion to have is whether the objective is to provide preferred
treatment for the basic data types that cover 80% of uses (Unicode/ASCII
and raw binary do fit this definition). It is an 80/20 discussion, but it
is orthogonal to the discussion of changes and change sets.

You are arguing for wide deployment and I am arguing for extensibility. We
can achieve both goals.



> -----Original Message-----
> From: w3c-dist-auth-request@w3.org
> [mailto:w3c-dist-auth-request@w3.org] On Behalf Of Jim Whitehead
> Sent: Wednesday, September 30, 1998 1:03 PM
> To: WEBDAV WG
> Subject: RE: Versioning goals doc
>
>
> > "David G. Durand" wrote:
> >
> > > The second is to point out that any object stored in a computer can be
> > > treated as a sequence of octets, and changes to those sequences are
> > > _always_ a possible, if sometimes far from optimal, way to represent
> > > changes.
>
> John Stracke replied:
> > Yes, you can come up with a format-blind diff syntax.  However,
> > I'm not convinced you could use such diffs for change-set versioning.
> > The problem is that, although you can generate a diff from A to B,
> > and then apply that diff to A to get B (or vice versa), you can't
> > necessarily compose those diffs arbitrarily (which, as I understand
> > it, is the goal of change-set versioning, right?).
>
> This is exactly the problem with using an additive approach for Web
> versioning.
>
> While John provided an example of composition problems using GIF images,
> there are actually formats in use which have worse properties than GIFs.
> For example, if a data format includes any kind of a checksum, then
> arbitrarily combining diffs will require the application to know how to
> re-calculate the checksum, and where exactly to place the checksum in the
> object.  This requires content-type specific knowledge.  Or what if the
> resource includes a digital signature, and the signature must be kept
> correct in order to preserve the semantics of the resource?
>
> The content type versioning problem is also an issue.  If I perform a GET
> and receive back a resource, which version of that media type is it?  For
> many popular formats, there are several versions in wide use today.
>
> Requiring content-type specific knowledge may not seem like a big deal
> for well-known formats like GIF, where a description of the media type
> is widely available. But what about proprietary data formats that are
> widely used? There are several of these in use on the Web today. If
> these formats require detailed knowledge to perform arbitrary additive
> diffs, then an implementor of a DAV versioning server will need to make
> agreements with the owners of these formats to get access to their
> descriptions. This would certainly have a chilling effect on the
> availability of public-domain DAV versioning servers.
>
> > Also, your points are about implementation approaches - not about
> > protocol. If you were bringing up this issue to show an existence
> > proof *does not* exist, I disagree.
>
> No, my point was rather that the additive approach necessarily leads
> to the need for content-type specific knowledge for some content types
> (not all), and this is a poor design choice for the Web. To date, HTTP
> requires no knowledge whatsoever of the content-type of resources,
> beyond storing a string describing the content type (which does not
> have to match the actual content type of the resource for the protocol
> to function). With additive versioning, HTTP+DAV will work better for
> some content types than others, and may actually be unable to provide
> versioning services for content types which require content-type
> specific knowledge of which a particular server is unaware.
>
> Also, due to the rapid change in content types (think back 15 years
> ago -- which word processor were you using? The same one as today?
> I'll bet not. troff and TeX users get a gold star :-), requiring any
> form of content-type specific knowledge will result in servers which
> are increasingly brittle over time.
>
> David Durand writes:
> > We should not be eliminating a model that is receiving growing attention
> > and implementation before we even start!
>
> Well, yes, let's go make a matrix of features found in versioning
> products and research systems, and then determine the market/mind
> share held by each. Every feature that gets over 50% market share
> goes into the protocol. Sound good? :-)
>
> - Jim
>
