RE: Versioning goals doc from Jim Whitehead on 1998-09-30 (w3c-dist-auth@w3.org from July to September 1998)

From: Jim Whitehead <ejw@ics.uci.edu>
Date: Wed, 30 Sep 1998 11:03:00 -0700
To: WEBDAV WG <w3c-dist-auth@w3.org>
Message-ID: <002401bdec9c$8fb1f620$d115c380@galileo.ics.uci.edu>
> "David G. Durand" wrote:
>
> > The second is to point out that any object stored in a computer can be
> > treated as a sequence of octets, and changes to those sequences are
> > _always_ a possible, if sometimes far from optimal, way to represent
> > changes.

John Stracke replied:
> Yes, you can come up with a format-blind diff syntax.  However,
> I'm not convinced you could use such diffs for change-set versioning.
> The problem is that, although you can generate a diff from A to B,
> and then apply that diff to A to get B (or vice versa), you can't
> necessarily compose those diffs  arbitrarily (which, as I understand
> it, is the goal of change-set versioning, right?).

This is exactly the problem with using a additive approach for Web
versioning.

While John provided an example of composition problems using GIF images,
there are actually formats in use which have worse properties that GIFs.
For example, if a data format includes any kind of a checksum, then
arbitrarily combining diffs will require the application to know how to
re-calculate the checksum, and where exactly to place the checksum in the
object.  This requires content-type specific knowledge.  Or what if the
resource includes a digital signature, and the signature must be kept
correct in order to preserve the semantics of the resource?

The content type versioning problem is also an issue.  If I perform a GET
and receive back a resource, which version of that media type is it?  For
many popular formats, there are several versions in wide use today.

Requiring content-type specific knowledge may not seem like a big deal for
well-known formats like GIF, where a description of the media type is widely
available. But what about proprietary data formats that are widely used?
There are several of these in use on the Web today.  If these formats
require detailed knowledge to perform arbitrary additive diffs, then an
implementor of a DAV versioning server will need to make agreements with the
owners of these formats to get access to their descriptions. This would
certainly have a chilling effect on the availability of public-domain DAV
versioning servers.

> Also, your points are about implementation approaches - not about
protocol.
> If you were bringing up this issue to show existence proof *does not*
exist,
> I disagree.

No, my point was rather that the additive approach necessarily leads to the
need for content-type specific knowledge for some content types (not all),
and this is a poor design choice for the Web. To date, HTTP requires no
knowledge whatsoever of the content-type of resources, beyond storing a
string describing the content type (which does not have to match the actual
content type of the resource for the protocol to function).  With additive
versioning, HTTP+DAV will work better for some content types than others,
and may actually be unable to provide versioning services for content types
which require content-type specific knowledge, but which a particular server
is unaware.

Also, due to the rapid change in content types (Think back 15 years ago --
which word processor were you using?  The same one as today?  I'll bet not.
troff and TeX users get a gold star :-), requiring any form of content-type
specific knowledge will result in servers which are increasingly brittle
over time.

David Durand writes:
> We should not be eliminating a model that is receiving growing attention
> and implementation before we even start!

Well, yes, let's go make a matrix of features found in versioning products
and research systems, and then determine the market/mind share held by each.
Every feature that gets over 50% market share goes into the protocol.  Sound
good? :-)

- Jim
Received on Wednesday, 30 September 1998 14:49:34 UTC