RE: Re: Versioning, responses to comments on our document from David G. Durand on 1998-06-01 (w3c-dist-auth@w3.org from April to June 1998)

From: David G. Durand <david@dynamicdiagrams.com>
Date: Mon, 1 Jun 1998 16:31:56 -0700
To: WEBDAV WG <w3c-dist-auth@w3.org>
Message-Id: <199806012331.QAA10114@dynamicdiagrams.com>
>From: Marcus Jager <mjager@microsoft.com>
	
>Hmm,
	
>All of these cases must be handled by the versioning system
>already.

Well, no, some of them may not be _best_ handled by the versioning
system.

>It must
>be able to refer to versions that are unavailable for various reasons.

Sure... I'm with you so far. Availability is not the same as
referenceability.

>The first and second examples are simply instances of version 1 that has
>branched into 1a and 1b, and version 1 has been deleted. But both branches
>are still available.

The translations may have separate revision trees. They may also have
variants by format that are automatically generated on request by the
server (this information should _not_ be part of the version tree). On
the other hand, some variants may be hand-derived from an original
(the relation between your leaf and root versions) this may start
separate revision histories, that diverge from the original document's
history, while occasionally synching up with it. For instance, when
the second edition of the German original is issued, a new set of
translated versions must be created (and may be separately revised
further for accuracy).

The point isn't that the DAV standard must define a system that
implements any particular policies for such situations, but that it
should enable interoperability between arbitrary clients and arbitrary
servers at the most appropriate common level.

One key to doing this is to isolate and modularize concerns as much as
possible. If we define the versioning stuff so that it works for
providing editorial support for change tracking, that's enough. Most
software engineering systems nowadays try to separate these concerns
as well, because it becomes to difficult to manage variants of a
complex system as if they were simply items with a peculiar historical
relationship. It's not helpful to have rev 1.3, and successors of
display.c be the Windows version, and rev 1.6 and successors be the
Motif version of display.c. While I think that document authorship and
software engineering make radically different demands on a system,
this is a case where the experience transfers well.

The analogy in this case is pretty close: alternative delivery
formats are like object code, in that they lack their own private
revision trees, and their content is not historically related to the
content of the source file (original document). They are the output of
an automatic process applied to the editorial object of interest
(source code, or an original document). Version relationships are
quite similar, except that documents often require rather different
sorts of change tracking since document formats are rather different
from source code. Variant documents, in both cases, are often
editorially derived from some version of an original, but once they
are so derived, the variant usually takes on a life of its own, and
acquires its own separate revision history.

When this list first started 2 years ago, we referred to the
configuration problem. This was a way to separate the issues of
managing the correlations between different versions of individual
resources into complete sets (configurations) representing a desired
global state like a proofread web site, or a working system
build. This separation also makes sense in software configuration
management, and also simplifies the problems by separating concerns.

As the translation examples we've been tossing around show, the total
problem of change in sets of related resources is terribly complex. We
can simplify it by making some orthogonal distinctions:

   1. Variants represent human-derived, related but separate
documents.

   2. We don't have a term for variant data formats, (perhaps
data-variants is clear enough). These, like object files in SCM, often
don't need to be tracked separately, as they can be derived from the
"base" or "source" files.

   3. Versions are historically related files that are under revision
(by human beings), and for which the system may provide support for
operations like registering new versions, applying a list of changes
to a version to create a new version, merging versions, providings
lists of differences between versions, etc.

   4. Configurations (or configuration management functions) support sets of
versions of resources that satisfy user or system requirements for
consistency (these may range from provable correctness, to
satisfaction of workflow requirements). This typically includes the
specification of cross-resource version relationships, like the fact
that version 1.3.4 of the Englsih translation corresponds to rev. 2.0
of the German original, and such like.

>I guess we need to split the question into two parts that are dealt with
>separately,

(more parts than that)

>(a) Should the relationship between variants of a document be expressed as
>parts of the version graph?

>I believe that (a) is an easy yes, because the relationships are not any
>different. 

Not be be reflexively contradictory, but this is an easy no. One way
to see that versioning is different from variants, is to see that (as
I listed in my last note) there are a lot of specific operations (like
difference tracking, and incremental update) that make sense for
historically related documents, and do not make any sense for
arbitrary variants, such as data format variants.

Variants don't sensibly accomodate the kinds of operations that
reasonable version handling demands. It's not as if it's sensible to
ask for the diffs that transform a tex file into the coresponding .ps
file because they are not editorially related.

Thus, however we represent versioning-related derived-from
relationships, WebDAV should support special operations for such
related documents. Similarly, it isn't sensible to support those
operations for variant objects. To me it seems self-evident that a
protocol for the support of editorial systems will probably need
special semantics for editorial processes and data relationships.

>And (b) Should that knowledge be used by the server in content negotiation?

>(b) is the hard one.

Since (a) is "no" this is actually easy to answer. Sometimes it makes
sense to hide variants behind content-negotiation, and sometimes they
need to be user-visible. My longer examples were intended to show that
both situations can occur. The point is that this is an area where
WebDAV should _not_ require a particular approach because there are
several sensible policies that could serve different applications.

Content negotiation and variants are sometimes intimately related, and
sometimes not. It depends on the server, and the policies that server
is implemented to support.

	
>> I agree.  I was thinking about it this weekend, and I came up some
>> scenarios
>> where trying to express alternate versions in the revision graph breaks
>> down:
>> 
>>    * The French and English versions are both online, but they are
>> translations
>>      of the German version, which is not.  This may be unlikely in the
>> case
>> of
>>      new documents (e.g., translations of a company's Web site), but not
>> in
>> the
>>      case of older books being translated--for example, consider a site in
>> Canada
>>      that wants to publish a translation of Goethe.
>>    * The document was composed electronically, but the original format is
>> not
>>      going to be put on the Web site--e.g., it was written in Word, but it
>> will
>>      be published in HTML and PDF.  Again, a "A derives from B" graph will
>> not
>>      contain the original document.
>>    * The Arabic version derives from revision 1.5 of the English version,
>> but
>>      later revisions have not been translated (say, because they contain
>>      references to things which would get the document censored in some
>> Muslim
>>      countries); some time later, the site manager wants to prune his
>> revision
>>      graph, but he can't prune revision 1.5, because then the graph will
>> not
	> show
>>      any relation between the Arabic and English versions.
>> 
>> These could all be handled by workarounds of some sort (permitting
>> revision
>> relations for documents that don't actually exist), but I believe it would
>> be
>> cleaner to express equivalent versions as a separate equivalence relation,
>> rather
>> than forcing it into the graph.  The graph for the Goethe example would
>> then
>> be a
>> forest of revision trees; certain revisions in each tree would then be
>> marked as
>> valid avatars of the document, and content negotiation would select the
>> best
>> available avatar.
>> 
>> Of course, I'm not saying that each avatar must be in a separate tree--we
>> certainly want to be able to express is-derived-from relationships where
>> they
>> exist--but those won't always be the only relationships.
------------------------------------------+----------------------------
David Durand                 dgd@cs.bu.edu| david@dynamicDiagrams.com
Boston University Computer Science        | Dynamic Diagrams
http://www.cs.bu.edu/students/grads/dgd/  | http://dynamicDiagrams.com/
                                          | MAPA: mapping for the WWW
Received on Monday, 1 June 1998 16:42:25 UTC