Marc's versioning comments from David Orchard on 2008-05-14 (www-tag@w3.org from May 2008)

From: David Orchard <orchard@pacificspirit.com>
Date: Tue, 13 May 2008 17:19:40 -0700
To: www-tag@w3.org
Message-ID: <2d509b1b0805131719v47583581nba2e9a3f37c6753f@mail.gmail.com>
 Replying to Marc's comments.

Great review, and thanks. I'll be publishing a new version shortly. I've
snipped out the parts that I agree with, which include all of the editorial
comments and the majority of the non-editorial ones. I've put DBO>> to
indicate my responses.

1.2: "Among the various kinds of languages, we find..."

It's obvious, but I think it should be made explicit that the doc does not
apply to natural language.

1.2: "programming languages such as Java or ECMAScript..."

I don't think this Finding, which is mainly about forward compatibility,
applies to programming languages either.

Suppose the final Python 3 release included "x" as an alternative notation
for the multiplication operator. Take the following Python 3 source:

def double(i):
  i = 2 x i
  return i

If a Python 2.5 processor were to process this source in a forward-compatible
way, it would have to ignore the statement "i = 2 x i" and thus return the
input without doubling it. I can't think of any context where such behaviour
would be useful. I think there is a difference between languages which
contain mainly (text or typed) data and languages which contain processable
instructions (admittedly there is a large overlap between those two), and
forward compatibility does not apply to the latter category. Most of the
'Good Practices' mentioned in your doc don't apply to programming languages.

DBO>> We continue to disagree. There are some areas of flexibility in a
language. I think you've picked a hard one, but then HTML doesn't allow
extra name characters etc., yet we say it is forwards compatible.

2. "Applications are expected to behave properly"

There are at least four common relevant behaviours when an application
receives a document with an error:
1) produce an error and fail
2) proceed with errors/warnings
3) give a user the option to continue or abort
4) proceed silently
I think the Versioning Doc could mention such distinctions explicitly.
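A minimal Python sketch of how a consumer might make these four behaviours explicit. The names ErrorPolicy, DocumentError and handle_error, and the ask_user callback, are hypothetical illustrations and are not taken from the Versioning doc:

```python
from enum import Enum


class ErrorPolicy(Enum):
    FAIL = 1      # 1) produce an error and fail
    WARN = 2      # 2) proceed with errors/warnings
    ASK_USER = 3  # 3) give a user the option to continue or abort
    SILENT = 4    # 4) proceed silently


class DocumentError(Exception):
    """Raised when the policy is to fail on an erroneous document."""


def handle_error(message, policy, warnings, ask_user=lambda msg: False):
    """Return True if processing should continue after an error.

    ask_user is a hypothetical callback; by default it answers 'abort'.
    """
    if policy is ErrorPolicy.FAIL:
        raise DocumentError(message)
    if policy is ErrorPolicy.WARN:
        warnings.append(message)  # keep going, but record the problem
        return True
    if policy is ErrorPolicy.ASK_USER:
        return ask_user(message)
    return True  # SILENT: carry on without recording anything
```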

DBO>> I added these in with some wordsmithing, but it still feels rough to
me.

2.1 par. 3: "Typically, when introducing a new version using the
incompatible approach, all of the software that produces or consumes the
texts is updated..."

In general, the hidden premise in this Finding is one of exchanging
messages, which 'disappear' after consumption. But versioning applies
equally well to longer-lived documents. There are a few common cases I
think deserve mentioning:
- A very common approach, when a new consumer meets an old text, is for the
consumer to upgrade the text (silently or at user option) to the new
version. This approach is particularly common in word processors and
databases.
- With longer-lived documents and data structures, the relation between a
producer and a text may not be one-to-one. For instance, in a database some
records may originate from an older producer, others from a newer one. Also,
in larger markup documents, it is quite possible that some parts have been
produced by an older producer, other parts by a newer one. Using version
identifiers on a per-record basis in a database is uncommon, as is the use
of a version identifier on a per-division (paragraph, sentence, chapter)
basis in markup documents. This makes the relation between version and
document more complicated: should we assume the entire document to be of
the version of the latest producer? There is a relation with the previous
point, since a common approach is for a consumer/producer to open a
document (or database), check the version, and ask the user to convert to
the latest version if necessary, or simply write new structures in the old
document (database) when this is allowed.
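A minimal Python sketch of the "check the version, offer to convert" approach for a store whose records carry mixed versions. This is an illustration under stated assumptions. The record layout (dicts with an optional "version" key), the v1-to-v2 rename of "name" to "full_name", and the convert flag standing in for the user's choice are all hypothetical. Records with no version identifier are assumed to be version 1:

```python
# Hypothetical upgrade step: a v1 record's "name" becomes "full_name" in v2.
def _v1_to_v2(record):
    new = dict(record)
    new["full_name"] = new.pop("name")
    new["version"] = 2
    return new


UPGRADES = {1: _v1_to_v2}  # maps version N to a function producing N+1
CURRENT_VERSION = 2


def upgrade_record(record):
    # Assumption: a record without a version identifier is version 1.
    version = record.get("version", 1)
    while version < CURRENT_VERSION:
        record = UPGRADES[version](record)
        version = record["version"]
    return record


def open_document(records, convert):
    """Upgrade mixed-version records if the user agreed; else leave as-is."""
    versions = {r.get("version", 1) for r in records}
    if versions == {CURRENT_VERSION} or not convert:
        return records
    return [upgrade_record(r) for r in records]
```

When convert is False, the old records are left untouched, so a newer producer could keep writing new structures alongside them.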

DBO>> I agree that there is a hidden premise that texts disappear after
consumption, hence the use of the term consumption. I note your point
about upgrading the document, but I'm unsure how that affects the finding.
In the word processor approach of upgrading the document, I would see the
consumer as consuming and upgrading the text, producing a newer version,
which falls into the case mentioned where the consumer has been updated.
Also, I don't think that this precludes version identifiers from being in
subsections of a document, where the subsections are mapped to various
database tables. What would you like to see different?

2.1.1, par. 4: "If a name contains first, last, and middle then the
previous options yield answers of: 2, 1, 2, 1-2"

This is only true if the language has some ignore-unknown strategy, and
'ignore unknown' hasn't really been introduced at this point. So either make
explicit that language V.1 has an ignore-unknown strategy, or omit this
part of the example.
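A minimal Python sketch of why the answer depends on an ignore-unknown rule. It assumes, hypothetically, that a V.1 name contains only first and last. With the rule, a V.1 consumer accepts a name that also carries middle. Without it, the same text is rejected. The function read_v1_name and its ignore_unknown flag are illustrative only:

```python
import xml.etree.ElementTree as ET

V1_CHILDREN = {"first", "last"}  # assumed contents of a V.1 name


def read_v1_name(xml_text, ignore_unknown=True):
    """Read a name as a V.1 consumer would."""
    root = ET.fromstring(xml_text)
    result = {}
    for child in root:
        if child.tag in V1_CHILDREN:
            result[child.tag] = child.text
        elif not ignore_unknown:
            # No ignore-unknown rule: an unknown element is an error.
            raise ValueError(f"unknown element: {child.tag}")
        # With the rule, unknown elements such as <middle> are skipped.
    return result
```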

DBO>>I slightly disagree, because I'm just specifying the language
identification rules here, not the processing rules.  The issue of
identification vs processing and uncoupling them has been very tough for
many years now.  I do point forward to the rules for forwards compatibility.

2.1.1.1 general: programming languages, mentioned in 1.2, very often do not
have version IDs in their texts. C, SQL, and Python source code does not
mention the version of C, SQL or Python used.

DBO>>Agreed.

2.1.1.2 This paragraph raises the interesting but complicated issue of
whether an XML document with content in several namespaces should be
considered a document in one language or in several languages. In one sense,
it's all XML; in another sense, nested sublanguages.

DBO>> I had earlier text in the strategies document about extensibility vs
versioning at
http://www.w3.org/tag/docversioning-strategies-20070917.html#iddiv172692152.
I expunged this for space reasons.

3: "As this finding focuses on compatible versioning, we provide no more
focus on incompatible evolution."

I MUST, strongly, persistently, vehemently object to this utter, complete
(well, words fail me) omission. There is a difference between publishing,
HTML style, for the world, where consumers may do as they wish with whatever
is published, and messaging, where senders and receivers are (often
contractually) bound. Must accept unknowns is a very good approach, but it
will only work in messaging if, and only if, it can be overruled by some
'must understand' indicator. There is no way in medical prescriptions (my
background) or stock orders or any serious messaging to have 'must accept
unknowns' as a blanket policy for consumers, without being able to overrule
this. Would you accept it if your bank executed a stock order above your
maximum price, saying 'Well, we are still on v.1.0, which does not have the
max price, and we read the W3C's Versioning Strategies, so...'? This Finding
needs some explicit text on overruling 'must accept unknowns' through 'must
understand', or similar mechanisms, which are, in effect, mechanisms that
force consumers to be incompatible in some circumstances.
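A minimal Python sketch of 'must accept unknowns' overruled by a 'must understand' flag, using the stock order example. The attribute name mustUnderstand with value "true" is an assumption loosely modelled on SOAP's mustUnderstand. The element names symbol, quantity and maxPrice, and the v1.0 vocabulary, are hypothetical:

```python
import xml.etree.ElementTree as ET

KNOWN_V1_0 = {"symbol", "quantity"}  # hypothetical v1.0 stock order vocabulary


class MustUnderstandFault(Exception):
    """The producer flagged an element this consumer does not understand."""


def read_order(xml_text):
    root = ET.fromstring(xml_text)
    order = {}
    for child in root:
        if child.tag in KNOWN_V1_0:
            order[child.tag] = child.text
        elif child.get("mustUnderstand") == "true":
            # The producer has overruled 'must accept unknowns' here:
            # refuse the order rather than silently drop the constraint.
            raise MustUnderstandFault(child.tag)
        # Otherwise the unknown element is optional and is ignored.
    return order
```

A v1.0 consumer still ignores harmless additions, but an order carrying `<maxPrice mustUnderstand="true">` is rejected instead of executed without its price limit.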

DBO>> I'm very sensitive to this issue. I have attempted to make the finding
as short as possible so as to tell a coherent forwards-compatible versioning
story. This document is, as stated, about achieving compatible versioning.
It is not a general document about versioning. I really believe that I have
to limit the scope of this beast somehow to get to an actual Finding. I did
add a bit in "Another example is where a producer wants to indicate that an
extension must be understood. This could be indicated inline using a
mustUnderstand model, such as SOAP or an application specific model." but I
am very loath to add any more text. The document must stop growing.

5: "Please select one of the following 3 alternatives for the finding"

There are only 2. I'd prefer the second. As I mentioned before, I do not
think this should apply to programming languages.

DBO>> the 3rd is the sentence "We have observed that languages that are
successfully versioned are generally extensible".  I am personally strongly
against this 3rd option and a big proponent of the first.

5: "Extensible": the links don't jump to the definition, but to a bit before
it.

DBO>> I don't know why this is.

5.1, par. 2: "Consumers MUST accept text portions..."

I think you should say something about what 'accept' means. See the points
on levels of error / failure above.

DBO>>hmm...

5.1, pars. 4 and 6: The real distinction is not 'Accept and Ignore' vs.
'Accept and Preserve' (since that approach ignores the content as well) but
between 'Ignore and Discard' and 'Ignore and Preserve'.

DBO>>I think there are 3 rules: Accept, accept and discard, accept and
preserve.  The first says nothing about discarding or preserving.  It's
effectively ignore.
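A minimal Python sketch of the discard/preserve distinction that both sides draw. A V.1 consumer edits a known field and writes the text back, either dropping or keeping content it did not understand. The function rename_last, its preserve flag, and the assumed V.1 vocabulary of first and last are hypothetical:

```python
import xml.etree.ElementTree as ET

KNOWN = {"first", "last"}  # hypothetical V.1 vocabulary


def rename_last(xml_text, new_last, preserve):
    """Update <last> and serialize the name again.

    preserve=False: accept and discard unknown elements.
    preserve=True:  accept and preserve them untouched.
    """
    root = ET.fromstring(xml_text)
    for child in list(root):  # copy, since we may remove children
        if child.tag == "last":
            child.text = new_last
        elif child.tag not in KNOWN and not preserve:
            root.remove(child)
    return ET.tostring(root, encoding="unicode")
```

Either way the unknown middle element is ignored while processing. The only difference is whether it survives the round trip.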

5.3, par. 1 "Good Practice: Default Unknown Version Identifier Handling
Rule: Languages MUST provide a default model for unknown version identifiers
for forwards-compatible evolution."

I believe this is the wrong way around. Newer specs (and thus producers)
should provide a way for older consumers to know whether they may process a
message. The newer ones have the more complete knowledge. They can insert
the old version identifier if desired. This good practice, and the following
paragraph, assume an overly simple approach of language versions being
either compatible or not. In the Netherlands, we have an annual release of a
medical (HL7 based) spec, which contains lots of different messages, some
compatible, some not, some partly compatible, etc. There is no way a
major.minor language version will do. Even per-message-type major.minor
versions are not sufficient, since incompatible content may be optional. I
strongly believe the only way to resolve such complexities is if newer
producers provide the version identifiers the older consumers expect (of
course, only when the older consumers may process the messages). So, in
your examples, I'd say the 1.1 producer would have to insert the 1.0 and 1.1
version IDs, and this approach would not need the above Good Practice.
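A minimal Python sketch of the producer-side approach proposed here. The newer producer lists every version whose consumers may process the message, and an older consumer proceeds only if it finds a version it supports. It needs no default rule for unknown identifiers. The attribute name versions and the message name prescription are hypothetical:

```python
import xml.etree.ElementTree as ET


def stamp_versions(message_xml, compatible_with):
    """Producer side: record every version this message is valid for."""
    root = ET.fromstring(message_xml)
    root.set("versions", " ".join(compatible_with))
    return ET.tostring(root, encoding="unicode")


def consumer_may_process(message_xml, supported):
    """Consumer side: process only if the producer claims a known version."""
    root = ET.fromstring(message_xml)
    claimed = root.get("versions", "").split()
    return any(v in supported for v in claimed)
```

Because the decision is made per message, the producer can claim 1.0 compatibility for some message types and not for others. A single major.minor number cannot express that.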

DBO>> I agree with your overall thesis that major.minor doesn't work in
non-trivial distributed extensibility. I have struggled for years to
express that point but it's not gaining a lot of traction.
I've done a big rewrite of the relevant sections.

See you in Dublin,

DBO>>Indeed, it was really great to see you there!

Cheers,
Dave
Received on Wednesday, 14 May 2008 00:31:02 UTC