RE: Updated Versioning Strategies document XMLVersioning-41 from Marc de Graauw on 2008-04-09 (www-tag@w3.org from April 2008)

From: Marc de Graauw <marc@marcdegraauw.com>
Date: Wed, 9 Apr 2008 23:23:41 +0200
To: <orchard@pacificspirit.com>, <www-tag@w3.org>
Message-ID: <647B64D3842D4311A87CC87AE966BBCE@Marc>
Dave Orchard:

| Based upon feedback from Noah, the TAG's Feb f2f, and phone 
| discussions with Noah.
|  
| http://www.w3.org/2001/tag/doc/versioning-compatibility-strategies
| http://www.w3.org/2001/tag/doc/versioning-compatibility-strategies-20080328.html
|  
| These are now ready for review by Ashok, Dan, Noah, Norm, and 
| Raman per our agreements at the Vancouver F2F in 
| http://www.w3.org/2008/02/26-tagmem-minutes#ActionSummary

Hi Dave, 

I'd like to drop in some comments as well (I know I promised to deliver them
sooner, apologies for that).

1.1 "Whether ten, a hundred, or a million *resources* have been deployed"

"applications" or "processors" would be better to avoid confusion with
URI-resources.

(Two older comments, which I've re-inserted here)

1.2: "Among the various kinds of languages, we find..."

It's obvious, but I think it should be made explicit that the doc does not apply
to natural language.

1.2: "programming languages such as Java or ECMAScript..."

I don't think this Finding, which is mainly about forward compatibility, applies
to programming languages either. 

Suppose the final Python 3 release were to include "x" as an alternative notation for
the multiplication operator. Take the following Python 3 source:

def double(i):
  i = 2 x i
  return i

If a Python 2.5 processor were to process this source in a forward compatible
way, it would have to ignore the statement "i = 2 x i" and thus return the input
without doubling it. I can't think of any context where such behaviour would be
useful. I think there is a difference between languages which contain mainly
(text or typed) data and languages which contain processable instructions
(admittedly there is a large overlap between those two), and forward
compatibility does not apply to the latter category. Most of the 'Good
Practices' mentioned in your doc don't apply to programming languages.
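A minimal Python sketch makes the point concrete. It models a hypothetical toy processor whose only known statement form is `NAME = INT * NAME`; it is not how a real Python interpreter works, and `run` and its `strict` flag are illustrative names. With an ignore-unknown policy the "new" statement is skipped and the input comes back undoubled:

```python
import re

# Hypothetical "old" grammar: the only known statement form is NAME = INT * NAME
KNOWN_STATEMENT = re.compile(r"^(\w+) = (\d+) \* (\w+)$")


def run(statements, env, strict=False):
    """Execute statements against env (a dict of variables).

    strict=False models a forward-compatible processor that ignores
    statements it does not understand; strict=True rejects them.
    """
    for stmt in statements:
        match = KNOWN_STATEMENT.match(stmt.strip())
        if match:
            target, factor, source = match.groups()
            env[target] = int(factor) * env[source]
        elif strict:
            raise SyntaxError(f"unknown statement: {stmt!r}")
        # else: the unknown statement is silently ignored
    return env


print(run(["i = 2 * i"], {"i": 5}))  # {'i': 10}
print(run(["i = 2 x i"], {"i": 5}))  # {'i': 5}: the doubling was skipped
```

Failing loudly (`strict=True`) is the only useful behaviour here, which is the argument above.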

2. "None. No distinction is made between versions of the language"

Add "in the document instances"; the language specification may very well
contain version info.

2. "Applications are expected to behave properly"

There are at least four common relevant behaviours when an application receives
a document with an error:
1) produce an error and fail
2) proceed with errors/warnings
3) give a user the option to continue or abort
4) proceed silently
I think the Versioning Doc could mention such distinctions explicitly.
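As a rough sketch of how those four behaviours could be named and dispatched on, in Python; the enum, the `on_error` function and its `ask_user` callback are hypothetical and not taken from the finding:

```python
from enum import Enum


class OnError(Enum):
    """The four behaviours listed above (names are illustrative)."""
    FAIL = "fail"      # 1) produce an error and fail
    WARN = "warn"      # 2) proceed with errors/warnings
    ASK = "ask"        # 3) give the user the option to continue or abort
    SILENT = "silent"  # 4) proceed silently


def on_error(message, policy, ask_user=lambda msg: False, warnings=None):
    """Return True to continue, False to abort; raise under FAIL."""
    if policy is OnError.FAIL:
        raise ValueError(message)
    if policy is OnError.WARN:
        if warnings is not None:
            warnings.append(message)
        return True
    if policy is OnError.ASK:
        return ask_user(message)
    return True  # OnError.SILENT
```

A spec could then state which of these a consumer is expected to apply, rather than just "behave properly".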

2. "For example, many W3C languages adopt a strategy of incompatible changes are
allowed between Working Drafts and up to Candidate Recommendation, but then
Proposed Recommendation and Recommendation are all compatible versions."

editorial: 
"For example, many W3C languages adopt a strategy of allowing incompatible
changes between Working Drafts and up to Candidate Recommendation, but then keeping
Proposed Recommendation and Recommendation compatible."

2.1 par. 2, editorial: "At the other end of the spectrum is <add>an</add>
incompatible versioning approach"

2.1 par. 3: "Typically, when introducing a new version using the incompatible
approach, all of the software that produces or consumes the texts is updated..."

In general, the hidden premise in this Finding is one of exchanging messages,
which 'disappear' after consumption. But versioning applies equally well to
longer-lived documents. There are a few common cases I think deserve mentioning:
- A very common approach, when a new consumer meets an old text, is for the
consumer to upgrade the text (silently or at the user's option) to the new version.
This approach is particularly common in word processors and databases.
- With longer-lived documents and data structures, the relation between a
producer and a text may not be one-to-one. For instance, in a database some
records may originate from an older producer, others from a newer one. Also in
larger markup documents, it is quite possible that some parts have been produced
by an older producer, other parts by a newer one. Using version identifiers on a
per-record basis in a database is uncommon, as is the use of a version
identifier on a per-division (paragraph, sentence, chapter) basis in markup
documents. This makes the relation between version and document more complicated:
should we assume the entire document to be of the version of the latest producer? There is
a relation with the previous point, since a common approach is for a
consumer/producer to open a document (or database), check the version, ask the
user to convert to the latest version if necessary, or simply write new
structures in the old document (database) when this is allowed. 
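The open/check/convert approach can be sketched in Python roughly as follows. The record layouts, the version numbers and the upgrade steps are all hypothetical; note that treating records without a version field as version 1 is itself an assumption, which is exactly the per-record ambiguity described above:

```python
from typing import Callable, Dict, List

LATEST = 3  # version this (newer) consumer/producer understands


def _v1_to_v2(rec: dict) -> dict:
    # Hypothetical change: a single "name" field split into "first"/"last"
    first, _, last = rec["name"].partition(" ")
    out = {k: v for k, v in rec.items() if k != "name"}
    out.update(first=first, last=last, version=2)
    return out


def _v2_to_v3(rec: dict) -> dict:
    # Hypothetical change: new optional "middle" field, defaulted to empty
    out = dict(rec)
    out.setdefault("middle", "")
    out["version"] = 3
    return out


UPGRADES: Dict[int, Callable[[dict], dict]] = {1: _v1_to_v2, 2: _v2_to_v3}


def upgrade(rec: dict) -> dict:
    # Assumption: records without an explicit version are version 1
    version = rec.get("version", 1)
    if version > LATEST:
        raise ValueError(f"record version {version} is newer than {LATEST}")
    while version < LATEST:
        rec = UPGRADES[version](rec)
        version = rec["version"]
    return rec


def open_store(records: List[dict], convert: Callable[[], bool] = lambda: True) -> List[dict]:
    """Check versions on open and ask (via convert) before upgrading."""
    stale = [r for r in records if r.get("version", 1) < LATEST]
    if stale and not convert():
        return records  # user declined: old structures are left as they are
    return [upgrade(r) for r in records]
```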

2.1 par. 4, editorial: "For example, it might be that there are many messages
that don't use any features" - I'd use 'documents' or 'texts' instead of
'messages', which is more in line with the rest of the finding.

2.1.1, par. 4: "If a name contains first, last, and middle then the previous
options yield answers of: 2, 1, 2, 1-2" 

This is only true if the language has some ignore-unknown strategy, and 'ignore
unknown' hasn't really been introduced at this point. So either make explicit
that language V.1 has an ignore-unknown strategy, or omit this
part of the example.
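To illustrate the dependency, here is a small Python sketch of a V.1 consumer reading a name that contains a middle element. The V.1 vocabulary (`first`, `last`) and the `ignore_unknown` switch are assumptions for illustration; only with ignore-unknown does the consumer produce an answer at all:

```python
import xml.etree.ElementTree as ET

V1_NAME_CHILDREN = {"first", "last"}  # assumed V.1 vocabulary for <name>


def read_name(xml_text, ignore_unknown):
    """Read a <name> element the way a V.1 consumer would."""
    fields = {}
    for child in ET.fromstring(xml_text):
        if child.tag in V1_NAME_CHILDREN:
            fields[child.tag] = child.text
        elif not ignore_unknown:
            raise ValueError(f"unknown element <{child.tag}> in <name>")
    return fields


v2_name = "<name><first>Jane</first><middle>Q</middle><last>Doe</last></name>"
print(read_name(v2_name, ignore_unknown=True))  # {'first': 'Jane', 'last': 'Doe'}
```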

2.1.1.1 par 1: "Usually the first broadly available version starts at "1.0""

Another common approach is to use 1.0 for the first version for which backward
compatibility is guaranteed for following versions, whereas no guarantees are
given for pre-1.0 versions. Django for instance will use 1.0 for the first
version for which upgrades will be guaranteed to be backward compatible.
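A minimal Python sketch of that convention, assuming a simple rule (not Django's published policy): no guarantees before 1.0, and from 1.0 on any later release with the same major number is a backward compatible upgrade. The function names are hypothetical:

```python
def parse_version(text):
    """'1.2.3' -> (1, 2, 3)"""
    return tuple(int(part) for part in text.split("."))


def upgrade_is_guaranteed_safe(old, new):
    """Assumed rule: pre-1.0 promises nothing; from 1.0 on, a later
    release with the same major number is backward compatible."""
    o, n = parse_version(old), parse_version(new)
    if o[0] == 0:
        return o == n  # pre-1.0: only "upgrading" to the same release is safe
    return n[0] == o[0] and n >= o
```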

2.1.1.1 general: Version identification can apply to the specification of a
language or the instance documents or texts produced by applications
implementing this specification. For instance, there is an XML 1.0 W3C
Recommendation, and there are XML 1.0 documents which may or may not identify
themselves as being XML 1.0 documents. This paragraph discusses version
identification in document instances only. 

2.1.1.1 par 2, editorial: "in the protocol messages containing <del>in</del> the
text"

2.1.1.1 general: programming languages, mentioned in 1.2, very often do not have
version ids in their texts. C, SQL and Python source code do not mention the
version of C, SQL or Python used.

2.1.1.1 par, 6: "For example, RSS has 0.9x, 1.x, and 2.x versions, all being
actively developed in parallel." 

Is this still true?

2.1.1.2 This paragraph raises the interesting but complicated issue whether an
XML document with content in several namespaces should be considered a document
in one language or in several languages. In one sense, it's all XML, in another
sense, nested sublanguages.
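The ambiguity shows up as soon as one asks a parser which vocabularies a document uses. A minimal Python sketch (the helper name is hypothetical; the namespace URIs are the real XHTML and SVG ones) lists the distinct element namespaces in an XHTML document with embedded SVG: one XML document, two languages:

```python
import xml.etree.ElementTree as ET


def namespaces_used(xml_text):
    """Return the set of namespace URIs used by elements in the document."""
    found = set()
    for element in ET.fromstring(xml_text).iter():
        if element.tag.startswith("{"):  # ElementTree uses "{uri}local" names
            found.add(element.tag[1:].split("}", 1)[0])
    return found


doc = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <p>A circle:</p>
    <svg xmlns="http://www.w3.org/2000/svg" width="10" height="10">
      <circle cx="5" cy="5" r="4"/>
    </svg>
  </body>
</html>"""

print(sorted(namespaces_used(doc)))
```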

3: "As this finding focuses on compatible versioning, we provide no more focus
on incompatible evolution."

I MUST, strongly, persistently, vehemently object to this utter, complete... -
well, words fail me - omission. There is a difference between publishing, HTML
style, for the world, where consumers may do as they wish with whatever is
published, and messaging, where senders and receivers are (often contractually)
bound. Must accept unknowns is a very good approach, but it will only work in
messaging if, and only if, it can be overruled by some 'must understand'
indicator. There is no way in medical prescriptions (my background) or stock
orders or any serious messaging to have 'must accept unknowns' as a blanket
policy for consumers, without being able to overrule this. Would you accept it
if your bank executed a stock order above your maximum price, saying 'Well, we
are still on v.1.0, which does not have the max price, and we read the W3C's
Versioning Strategies, so...'. This Finding needs some explicit text on
overruling 'must accept unknowns' through 'must understand', or similar
mechanisms, which are, in effect, mechanisms which force consumers to be
incompatible in some circumstances.
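A minimal Python sketch of such an override, loosely modeled on SOAP's `mustUnderstand` header attribute. The order vocabulary and the plain `mustUnderstand` attribute on child elements are hypothetical; the point is that the producer, not the consumer, decides when "must accept unknowns" no longer applies:

```python
import xml.etree.ElementTree as ET

V1_ORDER_CHILDREN = {"symbol", "quantity"}  # hypothetical v1.0 vocabulary


def accept_order(xml_text):
    """Must accept unknowns, unless the producer flagged an element with
    mustUnderstand="true"; an unknown flagged element rejects the order."""
    order = {}
    for child in ET.fromstring(xml_text):
        if child.tag in V1_ORDER_CHILDREN:
            order[child.tag] = child.text
        elif child.get("mustUnderstand") == "true":
            raise ValueError(f"must-understand element <{child.tag}> unknown; order rejected")
        # otherwise: unknown and optional, so it is ignored
    return order


v11_order = ("<order><symbol>ACME</symbol><quantity>100</quantity>"
             '<maxPrice mustUnderstand="true">42.00</maxPrice></order>')
```

With this, the v1.0 bank system refuses the order instead of executing it above the maximum price.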

4, editorial: "Backwards compatibility evolution of a language means that
producers of texts in a language should be able to produce texts that consumers
that have been updated with a newer version of the language will understand."

I'd make that: "Backwards compatibility evolution of a language means that
*consumers* of texts in a language should be able to *consume* texts *produced*
by *producers* that *were based on an older* version of the language."

It's correct as it stands, but it seems to reverse the burden of effort, which
for BC is usually on the (newer) consumers.

4.1, par. 2, editorial: "Defined Text Set" - This is the first time the term is
used in this doc, so maybe you could add a reference to the Terminology Doc.

5, par. 1, editorial: "producers of texts in a language should be able to
produce texts in a <del>revision</del><add>newer version</add> of the language"
- this makes it more generic, not all new versions are revisions.

5: "Please select one of the following 3 alternatives for the finding" 

There are only 2. I'd prefer the second. As I mentioned before, I do not think
this should apply to programming languages.

5: "Extensible" - the links don't jump to the definition, but a bit before it.

5.1, par. 1, editorial: "If the software consuming the extension "knows" about
the extension, then it has been revised and uses the revised language that
incorporates the extension."

I'd drop this sentence, it's redundant and only obfuscates the point.

5.1, par. 2: "Consumers MUST accept text portions..."

I think you should say something about what 'accept' means. See the points on
levels of error / failure above.

5.1, par. 3: "any texts with extensions SHOULD be compatible with a text without
the extensions"

No, this uses 'compatible' in a confusing way, since this is not simple BC or
FC. The point is "any texts with extensions MAY be processed without the
extensions" or "removal of extensions SHOULD be allowed".

5.1, par. 6, editorial: "Object systems typically call this "polymorphism", where
a new type can behave as the old type." 

I'd drop this, it is not needed and will only invoke discussion whether the
comparison is justified or not.

5.1, pars. 4 and 6: The real distinction is not 'Accept and Ignore' vs. 'Accept
and Preserve' (since that approach ignores the content as well) but between
'Ignore and Discard' vs. 'Ignore and Preserve'.
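The distinction can be sketched in Python with a hypothetical V.1 consumer that updates a surname and writes the document back. In both modes the unknown `<middle>` content is ignored (never interpreted); the only difference is whether it survives the round trip. Element names and the `preserve` switch are illustrative assumptions:

```python
import xml.etree.ElementTree as ET

V1_NAME_CHILDREN = {"first", "last"}  # hypothetical V.1 vocabulary


def update_last(xml_text, new_last, preserve):
    """Change <last>; unknown children are ignored either way, then
    either discarded (preserve=False) or kept (preserve=True)."""
    root = ET.fromstring(xml_text)
    for child in list(root):  # copy, since we may remove while iterating
        if child.tag == "last":
            child.text = new_last
        elif child.tag not in V1_NAME_CHILDREN and not preserve:
            root.remove(child)  # Ignore and Discard
    return ET.tostring(root, encoding="unicode")


name = "<name><first>Jane</first><middle>Q</middle><last>Doe</last></name>"
print(update_last(name, "Smith", preserve=True))
print(update_last(name, "Smith", preserve=False))
```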

5.1.1, par. 4, editorial: "some elements <del>who's</del><add>whose</add>
children" - maybe I'm wrong, I'm not a native speaker

5.2, NOFRAMES - perhaps an even better example is IMG/ALT in HTML
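A small Python sketch of the IMG/ALT fallback, using the standard library's `html.parser`: a hypothetical text-only consumer cannot display images, so it uses the alternative text the producer supplied. The renderer class and `render` helper are illustrative names:

```python
from html.parser import HTMLParser


class TextOnlyRenderer(HTMLParser):
    """A consumer that cannot show images and falls back to alt text."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            alt = dict(attrs).get("alt")
            if alt:
                self.parts.append(f"[{alt}]")


def render(html_text):
    renderer = TextOnlyRenderer()
    renderer.feed(html_text)
    renderer.close()
    return "".join(renderer.parts)


print(render('<p>Logo: <img src="w3c.png" alt="W3C"></p>'))  # Logo: [W3C]
```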

5.3, par. 1 "Good Practice: Default Unknown Version Identifier Handling Rule:
Languages MUST provide a default model for unknown version identifiers for
forwards-compatible evolution."

I believe this is the wrong way around. Newer specs (and thus producers) should
provide a way for older consumers to know whether they may process a message.
The newer ones have the more complete knowledge. They can insert the old version
identifier if desired. This good practice, and the following paragraph, assume a
too simple approach of language versions being either compatible or not. In the
Netherlands, we have an annual release of a medical (HL7 based) spec, which
contains lots of different messages, some compatible, some not, some partly
compatible, etc. There is no way a major.minor language version will do. Even
per-message-type major.minor versions are not sufficient, since incompatible
content may be optional. I strongly believe the only way to resolve such
complexities is if newer producers provide the version identifiers the older
consumers expect (of course, only when the older consumers may process the
messages). So - in your examples - I'd say the 1.1 producer would have to
insert the 1.0 and 1.1 version ids, and this approach would not need the above
Good Practice.
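A minimal Python sketch of producer-supplied version identifiers, with a hypothetical message shape: the newer producer decides, per message, which consumer versions may process it, and the older consumer needs no default rule for unknown versions, only a membership test:

```python
def produce(body, accepted_versions):
    """Producer side: label the message with every consumer version
    that may process it (the newer producer has the full knowledge)."""
    return {"versions": sorted(accepted_versions), "body": body}


def may_process(message, consumer_version):
    """Consumer side: exact membership test, no major.minor arithmetic."""
    return consumer_version in message["versions"]


# A 1.1 producer sends one message that 1.0 consumers may still process...
compatible = produce("medication order A", {"1.0", "1.1"})
# ...and another whose new (optional) content 1.0 consumers must not ignore.
incompatible = produce("medication order B", {"1.1"})

print(may_process(compatible, "1.0"))    # True
print(may_process(incompatible, "1.0"))  # False
```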

See you in Dublin,

Regards,

Marc de Graauw

http://www.marcdegraauw.com
Received on Wednesday, 9 April 2008 21:22:25 UTC