RE: Updated Versioning Strategies document XMLVersioning-41 from David Orchard on 2008-04-09 (www-tag@w3.org from April 2008)

From: David Orchard <dorchard@bea.com>
Date: Wed, 9 Apr 2008 15:07:57 -0700
To: "Marc de Graauw" <marc@marcdegraauw.com>, <orchard@pacificspirit.com>, <www-tag@w3.org>
Message-ID: <BEBB9CBE66B372469E93FFDE3EDC493E01A4FF35@repbex01.amer.bea.com>
Awesome and timely comments.  I might be able to get to them before the
TAG starts cranking on their reviews...

Cheers,
Dave 

> -----Original Message-----
> From: www-tag-request@w3.org [mailto:www-tag-request@w3.org] 
> On Behalf Of Marc de Graauw
> Sent: Wednesday, April 09, 2008 2:24 PM
> To: orchard@pacificspirit.com; www-tag@w3.org
> Subject: RE: Updated Versioning Strategies document XMLVersioning-41
> 
> 
> Dave Orchard:
> 
> | Based upon feedback from Noah, the TAG's Feb f2f, and phone 
> | discussions with Noah.
> |  
> | http://www.w3.org/2001/tag/doc/versioning-compatibility-strategies
> | http://www.w3.org/2001/tag/doc/versioning-compatibility-strategies-20080328.html
> |  
> | These are now ready for review by Ashok, Dan, Noah, Norm, and Raman 
> | per our agreements at the Vancouver F2F in 
> | http://www.w3.org/2008/02/26-tagmem-minutes#ActionSummary
> 
> Hi Dave, 
> 
> I'd like to drop in some comments as well (I know I promised 
> to deliver them sooner, apologies for that).
> 
> 1.1 "Whether ten, a hundred, or a million *resources* have 
> been deployed"
> 
> "applications" or "processors" would be better to avoid 
> confusion with URI-resources.
> 
> (Two older comments, which I've re-inserted here)
> 
> 1.2: "Among the various kinds of languages, we find..."
> 
> It's obvious, but I think it should be made explicit that the 
> doc does not apply to natural language.
> 
> 1.2: "programming languages such as Java or ECMAScript..."
> 
> I don't think this Finding, which is mainly about forward 
> compatibility, applies to programming languages either. 
> 
> Suppose the final Python 3 release were to include "x" as an 
> alternative notation for the multiplication operator. Take 
> the following Python 3 source:
> 
> def double(i):
>   i = 2 x i
>   return i
> 
> If a Python 2.5 processor were to process this source in a 
> forward compatible way, it would have to ignore the statement 
> "i = 2 x i" and thus return the input without doubling it. I 
> can't think of any context where such behaviour would be 
> useful. I think there is a difference between languages which 
> contain mainly (text or typed) data and languages which 
> contain processable instructions (admittedly there is a large 
> overlap between those two), and forward compatibility does 
> not apply to the latter category. Most of the 'Good 
> Practices' mentioned in your doc don't apply to programming languages.
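As a small check of the point above: a real Python interpreter does not skip a statement it cannot parse, it rejects the whole source. The sketch below assumes nothing beyond the built-in `compile`; the names `SOURCE` and `accepts` are illustrative only.

```python
# The hypothetical "Python 3 with x as multiplication" source from the example.
SOURCE = "def double(i):\n  i = 2 x i\n  return i\n"


def accepts(source):
    """Return True if this interpreter can compile the source at all."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        # Unknown syntax is a hard failure, not something to ignore.
        return False
    return True
```

Here `accepts(SOURCE)` is False on any current interpreter, so the question of ignoring the unknown statement never arises.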
> 
> 2. "None. No distinction is made between versions of the language"
> 
> add "in the document instances", the language specification 
> may very well contain version info.
> 
> 2. "Applications are expected to behave properly"
> 
> There are at least four common relevant behaviours when an 
> application receives a document with an error:
> 1) produce an error and fail
> 2) proceed with errors/warnings
> 3) give a user the option to continue or abort
> 4) proceed silently
> I think the Versioning Doc could mention such distinctions explicitly.
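A minimal sketch of the four behaviours listed above, written as an explicit policy that an application could select. The names `OnError`, `handle_invalid` and `ask_user` are hypothetical and do not come from the Versioning document.

```python
from enum import Enum


class OnError(Enum):
    FAIL = "produce an error and fail"
    WARN = "proceed with errors/warnings"
    ASK = "give the user the option to continue or abort"
    SILENT = "proceed silently"


def handle_invalid(policy, problem, ask_user=lambda problem: False):
    """Return (proceed, warnings) for a document that has `problem`."""
    if policy is OnError.FAIL:
        raise ValueError(problem)
    if policy is OnError.WARN:
        return True, [problem]
    if policy is OnError.ASK:
        # The user decides; the default callback aborts.
        return bool(ask_user(problem)), []
    return True, []  # OnError.SILENT
```

Making the policy a named value is one way a specification could say which of the four behaviours "behave properly" is meant to cover.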
> 
> 2. "For example, many W3C languages adopt a strategy of 
> incompatible changes are allowed between Working Drafts and 
> up to Candidate Recommendation, but then Proposed 
> Recommendation and Recommendation are all compatible versions."
> 
> editorial: 
> "For example, many W3C languages adopt a strategy of allowing 
> incompatible changes between Working Drafts and up to Candidate 
> Recommendation, but then keeping Proposed Recommendation and 
> Recommendation compatible."
> 
> 2.1 par. 2, editorial: "At the other end of the spectrum is 
> <add>an</add> incompatible versioning approach"
> 
> 2.1 par. 3: "Typically, when introducing a new version using 
> the incompatible approach, all of the software that produces 
> or consumes the texts is updated..."
> 
> In general, the hidden premise in this Finding is one of 
> exchanging messages, which 'disappear' after consumption. But 
> versioning applies equally well to longer-lived documents. 
> There are a few common cases I think deserve mentioning:
> - A very common approach, when a new consumer meets an old 
> text, is for the consumer to upgrade the text (silently or at 
> user option) to the new version.
> This approach is particularly common in word processors and databases.
> - With longer-lived documents and data structures, the 
> relation between a producer and a text may not be one-to-one. 
> For instance, in a database some records may originate from 
> an older producer, others from a newer one. Also in larger 
> markup documents, it is quite possible that some parts have 
> been produced by an older producer, other parts by a newer 
> one. Using version identifiers on a per-record basis in a 
> database is uncommon, as is the use of a version identifier 
> on a per-division (paragraph, sentence, chapter) basis in 
> markup documents. This makes the relation version-document 
> more complicated: should we assume the entire document to be 
> of the version of the latest producer? There is a relation 
> with the previous point, since a common approach is for a 
> consumer/producer to open a document (or database), check the 
> version, ask the user to convert to the latest version if 
> necessary, or simply write new structures in the old document 
> (database) when this is allowed. 
> 
> 2.1 par. 4, editorial: "For example, it might be that there 
> are many messages that don't use any features" - I'd use 
> 'documents' or 'texts' instead of 'messages' which is more in 
> line with the rest of the finding.
> 
> 2.1.1, par. 4: "If a name contains first, last, and middle 
> then the previous options yield answers of: 2, 1, 2, 1-2" 
> 
> This is only true if the language has some ignore-unknown 
> strategy, and 'ignore unknown' hasn't really been introduced 
> at this point. So either make explicit that the language V.1 
> has an ignore-unknown strategy, or omit this part of the 
> example.
> 
> 2.1.1.1 par 1: "Usually the first broadly available version 
> starts at "1.0""
> 
> Another common approach is to use 1.0 for the first version 
> for which backward compatibility is guaranteed for following 
> versions, whereas no guarantees are given for pre-1.0 
> versions. Django for instance will use 1.0 for the first 
> version for which upgrades will be guaranteed to be backward 
> compatible.
> 
> 2.1.1.1 general: Version identification can apply to the 
> specification of a language or the instance documents or 
> texts produced by applications implementing this 
> specification. For instance, there is an XML 1.0 W3C 
> Recommendation, and there are XML 1.0 documents which may or 
> may not identify themselves as being XML 1.0 documents. This 
> paragraph discusses version identification in document 
> instances only. 
> 
> 2.1.1.1 par 2, editorial: "in the protocol messages 
> containing <del>in</del> the text"
> 
> 2.1.1.1 general: programming languages, mentioned in 1.2, 
> very often do not have version ids in their texts. C, SQL, 
> and Python source code does not mention the version of C, SQL 
> or Python used.
> 
> 2.1.1.1 par. 6: "For example, RSS has 0.9x, 1.x, and 2.x 
> versions, all being actively developed in parallel." 
> 
> Is this still true?
> 
> 2.1.1.2 This paragraph raises the interesting but complicated 
> issue of whether an XML document with content in several 
> namespaces should be considered a document in one language or 
> in several languages. In one sense it's all XML; in another 
> sense it's nested sublanguages.
> 
> 3: "As this finding focuses on compatible versioning, we 
> provide no more focus on incompatible evolution."
> 
> I MUST, strongly, persistently, vehemently object to this 
> utter, complete... - well, words fail me - omission. There is 
> a difference between publishing, HTML style, for the world, 
> where consumers may do as they wish with whatever is 
> published, and messaging, where senders and receivers are 
> (often contractually) bound. 'Must accept unknowns' is a very 
> good approach, but it will work in messaging if, and only 
> if, it can be overruled by some 'must understand'
> indicator. There is no way in medical prescriptions (my 
> background) or stock orders or any serious messaging to have 
> 'must accept unknowns' as a blanket policy for consumers, 
> without being able to overrule this. Would you accept it if 
> your bank executed a stock order above your maximum price, 
> saying 'Well, we are still on v.1.0, which does not have the 
> max price, and we read the W3C's Versioning Strategies, 
> so...'? This Finding needs some explicit text on overruling 
> 'must accept unknowns' through 'must understand', or similar 
> mechanisms, which are, in effect, mechanisms which force 
> consumers to be incompatible in some circumstances.
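A minimal sketch of the overruling mechanism argued for above, assuming each field carries a must-understand flag alongside its value. The field names, `KNOWN_V1`, `MustUnderstandError` and `consume_order` are hypothetical and are not taken from the Finding or from any real messaging standard.

```python
KNOWN_V1 = {"symbol", "quantity"}  # fields a hypothetical v1.0 consumer understands


class MustUnderstandError(Exception):
    """Raised when a field flagged must-understand is unknown to the consumer."""


def consume_order(order, known=KNOWN_V1):
    """Process an order given as {field: (value, must_understand)}.

    Unknown fields are ignored unless they are flagged must-understand,
    in which case the whole order is rejected.
    """
    accepted = {}
    for field, (value, must_understand) in order.items():
        if field in known:
            accepted[field] = value
        elif must_understand:
            raise MustUnderstandError(field)
        # otherwise: unknown and optional, so ignore it
    return accepted
```

With this rule a v1.0 consumer still accepts an order carrying an unknown optional note, but refuses one carrying an unknown `max_price` that the producer flagged as must-understand.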
> 
> 4, editorial: "Backwards compatibility evolution of a 
> language means that producers of texts in a language should 
> be able to produce texts that consumers that have been 
> updated with a newer version of the language will understand."
> 
> I'd make that: "Backwards compatibility evolution of a 
> language means that *consumers* of texts in a language should 
> be able to *consume* texts that *producers* that *were based 
> on an older* version of the language have produced."
> 
> It's correct as it stands, but it seems to reverse the burden 
> of effort, which for BC is usually on the (newer) consumers.
> 
> 4.1, par. 2, editorial: "Defined Text Set" - This is the 
> first time the term is used in this doc, so maybe you could 
> add a reference to the Terminology Doc.
> 
> 5, par. 1, editorial: "producers of texts in a language 
> should be able to produce texts in a 
> <del>revision</del><add>newer version</add> of the language"
> - this makes it more generic; not all new versions are revisions.
> 
> 5: "Please select one of the following 3 alternatives for the 
> finding" 
> 
> There are only 2. I'd prefer the second. As I mentioned 
> before, I do not think this should apply to programming languages.
> 
> 5: "Extensible" - the links don't jump to the definition, but 
> a bit before it.
> 
> 5.1, par. 1, editorial: "If the software consuming the 
> extension "knows" about the extension, then it has been 
> revised and uses the revised language that incorporates the 
> extension."
> 
> I'd drop this sentence, it's redundant and only obfuscates the point.
> 
> 5.1, par. 2: "Consumers MUST accept text portions..."
> 
> I think you should say something about what 'accept' means. 
> See the points on levels of error / failure above.
> 
> 5.1, par. 3: "any texts with extensions SHOULD be compatible 
> with a text without the extensions"
> 
> No, this uses 'compatible' in a confusing way, since this is 
> not simple BC or FC. The point is "any texts with extensions 
> MAY be processed without the extensions" or "removal of 
> extensions SHOULD be allowed".
> 
> 5.1, par. 6, editorial: "Object systems typically call this 
> "polymorphism", where a new type can behave as the old type." 
> 
> I'd drop this; it is not needed and will only provoke 
> discussion about whether the comparison is justified or not.
> 
> 5.1, pars. 4 and 6: The real distinction is not 'Accept and 
> Ignore' vs. 'Accept and Preserve' (since that approach ignores 
> the content as well) but between 'Ignore and Discard' vs. 
> 'Ignore and Preserve'.
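A minimal sketch of that distinction, assuming a consumer that transforms the fields it knows and passes the result on. `KNOWN`, `process` and the `preserve_unknown` flag are hypothetical names chosen for illustration.

```python
KNOWN = {"first", "last"}  # hypothetical v1 vocabulary


def process(record, preserve_unknown):
    """Upper-case the known fields; unknown fields are never interpreted.

    With preserve_unknown=False they are dropped from the output
    ('Ignore and Discard'); with True they are passed through untouched
    ('Ignore and Preserve').
    """
    out = {}
    for key, value in record.items():
        if key in KNOWN:
            out[key] = value.upper()
        elif preserve_unknown:
            out[key] = value
    return out
```

In both modes the unknown `middle` field of a name is ignored during processing; the two modes differ only in whether it survives in the output.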
> 
> 5.1.1, par. 4, editorial: "some elements 
> <del>who's</del><add>whose</add> children" - maybe I'm wrong, 
> I'm not a native speaker.
> 
> 5.2, NOFRAMES - maybe an even better example is IMG/ALT in HTML
> 
> 5.3, par. 1 "Good Practice: Default Unknown Version 
> Identifier Handling Rule:
> Languages MUST provide a default model for unknown version 
> identifiers for forwards-compatible evolution."
> 
> I believe this is the wrong way around. Newer specs (and thus 
> producers) should provide a way for older consumers to know 
> whether they may process a message.
> The newer ones have the more complete knowledge. They can 
> insert the old version identifier if desired. This good 
> practice, and the following paragraph, assume an overly simple 
> model in which language versions are either compatible or not. 
> In the Netherlands, we have an annual release of a medical 
> (HL7 based) spec, which contains lots of different messages, 
> some compatible, some not, some partly compatible, etc. There 
> is no way a major.minor language version will do. Even 
> per-message-type major.minor versions are not sufficient, 
> since incompatible content may be optional. I strongly 
> believe the only way to resolve such complexities is if newer 
> producers provide the version identifiers the older consumers 
> expect (of course, only when the older consumers may process 
> the messages). So - in your examples - I'd say the 1.1 
> producer would have to insert both the 1.0 and 1.1 version ids, 
> and this approach would not need the above Good Practice.
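A minimal sketch of the producer-declares-versions approach described above, assuming each message carries a list of the version identifiers under which it may be processed. The `versions` key, `may_process` and the sample messages are hypothetical and are not part of HL7 or of the Finding.

```python
def may_process(consumer_version, declared_versions):
    """True if the producer listed the consumer's version as acceptable."""
    return consumer_version in declared_versions


# A hypothetical 1.1 producer marks a message that a 1.0 consumer may process.
compatible_message = {"versions": ["1.0", "1.1"], "body": "..."}
# The same producer marks a message that relies on 1.1-only content.
incompatible_message = {"versions": ["1.1"], "body": "..."}
```

The older consumer only needs to look for its own identifier; the decision about compatibility is made per message by the newer producer, which has the more complete knowledge.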
> 
> See you in Dublin,
> 
> Regards,
> 
> Marc de Graauw
> 
> http://www.marcdegraauw.com
> 
> 
> 
>
Received on Wednesday, 9 April 2008 22:09:29 UTC