- From: Marc de Graauw <marc@marcdegraauw.com>
- Date: Wed, 9 Apr 2008 23:23:41 +0200
- To: <orchard@pacificspirit.com>, <www-tag@w3.org>
Dave Orchard:
| Based upon feedback from Noah, the TAG's Feb f2f, and phone
| discussions with Noah.
|
| http://www.w3.org/2001/tag/doc/versioning-compatibility-strategies
| http://www.w3.org/2001/tag/doc/versioning-compatibility-strategies-20080328.html
|
| These are now ready for review by Ashok, Dan, Noah, Norm, and
| Raman per our agreements at the Vancouver F2F in
| http://www.w3.org/2008/02/26-tagmem-minutes#ActionSummary

Hi Dave,

I'd like to drop in some comments as well (I know I promised to deliver them sooner, apologies for that).

1.1 "Whether ten, a hundred, or a million *resources* have been deployed"
"applications" or "processors" would be better to avoid confusion with URI-resources.

(Two older comments, I've re-inserted them here again)

1.2: "Among the various kinds of languages, we find..."
It's obvious, but I think it should be made explicit that the doc does not apply to natural language.

1.2: "programming languages such as Java or ECMAScript..."
I don't think this Finding, which is mainly about forward compatibility, applies to programming languages either. Suppose the final Python 3 release would include "x" as an alternative notation for the multiplication operator. Take the following Python 3 source:

    def double(i):
        i = 2 x i
        return i

If a Python 2.5 processor were to process this source in a forward compatible way, it would have to ignore the statement "i = 2 x i" and thus return the input without doubling it. I can't think of any context where such behaviour would be useful. I think there is a difference between languages which contain mainly (text or typed) data and languages which contain processable instructions (admittedly there is a large overlap between those two), and forward compatibility does not apply to the latter category. Most of the 'Good Practices' mentioned in your doc don't apply to programming languages.

2. "None.
No distinction is made between versions of the language" add "in the document instances", the language specification may very well contain version info.

2. "Applications are expected to behave properly"
There are at least four common relevant behaviours when an application receives a document with an error:
1) produce an error and fail
2) proceed with errors/warnings
3) give a user the option to continue or abort
4) proceed silently
I think the Versioning Doc could mention such distinctions explicitly.

2. "For example, many W3C languages adopt a strategy of incompatible changes are allowed between Working Drafts and up to Candidate Recommendation, but then Proposed Recommendation and Recommendation are all compatible versions."
editorial: "For example, many W3C languages adopt a strategy of allowing incompatible changes between Working Drafts and up to Candidate Recommendation, but then keeping Proposed Recommendation and Recommendation compatible."

2.1 par. 2, editorial: "At the other end of the spectrum is <add>an</add> incompatible versioning approach"

2.1 par. 3: "Typically, when introducing a new version using the incompatible approach, all of the software that produces or consumes the texts is updated..."
In general, the hidden premise in this Finding is one of exchanging messages, which 'disappear' after consumption. But versioning applies equally well to longer-lived documents. There are a few common cases I think deserve mentioning:
- A very common approach, when a new consumer meets an old text, is for the consumer to upgrade the text (silently or at user option) to the new version. This approach is particularly common in word processors and databases.
- With longer-lived documents and data structures, the relation between a producer and a text may not be one-to-one. For instance, in a database some records may originate from an older producer, others from a newer one.
Also in larger markup documents, it is quite possible that some parts have been produced by an older producer, other parts by a newer one. Using version identifiers on a per-record basis in a database is uncommon, as is the use of a version identifier on a per-division (paragraph, sentence, chapter) basis in markup documents. This makes the relation version-document more complicated: should we assume the entire document to be of the version of the latest producer? There is a relation with the previous point, since a common approach is for a consumer/producer to open a document (or database), check the version, ask the user to convert to the latest version if necessary, or simply write new structures in the old document (database) when this is allowed.

2.1 par. 4, editorial: "For example, it might be that there are many messages that don't use any features" - I'd use 'documents' or 'texts' instead of 'messages', which is more in line with the rest of the finding.

2.1.1, par. 4: "If a name contains first, last, and middle then the previous options yield answers of: 2, 1, 2, 1-2"
This is only true if the language has some ignore-unknown strategy, and 'ignore unknown' hasn't really been introduced at this point. So either make explicit that the language V.1 has an ignore-unknown strategy, or omit this part of the example.

2.1.1.1 par 1: "Usually the first broadly available version starts at "1.0""
Another common approach is to use 1.0 for the first version for which backward compatibility is guaranteed for following versions, whereas no guarantees are given for pre-1.0 versions. Django for instance will use 1.0 for the first version for which upgrades will be guaranteed to be backward compatible.

2.1.1.1 general: Version identification can apply to the specification of a language or the instance documents or texts produced by applications implementing this specification.
For instance, there is an XML 1.0 W3C Recommendation, and there are XML 1.0 documents which may or may not identify themselves as being XML 1.0 documents. This paragraph discusses version identification in document instances only.

2.1.1.1 par 2, editorial: "in the protocol messages containing <del>in</del> the text"

2.1.1.1 general: programming languages, mentioned in 1.2, very often do not have version id's in their texts. C, SQL, Python source code does not mention the version of C, SQL or Python used.

2.1.1.1 par. 6: "For example, RSS has 0.9x, 1.x, and 2.x versions, all being actively developed in parallel."
Is this still true?

2.1.1.2 This paragraph raises the interesting but complicated issue whether an XML document with content in several namespaces should be considered a document in one language or in several languages. In one sense, it's all XML, in another sense, nested sublanguages.

3: "As this finding focuses on compatible versioning, we provide no more focus on incompatible evolution."
I MUST, strongly, persistently, vehemently object to this utter, complete... - well, words fail me - omission. There is a difference between publishing, HTML style, for the world, where consumers may do as they wish with whatever is published, and messaging, where senders and receivers are (often contractually) bound. Must accept unknowns is a very good approach, but it will work in messaging if, and only if, it can be overruled by some 'must understand' indicator. There is no way in medical prescriptions (my background) or stock orders or any serious messaging to have 'must accept unknowns' as a blanket policy for consumers, without being able to overrule this. Would you accept it if your bank executed a stock order above your maximum price, saying 'Well, we are still on v.1.0, which does not have the max price, and we read the W3C's Versioning Strategies, so...'.
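
The stock order case can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `must_understand` flag, the element names and the `consume` function are hypothetical and are not taken from the Finding. It shows a consumer that ignores unknown elements by default but refuses the whole text when an unknown element is flagged.

```python
from dataclasses import dataclass


class MustUnderstandError(Exception):
    """Raised when a flagged element is unknown to the consumer."""


@dataclass
class Element:
    name: str
    value: str
    must_understand: bool = False  # hypothetical per-element flag


# Element names a hypothetical v1.0 stock-order consumer knows about.
KNOWN_V1 = {"symbol", "quantity"}


def consume(elements, known=KNOWN_V1):
    """Return the understood elements; ignore unknown ones unless flagged."""
    understood = {}
    for el in elements:
        if el.name in known:
            understood[el.name] = el.value
        elif el.must_understand:
            # The producer overruled 'must accept unknowns': refuse the text.
            raise MustUnderstandError(el.name)
        # Otherwise the element is unknown and unflagged, so it is ignored.
    return understood
```

With this, an unflagged unknown element such as a free-text note is dropped silently, while a `max_price` element sent with `must_understand=True` makes the v1.0 consumer reject the order instead of executing it without the limit.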
This Finding needs some explicit texts on overruling 'must accept unknowns' through 'must understand', or similar mechanisms, which are, in effect, mechanisms which force consumers to be incompatible in some circumstances.

4, editorial: "Backwards compatibility evolution of a language means that producers of texts in a language should be able to produce texts that consumers that have been updated with a newer version of the language will understand."
I'd make that: "Backwards compatibility evolution of a language means that *consumers* of texts in a language should be able to *consume* texts that *producers* that *were based on an older* version of the language will understand." It's correct as it stands, but it seems to reverse the burden of effort, which for BC is usually on the (newer) consumers.

4.1, par. 2, editorial: "Defined Text Set" - This is the first time the term is used in this doc, so maybe you could add a reference to the Terminology Doc.

5, par. 1, editorial: "producers of texts in a language should be able to produce texts in a <del>revision</del><add>newer version</add> of the language" - this makes it more generic, not all new versions are revisions.

5: "Please select one of the following 3 alternatives for the finding"
There are only 2. I'd prefer the second. As I mentioned before, I do not think this should apply to programming languages.

5: "Extensible" - the links don't jump to the definition, but a bit before it.

5.1, par. 1, editorial: "If the software consuming the extension "knows" about the extension, then it has been revised and uses the revised language that incorporates the extension."
I'd drop this sentence, it's redundant and only obfuscates the point.

5.1, par. 2: "Consumers MUST accept text portions..."
I think you should say something about what 'accept' means. See the points on levels of error / failure above.

5.1, par.
3: "any texts with extensions SHOULD be compatible with a text without the extensions"
No, this uses 'compatible' in a confusing way, since this is not simple BC or FC. The point is "any texts with extensions MAY be processed without the extensions" or "removal of extensions SHOULD be allowed".

5.1, par. 6, editorial: "Object systems typically call this "polymorphism", where a new type can behave as the old type."
I'd drop this, it is not needed and will only invoke discussion whether the comparison is justified or not.

5.1, pars. 4 and 6: The real distinction is not 'Accept and Ignore' vs. 'Accept and Preserve' (since that approach ignores the content as well) but between 'Ignore and Discard' vs. 'Ignore and Preserve'.

5.1.1, par. 4, editorial: "some elements <del>who's</del><add>whose</add> children" - maybe I'm wrong, I'm not a native

5.2, NOFRAMES - perhaps an even better example is IMG/ALT in HTML

5.3, par. 1 "Good Practice: Default Unknown Version Identifier Handling Rule: Languages MUST provide a default model for unknown version identifiers for forwards-compatible evolution."
I believe this is the wrong way around. Newer specs (and thus producers) should provide a way for older consumers to know whether they may process a message. The newer ones have the more complete knowledge. They can insert the old version identifier if desired. This good practice, and the following paragraph, assume too simple an approach, with language versions being either compatible or not. In the Netherlands, we have an annual release of a medical (HL7 based) spec, which contains lots of different messages, some compatible, some not, some partly compatible, etc. There is no way a major.minor language version will do. Even per-message-type major.minor versions are not sufficient, since incompatible content may be optional.
I strongly believe the only way to resolve such complexities is if newer producers provide the version identifiers the older consumers expect (of course, only when the older consumers may process the messages). So - in your examples - I'd say the 1.1 producer would have to insert the 1.0 and 1.1 version id's, and this approach would not need the above Good Practice.

See you in Dublin,

Regards,
Marc de Graauw
http://www.marcdegraauw.com
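
P.S. A minimal sketch of the 5.3 suggestion, in Python. The function names and the idea of carrying the identifiers as a plain list are assumptions made for illustration only; neither the Finding nor HL7 prescribes this shape. The producer decides per message which older versions may process it, and the consumer only checks for its own identifier.

```python
def producer_version_ids(own_version, compatible_with):
    """Version identifiers a producer writes into one message.

    compatible_with lists the older versions whose consumers may safely
    process this particular message (decided by the producer per message).
    """
    return [*compatible_with, own_version]


def may_process(consumer_version, message_version_ids):
    """A consumer processes a message only if its own version id is listed."""
    return consumer_version in message_version_ids


# A 1.1 producer sending a message that 1.0 consumers can handle.
compatible_message = producer_version_ids("1.1", ["1.0"])

# The same producer sending a message that uses incompatible optional content.
incompatible_message = producer_version_ids("1.1", [])
```

Here a 1.0 consumer needs no default rule for unknown version identifiers: it accepts `compatible_message` because "1.0" is listed, and rejects `incompatible_message` because it is not.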
Received on Wednesday, 9 April 2008 21:22:25 UTC