RE: Comments on July 26, 2006 Versioning Draft

Marc,

Thank you very much for your detailed comments!  I have gone through
them in detail, with answers inline.  Please note that I am publishing a
new version of the finding on Friday, which has a significant rewrite of
the terminology section.  I think this has picked up some of your
comments.  We will be reviewing it next week at our F2F meeting, which
I expect will result in Yet Another Terminology Rewrite.  I suggest
looking at the draft after that next rewrite to see if your comments
have been completely addressed.  I have asked a couple of questions
below; answers at any time would help, and answers in the next day or
so might make it in by Friday.

> -----Original Message-----
> From: Marc de Graauw [mailto:marc@marcdegraauw.com] 
> Sent: Tuesday, August 29, 2006 5:59 AM
> To: David Orchard; www-tag@w3.org
> Subject: Comments on July 26, 2006 Versioning Draft
> 
> Hi David,
> 
> Below some more comments on the July 26 Versioning Draft. 
> BTW, I am co-responsible for the versioning standards in the 
> Dutch Healthcare and Criminal Justice Exchanges which are 
> currently being developed. As such, your document provides 
> very valuable input and I want to stress how much I 
> appreciate the existence of such a document and the effort 
> you've put into it.

Muchas gracias!

> 
> 1.1, par. 3: "The set of information in a language almost 
> always has semantics." I believe the use of the term 
> "Information Set" here and in other parts is confusing. In an 
> XML context one tends to equate this with Information set as 
> defined in XML Infoset [1]. However, if an XML document
> (Text) is just syntax, the XML Infoset is just syntax too (in 
> fact [1] does not even contain the word "semantics"). Your 
> use of "Information" and "Information Set" is very semantical 
> in nature, as shown by the direct association with semantics 
> in the diagram, as well as the Act of Consumption which 
> impacts the consumer. So I think you should make clear that 
> "Information Set" is not the same as an XML Infoset, but a 
> semantical notion
> - or use another term.

We have struggled with this as well.  I expect it to change.
> 
> 1.1.1, par. 6: "The strings could be compatible but the 
> information conveyed is not compatible." This sentence is 
> unclear to me, I think it needs elaboration.

Elaborated.

> 
> 1.1.1, last par.: "We have shown that forwards and backwards 
> compatibility..." should probably read "We have shown that 
> *both* forwards and backwards compatibility...". In general, 
> I struggled sometimes with your use of the word 
> "compatibility", as you often use it to mean "both forwards 
> and backwards compatibility", what you also term "full 
> compatibility". Now backwards compatibility (without forwards 
> compatibility) of course is a form of compatibility as well, 
> so I think the text would become clearer if you always use 
> "full compatibility" when you mean "both forwards and 
> backwards compatibility", and use "compatibility" to just 
> mean any form of compatibility, full or forwards or backwards.
> 

Yes, trying to get there.

> 1.1.1.1, par. 3: "It may be very difficult for a language 
> designer to know many different language flavours are in 
> existance." should be "It may be very difficult for a 
> language designer to know *how* many different language 
> flavours are in exist*e*nce."
Yes.
> 
> 1.1.1.1, par. 4: "If a flavour of a language was also used for 
> consumption, it would have create an instance that is valid 
> according to the Language V1 rules." should be "If a flavour 
> of a language was also used for *production*, it *should* 
> have *to* create an instance that is valid according to the 
> Language V1 rules."
Y.
> 
> 3.1: "If the language can be extended in a compatible way, 
> then a few specific schema design choices must be followed." 
> Further on you describe the possibility to transform new 
> (extended) instances to older instances. If a language makes 
> such transformation (strip all unknown content) required, the 
> Schemas do not need extensibility (with wildcards), so the 
> "must" in this sentence is too strong. 
> 
I think I'll phrase it as "if the language is intended to be capable of
compatible extensibility" 
That way the MUST is still true.
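
The "strip all unknown content" transformation mentioned in the comment
can be sketched roughly as follows. This is only an illustration, not
text from the finding: it borrows the name/given/family/middle
vocabulary used later in this message, and the V1_KNOWN set and the
strip_unknown function are hypothetical names invented for the sketch.

```python
import xml.etree.ElementTree as ET

# Assumption: the namespace and element names come from the name example
# later in this message; V1_KNOWN is a hypothetical list of what V1 defines.
NS = "http://www.example.org/name/1"
V1_KNOWN = {f"{{{NS}}}given", f"{{{NS}}}family"}


def strip_unknown(instance_xml: str) -> str:
    """Drop every child element that a V1 consumer does not know about."""
    root = ET.fromstring(instance_xml)
    for child in list(root):  # copy the list, since we remove while iterating
        if child.tag not in V1_KNOWN:
            root.remove(child)
    return ET.tostring(root, encoding="unicode")


v2_instance = (
    f'<name xmlns="{NS}">'
    "<given>Dave</given><family>Orchard</family><middle>Bryce</middle>"
    "</name>"
)
v1_view = strip_unknown(v2_instance)
```

With a transformation like this in front of the consumer, the V1 schema
itself would not need wildcards, which is the case the comment raises.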
 
> 4, Good Practice #1: "Languages SHOULD be designed for 
> extensibility." I feel this is a bit too strong. Most 
> exchange languages I know of do not implement extensibility 
> mechanisms in the way you describe, and although this is a 
> SHOULD, not a MUST, it still means a lot of well-functioning 
> languages violate this Good Practice. Extensibility should be 
> an option for a language designer, not a SHOULD. You yourself 
> show with your discussions of closed systems and security 
> languages there are perfectly good reasons for not using 
> extensibility. I myself work in Healthcare, and a Must-Ignore 
> default to medical information is often not the way to go either...
> 

I can understand that pushback and it's totally fair.  In general, the
finding has a tough time striking a balance between incompatible
versioning and compatible versioning.  I have erred on the side of
compatible versioning, because I know how difficult it can be to design
systems for it.  I would rather not write a versioning finding that
says "well, you can version.  You can do it incompatibly or compatibly.
You choose."  If I were to change anything, I would make the finding
take a harder line on compatible versioning, and rename it to
"compatible versioning finding" rather than generalizing.  I prefer
findings that say "do x".  I know it's a choice between x and not x,
but I think there is much more pain in the world from not planning for
compatible versioning than there is from planning for it.  I hope that
helps explain the motivation; feel free to push back again.

> 5, Example 3: I think you should mention XML Schema 
> (non)determinism pitfalls here if you show the Schema, now 
> you only mention in the last paragraph of 5 "Attribute 
> extensions do not have non-determinism issues..."
> without further explanation. The problem is important enough 
> to mention here.

Yes.  I've rewritten this a little bit to mention why this example is
used, and a bit more.

> 
> 8: The Chapter title should be "Version Identification 
> Strategies *Using Namespaces*" since in the previous part you 
> describe alternatives not using namespaces (although you 
> recommend using namespaces).

Yes

> 
> 8, numbered list:
> http://www.oasis-open.org/archives/ubl/200511/msg00014.html 
> and its predecessors list languages which use those various 
> approaches, maybe this could serve as illustrative examples. 
> As an addition, HL7v3 currently has a single namespace for 
> all versions and uses (several flavours of) version 
> indicators in the instance.

Interesting idea.  I don't want to get into a survey though, because
there are potentially many different approaches and implementations.
Anybody not listed would likely ask to be listed, and so it goes.

> 
> As for approach 1, "all components in new namespace(s) for 
> each version", when proposing this we met with severe 
> opposition from our clients. Since a new namespace means new 
> Schemas, consumers which do not use any features of the new 
> language are faced with an upgrade if they want to process 
> messages from new producers - even when those messages contain 
> only 'old' information content. Since our clients' software 
> release policy required regression testing for all changes in 
> software (also Schemas) this would mean a serious effort 
> without any benefits. I think this drawback deserves mentioning here.

I totally agree with the pushback, which is also why I pushed back on
UBL's adoption of this.  But I do think that a compatible change,
whether it uses a new *or* an old namespace, can (but need not) involve
a new Schema.  However, I've added a paragraph on this con.
> 
> 9: There is another strategy to versioning which you do not 
> mention: a producer simply lists in an instance which 
> consumer versions may process the message. A producer could 
> thus simply say "Consumers who understand version
> 2 or 3 may process this message". The advantage is you don't 
> need mustUnderstand flags everywhere. If a newer version of a 
> language L2 contains an optional item whose understanding is 
> mandatory, the producer could require L2 consumers if the 
> optional item occurs, and L1 or L2 consumers if the optional 
> item does not occur. Of course the number of versions could 
> theoretically become high, but in practice there often aren't 
> that many versions of a language: we have two XML's, two 
> SOAP's, two UBL's, so this approach is feasible in practice. 
> It works for forward (in)compatibility since it requires a 
> newer producer who knows the capabilities of older consumers. 

Interesting approach.  However, I'm not quite sure I follow it
completely.  Let's take my name/given/family/middle example.  If I have
V2 which adds optional middle, would it look something like:

<name xmlns="http://www.example.org/name/1" worksForVersion="1">
  <given>Dave</given>
  <family>Orchard</family>
</name>

<!-- then a producer that knows about V1 and V2 creates an instance that
doesn't have the middle -->
<name xmlns="http://www.example.org/name/1" worksForVersion="1 2">
  <given>Dave</given>
  <family>Orchard</family>
</name>

<name xmlns="http://www.example.org/name/1" worksForVersion="2">
  <given>Dave</given>
  <family>Orchard</family>
  <middle>Bryce</middle>
</name>

?  I think the idea of listing multiple versions is extremely
interesting.  I have a blog post that has long been sitting on the back
burner about what a single version # *really* means in scenario #2 (as
in, what's it the version *of*).
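
To make the consumer side of this concrete, here is a minimal sketch of
how a consumer might act on such an attribute. It assumes the
worksForVersion attribute is a whitespace-separated list of version
labels, as in the instances above; the can_process and make_name
helpers, and the choice to reject an instance with no such attribute,
are hypothetical and not part of the proposal.

```python
import xml.etree.ElementTree as ET

NAME_NS = "http://www.example.org/name/1"


def can_process(instance_xml: str, consumer_version: str) -> bool:
    """True if the producer listed this consumer's version in worksForVersion."""
    root = ET.fromstring(instance_xml)
    # The attribute is unqualified, so it is looked up by its local name.
    # Assumption: a missing attribute means "no version is allowed".
    listed = (root.get("worksForVersion") or "").split()
    return consumer_version in listed


def make_name(versions: str, middle: str = "") -> str:
    """Build one of the name instances shown above (illustrative helper)."""
    extra = f"<middle>{middle}</middle>" if middle else ""
    return (
        f'<name xmlns="{NAME_NS}" worksForVersion="{versions}">'
        f"<given>Dave</given><family>Orchard</family>{extra}</name>"
    )


without_middle = make_name("1 2")           # safe for V1 and V2 consumers
with_middle = make_name("2", middle="Bryce")  # requires a V2 consumer

print(can_process(without_middle, "1"))  # True
print(can_process(with_middle, "1"))     # False
print(can_process(with_middle, "2"))     # True
```

Note that the check only reads the root element, so a V1 consumer can
decide whether to proceed before it meets any content it does not know.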

> 
> In general, backwards (in)compatibility can be defined in the 
> language specification, since the implementer of a new 
> consumer will know from the specification which older 
> versions of the language are processable by it, but forwards 
> (in)compatibility must be defined in the instance, since the 
> implementer of a newer producer may not know which versions the 
> consumers are. 

I think that's right.  Languages always know about the previous
languages, but languages don't know about the subsequent languages.

> 
> 10. You probably should mention one drawback to extensioning: 
> if multiple parties "invent" the same (functional) extension 
> which comes in a new version, getting the extensions back in 
> sync in the new version may meet with opposition. I don't say 
> this is a reason for not using extensioning, but I think in 
> fairness it should be mentioned.

I think this is a big advantage of namespaces.  That way two different
entities can't come up with the same extension name.  So I don't see
that problem coming up in the context of section 10.  Or do you mean
that there are identically named and incompatible extensions within the
same namespace?
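
As a small illustration of the namespace point, the sketch below has two
parties each adding their own "middle" extension to the name example.
The two extension namespace URIs are made up for the illustration. A
namespace-aware parser reports two distinct element names, so the
extensions cannot be confused, although nothing here tells a consumer
that they are functionally the same.

```python
import xml.etree.ElementTree as ET

# Assumption: a.example.com and b.example.net are hypothetical parties that
# independently invented a "middle" extension in their own namespaces.
doc = """<name xmlns="http://www.example.org/name/1"
      xmlns:a="http://a.example.com/ext"
      xmlns:b="http://b.example.net/ext">
  <given>Dave</given>
  <family>Orchard</family>
  <a:middle>Bryce</a:middle>
  <b:middle>B.</b:middle>
</name>"""

root = ET.fromstring(doc)
# ElementTree expands each name to "{namespace-uri}local-name".
tags = [child.tag for child in root]
extensions = [t for t in tags if t.endswith("}middle")]

for tag in extensions:
    print(tag)
```

Both printed names end in "middle", but their namespace parts differ.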
> 
> I hope this is helpful, and do want to stress again my 
> appreciation for this effort.
> 

Thanks again,
Dave

Received on Thursday, 28 September 2006 04:34:07 UTC