RE: (Partial) review of Versioning XML from David Orchard on 2007-06-12 (www-tag@w3.org from June 2007)

From: David Orchard <dorchard@bea.com>
Date: Tue, 12 Jun 2007 14:38:22 -0700
To: "Norman Walsh" <ndw@nwalsh.com>
Cc: <www-tag@w3.org>
Message-ID: <BEBB9CBE66B372469E93FFDE3EDC493E29140C@repbex01.amer.bea.com>
Follow-on comments inline 

> -----Original Message-----
> From: Norman Walsh [mailto:ndw@nwalsh.com] 
> Sent: Monday, May 21, 2007 7:11 AM
> To: David Orchard
> Cc: www-tag@w3.org
> Subject: Re: (Partial) review of Versioning XML
> 
> / "David Orchard" <dorchard@bea.com> was heard to say:
> |> Really? How does "used by a particular application" fit in? I would
> |> have thought that we meant the set of instances that conform to the
> |> rules of the language independent of any particular application.
> |> Surely my XML language is a language even before there are any
> |> applications that are expecting to process it.
> |
> | Right. Here's what I think is the right solution: "Definition: An
> | XML Language is a Language where the text MUST be well-formed XML"
> 
> I think that's just a definition of XML. I'd expect a 
> language to have some extra-syntactic constraints too: a 
> grammar of some sort.

How about "Definition: An XML Language is a Language where the text MUST
be well-formed XML and the texts are usually constrained by a schema
language. The schema language may be machine-processable, such as DTDs,
XML Schema, or RELAX NG, or it may be human-readable prose."
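A minimal Python sketch of the two layers in that definition, using only the standard library. The parser enforces well-formedness; the second function is a hypothetical, hand-written stand-in for a schema (the "exactly one given and one family" rule is assumed for illustration, not taken from the finding). In practice that layer would be a DTD, XML Schema, or RELAX NG validator, or prose.

```python
import xml.etree.ElementTree as ET

# Namespace from the finding's name examples; the constraints below are assumed.
NAME_NS = "http://www.example.org/name/1"

def is_well_formed(text: str) -> bool:
    """Layer 1: every text in an XML Language MUST be well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

def is_name_instance(text: str) -> bool:
    """Layer 2: a hypothetical stand-in for schema constraints. A personName
    root with exactly one given child and exactly one family child."""
    if not is_well_formed(text):
        return False
    root = ET.fromstring(text)
    if root.tag != f"{{{NAME_NS}}}personName":
        return False
    given = root.findall(f"{{{NAME_NS}}}given")
    family = root.findall(f"{{{NAME_NS}}}family")
    return len(given) == 1 and len(family) == 1
```

Note that `<foo/>` passes the first check but not the second: it is XML, but not a text in this (hypothetical) name language.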

> 
> |> >    as purchase orders. The purchase order texts may contain name elements.
> |> >    Thus instances of a language are always part of a text and also may
> |> > be the
> |> 
> |> This paragraph begins with a definition of the term "instance" as a
> |> specific, discrete Text, but this sentence says that instances are
> |> always part of a text. I don't find those two uses of the word
> |> "instance" compatible. What did you mean?
> |
> | I'm trying to come up with something where an instance is a specific
> | Text, but where I can also use the word instance to talk about a
> | fragment of text. In the example of a PO that contains a Name, the PO
> | in its entirety is an instance, and so is the Name "part". What do you
> | think?
> 
> I think that's going to be confusing. If you want to make the 
> distinction between an instance as a specific Text and an 
> instance as a fragment, I think you're going to have to be 
> very careful to always say either "instance document" or 
> "element instance" or something like that; a qualified 
> "instance" in every case.

Yuck. But in most cases, I don't think we need to make the distinction.
For our purposes, an instance is just the text extracted from a
document: either the whole document or a single element. Do you think
the common understanding of an XML instance is one of those, or both?
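To make the "whole document or element" reading concrete, a small Python sketch: each text it returns, the whole PO or just the name element, is well-formed XML on its own. The purchase order and its element names are hypothetical.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical purchase order that contains a name element.
PO = ("<po><name><given>Dave</given><family>Orchard</family></name>"
      "<item>widget</item></po>")

def serialize(elem: ET.Element) -> str:
    """Serialize one element as a standalone text, dropping the trailing
    text (tail), which belongs to the parent's content, not the element."""
    standalone = copy.copy(elem)
    standalone.tail = None
    return ET.tostring(standalone, encoding="unicode")

def instances(document: str, tag: str) -> list[str]:
    """Return the whole document plus each `tag` element as separate texts."""
    root = ET.fromstring(document)
    return [serialize(root)] + [serialize(e) for e in root.iter(tag)]
```

Calling `instances(PO, "name")` yields two texts, the full PO and the name element, and either could be called "an instance" without further qualification.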

> 
> |> I suggest you drop the references to information models. As far as I
> |> can tell, you don't refer to them anywhere else in the document.
> |
> | I kept it because I wanted to differentiate the information set that
> | our Part 1 talks about from the XML-specific information set. Perhaps
> | a bit more elaboration? Or do you still think it should be dropped?
> 
> I didn't find any place where I felt reference to an 
> information model would have helped, but perhaps that's 
> because I'm so familiar with XML information models. If you
> think it's an important point then I think more elaboration, 
> or some subsequent reference to it, is necessary.

I think it will become necessary when we flesh out the notions of
compatibility, which are related to the information extracted from a
text.

> 
> |> >      * Fidelity of XML Schema for the versions of the language.
> |> 
> |> I don't understand that sentence.
> |
> | Fixed by saying "Fidelity (or richness or degree of description) of
> | XML Schema for the versions of the language. By fidelity, we mean the
> | degree to which the language is described."
> 
> I don't think that helps. "Fidelity" is about accuracy or 
> faithfulness to some standard, it isn't about richness or 
> elaborateness.

I disagree a bit, because I think fidelity has degrees such as "high" or
"low" that resonate with people. But I can live with "accuracy" or
"completeness", and "precision" might be a good word too.

How about "Accuracy of XML Schema for the versions of the language. "

<snip/>

> 
> |> [...]
> |> >    The use of types and the ability to re-use these types
> |> >    across elements is an important factor in component version
> |> >    identification.
> |> 
> |> How so?
> |
> | How about: The decision to use types and re-use types across
> | components is an important factor in component version identification
> | because the component definition and the component's type may be
> | versioned separately.
> 
> I'll have to see that in context again, but I think it's better.

Let me know.

> 
> |> >   3.3 Version Numbers
> |> [...]
> |> > 4 Component version identification strategies
> |> [...]
> |> >     1. all components in new namespace(s) for each version
> |> > 
> |> >        ie version 1 consists of namespaces a + b, version 1.1 consists of
> |> >        namespaces c + d; or version 1 consists of namespace a, version 1.1
> |> >        consists of namespace b.
> |> 
> |> I find it ironic that version numbers are treated somewhat 
> |> dismissively as a versioning strategy but the rest of the document 
> |> turns around and uses them almost exclusively for distinguishing 
> |> between versions.
> |> 
> |> This suggests to me that perhaps version numbers are a workable 
> |> strategy.
> |
> | I know, I know, I know.  But how in normal text can I easily identify
> | versions?  Should I say "The first version consists of namespaces
> | a + b, the 2nd version consists of namespaces c + d"?
> |
> | But changing from "1" to "First" seems like sophistry to me.
> 
> I'd play it the other way around, the fact that version 
> numbers are clearly useful and natural suggests that perhaps 
> they shouldn't be treated so dismissively as a strategy.

I think they are commonly used, but also commonly misused. I'm starting
to realize that version #s make more interesting things possible wrt
forwards compatibility than namespaces do, BUT it's rare to see anybody
use version #s really well.

I think the point is becoming well taken that, even with namespaces
available, version #s in XML aren't just a crazy strategy. If we can
provide some clear guidance on how to use them, that would be wonderful.

> 
> |> More importantly, is there really anything important to be said about
> |> the difference between versioning changes made by the original
> |> authors and changes made by a third party?
> |
> | That is a huge point with namespaces, and currently we make no use of
> | the same domain for namespace names in any versioning work.
> 
> If you're making a proposal that
> 
>   http://nwalsh.com/ns/name-extension
> 
> is more different from
> 
>   http://www.example.com/ns/name
> 
> than
> 
>   http://www.example.com/ns/name-extension
> 
> And that applications might treat the extensions differently 
> because the domain name is or is not the same, I think you 
> need to expand on this quite a bit. That's a fairly 
> substantial and radical proposal.

I think it's a very interesting possibility to pattern-match on the first
part of a namespace name to determine who is doing the extension. I
think it could be very appropriate for W3C specifications. One example
that I have in mind is WS-Policy.
Currently, the extensibility model roughly says that any unknown
extension is treated as a Policy Assertion.  This means it is subject to
the Policy normalization rules.  A more sophisticated way could say that
any unknown extension not defined with a base of
http://www.w3.org/ns/ws-policy is treated as a Policy Assertion, and any
unknown extension defined with a base of http://www.w3.org/ns/ws-policy
is a Policy Language extension and is not treated as a Policy Assertion.
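A rough Python sketch of the classification rule described above, assuming ElementTree-style "{namespace}local" names. The rule itself is the hypothetical proposal in this message, not current WS-Policy behaviour, and the set of known elements is only an illustrative subset. Matching the base exactly or the base followed by "/" (rather than a bare prefix test) is a design choice, so that an unrelated name such as http://www.w3.org/ns/ws-policy-foo is not mistaken for a W3C one.

```python
from enum import Enum

# Base namespace name from the message.
POLICY_BASE = "http://www.w3.org/ns/ws-policy"

# Elements this hypothetical processor already understands (illustrative subset).
KNOWN_ELEMENTS = {
    f"{{{POLICY_BASE}}}Policy",
    f"{{{POLICY_BASE}}}All",
    f"{{{POLICY_BASE}}}ExactlyOne",
    f"{{{POLICY_BASE}}}PolicyReference",
}

class Kind(Enum):
    KNOWN = "known policy element"
    LANGUAGE_EXTENSION = "policy language extension"
    ASSERTION = "policy assertion"

def namespace_of(clark_name: str) -> str:
    """Return the namespace part of a '{ns}local' name ('' if none)."""
    if clark_name.startswith("{"):
        return clark_name[1:].split("}", 1)[0]
    return ""

def under_base(ns: str, base: str) -> bool:
    """True if ns is the base itself or a path below it."""
    return ns == base or ns.startswith(base.rstrip("/") + "/")

def classify(clark_name: str) -> Kind:
    if clark_name in KNOWN_ELEMENTS:
        return Kind.KNOWN
    if under_base(namespace_of(clark_name), POLICY_BASE):
        # Unknown but minted under the W3C base: an extension of the Policy
        # language itself, so not subject to assertion normalization.
        return Kind.LANGUAGE_EXTENSION
    # Unknown and minted by someone else: treat it as a Policy Assertion.
    return Kind.ASSERTION
```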

But I know of no places that do that, so I don't want to go into too
much detail in the TAG finding.

<snip/>

> |> >    The last two examples show that the middle is now a mandatory part of the
> |> >    name. This is indicated by just the version number or a new namespace plus
> |> >    version number.
> |> 
> |> How does the change from version "1.0" to version "2.0" 
> |> indicate that the middle is now mandatory? I don't get that at all.
> |
> | Right, good point.  How about "The last two examples use a major
> | version number change to show that the middle is now a mandatory part of the
> | name.  This is indicated by just the version number or a new namespace
> | plus version number."
> 
> Well, the problem is that the version number doesn't show 
> anything at all. I think this needs to be turned around. Perhaps:
> 
> In the last example, the version number has been changed from 
> 1.0 to 2.0. Incrementing the major part of a version number 
> often indicates a degree of backwards incompatible change. In 
> this case, perhaps it indicates that the middle name is now 
> mandatory where it had previously been optional.

Done as "In the last two examples, the version number has been changed
from 1.0 to 2.0. Incrementing the major part of a version number often
indicates an incompatible change. In this case, perhaps it indicates
that the middle name is now mandatory where it had previously been
optional." 
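A minimal Python sketch of the major/minor convention described in that text, assuming versions are written as "major.minor" strings. The rule that a consumer accepts only texts with its own major version is an assumed convention, not something the finding specifies; as Norman points out, the number is only a signal, and what actually changed (here, the middle name becoming mandatory) still has to be documented.

```python
def parse_version(version: str) -> tuple[int, int]:
    """Split a "major.minor" string into integers; a missing minor counts as 0."""
    major, _, minor = version.partition(".")
    return int(major), int(minor or 0)

def can_process(consumer_version: str, text_version: str) -> bool:
    """Assumed convention: a minor increment (1.0 -> 1.1) is compatible and any
    unknown content is left to the forwards-compatibility rules, while a major
    increment (1.0 -> 2.0, e.g. middle name now mandatory) is not."""
    return parse_version(consumer_version)[0] == parse_version(text_version)[0]
```

Under this convention a 1.0 consumer would accept a 1.1 text but reject a 2.0 one.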

<snip/>
> |
> |> >    If the language designer has also
> |> >    allowed for forwards compatibility, then the forwards compatibility rule
> |> >    must be over-ridden
> |> >
> |> >    Good Practice
> |> >
> |> >    Provide Forwards Compatibility Override Rule: Languages with forwards
> |> >    compatibility support SHOULD provide an override for indicating
> |> >    incompatible extensions.
> |> 
> |> I'm not sure I believe this good practice. As I recall, Roy argued 
> |> pretty strongly and persuasively against it.
> |> 
> |
> | How about I change the SHOULD to MAY?
> 
> Well, that's certainly ok, but it weakens the good practice 
> to the point where it becomes dubious to call it out 
> specifically as a good practice.
> 
> I think we should probably attempt to wrestle this one to the 
> ground and see if we really have community support that it is 
> a good practice.

How about making it a sentence, removing the GPN, and adding a
condition? Something like:
"Languages with forwards compatibility support MAY provide an override
for indicating incompatible extensions, but should only do so IF the
incompatible extensions can be clearly targeted or scoped."
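A Python sketch of what such a scoped override could look like, using the name example from the finding. Assumptions, none of which come from the finding: the default forwards-compatibility rule is "ignore unknown elements", and the override is a hypothetical mustUnderstand attribute in the name namespace, placed on the extension element itself so that the incompatibility is scoped to that one element.

```python
import xml.etree.ElementTree as ET

NAME_NS = "http://www.example.org/name/1"  # from the finding's examples
# Hypothetical override attribute; its name and namespace are assumptions.
OVERRIDE_ATTR = f"{{{NAME_NS}}}mustUnderstand"
KNOWN = {f"{{{NAME_NS}}}given", f"{{{NAME_NS}}}family"}

class IncompatibleExtension(Exception):
    """Raised when an unknown element carries the override flag."""

def read_name(text: str) -> dict[str, str]:
    root = ET.fromstring(text)
    result = {}
    for child in root:
        if child.tag in KNOWN:
            local = child.tag.split("}", 1)[1]
            result[local] = (child.text or "").strip()
        elif child.get(OVERRIDE_ATTR) == "true":
            # Override: the producer flagged this one extension as unsafe to ignore.
            raise IncompatibleExtension(child.tag)
        # Otherwise the default forwards-compatibility rule applies: ignore it.
    return result
```

An unflagged extension element is silently skipped; the same element carrying the flag makes a version 1 consumer stop rather than guess.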
> 
> |> [...]
> |> >    Example 7: Using SOAP Must Understand
> |> > 
> |> >  <soap:envelope>
> |> >    <soap:body>
> |> >      <personName xmlns="http://www.example.org/name/1">
> |> >      <given>Dave</given>
> |> >      <family>Orchard</family>
> |> >     </personName>
> |> >    </soap:body>
> |> >  </soap:envelope>
> |> > 
> |> >  <soap:envelope>
> |> >    <soap:header>
> |> >    <midns:middle xmlns:midns="http://www.example.org/name/mid/1"
> |> >                  soap:mustUnderstand="true">
> |> >        Bryce
> |> >    </midns:middle>
> |> >    </soap:header>
> |> >    <soap:body>
> |> >      <personName xmlns="http://www.example.org/name/1">
> |> >      <given>Dave</given>
> |> >      <family>Orchard</family>
> |> >     </personName>
> |> >    </soap:body>
> |> >  </soap:envelope>
> |> 
> |> I imagine that midns:middle header is designed to make sure that the
> |> middle name will be understood. Is it then intentional and/or
> |> significant that the body doesn't contain a middle?
> |> 
> |
> | I added "Use of a SOAP header for an extension may be because the body
> | was not designed to be extensible, or because the extension is
> | considered semantically separate from the body and will typically be
> | processed differently than the body."
> 
> That still leaves my question: is it intentional and/or 
> significant that the body doesn't contain a midns:middle 
> after you've gone to the trouble of making sure the consumer 
> will understand it? If the example would not be correct 
> and/or more clear if the personName in the soap:body 
> contained a midns:middle, then I'm missing something 
> significant about the example.

It is intentional. There are two reasons why:
1) the personName might not have been extensible, regardless of
   mustUnderstand;
2) the personName may have been extensible, but personName didn't
   support applying a mustUnderstand flag.

The case where personName is extensible and supports a mustUnderstand
flag is shown in the other mustUnderstand example.
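A simplified Python sketch of how a receiver might handle the header-based example: every header block marked mustUnderstand is checked before the body is touched, and the body stays an unchanged version 1 personName. Assumptions: the SOAP 1.2 envelope namespace and its capitalized element names (Envelope, Header, Body), which the abbreviated example omits; SOAP roles and targeting are ignored; and the set of understood header names is supplied by the caller.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://www.w3.org/2003/05/soap-envelope"  # SOAP 1.2 envelope namespace
NAME_NS = "http://www.example.org/name/1"
MID_NS = "http://www.example.org/name/mid/1"

class MustUnderstandFault(Exception):
    """Stands in for a SOAP MustUnderstand fault."""

def process(envelope_xml: str, understood: set[str]) -> dict[str, str]:
    env = ET.fromstring(envelope_xml)
    header = env.find(f"{{{SOAP_ENV}}}Header")
    blocks = list(header) if header is not None else []
    # Check every mandatory header block before processing the body.
    for block in blocks:
        flag = block.get(f"{{{SOAP_ENV}}}mustUnderstand", "false").strip()
        if flag in ("true", "1") and block.tag not in understood:
            raise MustUnderstandFault(block.tag)
    # The body is a plain version 1 personName, untouched by the extension.
    person = env.find(f"{{{SOAP_ENV}}}Body/{{{NAME_NS}}}personName")
    result = {
        "given": person.findtext(f"{{{NAME_NS}}}given", "").strip(),
        "family": person.findtext(f"{{{NAME_NS}}}family", "").strip(),
    }
    # The middle name, if present and understood, comes from the header.
    for block in blocks:
        if block.tag == f"{{{MID_NS}}}middle" and block.tag in understood:
            result["middle"] = (block.text or "").strip()
    return result
```

A receiver that lists `{http://www.example.org/name/mid/1}middle` in `understood` gets all three name parts; one that does not is forced to fault instead of silently dropping the middle name.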

> 
> |> The implicit focus of the document is clearly XML versioning
> |> strategies in a W3C XML Schema-based, web-services style environment.
> |> I appreciate that that is a large and significant environment. But
> |> it's not the only environment and I don't think that the document is
> |> as explicit as it could be about its scope.
> |
> | What limits the document to a web-services style environment?  I think
> | this document completely applies to any XML Schema-based environment,
> | like a Yahoo Search API that uses Schema.  Or when you say
> | "web-services style environment", do you mean roughly what we called
> | "open systems" in part 1?  It is definitely about systems that are
> | under more than one administrative domain, and it attempts to help
> | authors avoid the "there is one administrator" fallacy among Deutsch's
> | 8 fallacies.
> 
> What I mean is that there seems to be a bias towards systems that are
> (1) using XML Schema for describing constraints (2) 
> constructing "typed object graphs" as a mechanism for 
> representing XML documents and (3) aborting processing unless 
> full validity is obtained.
> 
> There are clearly other strategies. At the other end of the 
> spectrum is the HTML model where the browser accepts just 
> about anything and does its best. In the middle are systems 
> like the one I use every day for formatting DocBook documents 
> where failure to validate may produce distinctive error 
> output but it doesn't prevent the user from pressing on if 
> they really insist.

Hmm. The first part of the document is not XML Schema-specific: it says
little beyond a description of types, and nothing about compatible
extensions. Do you think that, up to Section 6, it still misses the
mark? Obviously, from Section 6 on it's XML Schema all the way, but
maybe we can figure out how to make Sections 1-5 more reflective of what
you'd like.

Cheers,
Dave
Received on Tuesday, 12 June 2007 21:38:45 UTC