Re: (Partial) review of Versioning XML from Norman Walsh on 2007-05-21 (www-tag@w3.org from May 2007)

From: Norman Walsh <ndw@nwalsh.com>
Date: Mon, 21 May 2007 10:10:31 -0400
To: "David Orchard" <dorchard@bea.com>
Cc: <www-tag@w3.org>
Message-ID: <87irams1wo.fsf@nwalsh.com>
/ "David Orchard" <dorchard@bea.com> was heard to say:
|> Really? How does "used by a particular application" fit in? I 
|> would have thought that we meant the set of instances that 
|> conform to the rules of the language independent of any 
|> particular application. Surely my XML language is a language 
|> even before there are any applications that are expecting to 
|> process it.
|
| Right..  Here's what I think is the right solution.. "Definition: An XML
| Language is a Language where the text MUST be well-formed XML"

I think that's just a definition of XML. I'd expect a language to
have some extra-syntactic constraints too: a grammar of some sort.

|> >    as purchase orders. The purchase order texts may contain name elements.
|> >    Thus instances of a language are always part of a text and also may
|> >    be the
|> 
|> This paragraph begins with a definition of the term 
|> "instance" as a specific, discrete Text, but this sentence 
|> says that instances are always part of a text. I don't find 
|> those two uses of the word "instance" compatible. What did you mean?
|
| I'm trying to come up with something where an instance is specific Text,
| but can also use the word instance to talk about a fragment of text.  In
| the example of a PO that contains a Name, the PO in its entirety is an
| instance, and so is the Name "part".   What do you think?

I think that's going to be confusing. If you want to make the
distinction between an instance as a specific Text and an instance as a
fragment, I think you're going to have to be very careful to always
say either "instance document" or "element instance" or something like
that; a qualified "instance" in every case.

|> I suggest you drop the references to information models. As 
|> far as I can tell, you don't refer to it anywhere else in the 
|> document.
|
| I kept it because I wanted to differentiate the information Set that our
| part 1 talks about, and the XML specific information Set.  Perhaps a bit
| more elaboration?  Or still do you think it should be dropped.

I didn't find any place where I felt reference to an information model
would have helped, but perhaps that's because I'm so familiar with XML
information models. If you think it's an important point then I think
more elaboration, or some subsequent reference to it, is necessary.

|> >      * Fidelity of XML Schema for the versions of the language.
|> 
|> I don't understand that sentence.
|
| Fixed by saying "Fidelity (or richness or degree of description) of XML
| Schema for the versions of the language. By fidelity, we mean the degree
| to which the language is described.  "

I don't think that helps. "Fidelity" is about accuracy or faithfulness
to some standard, it isn't about richness or elaborateness.

Perhaps "completeness" instead of fidelity?

|> > 3 Version Identification technologies
|> > 
|> >    Version identification of elements and attributes is critical for
|> >    correctly processing xml documents.
|> 
|> I don't believe that. As a blanket statement, it's too 
|> extreme. I process DocBook documents everyday with hardly a 
|> thought about versioning.
|
| Often critical?

Often important?

|> [...]
|> >    The use of types and the ability to re-use these types
|> >    across elements is an important factor in component version
|> >    identification.
|> 
|> How so?
|
| How about: The decision to use types and re-use types across components
| is an important factor in component version identification because the
| component definition and the component's type may be versioned
| separately.  

I'll have to see that in context again, but I think it's better.

|> >   3.3 Version Numbers
|> [...]
|> > 4 Component version identification strategies
|> [...]
|> >     1. all components in new namespace(s) for each version
|> > 
|> >        ie version 1 consists of namespaces a + b, version 1.1 consists of
|> >        namespaces c + d; or version 1 consists of namespace a, version 1.1
|> >        consists of namespace b.
|> 
|> I find it ironic that version numbers are treated somewhat 
|> dismissively as a versioning strategy but the rest of the 
|> document turns around and uses them almost exclusively for 
|> distinguishing between versions.
|> 
|> This suggests to me that perhaps version numbers are a 
|> workable strategy.
|
| I know, I know, I know.  But how in normal text can I easily identify
| versions?  Should I say "The first version consists of namespaces a + b,
| the 2nd version consists of namespaces c + d"
| ?
|
| But changing from "1" to "First" seems like sophistry to me.

I'd play it the other way around: the fact that version numbers are
clearly useful and natural suggests that perhaps they shouldn't be
treated so dismissively as a strategy.

| I did say at the top "A few of the most common are listed below and
| described in more detail later."  More note needed?

No, perhaps I just missed that.

|> More importantly, is there really anything important to be 
|> said about the difference between versioning changes made by 
|> the original authors and changes made by a third party.
|
| That is a huge point with namespaces, and currently we make no use of
| the same domain for namespace names in any versioning work.  

If you're making a proposal that

  http://nwalsh.com/ns/name-extension

is more different from

  http://www.example.com/ns/name

than

  http://www.example.com/ns/name-extension

and that applications might treat the extensions differently because the
domain name is or is not the same, I think you need to expand on this
quite a bit. That's a fairly substantial and radical proposal.
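
To make concrete what such a rule would have to look like, here is a
minimal sketch. It assumes, purely for illustration, that "same domain"
means "same host in the namespace URI"; the function names are
hypothetical and nothing like this is defined by the draft. XML namespace
names are otherwise compared as opaque strings, so this is exactly the
extra behaviour that would need to be specified.

```python
from urllib.parse import urlsplit


def namespace_authority(ns_uri: str) -> str:
    """Return the host part of a namespace URI ('' if there is none)."""
    return urlsplit(ns_uri).hostname or ""


def same_authority(base_ns: str, extension_ns: str) -> bool:
    """Hypothetical rule: an extension is 'first party' if its namespace
    URI has the same host as the base language's namespace URI."""
    return namespace_authority(base_ns) == namespace_authority(extension_ns)


base = "http://www.example.com/ns/name"
first_party = "http://www.example.com/ns/name-extension"
third_party = "http://nwalsh.com/ns/name-extension"

print(same_authority(base, first_party))
print(same_authority(base, third_party))
```

Under that assumption the first comparison succeeds and the second does
not, which is the distinction an application would be asked to act on.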

|> I think saying "most of the software will break" is, again, 
|> too extreme.
|> Whether or not the software will break depends on a large 
|> number of factors.
|
| Hmm.. I don't have a problem with saying that "most of the software will
| break with a namespace name change for all components".  It seems that
| the only exceptions are specially designed systems like UBL.
| Every other XML system I know of will break if the namespace names
| change.   What's wrong with saying "most"?  I don't say "all"...

I guess my concern about "most" is that it's a pretty strong judgement.
While it's probably true in some domains, I'm not entirely comfortable
asserting that it's true in all (or even most :-) domains.

But if it doesn't bother anyone else, I'll let it go. As you point out,
you didn't say "all".

|> >    The last two examples show that the middle is now a mandatory part of the
|> >    name. This is indicated by just the version number or a new namespace plus
|> >    version number.
|> 
|> How does the change from version "1.0" to version "2.0" 
|> indicate that the middle is now mandatory? I don't get that at all.
|
| Right, good point.  How about "The last two examples use a major version
| number change to show that the middle is now a mandatory part of the
| name.   This is indicated by just the version number or a new namespace
| plus version number."

Well, the problem is that the version number doesn't show anything at
all. I think this needs to be turned around. Perhaps:

In the last example, the version number has been changed from 1.0 to
2.0. Incrementing the major part of a version number often indicates a
degree of backwards incompatible change. In this case, perhaps it
indicates that the middle name is now mandatory where it had previously
been optional.
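
As a rough sketch of that convention, assuming version strings of the
form "major.minor" (the helper names are mine, not the draft's): the
number only hints that something incompatible may have changed; it says
nothing about what.

```python
def parse_version(text: str) -> tuple[int, int]:
    """Split a 'major.minor' version string into two integers."""
    major, minor = text.split(".")
    return int(major), int(minor)


def may_be_incompatible(old: str, new: str) -> bool:
    """Convention only: a larger major number suggests (but cannot
    describe) a backwards incompatible change."""
    return parse_version(new)[0] > parse_version(old)[0]


print(may_be_incompatible("1.0", "1.1"))
print(may_be_incompatible("1.0", "2.0"))
```

Whether the incompatible change is "middle is now mandatory" or
something else entirely still has to come from the prose or the schema.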

| <snip/>
|> [...]
|> >   5.2 Incompatible
|> > 
|> >    A version author can use new namespace names, local names, or version
|> >    numbers to indicate an incompatible change. An extension author may not
|> >    have these mechanisms available for indicating an incompatible extension.
|> >    A language designer that wants to allow extension authors to indicate
|> >    incompatible extension must provide a mechanism for indicating that
|> >    consumers must understand the extension.
|> 
|> Which consumers must understand it and why? And what if they 
|> misunderstand it?
|
| Right.  How about something like: A language designer that wants to
| allow extension authors to indicate that an extension is incompatible
| must provide a mechanism for indicating that consumers must understand
| the extension, and the consumer must generate an error if it does not
| understand the extension.  If only specific consumers must understand
| the extension, then the language designer must also provide a mechanism
| for indicating which consumers.

I like that better.

|
|> >    If the language designer has also
|> >    allowed for forwards compatibility, then the forwards compatibility rule
|> >    must be over-ridden
|> > 
|> >    Good Practice
|> > 
|> >    Provide Forwards Compatibility Override Rule: Languages with forwards
|> >    compatibility support SHOULD provide an override for indicating
|> >    incompatible extensions.
|> 
|> I'm not sure I believe this good practice. As I recall, Roy 
|> argued pretty strongly and persuasively against it.
|> 
|
| How about I change the SHOULD to MAY?

Well, that's certainly ok, but it weakens the good practice to the point
where it becomes dubious to call it out specifically as a good practice.

I think we should probably attempt to wrestle this one to the ground and
see if we really have community support that it is a good practice.

|> [...]
|> >    Example 7: Using SOAP Must Understand
|> > 
|> >  <soap:envelope>
|> >    <soap:body>
|> >      <personName xmlns="http://www.example.org/name/1">
|> >      <given>Dave</given>
|> >      <family>Orchard</family>
|> >     </personName>
|> >    </soap:body>
|> >  </soap:envelope>
|> > 
|> >  <soap:envelope>
|> >    <soap:header>
|> >    <midns:middle xmlns:midns="http://www.example.org/name/mid/1"
|> >                  soap:mustUnderstand="true">
|> >        Bryce
|> >    </midns:middle>
|> >    </soap:header>
|> >    <soap:body>
|> >      <personName xmlns="http://www.example.org/name/1">
|> >      <given>Dave</given>
|> >      <family>Orchard</family>
|> >     </personName>
|> >    </soap:body>
|> >  </soap:envelope>
|> 
|> I imagine that midns:middle header is designed to make sure 
|> that the middle name will be understood. Is it then 
|> intentional and/or significant that the body doesn't contain a middle?
|> 
|
| I added "Use of a SOAP header for an extension may be because the body
| was not designed to be extensible, or because the extension is
| considered semantically separate from the body and will typically be
| processed differently than the body."

That still leaves my question: is it intentional and/or significant that
the body doesn't contain a midns:middle after you've gone to the trouble
of making sure the consumer will understand it? If the example would not
be correct and/or more clear if the personName in the soap:body
contained a midns:middle, then I'm missing something significant about
the example.
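
For reference, here is a minimal sketch of what I take the consumer-side
check to be. Assumptions on my part: the SOAP 1.2 envelope namespace and
its capitalized element names (the draft's example leaves the soap prefix
undeclared and uses lowercase names), and a hypothetical set of header
names that the consumer claims to understand.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://www.w3.org/2003/05/soap-envelope"  # assumed: SOAP 1.2
MIDDLE = "{http://www.example.org/name/mid/1}middle"

ENVELOPE = """\
<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope">
  <soap:Header>
    <midns:middle xmlns:midns="http://www.example.org/name/mid/1"
                  soap:mustUnderstand="true">Bryce</midns:middle>
  </soap:Header>
  <soap:Body>
    <personName xmlns="http://www.example.org/name/1">
      <given>Dave</given>
      <family>Orchard</family>
    </personName>
  </soap:Body>
</soap:Envelope>
"""


class NotUnderstood(Exception):
    """Raised when a mandatory header block is not understood."""


def check_must_understand(envelope_xml: str, understood: set[str]) -> None:
    """Fail if any header block marked mustUnderstand is unknown.

    Only the children of the Header are inspected; the Body is not
    examined at all.
    """
    root = ET.fromstring(envelope_xml)
    header = root.find(f"{{{SOAP_NS}}}Header")
    if header is None:
        return
    for block in header:
        flag = block.get(f"{{{SOAP_NS}}}mustUnderstand", "false")
        if flag in ("true", "1") and block.tag not in understood:
            raise NotUnderstood(block.tag)


check_must_understand(ENVELOPE, {MIDDLE})  # passes silently
```

Note that the check never looks inside the soap:Body, which is why the
relationship between the header and the personName in the body still
needs to be spelled out in the example.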

|> The implicit focus of the document is clearly XML versioning 
|> strategies in a W3C XML Schema-based, web-services style 
|> environment. I appreciate that that is a large and 
|> significant environment. But it's not the only environment 
|> and I don't think that the document is as explicit as it 
|> could be about its scope.
|
| What limits the document to web-services style environment?  I think
| this document completely applies to any XML-Schema based environment,
| like a Yahoo Search API that uses Schema.  Or when you say "web-services
| style environment", do you mean roughly what we called "open systems" in
| part 1?  It is definitely about systems that are under more than one
| administrative domain and attempts to help authors avoid that one in
| Deutsch's 8 fallacies. 

What I mean is that there seems to be a bias towards systems that are
(1) using XML Schema for describing constraints (2) constructing "typed
object graphs" as a mechanism for representing XML documents and (3)
aborting processing unless full validity is obtained.

There are clearly other strategies. At the other end of the spectrum is
the HTML model where the browser accepts just about anything and does
its best. In the middle are systems like the one I use every day for
formatting DocBook documents where failure to validate may produce
distinctive error output but it doesn't prevent the user from pressing
on if they really insist.
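
A small sketch of the three positions on that spectrum, with names that
are mine and a toy validator standing in for a real schema processor:

```python
from enum import Enum
from typing import Callable, List


class Policy(Enum):
    STRICT = "abort unless fully valid"
    REPORT = "flag errors in the output, keep going"
    LENIENT = "accept just about anything"


def process(doc: str,
            validate: Callable[[str], List[str]],
            policy: Policy) -> str:
    """Apply a validation policy; 'validate' returns a list of errors."""
    errors = validate(doc)
    if errors and policy is Policy.STRICT:
        raise ValueError("; ".join(errors))
    if errors and policy is Policy.REPORT:
        return "[errors: " + "; ".join(errors) + "] " + doc
    return doc


def toy_validate(doc: str) -> List[str]:
    """Hypothetical validator: complains about any <blink> element."""
    return ["unknown element blink"] if "<blink>" in doc else []


print(process("<p><blink>hi</blink></p>", toy_validate, Policy.REPORT))
```

The draft's advice reads as if only the STRICT branch existed; the other
two are just as real in practice.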

                                        Be seeing you,
                                          norm

-- 
Norman Walsh <ndw@nwalsh.com> | All professional men are handicapped by
http://nwalsh.com/            | not being allowed to ignore things
                              | which are useless.-- Goethe
Received on Monday, 21 May 2007 14:11:13 UTC