RE: (Partial) review of Versioning XML

Great comments.  I've gone through and made changes based on everything
you suggested, perhaps to the degree you are happy with; I'm sure you'll
let me know if not.  I'm responding inline to many of them.
> >   1.1 XML Terminology
> > 
> >    There are many different systems for exchanging texts in 
> languages, such
> >    as SQL, Java, XML, ECMAScript, C#. We will briefly 
> describe some key
> >    refinements to our lexicon for XML. An XML language has 
> a vocabulary that
> >    may use terms from one or more XML Namespaces (or none), 
> each of which has
> >    a namespace name. [Definition: An XML language is an 
> identifiable set of
> >    vocabulary terms with defined XML syntactic and semantic 
> constraints. ] By
> >    XML language, we mean the set of elements and 
> attributes, or instances,
> >    used by a particular application.
> 
> Really? How does "used by a particular application" fit in? I 
> would have thought that we meant the set of instances that 
> conform to the rules of the language independent of any 
> particular application. Surely my XML language is a language 
> even before there are any applications that are expecting to 
> process it.

Right.  Here's what I think is the right solution: "Definition: An XML
Language is a Language where the text MUST be well-formed XML"
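
As a rough illustration of what that definition asks of a text, here is
a minimal sketch that uses Python's standard xml.etree.ElementTree
parser as the well-formedness check.  The function name and the sample
texts are only assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(text: str) -> bool:
    """Return True if `text` parses as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


good = "<personName><given>Dave</given><family>Orchard</family></personName>"
bad = "<personName><given>Dave</family></personName>"  # mismatched end tag

print(is_well_formed_xml(good))
print(is_well_formed_xml(bad))
```

Note that this says nothing about vocabulary or semantics; it only
checks the one constraint the proposed definition states.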

>
> >    as purchase orders. The purchase order texts may contain 
> name elements.
> >    Thus instances of a language are always part of a text 
> and also may 
> > be the
> 
> This paragraph begins with a definition of the term 
> "instance" as a specific, discrete Text, but this sentence 
> says that instances are always part of a text. I don't find 
> those two uses of the word "instance" compatible. What did you mean?

I'm trying to come up with something where an instance is a specific
Text, but where we can also use the word instance to talk about a
fragment of text.  In the example of a PO that contains a Name, the PO
in its entirety is an instance, and so is the Name "part".  What do you
think?

> 
> >    entire text. XML instances (and all other instances of 
> markup languages)
> >    consist of markup and content. In the name example, the 
> given and family
> >    elements including the end markers are the markup. The 
> values between the
> >    start and end markers are the content. An instance has 
> an information
> >    model. There are a variety of data models within and 
> without the W3C, and
> >    the one standardized by the W3C is the XML infoset.
> 
> I suggest you drop the references to information models. As 
> far as I can tell, you don't refer to it anywhere else in the 
> document.

I kept it because I wanted to differentiate the Information Set that our
Part 1 talks about from the XML-specific Information Set.  Would a bit
more elaboration help, or do you still think it should be dropped?
> 
> >    The XML related terms and their relationships are shown below
> > 
> >    UML diagram of XML terms
> > 
> >    A stylesheet processor is a consumer of the XML text that it is 
> > processing
> 
> A stylesheet processor? What's the context for this 
> paragraph. I'm lost.

Right.  I've fixed that up by adding "Some examples of XML consumers and
producers are: "
> 
> [...]
> >        There are a couple types of XML extension languages, 
> element extension
> >        and attribute extension.
> >           * Element Extension. Languages that are elements. 
> SOAP, etc. are
> >             element extensions.
> > 
> >           * Attribute or type Extensions. Languages that 
> are types or
> 
> How can a language be a type?

Fixed by saying "Languages that define types.."
> 
> >             attributes. These languages must exist in the 
> context of an
> >             element. Sometimes called "parasite" languages 
> as they require a
> >             "host" element. XLink is an example.
> 
> The introductory sentence says "element extension" and 
> "attribute extension", but the bullets are "element 
> extension" and "attribute or type extension" Where'd the 
> "type" bit come from?

Fixed by adding "or type" to the introductory sentence.

> 
> >      * Mixtures: languages designed for, or often used for, 
> encapsulating
> >        some semantics inside another language. For example, 
> MathML might be
> >        mixed inside of another language.
> > 
> >    This is by no means an exhaustive list. Nor are these categories
> >    completely clear cut. MathML can certainly be used 
> standalone, for
> >    example, and languages like SVG are a combination of standalone,
> >    containers, and mixtures.
> > 
> > 2 XML Language Requirements
> > 
> >    The general language questions described in Part 1 Requirements
> >    (../versioning#requirements). These requirements are 
> augmented in XML by:
> > 
> >      * Fidelity of XML Schema for the versions of the language.
> 
> I don't understand that sentence.

Fixed by saying "Fidelity (or richness or degree of description) of XML
Schema for the versions of the language. By fidelity, we mean the degree
to which the language is described.  "
> 
> 
<snip/>
> 
> >        tools (like XPath) are more difficult to use with 
> multiple namespaces
> >        containing the same "thing", like XHTML's P element.
> 
> But XHTML only has one namespace, so how is this example relevant?

I thought there were 3 versions.  
> 
> > 3 Version Identification technologies
> > 
> >    Version identification of elements and attributes is critical for
> >    correctly processing xml documents.
> 
> I don't believe that. As a blanket statement, it's too 
> extreme. I process DocBook documents everyday with hardly a 
> thought about versioning.

Often critical?
> 
> [...]
> >   3.1 Qualified Name: Namespace + Local name
> > 
> >    The Namespaces specification defines a Qualified Name as 
> the Namespace and
> >    Local Name of a component.
> 
> This is the first use of the word "component". It's used 
> often in the text that follows, but I'm not completely sure 
> what it is. Is it really something for which we need a new 
> term? If so, can you give a crisp definition of "component"?
> 

Right.  I've added a definition of "component": an element or attribute.

> [...]
> >   3.2 Type
> > 
> >    Many systems use type information associated with the 
> component as part of
> >    the version identification of the component. There are 
> generally two
> >    strategies for determining the type of a component, 
> which we will call
> >    "Top-typing" and "Bottom-typing". In many of the 
> examples that will 
> > be
> 
> I suggest you describe more explicitly what you mean by 
> top-typing and bottom-typing. I can infer it from what 
> follows, but it would be better to be explicit, I think.

Revised extensively.

> 
> [...]
> >    The use of types and the ability to re-use these types
> >    across elements is an important factor in component version
> >    identification.
> 
> How so?

How about: The decision to use types and re-use types across components
is an important factor in component version identification because the
component definition and the component's type may be versioned
separately.  
> 
> >   3.3 Version Numbers
> [...]
> > 4 Component version identification strategies
> [...]
> >     1. all components in new namespace(s) for each version
> > 
> >        ie version 1 consists of namespaces a + b, version 
> 1.1 consists of
> >        namespaces c + d; or version 1 consists of namespace 
> a, version 1.1
> >        consists of namespace b.
> 
> I find it ironic that version numbers are treated somewhat 
> dismissively as a versioning strategy but the rest of the 
> document turns around and uses them almost exclusively for 
> distinguishing between versions.
> 
> This suggests to me that perhaps version numbers are a 
> workable strategy.

I know, I know, I know.  But how can I easily identify versions in
normal text?  Should I say "The first version consists of namespaces
a + b, the second version consists of namespaces c + d"?

But changing from "1" to "First" seems like sophistry to me.

> 
> >     2. all new components in new namespace(s) for each compatible 
> > version
> > 
> >        ie version 1 consists of namespaces a + b; version 
> 1.1 consists of
> >        namespaces a + b + c; version 2.0 consists of 
> namespaces d + e.
> > 
> >     3. all new components in existing or new namespace(s) 
> for each compatible
> >        version
> > 
> >        ie version 1 consists of namespace a, version 1.1 consists of
> >        namespace a, version 2 consists of namespace b; or 
> version 1 consists
> >        of namespace a, version 1.1 consists of namespace a + b.
> > 
> >     4. all new components in existing or new namespace(s) 
> for each version
> >        and a version identifier
> > 
> >        ie version 1 consists of namespace a + b + version 
> attribute "1",
> >        version 2 consists of namespace c + d + version 
> attribute "2".
> > 
> >     5. all components in existing namespace(s) for each 
> version (compatible
> >        and incompatible) and a version identifier
> > 
> >        ie version 1 consists of namespace a + version 
> attribute "1.0",
> >        version 1.1 consists of namespace a + version 
> attribute "1.1", version
> >        2.0 consists of namespace a + version attribute "2.0".
> 
> It's probably worth noting that this isn't an exhaustive list.
> 

I did say at the top "A few of the most common are listed below and
described in more detail later."  Is a further note needed?

> [...]
> >    Example 1: All components in new namespace(s) instances
> > 
> [...]
> >  <personName xmlns="http://www.example.org/name/3">
> >    <given>Dave</given>
> >    <family>Orchard</family>
> >    <midns:middle
> >      xmlns:midns="http://www.example.org/name/3/mid/1">Bryce</midns:middle>
> >  </personName>
> > 
> >  <personName xmlns="http://www.example.org/name/3">
> >    <given>Dave</given>
> >    <family>Orchard</family>
> >    <middiffdomain:middle
> >      xmlns:middiffdomain="http://www.example.com/mid/1">Bryce</middiffdomain:middle>
> >  </personName>
> > 
> >    The 2nd and 3rdexamples shows all the components in the same new
> >    namespace, with the 3rd showing a new name as well.. The 
> 4th and 5th
> >    example show an additional middle element in 2 different 
> namespace names.
> >    The 4th example comes from a namespace name that is in 
> the same domain as
> >    the name element's new namespace name. One reason for 2 
> namespaces is to
> >    modularize the language. The 4th example shows a namespace name 
> > from a
> 
> s/4th/5th/
> 
> >    different domain for the middle. It is probable that the 
> midns:middle was
> >    created by the name author, and the middiffdomain:middle 
> was created by a
> >    3rd party.
> 
> I wouldn't care to state a probability for that assertion :-)
> 

I did switch from example.org to example.com :-)

> More importantly, is there really anything important to be 
> said about the difference between versioning changes made by 
> the original authors and changes made by a third party.

That is a huge point with namespaces, and currently we make no use of
the same domain for namespace names in any versioning work.  
> 
> Do we really think that software might check the authority 
> component of the two URIs and behave differently based on 
> whether or not they're the same?

I actually think we ought to, but that's a separate issue.
> 
> It might be worth mentioning this observation in the prose, 
> but I think it's unnecessary and confusing to include it in 
> an example.

Right, I've removed it.

> 
> >     4.1.1 Compatibility
> > 
> >    In this strategy, forwards compatibility is not desired. 
> Any change or
> >    extension is an incompatible change with an existing 
> consumer. When an
> >    older consumer receives the new texts in the new 
> namespace, most of the
> >    software will break,
> 
> I think saying "most of the software will break" is, again, 
> too extreme.
> Whether or not the software will break depends on a large 
> number of factors.

Hmm.  I don't have a problem with saying that "most of the software will
break with a namespace name change for all components".  It seems that
the only exceptions are specially designed systems, like UBL.  Every
other XML system I know of will break if the namespace names change.
What's wrong with saying "most"?  I don't say "all".
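
To make the failure mode concrete, here is a minimal sketch of the kind
of consumer I have in mind, written with Python's standard
xml.etree.ElementTree.  The function name and the "/name/2" namespace
name are assumptions for illustration; the point is only that the
namespace name is baked into every lookup.

```python
import xml.etree.ElementTree as ET

NS_V1 = "http://www.example.org/name/1"  # the only namespace this consumer knows


def read_given_name(text: str) -> str:
    """Extract the given name the way a namespace-bound consumer typically does."""
    root = ET.fromstring(text)
    if root.tag != f"{{{NS_V1}}}personName":
        raise ValueError(f"unrecognized root element: {root.tag}")
    given = root.find(f"{{{NS_V1}}}given")
    if given is None:
        raise ValueError("missing given element")
    return given.text


old_text = (
    '<personName xmlns="http://www.example.org/name/1">'
    "<given>Dave</given><family>Orchard</family></personName>"
)
# Same content, but every component moved to a new (hypothetical) namespace name.
new_text = old_text.replace("/name/1", "/name/2")

print(read_given_name(old_text))
try:
    read_given_name(new_text)
except ValueError as err:
    print("consumer broke:", err)
```

A consumer could be written to tolerate this, for example by matching on
local names only, but that is exactly the "specially designed" case.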

> 
> [...]
> >    Example 4: New components in existing or new 
> namespace(s) with version
> >    identifier instances
> > 
> [...]
> >  <personName xmlns="http://www.example.org/name/1" version="1.0">
> >    <given>Dave</given>
> >    <family>Orchard</family>
> >    <midns:middle 
> > xmlns:midns="http://www.example.org/name/mid/1">Bryce</midns:middle>
> >  </personName>
> > 
> >  <personName xmlns="http://www.example.org/name/1" version="2.0">
> >    <given>Dave</given>
> >    <family>Orchard</family>
> >    <midns:middle 
> > xmlns:midns="http://www.example.org/name/mid/1">Bryce</midns:middle>
> >  </personName>
> > 
> >  <personName xmlns="http://www.example.org/name/2" version="2.0">
> >    <given>Dave</given>
> >    <family>Orchard</family>
> >    <middle>Bryce</middle>
> >  </personName>
> > 
> >    The last two examples show that the middle is now a 
> mandatory part of the
> >    name. This is indicated by just the version number or a 
> new namespace plus
> >    version number.
> 
> How does the change from version "1.0" to version "2.0" 
> indicate that the middle is now mandatory? I don't get that at all.

Right, good point.  How about "The last two examples use a major version
number change to show that the middle is now a mandatory part of the
name.   This is indicated by just the version number or a new namespace
plus version number."
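
A minimal sketch of how a consumer might act on that, assuming the
convention that a change in the major number signals an incompatible
change and a change in the minor number a compatible one.  The function
name, the supported major version, and the default for a missing
attribute are all assumptions for illustration.

```python
import xml.etree.ElementTree as ET

SUPPORTED_MAJOR = 1  # assumption: this consumer was written against version 1.x


def is_compatible(text: str) -> bool:
    """Accept any minor revision of the supported major version."""
    root = ET.fromstring(text)
    version = root.get("version", "1.0")  # assumption: a missing attribute means 1.0
    major = int(version.split(".")[0])
    return major == SUPPORTED_MAJOR


v1_0 = '<personName xmlns="http://www.example.org/name/1" version="1.0"/>'
v1_1 = '<personName xmlns="http://www.example.org/name/1" version="1.1"/>'
v2_0 = '<personName xmlns="http://www.example.org/name/1" version="2.0"/>'

for text in (v1_0, v1_1, v2_0):
    print(is_compatible(text))
```

The version number by itself does not say what changed; it only tells an
older consumer that it should not assume it can process the text.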

<snip/>
> [...]
> >   5.2 Incompatible
> > 
> >    A version author can use new namespace names, local 
> names, or version
> >    numbers to indicate an incompatible change. An extension 
> author may not
> >    have these mechanisms available for indicating an 
> incompatible extension.
> >    A language designer that wants to allow extension 
> authors to indicate
> >    incompatible extension must provide a mechanism for 
> indicating that
> >    consumers must understand the extension.
> 
> Which consumers must understand it and why? And what if they 
> misunderstand it?

Right.  How about something like: A language designer that wants to
allow extension authors to indicate that an extension is incompatible
must provide a mechanism for indicating that consumers must understand
the extension, and the consumer must generate an error if it does not
understand the extension.  If only specific consumers must understand
the extension, then the language designer must also provide a mechanism
for indicating which consumers.
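
As a sketch of the consumer side of such a mechanism: the "ext"
namespace and its mustUnderstand attribute below are hypothetical
stand-ins for whatever the language designer provides, and are not taken
from any specification.  Unknown extensions are ignored unless they are
flagged, in which case the consumer generates an error.

```python
import xml.etree.ElementTree as ET

NAME_NS = "http://www.example.org/name/1"
# Hypothetical namespace and attribute for the override mechanism.
EXT_NS = "http://www.example.org/ext/1"
MUST_UNDERSTAND = f"{{{EXT_NS}}}mustUnderstand"

UNDERSTOOD_NAMESPACES = {NAME_NS}


class NotUnderstoodError(Exception):
    """Raised when a mandatory extension is not understood."""


def namespace_of(tag: str) -> str:
    """Return the namespace name from an ElementTree '{ns}local' tag."""
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def process(text: str) -> list:
    """Return local names of understood children, ignoring optional extensions."""
    root = ET.fromstring(text)
    understood = []
    for child in root:
        if namespace_of(child.tag) in UNDERSTOOD_NAMESPACES:
            understood.append(child.tag.split("}", 1)[-1])
        elif child.get(MUST_UNDERSTAND) == "true":
            raise NotUnderstoodError(child.tag)
        # Otherwise the unknown extension is optional and is skipped.
    return understood


optional = (
    '<personName xmlns="http://www.example.org/name/1">'
    "<given>Dave</given><family>Orchard</family>"
    '<m:middle xmlns:m="http://www.example.org/name/mid/1">Bryce</m:middle>'
    "</personName>"
)
mandatory = optional.replace(
    "<m:middle ",
    '<m:middle xmlns:ext="http://www.example.org/ext/1" ext:mustUnderstand="true" ',
)

print(process(optional))
try:
    process(mandatory)
except NotUnderstoodError as err:
    print("must-understand fault:", err)
```

The sketch does not cover targeting specific consumers; that would need
a further marker saying which consumers the flag applies to.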

> 
> >    If the language designer has also
> >    allowed for forwards compatibility, then the forwards 
> compatibility rule
> >    must be over-ridden
> > 
> >    Good Practice
> > 
> >    Provide Forwards Compatibility Override Rule: Languages 
> with forwards
> >    compatibility support SHOULD provide an override for indicating
> >    incompatible extensions.
> 
> I'm not sure I believe this good practice. As I recall, Roy 
> argued pretty strongly and persuasively against it.
> 

How about I change the SHOULD to MAY?
	
> [...]
> >    Example 7: Using SOAP Must Understand
> > 
> >  <soap:envelope>
> >    <soap:body>
> >      <personName xmlns="http://www.example.org/name/1">
> >      <given>Dave</given>
> >      <family>Orchard</family>
> >     </personName>
> >    </soap:body>
> >  </soap:envelope>
> > 
> >  <soap:envelope>
> >    <soap:header>
> >    <midns:middle xmlns:midns="http://www.example.org/name/mid/1"
> >                  soap:mustUnderstand="true">
> >        Bryce
> >    </midns:middle>
> >    </soap:header>
> >    <soap:body>
> >      <personName xmlns="http://www.example.org/name/1">
> >      <given>Dave</given>
> >      <family>Orchard</family>
> >     </personName>
> >    </soap:body>
> >  </soap:envelope>
> 
> I imagine that midns:middle header is designed to make sure 
> that the middle name will be understood. Is it then 
> intentional and/or significant that the body doesn't contain a middle?
> 

I added "Use of a SOAP header for an extension may be because the body
was not designed to be extensible, or because the extension is
considered semantically separate from the body and will typically be
processed differently than the body."

> > 6 XML Schema 1.0
> 
> I'll try to get the rest of this reviewed this weekend.
> 
> In broad strokes, I think it's good work. Editorially, there 
> are a lot of incomplete sentences and other issues, but I'm 
> sure we can fix those.

Excellent!
> 
> The implicit focus of the document is clearly XML versioning 
> strategies in a W3C XML Schema-based, web-services style 
> environment. I appreciate that that is a large and 
> significant environment. But it's not the only environment 
> and I don't think that the document is as explicit as it 
> could be about its scope.

What limits the document to a web-services style environment?  I think
this document applies fully to any XML Schema-based environment, like a
Yahoo Search API that uses Schema.  Or when you say "web-services style
environment", do you mean roughly what we called "open systems" in
Part 1?  It is definitely about systems that span more than one
administrative domain, and it attempts to help authors avoid the "there
is one administrator" fallacy from Deutsch's 8 fallacies.

Received on Friday, 18 May 2007 04:46:32 UTC