(Partial) review of Versioning XML from Norman Walsh on 2007-05-11 (www-tag@w3.org from May 2007)

From: Norman Walsh <ndw@nwalsh.com>
Date: Fri, 11 May 2007 16:10:38 -0400
To: www-tag@w3.org
Message-ID: <87d5176s01.fsf@nwalsh.com>

At the 23 Apr telcon, I took an action to review

  http://www.w3.org/2001/tag/doc/versioning-xml

What follows is the beginning of that review. More to follow this
weekend.

> Abstract
> 
>    This document is the XML related aspects of versioning. It describes XML

s/is the/discusses the/

>    based terminology, technologies and versioning strategies. It provides XML
>    Schema schemas for each of the strategies and discussion about various

s/Schema schemas/Schema examples/

>    schema design. A number of XML languages, including XHTML and Atom, are

s/design/design patterns/

>    used as case studies in different strategies.
[...]
> 1 Introduction
> 
>    Extending and Versioning XML Languages Part 1 described extending and
>    versioning languages. Part 2 focuses on XML and includes schema language
>    specific aspects of extending and versioning XML. The choices, decisions,
>    and strategies described in Part 1 are augmented with xml and schema

s/xml/XML/

>    instances herein.
> 
>   1.1 XML Terminology
> 
>    There are many different systems for exchanging texts in languages, such
>    as SQL, Java, XML, ECMAScript, C#. We will briefly describe some key
>    refinements to our lexicon for XML. An XML language has a vocabulary that
>    may use terms from one or more XML Namespaces (or none), each of which has
>    a namespace name. [Definition: An XML language is an identifiable set of
>    vocabulary terms with defined XML syntactic and semantic constraints. ] By
>    XML language, we mean the set of elements and attributes, or instances,
>    used by a particular application.

Really? How does "used by a particular application" fit in? I would have
thought that we meant the set of instances that conform to the rules of
the language independent of any particular application. Surely my XML
language is a language even before there are any applications that are
expecting to process it.

>    The Name Language - consisting of name,
>    given, family terms - has a namespace for the terms. We use the prefix
>    "namens" to refer to that namespace. The Name Language could consist of
>    terms from other vocabularies, such as Dublin Core or UBL. These terms
>    each have their own namespaces, illustrating that a language can comprise
>    vocabularies from multiple namespaces. An XML Namespace is a convenient

s/can comprise vocabularies/can be comprised of terms/


>    container for collecting terms that are intended to be used together
>    within a language or across languages. It provides a mechanism for
>    creating globally unique names.
> 
>    We shall use the term instance when speaking of sequences of characters
>    (aka text) in XML. [Definition: An instance is a specific, discrete Text
>    in XML format.] Documents are instances of a language. In XML, they must

s/discrete Text in XML format/discrete Text of well-formed XML/

>    have a root element. A name text might have a name element as the root
>    element. Alternatively, the name vocabulary may be used by a language such

s/used by a language/used by another language/

>    as purchase orders. The purchase order texts may contain name elements.
>    Thus instances of a language are always part of a text and also may be the

This paragraph begins with a definition of the term "instance" as a
specific, discrete Text, but this sentence says that instances are
always part of a text. I don't find those two uses of the word
"instance" compatible. What did you mean?

>    entire text. XML instances (and all other instances of markup languages)
>    consist of markup and content. In the name example, the given and family
>    elements including the end markers are the markup. The values between the
>    start and end markers are the content. An instance has an information
>    model. There are a variety of data models within and without the W3C, and
>    the one standardized by the W3C is the XML infoset.

I suggest you drop the references to information models. As far as I can
tell, you don't refer to it anywhere else in the document.

>    The XML related terms and their relationships are shown below
> 
>    UML diagram of XML terms
> 
>    A stylesheet processor is a consumer of the XML text that it is processing

A stylesheet processor? What's the context for this paragraph? I'm lost.

[...]
>        There are a couple types of XML extension languages, element extension
>        and attribute extension.
>           * Element Extension. Languages that are elements. SOAP, etc. are
>             element extensions.
> 
>           * Attribute or type Extensions. Languages that are types or

How can a language be a type?

>             attributes. These languages must exist in the context of an
>             element. Sometimes called "parasite" languages as they require a
>             "host" element. XLink is an example.

The introductory sentence says "element extension" and "attribute
extension", but the bullets are "element extension" and "attribute or
type extension". Where'd the "type" bit come from?

>      * Mixtures: languages designed for, or often used for, encapsulating
>        some semantics inside another language. For example, MathML might be
>        mixed inside of another language.
> 
>    This is by no means an exhaustive list. Nor are these categories
>    completely clear cut. MathML can certainly be used standalone, for
>    example, and languages like SVG are a combination of standalone,
>    containers, and mixtures.
> 
> 2 XML Language Requirements
> 
>    The general language questions described in Part 1 Requirements
>    (../versioning#requirements). These requirements are augmented in XML by:
> 
>      * Fidelity of XML Schema for the versions of the language.

I don't understand that sentence.


>        We will see
>        how some designs preclude full XML Schema description. Often this
>        results in Schemas that are incomplete at the first and subsequent
>        versions. The options are typically: Complete in all versions,
>        complete in first version only, incomplete in all versions.
> 
>      * Use of generic XML and namespace only (precluding vocabulary specific
>        versions) tools. This itself is a trade-off because some generic XML

s/\(...\) tools/tools \(...\)/

>        tools (like XPath) are more difficult to use with multiple namespaces
>        containing the same "thing", like XHTML's P element.

But XHTML only has one namespace, so how is this example relevant?

> 3 Version Identification technologies
> 
>    Version identification of elements and attributes is critical for
>    correctly processing xml documents.

I don't believe that. As a blanket statement, it's too extreme. I
process DocBook documents every day with hardly a thought about
versioning.

[...]
>   3.1 Qualified Name: Namespace + Local name
> 
>    The Namespaces specification defines a Qualified Name as the Namespace and
>    Local Name of a component.

This is the first use of the word "component". It's used often in the
text that follows, but I'm not completely sure what it is. Is it really
something for which we need a new term? If so, can you give a crisp
definition of "component"?

[...]
>   3.2 Type
> 
>    Many systems use type information associated with the component as part of
>    the version identification of the component. There are generally two
>    strategies for determining the type of a component, which we will call
>    "Top-typing" and "Bottom-typing". In many of the examples that will be

I suggest you describe more explicitly what you mean by top-typing and
bottom-typing. I can infer it from what follows, but it would be better
to be explicit, I think.

[...]
>    The use of types and the ability to re-use these types
>    across elements is an important factor in component version
>    identification.

How so?

>   3.3 Version Numbers
[...]
> 4 Component version identification strategies
[...]
>     1. all components in new namespace(s) for each version
> 
>        ie version 1 consists of namespaces a + b, version 1.1 consists of
>        namespaces c + d; or version 1 consists of namespace a, version 1.1
>        consists of namespace b.

I find it ironic that version numbers are treated somewhat dismissively
as a versioning strategy but the rest of the document turns around and
uses them almost exclusively for distinguishing between versions.

This suggests to me that perhaps version numbers are a workable
strategy.

>     2. all new components in new namespace(s) for each compatible version
> 
>        ie version 1 consists of namespaces a + b; version 1.1 consists of
>        namespaces a + b + c; version 2.0 consists of namespaces d + e.
> 
>     3. all new components in existing or new namespace(s) for each compatible
>        version
> 
>        ie version 1 consists of namespace a, version 1.1 consists of
>        namespace a, version 2 consists of namespace b; or version 1 consists
>        of namespace a, version 1.1 consists of namespace a + b.
> 
>     4. all new components in existing or new namespace(s) for each version
>        and a version identifier
> 
>        ie version 1 consists of namespace a + b + version attribute "1",
>        version 2 consists of namespace c + d + version attribute "2".
> 
>     5. all components in existing namespace(s) for each version (compatible
>        and incompatible) and a version identifier
> 
>        ie version 1 consists of namespace a + version attribute "1.0",
>        version 1.1 consists of namespace a + version attribute "1.1", version
>        2.0 consists of namespace a + version attribute "2.0".

It's probably worth noting that this isn't an exhaustive list.

[...]
>    Example 1: All components in new namespace(s) instances
> 
[...]
>  <personName xmlns="http://www.example.org/name/3">
>    <given>Dave</given>
>    <family>Orchard</family>
>    <midns:middle xmlns:midns="http://www.example.org/name/3/mid/1">Bryce</midns:middle>
>  </personName>
> 
>  <personName xmlns="http://www.example.org/name/3">
>    <given>Dave</given>
>    <family>Orchard</family>
>    <middiffdomain:middle xmlns:middiffdomain="http://www.example.com/mid/1">Bryce</middiffdomain:middle>
>  </personName>
> 
>    The 2nd and 3rdexamples shows all the components in the same new
>    namespace, with the 3rd showing a new name as well.. The 4th and 5th
>    example show an additional middle element in 2 different namespace names.
>    The 4th example comes from a namespace name that is in the same domain as
>    the name element's new namespace name. One reason for 2 namespaces is to
>    modularize the language. The 4th example shows a namespace name from a

s/4th/5th/

>    different domain for the middle. It is probable that the midns:middle was
>    created by the name author, and the middiffdomain:middle was created by a
>    3rd party.

I wouldn't care to state a probability for that assertion :-)

More importantly, is there really anything important to be said about the
difference between versioning changes made by the original authors and
changes made by a third party?

Do we really think that software might check the authority component of
the two URIs and behave differently based on whether or not they're the
same?

It might be worth mentioning this observation in the prose, but I think
it's unnecessary and confusing to include it in an example.
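
For what it's worth, the check itself would be trivial to write. The
sketch below is my own illustration, not something the draft proposes;
the helper name same_authority is hypothetical, and the namespace names
are the ones from the quoted example. The open question is whether any
consumer ought to act on the result.

```python
from urllib.parse import urlsplit

def same_authority(ns_a, ns_b):
    """Compare the authority (host) components of two namespace names."""
    # Host names are case-insensitive, so normalize before comparing.
    return urlsplit(ns_a).netloc.lower() == urlsplit(ns_b).netloc.lower()

NAME_NS = "http://www.example.org/name/3"
MID_SAME_DOMAIN = "http://www.example.org/name/3/mid/1"
MID_OTHER_DOMAIN = "http://www.example.com/mid/1"

print(same_authority(NAME_NS, MID_SAME_DOMAIN))
print(same_authority(NAME_NS, MID_OTHER_DOMAIN))
```

Nothing about either answer tells the consumer who actually authored the
extension, which is why I'd keep the observation out of the example.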

>     4.1.1 Compatibility
> 
>    In this strategy, forwards compatibility is not desired. Any change or
>    extension is an incompatible change with an existing consumer. When an
>    older consumer receives the new texts in the new namespace, most of the
>    software will break,

I think saying "most of the software will break" is, again, too extreme.
Whether or not the software will break depends on a large number of
factors.

[...]
>    Example 4: New components in existing or new namespace(s) with version
>    identifier instances
> 
[...]
>  <personName xmlns="http://www.example.org/name/1" version="1.0">
>    <given>Dave</given>
>    <family>Orchard</family>
>    <midns:middle xmlns:midns="http://www.example.org/name/mid/1">Bryce</midns:middle>
>  </personName>
> 
>  <personName xmlns="http://www.example.org/name/1" version="2.0">
>    <given>Dave</given>
>    <family>Orchard</family>
>    <midns:middle xmlns:midns="http://www.example.org/name/mid/1">Bryce</midns:middle>
>  </personName>
> 
>  <personName xmlns="http://www.example.org/name/2" version="2.0">
>    <given>Dave</given>
>    <family>Orchard</family>
>    <middle>Bryce</middle>
>  </personName>
> 
>    The last two examples show that the middle is now a mandatory part of the
>    name. This is indicated by just the version number or a new namespace plus
>    version number.

How does the change from version "1.0" to version "2.0" indicate that
the middle is now mandatory? I don't get that at all.
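
The only reading I can come up with is that the consumer already holds
some rule keyed on the version number. A minimal sketch of that reading
follows; the REQUIRED table and the missing_children function are
hypothetical, my assumption about what a consumer would need, and are
not defined anywhere in the draft.

```python
import xml.etree.ElementTree as ET

# Hypothetical out-of-band knowledge: local names required by each version.
REQUIRED = {
    "1.0": {"given", "family"},
    "2.0": {"given", "family", "middle"},
}

def missing_children(text):
    """Return the required local names absent from a personName instance."""
    root = ET.fromstring(text)
    # Compare on local names only, ignoring whatever namespace each child is in.
    present = {child.tag.rsplit("}", 1)[-1] for child in root}
    return REQUIRED[root.get("version")] - present

V1_NO_MIDDLE = (
    '<personName xmlns="http://www.example.org/name/1" version="1.0">'
    "<given>Dave</given><family>Orchard</family></personName>"
)
V2_NO_MIDDLE = V1_NO_MIDDLE.replace('version="1.0"', 'version="2.0"')

print(missing_children(V1_NO_MIDDLE))
print(missing_children(V2_NO_MIDDLE))
```

If that is what you meant, the text should say that the rule lives in the
version 2.0 definition of the language and that the version number only
selects it.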

[...]
>    A downside with using new namespace names is that some tools, like XPath,
>    can be harder to use in the face of new namespace names. Software that
>    extracts the given and family name based upon the expanded name will often
>    break if a new namespace name is used.

Wouldn't this paragraph make more sense up in section 4.1?
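
A short illustration might help wherever the paragraph ends up. The
sketch below is mine, using Python's standard xml.etree.ElementTree; the
two namespace names come from the draft's examples, and both function
names are hypothetical.

```python
import xml.etree.ElementTree as ET

V1 = (
    '<personName xmlns="http://www.example.org/name/1">'
    "<given>Dave</given><family>Orchard</family></personName>"
)
V3 = V1.replace("http://www.example.org/name/1", "http://www.example.org/name/3")

def given_by_expanded_name(text):
    """Look up 'given' by its version 1 expanded name (namespace + local name)."""
    el = ET.fromstring(text).find("{http://www.example.org/name/1}given")
    return None if el is None else el.text

def given_by_local_name(text):
    """Look up 'given' by local name only, whatever the namespace."""
    for child in ET.fromstring(text):
        if child.tag.rsplit("}", 1)[-1] == "given":
            return child.text
    return None

print(given_by_expanded_name(V1))  # found
print(given_by_expanded_name(V3))  # not found: the namespace name changed
print(given_by_local_name(V3))     # found, at the cost of ignoring namespaces
```

The second function survives the change, but only by giving up the
uniqueness that namespaces were supposed to provide, which is the
trade-off the paragraph should spell out.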

[...]
>   5.2 Incompatible
> 
>    A version author can use new namespace names, local names, or version
>    numbers to indicate an incompatible change. An extension author may not
>    have these mechanisms available for indicating an incompatible extension.
>    A language designer that wants to allow extension authors to indicate
>    incompatible extension must provide a mechanism for indicating that
>    consumers must understand the extension.

Which consumers must understand it and why? And what if they
misunderstand it?

>    If the language designer has also
>    allowed for forwards compatibility, then the forwards compatibility rule
>    must be over-ridden
> 
>    Good Practice
> 
>    Provide Forwards Compatibility Override Rule: Languages with forwards
>    compatibility support SHOULD provide an override for indicating
>    incompatible extensions.

I'm not sure I believe this good practice. As I recall, Roy argued pretty
strongly and persuasively against it.

[...]
>    Example 7: Using SOAP Must Understand
> 
>  <soap:envelope>
>    <soap:body>
>      <personName xmlns="http://www.example.org/name/1">
>      <given>Dave</given>
>      <family>Orchard</family>
>     </personName>
>    </soap:body>
>  </soap:envelope>
> 
>  <soap:envelope>
>    <soap:header>
>    <midns:middle xmlns:midns="http://www.example.org/name/mid/1"
>                  soap:mustUnderstand="true">
>        Bryce
>    </midns:middle>
>    </soap:header>
>    <soap:body>
>      <personName xmlns="http://www.example.org/name/1">
>      <given>Dave</given>
>      <family>Orchard</family>
>     </personName>
>    </soap:body>
>  </soap:envelope>

I imagine that midns:middle header is designed to make sure that the
middle name will be understood. Is it then intentional and/or significant
that the body doesn't contain a middle?
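
To show what I understand the header to do, here is a minimal sketch of
a receiver's mustUnderstand check. It is my own illustration under two
assumptions: that the undeclared soap prefix is meant to be the SOAP 1.2
namespace, and that the element names are the capitalized Envelope,
Header and Body that SOAP actually defines. The not_understood function
is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: SOAP 1.2 envelope namespace; the draft leaves the prefix undeclared.
SOAP = "http://www.w3.org/2003/05/soap-envelope"

MESSAGE = f"""<soap:Envelope xmlns:soap="{SOAP}">
  <soap:Header>
    <midns:middle xmlns:midns="http://www.example.org/name/mid/1"
                  soap:mustUnderstand="true">Bryce</midns:middle>
  </soap:Header>
  <soap:Body>
    <personName xmlns="http://www.example.org/name/1">
      <given>Dave</given>
      <family>Orchard</family>
    </personName>
  </soap:Body>
</soap:Envelope>"""

def not_understood(text, understood):
    """Return expanded names of mustUnderstand header blocks the receiver lacks."""
    header = ET.fromstring(text).find(f"{{{SOAP}}}Header")
    if header is None:
        return []
    missing = []
    for block in header:
        flag = block.get(f"{{{SOAP}}}mustUnderstand", "false")
        if flag in ("true", "1") and block.tag not in understood:
            missing.append(block.tag)
    return missing

MIDDLE = "{http://www.example.org/name/mid/1}middle"
print(not_understood(MESSAGE, understood=set()))     # old receiver: must fault
print(not_understood(MESSAGE, understood={MIDDLE}))  # new receiver: proceeds
```

Note that the check never looks at the body, so the personName there is
identical for both receivers.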

> 6 XML Schema 1.0

I'll try to get the rest of this reviewed this weekend.

In broad strokes, I think it's good work. Editorially, there are a lot
of incomplete sentences and other issues, but I'm sure we can fix those.

The implicit focus of the document is clearly XML versioning strategies
in a W3C XML Schema-based, web-services style environment. I appreciate
that that is a large and significant environment. But it's not the only
environment and I don't think that the document is as explicit as it 
could be about its scope.

                                        Be seeing you,
                                          norm

-- 
Norman Walsh <ndw@nwalsh.com> | If you run after wit you will succeed
http://nwalsh.com/            | in catching folly.-- Montesquieu
Received on Friday, 11 May 2007 20:10:54 UTC