Re: (Partial) review of Versioning XML from Norman Walsh on 2007-05-13 (www-tag@w3.org from May 2007)

From: Norman Walsh <ndw@nwalsh.com>
Date: Sun, 13 May 2007 17:05:31 -0400
To: www-tag@w3.org
Message-ID: <87mz084ep0.fsf@nwalsh.com>

Here's the rest of my review.

> 7 Schemas for Version Identification Strategies
>
>   7.1 #1: all components in new namespace(s) for each version
[...]
>    The author has 5 options for the v2 schema for name and middle, listed
>    below and detailed subsequently:
>
>     1. optional middle, extensibility retained, but new name type does not
>        refer to middle;
>
>     2. optional middle, extensibility is lost, new name type refers to
>        middle;
>
>     3. required middle, extensibility retained, new name type refers to
>        middle but compatibility is lost (essentially strategy #1);
>
>     4. optional middle, extensibility retained, no new name type
>
>     5. no update to the Schema

I think it would be good to provide a little bit more explanation of
these five options. I'm not confident I understand all of the
distinctions from just these one-line descriptions.

[...]
>     7.2.1 Redefine
>
>    Redefine allows incompatible and incompatible changes to be made to a
>    type. This can be very dangerous because a document cannot use namespaces
>    or names to indicate which type is being used, either the original or the
>    redefined.

I think this would be better stated as:

     Redefine allows both compatible and incompatible changes to be
     made to a type. Unlike other schema extension mechanisms which
     provide new names for extended or restricted types, redefine
     changes the definition of a type without changing its name. This
     means that the name alone is no longer sufficient to determine if
     two types are really the same.
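
To make that concrete, here is a rough sketch of what I have in mind.
The file name name-v1.xsd and the added middle element are my own
illustration, not something taken from the draft:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://www.example.org/name/1"
           xmlns:namens="http://www.example.org/name/1">

  <!-- name-v1.xsd is a hypothetical file holding the original nameType -->
  <xs:redefine schemaLocation="name-v1.xsd">
    <!-- Same name as the original type; the base must be the type itself -->
    <xs:complexType name="nameType">
      <xs:complexContent>
        <xs:extension base="namens:nameType">
          <xs:sequence>
            <xs:element name="middle" type="xs:string" minOccurs="0"/>
          </xs:sequence>
        </xs:extension>
      </xs:complexContent>
    </xs:complexType>
  </xs:redefine>
</xs:schema>
```

After the redefine, namens:nameType still has the same name and
namespace, so nothing in an instance document says which of the two
definitions it was written against.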

>    The schema author must take extreme caution to ensure that
>    compatible changs are made.

I think "very dangerous" and "extreme caution" are perhaps a little
too strong.

[...]
>   7.3 #3: All new components in existing or new namespace(s) for each compatible
>   version
>
>    It is possible to create Schemas with additional optional components. This
>    requires re-using the namespace name for optional components and special
>    schema design techniques. The re-using namespace rule is:
>
>    Good Practice
>
>    Re-use namespace names Rule: If a backwards compatible change can be made
>    to a specification, then the old namespace name SHOULD be used in
>    conjunction with XML's extensibility model.
>
>    It is important to note that that a new namespace name is not required
>    whenever a specification evolves - strategies #1 and #2 - but rather a new
>    namespace name can be required only if an incompatible change is made.

"Required" here is a policy decision, yes? I could make an incompatible
change without changing the namespace, but you're asserting that I
should not do that, right?

>    Strategy #1 uses a new namespace for all existing components and any
>    additions, Strategy #2 uses a new namespace for all additions. Strategy #3
>    re-uses namespaces for compatible extensions.
>
>    Good Practice
>
>    New namespaces to break Rule: A new namespace name is used when backwards
>    compatibility is not permitted, that is software MUST break if it does not
>    understand the new language components.

I think it might be clearer to say "MUST reject" rather than "MUST
break". And do we really mean MUST? Certainly in some contexts it might
be necessary to abort, but I'm not sure that's true in all contexts.

[...]
>    Example 19: New components in existing or new namespace(s) with version
>    identifier instances
>
>  <personName xmlns="http://www.example.org/name/1" version="1.0">
>    <given>Dave</given>
>    <family>Orchard</family>
>  </personName>
>
>  <personName xmlns="http://www.example.org/name/1" version="1.0">
>    <given>Dave</given>
>    <family>Orchard</family>
>    <middle>Bryce</middle>
>  </personName>
>
>  <personName xmlns="http://www.example.org/name/1" version="1.1">
>    <given>Dave</given>
>    <family>Orchard</family>
>    <pref1:middle xmlns:mid1="http://www.example.org/name/mid/1">Bryce</pref1:middle>
>  </personName>
>
>  <personName xmlns="http://www.example.org/name/1" version="1.0">
>    <given>Dave</given>
>    <family>Orchard</family>
>    <pref2:middle xmlns:mid2="http://www.example.org/name/mid/1">Bryce</pref2:middle>
>  </personName>
>
>  <personName xmlns="http://www.example.org/name/1" version="2.0">
>    <given>Dave</given>
>    <family>Orchard</family>
>    <pref1:middle xmlns:mid1="http://www.example.org/name/mid/1">Bryce</pref1:middle>
>  </personName>
>
>    The last example shows that the middle is now a mandatory part of the
>    name.

How does it show that?

>    As with Design #2, the schema for the optional middle cannot fully
>    express the content model. A schema for the mandatory middle is
>
>    Example 20: New components in existing or new namespace(s) with version
>    identifier schema v2, incompatible change
>
>  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
>        targetNamespace="http://www.example.org/name/1"
>        xmlns:namens="http://www.example.org/name/1"
>        xmlns:midns="http://www.example.org/name/mid/1">
>
>    <xs:complexType name="nameType">
>      <xs:sequence>
>        <xs:element name="given" type="xs:string"/>
>        <xs:element name="family" type="xs:string"/>
>        <xs:element name="middle" type="xs:string" minOccurs="0"/>
>        <xs:element ref="midns:middle"/>
>        <xs:any namespace="##other" processContents="lax"
>                minOccurs="0" maxOccurs="unbounded"/>
>      </xs:sequence>
>      <xs:anyAttribute/>
>    </xs:complexType>
>
>    <xs:element name="personName" type="namens:nameType"/>
>  </xs:schema>

Shouldn't the version number be in the schema somewhere?
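
For instance, something along these lines might do it. This is only a
sketch: the fixed value "2.0" and the choice to make the attribute
required are my assumptions, based on the last instance in Example 19:

```xml
<xs:complexType name="nameType">
  <xs:sequence>
    <!-- content model as in Example 20 -->
    <xs:element name="given" type="xs:string"/>
    <xs:element name="family" type="xs:string"/>
  </xs:sequence>
  <!-- Assumed: this schema describes version 2.0 and only version 2.0 -->
  <xs:attribute name="version" type="xs:string"
                fixed="2.0" use="required"/>
  <xs:anyAttribute/>
</xs:complexType>
```

Without a declaration of this sort, the schema in Example 20 says
nothing about which value of version it corresponds to.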

[...]
>    This is not a very helpful XML Schema change. The problem is that they
>    cannot insert the reference to the optional midns:middle element in the
>    name schema and retain the extensibility point because of the
>    aforementioned Non-Determinism Constraint.

I think it would be more helpful to refer to the determinism
constraint using the Schema terminology throughout. Yes, "unique
particle attribution" constraint is a jargony mouthful, but using other
terminology just raises the possibility of confusion.
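
It might also help to show the offending content model next to the
term. As I read the draft, the case being described is roughly the
following (my reconstruction, reusing the prefixes from Example 20):

```xml
<xs:sequence>
  <xs:element name="given" type="xs:string"/>
  <xs:element name="family" type="xs:string"/>
  <!-- optional, and in a namespace other than the target namespace -->
  <xs:element ref="midns:middle" minOccurs="0"/>
  <!-- ##other also accepts midns:middle, so a validator cannot attribute
       a midns:middle in the instance to a unique particle -->
  <xs:any namespace="##other" processContents="lax"
          minOccurs="0" maxOccurs="unbounded"/>
</xs:sequence>
```

A midns:middle element following family could match either the element
reference or the wildcard, which is exactly what the unique particle
attribution constraint forbids.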

[...]
>   9.1 DocBook
>
>             Requirement
>             Schema Lang          RelaxNG
>     3rd party compatibly extend  Yes
>    3rd party incompatibly extend No

I think this should be Yes: third parties can make incompatible changes.

>    Designer incompatibly extend  Yes
>             stand-alone          Yes
>            Schema design         Wildcards

There are wildcards, but wildcards and other XML Schema design
mechanisms don't translate to RELAX NG in a very direct way. Changing
RELAX NG patterns is perhaps more like redefine.
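
As a sketch of what I mean, a customization layer can replace a pattern
when it includes the base grammar. The file name base.rng and the
pattern name name.content below are hypothetical, not actual DocBook
names:

```xml
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         ns="http://www.example.org/name/1">
  <!-- base.rng and the pattern name name.content are hypothetical -->
  <include href="base.rng">
    <!-- Replaces the definition of name.content found in base.rng -->
    <define name="name.content">
      <element name="given"><text/></element>
      <element name="family"><text/></element>
      <optional>
        <element name="middle"><text/></element>
      </optional>
    </define>
  </include>
</grammar>
```

As with redefine, the pattern keeps its name while its definition
changes, and the change can be compatible or incompatible.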

>       Substitution Mechanism     Ignore Uknowns

It's only "ignore unknowns" in a few specific contexts. Mostly, you're expected
to understand all the extensions.

>                                  Strategy #5 all components in existing
>      Component Identification    namespace(s) for each version (compatible
>                                  and incompatible) and a version identifier
>          Incompatible Ext        No
>           identification
>         Schema Completeness      N/A

[...]
> 10 Determinism
>
>    This Finding has spent considerable material describing deterministic
>    content models, and so it is worthy of describing the W3C XML Schema

s/worthy of describing/worth describing/

>    determinism rules in more detail. The reader is reminded that these rules
>    are unique to W3C XML Schema and other XML Schema languages like RELAX NG

s/and other/and that other/

>    do not use these rules and so do not suffer from the contortions one is
>    forced through when using W3C XML Schema. XML DTDs and W3C XML Schema have
>    a rule that requires schemas to have deterministic content models. From
>    the XML 1.0 specification,
>
>    "For example, the content model ((b, c) | (b, d)) is non-deterministic,
>    because given an initial b the XML processor cannot know which b in the
>    model is being matched without looking ahead to see which element follows
>    the b."
>
>    The use of ##any means there are some schemas that we might like to
>    express, but that aren't allowed.
>
>      * Wildcards with ##any, where minOccurs does not equal maxOccurs, are
>        not allowed before an element declaration. An instance of the element
>        would be valid for the ##any or the element. ##other could be used.
>
>      * The element before a wildcard with ##any must have cardinality of
>        maxOccurs equals its minOccurs. If these were different, say
>        minOccurs="1" and maxOccurs="2", then the optional occurrences could
>        match either the element definition or the ##any. As a result of this
>        rule, the minOccurs must be greater than zero.
>
>      * Derived types that add element definitions after a wildcard with ##any
>        must be avoided. A derived type might add an element definition after
>        the wildcard, then an instance of the added element definition could
>        match either the wildcard or the derived element definition.
>
>    Good Practice
>
>    Be Deterministic rule: Use of wildcards MUST be deterministic. Location of
>    wildcards, namespace of wildcard extensions, minOccurs and maxOccurs
>    values are constrained, and type restriction is controlled.

Well, if you're using XSD... :-)

> 11 Other technologies

Yes, we really do need to cover these in more detail.

                                        Be seeing you,
                                          norm

-- 
Norman Walsh <ndw@nwalsh.com> | Everything should be made as simple as
http://nwalsh.com/            | possible, but no simpler.
Received on Sunday, 13 May 2007 21:05:43 UTC