Two versions cannot be both back- and forwards compatible: Comment on versioning finding from Marc de Graauw on 2006-08-28 (www-tag@w3.org from August 2006)

From: Marc de Graauw <marc@marcdegraauw.com>
Date: Mon, 28 Aug 2006 18:23:17 +0200
To: "'David Orchard'" <dorchard@bea.com>, <www-tag@w3.org>
Message-ID: <002b01c6cabe$44e01150$fd00a8c0@MARCNOTE>
Hi David,

A comment on the 26 July 2006 "Extending and Versioning Languages Part 1"
document.

I agree with your principles on extending languages, but I think there are
problems with the definitions you use. In particular, I do not think two
versions of a language can be both forwards and backwards compatible with
each other in any meaningful way.

If we define X as the set of all well-formed XML instances, and XL as the
subset of X whose members are valid instances of a language L, then it
follows from the FOLDOC definitions as well as from your own definitions
that:

1) a language V2 is backwards compatible with V1 iff XV1 is a subset of XV2
(i.e. V2 processors can process all V1 instances)
2) a language V2 is forwards compatible with V1 iff XV2 is a subset of XV1
(i.e. V1 processors can process all V2 instances)

From 1) and 2) it follows that:

3) if language V2 is both backwards and forwards compatible with V1 then XV1
is a subset of XV2 and XV2 is a subset of XV1, therefore XV1 = XV2

This does not mean V1 = V2, since V1 might be expressed in XML Schema and V2
in RelaxNG, or the semantic interpretation of instances might have changed,
but a new version which validates exactly the same set of instances is
uncommon to say the least. (This argument looks only at languages which
define a single XML document, not parts or collections of documents, but I
think the same would apply there.)
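
The step from 1) and 2) to 3) is just antisymmetry of the subset relation. A
minimal Python sketch, assuming instances can be stood in for by arbitrary
labels ("a", "b", "c" are hypothetical, as are the function names), checks
every pair of instance sets over a small universe and finds no pair that is
compatible in both directions without being equal:

```python
from itertools import combinations

def backwards_compatible(x_v1, x_v2):
    """V2 is backwards compatible with V1 iff XV1 is a subset of XV2."""
    return x_v1 <= x_v2

def forwards_compatible(x_v1, x_v2):
    """V2 is forwards compatible with V1 iff XV2 is a subset of XV1."""
    return x_v2 <= x_v1

def subsets(universe):
    """All subsets of a small universe, as frozensets."""
    items = sorted(universe)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]

# Hypothetical labels standing in for well-formed XML instances.
universe = {"a", "b", "c"}

# Pairs that are compatible both ways yet validate different instance sets.
counterexamples = [
    (x_v1, x_v2)
    for x_v1 in subsets(universe)
    for x_v2 in subsets(universe)
    if backwards_compatible(x_v1, x_v2)
    and forwards_compatible(x_v1, x_v2)
    and x_v1 != x_v2
]

print(counterexamples)  # []
```

The exhaustive check only illustrates the point on a toy universe; the
general claim is the set-theoretic one stated in 3).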

At first this argument puzzled me a lot since it seems to contradict your
principle:

4) Language V2 is (backwards and forwards) compatible with Language V1 if
Language V1 Syntax > Language V2 Syntax > Language V2 Information Set >
Language V1 Information set.

and your line of reasoning seems valid as well.

The difference between the two conclusions is caused by a hidden assumption
in your argument: namely that version 1 producers will not use the wildcard,
which in turn means producers and consumers use a different language. In an
example, if I have a version 1 Schema:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://www.example.org/name/1"
    xmlns:name="http://www.example.org/name/1"
    elementFormDefault="qualified">
    <xs:element name="name" type="name:nameType"/>
    <xs:complexType name="nameType">
        <xs:sequence>
            <xs:element name="given" type="xs:string"/>
            <xs:element name="family" type="xs:string"/>
            <xs:any namespace="##targetNamespace" processContents="lax"
                minOccurs="0" maxOccurs="unbounded"/>
        </xs:sequence>
        <xs:anyAttribute/>
    </xs:complexType>
</xs:schema>

the next three instances will validate against it:

<name xmlns="http://www.example.org/name/1">
    <given>Dave</given>
    <family>Orchard</family>
</name>

and 

<name xmlns="http://www.example.org/name/1">
    <given>Dave</given>
    <family>Orchard</family>
    <middle>Bryce</middle>
</name>

and

<name xmlns="http://www.example.org/name/1">
    <given>Dave</given>
    <family>Orchard</family>
    <nonsense>Blah blah blah</nonsense>
</name>

Version 1 consumers are expected to accept all three.

When I release version 2 of the language with the following schema:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://www.example.org/name/1"
    xmlns:name="http://www.example.org/name/1"
    elementFormDefault="qualified">
    <xs:element name="name" type="name:nameType"/>
    <xs:complexType name="nameType">
        <xs:sequence>
            <xs:element name="given" type="xs:string"/>
            <xs:element name="family" type="xs:string"/>
            <xs:element name="middle" type="xs:string" minOccurs="0"/>
        </xs:sequence>
        <xs:anyAttribute/>
    </xs:complexType>
</xs:schema>

the first two instances will validate against it, but the third will not, so
version 2 consumers are expected to accept only the first two instances. (I
have left the wildcard out of version 2 to avoid determinism problems, but
including one properly would not change the argument.)
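
As a rough check of which instances each version accepts, here is a Python
sketch using only the standard library. It is not an XSD validator: the two
functions hand-code a simplified reading of the content models above (they
ignore attributes, text content and lax validation of wildcard matches), and
all function names are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = "http://www.example.org/name/1"

def instance(extra=""):
    """Build a name instance, optionally with one extra child element."""
    return (f'<name xmlns="{NS}"><given>Dave</given>'
            f'<family>Orchard</family>{extra}</name>')

# The three instances from the text, keyed by a short label.
INSTANCES = {
    "plain": instance(),
    "middle": instance("<middle>Bryce</middle>"),
    "nonsense": instance("<nonsense>Blah blah blah</nonsense>"),
}

def child_names(xml_text):
    """Local names of the root's children, or None if the root is not
    name or a child lies outside the target namespace."""
    root = ET.fromstring(xml_text)
    prefix = "{" + NS + "}"
    if root.tag != prefix + "name":
        return None
    names = []
    for child in root:
        if not child.tag.startswith(prefix):
            return None
        names.append(child.tag[len(prefix):])
    return names

def accepts_v1(xml_text):
    """Simplified V1 model: given, family, then any target-namespace elements."""
    names = child_names(xml_text)
    return names is not None and names[:2] == ["given", "family"]

def accepts_v2(xml_text):
    """Simplified V2 model: given, family, then an optional middle."""
    return child_names(xml_text) in (
        ["given", "family"],
        ["given", "family", "middle"],
    )

for label, xml_text in INSTANCES.items():
    print(label, accepts_v1(xml_text), accepts_v2(xml_text))
```

Under these simplifications the version 1 check accepts all three instances,
while the version 2 check rejects only the "nonsense" one.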

The problem is that version 1 producers are not expected to produce the
latter two instances, only instances like the first. Effectively, version 1
producers are not allowed to produce instances using the version 1 schema
that consumers use; they are expected to produce only instances which
validate against this schema, which lacks the wildcard:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://www.example.org/name/1"
    xmlns:name="http://www.example.org/name/1"
    elementFormDefault="qualified">
    <xs:element name="name" type="name:nameType"/>
    <xs:complexType name="nameType">
        <xs:sequence>
            <xs:element name="given" type="xs:string"/>
            <xs:element name="family" type="xs:string"/>
        </xs:sequence>
        <xs:anyAttribute/>
    </xs:complexType>
</xs:schema>

Version 1 producers must check their outgoing messages against a different
schema from the one version 1 consumers use to check incoming messages, so
version 1 producers effectively use a different language from version 1
consumers. If version 1 producers were allowed to produce the "nonsense"
instance, backwards compatibility would be lost, since version 2 consumers
cannot handle that instance and thus could not process all instances that
version 1 consumers could process (or version 1 producers could produce).

So with your approach there are effectively four languages involved in a
single new release: P1 for version 1 producers, C1 for version 1 consumers,
P2 for version 2 producers and C2 for version 2 consumers. If we follow
your recommendations, the following holds:

5) XP1 is a subset of XC1 (version 1 consumers must accept more than version
1 producers may produce)
6) XP2 is a subset of XC2 (ditto for version 2)
7) XP1 is a subset of XC2 (C2 is backwards compatible with P1, i.e. C2
consumers must accept all instances produced by P1 producers)
8) XP2 is a subset of XC1 (P2 is forwards compatible with C1, i.e. C1
consumers must accept all instances produced by P2 producers)

So the pair of languages P1 and C1 is in a certain way backwards and
forwards compatible with the pair of languages P2 and C2, given proper
usage, but there is no single pair of languages L1 and L2 which are both
backwards and forwards compatible with each other.
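
The four subset relations can be checked on the name example with a short
Python sketch. The labels stand for the three instances shown earlier; taking
XP2 equal to XC2 is my assumption here, made because the version 2 schema as
written has no wildcard.

```python
# Instance sets for the name example; labels stand for the three instances.
X_P1 = {"plain"}                        # V1 producers: schema without wildcard
X_C1 = {"plain", "middle", "nonsense"}  # V1 consumers: schema with wildcard
X_P2 = {"plain", "middle"}              # V2 producers (assumed equal to X_C2)
X_C2 = {"plain", "middle"}              # V2 consumers: schema with middle

# The four relations between producer and consumer languages.
relations = {
    "XP1 subset of XC1": X_P1 <= X_C1,
    "XP2 subset of XC2": X_P2 <= X_C2,
    "XP1 subset of XC2": X_P1 <= X_C2,
    "XP2 subset of XC1": X_P2 <= X_C1,
}

for claim, holds in relations.items():
    print(claim, holds)

# Treating each version as ONE language (the consumer schemas) fails:
# mutual compatibility would need both subset relations to hold.
single_pair_compatible = X_C1 <= X_C2 and X_C2 <= X_C1
print(single_pair_compatible)  # False
```

All four relations hold for the producer and consumer pairs, while the two
consumer languages taken on their own are forwards but not backwards
compatible, because the "nonsense" instance is in XC1 but not in XC2.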

Regards,

Marc de Graauw
Received on Monday, 28 August 2006 16:23:16 UTC