Re: Versioning of XML Schema and namespaces from Kurt Riede on 2005-05-11 (xmlschema-dev@w3.org from May 2005)

From: Kurt Riede <kurt.riede@web.de>
Date: Wed, 11 May 2005 17:24:59 +0200
To: <xmlschema-dev@w3.org>
Message-ID: <000a01c5563d$98a919e0$e4aaa8c0@bea.de>
> An infinite set cannot meaningfully be versioned because you cannot
> distinguish one version from another (because you can never enumerate
> all its members in order to prove equality or difference).

Yes you can. That's more or less what Kurt Goedel did in his incompleteness
theorems.
You *can* enumerate all its members, even if the set has an infinite size.
And you *can* also check whether two such sets are different or not.

I don't see your philosophical reason for not versioning namespaces.

> This does require that when there are different schema versions for a
> given namespace that documents specify the correct schemaLocation
> value, otherwise John has no choice but to retrieve an arbitrary
> (presumably the latest) version of the schema for that namespace.

No, I think the author of the instance should declare which *version* of
the namespace the instance conforms to.
And the consumer of the instance *must know* that there are different
versions of the schema.
If the instance author provides a schemaLocation, the consumer should
*never* rely on it, because
it could be a fake to smuggle invalid documents to the consumer.
If the author of the instance doesn't provide a version, the behaviour of
the consumer is application dependent. One application might, e.g., assume
the latest version known to the consumer, another might just reject the
instance.
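
As a rough sketch of what I mean (not a definitive implementation), the
consumer could pick the schema from its own registry using only the
namespace and a declared version. The "version" attribute name, the
example.org namespace, the schema paths and the on_missing_version policy
names are all assumptions made up for this illustration:

```python
import xml.etree.ElementTree as ET

# Hypothetical registry of schemas the consumer already trusts, keyed by
# (namespace URI, version). The URIs and paths are illustrative only.
KNOWN_SCHEMAS = {
    ("http://example.org/metadata", "1.0"): "schemas/metadata-1.0.xsd",
    ("http://example.org/metadata", "1.1"): "schemas/metadata-1.1.xsd",
}


def version_key(version):
    """Order dotted versions numerically, so that 1.10 sorts after 1.9."""
    return tuple(int(part) for part in version.split("."))


def select_schema(xml_text, on_missing_version="reject"):
    """Return the trusted local schema path for an instance document.

    Any xsi:schemaLocation in the instance is deliberately never read.
    """
    root = ET.fromstring(xml_text)
    # ElementTree reports namespaced tags as "{namespace}localname".
    namespace = root.tag[1:].split("}", 1)[0] if root.tag.startswith("{") else ""
    version = root.get("version")  # assumed attribute name

    if version is None:
        # Application-dependent policy: reject, or assume the latest known.
        if on_missing_version != "latest":
            raise LookupError("instance declares no version")
        known = [v for (ns, v) in KNOWN_SCHEMAS if ns == namespace]
        if not known:
            raise LookupError("no schema known for namespace %r" % namespace)
        version = max(known, key=version_key)

    try:
        return KNOWN_SCHEMAS[(namespace, version)]
    except KeyError:
        raise LookupError("unknown version %r of %r" % (version, namespace))
```

An instance that carries a forged schemaLocation but no version is then
either rejected or validated against the consumer's own latest schema,
depending on the policy chosen.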

Cheers
Kurt

----- Original Message ----- 
From: "Eliot Kimber" <ekimber@innodata-isogen.com>
To: <xmlschema-dev@w3.org>
Cc: "Fraser Crichton" <fraser.crichton@solnetsolutions.co.nz>; "Dan Vint"
<dvint@dvint.com>; <John.Hockaday@ga.gov.au>
Sent: Wednesday, May 11, 2005 4:50 PM
Subject: Re: Versioning of XML Schema and namespaces


>
> Fraser Crichton wrote:
> > Hi,
> >
> > I'm very interested in the reasons behind this -
> >
> >  > Putting a version in the namespace is definitely not the right thing
> > to do.
> >
> > I ask because I've seen that as a possible approach to versioning
> > (http://www.xfront.com/Versioning.pdf) and it seems a number of
> > practitioners have adopted this e.g. the US Dept of Navy, xCIL, etc.
>
> Per the W3C namespace spec, a namespace identifies an abstraction, an
> infinite set of names distinguished from all other possible names by
> having a unique prefix (the namespace URI).
>
> Thus a namespace URI identifies an abstraction--there is no particular
> mechanism defined within the namespace spec for defining what names are
> actually in the namespace. That is, a namespace URI identifies an
> unbounded set of names, that is, an infinite set.
>
> An infinite set cannot meaningfully be versioned because you cannot
> distinguish one version from another (because you can never enumerate
> all its members in order to prove equality or difference).
>
> This is the philosophical reason for not versioning namespaces.
>
> The practical reason derives from this idea of namespaces naming
> unversionable abstractions:
>
> In practice, namespaces are bound to XML "applications" [I put
> "application" in quotes because it's not a precisely-defined term and to
> distinguish it from the narrow usage of _application_ to mean a specific
> software program.] For example, XSLT is an XML application, as are
> DocBook and XHTML. This binding is done in application specifications.
>
> As an abstraction, the XSLT application is invariant over time: its
> basic purpose and usage will always be what it is now, regardless of the
> details of how it is implemented.
>
> Thus, in this use case, namespace URIs represent the abstract idea of
> the application (that is, the concept of XSLT or DocBook or XHTML) and
> that abstract idea cannot be versioned and doesn't change over time.
>
> That is, as long as the fundamental nature of a given application
> doesn't change, it would be inappropriate and unnecessary to change its
> namespace URI simply because some implementation detail of the
> application changed.
>
> Or said another way, if you change the namespace URI, in any way, you
> are identifying a fundamentally *different* application.
>
> Or said another way, the namespace URI names *all current and future
> versions* of the concrete expressions of the application.
>
> What *does* change are the concrete implementation artifacts that make
> up the application at any point in time. As concrete objects, they are
> versionable and will likely have different versions in time. Thus it is
> appropriate (in fact essential) that the resource locators for those
> concrete objects reflect the versions of them, otherwise you could only
> locate a single version of any one of them, which would be very limiting
> in most cases (for example, if I have two versions of the schema for a
> given application and documents that validate against one version or the
> other).
>
> Thus, while the namespace URI for a given application should be
> invariant, the resource URLs for the concrete implementation components
> (schemas, transforms, java classes, documentation, etc.) will be variant
> as new versions are created. Of course, you might also offer URLs that
> represent the "latest" version--resources may have any number of URLs
> associated with them. But, in the general case, there should always be
> version-specific URLs for the resources.
>
> How can this work in practice?
>
> The best solution, in the abstract, I think, is what Mike suggests,
> namely an attribute that specifies the schema version, which the
> processor then uses to determine the correct schema instance to apply.
> This suggests that it might be useful for the XSD spec (or perhaps a
> separate, more general spec, since this requirement isn't XSD-specific)
> to define a "schema-version" attribute that can be used independently
> from the schemaLocation attribute.
>
> But, given that current software (and certainly the Xerces processor,
> which provides schema-awareness in many tool chains) depends primarily
> on schemaLocation and/or catalogs, I think that a productive approach
> would be as described below.
>
> John Hockaday writes:
>
> > If I don't already have a copy of the XSDs referred to in the XML
> > document instances then I need to download those XSDs and validate
> > them.
> >
> > If the XSDs are not valid then I report my findings to my clients and
> > reject the relevant XML document instances.  If the XSDs are valid then
> > I validate the XML document instances against those XSDs and report my
> > findings to my clients.  Again only valid XML document instances are
> > accepted.
>
> > If I do have a copy of the XSDs then I will have already validated them
> > and I hope to use OASIS Catalogue files to refer to local copies of
> > those XSDs when validating related XML document instances.  This will
> > of course reduce bandwidth, time and costs and is essential when
> > validating 40,000+ metadata records at a time.
>
> Here there are two key and common requirements:
>
> 1. Validate documents against whatever schema they say they conform to
> (and, as a side effect, validate the schemas themselves).
>
> 2. Provide local copies of schemas to reduce processing time and network
> overhead.
>
> John knows that there may be different versions of schemas for the same
> namespace.
>
> I think the solution here is to use the catalogs as follows:
>
> 1. Require that incoming documents use absolute URIs for all
> schemaLocation specifications (not sure if this is currently the case in
> John's case).
>
> 2. Use the catalog to map these absolute URIs to the local copy of the
> schema (if there is one--if there's not one, fetch it and update the
> catalog).
>
> 3. As a fallback, map namespace URIs to schema URIs, where the
> appropriate schema for that namespace is known.
>
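
The three catalog steps above might be sketched roughly as follows. This
is only an illustration under stated assumptions: the two catalogs are
modelled as plain dictionaries rather than OASIS catalog files, and
resolve_schema, parse_schema_location and the fetch callback are
hypothetical names, not part of any existing tool.

```python
from urllib.parse import urlparse


def parse_schema_location(value):
    """Split an xsi:schemaLocation value into {namespace: schema URI}.

    The attribute holds whitespace-separated namespace/location pairs.
    """
    tokens = value.split()
    return dict(zip(tokens[0::2], tokens[1::2]))


def resolve_schema(namespace, schema_location, uri_catalog, ns_catalog, fetch):
    """Return the local schema copy to validate `namespace` against.

    uri_catalog maps absolute schema URIs to local copies (step 2),
    ns_catalog maps namespace URIs to local copies (step 3), and
    fetch(uri) is a caller-supplied download function returning a local
    path. Returns None if nothing is known for the namespace.
    """
    uri = parse_schema_location(schema_location or "").get(namespace)
    if uri is not None:
        # Step 1: only absolute URIs are accepted.
        if not urlparse(uri).scheme:
            raise ValueError("schemaLocation must be an absolute URI: %r" % uri)
        # Step 2: use the local copy, fetching and recording it if missing.
        if uri not in uri_catalog:
            uri_catalog[uri] = fetch(uri)
        return uri_catalog[uri]
    # Step 3: no schemaLocation for this namespace, fall back to the
    # namespace mapping.
    return ns_catalog.get(namespace)
```

Passing fetch in as a parameter keeps the download policy (and any
validation of the fetched schema itself) outside the lookup logic.
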
> This does require that when there are different schema versions for a
> given namespace that documents specify the correct schemaLocation
> value, otherwise John has no choice but to retrieve an arbitrary
> (presumably the latest) version of the schema for that namespace.
>
> In the case where the version has been used in the namespace and there
> is no schemaLocation, the problem is the same: either there's exactly
> one schema for that namespace or John has to arbitrarily pick one.
>
> This all puts the onus on document authors to specify correctly which
> version of a namespace's schema they want to use. There is no way around
> this--it's simply an unavoidable consequence of the fact that there can
> be different versions of a schema for a given namespace.
>
> Note too, that this basic approach can be used to prevent authors from
> using schemaLocation= to nefarious ends where you have the requirement
> that documents conform only to a known, and controlled, set of schemas.
> Because you are remapping the schemaLocation URIs to local files, if
> authors specify a schemaLocation URI that you don't recognize (meaning
> that it's not mapped in the catalog), you can fall back to pointing to
> some local schema that will cause the document in question to fail its
> validation check. This is the functional equivalent of ignoring
> schemaLocation=.
>
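
For the controlled case described above, where only a known set of
schemas is acceptable, the remapping could look something like this
minimal sketch. The reject-all schema path and the function name are
hypothetical; the idea is only that an unmapped schemaLocation URI
resolves to a local schema that no document can validate against.

```python
# Hypothetical local schema that is written so that every document fails
# validation against it.
REJECT_ALL_SCHEMA = "schemas/reject-all.xsd"


def resolve_strict(schema_location_uri, uri_catalog):
    """Map a schemaLocation URI to a local schema, never fetching.

    URIs that are not in the catalog resolve to the reject-all schema,
    so an author-supplied location can never introduce a new schema.
    """
    return uri_catalog.get(schema_location_uri, REJECT_ALL_SCHEMA)
```
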
> Cheers,
>
> Eliot
>
> -- 
> W. Eliot Kimber
> Professional Services
> Innodata Isogen
> 9390 Research Blvd, #410
> Austin, TX 78759
> (512) 372-8155
>
> ekimber@innodata-isogen.com
> www.innodata-isogen.com
>
>
Received on Friday, 13 May 2005 06:02:59 UTC