Re: Versioning of XML Schema and namespaces from Fraser Crichton on 2005-05-11 (xmlschema-dev@w3.org from May 2005)

From: Fraser Crichton <fraser.crichton@solnetsolutions.co.nz>
Date: Thu, 12 May 2005 11:32:01 +1200
To: Eliot Kimber <ekimber@innodata-isogen.com>
Cc: xmlschema-dev@w3.org, Dan Vint <dvint@dvint.com>, John.Hockaday@ga.gov.au
Message-ID: <428295F1.1010900@solnetsolutions.co.nz>
Hi, Eliot,

Thanks very much - that's an extremely articulate answer.

Would I be correct in assuming that visibility is the reason behind so 
many schemas using a namespace approach to versioning?

So, for example, suppose I create a messaging schema which incorporates
components from several published OASIS schemas. Without versioning info
in the namespaces of the incorporated schemas, it would be difficult for
receivers of my messages to work out which version of the public schemas
I had intended them to validate against. Does that make sense?

Cheers,

Fraser


Eliot Kimber wrote:

>
> Fraser Crichton wrote:
>
>> Hi,
>>
>> I'm very interested in the reasons behind this -
>>
>>  > Putting a version in the namespace is definitely not the right
>>  > thing to do.
>>
>> I ask because I've seen that as a possible approach to versioning 
>> (http://www.xfront.com/Versioning.pdf) and it seems a number of 
>> practitioners have adopted this e.g. the US Dept of Navy, xCIL, etc.
>
>
> Per the W3C namespace spec, a namespace identifies an abstraction, an 
> infinite set of names distinguished from all other possible names by 
> having a unique prefix (the namespace URI).
>
> Thus a namespace URI identifies an abstraction--there is no particular 
> mechanism defined within the namespace spec for defining what names 
> are actually in the namespace. That is, a namespace URI identifies an 
> unbounded set of names, that is, an infinite set.
>
> An infinite set cannot meaningfully be versioned because you cannot 
> distinguish one version from another (because you can never enumerate 
> all its members in order to prove equality or difference).
>
> This is the philosophical reason for not versioning namespaces.
>
> The practical reason derives from this idea of namespaces naming 
> unversionable abstractions:
>
> In practice, namespaces are bound to XML "applications" [I put 
> "application" in quotes because it's not a precisely-defined term and 
> to distinguish it from the narrow usage of _application_ to mean a 
> specific software program.] For example, XSLT is an XML application, 
> as are DocBook and XHTML. This binding is done in application 
> specifications.
>
> As an abstraction, the XSLT application is invariant over time: its 
> basic purpose and usage will always be what it is now, regardless of 
> the details of how it is implemented.
>
> Thus, in this use case, namespace URIs represent the abstract idea of 
> the application (that is, the concept of XSLT or DocBook or XHTML) and 
> that abstract idea cannot be versioned and doesn't change over time.
>
> That is, as long as the fundamental nature of a given application 
> doesn't change, it would be inappropriate and unnecessary to change 
> its namespace URI simply because some implementation detail of the 
> application changed.
>
> Or said another way, if you change the namespace URI, in any way, you 
> are identifying a fundamentally *different* application.
>
> Or said another way, the namespace URI names *all current and future 
> versions* of the concrete expressions of the application.
>
> What *does* change are the concrete implementation artifacts that make 
> up the application at any point in time. As concrete objects, they are 
> versionable and will likely have different versions in time. Thus it 
> is appropriate (in fact essential) that the resource locators for 
> those concrete objects reflect the versions of them, otherwise you 
> could only locate a single version of any one of them, which would be 
> very limiting in most cases (for example, if I have two versions of 
> the schema for a given application and documents that validate against 
> one version or the other).
>
> Thus, while the namespace URI for a given application should be 
> invariant, the resource URLs for the concrete implementation 
> components (schemas, transforms, java classes, documentation, etc.) 
> will be variant as new versions are created. Of course, you might also 
> offer URLs that represent the "latest" version--resources may have any 
> number of URLs associated with them. But, in the general case, there 
> should always be version-specific URLs for the resources.
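
A minimal sketch of the distinction drawn above, assuming a made-up namespace URI and made-up schema URLs (none of these names come from the thread): the namespace stays fixed while each schema version gets its own URL, with "latest" as an extra alias.

```python
# Hypothetical namespace URI: it never changes between schema versions.
NAMESPACE = "http://example.org/ns/message"

# Hypothetical version-specific resource URLs for the concrete schemas.
SCHEMA_URLS = {
    "1.0": "http://example.org/schemas/message/1.0/message.xsd",
    "1.1": "http://example.org/schemas/message/1.1/message.xsd",
}
LATEST = "1.1"  # assumption: the "latest" alias points at 1.1


def schema_url(version=None):
    """Return the version-specific schema URL, or the latest one if no version is given."""
    return SCHEMA_URLS[version or LATEST]


def schema_location(version=None):
    """Build an xsi:schemaLocation value: the namespace URI paired with a schema URL."""
    return f"{NAMESPACE} {schema_url(version)}"
```

Two documents written against different schema versions would then carry the same namespace URI and differ only in the second half of their schemaLocation value.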
>
> How can this work in practice?
>
> The best solution, in the abstract, I think, is what Mike suggests, 
> namely an attribute that specifies the schema version, which the 
> processor then uses to determine the correct schema instance to apply. 
> This suggests that it might be useful for the XSD spec (or perhaps a 
> separate, more general spec, since this requirement isn't 
> XSD-specific) to define a "schema-version" attribute that can be used 
> independently from the schemaLocation attribute.
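
To make the version-attribute idea concrete, here is a rough sketch. It assumes an un-namespaced "schema-version" attribute on the document element, which no current spec defines, and a hypothetical lookup table from (namespace, version) to a schema file.

```python
import xml.etree.ElementTree as ET

# Hypothetical table: (namespace URI, schema version) -> schema file to apply.
SCHEMAS = {
    ("http://example.org/ns/message", "1.0"): "schemas/message-1.0.xsd",
    ("http://example.org/ns/message", "1.1"): "schemas/message-1.1.xsd",
}


def select_schema(xml_text):
    """Pick a schema from the root element's namespace and its schema-version attribute."""
    root = ET.fromstring(xml_text)
    # ElementTree spells a namespaced tag as "{namespace-uri}local-name".
    namespace = root.tag[1:].split("}", 1)[0] if root.tag.startswith("{") else ""
    version = root.get("schema-version")  # assumed attribute name, not a standard one
    # None means no schema is known for this namespace/version pair.
    return SCHEMAS.get((namespace, version))
```

The processor never looks at schemaLocation here; the namespace says which application the document belongs to and the attribute says which version of its schema to apply.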
>
> But, given that current software (and certainly the Xerces processor, 
> which provides schema-awareness in many tool chains) depends primarily 
> on schemaLocation and/or catalogs, I think that a productive approach 
> would be as described below.
>
> John Hockaday writes:
>
>> If I don't already have a copy of the XSDs referred to in the XML
>> document instances then I need to download those XSDs and validate
>> them. If the XSDs are not valid then I report my findings to my
>> clients and reject the relevant XML document instances.  If the XSDs
>> are valid then I validate the XML document instances against those
>> XSDs and report my findings to my clients.  Again only valid XML
>> document instances are accepted.
>
>
>> If I do have a copy of the XSDs then I will have already validated
>> them and I hope to use OASIS Catalogue files to refer to local copies
>> of those XSDs when validating related XML document instances.  This
>> will of course reduce bandwidth, time and costs and is essential when
>> validating 40,000+ metadata records at a time.
>
>
> Here there are two key and common requirements:
>
> 1. Validate documents against whatever schema they say they conform to 
> (and, as a side effect, validate the schemas themselves).
>
> 2. Provide local copies of schemas to reduce processing time and 
> network overhead.
>
> John knows that there may be different versions of schemas for the 
> same namespace.
>
> I think the solution here is to use the catalogs as follows:
>
> 1. Require that incoming documents use absolute URIs for all 
> schemaLocation specifications (not sure if this is currently the case 
> in John's case).
>
> 2. Use the catalog to map these absolute URIs to the local copy of the 
> schema (if there is one--if there's not one, fetch it and update the 
> catalog).
>
> 3. As a fallback, map namespace URIs to schema URIs, where the 
> appropriate schema for that namespace is known.
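
The three steps above could be sketched roughly as follows. This assumes the catalog and the namespace fallback are plain dictionaries and that `fetch` is a caller-supplied function that downloads a schema and returns its local path; a real setup would read an OASIS XML Catalog file instead. All names are illustrative.

```python
from urllib.parse import urlparse


def is_absolute(uri):
    """Step 1: treat a URI as absolute only if it carries a scheme such as http."""
    return bool(urlparse(uri).scheme)


def parse_schema_location(value):
    """Split an xsi:schemaLocation value into (namespace URI, schema URI) pairs."""
    tokens = value.split()
    return list(zip(tokens[0::2], tokens[1::2]))


def resolve(namespace, location, catalog, fallback, fetch):
    """Return the local path of the schema to validate against, or None if unknown."""
    if location is None:
        # Step 3: no schemaLocation given, so fall back to the namespace mapping.
        location = fallback.get(namespace)
        if location is None:
            return None
    if not is_absolute(location):
        raise ValueError(f"schemaLocation must be an absolute URI: {location}")
    if location not in catalog:
        # Step 2: no local copy yet, so fetch one and update the catalog.
        catalog[location] = fetch(location)
    return catalog[location]
```

Because the catalog is keyed by the schema URI rather than by the namespace, two versions of a schema for the same namespace can sit side by side as separate entries.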
>
> This does require that when there are different schema versions for a 
> given namespace that documents specify the correct schemaLocation 
> value, otherwise John has no choice but to retrieve an arbitrary 
> (presumably the latest) version of the schema for that namespace.
>
> In the case where the version has been used in the namespace and there 
> is no schemaLocation, the problem is the same: either there's exactly 
> one schema for that namespace or John has to arbitrarily pick one.
>
> This all puts the onus on document authors to specify correctly which 
> version of a namespace's schema they want to use. There is no way 
> around this--it's simply an unavoidable consequence of the fact that 
> there can be different versions of a schema for a given namespace.
>
> Note too, that this basic approach can be used to prevent authors from 
> using schemaLocation= to nefarious ends where you have the requirement 
> that documents conform only to a known, and controlled, set of 
> schemas. Because you are remapping the schemaLocation URIs to local 
> files, if authors specify a schemaLocation URI that you don't 
> recognize (meaning that it's not mapped in the catalog), you can fall 
> back to pointing to some local schema that will cause the document in 
> question to fail its validation check. This is the functional 
> equivalent of ignoring schemaLocation=.
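
For the controlled case just described, a small sketch, assuming the catalog is a dictionary of approved schemaLocation URIs and that a local schema exists which no document can validate against (the path below is hypothetical):

```python
# Hypothetical local schema that is deliberately impossible to satisfy.
REJECT_ALL_SCHEMA = "/var/schemas/reject-all.xsd"


def controlled_resolve(location, catalog):
    """Map an approved schemaLocation to its local copy; send anything else to the failing schema."""
    # Unlike the open case, an unrecognized URI is never fetched from the network.
    return catalog.get(location, REJECT_ALL_SCHEMA)
```

A document pointing at an unapproved schema is then validated against the reject-all schema and fails, without the processor ever contacting the author's URI.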
>
> Cheers,
>
> Eliot
>


-- 
Fraser Crichton
XML Developer
SolNet Solutions Limited
L12, SolNet House, 70 The Terrace
PO Box 397, Wellington, Aotearoa / New Zealand
www.solnetsolutions.co.nz <http://www.solnetsolutions.co.nz>
DDI: 04-462-5078
Mob: 027-278-3392
Fax: 04-462-5011
email: fraser.crichton@solnetsolutions.co.nz 
<mailto:fraser.crichton@solnetsolutions.co.nz>

Received on Wednesday, 11 May 2005 23:32:12 UTC