- From: Eliot Kimber <ekimber@innodata-isogen.com>
- Date: Wed, 11 May 2005 09:50:42 -0500
- To: xmlschema-dev@w3.org
- CC: Fraser Crichton <fraser.crichton@solnetsolutions.co.nz>, Dan Vint <dvint@dvint.com>, John.Hockaday@ga.gov.au
Fraser Crichton wrote:

> Hi,
>
> I'm very interested in the reasons behind this -
>
> > Putting a version in the namespace is definitely not the right thing to do.
>
> I ask because I've seen that as a possible approach to versioning (http://www.xfront.com/Versioning.pdf) and it seems a number of practitioners have adopted this e.g. the US Dept of Navy, xCIL, etc.

Per the W3C namespace spec, a namespace identifies an abstraction, an infinite set of names distinguished from all other possible names by having a unique prefix (the namespace URI). Thus a namespace URI identifies an abstraction--there is no particular mechanism defined within the namespace spec for defining what names are actually in the namespace. That is, a namespace URI identifies an unbounded (infinite) set of names. An infinite set cannot meaningfully be versioned because you cannot distinguish one version from another (because you can never enumerate all its members in order to prove equality or difference).

This is the philosophical reason for not versioning namespaces.

The practical reason derives from this idea of namespaces naming unversionable abstractions:

In practice, namespaces are bound to XML "applications". [I put "application" in quotes because it's not a precisely-defined term and to distinguish it from the narrow usage of _application_ to mean a specific software program.] For example, XSLT is an XML application, as are DocBook and XHTML. This binding is done in application specifications.

As an abstraction, the XSLT application is invariant over time: its basic purpose and usage will always be what it is now, regardless of the details of how it is implemented. Thus, in this use case, namespace URIs represent the abstract idea of the application (that is, the concept of XSLT or DocBook or XHTML) and that abstract idea cannot be versioned and doesn't change over time.
That is, as long as the fundamental nature of a given application doesn't change, it would be inappropriate and unnecessary to change its namespace URI simply because some implementation detail of the application changed. Or said another way, if you change the namespace URI, in any way, you are identifying a fundamentally *different* application. Or said another way, the namespace URI names *all current and future versions* of the concrete expressions of the application.

What *does* change are the concrete implementation artifacts that make up the application at any point in time. As concrete objects, they are versionable and will likely have different versions over time. Thus it is appropriate (in fact essential) that the resource locators for those concrete objects reflect their versions, otherwise you could only locate a single version of any one of them, which would be very limiting in most cases (for example, if I have two versions of the schema for a given application and documents that validate against one version or the other).

Thus, while the namespace URI for a given application should be invariant, the resource URLs for the concrete implementation components (schemas, transforms, java classes, documentation, etc.) will vary as new versions are created.

Of course, you might also offer URLs that represent the "latest" version--resources may have any number of URLs associated with them. But, in the general case, there should always be version-specific URLs for the resources.

How can this work in practice? The best solution, in the abstract, I think, is what Mike suggests, namely an attribute that specifies the schema version, which the processor then uses to determine the correct schema instance to apply. This suggests that it might be useful for the XSD spec (or perhaps a separate, more general spec, since this requirement isn't XSD-specific) to define a "schema-version" attribute that can be used independently from the schemaLocation attribute.
But, given that current software (and certainly the Xerces processor, which provides schema-awareness in many tool chains) depends primarily on schemaLocation and/or catalogs, I think that a productive approach would be as described below.

John Hockaday writes:

> If I don't already have a copy of the XSDs referred to in the XML document instances then I need to download those XSDs and validate them.
>
> If the XSDs are not valid then I report my findings to my clients and reject the relevant XML document instances. If the XSDs are valid then I validate the XML document instances against those XSDs and report my findings to my clients. Again only valid XML document instances are accepted. If I do have a copy of the XSDs then I will have already validated them and I hope to use OASIS Catalogue files to refer to local copies of those XSDs when validating related XML document instances. This will of course reduce bandwidth, time and costs and is essential when validating 40,000+ metadata records at a time.

Here there are two key and common requirements:

1. Validate documents against whatever schema they say they conform to (and, as a side effect, validate the schemas themselves).

2. Provide local copies of schemas to reduce processing time and network overhead.

John knows that there may be different versions of schemas for the same namespace.

I think the solution here is to use the catalogs as follows:

1. Require that incoming documents use absolute URIs for all schemaLocation specifications (I'm not sure whether this is currently true in John's case).

2. Use the catalog to map these absolute URIs to the local copy of the schema (if there is one--if there's not one, fetch it and update the catalog).

3. As a fallback, map namespace URIs to schema URIs, when the appropriate schema for that namespace is known.
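
The three catalog steps above can be sketched roughly as follows. This is only an illustration in Python, not how Xerces or any OASIS catalog resolver actually works: the `SchemaCatalog` class, its field names, the `fetch` callback, and the example.org URIs are all hypothetical, and the choice to fall through to the namespace mapping when a schemaLocation is unmapped and cannot be fetched is an assumption on my part.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class SchemaCatalog:
    """Hypothetical in-memory stand-in for an OASIS catalog."""

    # Absolute schemaLocation URI -> local copy of that schema version (step 2).
    by_location: dict = field(default_factory=dict)
    # Namespace URI -> local schema, used only as a fallback (step 3).
    by_namespace: dict = field(default_factory=dict)

    def resolve(self, namespace, schema_location=None, fetch=None):
        """Return a local schema path for a document, or None if unknown.

        `fetch` is an optional callable (an assumption, not a real API) that
        downloads a schema and returns the path of the new local copy.
        """
        if schema_location is not None:
            # Step 1: insist on absolute URIs for schemaLocation.
            if not urlparse(schema_location).scheme:
                raise ValueError(
                    f"schemaLocation must be an absolute URI: {schema_location!r}"
                )
            # Step 2: map the absolute URI to a local copy, fetching the
            # schema and updating the catalog if there is no copy yet.
            local = self.by_location.get(schema_location)
            if local is None and fetch is not None:
                local = fetch(schema_location)
                self.by_location[schema_location] = local
            if local is not None:
                return local
        # Step 3: fall back to the schema known for the namespace, if any.
        return self.by_namespace.get(namespace)


# Example with made-up URIs: one invariant namespace, version-specific schema URLs.
catalog = SchemaCatalog(
    by_location={"http://example.org/schemas/1.0/doc.xsd": "/local/doc-1.0.xsd"},
    by_namespace={"http://example.org/ns/doc": "/local/doc-2.0.xsd"},
)
```

Note that the namespace URI stays the same for every version in this sketch; only the schemaLocation keys carry version information.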
This approach does require that, when there are different schema versions for a given namespace, documents specify the correct schemaLocation value; otherwise John has no choice but to retrieve an arbitrary (presumably the latest) version of the schema for that namespace. In the case where the version has been used in the namespace and there is no schemaLocation, the problem is the same: either there's exactly one schema for that namespace or John has to arbitrarily pick one.

This all puts the onus on document authors to specify correctly which version of a namespace's schema they want to use. There is no way around this--it's simply an unavoidable consequence of the fact that there can be different versions of a schema for a given namespace.

Note too that this basic approach can be used to prevent authors from using schemaLocation= to nefarious ends where you have the requirement that documents conform only to a known, and controlled, set of schemas. Because you are remapping the schemaLocation URIs to local files, if authors specify a schemaLocation URI that you don't recognize (meaning that it's not mapped in the catalog), you can fall back to pointing to some local schema that will cause the document in question to fail its validation check. This is the functional equivalent of ignoring schemaLocation=.

Cheers,

Eliot
--
W. Eliot Kimber
Professional Services
Innodata Isogen
9390 Research Blvd, #410
Austin, TX 78759
(512) 372-8155
ekimber@innodata-isogen.com
www.innodata-isogen.com
Received on Wednesday, 11 May 2005 14:50:02 UTC