Re: Versioning of XML Schema and namespaces

John.Hockaday@ga.gov.au wrote:
> I expect that document instances using W3C XML Schemas will use a "namespace"
> declaration to identify which XML Schema should be used to validate that
> document instance.  The problem that I see with the namespace is that a URI
> is the unique identifier.  There is no PUBLIC identifier.  As we have all
> probably experienced with old bookmarks, the content at URLs changes a lot.
> If an XML Schema's version is not part of the URI and a new version of that
> XML Schema is made then it is likely that this will *not* be reflected in the
> URI and hence the namespace.

I think you're confusing two similar but distinct functions here: 
namespaces and schema locations. DTD declarations with external 
identifiers are equivalent to schemaLocation specifications, not 
namespace declarations--SGML (and XML without namespaces) had no 
equivalent of namespaces [except for HyTime architectures].

For DTDs, which are a syntactic part of the document that references 
them, the PUBLIC identifier is nothing more than an alias for the 
external DTD subset's storage location (i.e., its filename). Thus it is 
completely appropriate that it include a version identifier since if the 
external declaration subset is changed it's a new object and should be 
identified as such.

By contrast, a document's namespace *does not* directly identify a 
schema. It identifies (or rather, can be exclusively associated with) an 
(abstract) "application" that might have any number of schemas 
associated with it. That is, for a given application, with a single 
associated namespace, there might be different schemas *at the same 
time*, reflecting different profiles or uses of the application, or 
there might be different schema versions over time, reflecting changes 
to the details of the namespace. But the namespace itself is 
unchanged because the namespace identifies the application independent 
of its various implementations over time. [For example, the XSLT 
namespace is invariant across versions of the XSLT spec because, as an 
abstract application, XSLT is XSLT regardless of the currently-defined 
details of it.]

In DTD-based documents that use external declaration subsets you always 
have to have an external identifier for the subset, so you always have 
something you could resolve or use in a catalog.

For non-DTD-based documents, there are two possible cases (assuming 
namespaced documents--the no-namespace case is degenerate and not worth 
considering because it allows no good general solution):

1. The document uses the schemaLocation= hint to say which specific 
schema it wants you to use.

2. The document specifies only a namespace.
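
To make the two cases concrete, here is a minimal Python sketch that tells them apart using only the standard library. It is an illustration, not a complete processor: it assumes the hint sits on the root element (xsi:schemaLocation may actually appear on any element), and the namespace and URLs in the usage are made up.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

def classify(xml_text):
    """Return ("schemaLocation", {ns: location}) or ("namespace-only", ns)."""
    root = ET.fromstring(xml_text)
    # ElementTree reports namespaced tags as "{uri}local".
    ns = root.tag[1:].split("}", 1)[0] if root.tag.startswith("{") else None
    hint = root.get(f"{{{XSI}}}schemaLocation")
    if hint:
        tokens = hint.split()
        # schemaLocation is a whitespace-separated list of (namespace, location) pairs.
        return "schemaLocation", dict(zip(tokens[0::2], tokens[1::2]))
    # Case 2: only the namespace is known; choosing a schema is up to the processor.
    return "namespace-only", ns
```

For example, a root element carrying `xsi:schemaLocation="urn:example:app http://example.org/schemas/app/1.2/app.xsd"` falls into case 1, while a bare `<r xmlns="urn:example:app"/>` falls into case 2.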

In the first case, the schema location can be either a local, relative URI 
or an absolute URI. In the case of the absolute URI, the URI 
functions essentially as a PUBLIC ID does: in practice it demands 
mapping to a local resource via some sort of catalog method (for the 
simple practical reasons that most processing environments aren't always 
connected to the network, or that the schema is not in fact served on a 
publicly available server). If the absolute URI includes some sort of 
version value, then you have *exactly the same* functionality and 
semantics as with PUBLIC IDs for external DTD subsets.
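
A rough sketch of that catalog indirection, assuming a simple in-memory dictionary as the catalog (real systems would use something like an OASIS XML Catalog file). The URIs and local paths are hypothetical.

```python
from urllib.parse import urljoin

# Hypothetical catalog: absolute, versioned schema URIs -> local copies.
SCHEMA_CATALOG = {
    "http://example.org/schemas/app/1.2/app.xsd": "/opt/schemas/app-1.2/app.xsd",
    "http://example.org/schemas/app/2.0/app.xsd": "/opt/schemas/app-2.0/app.xsd",
}

def resolve_schema_location(location, base_uri, catalog=SCHEMA_CATALOG):
    # A relative location is resolved against the instance document's own URI.
    absolute = urljoin(base_uri, location)
    # Like a PUBLIC ID, an absolute URI is looked up in the catalog first...
    if absolute in catalog:
        return catalog[absolute]
    # ...otherwise it is returned unchanged (the caller might try to fetch it).
    return absolute
```

Because the version is part of the key, moving from 1.2 to 2.0 means a new catalog entry, just as a new DTD version would get a new PUBLIC ID.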

In the second case, the implication is that the system must determine 
which version of the schema to use, which typically would be done using 
a catalog and probably implies that in most cases you want the latest or 
most general version of the schema. But a processor could use other 
heuristics to decide which version to use, for example, looking in the 
document for other clues or using some outside information, such as 
metadata held in a document management system.
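
One way a processor might do that selection, sketched under the assumption that the catalog maps each namespace to several versioned schemas and that outside metadata, if any, arrives as a plain dictionary with a hypothetical `schema_version` key.

```python
# Hypothetical catalog: one namespace (one application), several schemas over time.
NAMESPACE_CATALOG = {
    "urn:example:app": {
        (1, 2): "/opt/schemas/app-1.2/app.xsd",
        (2, 0): "/opt/schemas/app-2.0/app.xsd",
    },
}

def select_schema(namespace, metadata=None, catalog=NAMESPACE_CATALOG):
    versions = catalog[namespace]
    # Outside information (e.g., from a document management system) wins if present.
    wanted = (metadata or {}).get("schema_version")
    if wanted is not None:
        return versions[wanted]
    # Otherwise default to the latest version.
    return versions[max(versions)]
```

Note that the namespace key never changes here; only the set of schemas associated with it grows.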

Thus, I think the appropriate approach in your case is to require the 
use of schemaLocation= with absolute URIs that include version 
information--that gives you the same control you had before.
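
If you adopt that rule, it can be enforced mechanically. The sketch below assumes the version shows up as a numeric path segment such as `/1.2/`; that convention is my assumption, not something the schema specs define, so adjust the pattern to whatever your URIs actually use.

```python
import re
from urllib.parse import urlparse

# Assumed convention: a version appears as a path segment like "/1.2/".
VERSION_SEGMENT = re.compile(r"/\d+(\.\d+)*/")

def check_schema_location(pairs):
    """Return a list of problems with a {namespace: location} mapping."""
    problems = []
    if not pairs:
        problems.append("no schemaLocation hint")
    for ns, loc in pairs.items():
        parsed = urlparse(loc)
        if not parsed.scheme:
            problems.append(f"{ns}: location {loc!r} is not an absolute URI")
        elif not VERSION_SEGMENT.search(parsed.path):
            problems.append(f"{ns}: location {loc!r} carries no version")
    return problems
```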

It's important to remember that there is not (and never was) any particular 
magic to PUBLIC identifiers--they are just magic strings that require 
indirection to be resolved. In that respect they are indistinguishable 
from URIs, which also require indirection to be resolved to real resources.
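
To illustrate the point, a single lookup table can serve both kinds of identifier. This is a toy sketch with made-up identifiers; the only special treatment of PUBLIC IDs is whitespace normalization, which catalog resolvers such as those following the OASIS XML Catalogs spec also perform.

```python
# Hypothetical catalog: a PUBLIC ID and a URI are both just keys.
ID_CATALOG = {
    "-//Example//DTD App V1.2//EN": "/opt/dtds/app-1.2/app.dtd",
    "http://example.org/schemas/app/1.2/app.xsd": "/opt/schemas/app-1.2/app.xsd",
}

def resolve(identifier, is_public=False, catalog=ID_CATALOG):
    if is_public:
        # Collapse runs of whitespace and trim, so trivially different spellings match.
        identifier = " ".join(identifier.split())
    # Beyond that, the identifier is an opaque string looked up indirectly.
    return catalog.get(identifier)
```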

Cheers,

Eliot

-- 
W. Eliot Kimber
Professional Services
Innodata Isogen
9390 Research Blvd, #410
Austin, TX 78759
(512) 372-8155

ekimber@innodata-isogen.com
www.innodata-isogen.com

Received on Wednesday, 4 May 2005 15:10:14 UTC