Re: One Schema Per Namespace [was Visibility modifiers for namedSchema components -- Schema 1.1 feature?]

Michael Kay wrote:
> I think your use case raises another point, however, which is the fact that
> it's entirely reasonable for more than one schema to exist for the same
> namespace. The spec seems to be very confused about whether this is a good
> idea or not. 

I agree that this is an important issue. I've been running into this in 
the context of generalized XML content management, where you want your 
repository to be able to both manage your schema instances and provide 
the service of relating documents to schemas through some sort of mapping.

The easiest solution is for there to be exactly one schema instance for 
a given namespace. However, it quickly becomes clear that this 
constraint is unreasonable for several reasons:

- You might have exactly one *compound* schema that is decomposed into 
separate schema instances that all target the same namespace. It is 
unreasonable to disallow this useful approach to managing your schema 
source.

- You might have different schema variants for the same namespace for 
different purposes (authoring vs rendering vs interchange, for example).

In the case of managing schemas generally, you would like to be able to 
import a schema document and have the system automatically associate the 
governed namespace with the schema instance so that subsequent requests 
for the schema associated with the namespace will return the appropriate 
schema instance. However, in both cases there is ambiguity about which 
schema instance to use. Likewise, if you import a new document you would 
like to be able to tell unambiguously if the repository already has a 
copy of the appropriate schema without having to compare the schema 
documents (which isn't necessarily a definitive test anyway).

In the case of a compound schema there is no obvious way to determine 
which of the schema instances is the "root" instance of the compound 
schema document by simple inspection. The only way I've found so far to 
address this is to assume that only schemas that are not used by any 
other schema are top-level schema documents and all others are not, but 
I realize that this is not always true in the general case so it's not a 
perfect solution. In particular, if you create a naive 
schema-instance-to-namespace mapping, after importing a compound schema 
you will have many schema instances that map to the same namespace and 
no obvious way to tell which one is the top-level. Your import process 
has to know which one is the root (or you have to be able to determine 
it after the fact or simply require a human to identify the root).

In the second case, where you have variants, there is nothing you can do 
since even if you have single-instance schemas they all govern the same 
namespace, and therefore nothing in the contents of the schemas 
themselves gives you a basis for choosing one or the other as the "right" 
one to govern a given document in the governed namespace. This means 
that the only solution is to use application- and use-case-specific 
metadata to both distinguish the different variants and to resolve 
references as needed. This makes the schema selection mechanism even 
less standard than it is by default since the input to the 
"getSchemaForDocument()" method must include application-specific 
parameters and/or heuristics to choose the right variant.
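
To make that concrete, here is a minimal sketch of what such a lookup 
might look like. Everything in it is an assumption for illustration: 
the SchemaRegistry class, the "variant" parameter, and the variant 
names are not from any spec or product, they just stand in for 
whatever application-specific metadata a given repository uses.

```python
from collections import defaultdict


class SchemaRegistry:
    """Hypothetical namespace-to-schema map that allows several variants."""

    def __init__(self):
        # namespace -> {variant name -> schema identifier}
        self._by_namespace = defaultdict(dict)

    def register(self, namespace, variant, schema_id):
        self._by_namespace[namespace][variant] = schema_id

    def get_schema_for_document(self, namespace, variant=None):
        """Resolve a namespace to one schema, using `variant` if needed."""
        variants = self._by_namespace.get(namespace, {})
        if not variants:
            raise LookupError(f"no schema registered for {namespace!r}")
        if variant is None:
            if len(variants) == 1:
                # Only one schema governs the namespace: no ambiguity.
                return next(iter(variants.values()))
            # The namespace alone cannot pick among the variants.
            raise LookupError(
                f"{namespace!r} has variants {sorted(variants)}; "
                "caller must say which one it wants"
            )
        if variant not in variants:
            raise LookupError(f"no {variant!r} variant for {namespace!r}")
        return variants[variant]


registry = SchemaRegistry()
registry.register("urn:example:book", "authoring", "book-authoring.xsd")
registry.register("urn:example:book", "rendering", "book-rendering.xsd")

# Works only because the caller supplies application-specific metadata.
print(registry.get_schema_for_document("urn:example:book", variant="rendering"))
```

Calling get_schema_for_document("urn:example:book") without a variant 
raises LookupError here, which is the point: the namespace by itself 
is not enough input.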

And note that you cannot rely on schemaLocation= to disambiguate when 
the schemaLocation= value itself is a reference to an abstract resource 
that the repository is expected to resolve to a specific variant at 
resolution time (many CMSes have the concept of resources with variants).

I'm not sure that there's anything the XSD spec can do to help other 
than perhaps formally recognizing the issue and either requiring that 
schema-aware applications provide some way to discriminate among schema 
instances for the same namespace or providing a place to put 
discriminating metadata in both the schema and in schema references. It 
would be useful if you could indicate for a given schema instance 
whether it can or cannot be the root of a compound schema. That would at 
least make it easy to exclude non-root schema instances from your 
namespace-to-schema map when you import a new compound schema.

My point here is simply to emphasize that the issues Mike identifies are 
both real and important, at least in the context of generalized XML 
content management.

Cheers,

Eliot
-- 
W. Eliot Kimber
Professional Services
Innodata Isogen
9390 Research Blvd, #410
Austin, TX 78759
(214) 954-5198

ekimber@innodata-isogen.com
www.innodata-isogen.com

Received on Thursday, 14 September 2006 12:37:43 UTC