Re: target namespace and namespaces from W. Eliot Kimber on 2004-12-06 (xmlschema-dev@w3.org from December 2004)

From: W. Eliot Kimber <ekimber@innodata-isogen.com>
Date: Mon, 06 Dec 2004 13:24:21 -0600
To: Simon.Cox@csiro.au
CC: xmlschema-dev@w3.org
Message-ID: <41B4B1E5.1040904@innodata-isogen.com>

Simon.Cox@csiro.au wrote:

> This discussion appears to confuse "schema" with "schema document". 
> 
> Most complex schemas are factored/modularised into several schema
> documents, 
> with <includes> managing the dependencies within a single target
> namespace. 
> The precise subset of components from the target namespace that are
> found in each schema document is essentially arbitrary - it is just a
> packaging device to assist in maintenance. You could have one global
> component per document, or just one document for the namespace, or
> anything in between. 
> The exact packaging is logically unimportant - all that matters is the
> absence of clashes within the namespace for global components of the
> same name. 
> 
> A corollary of this is that different (potentially overlapping) subsets
> from the same namespace may be packaged into different documents, as
> convenient, for processors interested in different pieces of the
> complete schema. 

I have not confused them.

In the abstract I agree with your assertion that the organization of the 
declarations into different files is arbitrary and doesn't change the 
meaning of the *abstract* schema.

However, we have no *standard*, unambiguous way to talk about *abstract* 
schemas. That is, there is no defined way to utter the names of abstract 
schemas. Namespaces are the closest we have, but the namespace spec is 
clear (as it should be) that there is no necessary relationship between 
namespace names and any defining artifacts. Note that the W3C could, if 
it chose, define a standard for uttering the names of abstract schemas, 
but even if it did it would not, I think, solve the problem under 
discussion unless that specification also addressed binding, since the 
issue is one of implicit binding between names (namespace names or 
yet-to-be-invented schema names) and concrete XSD schema documents.

So in practice, when trying to manage the actual associations between 
documents and their governing XSD schema documents, we have to have 
actual objects that represent the abstract schemas. That is, there exist 
XSD documents that are intended to be the root of a (possibly compound) 
schema definition. Those XSD documents, and only those XSD documents, 
can act as concrete representations of abstract schemas.

In that case, the problem is complicated when there is not a clearly 
distinguished root document for a given XSD schema that governs a given 
namespace. Or rather, if there are multiple XSD schema documents that 
specify targetNamespace= and some of those XSD documents are not schema 
roots, things can get confused.

For example, given a repository containing a whole bunch of XSD 
documents, I need to be able to answer the question "which of these XSD 
documents should I associate with a document that uses namespace 'X'?".

The answer to that question needs to consist of only those XSD documents 
that are the root of complete schema definitions. If some of the XSD 
documents are not roots but declare a target namespace then this cannot 
be done automatically given only the information in the schema documents 
themselves. This means that humans would have to then get involved to 
characterize particular schema documents as root or not-root.
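
To make the ambiguity concrete, here is a minimal sketch in Python. The 
repository contents, the file names X1.xsd and S1.xsd, and the function 
name candidates_by_namespace are all hypothetical, chosen only for 
illustration. It groups XSD documents by the targetNamespace they 
declare, which is about all the schema documents themselves can tell us:

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

XS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical repository contents: file name -> XSD source.
# Both documents declare targetNamespace "X"; only X1.xsd is meant as a root.
REPOSITORY = {
    "X1.xsd": f'<xs:schema xmlns:xs="{XS}" targetNamespace="X">'
              '<xs:include schemaLocation="S1.xsd"/></xs:schema>',
    "S1.xsd": f'<xs:schema xmlns:xs="{XS}" targetNamespace="X"/>',
}

def candidates_by_namespace(repository):
    """Group XSD documents by the targetNamespace they declare."""
    groups = defaultdict(list)
    for name, source in repository.items():
        namespace = ET.fromstring(source).get("targetNamespace")
        if namespace is not None:  # skip documents with no target namespace
            groups[namespace].append(name)
    return dict(groups)

print(candidates_by_namespace(REPOSITORY))
```

Both documents come back as candidates for namespace "X", and nothing in 
the targetNamespace declaration says which one is the root.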

One problem in this scenario is that many of the XSD documents would get 
into the system only because they were associated with imported XML 
documents: they were imported automatically to maintain a dependency 
relationship within the repository.

Having been imported, one would expect the system to automatically 
associate those schema documents with other XML documents that used the 
same namespace but didn't, for example, use schemaLocation to point to 
them (and even if they did you'd still want to add the association).

That is, my goal, from an information management standpoint, is to 
satisfy this use case:

1. Import XML document A that uses namespace "X". This document has a 
schemaLocation= attribute that points to XSD document X1 governing 
namespace X.

2. During import XSD document X1 is imported, along with included or 
imported XSD document S1, which does not specify a target namespace 
(because it is not intended to be a schema root). The repository 
registers XSD document X1 as being a root XSD for namespace X. It 
registers S1 as being an XSD document that is used by X1 but does not 
associate it with any namespaces.

3. Import XML document B that uses namespace "X". This document *does 
not* have a schemaLocation= attribute.

4. The repository *automatically* associates XSD document X1 with 
document B as a governing schema but *does not* associate S1 (because it 
is not a schema root).
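
The four steps above can be sketched roughly as follows. This is only an 
illustration under stated assumptions: the Repository class, its method 
names, and the file names are hypothetical, and the sketch assumes that 
a declared target namespace is the only signal used to register a root.

```python
from dataclasses import dataclass, field

@dataclass
class Repository:
    # namespace name -> XSD documents registered as roots for it
    roots: dict = field(default_factory=dict)
    # root XSD document -> XSD documents it includes or imports
    components: dict = field(default_factory=dict)

    def import_schema(self, xsd, target_namespace, uses=()):
        """Step 2: register a root XSD and record the documents it uses."""
        self.roots.setdefault(target_namespace, []).append(xsd)
        self.components[xsd] = list(uses)

    def governing_schemas(self, namespace):
        """Step 4: look up the root XSDs for a document's namespace."""
        return list(self.roots.get(namespace, []))

repo = Repository()
# Steps 1-2: document A points at X1.xsd, which uses S1.xsd.
# S1.xsd declares no target namespace, so it is recorded only as a component.
repo.import_schema("X1.xsd", "X", uses=["S1.xsd"])
# Steps 3-4: document B uses namespace "X" and has no schemaLocation.
print(repo.governing_schemas("X"))
```

Document B is associated with X1.xsd alone; S1.xsd is tracked as a 
dependency of X1.xsd but never offered as a governing schema.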

If S1 also declared a targetNamespace, then the repository would be 
obligated to associate it with namespace X as well and in step 4 would 
also associate it with document B even though S1 is not actually useful 
as a separate schema (because it's not a schema root).

If there were some way to reliably and automatically determine that a 
given XSD document is or is not a schema root then the use case could be 
satisfied even when non-root XSD documents declared target namespaces.
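
For illustration only, one conceivable heuristic (an assumption of this 
sketch, not something the XSD spec provides) is to treat as a root any 
XSD document that no other document in the repository pulls in via 
xs:include. The function name guess_roots and the sample documents are 
hypothetical, and the sketch assumes schemaLocation values match 
repository keys exactly:

```python
import xml.etree.ElementTree as ET

XS_NS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical repository: S1.xsd declares the same target namespace as X1.xsd.
SAMPLE = {
    "X1.xsd": f'<xs:schema xmlns:xs="{XS_NS}" targetNamespace="X">'
              '<xs:include schemaLocation="S1.xsd"/></xs:schema>',
    "S1.xsd": f'<xs:schema xmlns:xs="{XS_NS}" targetNamespace="X"/>',
}

def guess_roots(repository):
    """Return documents that no document in the repository includes."""
    included = set()
    for source in repository.values():
        for inc in ET.fromstring(source).iter(f"{{{XS_NS}}}include"):
            location = inc.get("schemaLocation")
            if location:
                included.add(location)
    return sorted(name for name in repository if name not in included)

print(guess_roots(SAMPLE))
```

This is not reliable in the sense asked for above: it misclassifies a 
non-root document whose including document has not been imported yet, 
and a document that is both usable as a root and included elsewhere.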

Cheers,

Eliot

-- 
W. Eliot Kimber
Professional Services
Innodata Isogen
9390 Research Blvd, #410
Austin, TX 78759
(512) 372-8122

eliot@innodata-isogen.com
www.innodata-isogen.com
Received on Monday, 6 December 2004 19:24:38 UTC