Re: Is schemaLocation just a hint in schema import?

Let me give a slightly different spin on Xan's responses.

Xan Gregg writes:

> > [Leo Antoli writes]:
> > Does it mean that you can't have 2 schemas
> > defining different elements for the same
> > namespace and then import both from another
> > schema?
> 
> Not reliably, anyway. For a given schema usage,
> there is one schema for a given namespace URI. A
> "schema document" is different from a
> "schema". From the XML Schema perspective, a
> schema is roughly a set of schema components all
> having the same namespace URI.

Not quite how I see it.  A schema is a set of components to be used for a 
validation, and crucially it can integrate multiple target namespaces. 
E.g., if a type and its base type are in different target namespaces, or 
if an element and its substitution group head are in different target 
namespaces, then it becomes clear why there is one schema integrating all 
the components, rather than a separate schema per namespace.  For example, 
I might have a schema document with the single declaration:

<schema targetNamespace="ns1uri" xmlns:ns2="ns2uri">
<simpleType name="derived">
  <restriction base="ns2:othertype">
    <maxInclusive value="10"/>
  </restriction>
</simpleType>
</schema>

So, Xan is exactly right that schema documents (i.e., the files that 
usually have a suffix .xsd) and schemas (the abstract components) are very 
different, but a schema typically involves multiple target namespaces.  It 
should be clear that the actual component corresponding to type "derived" 
will get many of its facets from a base type in another namespace.
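
To make that concrete, here is a sketch of a second schema document that 
could supply the base type.  The definition of "othertype" is invented 
purely for illustration (a non-negative integer); nothing above says what 
it actually is, and "ns2uri" is the same placeholder used earlier.

<schema xmlns="http://www.w3.org/2001/XMLSchema"
        targetNamespace="ns2uri">
  <!-- hypothetical base type, assumed only for this example -->
  <simpleType name="othertype">
    <restriction base="integer">
      <minInclusive value="0"/>
    </restriction>
  </simpleType>
</schema>

Under that assumption, the component for "derived" would carry the 
minInclusive of 0 from the ns2 type together with its own maxInclusive of 
10, even though the two declarations sit in different target namespaces.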

> > Should everything for a given schema be defined
> > in just one schema file? If you have a big
> > schema definition for different functional
> > areas, it might be useful to split the schema in
> > several files so you don't need to import
> > elements that you won't be using.
> 
> There may be multiple schema documents used to
> create a schema; however, you can only reliably
> import a schema using a single location. Instead,
> you should build a grouping schema document that
> <xs:include>s the other schema documents that are
> in the same namespace and then import the schema
> using the location of the grouping schema
> document.

If you look closely, you'll find that the schemaLocation, if it's honored, 
contributes components not just for use in the importing schema document, 
but to the overall schema being assembled.  In general, the way to think 
about it is that when the dust settles, there either is or there isn't in 
your schema a declaration for element n:e (for example).  If there is, 
then a ref="n:e" will resolve from anywhere, regardless of whether the 
schema document doing the reference has an import for that schemaLocation. 
(Actually, for reasons not worth going into here, the schema document does 
need an import for the namespace, but not necessarily for that 
schemaLocation.  Indeed, it's perfectly possible to provide an import with 
no schemaLocation.  Which schemaLocation supplied a declaration has 
nothing to do with which references resolve.) 

The key point is that global elements, types, etc. are pervasively visible 
if they exist at all. 
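
As a rough sketch of that point, a schema document along the following 
lines imports the namespace without naming any location.  The names 
"nuri", "ns1uri" and "wrapper" are placeholders invented for the example.

<schema xmlns="http://www.w3.org/2001/XMLSchema"
        targetNamespace="ns1uri" xmlns:n="nuri">
  <!-- import names the namespace only; there is no schemaLocation -->
  <import namespace="nuri"/>
  <element name="wrapper">
    <complexType>
      <sequence>
        <!-- resolves if the assembled schema has a declaration for n:e -->
        <element ref="n:e"/>
      </sequence>
    </complexType>
  </element>
</schema>

Whether ref="n:e" resolves then depends only on whether the processor has 
obtained a declaration for n:e by some means, not on which schema document 
or location it came from.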

> > Can anybody tell me the motivation for this note
> > in the schema spec?
> 
> To allow maximum flexibility in schema processors. For instance, to 
> allow a processor to keep known schemas in a cache, while ignoring 
> the location hint altogether. However, this flexibility can lead to 
> other problems, for instance, in the presence of multiple versions of 
> the same schema.

I'd refine that a bit.  It's very important that schemaLocation >in the 
instance< be a hint.  Otherwise, in eCommerce scenarios, I could send you 
an electronic document and force you to validate it with my schema, which 
could be totally bogus.  In the typical eCommerce scenario, the receiver 
wants to validate with its own schema.
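
For illustration, an instance carrying such a hint might look roughly like 
this.  The namespace, element name and URLs are all made up for the 
example.

<po:purchaseOrder
    xmlns:po="http://example.org/po"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://example.org/po
                        http://sender.example.org/po.xsd">
  <!-- order content omitted -->
</po:purchaseOrder>

A receiver is free to ignore the sender's po.xsd and validate against its 
own schema for the http://example.org/po namespace.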

Following the analogy, by making schemaLocation a hint, we avoid a 
situation in which some external schema document you may want to use for 
some particular namespace can force you to use declarations you don't want 
for some other namespace.  So, for example, the schema that defines the 
purchase order body of a SOAP envelope cannot, without the agreement of 
the invoking processor (which chooses to follow the hint), force you to 
adopt some faulty definitions of the containing SOAP envelope.

Having said all that, I find the case for making schemaLocation a hint on 
the instance to be more compelling than on import.  We did try to keep 
them parallel, but perhaps that was a mistake.  Maybe we should have 
provided a mode in which the schemaLocation on import was mandatory, but 
the processor could decline to validate at all if it was unhappy using it.

> xan

Noah

P.S. I see that this thread is cross-posted to xml-dev and schema-dev.  As 
a defense against inbox overload, I'm afraid I regularly monitor only the 
latter.  I may not see responses sent only to xml-dev.  Thank you.

--------------------------------------
Noah Mendelsohn 
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------

Received on Monday, 9 October 2006 23:38:52 UTC