Re: clarification of redefine semantics from James Taylor on 2005-02-11 (xmlschema-dev@w3.org from February 2005)

From: James Taylor <JTaylor@nextance.com>
Date: Fri, 11 Feb 2005 15:06:50 -0800
To: <xmlschema-dev@w3.org>
Message-ID: <9F5ED6009B16CE47B2C02694103CFBE30B5877@mail-1.nextance.com>
Below are two responses I received from Noah Mendelsohn regarding my question.  Thanks,

 

    James

 

>> Should the redefinition of Address apply throughout the
>> Company.xsd schema, including its usage within Employee.xsd,
>> with no errors regarding duplicate definitions of types?

That certainly is how I hope and expect that any clarification in Schema 1.1 would leave things.  My reading of Schema 1.0 is that it's absolutely unambiguous that if the combined schema is accepted at all, the redefinition of Address is pervasive and applies throughout the schema that results from the transitive closure of the files referenced from Company.xsd. 

Regarding the possibility that the includes would result in an error from the includes, the schema recommendation says [1]: 

"Note: The above is carefully worded so that multiple <include>ing of the same schema document will not constitute a violation of clause 2 of Schema Properties Correct (§3.15.6), but applications are allowed, indeed encouraged, to avoid <include>ing the same schema document more than once to forestall the necessity of establishing identity component by component." 

So it's clear that in the absence of the <redefine>, the multiple includes of address.xsd MUST NOT result in an error.  I believe that the intention is that all such includes be resolved and checked before any redefinitions are applied, and thus that the schema will be correctly constructed in the manner you summarize above.  I can see that there is some wiggle room in the spec prose, which might not clearly rule out the possibility of a conflict between the redefining component and one of the other included versions.  As I say, I don't believe that such an error was intended, but other members of the schema WG might disagree, and I suppose the 1.0 Recommendation could be read as not being clear on this point.  It certainly does not require an error in my opinion.
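One way to picture the "include each document only once" advice in the note quoted above is the following minimal Python sketch. It is not taken from the Recommendation or from any real processor: the include graph, the function name include_closure, and the idea of keying documents by location string are all assumptions made for illustration.

```python
from typing import Dict, List, Set

# Hypothetical include graph: each schema document location maps to the
# locations it <include>s. The real contents of these files are not known.
INCLUDES: Dict[str, List[str]] = {
    "Company.xsd": ["Employee.xsd", "address.xsd"],
    "Employee.xsd": ["address.xsd"],
    "address.xsd": [],
}


def include_closure(root: str, includes: Dict[str, List[str]]) -> List[str]:
    """Return every document reachable from root, each listed exactly once."""
    seen: Set[str] = set()
    order: List[str] = []

    def visit(location: str) -> None:
        if location in seen:
            # Already loaded: a second <include> of the same document
            # contributes nothing new, so no duplicate definitions arise.
            return
        seen.add(location)
        for child in includes[location]:
            visit(child)
        order.append(location)

    visit(root)
    return order


loaded = include_closure("Company.xsd", INCLUDES)
```

Here address.xsd is reached by two paths but loaded once, so its components never need to be compared "component by component" for identity.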

The xmlschema-dev@w3.org mailing list is a great place to ask questions like this, and you have my permission to relay my part of this correspondence to that list if you want to see whether others would agree with my reading.  Thanks! 

Noah 

[1] http://www.w3.org/TR/2004/PER-xmlschema-1-20040318/#compound-schema

 

<Here is the prior response from Noah>

 

FWIW, one of the goals of the XML Schema 1.1 Recommendation will be to clarify the semantics of all the composition operations, including <redefine>.   I believe the intention is clear if you read Schema 1.0, but certainly the presentation isn't as clear as it should be.   

Here's a brief overview of how I've always taken the design to work.  Keep in mind that XML Schema gives no significant semantics to a schema document such as a.xsd in isolation.  The definitions in it don't stand on their own.  For example, facets may be inherited from a base type in an imported namespace, and where you get the definitions from that imported namespace can vary from one use of a.xsd to another.   Note that schemaLocation on an import is always a hint:  one processor may follow it and another may not.  Of course, you should get a processor that reliably implements the policy you need, but a.xsd is in principle usable in any processor, and you can't know from that document in isolation what will be imported.  Of course, even if you knew b.xsd on Feb. 1, it might change the next day.

As the schema recommendation says [1]: 

"This mechanism is intended to provide a declarative and modular approach to schema modification, with functionality no different except in scope from what would be achieved by wholesale text copying and redefinition by editing. In particular redefining a type is not guaranteed to be side-effect free: it may have unexpected impacts on other type definitions which are based on the redefined one, even to the extent that some such definitions become ill-formed. 
Note: The pervasive impact of redefinition reinforces the need for implementations to adopt some form of lazy or 'just-in-time' approach to component construction, which is also called for in order to avoid inappropriate dependencies on the order in which definitions and references appear in (collections of) schema documents." 


and, using simple types as an example:

"Corresponding to each non-<annotation> member of the [children] of a <redefine> there are one or two schema components in the <redefine>ing schema: 
1 The <simpleType> and <complexType> [children] information items each correspond to two components: 
1.1 One component which corresponds to the top-level definition item with the same name in the <redefine>d schema document, as defined in Schema Component Details (§3), except that its {name} is ·absent·; 
1.2 One component which corresponds to the information item itself, as defined in Schema Component Details (§3), except that its {base type definition} is the component defined in 1.1 above. 
This pairing ensures the coherence constraints on type definitions are respected, while at the same time achieving the desired effect, namely that references to names of redefined components in both the <redefine>ing and <redefine>d schema documents resolve to the redefined component as specified in 1.2 above." 

I know that second one is a bit dense, but what it basically says is this:

* We're building a schema (collection of components) that can be used for validation.   

* If simple type "ST" is redefined, then there are two components in the schema:  the base one and the redefining one.

* Only the latter gets the name ST. Thus, all references from anywhere in the schema to the simple type ST are to the redefined one.  This is true regardless of the schema document in which such a reference occurs.  It's specifically true even if there is a reference in the document that contained the original unredefined ST.  I suppose you could make a broad analogy to virtual method calls.  Even if you call a seemingly local method, you may actually get a redefinition from a derived class. (That's what the somewhat oblique reference to lazy implementations in the first quote is trying to say: you can't resolve any reference to a redefinable QName from any schema document until you are sure you have seen and accounted for all possible redefinitions.  It is the most redefined version that is used in all cases; the sole exception is that, using our example, the redefining ST has as its base type the now anonymous original version.)
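The three points above can be modelled roughly in Python. This is only a sketch of the reading given here, not an implementation of any real schema processor: the classes SimpleType, ElementDecl and Schema, the maxLength facet values, and the element name "code" are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class SimpleType:
    name: Optional[str]            # None models a {name} that is absent
    base: Optional["SimpleType"]   # {base type definition}
    facets: Dict[str, int]


@dataclass
class ElementDecl:
    name: str
    type_name: str                 # kept as a name and resolved lazily


class Schema:
    def __init__(self) -> None:
        self.types: Dict[str, SimpleType] = {}

    def define(self, name: str, facets: Dict[str, int]) -> SimpleType:
        component = SimpleType(name, None, dict(facets))
        self.types[name] = component
        return component

    def redefine(self, name: str, facets: Dict[str, int]) -> SimpleType:
        original = self.types[name]
        original.name = None       # clause 1.1: the original becomes anonymous
        # Clause 1.2: the redefining component keeps the name and is
        # based on the now anonymous original.
        redefined = SimpleType(name, original, dict(facets))
        self.types[name] = redefined
        return redefined

    def resolve(self, name: str) -> SimpleType:
        # Called only after all redefinitions have been accounted for.
        return self.types[name]


schema = Schema()
schema.define("ST", {"maxLength": 10})
elem = ElementDecl("code", "ST")           # reference made in the original document
schema.redefine("ST", {"maxLength": 5})    # redefinition seen later
resolved = schema.resolve(elem.type_name)
```

Because elem stores only the name ST, resolving it after the redefinition yields the redefining component (maxLength 5), whose base is the anonymous original (maxLength 10). That late lookup is the "lazy" behaviour the first quote alludes to.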

My expectation is that Schema 1.1 will attempt to clarify and make more formal the explanation of this behavior, but it is unlikely to substantially change.  I hope this is helpful, 

Noah 

[1] http://www.w3.org/TR/2004/PER-xmlschema-1-20040318/#modify-schema 



--------------------------------------
Noah Mendelsohn 
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------
Received on Saturday, 12 February 2005 00:14:13 UTC