- From: <noah_mendelsohn@us.ibm.com>
- Date: Mon, 9 Apr 2007 17:18:15 -0400
- To: "Michael Kay" <mike@saxonica.com>
- Cc: "'Pete Cordell'" <petexmldev@tech-know-ware.com>, xmlschema-dev@w3.org
(Sorry for the belated reply -- I've been traveling.)

Michael: I can't help feeling that you're looking for something in the schema language that isn't there, and I believe for good reason. I do understand, correctly I think, that you are building in Saxon an environment in which the work of "compiling" or preprocessing the individual components of a schema is amortized across multiple validations, typically by preparing them in advance of first use. This is surely a good strategy in the case where the schema as a whole is invariant between runs, but I think you're looking for something more, which is for a component such as a complexType resulting from a given source-level declaration in a schema document to have the same validation semantics regardless of the larger schema in which it's employed.

Let me explain my reservation about this goal using an analogy. Let's say that instead of a schema repository, I were building a Java programming system, and I were tempted to insist that each source-level class definition have the same execution semantics regardless of the program in which it's run. As I expect you're aware, that would preclude your system from executing some very important combinations of Java programs. An example I have in mind is where you have:

    class derivedClass extends com.example.baseClass

but in which two different programs use their class loaders to get different versions of com.example.baseClass. Pretty much all bets are off. The derivation may be legal in one program and not in another. A particular method may resolve successfully to the base class in one program and not in another, or may quite likely resolve to two completely different methods in the respective versions of the base class. As I expect you're aware, Java runtimes are required to handle this by doing late binding of classes to their bases, late binding of method invocation, etc.

I see schema as being very much in the same spirit. You can do a degree of preprocessing on the source declarations of an element, type, attribute, etc., but the semantics of a particular component, or even whether the component violates any constraints, may depend on knowledge of other components, and those may usefully differ from one validation episode to another. Insofar as Saxon makes conflicting assumptions, I would expect that it would at least have difficulty handling some use cases, and perhaps at times be in the position of not being fully conformant (e.g. rejecting a schema that the recommendation would consider legal). My intuition is that, to fully support the schema language, a processor like Saxon would need to use techniques in the spirit of those used in Java, i.e. to do final binding of the components once the combinations have been fully determined. Of course, you can cache the results of such bindings too, which should get you good performance in the very common case where exactly the same schema is used repeatedly.

--------------------------------------
Noah Mendelsohn
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------
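To make the class-loader scenario above concrete, here is a minimal sketch. Everything in it is hypothetical: the base-v1/ and base-v2/ directories are assumed to each contain derivedClass plus a different version of com.example.baseClass (with neither class on the application classpath), and someMethod is an invented method name. The point is only that the same derivedClass bytecode links against whichever base class its loader supplies, so a lookup that succeeds under one loader can fail, or resolve differently, under another.

```java
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical setup: base-v1/ and base-v2/ each contain derivedClass.class
// plus a *different* version of com/example/baseClass.class, and neither
// class is visible to the parent class loader.
public class LateBindingSketch {
    public static void main(String[] args) throws Exception {
        URL v1 = new URL("file:base-v1/");
        URL v2 = new URL("file:base-v2/");

        for (URL classpath : new URL[] { v1, v2 }) {
            try (URLClassLoader loader = new URLClassLoader(new URL[] { classpath })) {
                // The same derivedClass bytecode is linked against whichever
                // version of com.example.baseClass this loader supplies.
                Class<?> derived = loader.loadClass("derivedClass");
                Method m = derived.getMethod("someMethod");
                System.out.println(m + "  (resolved via " + classpath + ")");
            } catch (ReflectiveOperationException | LinkageError e) {
                // Against the other version of the base class, the very same
                // lookup may fail, or resolve to a different method entirely.
                System.out.println("resolution failed via " + classpath + ": " + e);
            }
        }
    }
}
```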
"Michael Kay" <mike@saxonica.com>
Sent by: xmlschema-dev-request@w3.org
03/21/2007 07:10 PM

To: <noah_mendelsohn@us.ibm.com>
cc: "'Pete Cordell'" <petexmldev@tech-know-ware.com>, <xmlschema-dev@w3.org>
Subject: RE: Permit (greedy) conflicting wildcards

> You may well have some sort of cache as an implementation
> strategy, but it's not an abstraction that appears in the
> schema language.

By and large, the spec simply says that the schema is a collection of schema components gathered from some implementation-defined source.

> There are already a number of constructs
> that have the same closed world feel.

That's true: for example, lax validation and redefines. They're all a bit problematic, because you can't inspect a schema document and an instance and know whether the instance is valid without knowing something else about the validation environment. However, I don't think there are currently any cases where an element E that conforms to a declaration D causes the instance to be valid when D is absent from the schema but invalid when it is present. Intuitively, this seems a little weird. It's likely to be particularly problematic, I think, in an XQuery scenario where you are typically dealing with a pool of long-lived schema information rather than with one validation episode at a time.

> While not written specifically to deal with these "action at
> a distance" mechanisms, this text makes pretty clear that the
> definition of assessment is indeed of a completely assembled
> schema in which all components are known. Anything more
> incremental is a processor implementation strategy that must
> not have externally visible characteristics that conflict
> with the normative rule.

Generally, the way Saxon enforces this rule is to freeze components in the cache once they have been used: for example, once a type has been used for validation it can't be redefined. So it sounds as if the implementation strategy for the not-in-schema wildcard will be "once you have used the fact that element E is not in the schema, E must not be added to the schema". But I may also have to think about providing a mechanism to remove components from the schema, since their mere presence can now cause problems.

Michael Kay
http://www.saxonica.com/
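The freezing rule Michael Kay describes above can be pictured with a short sketch. This is not Saxon's code; the class name, the use of plain strings for component names, and the Object placeholder for components are invented for illustration. The idea is simply that once validation has relied on a component name, whether on its presence or on its absence, a later declaration or redefinition of that name is rejected.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative sketch (not Saxon's actual implementation) of freezing
// components once they have been used for validation.
public class ComponentCache {
    private final Map<String, Object> components = new HashMap<>();
    private final Set<String> frozenPresent = new HashSet<>(); // used and found
    private final Set<String> frozenAbsent = new HashSet<>();  // used and found absent

    /** Look up a component during validation, freezing the outcome either way. */
    public Object use(String name) {
        Object c = components.get(name);
        if (c != null) {
            frozenPresent.add(name);
        } else {
            // Validation has now relied on "name is not in the schema".
            frozenAbsent.add(name);
        }
        return c;
    }

    /** Add or redefine a component; refused once its presence or absence has been relied on. */
    public void declare(String name, Object component) {
        if (frozenPresent.contains(name)) {
            throw new IllegalStateException(
                name + " has already been used for validation and cannot be redefined");
        }
        if (frozenAbsent.contains(name)) {
            throw new IllegalStateException(
                "validation has already relied on " + name + " being absent from the schema");
        }
        components.put(name, component);
    }
}
```

A removal mechanism of the kind mentioned above would presumably need the same guard: removing a name in either frozen set would have to be refused for the lifetime of that schema.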
Received on Monday, 9 April 2007 21:18:28 UTC