RE: Permit (greedy) conflicting wildcards from noah_mendelsohn@us.ibm.com on 2007-04-09 (xmlschema-dev@w3.org from April 2007)

From: <noah_mendelsohn@us.ibm.com>
Date: Mon, 9 Apr 2007 17:18:15 -0400
To: "Michael Kay" <mike@saxonica.com>
Cc: "'Pete Cordell'" <petexmldev@tech-know-ware.com>, xmlschema-dev@w3.org
Message-ID: <OF3CED9952.9171C8B3-ON852572B8.00740E7D-852572B8.00750866@lotus.com>

(sorry for the belated reply -- I've been traveling)

Michael:  I can't help feeling that you're looking for something in the 
schema language that isn't there, and I believe for good reason.  I do 
understand, correctly I think, that you are building in Saxon an 
environment in which the work of "compiling" or preprocessing the 
individual components of a schema is amortized across multiple 
validations, typically by preparing them in advance of first use.  This is 
surely a good strategy in the case where the schema as a whole is 
invariant between runs, but I think you're looking for something more, 
which is for the component such as a complexType resulting from a given 
source level declaration in a schema document to have the same validation 
semantics regardless of the larger schema in which it's employed.

Let me explain my reservation about this goal using an analogy.  Let's say 
that instead of a schema repository, I were building a Java programming 
system, and I was tempted to insist that each source-level class 
definition have the same execution semantics, regardless of the program in 
which it's run.  As I expect you're aware, that would preclude your system 
from executing some very important combinations of Java programs.  An 
example I have in mind is where you have:

        class derivedClass extends com.example.baseClass

but in which two different programs use their class loaders to get 
different versions of com.example.baseClass.  Pretty much all bets are 
off.  The derivation may be legal in one program and not in another.  A 
particular method may resolve successfully to the base class in one 
program and not in another, or may quite likely resolve to two completely 
different methods in the respective versions of the base class.  As I 
expect you're aware, Java runtimes are required to handle this by doing 
late binding of classes to their bases, late binding of method invocation, 
etc.
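To make the analogy concrete, here is a toy Java model of late binding. It is not how the JVM actually loads classes, and the names `ClassDecl`, `Program`, and `resolve` are hypothetical. A derived declaration names its base only as a string, and a method lookup is resolved against whichever program the declaration is loaded into, so the very same declaration binds to different methods (or to none) in two programs.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Toy source-level class declaration (hypothetical model, not the JVM's):
// the superclass is recorded only by name, so its meaning depends on which
// program the declaration is loaded into.
final class ClassDecl {
    final String name;
    final String superName;            // null for a root class
    final Map<String, String> methods; // method name -> label for its body

    ClassDecl(String name, String superName, Map<String, String> methods) {
        this.name = name;
        this.superName = superName;
        this.methods = methods;
    }
}

// A "program": the set of declarations its class loader makes visible.
final class Program {
    private final Map<String, ClassDecl> classes = new HashMap<>();

    Program add(ClassDecl decl) {
        classes.put(decl.name, decl);
        return this;
    }

    // Late binding: the superclass chain is followed by name, in this program,
    // only when the method is looked up. The seen-set stops on a cyclic chain.
    Optional<String> resolve(String className, String method) {
        Set<String> seen = new HashSet<>();
        ClassDecl c = classes.get(className);
        while (c != null && seen.add(c.name)) {
            String body = c.methods.get(method);
            if (body != null) {
                return Optional.of(body);
            }
            c = (c.superName == null) ? null : classes.get(c.superName);
        }
        return Optional.empty();
    }
}

class LateBindingDemo {
    public static void main(String[] args) {
        // One derived declaration, shared verbatim by two programs.
        ClassDecl derived = new ClassDecl("derivedClass", "com.example.baseClass", Map.of());
        Program p1 = new Program()
                .add(new ClassDecl("com.example.baseClass", null, Map.of("run", "baseClass v1 run()")))
                .add(derived);
        Program p2 = new Program()
                .add(new ClassDecl("com.example.baseClass", null, Map.of("start", "baseClass v2 start()")))
                .add(derived);
        System.out.println(p1.resolve("derivedClass", "run")); // Optional[baseClass v1 run()]
        System.out.println(p2.resolve("derivedClass", "run")); // Optional.empty
    }
}
```

Nothing about `derivedClass` itself changes between the two runs; only the environment it is bound into does.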

I see schema as being very much in the same spirit.  You can do a degree 
of preprocessing on the source declarations of an element, type, 
attribute, etc., but the semantics of a particular component, or even 
whether the component violates any constraints, may depend on knowledge of 
other components, and those may usefully differ from one validation 
episode to another.  Insofar as Saxon makes conflicting assumptions, I 
would expect that it would at least have difficulty handling some use 
cases, and perhaps at times be in the position of not being fully 
conformant (e.g. rejecting a schema that the recommendation would consider 
legal.)  My intuition is that, to fully support the schema language, a 
processor like Saxon would need to use techniques in the spirit of those 
used in Java, i.e. to do final binding of the components once the 
combinations have been fully determined.  Of course, you can cache the 
results of such bindings too, which should get you good performance in the 
very common case where exactly the same schema is used repeatedly.
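A minimal sketch of that approach, assuming a simplified model in which a complex type only extends a base by appending child elements. `TypeSource`, `BindingCache`, and their methods are hypothetical names, not Saxon's API. Source declarations are bound only once the full set of components is known, and the bound result is cached under that exact set, so reusing a schema is a cache hit while a different combination is bound afresh.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical source-level complex type: it extends a base type by name and
// appends its own child elements. Equality is object identity, so a set of
// TypeSource objects identifies "exactly these source declarations".
final class TypeSource {
    final String name;
    final String baseName;          // null if the type has no base
    final List<String> ownElements; // children this type adds by extension

    TypeSource(String name, String baseName, List<String> ownElements) {
        this.name = name;
        this.baseName = baseName;
        this.ownElements = ownElements;
    }
}

// Binds declarations only once the whole schema is known, and caches the
// result per schema composition rather than per individual declaration.
final class BindingCache {
    private final Map<Set<TypeSource>, Map<String, List<String>>> cache = new HashMap<>();

    Map<String, List<String>> bind(Set<TypeSource> schema) {
        return cache.computeIfAbsent(Set.copyOf(schema), BindingCache::bindAll);
    }

    int size() {
        return cache.size();
    }

    // Resolves every type's effective content within this one schema.
    private static Map<String, List<String>> bindAll(Set<TypeSource> schema) {
        Map<String, TypeSource> byName = new HashMap<>();
        for (TypeSource t : schema) {
            byName.put(t.name, t);
        }
        Map<String, List<String>> bound = new HashMap<>();
        for (TypeSource t : schema) {
            bound.put(t.name, List.copyOf(effectiveContent(t, byName, new HashSet<>())));
        }
        return Collections.unmodifiableMap(bound);
    }

    // Inherited children first, then the type's own. Whether this succeeds at
    // all depends on which other declarations are present in the schema.
    private static List<String> effectiveContent(TypeSource t, Map<String, TypeSource> byName,
                                                 Set<String> chain) {
        if (!chain.add(t.name)) {
            throw new IllegalStateException("circular derivation at " + t.name);
        }
        List<String> content = new ArrayList<>();
        if (t.baseName != null) {
            TypeSource base = byName.get(t.baseName);
            if (base == null) {
                throw new IllegalStateException(t.name + ": base " + t.baseName + " is not in this schema");
            }
            content.addAll(effectiveContent(base, byName, chain));
        }
        content.addAll(t.ownElements);
        return content;
    }
}

class BindingCacheDemo {
    public static void main(String[] args) {
        TypeSource derived = new TypeSource("derivedType", "baseType", List.of("b"));
        TypeSource baseV1 = new TypeSource("baseType", null, List.of("a"));
        TypeSource baseV2 = new TypeSource("baseType", null, List.of("x", "y"));
        BindingCache cache = new BindingCache();
        System.out.println(cache.bind(Set.of(baseV1, derived)).get("derivedType")); // [a, b]
        System.out.println(cache.bind(Set.of(baseV2, derived)).get("derivedType")); // [x, y, b]
        System.out.println(cache.size()); // 2
    }
}
```

Keying on the whole composition, rather than on each declaration, lets the same `derivedType` source bind differently in different schemas while repeated use of one schema still costs only a lookup.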

--------------------------------------
Noah Mendelsohn 
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------

"Michael Kay" <mike@saxonica.com>
Sent by: xmlschema-dev-request@w3.org
03/21/2007 07:10 PM

        To:     <noah_mendelsohn@us.ibm.com>
        cc:     "'Pete Cordell'" <petexmldev@tech-know-ware.com>, <xmlschema-dev@w3.org>
        Subject:        RE: Permit (greedy) conflicting wildcards



> 
> You may well have some sort of cache as an implementation 
> strategy, but it's not an abstraction that appears in the 
> schema language. 

By and large, the spec simply says that the schema is a collection of
schema components gathered from some implementation-defined source.

> There are already a number of constructs 
> that have the same closed world feel.

That's true: for example, lax validation and redefines. They're all a bit
problematic, because you can't inspect a schema document and an instance
and know whether the instance is valid without knowing something else
about the validation environment. However, I don't think there are
currently any cases where an element E that conforms to a declaration D
causes the instance to be valid when D is absent from the schema but
invalid when it is present. Intuitively, this seems a little weird. It's
likely to be particularly problematic, I think, in an XQuery scenario
where you are typically dealing with a pool of long-lived schema
information rather than with one validation episode at a time.
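The non-monotonic case can be sketched with a toy Java content model, assuming a wildcard that admits only names with no global declaration in the schema in use (in the spirit of the `notQName="##defined"` option that XSD 1.1 later adopted for `xs:any`). `ContentModel` and `accepts` are hypothetical; ordering constraints are ignored, and validating E against D is not modeled. Adding a declaration for E turns a valid instance into an invalid one.

```java
import java.util.List;
import java.util.Set;

// Hypothetical toy content model: children may appear in any order (sequence
// constraints are ignored) and each must be admitted either by an explicit
// element particle or by a "not in schema" wildcard.
final class ContentModel {
    private final Set<String> particleNames; // elements referenced directly

    ContentModel(Set<String> particleNames) {
        this.particleNames = particleNames;
    }

    // globalDecls: names of the global element declarations in the schema in use.
    boolean accepts(List<String> children, Set<String> globalDecls) {
        for (String child : children) {
            boolean byParticle = particleNames.contains(child);
            // The wildcard admits a child only if the schema does NOT declare it.
            boolean byWildcard = !globalDecls.contains(child);
            if (!byParticle && !byWildcard) {
                return false;
            }
        }
        return true;
    }
}

class NotInSchemaDemo {
    public static void main(String[] args) {
        ContentModel model = new ContentModel(Set.of("title"));
        List<String> instance = List.of("title", "E");
        // Same instance, two schemas: adding declaration D (for E) flips the outcome.
        System.out.println(model.accepts(instance, Set.of("title")));      // true
        System.out.println(model.accepts(instance, Set.of("title", "E"))); // false
    }
}
```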

> 
> While not written specifically to deal with these "action at a distance"
> mechanisms, this text makes pretty clear that the definition of
> assessment is indeed of a completely assembled schema in which all
> components are known.  Anything more incremental is a processor
> implementation strategy that must not have externally visible
> characteristics that conflict with the normative rule.

Generally, the way Saxon enforces this rule is to freeze components in the
cache once they have been used: for example, once a type has been used for
validation then it can't be redefined. So it sounds as if the
implementation strategy for the not-in-schema wildcard will be "once you
have used the fact that element E is not in the schema, E must not be
added to the schema". But I may also have to think about providing a
mechanism to remove components from the schema since their mere presence
can now cause problems.
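A sketch of that freezing policy, assuming a single mutable pool of long-lived components; `ComponentPool` and its methods are hypothetical and do not reflect Saxon's actual classes. Every fact a validation has depended on (a type used, an element found present, an element found absent) is recorded, and later redefinitions, additions, or removals that would contradict a recorded fact are rejected. Guarding removal this way is an added assumption, not something stated in the message.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical pool of long-lived schema components (not Saxon's API). It
// records which facts validation has already depended on and refuses later
// changes that would contradict them.
final class ComponentPool {
    private final Map<String, String> types = new HashMap<>();
    private final Set<String> frozenTypes = new HashSet<>();     // used for validation
    private final Set<String> elements = new HashSet<>();        // global element decls
    private final Set<String> usedElements = new HashSet<>();    // presence relied upon
    private final Set<String> reliedOnAbsence = new HashSet<>(); // absence relied upon

    void defineType(String name, String definition) {
        if (frozenTypes.contains(name)) {
            throw new IllegalStateException("type " + name + " has been used; it cannot be redefined");
        }
        types.put(name, definition);
    }

    // Fetching a type for validation freezes it.
    String typeForValidation(String name) {
        String definition = types.get(name);
        if (definition == null) {
            throw new IllegalArgumentException("no type named " + name);
        }
        frozenTypes.add(name);
        return definition;
    }

    // Asked by a not-in-schema wildcard; either answer is remembered.
    boolean isElementDeclared(String name) {
        if (elements.contains(name)) {
            usedElements.add(name);
            return true;
        }
        reliedOnAbsence.add(name);
        return false;
    }

    void addElement(String name) {
        if (reliedOnAbsence.contains(name)) {
            throw new IllegalStateException("absence of " + name + " has been relied on; it cannot be added");
        }
        elements.add(name);
    }

    // Removal is allowed only while nothing has depended on the declaration's presence.
    void removeElement(String name) {
        if (usedElements.contains(name)) {
            throw new IllegalStateException("presence of " + name + " has been relied on; it cannot be removed");
        }
        elements.remove(name);
    }
}

class ComponentPoolDemo {
    public static void main(String[] args) {
        ComponentPool pool = new ComponentPool();
        pool.defineType("T", "v1");
        System.out.println(pool.typeForValidation("T")); // v1
        System.out.println(pool.isElementDeclared("E")); // false
        try {
            pool.addElement("E");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // absence of E has been relied on; it cannot be added
        }
    }
}
```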

Michael Kay
http://www.saxonica.com/
Received on Monday, 9 April 2007 21:18:28 UTC