Re: Multiple and circular import/include from noah_mendelsohn@us.ibm.com on 2005-03-16 (xmlschema-dev@w3.org from March 2005)

From: <noah_mendelsohn@us.ibm.com>
Date: Wed, 16 Mar 2005 15:52:54 -0500
To: Kasimier Buchcik <kbuchcik@4commerce.de>
Cc: xmlschema-dev@w3.org
Message-ID: <OF0C902319.FB8CAEE7-ON85256FC6.0071A537@lotus.com>

Replying to questions in several of your notes: 

Kasimier Buchcik writes:

> Can you already make any statement whether
> component identity checks will still play a role
> in the forthcoming spec?

Everyone agrees that having some well crafted notion of identity is 
important.  For example, you probably want two different conforming 
processors to agree on how many components are created from a given 
combination of schema documents.  The intention is that the rules be as 
consistent as possible with those in Schema 1.0, at least insofar as 
Schema 1.0 was unambiguous.  How best to formulate notions of component 
identity in 1.1 is under discussion.   Several proposals are being 
actively considered.

> Wouldn't this break streaming validation? If
> streaming, such a schema to be imported is not
> known until that specific importing node is
> reached - the preceding nodes of the tree do not
> know of it. Are streaming validators expected to
> prescan the instance, resulting in parsing an
> instance document _twice_? Sounds strange to me,
> but maybe I didn't get the statement right.

No, the idea is that streaming should be practical in many cases, but I 
think you're confused about the model we use to explain it.  The way I 
believe 1.0 works, and the way I believe 1.1 should work (not everyone 
quite agrees with me on this) is that streaming is often possible, but is 
just an optimization.  You do NOT have to prescan the instance.  What you 
do need to ensure is that when you finally encounter and process an 
xsi:schemaLocation, its presence does not change any validation results 
to which you have already committed.  For example, if you've already 
claimed that an element "e" couldn't be validated due to a missing 
element declaration, you can't then start validating later "e"s with a 
newly acquired declaration.  The common case, 
however, is that you hit the xsi:schemaLocation before any elements to 
which it applies, and so no such conflict arises; you can pick up the 
element declaration and any associated types, and then act as if that 
declaration had been in the schema from the start. 

Stated differently: by the time you get to the end of your document, you 
will have incrementally assembled a schema.  It MUST be the case that if 
you were to redo the entire validation using exactly that static schema, 
you would get the identical result.  That's the sense in which streaming 
is an optimization.
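
To make that invariant concrete, here is a minimal sketch in Python.  It is 
a toy model, not anything taken from the Recommendation: the instance is a 
flat list of events, a "schema" is just a set of declared element names, 
and a "hint" event stands in for xsi:schemaLocation (which in reality 
carries namespace/location pairs, not declarations).  All function names 
are hypothetical.

```python
def validate_static(events, schema):
    """Validate every element event against one fixed schema."""
    return [
        (payload, "valid" if payload in schema else "no-declaration")
        for kind, payload in events
        if kind == "element"
    ]


def validate_streaming(events):
    """Validate in a single pass, acquiring declarations as hints appear.

    Returns the per-element results and the schema assembled by the end.
    """
    schema = set()
    results = []
    for kind, payload in events:
        if kind == "hint":
            # Stand-in for xsi:schemaLocation: new declarations become known.
            schema |= set(payload)
        elif kind == "element":
            outcome = "valid" if payload in schema else "no-declaration"
            results.append((payload, outcome))
    return results, schema


def streaming_is_consistent(events):
    """True if replaying with the final static schema gives the same result."""
    streamed, final_schema = validate_streaming(events)
    return streamed == validate_static(events, final_schema)


# Hint arrives before the elements it applies to: the common, safe case.
hint_first = [("hint", ["e"]), ("element", "e"), ("element", "e")]

# Hint arrives after an "e" was already reported as having no declaration.
hint_late = [("element", "e"), ("hint", ["e"]), ("element", "e")]
```

In this sketch streaming_is_consistent(hint_first) holds, while 
streaming_is_consistent(hint_late) does not: the naive one-pass validator 
has already committed to "no declaration" for the first "e", so silently 
using the late declaration for the second "e" gives a result that no 
single static schema would produce.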

Thus, the result is computed in a streaming way, but is the same as one 
would have gotten IF you had prescanned and found the xsi:schemaLocation 
in advance.  So, streaming and non-streaming processors must report the 
same results (except insofar as the recommendation provides latitude in 
other ways).

--------------------------------------
Noah Mendelsohn 
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------

Received on Wednesday, 16 March 2005 20:53:41 UTC