Re: Multiple and circular import/include

I think you're basically right.  Streaming is a matter of degree, and 
there are all sorts of middle ground implementation tricks.  In principle 
you could, for example, stream most of the time, but be willing to 
backtrack and recompute selected results based on later information in the 
schema.  The schema recommendation mostly doesn't tell you what order to 
do things in at all.  It says, given the pair {schema, instance} and a few 
details like a starting complex type or element decl, here's the PSVI you 
must produce.  How incrementally you discover that schema, how many false 
starts you make, etc. is up to you.  In the end, you must be able to say 
"I finally decided that the schema was S, and given that and instance I, 
the correct PSVI was produced."
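
To make that last condition concrete, here is a minimal sketch in Python. It is a toy model, not a real schema processor: the instance is assumed to be a flat list of events, a "schema hint" simply names the elements it declares, and the "PSVI" is reduced to a valid/invalid flag per element. All function names are hypothetical.

```python
# Toy model (assumption): an instance is a list of events, each either
#   ("element", name)          an element to be validated, or
#   ("schema_hint", {names})   a hint that brings declarations for those names.
# A real processor deals with full XSD components; this only shows the invariant.

def validate_static(events, declared):
    """Validate every element against one fixed, fully known schema."""
    return [(payload, payload in declared)
            for kind, payload in events if kind == "element"]

def validate_streaming(events):
    """Validate in a single pass, growing the schema as hints are met."""
    declared = set()
    results = []
    for kind, payload in events:
        if kind == "schema_hint":
            declared |= payload          # schema discovered incrementally
        else:
            results.append((payload, payload in declared))
    return results, declared

def streaming_was_sound(events):
    """True if the streamed result equals a redo against the final schema S."""
    streamed, final_schema = validate_streaming(events)
    return streamed == validate_static(events, final_schema)
```

In this model, an instance whose hint precedes the elements it covers passes the check, while one that validates an "e" before the hint declaring "e" arrives does not, because the first "e" was judged against a schema that was not the final one.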

--------------------------------------
Noah Mendelsohn 
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------


Kasimier Buchcik <kbuchcik@4commerce.de>
03/16/05 05:33 PM

 
        To:     noah_mendelsohn@us.ibm.com
        cc:     xmlschema-dev@w3.org
        Subject:        Re: Multiple and circular import/include


Hi,

OK, I think I got it now with the help of you and Henry. It boils down
to this: I have the freedom to actually say to the instance, "I cannot
validate you in streaming mode, since the result would differ from
a non-streaming validation". So only a subset of instances can be
streamed, which seems to be in accordance with the spec, although it is
not completely defined there; this might open a door for differing
implementations.
Good to hear all that! I can only recommend that schema authors
certify their instances as being streamable for all schema
processors :-)
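
A rough sketch of what "refusing to stream" could look like, in Python. This is only an illustration under simplifying assumptions: the instance is a flat list of events, a schema hint is just the set of element names it declares, and the names NotStreamable and stream_or_refuse are made up for this example.

```python
class NotStreamable(Exception):
    """Raised when a late schema hint contradicts a result already reported."""

def stream_or_refuse(events):
    """Validate in one pass; give up if a hint arrives too late.

    events: list of ("element", name) or ("schema_hint", set_of_names).
    This event model is an assumption made for illustration only.
    """
    declared = set()   # declarations known so far
    rejected = set()   # names already reported as lacking a declaration
    results = []
    for kind, payload in events:
        if kind == "schema_hint":
            conflict = rejected & payload
            if conflict:
                # Accepting this hint would change results we committed to.
                raise NotStreamable(
                    "late declaration for: " + ", ".join(sorted(conflict)))
            declared |= payload
        elif payload in declared:
            results.append((payload, "valid"))
        else:
            rejected.add(payload)
            results.append((payload, "no declaration"))
    return results
```

A processor that hits NotStreamable could then fall back to a non-streaming pass over the same instance, which is the case where the two modes would otherwise disagree.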

Thanks & regards,

Kasimier

noah_mendelsohn@us.ibm.com wrote:
> Replying to questions in several of your notes: 
> 
> Kasimier Buchcik writes:
> 
> 
>>Can you already make any statement whether
>>component identity checks will still play a role
>>in the forthcoming spec?
> 
> 
> Everyone agrees that having some well crafted notion of identity is 
> important.  For example, you probably want two different conforming 
> processors to agree on how many components are created from a given 
> combination of schema documents.  The intention is that the rules be as 
> consistent as possible with those in Schema 1.0, at least insofar as 
> Schema 1.0 was unambiguous.  How best to formulate notions of component 
> identity in 1.1 is under discussion.   Several proposals are being 
> actively considered.
> 
> 
>>Wouldn't this break streaming validation? If
>>streaming, such a schema to be imported is not
>>known until that specific importing node is
>>reached - the preceding nodes of the tree do not
>>know of it. Are streaming validators expected to
>>prescan the instance, resulting in parsing an
>>instance document _twice_? Sounds strange to me,
>>but maybe I didn't get the statement right.
> 
> 
> No, the idea is that streaming should be practical in many cases, but I 
> think you're confused about the model we use to explain it.  The way I 
> believe 1.0 works, and the way I believe 1.1 should work (not everyone 
> quite agrees with me on this) is that streaming is often possible, but is 
> just an optimization.  You do NOT have to prescan the instance.  What you 
> do need to be sure of is that, when you finally stumble on and process an 
> xsi:schemaLocation, its presence would not change any validation results 
> to which you have already committed.  For example, if you've already 
> claimed that an element "e" couldn't be validated due to a missing 
> element declaration, you can't then start validating later "e"s with a 
> newly acquired declaration.  The common case, however, is that you hit 
> the xsi:schemaLocation before any elements to which it applies, and so no 
> such conflict arises; you can pick up the element declaration and any 
> associated types, and then act as if that declaration had been in the 
> schema from the start. 
> 
> Stated differently: by the time you get to the end of your document, you 
> will have incrementally assembled a schema.  It MUST be the case that if 
> you were to redo the entire validation using exactly that static schema, 
> you would get the identical result.  That's the sense in which streaming 
> is an optimization.
> 
> Thus, the result is computed in a streaming way, but is the same as one 
> would have gotten IF you had prescanned and found the xsi:schemaLocation 
> in advance.  So, streaming and non-streaming processors must report the 
> same results (except insofar as the recommendation provides latitude in 
> other ways.)

Received on Thursday, 17 March 2005 02:40:00 UTC