- From: <noah_mendelsohn@us.ibm.com>
- Date: Wed, 16 Mar 2005 21:39:20 -0500
- To: Kasimier Buchcik <kbuchcik@4commerce.de>
- Cc: xmlschema-dev@w3.org

I think you're basically right. Streaming is a matter of degree, and there
is all sorts of middle ground in implementation tricks. In principle you
could, for example, stream most of the time, but be willing to backtrack
and recompute selected results based on later information in the schema.
The schema recommendation mostly doesn't tell you what order to do things
in at all. It says: given the pair {schema, instance} and a few details
like a starting complex type or element decl, here's the PSVI you must
produce. How incrementally you discover that schema, how many false starts
you make, etc. is up to you. In the end, you must be able to say "I finally
decided that the schema was S, and given that and instance I, the correct
PSVI was produced."

--------------------------------------
Noah Mendelsohn
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------

Kasimier Buchcik <kbuchcik@4commerce.de>
03/16/05 05:33 PM

To: noah_mendelsohn@us.ibm.com
cc: xmlschema-dev@w3.org
Subject: Re: Multiple and circular import/include

Hi,

OK, I think I got it now with the help of you and Henry. It boils down to
this: I have the freedom to actually say to the instance: "I cannot
validate you in streaming mode, since the result would differ from a
non-streaming validation". So only a subset of instances can be streamed,
which seems to be in accordance with the spec, although it is not
completely defined there; this might open a door for differing
implementations. Good to hear all that! I can only recommend that schema
authors certify their instances as being streamable for all schema
processors :-)

Thanks & regards,

Kasimier

noah_mendelsohn@us.ibm.com wrote:
> Replying to questions in several of your notes:
>
> Kasimier Buchcik writes:
>
>>Can you already make any statement whether
>>component identity checks will still play a role
>>in the forthcoming spec?
>
> Everyone agrees that having some well crafted notion of identity is
> important. For example, you probably want two different conforming
> processors to agree on how many components are created from a given
> combination of schema documents. The intention is that the rules be as
> consistent as possible with those in Schema 1.0, at least insofar as
> Schema 1.0 was unambiguous. How best to formulate notions of component
> identity in 1.1 is under discussion. Several proposals are being
> actively considered.
>
>>Wouldn't this break streaming validation? If
>>streaming, such a schema to be imported is not
>>known until that specific importing node is
>>reached - the preceding nodes of the tree do not
>>know of it. Are streaming validators expected to
>>prescan the instance, resulting in parsing an
>>instance document _twice_? Sounds strange to me,
>>but maybe I didn't get the statement right.
>
> No, the idea is that streaming should be practical in many cases, but I
> think you're confused about the model we use to explain it. The way I
> believe 1.0 works, and the way I believe 1.1 should work (not everyone
> quite agrees with me on this) is that streaming is often possible, but is
> just an optimization. You do NOT have to prescan the instance. What you
> do need to be sure of is that, when you finally stumble on and process an
> xsi:schemaLocation, its presence would not change any validation results
> to which you have already committed. For example, if you've already
> claimed that an element "e" couldn't be validated due to a missing
> element declaration, you can't then start validating later "e"s with a
> newly acquired declaration. The common case, however, is that you hit
> the xsi:schemaLocation before any elements to which it applies, and so
> no such conflict arises; you can pick up the element declaration and any
> associated types, and then act as if that declaration had been in the
> schema from the start.
>
> Stated differently: by the time you get to the end of your document, you
> will have incrementally assembled a schema. It MUST be the case that if
> you were to redo the entire validation using exactly that static schema,
> you would get the identical result. That's the sense in which streaming
> is an optimization.
>
> Thus, the result is computed in a streaming way, but is the same as one
> would have gotten IF you had prescanned and found the xsi:schemaLocation
> in advance. So, streaming and non-streaming processors must report the
> same results (except insofar as the recommendation provides latitude in
> other ways.)
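
The rule quoted above (a late xsi:schemaLocation must not change a result already committed to, and the streamed result must equal a re-run against the final static schema) can be illustrated with a toy model. This is only a sketch under heavy assumptions: a "schema" is reduced to a set of declared element names, the instance to a flat list of events, and every name here (`IncrementalSchema`, `StreamingConflict`, `stream`, `revalidate`) is hypothetical and not part of any real schema processor. Raising on conflict is just one possible policy; a processor could instead backtrack or fall back to non-streaming validation.

```python
class StreamingConflict(Exception):
    """A late schema addition would change an already committed result."""


class IncrementalSchema:
    """Toy schema: just the set of element names that have a declaration."""

    def __init__(self, declared=()):
        self.declared = set(declared)
        # Names we have already reported as "no declaration found".
        self.committed_missing = set()

    def assess(self, name):
        """Commit to a result for one element occurrence."""
        if name in self.declared:
            return "validated"
        self.committed_missing.add(name)
        return "no-declaration"

    def add_declarations(self, names):
        """Merge declarations found via a (simulated) xsi:schemaLocation."""
        conflicts = self.committed_missing & set(names)
        if conflicts:
            # Earlier occurrences were reported as undeclared; accepting the
            # declaration now would contradict those committed results.
            raise StreamingConflict(sorted(conflicts))
        self.declared |= set(names)


def stream(events, initial=()):
    """Single pass over events; returns (results, final schema)."""
    schema = IncrementalSchema(initial)
    results = []
    for kind, payload in events:
        if kind == "schemaLocation":
            schema.add_declarations(payload)
        else:  # kind == "element"
            results.append((payload, schema.assess(payload)))
    return results, frozenset(schema.declared)


def revalidate(events, final_schema):
    """Non-streaming reference: validate everything against the final schema."""
    return [
        (payload, "validated" if payload in final_schema else "no-declaration")
        for kind, payload in events
        if kind == "element"
    ]
```

In the common case, where the schema location is seen before the elements it applies to, `stream` and `revalidate` agree. If an element "e" has already been reported as undeclared and a declaration for "e" arrives later, `add_declarations` raises rather than silently producing a result that a static re-run would not reproduce.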
Received on Thursday, 17 March 2005 02:40:00 UTC