- From: Roberto Chinnici <Roberto.Chinnici@Sun.COM>
- Date: Wed, 20 Jul 2005 21:19:56 -0400
- To: "Rogers, Tony" <Tony.Rogers@ca.com>
- Cc: David Orchard <dorchard@bea.com>, www-ws-desc@w3.org
Rogers, Tony wrote:
> One of the "interesting" aspects of the problem that we must solve is
> how we decide on the interpretation of ambiguous results.
>
> For example, it will be legal to take your example:
>
> > <type name="shipto">
> >   <sequence>
> >     <element ref="ad:name" minOccurs="1" maxOccurs="unbounded"/>
> >     <element ref="nad:country" minOccurs="0"/>
> >   </sequence>
> > </type>
> >
> > (yes, I meant to change that to minOccurs)
> >
> > and feed it data like:
> >
> > <shipto>
> >   <ad:name>fred</ad:name>
> >   <nad:country>Australia</nad:country>
> >   <ad:name>bill</ad:name>
> > </shipto>
> >
> > which can legitimately be interpreted (after ignorance has been
> > applied) as:
> >
> > <shipto>
> >   <ad:name>fred</ad:name>
> >   <ad:name>bill</ad:name>
> > </shipto>
> >
> > OR
> >
> > <shipto>
> >   <ad:name>fred</ad:name>
> >   <nad:country>Australia</nad:country>
> > </shipto>
> >
> > The latter is my expected interpretation (and may well be the easier to
> program), but the former is legitimate (it takes the approach of
> grabbing as many ad:name elements as it can, and it still satisfies the
> schema).
> >
> > What do other people think?

I tend to go with the first interpretation.

Here's how I'd define the "ignore unexpected" rule. This definition is
not phrased directly in terms of XML Schema, and I don't claim that it
would be trivial to do so, quite the contrary. Nevertheless, it seems
compatible with it; if anybody thinks otherwise, please point out where
I'm wrong.

That scourge of all schema authors, the UPA rule, was introduced to make
sure the schema was deterministic. I assume then that at any given stage
during the parsing of the contents of an element, the set of start tags
that can legally be encountered is determined and each tag in that set
is associated with exactly one transition to a new state. (I believe we
can safely ignore character content for the purposes of our discussion.)

Note that the set above, or better the set of names of all start tags
that can be encountered at any given state, may be infinite due to the
presence of a wildcard. This doesn't cause any problems -- all we need
is that the characteristic function of this set be computable. Off the
top of my head, I don't think that substitution groups would be an issue
either; they just make the construction of the set more complex. Nor
would xsi:schemaLocation.

Now, the "ignore unexpected" rule is defined as saying that if at a
given state the processor encounters a start tag for an element whose
name is not in the set of expected start tags for that state, the
element is discarded. Subsequently, the processor keeps operating in the
same state it was in (where would it transition to otherwise?), as if
the discarded element had never been there.

Surely there are a few more tweaks that we need to make, like requiring
some special treatment for the root element of a document and dealing
with attributes, but I hope that the definition I proposed is clear
enough.

If we apply it to the example then, we obtain that

  <shipto>
    <ad:name>fred</ad:name>
    <nad:country>Australia</nad:country>
    <ad:name>bill</ad:name>
  </shipto>

will be treated as

  <shipto>
    <ad:name>fred</ad:name>
    <ad:name>bill</ad:name>
  </shipto>

Let's look at a slightly more interesting example. Assume the following
schema (note the maxOccurs="2"):

  <type name="shipto">
    <sequence>
      <element ref="ad:name" minOccurs="1" maxOccurs="2"/>
      <element ref="nad:country" minOccurs="0"/>
    </sequence>
  </type>

Then this document:

  <shipto>
    <ad:name>fred</ad:name>
    <nad:country>Australia</nad:country>
    <ad:name>bill</ad:name>
    <nad:country>New Zealand</nad:country>
    <ad:name>jim</ad:name>
  </shipto>

will be treated as:

  <shipto>
    <ad:name>fred</ad:name>
    <ad:name>bill</ad:name>
  </shipto>

Thanks,
Roberto
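
The "ignore unexpected" rule defined above can be illustrated with a
small state machine. What follows is only a minimal sketch under stated
assumptions, not a schema processor: the content model is a flat
sequence of (name, minOccurs, maxOccurs) particles, wildcards,
substitution groups, attributes and the root element are left out, and
the names Particle, State, expected and ignore_unexpected are
hypothetical. To reproduce the outcomes stated above, the sketch also
assumes that the receiving processor's content model contains only the
ad:name particle, so that nad:country is never in the expected set.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical types for this sketch (not from XML Schema or any library).
# A particle is (element name, minOccurs, maxOccurs); None means "unbounded".
Particle = Tuple[str, int, Optional[int]]
# A state is (index of the current particle, occurrences of it seen so far).
State = Tuple[int, int]


def expected(model: List[Particle], state: State) -> Dict[str, State]:
    """Map each legal next start tag to the single state it leads to."""
    moves: Dict[str, State] = {}
    i, seen = state
    while i < len(model):
        name, lo, hi = model[i]
        if hi is None or seen < hi:
            # One transition per name, mirroring the determinism assumption.
            moves.setdefault(name, (i, seen + 1))
        if seen < lo:
            break  # particle still required, so later particles are not legal yet
        i, seen = i + 1, 0
    return moves


def ignore_unexpected(model: List[Particle],
                      children: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Drop every child whose name is not expected in the current state."""
    state: State = (0, 0)
    kept = []
    for name, text in children:
        moves = expected(model, state)
        if name in moves:
            state = moves[name]
            kept.append((name, text))
        # Otherwise the element is discarded and the state is left unchanged.
    return kept


# Assumption: the receiver knows only ad:name (1..2 occurrences).
receiver_model = [("ad:name", 1, 2)]
doc = [("ad:name", "fred"), ("nad:country", "Australia"),
       ("ad:name", "bill"), ("nad:country", "New Zealand"),
       ("ad:name", "jim")]
print(ignore_unexpected(receiver_model, doc))
# [('ad:name', 'fred'), ('ad:name', 'bill')]
```

The result depends on which particles the processor's model contains. If
the model passed to this sketch also lists nad:country after ad:name,
the first nad:country is expected and therefore kept, and the later
ad:name elements are the ones discarded.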
Received on Thursday, 21 July 2005 01:20:49 UTC