- From: Rogers, Tony <Tony.Rogers@ca.com>
- Date: Thu, 21 Jul 2005 22:08:16 +1000
- To: "Roberto Chinnici" <Roberto.Chinnici@Sun.COM>
- Cc: "David Orchard" <dorchard@bea.com>, <www-ws-desc@w3.org>
- Message-ID: <7997F38251504E43B38435DAF917887F40C5FD@ausyms23.ca.com>
I don't think that follows. I do like the way you phrase the argument, but I come to a different result :-)

The processor accepts the first name. At that point it is looking either for another name, or for a country, or for the end of the shipto. The next tag is a country, so it accepts the country and makes a transition. Then it is looking for another country, or the end of the shipto. It does not return to a state in which it will accept another name, so it will ignore any more names.

Additionally, I think it should treat the appearance of a name after a country as a transition to a new state - one in which it is simply looking for the end of the shipto. This is because a list of elements cannot be interrupted by another element (I wasn't thinking clearly last night :-) ).

So I contend that the processor should accept exactly one name and one country.

Note that the new schema might accept a list of alternating names and countries (that is describable, isn't it?), but our schema does not. We must keep in mind that the processor is being expected to handle data that conforms to a larger, but compatible, schema - we can expect that the larger schema still abides by the rules of schema.

As I said, I do like the way you phrased this. I think we can use it as the basis of an algorithm that can describe what surgery is required on the incoming data to make it acceptable. And it's deterministic - that's good.

Tony

-----Original Message-----
From: www-ws-desc-request@w3.org on behalf of Roberto Chinnici
Sent: Thu 21-Jul-05 11:19
To: Rogers, Tony
Cc: David Orchard; www-ws-desc@w3.org
Subject: Re: LC124: Comment on V2S and [validity]=notKnown

Rogers, Tony wrote:
> One of the "interesting" aspects of the problem that we must solve is
> how we decide on the interpretation of ambiguous results.
>
> For example, it will be legal to take your example:
>
> <type name="shipto">
>   <sequence>
>     <element ref="ad:name" minOccurs="1" maxOccurs="unbounded"/>
>     <element ref="nad:country" minOccurs="0"/>
>   </sequence>
> </type>
>
> (yes, I meant to change that to minOccurs)
>
> and feed it data like:
>
> <shipto>
>   <ad:name>fred</ad:name>
>   <nad:country>Australia</nad:country>
>   <ad:name>bill</ad:name>
> </shipto>
>
> which can legitimately be interpreted (after ignorance has been applied) as:
>
> <shipto>
>   <ad:name>fred</ad:name>
>   <ad:name>bill</ad:name>
> </shipto>
>
> OR
>
> <shipto>
>   <ad:name>fred</ad:name>
>   <nad:country>Australia</nad:country>
> </shipto>
>
> The latter is my expected interpretation (and may well be the easier to
> program), but the former is legitimate (it takes the approach of
> grabbing as many ad:name elements as it can, and it still satisfies the
> schema).
>
> What do other people think?

I tend to go with the first interpretation.

Here's how I'd define the "ignore unexpected" rule. This definition is not phrased directly in terms of XML Schema, and I don't claim that it would be trivial to do so, quite the contrary. Nevertheless, it seems compatible with it; if anybody thinks otherwise, please point out where I'm wrong.

That scourge of all schema authors, the UPA rule, was introduced to make sure the schema was deterministic. I assume then that at any given stage during the parsing of the contents of an element, the set of start tags that can legally be encountered is determined and each tag in that set is associated with exactly one transition to a new state. (I believe we can safely ignore character content for the purposes of our discussion.)

Note that the set above, or better the set of names of all start tags that can be encountered at any given state, may be infinite due to the presence of a wildcard.
This doesn't cause any problems -- all we need is that the characteristic function of this set be computable. Off the top of my head, I don't think that substitution groups would be an issue either (they just make the construction of the set more complex), nor would xsi:schemaLocation.

Now, the "ignore unexpected" rule is defined as saying that if at a given state the processor encounters a start tag for an element whose name is not in the set of expected start tags for that state, the element is discarded. Subsequently, the processor keeps operating in the same state it was in (where would it transition to otherwise?), as if the discarded element had never been there.

Surely there are a few more tweaks that we need to make, like requiring some special treatment for the root element of a document and dealing with attributes, but I hope that the definition I proposed is clear enough.

If we apply it to the example then, we obtain that

<shipto>
  <ad:name>fred</ad:name>
  <nad:country>Australia</nad:country>
  <ad:name>bill</ad:name>
</shipto>

will be treated as

<shipto>
  <ad:name>fred</ad:name>
  <ad:name>bill</ad:name>
</shipto>

Let's look at a slightly more interesting example. Assume the following schema (note the maxOccurs="2"):

<type name="shipto">
  <sequence>
    <element ref="ad:name" minOccurs="1" maxOccurs="2"/>
    <element ref="nad:country" minOccurs="0"/>
  </sequence>
</type>

Then this document:

<shipto>
  <ad:name>fred</ad:name>
  <nad:country>Australia</nad:country>
  <ad:name>bill</ad:name>
  <nad:country>New Zealand</nad:country>
  <ad:name>jim</ad:name>
</shipto>

will be treated as:

<shipto>
  <ad:name>fred</ad:name>
  <ad:name>bill</ad:name>
</shipto>

Thanks,
Roberto
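The "ignore unexpected" rule described above can be read as a small state machine over a sequence content model. What follows is a minimal Python sketch of that reading, not code from either author. It assumes a flat sequence of element particles only (no wildcards, substitution groups, attributes, or nesting), and the names `Particle`, `expected`, and `ignore_unexpected`, as well as the tuple encoding of the schema, are hypothetical choices made for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical encoding: (element name, minOccurs, maxOccurs); None = unbounded.
Particle = Tuple[str, int, Optional[int]]
Element = Tuple[str, str]  # (element name, text content)


def expected(model: List[Particle], i: int, count: int) -> Dict[str, int]:
    """Map each start tag that is legal in state (i, count) to the particle it enters.

    The state is "count occurrences of particle i have been accepted so far".
    """
    result: Dict[str, int] = {}
    if i >= len(model):
        return result
    name, lo, hi = model[i]
    if hi is None or count < hi:
        result[name] = i          # another occurrence of the current particle
    if count < lo:
        return result             # current particle not yet satisfied
    for j in range(i + 1, len(model)):
        next_name, next_lo, _ = model[j]
        result.setdefault(next_name, j)
        if next_lo > 0:
            break                 # a required particle cannot be skipped
    return result


def ignore_unexpected(model: List[Particle], children: List[Element]) -> List[Element]:
    """Drop every child whose name is not expected; the state does not change on a drop."""
    i, count = 0, 0
    kept: List[Element] = []
    for name, text in children:
        target = expected(model, i, count).get(name)
        if target is None:
            continue              # discarded, as if it had never been there
        count = count + 1 if target == i else 1
        i = target
        kept.append((name, text))
    return kept


doc = [("ad:name", "fred"), ("nad:country", "Australia"), ("ad:name", "bill")]

# The schema exactly as written in the message.
with_country = [("ad:name", 1, None), ("nad:country", 0, 1)]
# Assumption for comparison only: a processor whose schema has no nad:country.
names_only = [("ad:name", 1, None)]

print(ignore_unexpected(with_country, doc))  # fred, Australia
print(ignore_unexpected(names_only, doc))    # fred, bill
```

Under these assumptions, the content model as written keeps fred and Australia, which is the result argued for in the reply at the top of this message. The fred and bill result listed in the quoted message comes out only when nad:country is absent from the set of expected tags. Whether the processor's schema is meant to include nad:country is not settled by the messages themselves, so both cases are shown.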
Received on Thursday, 21 July 2005 12:08:22 UTC