Re: LC124: Comment on V2S and [validity]=notKnown from Roberto Chinnici on 2005-07-21 (www-ws-desc@w3.org from July 2005)

From: Roberto Chinnici <Roberto.Chinnici@Sun.COM>
Date: Thu, 21 Jul 2005 09:09:25 -0400
To: "Rogers, Tony" <Tony.Rogers@ca.com>
Cc: David Orchard <dorchard@bea.com>, www-ws-desc@w3.org
Message-id: <42DF9E85.2040406@sun.com>
Yes, actually after rereading the emails this morning I agree with
you and I find my application of the algorithm I described faulty.

Roberto

Rogers, Tony wrote:
> I don't think that follows. I do like the way you phrase the argument, 
> but I come to a different result :-)
>  
> The processor accepts the first name. At that point it is looking either 
> for another name, or for a country, or for the end of the shipto. The 
> next tag is a country. So it accepts the country, and makes a 
> transition. Then it is looking for another country, or the end of the 
> shipto. It does not return to a state in which it will accept another 
> name, so it will ignore any more names.
>  
> Additionally, I think it should treat the appearance of a name after a 
> country as a transition to a new state - one in which it is simply 
> looking for the end of the shipto. This is because a list of elements 
> cannot be interrupted by another element (I wasn't thinking clearly last 
> night :-) ). So I contend that the processor should accept exactly one 
> name and one country. Note that the new schema might accept a list of 
> alternating names and countries (that is describable, isn't it?), but 
> our schema does not.
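
To make the walk-through above concrete, here is a minimal sketch of it as a hand-written state machine. The state names and the `filter_shipto` helper are invented for illustration, and elements are reduced to (tag, text) pairs. It assumes nad:country keeps the schema default of maxOccurs="1", so once a country has been accepted only the end of the shipto is expected.

```python
# Hypothetical two-state walk over the children of <shipto>.
# Each state maps an expected start tag to the state reached by accepting it.
TRANSITIONS = {
    "start": {"ad:name": "names"},
    "names": {"ad:name": "names", "nad:country": "after_country"},
    "after_country": {},  # only the end of <shipto> is expected here
}


def filter_shipto(children):
    """Return the children that are accepted; unexpected ones are dropped."""
    state = "start"
    accepted = []
    for tag, text in children:
        next_state = TRANSITIONS[state].get(tag)
        if next_state is None:
            continue  # unexpected element: ignore it, stay in the same state
        accepted.append((tag, text))
        state = next_state
    return accepted


doc = [("ad:name", "fred"), ("nad:country", "Australia"), ("ad:name", "bill")]
print(filter_shipto(doc))
```

For the fred / Australia / bill document this keeps fred and Australia and drops bill, i.e. exactly one name and one country.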
>  
> We must keep in mind that the processor is being expected to handle data 
> that conforms to a larger, but compatible, schema - we can expect that 
> that larger schema still abides by the rules of schema.
>  
> As I said, I do like the way you phrased this. I think we can use it as 
> the basis of an algorithm that can describe what surgery is required on 
> the incoming data to make it acceptable. And it's deterministic - 
> that's good. 
>  
> Tony
> 
>     -----Original Message-----
>     *From:* www-ws-desc-request@w3.org on behalf of Roberto Chinnici
>     *Sent:* Thu 21-Jul-05 11:19
>     *To:* Rogers, Tony
>     *Cc:* David Orchard; www-ws-desc@w3.org
>     *Subject:* Re: LC124: Comment on V2S and [validity]=notKnown
> 
> 
>     Rogers, Tony wrote:
>      > One of the "interesting" aspects of the problem that we must solve
>      > is how we decide on the interpretation of ambiguous results.
>      > 
>      > For example, it will be legal to take your example:
>      > 
>      >
>      > <type name="shipto">
>      > <sequence>
>      > <element ref="ad:name" minOccurs="1" maxOccurs="unbounded"/>
>      > <element ref="nad:country" minOccurs="0"/>
>      > </sequence>
>      > </type>
>      >
>      > (yes, I meant to change that to minOccurs)
>      >
>      > and feed it data like:
>      >
>      > <shipto>
>      > <ad:name>fred</ad:name>
>      > <nad:country>Australia</nad:country>
>      > <ad:name>bill</ad:name>
>      > </shipto>
>      >
>      > which can legitimately be interpreted (after ignorance has been
>      > applied) as:
>      >
>      > <shipto>
>      > <ad:name>fred</ad:name>
>      > <ad:name>bill</ad:name>
>      > </shipto>
>      >
>      > OR
>      >
>      > <shipto>
>      > <ad:name>fred</ad:name>
>      > <nad:country>Australia</nad:country>
>      > </shipto>
>      >
>      > The latter is my expected interpretation (and may well be the easier
>      > to program), but the former is legitimate (it takes the approach of
>      > grabbing as many ad:name elements as it can, and it still satisfies
>      > the schema).
>      >
>      > What do other people think?
> 
>     I tend to go with the first interpretation.
> 
>     Here's how I'd define the "ignore unexpected" rule. This definition is
>     not phrased directly in terms of XML Schema, and I don't claim that it
>     would be trivial to do so, quite the contrary. Nevertheless, it seems
>     compatible with it; if anybody thinks otherwise, please point out where
>     I'm wrong.
> 
>     That scourge of all schema authors, the UPA rule, was introduced to
>     make sure the schema was deterministic. I assume then that at any
>     given stage during the parsing of the contents of an element, the
>     set of start tags that can legally be encountered is determined
>     and each tag in that set is associated with exactly one transition
>     to a new state. (I believe we can safely ignore character content
>     for the purposes of our discussion.)
> 
>     Note that the set above, or better the set of names of all start tags
>     that can be encountered at any given state, may be infinite due to
>     the presence of a wildcard. This doesn't cause any problems -- all
>     we need is that the characteristic function of this set be computable.
>     Off the top of my head, I don't think that substitution groups would
>     be an issue either; they just make the construction of the set more
>     complex, nor would xsi:schemaLocation.
> 
>     Now, the "ignore unexpected" rule is defined as saying that if at
>     a given state the processor encounters a start tag for an element
>     whose name is not in the set of expected start tags for that state,
>     the element is discarded. Subsequently, the processor keeps
>     operating in the same state it was in (where would it transition
>     to otherwise?), as if the discarded element had never been there.
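
As a rough sketch of the rule just stated, and not a claim about how any real schema processor is built: each state lists (predicate, next state) pairs, where the predicate plays the role of the characteristic function of the expected set, so a wildcard can stand for an infinite set of names. The `ignore_unexpected` function, the state names, and the toy content model are all hypothetical, and determinism is simply assumed (at most one predicate matches a given tag).

```python
from typing import Callable, Dict, List, Tuple

# (does this tag belong to the expected set?, state reached by accepting it)
Rule = Tuple[Callable[[str], bool], str]


def ignore_unexpected(rules: Dict[str, List[Rule]], start: str,
                      tags: List[str]) -> List[str]:
    """Keep the expected start tags and discard the rest."""
    state = start
    kept = []
    for tag in tags:
        # Determinism is assumed: at most one rule matches a tag.
        target = next((nxt for matches, nxt in rules[state] if matches(tag)),
                      None)
        if target is None:
            continue  # discard; keep operating in the same state
        kept.append(tag)
        state = target
    return kept


# Hypothetical content model: one "a:head", then any number of elements
# whose name starts with "x:" (standing in for a namespace wildcard).
rules = {
    "s0": [(lambda t: t == "a:head", "s1")],
    "s1": [(lambda t: t.startswith("x:"), "s1")],
}
print(ignore_unexpected(rules, "s0",
                        ["b:junk", "a:head", "x:one", "a:head", "x:two"]))
```

Here "b:junk" and the second "a:head" are discarded without changing state, and the other three tags are kept.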
> 
>     Surely there are a few more tweaks that we need to make, like requiring
>     some special treatment for the root element of a document and
>     dealing with attributes, but I hope that the definition I proposed
>     is clear enough.
> 
>     If we apply it to the example then, we obtain that
> 
>     <shipto>
>        <ad:name>fred</ad:name>
>        <nad:country>Australia</nad:country>
>        <ad:name>bill</ad:name>
>     </shipto>
> 
>     will be treated as
> 
>     <shipto>
>        <ad:name>fred</ad:name>
>        <ad:name>bill</ad:name>
>     </shipto>
> 
>     Let's look at a slightly more interesting example.
>     Assume the following schema: (note the maxOccurs="2")
> 
>     <type name="shipto">
>        <sequence>
>          <element ref="ad:name" minOccurs="1" maxOccurs="2"/>
>          <element ref="nad:country" minOccurs="0"/>
>        </sequence>
>     </type>
> 
>     Then this document:
> 
>     <shipto>
>        <ad:name>fred</ad:name>
>        <nad:country>Australia</nad:country>
>        <ad:name>bill</ad:name>
>        <nad:country>New Zealand</nad:country>
>        <ad:name>jim</ad:name>
>     </shipto>
> 
>     will be treated as:
> 
>     <shipto>
>        <ad:name>fred</ad:name>
>        <ad:name>bill</ad:name>
>     </shipto>
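
Tracing the rule mechanically on both example documents is a useful check. The sketch below is only an approximation under stated assumptions: a content model is a flat sequence of (tag, minOccurs, maxOccurs) particles with distinct tags, `None` stands for "unbounded", and the helper names `expected` and `apply_rule` are made up. It does not check that required particles were satisfied at the end.

```python
def expected(particles, state):
    """Map each start tag that is legal in `state` to its successor state."""
    i, count = state
    tag, lo, hi = particles[i]
    moves = {}
    if hi is None or count < hi:
        moves[tag] = (i, count + 1)  # another occurrence of this particle
    if count >= lo:
        # Later particles are reachable while every skipped one is optional.
        for j in range(i + 1, len(particles)):
            later_tag, later_lo, _ = particles[j]
            moves.setdefault(later_tag, (j, 1))
            if later_lo > 0:
                break
    return moves


def apply_rule(particles, children):
    """Apply "ignore unexpected" to a list of (tag, text) children."""
    state = (0, 0)
    kept = []
    for tag, text in children:
        nxt = expected(particles, state).get(tag)
        if nxt is None:
            continue  # not in the expected set: discard, same state
        kept.append((tag, text))
        state = nxt
    return kept


unbounded = [("ad:name", 1, None), ("nad:country", 0, 1)]
at_most_two = [("ad:name", 1, 2), ("nad:country", 0, 1)]
doc1 = [("ad:name", "fred"), ("nad:country", "Australia"),
        ("ad:name", "bill")]
doc2 = doc1 + [("nad:country", "New Zealand"), ("ad:name", "jim")]
print(apply_rule(unbounded, doc1))
print(apply_rule(at_most_two, doc2))
```

Under these assumptions nad:country is in the expected set once one ad:name has been accepted, so it is accepted rather than discarded, and both runs keep fred and Australia only. That agrees with the correction at the top of this message rather than with the results given just above.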
> 
>     Thanks,
>     Roberto
Received on Thursday, 21 July 2005 13:10:08 UTC