Re: LC124: Comment on V2S and [validity]=notKnown from Roberto Chinnici on 2005-07-21 (www-ws-desc@w3.org from July 2005)

From: Roberto Chinnici <Roberto.Chinnici@Sun.COM>
Date: Wed, 20 Jul 2005 21:19:56 -0400
To: "Rogers, Tony" <Tony.Rogers@ca.com>
Cc: David Orchard <dorchard@bea.com>, www-ws-desc@w3.org
Message-id: <42DEF83C.6060407@sun.com>

Rogers, Tony wrote:
> One of the "interesting" aspects of the problem that we must solve is
> how we decide on the interpretation of ambiguous results.
>
> For example, it will be legal to take your example:
>
> <type name="shipto">
>    <sequence>
>      <element ref="ad:name" minOccurs="1" maxOccurs="unbounded"/>
>      <element ref="nad:country" minOccurs="0"/>
>    </sequence>
> </type>
>
> (yes, I meant to change that to minOccurs)
>
> and feed it data like:
>
> <shipto>
>    <ad:name>fred</ad:name>
>    <nad:country>Australia</nad:country>
>    <ad:name>bill</ad:name>
> </shipto>
>
> which can legitimately be interpreted (after ignorance has been applied) as:
>
> <shipto>
>    <ad:name>fred</ad:name>
>    <ad:name>bill</ad:name>
> </shipto>
>
> OR
>
> <shipto>
>    <ad:name>fred</ad:name>
>    <nad:country>Australia</nad:country>
> </shipto>
>
> The latter is my expected interpretation (and may well be the easier to 
> program), but the former is legitimate (it takes the approach of 
> grabbing as many ad:name elements as it can, and it still satisfies the 
> schema).
>
> What do other people think?

I tend to go with the first interpretation.

Here's how I'd define the "ignore unexpected" rule. This definition is
not phrased directly in terms of XML Schema, and I don't claim that it
would be trivial to do so, quite the contrary. Nevertheless, it seems
compatible with it; if anybody thinks otherwise, please point out where
I'm wrong.

That scourge of all schema authors, the UPA (Unique Particle
Attribution) rule, was introduced to make sure the schema was
deterministic. I assume then that at any given stage during the
parsing of the contents of an element, the set of start tags that
can legally be encountered is determined and each tag in that set
is associated with exactly one transition to a new state. (I believe
we can safely ignore character content for the purposes of our
discussion.)

Note that the set above, or better the set of names of all start tags
that can be encountered at any given state, may be infinite due to
the presence of a wildcard. This doesn't cause any problems -- all
we need is that the characteristic function of this set be computable.
Off the top of my head, I don't think that substitution groups would
be an issue either; they just make the construction of the set more
complex. Nor would xsi:schemaLocation.
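
To make the "computable characteristic function" point concrete, here
is a minimal Python sketch. It represents an expected set as a predicate
over (namespace, local name) pairs, so a wildcard costs nothing even
though the set it denotes is infinite. The helper names and the
"urn:example:ad" namespace are made up for illustration, and the
wildcard is only loosely modeled on namespace="##other".

```python
from typing import Callable, Tuple

QName = Tuple[str, str]            # (namespace URI, local name)
Expected = Callable[[QName], bool]  # characteristic function of a set of names


def names(*allowed: QName) -> Expected:
    """Finite set: membership is a plain lookup."""
    allowed_set = frozenset(allowed)
    return lambda q: q in allowed_set


def other_namespace(target_ns: str) -> Expected:
    """Infinite set: any qualified name outside target_ns (wildcard-like)."""
    return lambda q: q[0] != "" and q[0] != target_ns


def union(*parts: Expected) -> Expected:
    """A name is expected if any of the component sets contains it."""
    return lambda q: any(p(q) for p in parts)


AD = "urn:example:ad"  # hypothetical namespace URI
expected_here = union(names((AD, "name")), other_namespace(AD))

print(expected_here((AD, "name")))                   # declared element
print(expected_here(("urn:example:ext", "whatever")))  # matched by the wildcard
print(expected_here((AD, "country")))                # neither: would be "unexpected"
```

Substitution groups would only add more entries to the finite part; the
membership test itself stays the same shape.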

Now, the "ignore unexpected" rule is defined as saying that if at
a given state the processor encounters a start tag for an element
whose name is not in the set of expected start tags for that state,
the element is discarded. Subsequently, the processor keeps
operating in the same state it was in (where would it transition
to otherwise?), as if the discarded element had never been there.

Surely there are a few more tweaks that we need to do, like requiring
some special treatment for the root element of a document and
dealing with attributes, but I hope that the definition I proposed
is clear enough.
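
As a rough sketch of the rule as stated, the following Python models the
content model as a deterministic transition table and walks the child
element names in document order. The state names, the table layout and
the element names "a", "b" and "x" are all invented for illustration;
a real processor would discard the whole subtree of an unexpected
element, which the sketch approximates by working on names only.

```python
from typing import Dict, Iterable, List, Tuple

# state -> {expected element name -> next state}; deterministic by construction
Transitions = Dict[str, Dict[str, str]]


def ignore_unexpected(
    transitions: Transitions, start: str, children: Iterable[str]
) -> Tuple[List[str], str]:
    """Return the children that survive, plus the state reached at the end."""
    state = start
    kept: List[str] = []
    for name in children:
        expected = transitions.get(state, {})
        if name in expected:
            kept.append(name)
            state = expected[name]   # exactly one transition per expected name
        # otherwise: discard the element and stay in the same state
    return kept, state


# Hypothetical content model: one or two "a" elements, then an optional "b".
MODEL: Transitions = {
    "start": {"a": "one_a"},
    "one_a": {"a": "two_a", "b": "done"},
    "two_a": {"b": "done"},
    "done": {},
}

kept, final_state = ignore_unexpected(MODEL, "start", ["a", "x", "a", "b", "a"])
print(kept, final_state)
```

Whether the final state is an accepting one still has to be checked
separately; the rule only says what to do with elements that are not
expected where they appear.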

If we apply it to the example then, we obtain that

<shipto>
   <ad:name>fred</ad:name>
   <nad:country>Australia</nad:country>
   <ad:name>bill</ad:name>
</shipto>

will be treated as

<shipto>
   <ad:name>fred</ad:name>
   <ad:name>bill</ad:name>
</shipto>

Let's look at a slightly more interesting example.
Assume the following schema: (note the maxOccurs="2")

<type name="shipto">
   <sequence>
     <element ref="ad:name" minOccurs="1" maxOccurs="2"/>
     <element ref="nad:country" minOccurs="0"/>
   </sequence>
</type>

Then this document:

<shipto>
   <ad:name>fred</ad:name>
   <nad:country>Australia</nad:country>
   <ad:name>bill</ad:name>
   <nad:country>New Zealand</nad:country>
   <ad:name>jim</ad:name>
</shipto>

will be treated as:

<shipto>
   <ad:name>fred</ad:name>
   <ad:name>bill</ad:name>
</shipto>

Thanks,
Roberto
Received on Thursday, 21 July 2005 01:20:49 UTC