RE: LC124: Comment on V2S and [validity]=notKnown from Rogers, Tony on 2005-07-21 (www-ws-desc@w3.org from July 2005)

From: Rogers, Tony <Tony.Rogers@ca.com>
Date: Thu, 21 Jul 2005 22:08:16 +1000
To: "Roberto Chinnici" <Roberto.Chinnici@Sun.COM>
Cc: "David Orchard" <dorchard@bea.com>, <www-ws-desc@w3.org>
Message-ID: <7997F38251504E43B38435DAF917887F40C5FD@ausyms23.ca.com>
I don't think that follows. I do like the way you phrase the argument, but I come to a different result :-)
 
The processor accepts the first name. At that point it is looking either for another name, or for a country, or for the end of the shipto. The next tag is a country. So it accepts the country, and makes a transition. Then it is looking for another country, or the end of the shipto. It does not return to a state in which it will accept another name, so it will ignore any more names. 
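
That walk-through can be sketched in Python as follows. This is only an illustration of the "ignore unexpected" rule quoted below, under stated assumptions: the content model is a flat sequence of element particles, a missing maxOccurs defaults to 1 (as in XML Schema, so after the country only the end of the shipto remains), and the names `Particle`, `expected` and `ignore_unexpected` are invented here, not part of any validator API. It does not check that required elements were actually present at the end.

```python
from typing import Dict, List, Optional, Tuple

# (element name, minOccurs, maxOccurs); maxOccurs=None stands for "unbounded".
Particle = Tuple[str, int, Optional[int]]
Element = Tuple[str, str]  # (element name, text content)


def expected(schema: List[Particle], idx: int, count: int) -> Dict[str, int]:
    """Map each start tag acceptable in state (idx, count) to the particle it matches."""
    targets: Dict[str, int] = {}
    i, c = idx, count
    while i < len(schema):
        name, lo, hi = schema[i]
        if hi is None or c < hi:
            targets.setdefault(name, i)
        if c < lo:
            break  # this particle is still required, so nothing after it is reachable
        i, c = i + 1, 0
    return targets


def ignore_unexpected(schema: List[Particle], doc: List[Element]) -> List[Element]:
    """Discard every element that is not expected; the state is unchanged by a discard."""
    idx, count = 0, 0
    kept: List[Element] = []
    for name, text in doc:
        targets = expected(schema, idx, count)
        if name not in targets:
            continue  # unexpected: drop it and stay in the same state
        target = targets[name]
        count = count + 1 if target == idx else 1
        idx = target
        kept.append((name, text))
    return kept


shipto = [("ad:name", 1, None), ("nad:country", 0, 1)]
doc = [("ad:name", "fred"), ("nad:country", "Australia"), ("ad:name", "bill")]
print(ignore_unexpected(shipto, doc))
# [('ad:name', 'fred'), ('nad:country', 'Australia')]
```

Under these assumptions the country is an expected tag after the first name, so it is accepted, and "bill" is the element that gets discarded.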
 
Additionally, I think it should treat the appearance of a name after a country as a transition to a new state - one in which it is simply looking for the end of the shipto. This is because a list of elements cannot be interrupted by another element (I wasn't thinking clearly last night :-) ). So I contend that the processor should accept exactly one name and one country. Note that the new schema might accept a list of alternating names and countries (that is describable, isn't it?), but our schema does not.
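
A rough Python sketch of that extra transition, again under assumptions: the content model is a flat sequence of particles, and "a name after a country" is generalised (my reading, not something stated above) to "any element that belongs to a particle the processor has already moved past". The helper names `find_target` and `repair` and the `close_on_backtrack` flag are hypothetical. The function also returns the discarded elements, which is one way to report the surgery performed on the incoming data.

```python
from typing import List, Optional, Tuple

Particle = Tuple[str, int, Optional[int]]  # (name, minOccurs, maxOccurs or None)
Element = Tuple[str, str]                  # (name, text content)


def find_target(schema: List[Particle], idx: int, count: int, name: str) -> Optional[int]:
    """Index of the particle that accepts `name` in state (idx, count), else None."""
    i, c = idx, count
    while i < len(schema):
        pname, lo, hi = schema[i]
        if pname == name and (hi is None or c < hi):
            return i
        if c < lo:
            return None  # a required particle blocks everything after it
        i, c = i + 1, 0
    return None


def repair(schema: List[Particle], doc: List[Element],
           close_on_backtrack: bool) -> Tuple[List[Element], List[Element]]:
    """Return (kept, dropped). With close_on_backtrack, an element from an
    already-passed particle moves the processor to an end-only state."""
    idx, count, closed = 0, 0, False
    kept: List[Element] = []
    dropped: List[Element] = []
    for name, text in doc:
        target = None if closed else find_target(schema, idx, count, name)
        if target is None:
            dropped.append((name, text))
            # Assumption: the list was interrupted, so only the end tag is awaited now.
            if close_on_backtrack and any(p[0] == name for p in schema[:idx]):
                closed = True
            continue
        count = count + 1 if target == idx else 1
        idx = target
        kept.append((name, text))
    return kept, dropped


# Hypothetical schema (not the one in this thread): country is unbounded here,
# purely to show where the two behaviours differ.
schema = [("ad:name", 1, None), ("nad:country", 0, None)]
doc = [("ad:name", "fred"), ("nad:country", "Australia"),
       ("ad:name", "bill"), ("nad:country", "New Zealand")]
print(repair(schema, doc, close_on_backtrack=False)[0])
# [('ad:name', 'fred'), ('nad:country', 'Australia'), ('nad:country', 'New Zealand')]
print(repair(schema, doc, close_on_backtrack=True)[0])
# [('ad:name', 'fred'), ('nad:country', 'Australia')]
```

For the schema actually under discussion (country with the default maxOccurs of 1) both settings give the same answer: one name and one country.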
 
We must keep in mind that the processor is being expected to handle data that conforms to a larger, but compatible, schema - we can expect that that larger schema still abides by the rules of schema.
 
As I said, I do like the way you phrased this. I think we can use it as the basis of an algorithm that can describe what surgery is required on the incoming data to make it acceptable. And it's deterministic - that's good. 
 
Tony

	-----Original Message----- 
	From: www-ws-desc-request@w3.org on behalf of Roberto Chinnici 
	Sent: Thu 21-Jul-05 11:19 
	To: Rogers, Tony 
	Cc: David Orchard; www-ws-desc@w3.org 
	Subject: Re: LC124: Comment on V2S and [validity]=notKnown
	
	


	Rogers, Tony wrote:
	> One of the "interesting" aspects of the problem that we must solve is
	> how we decide on the interpretation of ambiguous results.
	> 
	> For example, it will be legal to take your example:
	> 
	> <type name="shipto">
	>   <sequence>
	>     <element ref="ad:name" minOccurs="1" maxOccurs="unbounded"/>
	>     <element ref="nad:country" minOccurs="0"/>
	>   </sequence>
	> </type>
	>
	> (yes, I meant to change that to minOccurs)
	>
	> and feed it data like:
	>
	> <shipto>
	>   <ad:name>fred</ad:name>
	>   <nad:country>Australia</nad:country>
	>   <ad:name>bill</ad:name>
	> </shipto>
	>
	> which can legitimately be interpreted (after ignorance has been applied) as:
	>
	> <shipto>
	>   <ad:name>fred</ad:name>
	>   <ad:name>bill</ad:name>
	> </shipto>
	>
	> OR
	>
	> <shipto>
	>   <ad:name>fred</ad:name>
	>   <nad:country>Australia</nad:country>
	> </shipto>
	>
	> The latter is my expected interpretation (and may well be the easier to
	> program), but the former is legitimate (it takes the approach of
	> grabbing as many ad:name elements as it can, and it still satisfies the
	> schema).
	>
	> What do other people think?
	
	I tend to go with the first interpretation.
	
	Here's how I'd define the "ignore unexpected" rule. This definition is
	not phrased directly in terms of XML Schema, and I don't claim that it
	would be trivial to do so, quite the contrary. Nevertheless, it seems
	compatible with it; if anybody thinks otherwise, please point out where
	I'm wrong.
	
	That scourge of all schema authors, the UPA rule, was introduced to
	make sure the schema was deterministic. I assume then that at any
	given stage during the parsing of the contents of an element, the
	set of start tags that can legally be encountered is determined
	and each tag in that set is associated with exactly one transition
	to a new state. (I believe we can safely ignore character content
	for the purposes of our discussion.)
	
	Note that the set above, or better the set of names of all start tags
	that can be encountered at any given state, may be infinite due to
	the presence of a wildcard. This doesn't cause any problems -- all
	we need is that the characteristic function of this set be computable.
	Off the top of my head, I don't think that substitution groups would
	be an issue either, they just make the construction of the set more
	complex, nor would xsi:schemaLocation.
	
	Now, the "ignore unexpected" rule is defined as saying that if at
	a given state the processor encounters a start tag for an element
	whose name is not in the set of expected start tags for that state,
	the element is discarded. Subsequently, the processor keeps
	operating in the same state it was in (where would it transition
	to otherwise?), as if the discarded element had never been there.
	
	Surely there are a few more tweaks that we need to make, like requiring
	some special treatment for the root element of a document and
	dealing with attributes, but I hope that the definition I proposed
	is clear enough.
	
	If we apply it to the example then, we obtain that
	
	<shipto>
	   <ad:name>fred</ad:name>
	   <nad:country>Australia</nad:country>
	   <ad:name>bill</ad:name>
	</shipto>
	
	will be treated as
	
	<shipto>
	   <ad:name>fred</ad:name>
	   <ad:name>bill</ad:name>
	</shipto>
	
	Let's look at a slightly more interesting example.
	Assume the following schema: (note the maxOccurs="2")
	
	<type name="shipto">
	   <sequence>
	     <element ref="ad:name" minOccurs="1" maxOccurs="2"/>
	     <element ref="nad:country" minOccurs="0"/>
	   </sequence>
	</type>
	
	Then this document:
	
	<shipto>
	   <ad:name>fred</ad:name>
	   <nad:country>Australia</nad:country>
	   <ad:name>bill</ad:name>
	   <nad:country>New Zealand</nad:country>
	   <ad:name>jim</ad:name>
	</shipto>
	
	will be treated as:
	
	<shipto>
	   <ad:name>fred</ad:name>
	   <ad:name>bill</ad:name>
	</shipto>
	
	Thanks,
	Roberto
Received on Thursday, 21 July 2005 12:08:22 UTC