Re: UPA — (Why) Is There a Difference Between Those Two? from Michael Kay on 2011-06-24 (xmlschema-dev@w3.org from June 2011)

From: Michael Kay <mike@saxonica.com>
Date: Fri, 24 Jun 2011 08:42:39 +0100
To: Denis Zawada <deno@deno.pl>
CC: "xmlschema-dev@w3.org" <xmlschema-dev@w3.org>
Message-ID: <4E043FEF.4050206@saxonica.com>

Yes, your analysis is correct. UPA does not require that the grammar is 
unambiguous (that is, that it produces a unique parse tree). It only 
requires that all the possible parse trees deliver the same association 
("attribution") of elements in the instance document to particles in the 
schema. The difference between your two schemas is that the first one 
has two "bar" element particles, whereas the second has only one.
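To make the distinction concrete, here is a minimal Python sketch. It is not how Saxon or any other processor implements UPA. It enumerates every way a list of element names can be matched against a flattened sequence of particles and records which particle each element is attributed to. The particle ids ("bar#1", "bar#2", and so on) and the `attributions` helper are hypothetical. valid-foo's outer sequence is unrolled into two copies that keep the same particle ids, which is only correct because its minOccurs and maxOccurs are both 2. The sketch checks attribution for one given instance, whereas a real processor checks the constraint statically against the schema.

```python
from typing import Iterator, List, Tuple

# Hypothetical representation of a particle:
# (particle id, element name, minOccurs, maxOccurs).
Particle = Tuple[str, str, int, int]


def attributions(particles: List[Particle],
                 names: List[str]) -> Iterator[Tuple[str, ...]]:
    """Yield every way of attributing `names`, in order, to `particles`.

    Each result is a tuple of particle ids, one per element in `names`.
    Each distinct way of splitting the input is one parse.
    """
    if not particles:
        if not names:
            yield ()
        return
    pid, name, lo, hi = particles[0]
    # Count how many leading elements this particle could consume.
    run = 0
    while run < len(names) and run < hi and names[run] == name:
        run += 1
    # Try every legal occurrence count, then match the rest.
    for k in range(lo, run + 1):
        for rest in attributions(particles[1:], names[k:]):
            yield (pid,) * k + rest


# non-valid-foo: two distinct "bar" particles in sequence.
NON_VALID_FOO: List[Particle] = [
    ("bar#1", "bar", 2, 5), ("xyz#1", "xyz", 0, 1),
    ("bar#2", "bar", 2, 5), ("xyz#2", "xyz", 0, 1),
]

# valid-foo: a single "bar" particle. The outer sequence
# (minOccurs=maxOccurs=2) is unrolled, but both copies keep the SAME ids.
VALID_FOO: List[Particle] = [("bar", "bar", 2, 5), ("xyz", "xyz", 0, 1)] * 2

if __name__ == "__main__":
    instance = ["bar"] * 5
    for label, model in [("non-valid-foo", NON_VALID_FOO),
                         ("valid-foo", VALID_FOO)]:
        parses = list(attributions(model, instance))
        print(label, len(parses), "parses,",
              len(set(parses)), "distinct attribution(s)")
```

For five "bar" elements, both models admit two parses, so neither grammar is unambiguous. In valid-foo both parses attribute every "bar" to the single "bar" particle. In non-valid-foo the parses disagree about where bar#1 ends and bar#2 begins, which is the situation UPA forbids.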

Now, we could have endless debates about whether this is a good rule.

Until recently my own schema processor, Saxon, only enforced the rule if 
the two candidate particles referred to different element declarations 
(which yours do, at least under Saxon's definition of "different"; no 
error would be reported if the two particles used ref="bar" rather than 
name="bar"). I've since changed that, adding a lot of complexity in the 
process, for no reason other than strict conformance to a rule in the 
spec that exists for very questionable reasons.
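The difference between the two readings of the rule can be sketched as below. This is a hypothetical model, not Saxon's code. Declarations and particles are plain Python objects, "different declarations" is approximated by object identity (the message does not spell out Saxon's definition), and wildcards and substitution groups are ignored. The `competes` function is only meant to be applied to two particles that are already candidates for the same element.

```python
from dataclasses import dataclass


@dataclass(eq=False)  # eq=False: equality is object identity
class ElementDecl:
    """An element declaration: global (shared via ref=) or local (name=)."""
    name: str


@dataclass(eq=False)
class ElementParticle:
    """An element particle that points at a declaration."""
    decl: ElementDecl


def competes(p: ElementParticle, q: ElementParticle, strict: bool) -> bool:
    """Hypothetical test: do two candidate particles violate UPA?"""
    if p.decl.name != q.decl.name:
        return False              # different names: no overlap at all
    if strict:
        return p is not q         # spec reading: two distinct particles suffice
    return p.decl is not q.decl   # relaxed reading: only different declarations


# ref="bar": both particles share one global declaration.
global_bar = ElementDecl("bar")
ref1, ref2 = ElementParticle(global_bar), ElementParticle(global_bar)

# name="bar": each particle carries its own local declaration.
loc1, loc2 = ElementParticle(ElementDecl("bar")), ElementParticle(ElementDecl("bar"))

if __name__ == "__main__":
    print("ref, relaxed:", competes(ref1, ref2, strict=False))
    print("ref, strict: ", competes(ref1, ref2, strict=True))
    print("name, relaxed:", competes(loc1, loc2, strict=False))
```

Under the relaxed reading, the ref="bar" version of the schema would pass; under the strict reading, both versions are errors because each contains two overlapping particles.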

Michael Kay
Saxonica

On 23/06/2011 22:25, Denis Zawada wrote:
> Hello,
>
> I wanted to make sure that I understand this rule correctly.
>
> Foo defined in the following way results in an error in both
> MSXML and libxml2:
>
>    <xs:element name="non-valid-foo">
>      <xs:complexType>
>        <xs:sequence>
>          <xs:sequence>
>            <xs:element name="bar" minOccurs="2" maxOccurs="5"/>
>            <xs:element name="xyz" minOccurs="0"/>
>          </xs:sequence>
>          <xs:sequence>
>            <!-- bar is ambiguous -->
>            <xs:element name="bar" minOccurs="2" maxOccurs="5"/>
>            <xs:element name="xyz" minOccurs="0"/>
>          </xs:sequence>
>        </xs:sequence>
>      </xs:complexType>
>    </xs:element>
>
> However, both parsers accept foo defined in the following way:
>
>    <xs:element name="valid-foo">
>      <xs:complexType>
>        <xs:sequence minOccurs="2" maxOccurs="2">
>          <xs:element name="bar" minOccurs="2" maxOccurs="5"/>
>          <xs:element name="xyz" minOccurs="0"/>
>        </xs:sequence>
>      </xs:complexType>
>    </xs:element>
>
> I understand that XML Schema only mentions that *two* adjacent particles can
> overlap:
>
>> A content model will violate the unique attribution constraint if it
>> contains *two* particles which ·overlap· and which either (…)
>
> On the other hand, I can easily imagine converting the first form into
> the second in a preprocessing step during schema compilation.
>
> Is my understanding of this principle correct? That is, if particles are
> implicit, is a certain amount of ambiguity allowed? Why does the first
> example validate differently from the second?
>
> Sincerely,
> Denis Zawada
>
>

Received on Friday, 24 June 2011 07:43:15 UTC