- From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
- Date: Thu, 23 Jun 2011 15:28:12 -0700
- To: Denis Zawada <deno@deno.pl>
- Cc: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>, xmlschema-dev@w3.org
On Jun 23, 2011, at 2:25 PM, Denis Zawada wrote:

> Hello,
>
> I wanted to make sure that I understand this rule correctly.
>
> Foo defined in the following way will result in an error in both
> MSXML and libxml2:
>
> <xs:element name="non-valid-foo">
>   <xs:complexType>
>     <xs:sequence>
>       <xs:sequence>
>         <xs:element name="bar" minOccurs="2" maxOccurs="5"/>
>         <xs:element name="xyz" minOccurs="0"/>
>       </xs:sequence>
>       <xs:sequence>
>         <!-- bar is ambiguous -->
>         <xs:element name="bar" minOccurs="2" maxOccurs="5"/>
>         <xs:element name="xyz" minOccurs="0"/>
>       </xs:sequence>
>     </xs:sequence>
>   </xs:complexType>
> </xs:element>
>
> However, both parsers have no problem with foo defined in the following way:
>
> <xs:element name="valid-foo">
>   <xs:complexType>
>     <xs:sequence minOccurs="2" maxOccurs="2">
>       <xs:element name="bar" minOccurs="2" maxOccurs="5"/>
>       <xs:element name="xyz" minOccurs="0"/>
>     </xs:sequence>
>   </xs:complexType>
> </xs:element>

Both processors are correctly enforcing the UPA constraint.

> I understand that XML Schema only mentions that *two* adjacent particles can
> overlap:
>
>> A content model will violate the unique attribution constraint if it
>> contains *two* particles which ·overlap· and which either (…)

I'm not sure I understand what you're saying here. (Specifically, I don't understand the implications of your word "only", and I note that the passage you quote from the spec says nothing about adjacency of the particles.)

In the case of the type of non-valid-foo, the two particles which compete are the two particles which define the two element types named 'bar'.

> On the other hand, I could easily imagine that it would be possible to convert
> the 1st form into the 2nd one in a preprocessing step during compilation of
> the schema.

Agreed. It's clearly possible to have tools that will translate some content models which violate the UPA constraint into models that do not violate it.
It is known from work by Anne Brüggemann-Klein and Derick Wood, however, that not all regular languages over elements have content models which obey the UPA. I have the impression that tools for translating content models into UPA-compliant content models are not widely available, perhaps because it's impossible to guarantee that they will always succeed.

> Is my understanding of this principle correct? I.e. if particles are implicit,
> is a certain ambiguity allowed? Why does the first example validate
> differently from the 2nd one?

I'm not sure what you mean by 'implicit' here.

The first example violates the UPA constraint because it contains two element particles (represented in the XML by the first and third xs:element elements) which compete with each other.

The second example does not violate the UPA constraint because it does not contain two particles which compete. (It contains no wildcard particles, and it contains only two element particles; they match disjoint sets of elements*, and do not compete.)

* They match disjoint sets of elements, that is, unless their substitution groups match overlapping sets of names.

This is true despite the fact that the third 'bar' element in a sequence exposes a non-determinism in the content model: it could increment either the inner or the outer counter. Brüggemann-Klein and Wood follow earlier work in distinguishing 'weak' determinism and 'strong' determinism (or 1-non-ambiguity). In SGML and in XML DTDs, there is no practical difference between the two (although SGML explicitly specifies that it is the inner counter which is incremented, not the outer counter); in XSD the difference becomes relevant with the introduction of integer-valued occurrence indicators, but I don't believe there is any record of the Working Group making a conscious choice between enforcing weak determinism and strong determinism, or even being aware of the difference.
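(The inner-versus-outer counter point can be probed from the instance side. The sketch below again assumes lxml/libxml2 and an invented helper `accepts`; valid-foo requires exactly two runs of 2–5 'bar' elements, i.e. between 4 and 10 bars in total, and the bar counts chosen below give the same verdict whether the validator increments the inner or the outer counter first.)

```python
# Sketch: instance validation against valid-foo via lxml (an assumption).
# Two runs of 2-5 bars means 4-10 bars total; 3 and 11 are invalid under
# any reading, and 7 is valid under both a greedy inner-counter strategy
# (5+2, SGML's rule) and the non-greedy alternatives (2+5, 3+4, 4+3).
from lxml import etree

SCHEMA_DOC = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="valid-foo">
    <xs:complexType>
      <xs:sequence minOccurs="2" maxOccurs="2">
        <xs:element name="bar" minOccurs="2" maxOccurs="5"/>
        <xs:element name="xyz" minOccurs="0"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""
schema = etree.XMLSchema(etree.fromstring(SCHEMA_DOC))

def accepts(n_bars: int) -> bool:
    """Validate a valid-foo instance containing n_bars bar children."""
    doc = etree.fromstring("<valid-foo>%s</valid-foo>" % ("<bar/>" * n_bars))
    return schema.validate(doc)

print(accepts(3))   # False: two runs of at least 2 bars each need >= 4
print(accepts(7))   # True: 5+2 and 2+5 both partition the run legally
print(accepts(11))  # False: at most 2 * 5 = 10 bars

# 4 bars is where the third 'bar' ambiguity bites: it is valid only via
# the 2+2 partition, so a validator that greedily increments the inner
# counter (SGML's rule) would consume all 4 in the first run and fail.
# No expected value is asserted here, since behaviour may vary.
print(accepts(4))
```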
I lean toward the belief that we fell into the choice of weak determinism because of an accident of the language used to express the constraint; when the WG became aware of the issue, my recollection is that there was consensus (or something very close to consensus) on the view that it would have been better to require strong determinism in content models, or to prescribe a rule analogous to SGML's rule to force the inner counters to be incremented first. But a significant part of the WG felt that it was too late to fix the error, because any change would create backward-compatibility issues. So the WG as a whole did not have consensus in favor of any change.

So at one level the answer to the question "Why is there a difference between the two?" is "because the rules require a deterministic choice of particle, not a fully deterministic automaton". At another level, the answer is "because the WG failed to do its homework fully and did not recognize that it was faced with a design choice in this area".

I hope this helps.

-- 
****************************************************************
* C. M. Sperberg-McQueen, Black Mesa Technologies LLC
* http://www.blackmesatech.com
* http://cmsmcq.com/mib
* http://balisage.net
****************************************************************
Received on Thursday, 23 June 2011 22:28:47 UTC