clarification on Unique Particle Attribution from sandygao@ca.ibm.com on 2001-10-24 (www-xml-schema-comments@w3.org from October to December 2001)

From: <sandygao@ca.ibm.com>
Date: Wed, 24 Oct 2001 10:45:58 -0400
To: www-xml-schema-comments@w3.org
Message-ID: <OFEC9FEC66.757FB86C-ON85256AEF.0046A52F@torolab.ibm.com>
Hi all,

From the "Unique Particle Attribution" (UPA) constraint [1] and Appendix H
[2], I couldn't figure out whether the following content model (the
<choice>) violates the UPA constraint:

<group name="grp">
  <sequence>
    <element name="e"/>
  </sequence>
</group>

<choice>
  <group ref="grp"/>
  <group ref="grp" maxOccurs="3"/>
</choice>

[1] says that the "particle ... can be uniquely determined". But it's not
clear to me whether "particle" there refers only to a non-group particle or
to any particle. In the above example, there are two (different) group
particles, but they refer to the same non-group particle. When we see an
element "e" in the instance, we can uniquely determine the non-group
particle for validation, but not a unique group particle.

And [2] only talks about non-group particles.

My guess is that "particle" means any particle, not just a non-group
particle. The reason is that UPA should make the above example invalid;
otherwise we won't be able to know whether the content model allows one "e"
or three "e"s.
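
To make the ambiguity concrete, here is a minimal Python sketch. The
encoding is my own assumption, not anything from the spec: each branch of
the <choice> is reduced to the (minOccurs, maxOccurs) of its group
particle, and BRANCHES and accepting_branches are hypothetical names.

```python
# Hypothetical model of the <choice> above: each branch is a group particle,
# recorded as (minOccurs, maxOccurs), wrapping the same one-element
# sequence "e". minOccurs defaults to 1 when it is not written.
BRANCHES = {
    "group ref (maxOccurs=1)": (1, 1),
    "group ref (maxOccurs=3)": (1, 3),
}


def accepting_branches(count):
    """Return the branches that accept `count` consecutive "e" elements."""
    return [name for name, (lo, hi) in BRANCHES.items() if lo <= count <= hi]


# A single "e" is accepted by both group particles, so the group particle
# cannot be determined; two or three "e"s fit only the second branch.
for n in range(1, 4):
    print(n, accepting_branches(n))
```

When the first "e" is read, both branches are still candidates, and only
the rest of the sequence (which UPA says we may not look at) would tell
them apart.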

If this is correct, some modifications would be necessary.

In [1], "the particle ... can be uniquely determined ..." should be "the
path from the root of the content model to a non-group particle ... can be
uniquely determined ...".

In [2], the definition of "overlapped particle" is OK, but the description
of how a content model violates UPA is not correct: it only talks about
non-group particles, and hence is misleading.

And in the discussion of the automaton (also in [2]), we may want to change
"QName+position" to "QName+a list of positions of the particles in the path
from the root of the content model to the non-group particle".
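
The difference between the two labelings can be sketched in Python. This
is a simplified illustration under my own assumptions, not the appendix H
procedure: it looks only at the transitions leaving the start state, the
tuple encoding of the model is hypothetical, and it assumes (as I read the
example) that both group references lead to the same element position.

```python
from collections import defaultdict

# Hypothetical encoding of the example as nested tuples:
#   ("elem", qname, position), ("seq", [children]),
#   ("group", max_occurs, child), ("choice", [children])
E_DECL = ("elem", "e", "grp/e")  # the one element particle inside "grp"
GRP = ("seq", [E_DECL])
MODEL = ("choice", [("group", 1, GRP), ("group", 3, GRP)])


def first_particles(node, path=()):
    """Yield (qname, position, path) for leaves that can match the first item."""
    kind = node[0]
    if kind == "elem":
        yield node[1], node[2], path
    elif kind == "choice":
        for i, child in enumerate(node[1]):
            yield from first_particles(child, path + (i,))
    elif kind == "seq":
        # Assumes a non-empty sequence whose first child is required,
        # so only that child can start the sequence.
        yield from first_particles(node[1][0], path + (0,))
    elif kind == "group":
        yield from first_particles(node[2], path + (0,))


def upa_conflicts(model, use_path):
    """QNames with more than one distinct label on the start state."""
    labels = defaultdict(set)
    for qname, position, path in first_particles(model):
        labels[qname].add(path if use_path else position)
    return {qname for qname, seen in labels.items() if len(seen) > 1}


print(upa_conflicts(MODEL, use_path=False))  # labels are QName+position
print(upa_conflicts(MODEL, use_path=True))   # labels are QName+path
```

With "QName+position" both branches collapse into a single label, so no
conflict is reported; with "QName+path" the two branches stay distinct and
"e" is flagged.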

Could someone clarify? Thanks.


[1] http://www.w3.org/TR/xmlschema-1/#cos-nonambig

Schema Component Constraint: Unique Particle Attribution

A content model must be formed such that during ·validation· of an element
information item sequence, the particle contained directly, indirectly or
·implicitly· therein with which to attempt to ·validate· each item in the
sequence in turn can be uniquely determined without examining the content
or attributes of that item, and without any information about the items in
the remainder of the sequence.

[2] http://www.w3.org/TR/xmlschema-1/#non-ambig

H Analysis of the Unique Particle Attribution Constraint (non-normative)

[Definition:]  Two non-group particles overlap if ...

A precise formulation of this constraint can also be offered in terms of
operations on finite-state automaton: transcribe the content model into an
automaton in the usual way using epsilon transitions for optionality and
unbounded maxOccurs, unfolding other numeric occurrence ranges and treating
the heads of substitution groups as if they were choices over all elements
in the group, but using not element QNames as transition labels, but rather
pairs of element QNames and positions in the model. Determinize this
automaton, treating wildcard transitions as opaque. Now replace all
QName+position transition labels with the element QNames alone. If the
result has any states with two or more identical-QName-labeled transitions
from it, or a QName-labeled transition and a wildcard transition which
subsumes it, or two wildcard transitions whose intentional intersection is
non-empty, the model does not satisfy the Unique Attribution constraint.

Sandy Gao
Software Developer, IBM Canada
(1-905) 413-3255
sandygao@ca.ibm.com
Received on Wednesday, 24 October 2001 10:47:08 UTC