RE: Interpretation of choice compositor and occurrence in XML Schema

Hess Yvan asks:

>> The usage of choice occurrence combined with element
>> seems to be quite complex. Where can I find good
>> documentation about its usage?

Well, the authoritative description is in the XML Schema Recommendation
at [1].  This is a very technical explanation, but it's the final word.
There are also some good books on XML Schema.

That said, the official rules are not that complicated once you know how
to read them.  The general rule for repeated elements is the obvious one:

<element name="e" minOccurs="3" maxOccurs="5"/>

means 3, 4 or 5 elements named "e".  Now, what about something like:

        <sequence minOccurs="1" maxOccurs="2">
                <element name="a" />
                <element name="b"  minOccurs="0" maxOccurs="1"/>
        </sequence>

This matches (I'm not spelling out all the <...>): {a}, {a,b}, {a,b,a},
{a,b,a,b}, {a,a}, {a,a,b}.  That is the complete list: each repetition
is either [a] or [a,b], and there are one or two repetitions.
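To double-check that list, here is a small brute-force enumeration in Python. It is only a sketch: it assumes the unrepeated sequence can match exactly [a] or [a,b], that maxOccurs is finite, and `instances` is a hypothetical helper, not part of any schema toolkit. It builds every concatenation of one or two such matches:

```python
from itertools import product

# Matches of the inner <sequence> with the outer repeat removed:
# a required a, optionally followed by one b.
INNER = [("a",), ("a", "b")]

def instances(min_occurs, max_occurs):
    """All element sequences matched by the inner sequence repeated
    between min_occurs and max_occurs times (max_occurs must be finite)."""
    result = set()
    for n in range(min_occurs, max_occurs + 1):
        # Pick one inner match for each of the n repetitions and concatenate.
        for picks in product(INNER, repeat=n):
            result.add(tuple(name for piece in picks for name in piece))
    return result

for inst in sorted(instances(1, 2), key=lambda t: (len(t), t)):
    print("{" + ",".join(inst) + "}")
```

Running it prints exactly the six instances listed above.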

The question is, why does it work this way?  The answer is in the
recommendation at [1] where it says:

3 If the {term} is a model group, then all of the following must be true:
3.1 There is a ·partition· of the sequence into n sub-sequences such that 
n is greater than or equal to {min occurs}.
3.2 If {max occurs} is a number, n must be less than or equal to {max 
occurs}.
3.3 Each sub-sequence in the ·partition· is ·valid· with respect to that 
model group as defined in Element Sequence Valid (§3.8.4).

That probably won't make much sense, but what it means is:

a) take your instance, for example {a, a, b}
b) Since minOccurs=1, maxOccurs=2, try to divide it into either one or two
consecutive subsequences.  If there's one subsequence, it must match the
original sequence with the outer repeat count left off:
        <sequence>
                <element name="a" />
                <element name="b"  minOccurs="0" maxOccurs="1"/>
        </sequence>
If there are two, then each must match that same sequence:
        <sequence>
                <element name="a" />
                <element name="b"  minOccurs="0" maxOccurs="1"/>
        </sequence>
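Steps a) and b) amount to a brute-force search over ways of cutting the instance into consecutive pieces. The Python sketch below hard-codes this example's inner sequence (a required a, then an optional b) instead of reading a schema; `splits`, `matches_inner` and `valid` are names made up for this illustration. Following the recommendation's definition of a partition, pieces may be empty.

```python
def splits(seq, n):
    """Yield every way to cut seq into n consecutive pieces (possibly empty)
    whose concatenation is seq, i.e. a 'partition' in the spec's sense."""
    if n == 0:
        if not seq:
            yield []
        return
    for i in range(len(seq) + 1):
        for rest in splits(seq[i:], n - 1):
            yield [seq[:i]] + rest

def matches_inner(piece):
    # The sequence without its outer repeat count:
    # <element name="a"/> followed by an optional <element name="b"/>.
    return piece in (["a"], ["a", "b"])

def valid(instance, min_occurs=1, max_occurs=2):
    # Valid if, for some n between min_occurs and max_occurs, there is a
    # partition into n pieces that each match the unrepeated sequence.
    return any(
        all(matches_inner(piece) for piece in parts)
        for n in range(min_occurs, max_occurs + 1)
        for parts in splits(instance, n)
    )

print(valid(["a", "a", "b"]))   # True: {[a], [a,b]}
print(valid(["b"]))             # False: every piece needs its a
```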

Let's try it with {a,a,b}.  Since minOccurs="1", we can try a trivial
partition into a single sequence {[a,a,b]}.  Does that match as one
sequence? No: after the first a, only an optional b may follow, not a
second a.
maxOccurs = "2", so we can also try partitions into two sequences.

How about breaking it into {[a,a],[b]}?  That doesn't work because [a,a]
doesn't match
        <sequence>
                <element name="a" />
                <element name="b"  minOccurs="0" maxOccurs="1"/>
        </sequence>
(and, for that matter, neither does [b], since it lacks the required a.)

How about {[a], [a,b]} ?  [a] matches, because the b is optional.  [a,b] 
matches too, so the overall content is valid.
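The two-piece case just worked through by hand can also be checked mechanically. This sketch (helper names hypothetical) lists every way to cut {a,a,b} into two consecutive pieces, including the cuts that leave one side empty, and reports which ones work:

```python
def two_piece_partitions(seq):
    # Every cut of seq into (first, second) with first + second == seq,
    # including the cuts that leave one side empty.
    return [(seq[:i], seq[i:]) for i in range(len(seq) + 1)]

def matches_inner(piece):
    # <sequence><element name="a"/><element name="b" minOccurs="0"/></sequence>
    return piece in (["a"], ["a", "b"])

for first, second in two_piece_partitions(["a", "a", "b"]):
    ok = matches_inner(first) and matches_inner(second)
    print(first, second, "matches" if ok else "no match")
```

Only the cut {[a], [a,b]} is reported as a match; the cuts with an empty piece fail because each piece must contain an a.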

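If you try that with {a,a,a,b,a,b,b}, you'll find there's no such
partition, hence it's invalid overall.  There can be at most two pieces,
and each matching piece is either [a] or [a,b], so no valid instance has
more than four elements.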
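Counting elements gives a quick necessary-condition test. The sketch below is not a validator: the constants are specific to this one example, and `could_be_valid` is a hypothetical helper.

```python
MAX_OCCURS = 2      # outer maxOccurs of the sequence
MAX_PIECE_LEN = 2   # the longest match of the unrepeated sequence is [a, b]

def could_be_valid(instance):
    # Necessary but not sufficient: an instance longer than MAX_OCCURS
    # pieces of at most MAX_PIECE_LEN elements can never be partitioned.
    return len(instance) <= MAX_OCCURS * MAX_PIECE_LEN

print(could_be_valid(["a", "a", "a", "b", "a", "b", "b"]))  # False: 7 > 4
print(could_be_valid(["b", "b"]))  # True, yet {b,b} is invalid (no a)
```

The second call shows why the count alone is not enough: short instances still have to pass the partition search.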

I illustrated this with a sequence, but the same holds for a choice,
except that instead of each piece of the partition having to match the
little sequence, each piece must match one or the other of the choice's
inner particles.
Thus: 

        <choice minOccurs="1" maxOccurs="2">
                <element name="a" />
                <element name="b"  minOccurs="0" maxOccurs="1"/>
        </choice>

Will accept:

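{a}, {b}, {a, a}, {a, b}, {b, a}, {b, b}

as well as the empty sequence {}, since the choice can be satisfied by
taking the b branch zero times.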
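The partition search carries over to the choice by changing only the test applied to each piece. In this sketch (helper names hypothetical, element names assumed to be single letters) a piece matches the unrepeated choice if it is [a], [b], or empty, the last because b's own minOccurs is 0. It then tries every sequence of a's and b's with up to three elements:

```python
from itertools import product

def splits(seq, n):
    """Yield every way to cut seq into n consecutive, possibly empty pieces."""
    if n == 0:
        if not seq:
            yield []
        return
    for i in range(len(seq) + 1):
        for rest in splits(seq[i:], n - 1):
            yield [seq[:i]] + rest

def matches_choice(piece):
    # The choice without its outer repeat count: either [a], or b occurring
    # 0 or 1 times (b has minOccurs="0"), so [b] and [] both match.
    return piece in (["a"], ["b"], [])

def valid(instance, min_occurs=1, max_occurs=2):
    return any(
        all(matches_choice(piece) for piece in parts)
        for n in range(min_occurs, max_occurs + 1)
        for parts in splits(instance, n)
    )

# Try every sequence of a's and b's with up to three elements.
accepted = [list(t) for k in range(4) for t in product("ab", repeat=k)
            if valid(list(t))]
print(accepted)
```

The output is the six two-or-fewer-element instances plus the empty list; no three-element instance is accepted, since each piece holds at most one element.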

Take {b, a}.  Consider the partition {[b], [a]}.  The [b] matches:

        <choice>
                <element name="a" />
                <element name="b"  minOccurs="0" maxOccurs="1"/>
        </choice>

and so does the [a].  You can generalize this reasoning to all the cases.
In general, the trick is to find a partition into between minOccurs and
maxOccurs pieces, each of which matches the choice or sequence >without
the repeat count<.  If you can do that, it's valid.  If not, not.
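That rule is recursive, and for a toy model it fits in a short program. The sketch below is an illustration under strong assumptions, not how a real schema processor works: content is a tuple of element names (no attributes, types, wildcards or `all` groups), and `Element`, `Group`, `particle_valid` and `term_valid` are hypothetical names, not a library API. `particle_valid` applies the partition rule for the repeat count; `term_valid` checks one piece against the term without its repeat count, where a sequence splits the piece among its children in order and a choice needs just one child to match.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union

# Hypothetical toy model of XML Schema particles (not a real library API).
@dataclass
class Element:
    name: str
    min_occurs: int = 1
    max_occurs: Optional[int] = 1      # None means maxOccurs="unbounded"

@dataclass
class Group:
    kind: str                          # "sequence" or "choice"
    particles: List["Particle"]
    min_occurs: int = 1
    max_occurs: Optional[int] = 1

Particle = Union[Element, Group]
Content = Tuple[str, ...]              # element names, in document order

def splits(seq: Content, n: int):
    """Yield every way to cut seq into n consecutive, possibly empty pieces."""
    if n == 0:
        if not seq:
            yield ()
        return
    for i in range(len(seq) + 1):
        for rest in splits(seq[i:], n - 1):
            yield (seq[:i],) + rest

def term_valid(p: Particle, seq: Content) -> bool:
    """Is seq valid against p's term, ignoring p's own repeat count?"""
    if isinstance(p, Element):
        return len(seq) == 1 and seq[0] == p.name
    if p.kind == "choice":
        # A choice is satisfied if any one child particle matches all of seq.
        return any(particle_valid(q, seq) for q in p.particles)
    # A sequence cuts seq into one piece per child, matched in order.
    return any(
        all(particle_valid(q, piece) for q, piece in zip(p.particles, parts))
        for parts in splits(seq, len(p.particles))
    )

def particle_valid(p: Particle, seq: Content) -> bool:
    """Partition rule: n pieces, min_occurs <= n <= max_occurs, each valid."""
    if p.max_occurs is None:
        # Unbounded: a valid partition with more than max(min_occurs, len(seq))
        # pieces must contain an empty piece that can be dropped, so this
        # upper bound does not lose any solutions.
        upper = max(p.min_occurs, len(seq))
    else:
        upper = p.max_occurs
    return any(
        all(term_valid(p, piece) for piece in parts)
        for n in range(p.min_occurs, upper + 1)
        for parts in splits(seq, n)
    )

# The two content models from this message.
seq_model = Group("sequence", [Element("a"), Element("b", 0, 1)], 1, 2)
choice_model = Group("choice", [Element("a"), Element("b", 0, 1)], 1, 2)

print(particle_valid(seq_model, ("a", "a", "b")))    # True
print(particle_valid(seq_model, tuple("aaababb")))   # False
print(particle_valid(choice_model, ("b", "a")))      # True
```

The search is exponential in the length of the content; practical validators generally compile content models into finite automata instead of enumerating partitions.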

It sounds a bit complicated, but once you look at it you'll realize that 
it's simple and sensible.  Well, in my opinion anyway.  I hope this helps.

Noah

[1] http://www.w3.org/TR/xmlschema-1/#section-Particle-Validation-Rules

--------------------------------------
Noah Mendelsohn 
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------

Received on Thursday, 5 February 2004 14:52:57 UTC