Re: Circular types in XML Schemas from Jeni Tennison on 2002-05-04 (xmlschema-dev@w3.org from May 2002)

From: Jeni Tennison <jeni@jenitennison.com>
Date: Sat, 4 May 2002 12:38:19 +0100
To: Jerome Louvel <jerome_louvel@yahoo.fr>
CC: xmlschema-dev@w3.org
Message-ID: <751046846605.20020504123819@jenitennison.com>

Hi Jerome,

> After reading this thread, I still don't see *why* the specification
> would prevent circular model groups only when reused as a "direct
> model group particle" (at any depth), but not when such a group is
> "indirectly reused" through an element declaration.

When a validator reads a schema document, it creates an abstract
infoset representation of that schema. This infoset contains things
like "element declarations" and "complex type definitions" rather than
"xs:element elements" and "xs:complexType elements".

Taking your examples:

> For example, if the model group definition below is invalid (note
> the minOccurs = 0) according to the "Model Group Correct" constraint
> preventing circular groups:
>
> <xs:group name="error">
>    <xs:sequence>
>      ...
>      <xs:group ref="ms:error" minOccurs="0"/>
>    </xs:sequence>
> </xs:group>

This xs:group element is translated into a "model group definition"
in the schema infoset. The model group definition holds a model group,
here a sequence. The sequence contains some particles of its own, one
of which is a reference to that same model group definition.
De-referencing references like this is part of building up the schema
infoset. When the schema validator comes across the reference to the
model group, it dereferences it and substitutes the model group from
the definition in its place. So it's just as if you'd specified:

<xs:group name="error">
  <xs:sequence>
    ...
    <xs:sequence minOccurs="0">
      ...
      <xs:group ref="ms:error" minOccurs="0" />
    </xs:sequence>
  </xs:sequence>
</xs:group>

The schema validator continues to dereference references to model
group definitions until it has a content model that doesn't contain
any, so the next step would be:

<xs:group name="error">
  <xs:sequence>
    ...
    <xs:sequence minOccurs="0">
      ...
      <xs:sequence minOccurs="0">
        ...
        <xs:group ref="ms:error" minOccurs="0" />
      </xs:sequence>
    </xs:sequence>
  </xs:sequence>
</xs:group>

and so on. Plainly this is a process that can never end if you have a
model group definition that contains a particle that references that
same model group definition. So this isn't allowed.
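
The same goes for circles that run through more than one group, since
the constraint applies at any depth. As a sketch (the group names
"first" and "second", the element names and the namespace URI are all
made up for illustration), the following schema should be rejected for
the same reason, because expanding either group eventually brings you
back to it:

```xml
<!-- Deliberately INVALID: illustrates an indirect circular group.
     All names and the namespace URI are hypothetical. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:ms="urn:example:ms"
           targetNamespace="urn:example:ms">

  <xs:group name="first">
    <xs:sequence>
      <xs:element name="a" type="xs:string"/>
      <!-- expands to the content of "second" ... -->
      <xs:group ref="ms:second" minOccurs="0"/>
    </xs:sequence>
  </xs:group>

  <xs:group name="second">
    <xs:sequence>
      <xs:element name="b" type="xs:string"/>
      <!-- ... which expands back to the content of "first" -->
      <xs:group ref="ms:first" minOccurs="0"/>
    </xs:sequence>
  </xs:group>

</xs:schema>
```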

Looking at your other example:

> Then, I don't see why the other one below would be a valid one.
>
> <xs:group name="error">
>    <xs:sequence>
>      ...
>      <xs:element name="error">
>        <xs:complexType>
>          <xs:group ref="ms:error" />
>        </xs:complexType>
>      </xs:element>
>    </xs:sequence>
> </xs:group>

This model group definition again has a sequence, but this time the
particle that it contains is an element particle. The element
declaration has its own complex type definition, which references the
group. You can't write this out properly in the XML representation of
XML Schema (because the element declaration is local and the type
definition is anonymous, so neither has a name to refer to), but the
above is basically the same as (made-up naming syntax here):

<xs:group name="error">
  <xs:sequence>
    ...
    <xs:element ref="ms:error/error" />
  </xs:sequence>
</xs:group>

<xs:element name="error/error" type="ms:error/error/*" />

<xs:complexType name="error/error/*">
  <xs:group ref="ms:error" />
</xs:complexType>

Again the reference to the model group definition is resolved during
the processing of the schema, so the complex type definition is
actually:

<xs:complexType name="error/error/*">
  <xs:sequence>
    ...
    <xs:element ref="ms:error/error" />
  </xs:sequence>
</xs:complexType>

As you can see, this is perfectly fine -- references to element
declarations or complex type definitions *aren't* resolved by
substitution during the processing of a schema, because they're
primary components. That gives you the firebreak that you need -- it
stops the validator from recursively resolving these references
forever.
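
In real schema syntax you can get the same effect by giving the type a
name. A minimal sketch, assuming a made-up namespace URI and a made-up
"message" element standing in for the "..." (I've also put
minOccurs="0" on the nested element, which isn't in your example, so
that instances can actually stop nesting):

```xml
<!-- Hypothetical names throughout; "urn:example:ms" is an assumption. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:ms="urn:example:ms"
           targetNamespace="urn:example:ms"
           elementFormDefault="qualified">

  <xs:group name="error">
    <xs:sequence>
      <xs:element name="message" type="xs:string"/>
      <!-- element particle: expansion of the group stops here -->
      <xs:element name="error" type="ms:errorType" minOccurs="0"/>
    </xs:sequence>
  </xs:group>

  <!-- the type refers back to the group, through the element above -->
  <xs:complexType name="errorType">
    <xs:group ref="ms:error"/>
  </xs:complexType>

  <xs:element name="error" type="ms:errorType"/>

</xs:schema>
```

The group's sequence holds an element particle, and the element
declaration points at ms:errorType rather than having the type's
content copied into it.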

Logically, too, it's necessary to allow these kinds of constructions
so that you can have elements nested to any depth. For example, in a
document-oriented markup language you might want to have sections
within sections within sections and so on, so this kind of recursion
has to be allowed. Recursion within model groups, on the other hand,
isn't required, because anything it could achieve you can achieve
through repeating structures. The model group definition:

<xs:group name="error">
  <xs:sequence>
    ...
    <xs:group ref="ms:error" minOccurs="0"/>
  </xs:sequence>
</xs:group>

is, in effect:

<xs:group name="error">
  <xs:sequence>
    <xs:sequence maxOccurs="unbounded">
      ...
    </xs:sequence>
  </xs:sequence>
</xs:group>
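
If you want to try the repeating form in a real schema, here's a
sketch with a made-up "message" element standing in for the "..." and
a made-up namespace URI and wrapper element. Occurrence attributes
aren't allowed on the sequence that sits directly under a named
xs:group, so the repeating sequence is wrapped in a plain one. Note
too that the rewrite works here because the group reference was the
last particle in the sequence.

```xml
<!-- Hypothetical names throughout; "urn:example:ms" is an assumption. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:ms="urn:example:ms"
           targetNamespace="urn:example:ms"
           elementFormDefault="qualified">

  <xs:group name="error">
    <xs:sequence>
      <!-- one or more repetitions replace the self-reference -->
      <xs:sequence maxOccurs="unbounded">
        <xs:element name="message" type="xs:string"/>
      </xs:sequence>
    </xs:sequence>
  </xs:group>

  <!-- an element that uses the group, so there is something to validate -->
  <xs:element name="errors">
    <xs:complexType>
      <xs:group ref="ms:error"/>
    </xs:complexType>
  </xs:element>

</xs:schema>
```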

Does that make sense?

Cheers,

Jeni

---
Jeni Tennison
http://www.jenitennison.com/
Received on Saturday, 4 May 2002 07:38:22 UTC