Re: extension adds element removed by restriction (3.4.6/1.5)

From: C. M. Sperberg-McQueen <cmsmcq@acm.org>
Date: Tue, 24 Oct 2006 10:58:46 -0600
Message-Id: <5F5D1E62-03E7-4E8A-939F-7A3BCCDE798D@acm.org>
Cc: "C. M. Sperberg-McQueen" <cmsmcq@acm.org>, "Moog, Thomas H" <thomas.h.moog@intel.com>, "xmlschema-dev@w3.org" <xmlschema-dev@w3.org>
To: Stan Kitsis <skits@microsoft.com>

On 23 Oct 2006, at 23:24 , Stan Kitsis wrote:

>> Now, if you changed your example, so that in alpha the 'b'
>> element was defined as having type xsd:gYear, for example,
>> and then brought back in gamma with the type xsd:anyURI,
>> then in principle conforming processors should reject it.
> Can you explain why?  The following schema seems valid to me and  
> the .NET 2.0 processor agrees with me.

Certainly.  Two levels of explanation may be useful:  first, at
the level of the words in the spec which formulate the rule,
and second, at the level of the design rationale.

Clause 1.5 of Schema Component Constraint: Derivation
Valid (Extension) says that it must in principle be

     possible to derive the complex type definition in two
     steps, the first an extension and the second a restriction
     (possibly vacuous), from that type definition among its
     ancestors whose {base type definition} is the
     ur-type definition.

> <xs:complexType name="alpha">
>   <xs:sequence>
>     <xs:element name="a" />
>     <xs:element name="b" type="xs:gYear" minOccurs="0" />
>   </xs:sequence>
> </xs:complexType>

The type alpha is "that type definition ... whose {base type
definition} is the ur-type definition".

The point of the clause is to ensure the truth of some assumptions
one might plausibly want to make about instances of type alpha
and of any type descended from it.  Prominent among them
would be propositions like:

   1 There will always be a child named 'a'.
   2 When there is a child named 'a', its type will be xsd:anyType,
     or something derived from it.
   3 There may or may not be a child named 'b'.
   4 When there is a child named 'b', its type will be xsd:gYear,
     or something derived from it.

Other invariants are in fact guaranteed, but these will do to
go on with.
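
To make these concrete, here is a sketch of two instances (separate
documents, shown together) that satisfy all four propositions.  It
assumes a hypothetical global declaration
<xs:element name="item" type="alpha"/> in a schema with no target
namespace; neither the name 'item' nor that declaration appears in
the schema under discussion.

```xml
<!-- Hypothetical: assumes <xs:element name="item" type="alpha"/> -->

<!-- Document 1: 'a' present, optional 'b' omitted (propositions 1 and 3) -->
<item>
  <a/>
</item>

<!-- Document 2: 'a' present, 'b' holds an xs:gYear value (propositions 1 to 4) -->
<item>
  <a/>
  <b>2006</b>
</item>
```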

> <xs:complexType name="beta" >
>   <xs:complexContent>
>     <xs:restriction base="alpha" >
>       <xs:sequence>
>         <xs:element name="a" />
>       </xs:sequence>
>     </xs:restriction>
>   </xs:complexContent>
> </xs:complexType>
> <xs:complexType name="gamma" >
>   <xs:complexContent>
>     <xs:extension base="beta" >
>       <xs:sequence>
>         <xs:element name="b" type="xs:anyURI"/>
>       </xs:sequence>
>     </xs:extension>
>   </xs:complexContent>
> </xs:complexType>

Clause 1.5 requires that the effective content model of gamma be
expressible as the result of (1) extending alpha, possibly vacuously,
and then (2) restricting that extension, possibly vacuously.  If we
can construct such a two-step derivation that goes from alpha to
gamma, then you are right that gamma should be legal.  I believe
we cannot.

My reasoning may be clearer if we consider an example.  Consider
this possible derivation.  The type delta extends alpha by adding
a 'b' element with a type of xs:anyURI:

<xs:complexType name="delta" >
    <xs:extension base="alpha" >
      <xs:element name="b" type="xs:anyURI"/>

The effective content model of delta is thus:

    <xs:element name="a" />
    <xs:element name="b" type="xs:gYear" minOccurs="0" />
    <xs:element name="b" type="xs:anyURI"/>

If delta is legal, then we could derive a type epsilon from it
by restriction, with an effective content model that is the
same as that of gamma:

<xs:complexType name="epsilon" >
    <xs:restriction base="delta" >
      <xs:element name="a" />
      <xs:element name="b" type="xs:anyURI"/>

The problem with this derivation is that type delta is not a legal
type:  it has two element declarations which map the same
expanded name ('b') to different types (xs:gYear and xs:anyURI),
which violates the Element Declarations Consistent constraint.

I think it's clear that any attempt to derive gamma from alpha
by means of first an extension step and then a restriction
step must fail.  The intermediate type must contain a 'b' element
with type xs:anyURI, in order for gamma to get it from there.
It must also contain a 'b' element with type xs:gYear, since
alpha has one, and the intermediate type is an extension of
alpha, and extensions cannot get rid of things in their base type.
That means that the intermediate type must have two 'b'
elements with different types, and thus that the intermediate
type must violate the Element Declarations Consistent rule.
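
Written out in full, the abbreviated delta and epsilon above might
look roughly as follows.  This is only a sketch:  the enclosing
xs:schema element, the absence of a target namespace, and the
xs:complexContent and xs:sequence wrappers are assumptions, filled
in by analogy with the beta and gamma definitions quoted earlier.
The schema is deliberately not valid; a conforming processor should
reject delta.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- alpha, as in the quoted example -->
  <xs:complexType name="alpha">
    <xs:sequence>
      <xs:element name="a"/>
      <xs:element name="b" type="xs:gYear" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>

  <!-- Step 1 (extension): the content model becomes
       a, b? (xs:gYear), b (xs:anyURI).
       Two 'b' declarations with different types: this is the
       Element Declarations Consistent violation. -->
  <xs:complexType name="delta">
    <xs:complexContent>
      <xs:extension base="alpha">
        <xs:sequence>
          <xs:element name="b" type="xs:anyURI"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <!-- Step 2 (restriction): would give gamma's content model,
       but only if its base type delta were legal. -->
  <xs:complexType name="epsilon">
    <xs:complexContent>
      <xs:restriction base="delta">
        <xs:sequence>
          <xs:element name="a"/>
          <xs:element name="b" type="xs:anyURI"/>
        </xs:sequence>
      </xs:restriction>
    </xs:complexContent>
  </xs:complexType>

</xs:schema>
```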

Clause 1.5 turns out to make it legal (as the original example
from Thomas Moog shows) to take an element or attribute away,
and then put it back.  When we drafted the Note that said
nothing taken away can be put back, we failed to foresee that
as a possibility, so the Note implies that it's not possible.  When
the WG came to consider the discrepancy between the actual
rule in clause 1.5 and the characterization in the Note, we
concluded that the point of the rule is to preserve propositions
like those numbered 1-4 earlier in this email.  The rule
implicit in the Note would have implied further that in the
derivation graph showing all the descendants of alpha, those
types which possess a 'b' must form a connected subgraph.
That didn't seem a particularly important or useful property,
so the WG elected to change the non-normative Note rather
than the normative rule.
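
As an illustration of what the normative rule protects, suppose
gamma were accepted, and suppose a hypothetical global declaration
<xs:element name="item" type="alpha"/> (not part of the schema under
discussion; no target namespace).  Because gamma is derived from
alpha, an instance could then select it with xsi:type, roughly like
this:

```xml
<!-- Hypothetical instance; valid only if gamma were a legal type -->
<item xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:type="gamma">
  <a/>
  <!-- A URI, not a year: the proposition that any 'b' is an
       xs:gYear (or something derived from it) no longer holds -->
  <b>http://www.example.org/</b>
</item>
```

Software written against alpha, and so expecting any 'b' child to
be an xs:gYear, would be handed a URI instead.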

I hope I've answered your question about why this example
should not be accepted by conforming schema processors,
both at the level of "where does it say that in the spec?" and
at the level of "why should it say that in the spec?".

--C. M. Sperberg-McQueen
   World Wide Web Consortium / MIT CSAIL
Received on Tuesday, 24 October 2006 16:59:09 GMT
