Redefinition : validity of the redefined schema

     Hi,

     XML Schema Part 1: Structures, section 6.2.2, says:
"If the normalized value of the schemaLocation [attribute] successfully
resolves, it resolves either
 1.1.1     to (a fragment of) a resource of type text/xml, which in turn
corresponds to a schema element information
item in a well-formed information set, which in turn corresponds to A VALID
SCHEMA
or
 1.1.2    to a schema element information item in a well-formed information
set, which in turn corresponds to
A VALID SCHEMA
"
     I take this to mean that the redefined schema must be valid by itself.
If my interpretation is correct, schema A.xsd in the following example is
invalid:

     FIRST EXAMPLE:

     A.xsd
     <xsd:schema xmlns:xsd="http://www.w3.org/2000/10/XMLSchema">
          <xsd:redefine schemaLocation="B.xsd">
               <xsd:complexType  name="cB" abstract="false">
                    <xsd:complexContent>
                         <xsd:extension base="cB"/>
                    </xsd:complexContent>
               </xsd:complexType>
          </xsd:redefine>
     </xsd:schema>

     B.xsd
     <xsd:schema xmlns:xsd="http://www.w3.org/2000/10/XMLSchema">
          <xsd:complexType  name="cB" abstract="true">
               <xsd:sequence>
                    <xsd:element name="eB" type="xsd:string"/>
               </xsd:sequence>
          </xsd:complexType>

          <xsd:element  name="eB2" type="cB"/>
     </xsd:schema>

     Schema B.xsd is invalid because element eB2 violates the rule
specified at 3.4: "A complex type for which {abstract} is true must not
appear as the {type definition} of an Element Declaration (§2.2.2.1)".
But once the redefinition in A.xsd is applied, element eB2 uses a
non-abstract type and is therefore valid. So if we are only interested in
the resulting schema (schema A.xsd after its redefinition of schema B.xsd),
we could say that it is valid. However, the spec seems to require that
schema B.xsd be valid by itself. So even though the redefinition fixes
the problem in schema B.xsd, schema A.xsd is still invalid!
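     The difference between checking B.xsd by itself and checking only the
resulting schema can be sketched roughly as follows. This is a heavily
simplified model, not a schema processor: the DOCUMENTS table, the function
names, and the reduction of a complex type to a single abstract flag are all
assumptions made for illustration.

```python
# Hypothetical, heavily simplified model of the first example: each
# document lists its complex types (name -> abstract flag), its element
# declarations (name -> type name), and the types it redefines in
# another document.
DOCUMENTS = {
    "A.xsd": {
        "types": {},
        "elements": {},
        "redefines": {"B.xsd": {"cB": False}},  # cB becomes non-abstract
    },
    "B.xsd": {
        "types": {"cB": True},                  # abstract="true"
        "elements": {"eB2": "cB"},
        "redefines": {},
    },
}


def abstract_type_errors(types, elements):
    """List the elements whose type definition is abstract."""
    return sorted(
        name for name, type_name in elements.items()
        if types.get(type_name, False)
    )


def validate_alone(location):
    """Check one document in isolation, ignoring whoever redefines it."""
    doc = DOCUMENTS[location]
    return abstract_type_errors(doc["types"], doc["elements"])


def validate_outcome(root):
    """Build the final component set first, then check it once."""
    types, elements, overrides = {}, {}, {}
    seen, pending = set(), [root]
    while pending:
        location = pending.pop()
        if location in seen:      # guards against circular references
            continue
        seen.add(location)
        doc = DOCUMENTS[location]
        types.update(doc["types"])
        elements.update(doc["elements"])
        for target, redefined in doc["redefines"].items():
            overrides.update(redefined)
            pending.append(target)
    types.update(overrides)       # apply redefinitions last
    return abstract_type_errors(types, elements)
```

     In this model validate_alone("B.xsd") reports eB2, while
validate_outcome("A.xsd") reports nothing, which is the discrepancy
described above.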

     The problem is that in some cases the redefined schema and the
redefining schema are so tightly coupled that neither can be validated by
itself. I think the spec should either:

     1/ prohibit these cases
or
     2/ specify how schema processors must behave in these cases.

     These situations occur when a schema A redefines a schema B which,
directly or indirectly, includes, imports or redefines schema A.
Example:

     SECOND EXAMPLE:

     A.xsd
     <xsd:schema xmlns:xsd="http://www.w3.org/2000/10/XMLSchema">
          <xsd:redefine schemaLocation="B.xsd">
               <xsd:complexType  name="cB" abstract="false">
                    <xsd:complexContent>
                         <xsd:extension base="cB"/>
                    </xsd:complexContent>
               </xsd:complexType>
          </xsd:redefine>
          <xsd:complexType  name="cA">
               <xsd:sequence>
                    <xsd:element name="eA" type="cB"/>
               </xsd:sequence>
          </xsd:complexType>
     </xsd:schema>

     B.xsd
     <xsd:schema xmlns:xsd="http://www.w3.org/2000/10/XMLSchema">
          <xsd:redefine schemaLocation="A.xsd">
               <xsd:complexType  name="cA">
                    <xsd:complexContent>
                         <xsd:extension base="cA">
                              <xsd:sequence>
                                   <xsd:element name="eB3" type="cB"/>
                              </xsd:sequence>
                         </xsd:extension>
                    </xsd:complexContent>
               </xsd:complexType>
          </xsd:redefine>
          <xsd:complexType  name="cB" abstract="true">
               <xsd:sequence>
                    <xsd:element name="eB" type="xsd:string"/>
               </xsd:sequence>
          </xsd:complexType>

          <xsd:element  name="eB2" type="cB"/>
     </xsd:schema>


     I think that in this situation we should not try to validate B.xsd
by itself. We should focus only on the outcome. In other words, we should
only try to validate schema A after all components have been constructed
and the reference redirections (needed because of redefinition) have been
performed.

     What could happen if a processor tries to validate B.xsd by itself?
First, I would like to point out that, in this second example, the meaning
of "validate B.xsd by itself" is not clear. In our first example, the
meaning of "validate B.xsd by itself" was straightforward because B.xsd
did not use schema A.xsd. Suppose we have a function "boolean
validate(file aSchemaLocation)" which takes a schema file and returns
whether the schema defined in the file is valid. In our first example, we
can then say that schema B.xsd is valid by itself if validate(B.xsd)
returns true.

     We cannot apply the same definition of "validate B.xsd by itself"
to our second example, because doing so leads to an infinite loop:
validate(B.xsd) will call validate(A.xsd), which will call validate(B.xsd),
and so on.
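     The loop can be illustrated with a small sketch. It assumes a naive
validate() that first validates every referenced document by itself; the
reference maps, the function names, and the depth limit (a stand-in for a
real stack overflow) are hypothetical and exist only for illustration.

```python
# Hypothetical reference maps: which schema documents each document
# redefines, includes, or imports.
FIRST_EXAMPLE = {"A.xsd": ["B.xsd"], "B.xsd": []}
SECOND_EXAMPLE = {"A.xsd": ["B.xsd"], "B.xsd": ["A.xsd"]}


def validate_naive(references, location, depth=0, limit=50):
    """Validate every referenced document 'by itself' before this one.

    The depth limit stands in for the unbounded recursion a real
    implementation would hit; it is not part of any spec.
    """
    if depth > limit:
        raise RecursionError(f"validate({location}) never terminates")
    for referenced in references[location]:
        validate_naive(references, referenced, depth + 1, limit)
    return True  # the document's own local checks would go here


def find_cycle(references, location, path=()):
    """Return the chain of documents leading back to one that is
    already being processed, or None if there is no such chain."""
    if location in path:
        return path[path.index(location):] + (location,)
    for referenced in references.get(location, []):
        cycle = find_cycle(references, referenced, path + (location,))
        if cycle:
            return cycle
    return None
```

     With FIRST_EXAMPLE the naive function terminates; with SECOND_EXAMPLE
it does not, and find_cycle reports the chain B.xsd, A.xsd, B.xsd.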

     Therefore, if the spec wants conforming processors always to validate
redefined schemas by themselves, it should clearly define what that means
for tightly coupled schema documents (one possible definition would
replace all redefine statements in the redefining and redefined schemas
with include statements). But I believe it would be much easier to let
processors focus only on the outcome schema and not try to validate
redefined schemas by themselves.

     Besides the ambiguous definition of "validate a schema by itself",
there is another problem if a processor does not focus only on the outcome
and tries to validate some fragments of the schema before all components
are in their final form (that is, before all components have been built
and the reference redirections required by the redefine statements have
been performed). Depending on the order in which the reference
redirections are done, elements eB2, eB3 and eA in our second example may
or may not be valid, because they may or may not end up using an abstract
type.
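     A toy sketch of this order dependence follows. It assumes a single
"redirect" step that makes cB non-abstract and treats an element as valid
when its type is non-abstract at the moment it is checked; the function and
step names are hypothetical, and the real second example involves more than
one redirection.

```python
def validate_in_order(steps):
    """Run element checks and the 'redirect' step in the given order.

    Returns a dict mapping each checked element to True (valid) or
    False (invalid) as judged at the moment it was checked.
    """
    types = {"cB": True}  # cB starts out abstract
    elements = {"eA": "cB", "eB2": "cB", "eB3": "cB"}
    verdicts = {}
    for step in steps:
        if step == "redirect":
            types["cB"] = False  # the redefinition takes effect
        else:
            verdicts[step] = not types[elements[step]]
    return verdicts


# eB2 is checked before the redirection, the others after it.
early = validate_in_order(["eB2", "redirect", "eA", "eB3"])

# Everything is checked only once the components are in final form.
late = validate_in_order(["redirect", "eA", "eB2", "eB3"])
```

     Here early marks eB2 invalid and the other two valid, while late marks
all three valid, although the schema documents are the same in both runs.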

     So, once more, the spec should either

     1/ say clearly that processors must focus only on the resulting
schema (in this case, the first example is valid)
or
     2/ explicitly specify the order in which intermediate fragments must
be validated (in this case, depending on the specified order, the first
example may or may not be valid).

     It would be much easier to choose the first alternative: conforming
processors must focus only on the resulting schema.



     Achille Fokoue.

Received on Monday, 5 February 2001 09:11:20 UTC